
Big Data

Code: BINF025     Acronym: BD

Scientific Areas
Classification   Scientific Area
OFFICIAL         Informatics

Occurrence: 2022/2023 - 1st semester

Active? Yes
Responsible Unit: Departamento de Matemática e Informática
Responsible Course/Study Cycle: Undergraduate in Bioinformatics

Study Cycles/Courses

Acronym  No. of Students  Study Plan  Curricular Years  UCN Credits  ECTS Credits  Contact Hours  Total Hours
BINF     26               Study Plan  3                 -            5             67.5           135

Teaching - Responsibilities

Teacher                      Responsibility
António Leonardo Gonçalves

Teaching - Hours

Theoretical and Practical: 2.00
Practical and Laboratory: 2.00

Type                       Teacher                          Classes  Hours
Theoretical and Practical  Totals                           1        2.00
                           Maria Raquel Feliciano Barreira           2.00
                           António Leonardo Gonçalves                2.00
Practical and Laboratory   Totals                           2        4.00
                           António Leonardo Gonçalves                4.00
                           Maria Raquel Feliciano Barreira           4.00

Working language

Portuguese

Objectives

This curricular unit provides knowledge of tools for storing, processing and visualizing large volumes of data, and develops skills in building and testing efficient algorithms for Big Data, namely through the study of parallel programming paradigms, models, tools and languages.
At the end of the course, the student should be able to:

- Determine the solution to apply and the tools to use for storing, exploring and analyzing a large volume of data
- Select appropriate visualization options to summarize and extract knowledge from a large volume of data
- Understand parallel and distributed processing as a way to increase performance in data management and analysis
- Develop algorithms and models to solve problems that involve managing concurrency, distribution and parallelism
- Recognize the different hardware architectures that support the execution of these algorithms

Learning outcomes and competences

Not applicable

Mode of study

In person

Program

1. Visualization of large data volumes
2. Large-scale storage
   Non-relational databases (key-value, document-oriented, column-family, graph-oriented)
   Comparison between relational and non-relational databases
3. Parallel Programming Models
   Shared Memory Model
   Thread Model
   Distributed memory
   Message passing model
   Data-parallel model
   Hybrid model
   Single Program Multiple Data (SPMD)
   Multiple Program Multiple Data (MPMD)
4. Design of parallel programs
   Automatic vs. manual parallelization
   Partitioning
   Communications
   Synchronization
   Data dependencies
   Load balancing
   Granularity
   I/O
   Debugging
   Performance analysis and tuning
5. Parallel Algorithms
   Parallel Algorithms for Sequences and Strings
   Parallel Algorithms for Trees and Graphs
   Parallel Algorithms for Numerical/Scientific Computation
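
As an illustration of how items 3 to 5 fit together, the following minimal sketch counts k-mers in a DNA string by splitting the start positions into blocks (partitioning), counting each block in a thread pool (thread model), and merging the partial counts. It is not taken from the course materials: the function names, the sample sequence and the choice of k-mer counting are assumptions made for illustration. Because of CPython's global interpreter lock, the threads here show the structure of the computation rather than a real speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_kmers_range(seq: str, k: int, start: int, end: int) -> Counter:
    """Count the k-mers whose first character lies in seq[start:end]."""
    last_start = len(seq) - k + 1  # one past the final valid k-mer start
    return Counter(seq[i:i + k] for i in range(start, min(end, last_start)))


def parallel_kmer_count(seq: str, k: int, workers: int = 4) -> Counter:
    """Partition k-mer start positions into blocks, count each, then merge.

    Hypothetical helper for illustration; assumes workers >= 1.
    """
    if k <= 0 or len(seq) < k:
        return Counter()
    n_starts = len(seq) - k + 1
    block = -(-n_starts // workers)  # ceiling division: block size per task
    ranges = [(s, min(s + block, n_starts)) for s in range(0, n_starts, block)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task reads the shared sequence but writes only its own Counter,
        # so no locking is needed until the merge step below.
        for partial in pool.map(lambda r: count_kmers_range(seq, k, *r), ranges):
            total.update(partial)
    return total


# Made-up sample sequence, used only to exercise the sketch.
sequence = "ACGTACGTGACG"
counts = parallel_kmer_count(sequence, k=3, workers=3)
```

Partitioning by start position, rather than cutting the string into disjoint pieces, means k-mers that straddle a block boundary are still counted exactly once. The block size sets the granularity: fewer, larger blocks reduce scheduling overhead, while more, smaller blocks make load balancing easier.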

Required Bibliography

Sadalage, P. J. and Fowler, M.; NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, Pearson Education, 2012
O'Neil, C. and Schutt, R.; Doing Data Science: Straight Talk from the Frontline, 2013
Leskovec, J., Rajaraman, A. and Ullman, J. D.; Mining of Massive Datasets, Cambridge University Press, 2nd ed., 2014
White, T.; Hadoop: The Definitive Guide, O'Reilly, 2015
Wilke, C. O.; Fundamentals of Data Visualization, O'Reilly, 2019
Pacheco, P.; An Introduction to Parallel Programming, 2nd ed., 2021
Kleppmann, M.; Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, 2017

Complementary Bibliography

Knaflic, C. N.; Storytelling with Data, Wiley, 2015

Teaching methods and learning activities

The predominant teaching methods are slide-based presentation of concepts and demonstration of examples in the computer laboratory. Students are continually challenged to solve new problems based on the demonstrated examples, and to reflect on the results and performance of the storage and processing approaches under study.

Software

PySpark
Python
MongoDB

Type of assessment

Distributed assessment with final exam

Assessment Components

Designation    Weight (%)
Test           30.00
Written work   70.00
Total:         100.00

Workload Components

Designation        Time (hours)
Independent study  82.50
Class attendance   52.50
Total:             135.00

Eligibility for exams

Not applicable

Calculation formula of final grade

Continuous assessment

  • 30%*project1 + 35%*project2 + 35%*test

Final assessment

  • 30%*project1 + 35%*project2 + 35%*exam

Assessment by final exam alone (100% exam) is not available; an exception regime applies because, given the learning objectives and the skills to be acquired, students need a strong practical component in using tools for storing and processing big data.
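
As a worked example of the weighted formula above, the following sketch computes a final grade from three component marks. The function name and the sample marks are hypothetical, and the marks are simply assumed to share one common scale; the page does not state the grading scale.

```python
def final_grade(project1: float, project2: float, test_or_exam: float) -> float:
    """Weighted final grade using the 30/35/35 split from the formula."""
    # Integer percentage weights keep the arithmetic exact for whole marks.
    return (30 * project1 + 35 * project2 + 35 * test_or_exam) / 100


# Hypothetical marks, assumed to be on a single common scale.
example = final_grade(project1=14, project2=16, test_or_exam=12)
print(example)  # 14.0
```

The same function covers both regimes, since continuous and final assessment differ only in whether the third component is the test or the exam.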
Copyright 1996-2024 © Instituto Politécnico de Setúbal - Escola Superior de Tecnologia do Barreiro
Page generated on: 2024-05-14 at 15:10:37