Data Analysis and Machine Learning
Scientific Areas
Classification | Scientific Area
OFICIAL | Informatics
Occurrence: 2023/2024 - 1st semester
Study Cycles/Courses
Teaching - Responsibilities
Working language
Portuguese - Suitable for English-speaking students
Objectives
Students will acquire the knowledge, skills, and proficiency to:
- Recognize the specific challenges and needs of data analysis.
- Collect (capture), explore, clean, munge, and manipulate datasets, including big data.
- Understand the fundamentals of machine learning techniques and the possibilities for applying them.
- Implement machine learning models (e.g., regression, classification, decision trees, artificial neural networks) using the Python programming language and available open-source tools/libraries.
- Develop projects that solve real practical problems, mainly in health data analysis, by analyzing and exploring the public and/or private datasets made available.
Learning outcomes and competences
The curricular unit (UC) "Data Analysis and Machine Learning" aims to give students the knowledge to recognize the specific challenges and needs of data processing and analysis and, in particular, to pave the way for developing modeling capabilities in the big data context. Students should understand the criteria for differentiating and selecting classes of algorithms and methods, as well as the assumptions behind their use with emerging machine learning and artificial intelligence (AI) techniques. They should develop theoretical and practical dataset exploration skills based on the Python programming language, using algorithms and methods established in common open-source tools/libraries that are tested and made available by the data science and AI research and development community. Among other topics, the concepts of Supervised Learning, Unsupervised Learning, and Reinforcement Learning will be introduced. In particular, students are expected to be able to apply these data analysis and machine learning techniques to patient data and health records (clinical data, medical image analysis, etc.).
Mode of study
In person
Prerequisites (prior knowledge) and co-requisites (concurrent knowledge)
Basics of Python programming and linear algebra
Syllabus
PART 1: INTRODUCTION
1. Introduction to Data Analysis and Machine Learning (ML)
1.1. What is Data Science (DS)? Why is DS important?
1.2. Analytics Building Blocks
1.3. Data Analysis Examples
1.4. What is ML? Why use ML?
1.5. Machine Learning Framework
1.6. Performance Evaluation
2. Introduction to Python Programming
2.1. Installing Python, Tools for Python
2.2. Control Flow (Conditional Logic, Loops, and Functions)
2.3. Python Collections
2.4. Introduction to NumPy and Pandas
3. Getting and Working with Data
3.1. Capturing Data
3.2. Feature Extraction and Transformation
3.3. Dimension Reduction
3.4. Clustering
PART 2: ALGORITHMS AND METHODS
4. Machine Learning Techniques
4.1. Understanding ML (Problems, Goals, Challenges)
4.2. Python Tools/Libraries
4.3. Types of Learning
4.4. Supervised Learning
4.4.1. Classification
4.4.2. Training Models (Regression and Logistic Regression Approaches)
4.4.3. Support Vector Machines
4.4.4. Decision Trees
4.4.5. Random Forest
4.4.6. Ensemble Learning
4.4.7. Dimensionality Reduction
4.5. Unsupervised Learning
4.5.1. Clustering
4.5.2. Gaussian Mixtures
5. Neural Networks and Deep Learning
5.1. Introduction to Artificial Neural Networks
5.2. Training Deep Neural Networks
5.3. Custom Models
5.4. Loading and Preprocessing Data
5.5. Convolutional Networks (Computer Vision)
5.6. Processing Sequences (RNNs and CNNs)
5.7. Natural Language Processing
5.8. Representation and Generative Learning (Autoencoders and Generative Adversarial Networks)
PART 3: APPLICATIONS
6. Examples of Applications
6.1. Classification
6.2. Regression
6.3. Clustering
Required Bibliography
Ethem Alpaydın; Introduction to Machine Learning, Second Edition, MIT Press, 2010
Stuart Russell, Peter Norvig; Artificial Intelligence: A Modern Approach, 4th Edition, 2021. ISBN: 978-0134610993
Andriy Burkov; The Hundred-Page Machine Learning Book, 2019. ISBN: 978-1999579500
Sebastian Raschka, Vahid Mirjalili; Python Machine Learning, Third Edition, Packt Publishing, 2019
M. Mohri, A. Rostamizadeh, A. Talwalkar; Foundations of Machine Learning, Second Edition, MIT Press, 2018
Complementary Bibliography
Ian Goodfellow, Yoshua Bengio, and Aaron Courville; Deep Learning, 2016. ISBN: 978-0262035613
Max Kuhn, Kjell Johnson; Applied Predictive Modeling, Springer, 2016. ISBN: 978-1-4614-6849-3 (eBook)
Trevor Hastie, Robert Tibshirani, and Jerome Friedman; The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2009. ISBN: 978-0387848570
Teaching methods and learning activities
Teaching has three main components:
- Theoretical-practical classes (TP) - partially expository, with intensive use of supervised exercise solving, analysis of case studies, and two seminars on specific topics, which will take place entirely remotely.
- Laboratory classes (PL) - for supervised execution and individual assessment of practical work in a computing environment for personal computers, the internet, and mobile devices.
- Tutorial guidance (OT) - for personalized follow-up of seminar preparation and of projects carried out remotely.
Files with the laboratory exercise materials will be made available for autonomous work (asynchronous regime), with monitoring by videoconference at the scheduled time and synchronous classes (by videoconference) for answering questions and individual follow-up.
All types of classes (OT, TP, and PL), as well as the seminars, may be held remotely, since the materials, resources, tools, and teacher training naturally allow this in a computer science course. Laboratory and project work may be done individually or in groups of 2 or 3 students, upon registration and approval by the teacher.
Software
Anaconda Distribution
Machine Learning libraries (modules) for Python
Type of assessment
Distributed evaluation without final exam
Assessment Components
Designation | Weight (%)
Presentation/discussion of a scientific paper | 20.00
Test | 30.00
Laboratory work | 50.00
Total: | 100.00
Workload Components
Designation | Time (hours)
Presentation/discussion of a scientific paper | 20.00
Project development | 40.00
Written work | 30.00
Laboratory work | 10.00
Research work | 40.00
Total: | 140.00
Eligibility for exams
The evaluation covers all components, namely:
- Two seminars, for which students prepare their presentations autonomously. Theoretical knowledge and the ability to apply it to specific cases will be evaluated.
- A test or exam.
- The 4 best lab assignments, through which practical implementation skills will be evaluated (supervised).
- An individual project, through which the ability to work independently will be evaluated.
The laboratory work and project can be carried out individually or in groups of 2 or 3 students, upon registration and approval by the professor.
The evaluation is distributed, with the final mark calculated by the formula:
TP (50%) + PL (50%)
TP = 30% test + 20% (2 Seminars - 10% each)
PL = 10% best 4 lab works + 40% final project.
Final grade calculation formula
The evaluation is distributed, with the final mark calculated by the formula:
TP (50%) + PL (50%)
30% tests or exams + 20% (2 Seminars / 10% each) + 10% best 4 lab works + 40% final project.
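As an illustration only, the weighting above can be sketched in Python. The function name `final_mark` and its parameters are hypothetical, and two points are assumptions not stated in this sheet: all component marks are on the same scale (for example 0 to 20), and the lab component is the mean of the four highest lab marks.

```python
def final_mark(test, seminars, labs, project):
    """Combine component marks into a final mark: TP (50%) + PL (50%).

    Assumptions (not stated in the course sheet): every mark uses the
    same scale, and the lab component is the mean of the 4 best labs.
    """
    if len(seminars) != 2:
        raise ValueError("exactly two seminar marks are expected")
    if len(labs) < 4:
        raise ValueError("at least four lab marks are expected")

    # Keep only the four highest lab marks and average them (assumption).
    best_labs = sorted(labs, reverse=True)[:4]
    lab_mark = sum(best_labs) / 4

    # TP = 30% test + 20% seminars (10% each).
    tp = 0.30 * test + 0.10 * seminars[0] + 0.10 * seminars[1]
    # PL = 10% best 4 lab works + 40% final project.
    pl = 0.10 * lab_mark + 0.40 * project
    return tp + pl


if __name__ == "__main__":
    # Hypothetical marks on a 0-20 scale.
    mark = final_mark(test=14, seminars=(16, 12),
                      labs=[10, 18, 15, 13, 17, 9], project=16)
    print(round(mark, 3))
```

With these hypothetical marks the four best labs average 15.75, so TP contributes 7.0 and PL contributes 7.975, for a final mark of about 14.975.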