Sessions will comprise:

  • Talks: Breakthrough presentations at the beginning of each day, given by a notable guest (from academia or industry) and followed by an open discussion with the class.
  • Lessons: Traditional class sessions, given by one or more teachers (mainly from academia, but possibly shared with a speaker from industry). Each lesson will focus on a specific technical challenge, defining it and presenting the state-of-the-art methods and techniques to address it, as well as the related technology.
  • Laboratory: Students will be arranged in working groups, which will be asked to apply the competencies acquired during the course to find a solution to a given problem that reflects a real need of a real industry.


Feb 16
  09:00  Welcome! School Opening
         Talk 1 (auditorium FA1): Real-time Data Mining, João Gama (FEUP)
  10:30  …break…
         Lesson 1 (auditorium FA1): Clean your data if you care about its quality, Helena Galhardas (IST, INESC-ID)
  12:30  …lunch…
         Lesson 2 (auditorium FA1): Stream Processing: A Look Under the Hood, Ricardo Ferreira (Feedzai)
  15:30  …break…
         Lab 1 (Laboratory 14):
           • Presentation of the challenge
           • The IDC Maturity Scape
           • Tools to use in the laboratory
  18:00  Social Reception

Feb 17
  09:00  Talk 2 (auditorium FA1): Machine Learning for Robotics, Francisco Melo (IST, INESC-ID)
  10:30  …break…
         Lesson 3 (auditorium FA1): Visualization of Time-Dependent Data, Daniel Gonçalves (IST, INESC-ID)
  12:30  …lunch…
         Lesson 4 (auditorium FA1): Introduction to Stream Processing with Apache Flink, Miguel Coimbra (IST, INESC-ID)
  15:30  …break…
         Lab 2 (Laboratory 14): Group Work: Open laboratory for work on the challenge...
  18:00  Social Dinner

Feb 18
  09:00  Talk 3 (auditorium FA1): IBM Watson, Arlindo Dias (IBM)
  10:30  …break…
         Lesson 5 (auditorium FA1): Cyber-Physical Systems, IoT, and Embedded Systems, Alberto Cunha (IST)
  12:30  …lunch…
         Lab 3 (Laboratory 14 and support room F2): Group Work: Preparation of the reports
  15:30  …break…
         Lab 4 (auditorium FA1): Group Work: Presentation and discussion of the results
  18:00  ... School Closing!


Teachers and Special Collaborations


João Gama is an Associate Professor at the Faculty of Economics, University of Porto. He is a researcher and vice-director of LIAAD, a group belonging to INESC TEC.

He received his PhD from the University of Porto in 2000. He has worked on several national and European projects on Incremental and Adaptive Learning Systems, Ubiquitous Knowledge Discovery, Learning from Massive and Structured Data, etc.

He served as Co-Program Chair of ECML'2005, DS'2009, ADMA'2009, IDA'2011, and ECML/PKDD'2015. He served as track chair on Data Streams at ACM SAC from 2007 to 2016. He organized a series of workshops on Knowledge Discovery from Data Streams with ECML/PKDD, and on Knowledge Discovery from Sensor Data with ACM SIGKDD.

He is the author of several books on Data Mining, including a monograph on Knowledge Discovery from Data Streams. He has authored more than 250 peer-reviewed papers in areas related to machine learning, data mining, and data streams.

He is a member of the editorial boards of the international journals ML, DMKD, TKDE, IDA, NGC, and KAIS. He has supervised more than 12 PhD students and 50 MSc students.

Talk 1

Real-Time Data Mining

Nowadays, in some applications, data is best modelled not as persistent tables but as transient data streams. In this keynote, we discuss:

  • The limitations of current machine learning and data mining algorithms;
  • The fundamental issues in learning in dynamic environments, such as learning decision models that evolve over time, learning and forgetting, concept drift, and change detection.

Data streams are characterized by huge amounts of data that impose new constraints on the design of learning algorithms: limited computational resources in terms of memory, processing time, and CPU power.

In this talk, we present some illustrative algorithms designed to take these constraints into account. We identify the main issues and current challenges that emerge in learning from data streams, and present open research lines for further development.
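To make "change detection under limited memory" concrete, here is a minimal sketch of the Page-Hinkley test, one classic detector from the data-stream literature. It is an illustrative choice, not necessarily one of the algorithms presented in the talk; the class name and the `delta` and `threshold` values are assumptions. Each value is processed in constant time and memory, and the detector "forgets" by resetting its statistics after an alarm.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a numeric stream.

    delta: tolerated magnitude of change; threshold: alarm level (lambda).
    Default values are illustrative assumptions, not tuned settings.
    """

    def __init__(self, delta: float = 0.005, threshold: float = 5.0) -> None:
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self) -> None:
        # O(1) state: count, running mean, cumulative deviation and its minimum.
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0
        self.min_cum = 0.0

    def update(self, x: float) -> bool:
        """Consume one value; return True when a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental mean, no stored history
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        if self.cum - self.min_cum > self.threshold:
            self.reset()  # forget the old concept and start afresh
            return True
        return False


# Hypothetical stream: the mean jumps from 0.0 to 1.0 at index 100.
detector = PageHinkley(delta=0.005, threshold=5.0)
stream = [0.0] * 100 + [1.0] * 100
alarms = [i for i, x in enumerate(stream) if detector.update(x)]
print(alarms)  # → [105]
```

A larger `threshold` reduces false alarms at the cost of a longer detection delay; here the shift is flagged six values after it occurs.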

Francisco S. Melo is an Assistant Professor at the Department of Computer Science and Engineering of Instituto Superior Técnico, and a Senior Researcher in the GAIPS Group of INESC-ID.

He received his PhD in Electrical and Computer Engineering from Instituto Superior Técnico in 2007. Since then, he has held appointments at the Computer Vision Lab of the Institute for Systems and Robotics (2007), the Computer Science Department of Carnegie Mellon University (2008-2009), and INESC-ID (2009). He was principal investigator of the projects MAIS+S and INSIDE (CMU-Portugal), involving interaction between humans and robots in open spaces, and participated as part of the INESC-ID research team in several other projects on human-robot interaction, such as LIREC and EMOTE (EU-FP7). His research addresses problems in machine learning, particularly reinforcement learning, planning under uncertainty, multi-agent and multi-robot systems, developmental robotics, and sensor networks.

Talk 2

Machine Learning for Robotics

This talk will discuss some of the main challenges in present-day robotics and how machine learning techniques have been used to address some of them. In particular, the talk will cover recent advances in robot perception, reasoning, and actuation, boosted and supported by machine learning techniques. I will also describe some more recent work involving human-robot interaction.


Arlindo Dias

IT Architect, IBM Cloud Services

Talk 3

IBM Watson

"IBM Watson is a cognitive system enabling a new partnership between people and computers..."


Helena Galhardas is a tenured Assistant Professor in the Computer Science and Engineering Department of the University of Lisbon (Instituto Superior Técnico - IST/UL), and a senior researcher and co-founder of the IDSS Action Line at INESC-ID. She received her Ph.D. (2001) in Informatics from the University of Versailles and Saint-Quentin. Her research focuses on Data Cleaning and Integration, Extraction-Transformation-Loading, Databases, and Information Extraction. Helena has participated in several projects (research and consulting) and has authored or co-authored more than 40 peer-reviewed scientific publications. She has evaluated FP7 and H2020 project proposals. She co-chaired the Program Committee of DILS 2014 (International Conference on Data Integration in the Life Sciences).

Lesson 1

Clean your data if you care about its quality

Data cleaning is crucial to ensure that data is of good quality. In application contexts where we want to make use of data for analysis purposes, data must satisfy a set of quality requirements.

In this lesson, we will first introduce the notion of data quality as an activity that encompasses two main tasks:

(i) data quality assessment; and

(ii) data quality improvement (or data cleaning).

Second, we will explain the notion of data profiling, which makes it possible to assess the quality of a data set.
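As a rough illustration of what a profiling step reports, the sketch below computes a few per-column statistics (missing values, distinct values, most frequent value) over a small list of records. The function name, the chosen statistics, and the sample records are hypothetical; real profiling tools compute many more measures.

```python
from collections import Counter
from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Compute simple per-column statistics that help assess data quality."""
    columns = sorted({col for row in rows for col in row})
    report: dict[str, dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]  # treat "" as missing
        counts = Counter(present)
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return report


# Hypothetical customer records with typical quality problems.
customers = [
    {"name": "Ana Silva", "city": "Lisboa", "zip": "1000-001"},
    {"name": "Ana  Silva", "city": "Lisbon", "zip": ""},
    {"name": "Rui Costa", "city": "Porto", "zip": None},
]
for col, stats in profile(customers).items():
    print(col, stats)
```

Here the profile exposes two missing postal codes and three distinct spellings of what are only two cities, which are exactly the kinds of findings that guide the subsequent cleaning.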

Third, we will describe one of the main tasks of a data cleaning process: identifying and consolidating records that refer to the same real-world entity, also called approximate duplicate records.
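A minimal sketch of approximate duplicate detection follows, assuming plain string similarity (Python's `difflib.SequenceMatcher`) on normalized values and a union-find structure to consolidate matching records into groups. The 0.85 threshold and the function names are hypothetical, and the all-pairs comparison is O(n²); practical systems usually add blocking and domain-specific matching rules, which may be what the lesson covers instead.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(s: str) -> str:
    """Lower-case and collapse whitespace so trivial variations do not matter."""
    return " ".join(s.lower().split())


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_duplicates(records: list[str], threshold: float = 0.85) -> list[list[int]]:
    """Group record indices whose pairwise similarity reaches the threshold."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):  # all pairs: O(n^2)
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups: dict[int, list[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


names = ["Ana Silva", "ana  silva", "Ana Sylva", "Rui Costa"]
print(cluster_duplicates(names))  # → [[0, 1, 2], [3]]
```

Union-find makes the grouping transitive: if A matches B and B matches C, all three end up in one cluster, which a consolidation step can then merge into a single record.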

Finally, we will list some of the main data quality tools.

Ricardo Ferreira is a Software Engineer and Team Leader at Feedzai.
With experience in distributed and high-performance systems, he is passionate about the design and architecture of large-scale systems and about working on performance-sensitive problems. He is always striving to achieve the desired low latencies on Feedzai's real-time platform, while also making sure it scales and operates to high standards.

Lesson 2

Stream Processing: A Look Under the Hood

Pulse, Feedzai's flagship product, is deployed in extremely demanding environments with strict latency requirements (a maximum latency of 250 ms, and 10 ms at the 50th percentile). Under the hood, it is powered by an event stream processing engine developed in-house to match Feedzai's needs. In this talk you'll see how Feedzai relies on it to catch bad guys, and the challenges of developing and maintaining an event stream processing engine with low-latency requirements.
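The two bounds quoted above constrain different things: the 50th percentile (median) describes a typical event, while the maximum bounds the worst case. The sketch below checks a batch of per-event latencies against both bounds. It assumes the nearest-rank percentile definition (other definitions interpolate and can give slightly different values); the function names and measurements are hypothetical and say nothing about how Pulse is actually measured.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_budget(latencies_ms: list[float], p50_ms: float = 10.0, max_ms: float = 250.0) -> bool:
    """True if the median is within p50_ms and no event exceeds max_ms."""
    return percentile(latencies_ms, 50) <= p50_ms and max(latencies_ms) <= max_ms


# Hypothetical per-event latencies in milliseconds.
measured = [4.0, 6.5, 7.0, 8.2, 9.1, 9.8, 12.0, 15.5, 40.0, 180.0]
print(percentile(measured, 50), max(measured), meets_budget(measured))  # → 9.1 180.0 True
```

Note that a single 300 ms outlier would break the maximum-latency bound even though the median would still be about 10 ms, which is why tail latency, and not only the median, drives the engine's design.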

Daniel Gonçalves is a researcher at the Visualization and Multimodal Interfaces Group of INESC-ID and a professor in the Computer Science and Engineering Department of IST, the School of Engineering of the University of Lisboa, Portugal.
His research encompasses the areas of Information Visualization, Personal Information Management, and Accessibility. He has published over 160 peer-reviewed publications in those areas and is the co-author of a textbook on Human-Computer Interaction. He managed INESC-ID’s participation in the national EDUCARE project, where information visualization techniques will be used in novel ways as a pedagogic aid, and in EU AAL PAELife, which strives to find new interaction techniques for creating a personal life assistant for the elderly.
He was one of the organizers and Technical Program Chair of IFIP INTERACT 2011 (September 2011) and of the ACM International Conference on Intelligent User Interfaces (February 2012). Since 2015 he has been part of the Editorial Board of the Universal Access in the Information Society journal. He is a member of ACM and of the Portuguese Computer Graphics Group (the national Eurographics chapter).

Lesson 3

Visualization of Time-Dependent Data

One of the most effective ways to make sense of a dataset is to leverage our ability to visually identify interesting items and patterns, using Information Visualization (InfoVis) techniques.

In this lesson we will focus on how Information Visualization can be effectively used to make sense of time-dependent data, such as streams.

We will cover the following topics:

  • Introduction to InfoVis
  • Basic Perceptual Principles
  • Representing Time-Dependent Data
  • Dealing with complexity

Miguel Coimbra is currently a PhD student at Instituto Superior Técnico, hosted as a researcher at INESC-ID. He researches distributed graph processing, within the scope and with the support of the Distributed Systems Group (GSD) and the Decision Support Systems (IDSS) laboratory.

Lesson 4

Introduction to Stream Processing with Apache Flink

This talk will introduce and detail the architecture and internal mechanisms of Apache Flink that make it a suitable basis for easily developing stream processing-based applications. Quoting from

Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink includes several APIs for creating applications that use the Flink engine:

  1. DataStream API for unbounded streams embedded in Java and Scala,
  2. DataSet API for static data embedded in Java, Scala, and Python, and
  3. Table API with a SQL-like expression language embedded in Java and Scala.

Flink also bundles libraries for domain-specific use cases:

  1. CEP, a complex event processing library,
  2. Machine Learning library, and
  3. Gelly, a graph processing API and library.
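To give a feel for the kind of computation a DataStream program expresses, the plain-Python sketch below counts events per key in fixed, non-overlapping (tumbling) time windows. It is not Flink code and does not use any Flink API; in Flink, a similar result would typically come from keying a stream and applying a tumbling window. The function name and event data are hypothetical, and the sketch assumes events arrive in timestamp order, so it sidesteps the out-of-order handling (watermarks) and fault tolerance that Flink's engine provides.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Optional


def tumbling_window_counts(
    events: Iterable[tuple[int, str]], window_ms: int
) -> Iterator[tuple[int, str, int]]:
    """Yield (window_start, key, count) for each key in each tumbling window.

    Assumes events are ordered by timestamp (milliseconds).
    """
    current_start: Optional[int] = None
    counts: dict[str, int] = defaultdict(int)
    for ts, key in events:
        start = ts - ts % window_ms  # start of the window this event falls into
        if current_start is not None and start != current_start:
            # A later window has begun: emit and clear the finished one.
            for k in sorted(counts):
                yield current_start, k, counts[k]
            counts.clear()
        current_start = start
        counts[key] += 1
    if current_start is not None:  # flush the last window at end of input
        for k in sorted(counts):
            yield current_start, k, counts[k]


events = [(0, "card-A"), (400, "card-B"), (900, "card-A"), (1200, "card-A"), (2500, "card-B")]
print(list(tumbling_window_counts(events, window_ms=1000)))
# → [(0, 'card-A', 2), (0, 'card-B', 1), (1000, 'card-A', 1), (2000, 'card-B', 1)]
```

Because the function is a generator, results for a window are emitted as soon as the window closes rather than after the whole input is read, mirroring how a streaming engine produces output over unbounded data.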

Alberto Cunha is a professor at IST. He was formerly a researcher at INESC, where he was involved in the design, development, and technology transfer of innovative systems for office automation and electronic ticketing. He was a member of the Board of Directors (BoD) of the Aitec-Link group, with particular responsibility for the design of solutions and products for the Transports and Mobility sector. In 2009 he co-founded YouMove-Card4Business, which developed, and presently maintains, many of the electronic ticketing systems for public transport in Portugal. From 2010 to 2013 he was a member of the BoD of Taguspark, the main Science and Technology Park in Portugal, responsible for the bootstrap and operation of the Taguspark business incubator for technology-based startups.

Lesson 5

Cyber-Physical Systems, IoT, and Embedded Systems

Electronics and sensor technologies enable mobile and personal devices and equipment (smartphones, smartcards and tags) with considerable computing and communication power, and will enrich our surrounding environment with sensory and reactive capabilities. The orchestration of all this potential, integrating Internet-based as well as specialized heterogeneous sub-systems, is the foundation of cyber-physical systems able to adapt in time to the context of use.

In this lesson we will address:

  • the architecture of intelligent sensors and embedded systems, and their processing and communication constraints;
  • the basic models for local cyber-physical interaction;
  • models for data integration from sensor to IT infrastructure.

Finally, we will analyze some applications of these technologies to monitor human activities and services, and to create responsive and energy-efficient urban environments.

  • In the Laboratory, students will work in heterogeneous groups (combining business and engineering backgrounds, as well as industry and academic experience), addressing a "Digital Transformation" challenge from complementary viewpoints. The objective will be to exercise the technology and concepts from the classes in a plausible scenario...
  • At the end, groups must be able to propose and discuss plausible solutions for the proposed challenge. To help assess the solutions and support the discussion, students will also be introduced to, and encouraged to use:


Laboratory with the collaboration of:

About the IDC Maturity Scape...

About the Pentaho software...
About the Tableau software...