Sessions will comprise:

  • Keynotes (marked in blue): Breakthrough presentations by a notable guest (from academia or industry), or panels, complemented by open discussion with the class.
  • Lessons (marked in green): Traditional technical sessions, or demonstrations, given by one or more teachers (from academia, or shared with a demonstrator from industry); each lesson will focus on a specific technical challenge, defining it and presenting the state-of-the-art methods, techniques and technology to address it.
  • Laboratory (marked in orange): Students will be arranged in working groups, as a chance to apply the competencies acquired during the course to seek a solution to a posed problem representing a real need of a real industry.
  Feb 15 Feb 16 Feb 17
09:00 School Opening Morning Coffee  Morning Coffee
09:30 

Keynote 2

Data Visualization - Daniel Gonçalves (UL/IST, INESC-ID)

Keynote 3

Data Mining - João Gama (UPorto, INESC-TEC)

Keynote 1

Information Security - Pedro Adão (UL/IST)
Named Entity Recognition in Text-as-data
 (demo) - Miguel Won (INESC-ID)
Deep Learning in Medical Images (demo) - Miguel Monteiro (INESC-ID)
…Coffee Break… …Coffee Break… …Coffee Break…
11:00

Business Intelligence and Analytics - Elsa Cardoso (ISCTE-IUL, INESC-ID)

Data Cleaning - Luís Cruz (UPorto, INESC-ID)

Social Media Streaming - João Magalhães (Unova/FCT)

Apache Flink (demo) - Miguel Coimbra (INESC-ID)

GDPR - General Data Protection Regulation (panel)
12:30 …Lunch… …Lunch… …Lunch…
13:30

IoT and Data Challenges - Alberto Cunha (UL/IST)

The Lab Challenge - Elsa Cardoso (ISCTE-IUL, INESC-ID)

Process Mining - Diogo Ferreira (UL/IST), Alberto Sardinha (UL/IST, INESC-ID)

Governance of IT, Privacy, and Security - Bruno Horta Soares (IDC, ISACA)

 

Open Lab
15:00 …Coffee Break…
15:30

IDC Maturity Model - Gabriel Coimbra (IDC)

Lab Tool: Tableau - Ricardo Pires (XpandIT)

Lab Tool: Pentaho - Bruno Silva (Pentaho)

 

Open Lab

 

Lab Reports
17:00 Open Lab School Closing
20:00   Social Program

 

 Pedro Adão has led several FCT-funded projects on security, and currently coordinates the Security Team@Tecnico initiative, a team of students that participates in security-related competitions and currently ranks as one of the top-50 teams in the world (top-20 if restricted to academic teams).

Teachers and Special Collaborations

Sessions

 

Information Security

Privacy of data is one of the biggest concerns in today’s society. Security and data breaches are responsible not only for the depreciation of value and loss of intellectual property but also for a negative impact on a company's brand. In this talk we will review several recent data-privacy breaches, their impact on the affected companies and clients, and how this can be seen in the scope of the upcoming enforcement of GDPR in May 2018. We will discuss hypotheses about how some of those breaches could have been prevented, or their consequences minimized, by making use of state-of-the-art techniques. Finally, we will address the state of the art of data analytics and its present limitations in dealing with challenges of this kind.

Daniel Gonçalves is a researcher at the Visualization and Multimodal Interfaces Group of INESC-ID and professor in the Computer Science and Engineering Department of IST, the School of Engineering of the University of Lisboa, Portugal.

His research encompasses the areas of Information Visualization, Personal Information Management and Accessibility. He has published over 160 peer-reviewed publications in those areas, and is the co-author of a textbook on Human-Computer Interaction. He managed INESC-ID’s participation in the national EDUCARE project, where information visualization techniques will be used in novel ways as a pedagogic aid, and in EU AAL PAELife, striving to find new interaction techniques for the creation of a personal life assistant for the elderly.

Daniel was one of the organizers and Technical Program Chair of IFIP INTERACT 2011 (September 2011) and of the ACM International Conference on Intelligent User Interfaces (February 2012). Since 2015 he has been part of the Editorial Board of the Universal Access in the Information Society journal. He is a member of ACM and the Portuguese Computer Graphics Group (the national Eurographics chapter).

Data Visualization

One of the most effective ways to make sense of a dataset is to leverage our ability to visually identify interesting items and patterns, by using Information Visualization (InfoVis) techniques.

In this lesson we will focus on how Information Visualization can be effectively used to make sense of, for example, time-dependent data (such as streams).

We will cover the following topics:

  • Introduction to InfoVis
  • Basic Perceptual Principles
  • Representing Time-Dependent data
  • Dealing with complexity
 

João Gama is Associate Professor of the Faculty of Economy, University of Porto. He is a researcher and vice-director of LIAAD, a group belonging to INESC TEC.

João received his PhD degree from the University of Porto in 2000. He has worked in several national and European projects on Incremental and Adaptive Learning Systems, Ubiquitous Knowledge Discovery, Learning from Massive and Structured Data, etc.

He served as Co-Program chair of ECML'2005, DS'2009, ADMA'2009, IDA' 2011, and ECML/PKDD'2015. He served as track chair on Data Streams with ACM SAC from 2007 till 2016. He organized a series of Workshops on Knowledge Discovery from Data Streams with ECML/PKDD, and Knowledge Discovery from Sensor Data with ACM SIGKDD.

He is author of several books in Data Mining and authored a monograph on Knowledge Discovery from Data Streams. He authored more than 250 peer-reviewed papers in areas related to machine learning, data mining, and data streams.

João is a member of the editorial board of the international journals ML, DMKD, TKDE, IDA, NGC, and KAIS. He has supervised more than 12 PhD students and 50 MSc students.

Data Mining

Nowadays, there are applications where data is best modelled not as persistent tables, but rather as transient data streams. In this keynote, we discuss:

  • The limitations of current machine learning and data mining algorithms;
  • The fundamental issues in learning in dynamic environments, such as learning decision models that evolve over time, learning and forgetting, concept drift and change detection.

Data streams are characterized by huge amounts of data that introduce new constraints in the design of learning algorithms: limited computational resources in terms of memory, processing time and CPU power.

In this talk, we present some illustrative algorithms designed to take these constraints into account. We identify the main issues and current challenges that emerge in learning from data streams, and present open research lines for further developments.
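As an illustration of change detection under the memory and time constraints mentioned above, the following is a minimal sketch of the Page-Hinkley test, a classic drift detector that keeps only a few numbers of state per stream. It is not taken from the keynote itself; the class name, the helper `first_drift`, and the default values of `delta` and `threshold` are assumptions chosen for this example.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an increase in the mean of a stream.

    Uses constant memory: a running mean, a cumulative sum and its minimum.
    The default parameter values are illustrative assumptions.
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # magnitude of change that is tolerated
        self.threshold = threshold  # alarm threshold
        self.n = 0                  # number of items seen so far
        self.mean = 0.0             # running mean of the stream
        self.cumulative = 0.0       # cumulative deviation from the mean
        self.minimum = 0.0          # smallest cumulative deviation seen

    def update(self, x):
        """Process one item; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cumulative += x - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.threshold


def first_drift(stream, detector):
    """Return the index of the first item that triggers the detector, or None."""
    for index, value in enumerate(stream):
        if detector.update(value):
            return index
    return None
```

Each item is processed once and then discarded, so the detector fits the single-pass, bounded-memory setting described above. For example, `first_drift(values, PageHinkley())` could be called on any iterable of numbers, such as the error rate of a model evaluated on the stream.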

 

José Borbinha is professor of Informatics at the IST, Lisbon University, and coordinator of the Information and Decision Support Systems Laboratory of the INESC-ID. He has a background in Electrical Engineering and a PhD in Computer Science and Engineering.

José Borbinha was the CIO of the National Library of Portugal from 1998 to 2005. He is senior member of the IEEE, and a member of BAD, the Portuguese Association of Librarians and Archivists. He is a member of the Steering Committee of the International Conference on Digital Preservation (iPRES) and the present chair of the Steering Committee of the conference on Theory and Practice of Digital Libraries (TPDL). He was a founding member of the IEEE Technical Committee on Digital Libraries (was the elected chair for the period 2008 to 2010), and a member of the Advisory Board of the Dublin Core Metadata Initiative. He is the present coordinator of the DLM Forum MoReq Expert Group (MoReq - modular requirements for records systems) and the representative of Portugal in the “ISO TC46/SC11 WG14 - Records requirements in Enterprise Architecture”.

He has been a coordinator and team member of a number of international research projects in the domains of information systems, Enterprise Architecture, digital libraries and digital archives, with a strong focus on digital preservation and information management.

Magda Cocco holds a Law Degree from the Lisbon University Law School. She joined VdA (Vieira de Almeida Sociedade de Advogados) in 1994. Currently she is one of the partners responsible for TMT – Telecoms, Media & Technologies. She is also responsible for the Privacy, Data Protection, & Cybersecurity practice, along with Aerospace.

Magda has in-depth knowledge and experience in the e-communications industry across several jurisdictions, particularly Portugal and the Portuguese-speaking countries, having led multidisciplinary teams in different projects, and assisted Governments and Regulatory Entities in connection with the definition of regulatory policies and legislative drafting. She has also led various International Telecommunications Union and other international entities’ surveys. Magda has provided expert advice to companies and public entities across different industries on Data Protection & Cybersecurity, assisted several entities in connection with governance matters and data-related strategies, coordinated compliance programs and assisted public and private entities in connection with Cybersecurity threat situations.

Magda has also been involved in innovative technological projects with implications in these areas, and in the definition of legal policies and legislative drafting in various countries. Magda has been involved in various Space sector projects, including the negotiation of contracts for satellite construction and launch and for the installation of ground stations, and assisted Governments in connection with the definition and drafting of Space-related strategies and legislation. She is VdA’s representative at the International Astronautical Federation (IAF). She is responsible for liaising with The Alliance for Affordable Internet (A4AI). She participates in several Space sector-related forums, namely the United Nations Office for Outer Space Affairs (UNOOSA).

Magda is Assistant Professor at the University of Lisbon Law School, and also holds these teaching chairs:

  • Lecturer – Space Law Course jointly organized by the ELSA (“European Law Students’ Association”) of the Portuguese Catholic University, Vieira de Almeida & Associados and VdAcademia.
  • Lecturer – Telecommunications Law and Regulation and Data Protection and Privacy Post-Graduate course at the Lisbon Catholic University.
  • Lecturer – Telecommunications Law course as part of the LLM in Public Law of the Lisbon Administration School.

 António Gonçalves ...

Elsa Cardoso is Assistant Professor at Instituto Universitário de Lisboa (ISCTE-IUL), in the Information Sciences and Technologies Department of the School of Technology and Architecture.

Elsa is the Director of the Master in Integrated Business Intelligence Systems at the same university and the leader of the Business Intelligence Task Force of EUNIS (European University Information Systems organization). She is also a researcher at the Information and Decision Support Systems Group of INESC-ID Lisboa, Portugal.

Elsa has a PhD (European Doctorate) in Information Sciences and Technologies from ISCTE-IUL, with a specialization in Business Intelligence. Her research interests include business intelligence and data warehouse, data visualization, and strategic information systems (balanced scorecard) applied to Higher Education, Healthcare and Digital Humanities.

Business Intelligence and Analytics

...

 

The Lab Challenge

(coordinator)

Luis Cruz is currently a Ph.D. Student at the University of Porto, hosted as a researcher of the Information and Decision Support Systems Lab at INESC-ID. He is also a member of the Green Software Lab. His research interests are mainly Green Software, Mobile Development, Machine Learning and Living Analytics.

Luis has been involved in data science projects with companies such as PARC, a Xerox company, and Procter & Gamble. Luis is currently applying data science principles and techniques in his Ph.D. to improve mobile development processes concerning energy consumption of smartphone applications.

Data Cleaning

Clean your data if you care about its quality. Data wrangling is crucial to ensure that data is appropriate and ready to be used for analytic purposes. Raw data often comes with a myriad of different types of errors and inconsistencies. Thus, data transformations have to be performed before we can extract meaningful knowledge from our data: raw data has to be transformed until it meets a data quality criterion. This is why data wrangling comes with a distinct but no less important task: data quality assessment. In this lesson, we will first introduce the notion of data quality as an activity that encompasses two main tasks:

  1. data quality assessment; and
  2. data wrangling.

Second, we will explain the notion of data profiling, which makes it possible to assess the quality of a data set. Third, we will describe one of the main tasks of a data cleaning process, which consists of identifying and consolidating records that concern the same real-world entity, also called approximate duplicate records. Finally, we will list some of the main data quality tools.
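To make the idea of approximate duplicate records concrete, here is a minimal sketch that flags pairs of records whose text is nearly identical, using only the Python standard library. It is an illustration under stated assumptions rather than the method taught in the lesson: the function names and the 0.85 threshold are hypothetical, and comparing every pair of records only scales to small data sets.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Similarity in [0, 1] between two records after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def approximate_duplicates(records, threshold=0.85):
    """Return index pairs (i, j), i < j, of records that look like duplicates.

    The threshold is an assumption for this sketch and would need tuning
    against labelled examples in a real cleaning task.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs
```

The pairs returned are only candidates: consolidating them into a single record still requires deciding which values to keep. Dedicated data quality tools typically avoid the all-pairs comparison by first grouping records that share a key, and only comparing records within each group.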

Alberto Cunha is professor at IST. He is a former researcher at INESC, where he was involved in the design, development, and technology transfer of innovative systems for office automation and electronic ticketing.

Alberto was member of the BoD of the Aitec-Link group with particular responsibility in the design of solutions and products for the Transports and Mobility sector. In 2009 he co-founded YouMove-Card4Business which developed, and presently maintains, many of the electronic ticketing systems for public transport in Portugal. From 2010 to 2013 he was member of the BoD of Taguspark, the main Science and Technology Park in Portugal, responsible for the bootstrap and operation of Taguspark business Incubator for technology-based startups.

IoT and Data Challenges

Electronics and sensor technologies enable mobile and personal devices and equipment (smartphones, smartcards and tags) with considerable computing and communication power, and will enrich our surrounding environment with sensory and reactive capabilities. The orchestration of all this potential, integrating Internet-based as well as specialized heterogeneous sub-systems, is the foundation of cyber-physical systems able to adapt in time to the context of use.

In this lesson we will address:

  • the architecture of intelligent sensors and embedded systems, and their processing and communications constraints;
  • the basic models for local cyber-physical interaction;
  • models for data integration from sensor to IT infrastructure.

Finally, we will analyze some cases of application of these technologies to monitor human activities and services, and to create responsive and safe environments, while preserving privacy.

Gabriel Coimbra is Country Manager at IDC Portugal. With more than 20 years' experience in ICT, Gabriel Coimbra is IDC's country manager for Portugal. In addition to his management duties, Coimbra is personally involved in planning and coordinating IDC's market intelligence and advisory services in Portugal.

Gabriel oversees research projects to ensure that IDC's quality standards are always met, and contributes to IDC's consulting projects in Portugal. Coimbra is frequently quoted in trade publications such as Expresso, Diário Económico, Revista Exame, and Jornal de Negócios. He is also a frequent presenter at major digital transformation and ICT events in Portugal, Angola, Mozambique, Brazil, and Spain. He engages with academia and cooperates with the postgraduate programs in information systems and digital transformation at some of the most influential universities in Portugal, such as Católica Lisbon School of Business & Economics, NOVA Information Management School, ISEG, and Porto Business School.

Gabriel has a master's degree in statistics and information management from NOVA Information Management School, and a postgraduate degree in advanced management from the Católica Lisbon School of Business & Economics.

Ricardo Pires is Partner and Business Intelligence Lead at Xpand IT. He has worked as a consultant on business intelligence solutions for over 10 years. During this period, he has implemented and managed many projects, mainly using agile methodologies, and has also delivered training sessions and workshops worldwide.

Currently, Ricardo leads the business intelligence unit at Xpand IT, ensuring its strategic planning and alignment with the company goals as well as overseeing the team, main projects and key accounts.
He is a data enthusiast as well as a Tableau and Pentaho fan.

Laboratory Tool: Tableau

Tableau Introduction and live demo – Tableau is a leading software product for BI & analytics, allowing people to see and understand their data. Data is at the heart of organizations, and because Tableau is so intuitive it allows business questions to be answered faster than ever, making it easier to adapt to a constantly changing world.

During this presentation you will understand what Tableau is and watch it in action, in a live demo, going from data to business insights.

Bruno Silva is a Business Intelligence (BI) consultant and trainer for Hitachi Vantara, the company that now owns the Pentaho software.

Bruno has been working on Business Intelligence projects for 5 years, using the different tools inside Pentaho to cover all the stages of a BI project: from the early phase of data Modeling and data “Extraction, Transformation and Load” (ETL) to the Presentation phase, developing reports and tailored dashboards with custom visualizations. He’s passionate about transforming data into meaningful information.

Laboratory Tool: Pentaho

Pentaho is a BI software platform based on open standards. It is made of several tools to assist at every stage of a BI project lifecycle. “Pentaho Data Integration” (PDI) is the data integration tool inside Pentaho. It allows you to build your ETL work in a very intuitive and scalable way, but it also allows you to do much more than that…!

In this presentation you will get a very brief introduction to Pentaho as a whole and a crash course on PDI, to get the knowledge you will need to accomplish the Winter School’s lab sessions and final challenge.

João Magalhães is a Senior Researcher at the NOVA Laboratory for Computer Science and Informatics and Assistant Professor at the Dep. of Computer Science of the Universidade NOVA de Lisboa.

João holds a Ph.D. degree (2008) in Computer Science from Imperial College London, UK. His research interests cover the different problems of multimedia information processing and indexing: image and video analysis for the media and health domains, social-media information analysis, multimedia retrieval models, recommender systems and scalability issues of search systems.

He has coordinated two research projects with the USA, with the University of Texas at Austin (2010-2014) and Carnegie Mellon University (2016-2020). He is currently the coordinator of the NOVA-LINCS participation on the EU-ICT H2020 COGNITUS project. He has also participated in several research projects (national and EU-FP7). He is regularly involved in international program committees and EU project review panels. He is author or co-author of more than 60 publications in journals, conferences and books.

João received the ACM International Conference on Image and Video Retrieval 2007 Best Paper Award, the Portuguese national engineering association Young Engineer Award in 2002, and the Siemens Innovation and Communications Award in 2002.

Social Media Streaming

Social-media networks are a commodity of today’s societies. Fast access to relevant media information is a pressing need of modern societies; however, social-media information spreads at a rate that breaks the limits of current information processing architectures.

In this setting, there is a growing interest in systems that answer user information needs in “continuous document streams”. These systems cross information and user-data from many different services and sources, tracking information cascades that are continuously pouring from end-users.

Aggregating or summarizing information picked from social-media streams and redistributing it creates many technical and business innovation opportunities. In this talk I’m going to discuss algorithms to mine and link information about the same event or the same fact, and ways of presenting continuous or temporal information to users.

Fueled by recent advances of the research community and by our own experiences in mining and tracking user data, I will discuss the need for a research agenda that addresses key challenges: information veracity, secure data sharing and ownership of content and information.

Miguel Coimbra is currently a PhD student at Instituto Superior Técnico, hosted as a researcher at INESC-ID.

Miguel is currently researching distributed graph processing, in the scope and with the support of the Distributed Systems Group (GSD) and the Decision Support Systems (IDSS) laboratory.

Apache Flink (demo)

This talk will introduce and detail the architecture and internal mechanisms of Apache Flink that make it a suitable basis for easily developing stream processing-based applications. Quoting from https://flink.apache.org/: "Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams."

Diogo R. Ferreira is professor of information systems at IST, University of Lisbon, where he works on process mining, data analysis, and systems integration. He has taught these subjects extensively to BSc, MSc and PhD students in computer science and other engineering degrees. He is the author of two books, namely "A Primer on Process Mining: Practical Skills with Python and Graphviz" (Springer, 2017) and "Enterprise Systems Integration: A Process-Oriented Approach" (Springer, 2013). He has supervised about thirty graduate students, and is the author of numerous publications. He holds a PhD on workflow systems from the University of Porto (2004), and soon afterwards moved to the analysis of event logs and other approaches to discover business processes from real-world event data.

Alberto Sardinha is an Assistant Professor in the Department of Computer Science and Engineering at Instituto Superior Técnico, Universidade de Lisboa, and a Senior Researcher at GAIPS / INESC-ID. His research interests include the fields of machine learning, process mining, multi-agent systems and automated software engineering. He also teaches courses at the BSc, MSc and PhD level on databases, process mining and systems integration with SOA and BPM.

Process Mining

This talk will introduce the fundamental concepts of process mining, namely:

  • what event logs are and where they come from;
  • how to analyze an event log from the control-flow perspective in order to discover the sequence of tasks;
  • how to analyze an event log from the organizational perspective in order to discover the interactions and collaborations between users;
  • how to analyze an event log from the performance perspective in order to study the average time that is spent on performing each task.

We will illustrate the application of these concepts in a real-world case study and, if there is time, we will also provide a brief overview of some representative process mining tools.
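The three perspectives listed above can be sketched on a toy event log with a few lines of plain Python. This is only an illustrative sketch, not material from the talk: the event log layout (dictionaries with the hypothetical keys `case`, `task`, `user`, `start` and `end`, with numeric times) and the function names are assumptions made for this example.

```python
from collections import Counter, defaultdict


def group_by_case(event_log):
    """Group events by case id, ordered by start time within each case."""
    cases = defaultdict(list)
    for event in sorted(event_log, key=lambda e: e["start"]):
        cases[event["case"]].append(event)
    return cases


def transitions(event_log, key):
    """Count how often one value of `key` is directly followed by another.

    With key="task" this gives the control-flow perspective (task sequence);
    with key="user" it gives the organizational perspective (handovers of work).
    """
    counts = Counter()
    for events in group_by_case(event_log).values():
        for previous, following in zip(events, events[1:]):
            counts[(previous[key], following[key])] += 1
    return counts


def average_duration(event_log):
    """Performance perspective: average time spent on each task."""
    totals, counts = Counter(), Counter()
    for event in event_log:
        totals[event["task"]] += event["end"] - event["start"]
        counts[event["task"]] += 1
    return {task: totals[task] / counts[task] for task in totals}
```

The transition counts can be read as a weighted directed graph, with tasks or users as nodes and counts as edge weights, which is a common starting point for visualizing a discovered process.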

Bruno Horta Soares is IT Executive Senior Advisor at IDC Portugal.

Bruno has more than 15 years of experience in Information Systems professional services, particularly in areas related to Governance, Risk, Control, Audit, Information Security & Privacy and Digital Transformation. He started his career at Deloitte Consulting, and worked in the Information Risk Management area at KPMG and in the Enterprise Risk Services area at Deloitte Portugal.

In 2012 he founded GOVaaS - Governance Advisors as-a-service, where he is currently Senior Advisor, and since then he has devoted himself enthusiastically to advising, teaching and training professionals and organizations in Portugal, Angola, Brazil and Mozambique. He currently collaborates actively with an ecosystem of local and international partners, particularly IDC Portugal, where since 2015 he has been IT Executive Senior Advisor for Digital Transformation, Governance, Strategy and Security.

He has a five-year degree in Management and Computer Science from ISCTE and a postgraduate degree in Project Management from ISLA Campus Lisboa.

He is a certified Project Management Professional (PMP) from PMI; holds the Certified Information Systems Auditor (CISA), Certified in the Governance of Enterprise IT (CGEIT), Certified in Risk and Information Systems Control (CRISC) and COBIT 5 Foundation certifications from ISACA; and is certified in ITIL® version 3 Foundation and ISO/IEC 27001 LA. He is also an APMG individual accredited trainer for COBIT 5.

He’s advisor and visiting professor at ISCAC - Coimbra Business School, Instituto Superior Técnico (IST), Universidade Portucalense (UPT), Universidade Europeia | Laureate International Universities, Católica Lisbon Business & Economics (UCP), Universidade Católica de Angola, Porto Business School and Unipê - Centro Universitário de João Pessoa - Paraíba, Brasil.

He’s the founding President of the ISACA Lisbon Chapter, member of several professional associations in the areas of Auditing (IIA), IT Governance (ISACA, IPCG), and Project Management (PMI) and keynote speaker at various conferences and seminars.

Governance of IT, Privacy and Security

...

 

Miguel Won is a Post-Doc researcher at INESC-ID. He is currently working in the field of Natural Language Processing (NLP) applied to Political and Communication Sciences.

Named Entity Recognition in Text-as-data (demo)

Within the text-as-data paradigm, new types of analysis and metrics are now available to political science research, allowing new and different approaches that were previously virtually impossible. In this short talk Miguel will show how text processing, in particular named entity recognition and key-phrase extraction tools, can help and guide political scientists in their research questions.

Laboratory:
  • In the Laboratory, students will work in heterogeneous groups (bringing together business and engineering backgrounds, as well as industry and academic experience), addressing a "Digital Transformation" challenge from complementary views. The objective will be to exercise the technology and concepts from classes in a plausible scenario...
  • In the end, groups must be able to suggest and discuss plausible solutions for the proposed challenge. To assess that and support the discussion, students will also be exposed to, and encouraged to make use of:
Laboratory with the collaboration of:
About the IDC Maturity Scape...

About the Pentaho software...

About the Tableau software...