Fall 2015
Data Science Seminar
Lecturers: Mourad Khayati, Djellel Difallah
Teaching language: English
Level: MSc students
Academic year: Fall 2015
Overview
The data science seminar consists of presentations on recent research topics in data science. The seminar examines two sets of papers. The first set covers scalable machine learning techniques, with a special focus on clustering, compression and similarity techniques for time-series data and graphs. Matrix decomposition/factorization and sentiment analysis techniques will also be studied.
The second set of papers covers big-data management infrastructures. We will focus on data storage techniques tailored to specific data types (e.g., graphs, time series and arrays), as well as on generic data formats used in scalable distributed file systems such as Hadoop's HDFS. We will also consider papers on job scheduling techniques used in large data processing centers shared by thousands of data scientists.
Structure
The goal is for students to learn how to read and study research papers critically, how to describe a paper in a report, and how to present it in a seminar. Under supervision, each student selects one paper to study and compares and contrasts it with related work. The seminar helps students gain in-depth knowledge of an advanced topic and develop the skills needed to describe a complex problem in both a presentation and a written report.
IMPORTANT NOTE: Papers are assigned on a first-come, first-served basis.
Evaluation and Expectations
The final grade depends on the quality of the report and the presentation, and on active participation during the seminar. Each participant prepares a self-contained report of at most 10 pages and gives a 20-minute presentation. The report should describe the proposed technique(s) in detail. It may include a small running example and should explore the extreme cases in which the proposed approach performs best and worst.
IMPORTANT NOTE: Attendance is mandatory at both seminar sessions.
Schedule
Kickoff Meeting. Date: Tue, 22.09.2015, 14:00-15:00
Setup and organization of the seminar, and paper assignment
----------------------------------------------------------------------
Date: Tue, 03.11.2015
Report deadline of Batch1
Date: Wed, 08.11.2017, all day
Office meeting with students from Batch1
First Seminar Session. Date: Tue, 17.11.2015, 09:15-13:00, room: A303
Presentations of Batch1
----------------------------------------------------------------------
Date: Tue, 01.12.2015
Report deadline of Batch2
Date: Wed, 06.12.2017, all day
Office meeting with students from Batch2
Second Seminar Session. Date: Tue, 15.12.2015, 09:15-13:00, room: A303
Presentations of Batch2
Date: Wed, 10.01.2018
Final report deadline for Batch1 and Batch2
Paper Assignment
Please select one paper from the list below using the online assignment app.
Paper | Presentation Date | Presenter | Contact | Report Deadline
(1) | 17.11.2015 | Adrian Hänni | | 03.11.2015
(2) | 17.11.2015 | Soheil Roshankish | | 03.11.2015
(3) Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing | 17.11.2015 | Jeremy Serre | | 03.11.2015
(4) Distributed Representations of Words and Phrases and their Compositionality + implementation | 17.11.2015 | Alexandre Nikodemski | | 03.11.2015
(5) Entity Linking meets Word Sense Disambiguation: a Unified Approach | 17.11.2015 | Axel Cotting | | 03.11.2015
(6) | 15.12.2015 | Felix Meyenhofer | | 01.12.2015
(7) | 15.12.2015 | Oliver Stapleton | | 01.12.2015
(8) Time series anomaly discovery with grammar-based compression | 15.12.2015 | Abir Ben Slimane | | 01.12.2015
(9) Paxos Quorum Leases: Fast Reads Without Sacrificing Writes | 15.12.2015 | Marian Briceag | | 01.12.2015
(10) Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning | 15.12.2015 | Arun Sittampalam | | 01.12.2015