Fall 2023

Data Science Seminar: Benchmarking

Lecturers: Mourad Khayati and Alberto Lerner

Teaching language: English

Level: MSc students

Academic year: Fall 2023

Overview

Structure

Evaluation and Expectations

Schedule

List of Papers

Overview

The data science seminar involves presentations covering recent topics in data science. The area of this year’s seminar is benchmarking. In the scope of this seminar, we investigate papers that describe various benchmarks for systems, algorithms, and data generation. The papers explore configuration mechanisms and parameterization techniques that optimize the performance of the evaluated entities when applied to large datasets.

Structure

The goal is for students to learn how to critically read and study research papers, describe a paper in a report, and present it in a seminar. Under supervision, each student selects one paper to study and compares it with related work. This seminar aims to help students gather in-depth knowledge of an advanced topic and develop the skills required to describe a complex problem from the time series field in the form of a presentation, a written report, and an empirical evaluation.

IMPORTANT NOTE: The papers will be distributed on a first-come, first-served basis.

Evaluation and Expectations

The final grade depends on the quality of the report, the presentation, the reproducibility experiments, and active participation during the seminar. Each participant prepares a self-contained report of at least 6 pages and gives a 30-minute presentation. The report should describe the proposed benchmark in detail. It may include a small running example and counterexample(s), and it should explore the extreme cases in which the evaluated systems and algorithms perform best and worst. The reproducibility part consists of rerunning the experiments introduced in the paper under a different setup (dataset, metric, parameters, etc.).

Advice on how to:

write the report.
prepare the presentation.

IMPORTANT NOTE: Attendance is mandatory for both seminar sessions. The total number of participants will be limited to 10.

Schedule

Kickoff Meeting. Date: Tue, 26.09.2023, 15:00-16:30, room: A403

Setup and organization of the seminar and paper assignment

----------------------------------------------------------------------

Date: Tue, 07.11.2023
Report deadline of Batch1

Date: Tue, 14.11.2023, all day, room: C433 or C411

Office meeting with students from Batch1

First Seminar Session. Date: Tue, 21.11.2023, 15:00-18:00, room: G414

Presentations of Batch1

----------------------------------------------------------------------

Date: Tue, 28.11.2023
Report deadline of Batch2

Date: Tue, 05.12.2023, all day, room: C433 or C411
Office meeting with students from Batch2

Second Seminar Session. Date: Tue, 12.12.2023, 14:15-17:00, room: G414

Presentations of Batch2

----------------------------------------------------------------------

Date: Tue, 16.01.2024
Final report deadline for Batch1 and Batch2

Paper Assignment

The papers will be distributed on a first-come, first-served basis. To select one paper from the list of papers, please use the following link.

Paper & code	Presentation Date	Presenter	Mentor
Pollock: A Data Loading Benchmark, PVLDB’2023	21.11.2023	Sophie Pfister	Alberto Lerner
Towards Benchmarking Feature Type Inference for AutoML Platforms, SIGMOD’21	21.11.2023	Adriana Moisil	Mourad Khayati
Series2Graph: Graph-based Subsequence Anomaly Detection for Time Series	21.11.2023	Majid Samar	Mourad Khayati
ADBench: Anomaly Detection Benchmark, NeurIPS’2022	21.11.2023	De Soham	Alberto Lerner
TSB-UAD: An End-to-End Benchmark Suite for Univariate Time-Series Anomaly Detection, PVLDB’22	12.12.2023	Abeer Refay	Alberto Lerner
Exathlon: A Benchmark for Explainable Anomaly Detection over Time Series, PVLDB’21	12.12.2023	Nadezhda Videneeva	Mourad Khayati
Anomaly Detection in Time Series: A Comprehensive Evaluation, PVLDB’22	12.12.2023	Deborah Schaer	Mourad Khayati
What Is the Price for Joining Securely? Benchmarking Equi-Joins, PVLDB’2022		Not Assigned	Alberto Lerner
FEBench: A Benchmark for Real-Time Relational Data Feature Extraction, PVLDB 2023		Not Assigned	Alberto Lerner
Benchmarking Learned Indexes, PVLDB’21		Not Assigned	Alberto Lerner
M2Bench: A Database Benchmark for Multi-Model Analytic Workloads, PVLDB’23		Not Assigned	Mourad Khayati
Benchmarking Learned Indexes, PVLDB’2022		Not Assigned	Alberto Lerner
Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation, PVLDB 2022		Not Assigned	Alberto Lerner