Available Student Projects

Thanks for your interest in our student projects!

Please note that all the following BSc projects are only for University of Fribourg BSc students, and all MSc projects are only for students admitted to the Swiss Joint Master in Computer Science.

Do not hesitate to contact us if you're looking for a project in Big Data, Database / Information Systems, Semantic Web, Linked Data, Crowdsourcing or Social Computing.

Current Offerings

Title | Category | Contact
Trend prediction using fashion datasets | Prediction, Fashion trends, Graphical interface | Mourad Khayati
Type Inference for Semantic Datasets | Linked Data, Semantic Databases, Information Retrieval, Semantic Type Inference | Artem Lutov
Graphical Interface for Real Time Recovery of Missing Values | Recovery of missing values, Graphical interface | Mourad Khayati
A Hybrid Approach to Enable Real-time Queries to End-Users | Hadoop, BigData | Philippe Cudre-Mauroux
Automating high-quality translations for Mobile Apps (not available) | | Roman Prokofyev
(Big) Data Scepticism in Practice | | Philippe Cudre-Mauroux
Comparing Big Graph Databases | Social Networks, noSQL, BigData | Philippe Cudre-Mauroux
DNA_DB: A Database System to Manage Very-Large 3D DNA Data | DNA, Big Data, Spatial Information, Bioinformatics | Philippe Cudre-Mauroux
Multi-document Summary Generation Personalized by the Query | Information Retrieval | Artem Lutov
Open Source Object storage engine | Cloud Computing, Open-Source, Databases, Big Data | Philippe Cudre-Mauroux
Real-time data collection for IDEs (not available) | IDE, Data collection, Python | Roman Prokofyev
Recognizing User's Activity for the case of Public Transportation | Human Activity Recognition, Public Transport, Machine Learning | Roman Prokofyev
Scalable Human-based Grammar Errors Detection and Correction | | Djellel Difallah
Smarter Cities Array Data Management | Smarter Cities, Big Data | Philippe Cudre-Mauroux
Social Marketing FootPrint | Social Networks, noSQL | Philippe Cudre-Mauroux
Optimal Partitioning of Semantic DBs into Shards and Queries Routing | Semantic Databases, Operations Research, Big Data | Artem Lutov
Past offerings are listed under Completed M.Sc. Projects below.

Trend prediction using fashion datasets

  • Level: B.Sc./M.Sc.
  • Prerequisites: JavaScript
  • Description:

    Fashion is a fascinating domain that has gained a lot of attention during the last decade due to the emergence of online shopping, social media and mobile computing. A challenging task in the fashion domain is to answer the query: what is the future of fashion? Figure 1 graphically illustrates prediction on fashion datasets.

  • Contact: Mourad Khayati
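
As a flavor of the prediction task, a minimal sketch could fit a linear trend to an item's popularity over time and extrapolate one step ahead. The monthly mention counts below are made up for illustration, not taken from a real fashion dataset:

```python
# Toy trend prediction: ordinary least squares fit of y = a + b*t,
# then extrapolation to the next time step.

def fit_linear_trend(ys):
    """Fit y = a + b*t for t = 0..n-1; return (a, b)."""
    n = len(ys)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    var = sum((t - mean_t) ** 2 for t in ts)
    b = cov / var
    a = mean_y - b * mean_t
    return a, b

def predict_next(ys):
    """Extrapolate the fitted trend one step beyond the data."""
    a, b = fit_linear_trend(ys)
    return a + b * len(ys)

if __name__ == "__main__":
    mentions = [120, 135, 160, 170, 190, 210]  # hypothetical monthly counts
    print(round(predict_next(mentions), 1))
```

A real project would of course use richer models and visualize the forecast in the graphical interface.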

Type Inference for Semantic Datasets

  • Level: M.Sc.
  • Prerequisites: programming skills (Scala, Java, Python, or Go desirable), understanding of RDF; C++, an algorithmic background, and probability theory would be a plus
  • Description:

    Understanding semantic datasets is crucial in order to use them properly. Unfortunately, the majority of published semantic datasets lack type information to some extent. For example, DBpedia entities typically have only ~64% of their types defined. However, some of the missing types can be inferred from other entities by analysing their mutual properties. Also, new types can be discovered by identifying groups of objects with similar properties.

    In this project, the student will extend our Statistical Type Inference framework (StaTIX) with semantic type inference that takes into account the semantics of entity attributes and entailment rules. The project provides the opportunity to work on Big Data and to contribute to the Open Knowledge community by refining existing Linked Open Data.

  • Contact: Artem Lutov
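
The property-based inference idea can be sketched in a few lines: assign an untyped entity the type of its most property-similar typed entity, using Jaccard similarity over property sets. This toy nearest-neighbor rule only illustrates the principle; it is not the StaTIX algorithm itself, and the entities below are invented:

```python
# Toy statistical type inference via property-set similarity.

def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def infer_type(entity_props, typed_entities):
    """Return the type of the most property-similar typed entity.

    typed_entities: list of (type, property_set) pairs.
    """
    best_type, best_sim = None, -1.0
    for etype, props in typed_entities:
        sim = jaccard(entity_props, props)
        if sim > best_sim:
            best_type, best_sim = etype, sim
    return best_type

if __name__ == "__main__":
    typed = [
        ("Person", {"birthDate", "birthPlace", "name"}),
        ("City",   {"population", "country", "name"}),
    ]
    # An untyped entity with mostly person-like properties:
    print(infer_type({"birthDate", "name", "spouse"}, typed))
```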

Graphical Interface for Real Time Recovery of Missing Values

  • Level: B.Sc./M.Sc.
  • Prerequisites: Java
  • Description:

    The Centroid Decomposition (CD) is a matrix decomposition technique that has been successfully applied to the recovery of blocks of missing values in time series. It takes as input a set of correlated time series and reconstructs the type, the shape and the amplitude of the missing blocks by learning from the history of the time series that contains the missing blocks, together with the history of other correlated time series. The CD-based recovery technique outperforms state-of-the-art techniques, e.g., REBOM, for the recovery of blocks of missing values in shifted time series.

  • Contact: Mourad Khayati
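
To give the flavor of recovery from correlated series, here is a heavily simplified stand-in (not the actual Centroid Decomposition): missing entries of one series are reconstructed with a linear model learned against a correlated series at the time steps where both are observed:

```python
# Simplified recovery sketch: fill None entries of series x using a
# linear fit x ~ a + b*y learned from the observed (y, x) pairs.

def recover(x, y):
    """Return a copy of x with None entries predicted from y."""
    pairs = [(yi, xi) for xi, yi in zip(x, y) if xi is not None]
    n = len(pairs)
    my = sum(p[0] for p in pairs) / n
    mx = sum(p[1] for p in pairs) / n
    cov = sum((yi - my) * (xi - mx) for yi, xi in pairs)
    var = sum((yi - my) ** 2 for yi, _ in pairs)
    b = cov / var if var else 0.0
    a = mx - b * my
    return [xi if xi is not None else a + b * yi for xi, yi in zip(x, y)]
```

The graphical interface the project asks for would visualize exactly this kind of reconstruction as new data streams in.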

A Hybrid Approach to Enable Real-time Queries to End-Users

  • Level: M.Sc.
  • Prerequisites: C++
  • Description:

    Since it became an Apache Top-Level Project in early 2008, Hadoop has established itself as the de-facto industry standard for batch processing. Running data analysis and crunching petabytes of data is no longer fiction. But the MapReduce framework does have two major downsides: query latency and data freshness.

    At the same time, businesses have started to exchange more and more data through REST APIs, leveraging HTTP verbs (GET, POST, PUT, DELETE) and URIs (for instance http://company/api/v2/domain/identifier), pushing the need to read data in a random-access style, from simple key/value lookups to complex queries.

    Enhancing the BigData stack with real-time search capabilities is the next natural step for the Hadoop ecosystem, because the MapReduce framework was not designed with synchronous processing in mind.

    There is a lot of traction in this area today, and this project will try to answer the question of how to fill this gap with specific open-source components and build a dedicated platform enabling real-time queries on an Internet-scale data set. This project will be carried out in cooperation with VeriSign Inc.

  • Contact: Philippe Cudre-Mauroux

Automating high-quality translations for Mobile Apps (not available)

  • Level: M.Sc.
  • Prerequisites: Java, Android, Lucene is a plus
  • Description:

    Every day, hundreds of mobile applications are added to stores such as Google Play or the App Store. Many of them are intended to be used internationally, and thus require translation of the interface. At the same time, many more mobile apps are already available for download in these stores. By leveraging the translation bases of existing applications, we could immediately provide high-quality translations for new apps without the need to go through a human-translation process. This project aims to extract and parse translations of existing applications to see if they can be used to translate new ones.

  • Contact: Roman Prokofyev

(Big) Data Scepticism in Practice

  • Level: M.Sc.
  • Prerequisites: Java, SQL, R, Hadoop
  • Description:

    “Doubt everything or believe everything: these are two equally convenient strategies. With either we dispense with the need for reflection.” - Henri Poincare

    Any database system represents a certain view on the universe. Tangible data objects in such systems are often called facts, but how much truth is behind those facts? This project aims at setting up a concrete framework for data quality testing, oriented towards quantifying how much one can trust the data stored within a database system. We foresee at least two concrete use cases from different domains, specifically CRM-oriented business processes and mobile network signaling, for which a data quality testing process has to be designed and implemented using the Big Data technology stack (Hadoop 2.x, Apache Spark, Apache Solr, etc.). This will require not only relying on data-agnostic approaches for anomaly detection, such as event counting, but also incorporating the business logic associated with a given business process, plus the ability to perform validation tests using various ground-truth data sources by applying, inter alia, Bayesian statistics techniques. This project will be carried out in cooperation with Swisscom’s Data Science team.

  • Contact: Philippe Cudre-Mauroux
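
As a taste of the data-agnostic event-counting approach mentioned in the description, the following toy sketch flags periods whose record counts deviate strongly from the historical mean; a real pipeline would run an equivalent job on Hadoop or Spark and combine it with business-logic checks:

```python
# Count-based anomaly detection: flag periods whose event count is
# farther than k standard deviations from the mean count.

from statistics import mean, stdev

def anomalous_periods(counts, k=2.0):
    """Return indices of counts deviating more than k*stdev from the mean."""
    m, s = mean(counts), stdev(counts)
    if s == 0:
        return []
    return [i for i, c in enumerate(counts) if abs(c - m) > k * s]
```

Note that a single large outlier inflates the standard deviation, so robust statistics (e.g., median-based) would likely replace this in practice.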

Comparing Big Graph Databases

  • Level: M.Sc. / B.Sc.
  • Prerequisites: good programming skills
  • Description:

    A growing number of new systems are capable of storing and managing very large graphs, e.g., for social networks or for the Web of Data. This project aims to compare the different systems, both from a feature perspective and from an empirical perspective, by developing, deploying and measuring an application on top of the different systems. Some of the systems envisioned for this task are: Neo4j http://www.neo4j.org/ , Titan http://thinkaurelius.github.io/titan/ , Giraph http://giraph.apache.org/ and AsterData Graph.

  • Contact: Philippe Cudre-Mauroux

DNA_DB: A Database System to Manage Very-Large 3D DNA Data

  • Level: M.Sc.
  • Prerequisites: C++ or Java
  • Description:

    This project deals with the design, implementation and testing of a new Data Management system to store 3D DNA data. The idea is to leverage recent developments in 2D/3D data management (e.g., our previous TrajStore project) but to build a new system tailored for DNA analysis. This project will be carried out in cooperation with McGill University.

  • Contact: Philippe Cudre-Mauroux

Multi-document Summary Generation Personalized by the Query

  • Level: M.Sc.
  • Prerequisites: good programming skills (Python, Go, or Scala desirable), algorithmic background; C++ would be a plus
  • Description:

    Multi-document summarization is an automatic procedure aimed at extracting information from multiple texts written about the same topic. Multi-document summarization systems complement news aggregators in coping with information overload. We aim to extract a summary over multiple documents that corresponds to a given query. The task is close to automatic answer-generation systems like IBM Watson, but of course much simpler.

    This project is a great opportunity to dive into the full stack of information retrieval! The implementation can vary from a feasibility prototype based on Bayesian inference between sentences up to a fairly complex system that uses Semantic Graphs and Entity Disambiguation.

  • Contact: Artem Lutov
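
A feasibility-prototype version of the pipeline can be sketched as query-biased sentence extraction: split the documents into sentences, score each by word overlap with the query, and keep the top k. Everything beyond this skeleton (redundancy removal, semantic graphs, entity disambiguation) is where the real work lies:

```python
# Query-biased extractive multi-document summarization, toy version.

import re

def tokenize(text):
    """Lowercase word set of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def summarize(documents, query, k=2):
    """Return the k sentences most relevant to the query."""
    q = tokenize(query)
    sentences = []
    for doc in documents:
        sentences.extend(s.strip() for s in re.split(r"[.!?]", doc) if s.strip())
    # Rank sentences by the number of query words they contain.
    ranked = sorted(sentences, key=lambda s: len(tokenize(s) & q), reverse=True)
    return ranked[:k]
```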

Open Source Object storage engine

  • Level: B.Sc. / M.Sc.
  • Prerequisites: Cassandra, Java
  • Description:

    Exoscale is the leading Swiss cloud platform and provides computing to a large base of Swiss and worldwide customers. To complement the public computing offering, a new S3-compatible object storage service will be launched in the near future. The goal of this project is to design the schema of a large distributed system built around Cassandra and to release it both as a commercial offering for Exoscale and as a standalone Open Source project. In particular, the student will design the schema of the Cassandra cluster, assess performance and consistency across a large number of nodes, and perform reliability testing.

  • Contact: Philippe Cudre-Mauroux

Real-time data collection for IDEs (not available)

  • Level: B.Sc. / M.Sc.
  • Prerequisites: general Python knowledge; hands-on experience is a plus
  • Description:

    Integrated development environments (IDEs) have been around for a few decades already, yet none of the modern IDEs has been able to successfully integrate its source code editor with the actual data stream flowing through the code. The ability to display the actual data running through the system promises many potential benefits, including easier debugging and code recall, which results in significantly lower code maintenance costs.

    The goal of this project is to design a proof-of-concept system in a programming language that allows full code instrumentation (such as Python). This system should be able to seamlessly capture all values of all variables in source code and store them, with the further possibility to easily retrieve the saved values. The system should also provide an API to the storage in order to make the data accessible for navigation and display in third-party applications.

  • Contact: Roman Prokofyev
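
A minimal proof of concept of the capture step is possible with Python's standard sys.settrace hook: record the local variables of a traced function at every executed line. The in-memory list below stands in for the storage backend and retrieval API the project would actually build:

```python
# Capture local-variable values on every executed line of a function.

import sys

captured = []  # (function name, line number, {var: value}) records

def tracer(frame, event, arg):
    if event == "call":
        return tracer  # keep tracing inside the newly entered frame
    if event == "line":
        captured.append(
            (frame.f_code.co_name, frame.f_lineno, dict(frame.f_locals))
        )
    return tracer

def demo(a, b):
    total = a + b
    return total * 2

sys.settrace(tracer)
result = demo(3, 4)
sys.settrace(None)
```

After running, `captured` contains a snapshot of `demo`'s locals for each executed line, e.g. a record where `total` is 7 just before the return.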

Recognizing User's Activity for the case of Public Transportation

  • Level: M.Sc.
  • Prerequisites: Java (Android) or Objective-C/Swift (iOS), Machine Learning; experience with Geo-APIs is a plus
  • Description:

    Every day, around two million people use public transport in Switzerland. Despite the presence of various types of flat-rate subscriptions, there is still a substantial number of people buying single-fare tickets on a regular basis. Most of these tickets are still bought at vending machines; however, that number is rapidly decreasing in favor of a more convenient distribution channel: smartphones.

    The goal of this project is to design, build and evaluate prediction models for recognizing human activities, such as “Riding a bus” or “Walking”, in the context of a user traveling with a mobile phone. The results can further be used to implement an automatic ticket-buying system for public transport. The project will involve developing an application that collects data from various mobile device sensors, such as the accelerometer and gyroscope, as well as performing feature extraction to derive meaningful values from the raw signals. The extracted features are then used to build a supervised classifier that should correctly predict activities for new data samples.

  • Contact: Roman Prokofyev
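
The feature-extraction step described above can be illustrated on raw tri-axial accelerometer windows: compute the mean and standard deviation of the acceleration magnitude, then feed them to a classifier. The threshold rule at the end is a hypothetical stand-in for a real trained supervised model:

```python
# Accelerometer feature extraction plus a toy threshold classifier.

from math import sqrt
from statistics import mean, stdev

def magnitude(sample):
    """Euclidean norm of one (x, y, z) accelerometer sample."""
    x, y, z = sample
    return sqrt(x * x + y * y + z * z)

def extract_features(window):
    """window: list of (x, y, z) samples -> summary statistics."""
    mags = [magnitude(s) for s in window]
    return {"mean_mag": mean(mags), "std_mag": stdev(mags)}

def classify(features, threshold=1.0):
    """Toy rule: jittery signal -> 'Walking', steady -> 'Riding a bus'."""
    return "Walking" if features["std_mag"] > threshold else "Riding a bus"
```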

Scalable Human-based Grammar Errors Detection and Correction

  • Level: M.Sc.
  • Prerequisites: Java/Javascript, Hadoop, (Python is a plus)
  • Description:

    Automatic English grammar correction is a complex problem that requires advances in multiple disciplines such as language modeling and machine learning. However, a native English speaker (given enough time and concentration) can correct and possibly enhance any piece of text in exchange for a monetary reward, and when many such individuals collaborate simultaneously they can even exceed the performance (speed) and match the quality of an expert English proofreader. The Soylent paper [1] introduced the idea of proofreading text with the help of the crowd; in this project, we aim at minimizing the number of requests sent to the crowd by trying to identify the sentences that are already correct.

    [1] Bernstein, Michael S., et al. “Soylent: a word processor with a crowd inside.” Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology. ACM, 2010.

  • Contact: Djellel Difallah

Smarter Cities Array Data Management

  • Level: M.Sc.
  • Prerequisites: C++
  • Description:

    This project deals with the design, implementation and testing of a new Data Management system for future (smarter) cities; the system will handle Big Data problems for critical infrastructures such as water networks, energy grids, etc. It will be based on the SciDB open-source array data system. This project will be carried out in cooperation with the new IBM Research Smarter Cities Center in Dublin.

  • Contact: Philippe Cudre-Mauroux

Social Marketing FootPrint

  • Level: M.Sc. / B.Sc.
  • Prerequisites: noSQL
  • Description:

    Also known as: Social Data Acquisition and Social Graph Processing

    Today, social networks are the first choice for marketing campaigns. They promise to serve well-targeted, viral, highly customizable advertisements while getting direct customer feedback and engagement. The numbers generated by the Internet companies serving digital advertisements are astronomical: Google $43B (2012), Facebook $6B, etc.

    In this context of online marketing through social networks, the tasks of this project are split in two parts, the first being more pragmatic (hands-on) and the second more theoretical:

    1. Social Data Acquisition
      • Crawl a number of public APIs (Facebook, G+, Twitter, etc.)
      • Store the data in a database
      • Build a web interface to search through the data
    2. Social Graph Processing
      • Starting from one given node in the graph (i.e., a particular company), the social graph will be quantified and analyzed.
      • The different data sources (Facebook, Twitter, etc.) will be correlated to discover non-obvious links (such as sister entities) and interactions.
      • For instance, if A always retweets B, which always retweets C, then A, B and C are part of the same cluster.

    The goal of the project is to quantify and classify the marketing footprint of companies on social networks.

    The student choosing this project will have the opportunity to acquire in-depth hands-on experience on state-of-the-art APIs, storage (graph or NoSQL database), graph processing and data mining.

  • Contact: Philippe Cudre-Mauroux
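
The retweet-cluster example in the description can be sketched with a union-find over an undirected "always retweets" graph, so a chain A -> B -> C collapses into one cluster. The edges below are invented for illustration; a real pipeline would derive them from the crawled API data:

```python
# Group accounts into clusters = connected components of a retweet graph.

def clusters(edges):
    """Union-find over retweet edges; returns a set of frozen clusters."""
    parent = {}

    def find(n):
        parent.setdefault(n, n)
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)  # union the two components

    groups = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return {frozenset(g) for g in groups.values()}
```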

Optimal Partitioning of Semantic DBs into Shards and Queries Routing

  • Level: M.Sc.
  • Prerequisites: good programming skills in C++, Java, or Go (knowing more than one would be a plus); understanding of database architectures (especially Graph DBs), RDF, and probability theory; algorithmic background and Hadoop/Spark would be a plus
  • Description:

    Load balancing in a cluster is a complex task that requires advanced knowledge in multiple disciplines to devise optimal solutions and overcome bottlenecks at different levels. It requires an understanding of hardware architectures, networking principles, distributed heterogeneous systems and databases.

    In this project we aim to determine the optimal number of shards for a semantic DB, which defines how the system scales and the required number of nodes in the cluster, yielding efficient hardware usage and minimal response time. Besides the static definition of the cluster topology, dynamic routing of queries within the produced topology has to be implemented.

  • Contact: Artem Lutov
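
To make the routing part concrete, here is a sketch under one simple assumption, hash partitioning of RDF triples by subject, which is only one of many possible sharding schemes: any query bound on the subject can then be routed to a single shard instead of being broadcast to the whole cluster:

```python
# Hash-partition triples by subject and route subject-bound queries.

def shard_of(subject, n_shards):
    """Deterministic shard id for a subject (stable across runs)."""
    return sum(subject.encode()) % n_shards

def partition(triples, n_shards):
    """Distribute (s, p, o) triples into n_shards buckets by subject."""
    shards = [[] for _ in range(n_shards)]
    for s, p, o in triples:
        shards[shard_of(s, n_shards)].append((s, p, o))
    return shards

def route_query(subject, shards):
    """Answer a subject-bound query by touching exactly one shard."""
    return [t for t in shards[shard_of(subject, len(shards))] if t[0] == subject]
```

Optimal partitioning, as the project description notes, would go well beyond hashing and take the query workload and graph structure into account.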

Completed M.Sc. Projects

  • Statistical Type Inference, Soheil Roshankish, September 2017 [thesis] [presentation] [StaTIX project] [TInfES benchmarking]
  • Real-Time Centroid Decomposition of Streams of Time Series, Oliver Stapleton, February 2017 [pdf]
  • Implementation of Centroid Decomposition Algorithm on Big Data Platforms—Apache Spark vs. Apache Flink, Qian Liu, February 2016 [pdf]
  • Online Anomaly Detection over Big Data Streams, Laura Rettig, October 2015 [pdf]
  • Real-Time Anomaly Detection in Water Distribution Networks using Spark Streaming, Stefan Nüesch, November 2014 [pdf]
  • HDFS Blocks Placement Strategy, Phokham Nonava, October 2014 [pdf]
  • Crowdsourced Product Descriptions and Price Estimations, Steve Aschwanden, July 2014 [pdf]
  • Real Time Data Analysis for Water Distribution Network using Storm, Simpal Kumar, May 2014 [pdf]
  • Crowd-Flow Designer: An Open-Source Toolkit to Design and Run Complex Crowd-Sourced Tasks, Dani Rotzetter, February 2014 [pdf]
  • Geographical Impact of Microblogging Social Networks, Roger Kohler, February 2014 [pdf]
  • Building a full-text index on a NoSQL Store, Thi Thu Hang Nguyen, August 2013 [pdf]
  • Big Data analytics on high velocity streams, Thibaud Chardonnens, July 2013 [pdf]
  • Know your crowd: The drivers of success in reward- based crowdfunding, Jonas Wechsler, July 2013 [pdf]
  • A Comparison of Different Data Structures to Store RDF Data, Rashmi Bakshi, March 2013 [pdf]
  • Analysis of Mobile Data Services and Internet in Switzerland, India and Tanzania, Ahmed Shams, February 2013 [pdf]
  • Moji - The advent of large identifiers and how to conquer them as humans, Michael Luggen, September 2012 [pdf]
  • Unconventional Store Systems for RDF Data - Comparison between Registry Systems used as Semantic Web RDF Data Stores, Iliya Enchev, September 2012 [pdf]
  • GetSound - A noise level storage and visualization system using models to generate probabilistic data, Mariusz Wisniewski, September 2012 [pdf]
  • DNS^3: a DNS Semantic-Aware Service for Security and Authoritative Information, Ahmed S. Mostafa, May 2012 [pdf]
  • Using the Deep Web to Extend the Linked Data Web, Michal Musial, February 2012 [pdf]