Available Student Projects

Thanks for your interest in our student projects!

Please note that all the following BSc projects are only for University of Fribourg BSc students, and all MSc projects are only for students admitted to the Swiss Joint Master in Computer Science.

Do not hesitate to contact us if you're looking for a project in Big Data, Database / Information Systems, Semantic Web, Linked Data, Crowdsourcing or Social Computing.

Current Offerings

Title | Category | Contact
Trend prediction using fashion datasets | Prediction, Fashion trends, Graphical interface | Mourad Khayati
Type Inference for Semantic Datasets | Linked Data, Semantic Databases, Information Retrieval, Semantic Type Inference | Artem Lutov
Graphical Interface for Real Time Recovery of Missing Values | Recovery of missing values, Graphical interface | Mourad Khayati
A Hybrid Approach to Enable Real-time Queries to End-Users | Hadoop, BigData | Philippe Cudre-Mauroux
Automating high-quality translations for Mobile Apps (not available) | | Roman Prokofyev
(Big) Data Scepticism in Practice | | Philippe Cudre-Mauroux
Comparing Big Graph Databases | Social Networks, noSQL, BigData | Philippe Cudre-Mauroux
DNA_DB: A Database System to Manage Very-Large 3D DNA Data | DNA, Big Data, Spatial Information, Bioinformatics | Philippe Cudre-Mauroux
Multi-document Summary Generation Personalized by the Query | Information Retrieval | Artem Lutov
Open Source Object storage engine | Cloud Computing, Open-Source, Databases, Big Data | Philippe Cudre-Mauroux
Real-time data collection for IDEs (not available) | IDE, Data collection, Python | Roman Prokofyev
Recognizing User's Activity for the case of Public Transportation | Human Activity Recognition, Public Transport, Machine Learning | Roman Prokofyev
Scalable Human-based Grammar Errors Detection and Correction | | Djellel Difallah
Smarter Cities Array Data Management | Smarter Cities, Big Data | Philippe Cudre-Mauroux
Social Marketing FootPrint | Social Networks, noSQL | Philippe Cudre-Mauroux
Optimal Partitioning of Semantic DBs into Shards and Queries Routing | Semantic Databases, Operations Research, Big Data | Artem Lutov
Past offerings are listed under Completed M.Sc. Projects below.

Trend prediction using fashion datasets

  • Level: B.Sc./M.Sc.
  • Prerequisites: JavaScript
  • Description:

    Fashion is a fascinating domain that has gained a lot of attention during the last decade due to the emergence of online shopping, social media and mobile computing. A challenging task in the fashion domain is to answer the query: what is the future of fashion? Figure 1 graphically illustrates prediction on fashion datasets.

  • Contact: Mourad Khayati
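
As a flavor of the prediction task, a minimal sketch could fit a linear trend to an item's popularity over time and extrapolate one step ahead. The monthly mention counts below are made up for illustration, not taken from a real fashion dataset:

```python
# Toy trend prediction: ordinary least squares fit of y = a + b*t,
# then extrapolation to the next time step.

def fit_linear_trend(ys):
    """Fit y = a + b*t for t = 0..n-1; return (a, b)."""
    n = len(ys)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    var = sum((t - mean_t) ** 2 for t in ts)
    b = cov / var
    a = mean_y - b * mean_t
    return a, b

def predict_next(ys):
    """Extrapolate the fitted trend one step beyond the data."""
    a, b = fit_linear_trend(ys)
    return a + b * len(ys)

if __name__ == "__main__":
    mentions = [120, 135, 160, 170, 190, 210]  # hypothetical monthly counts
    print(round(predict_next(mentions), 1))
```

A real project would of course use richer models and visualize the forecast in the graphical interface.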

Type Inference for Semantic Datasets

  • Level: M.Sc.
  • Prerequisites: programming skills (Scala, Java, Python, or Go desirable), understanding of RDF; C++, an algorithmic background, and probability theory would be a plus
  • Description:

    Understanding semantic datasets is crucial in order to use them properly. Unfortunately, the majority of published semantic datasets lack type information to some extent. For example, DBpedia entities typically have only ~64% of their types defined. However, some of the missing types can be inferred from other entities by analysing their mutual properties. Also, new types can be discovered by identifying groups of objects with similar properties.

    In this project, the student will extend our Statistical Type Inference framework (StaTIX) with semantic type inference that takes into account the semantics of entity attributes and entailment rules. The project provides the opportunity to work on Big Data and to contribute to the Open Knowledge community by refining existing Linked Open Data.

  • Contact: Artem Lutov
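
The property-based inference idea can be sketched in a few lines: assign an untyped entity the type of its most property-similar typed entity, using Jaccard similarity over property sets. This toy nearest-neighbor rule only illustrates the principle; it is not the StaTIX algorithm itself, and the entities below are invented:

```python
# Toy statistical type inference via property-set similarity.

def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def infer_type(entity_props, typed_entities):
    """Return the type of the most property-similar typed entity.

    typed_entities: list of (type, property_set) pairs.
    """
    best_type, best_sim = None, -1.0
    for etype, props in typed_entities:
        sim = jaccard(entity_props, props)
        if sim > best_sim:
            best_type, best_sim = etype, sim
    return best_type

if __name__ == "__main__":
    typed = [
        ("Person", {"birthDate", "birthPlace", "name"}),
        ("City",   {"population", "country", "name"}),
    ]
    # An untyped entity with mostly person-like properties:
    print(infer_type({"birthDate", "name", "spouse"}, typed))
```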

Graphical Interface for Real Time Recovery of Missing Values

  • Level: B.Sc./M.Sc.
  • Prerequisites: Java
  • Description:

    The Centroid Decomposition (CD) is a matrix decomposition technique that has been successfully applied to the recovery of blocks of missing values in time series. It takes as input a set of correlated time series and reconstructs the type, the shape and the amplitude of the missing blocks by learning from the history of the time series that contains the missing blocks, together with the history of other correlated time series. The CD-based recovery technique outperforms state-of-the-art techniques, e.g., REBOM, for the recovery of blocks of missing values in shifted time series.

  • Contact: Mourad Khayati
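
To give the flavor of recovery from correlated series, here is a heavily simplified stand-in (not the actual Centroid Decomposition): missing entries of one series are reconstructed with a linear model learned against a correlated series at the time steps where both are observed:

```python
# Simplified recovery sketch: fill None entries of series x using a
# linear fit x ~ a + b*y learned from the observed (y, x) pairs.

def recover(x, y):
    """Return a copy of x with None entries predicted from y."""
    pairs = [(yi, xi) for xi, yi in zip(x, y) if xi is not None]
    n = len(pairs)
    my = sum(p[0] for p in pairs) / n
    mx = sum(p[1] for p in pairs) / n
    cov = sum((yi - my) * (xi - mx) for yi, xi in pairs)
    var = sum((yi - my) ** 2 for yi, _ in pairs)
    b = cov / var if var else 0.0
    a = mx - b * my
    return [xi if xi is not None else a + b * yi for xi, yi in zip(x, y)]
```

The graphical interface the project asks for would visualize exactly this kind of reconstruction as new data streams in.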

A Hybrid Approach to Enable Real-time Queries to End-Users

  • Level: M.Sc.
  • Prerequisites: C++
  • Description:

    Since it became an Apache Top-Level Project in early 2008, Hadoop has established itself as the de-facto industry standard for batch processing. Running data analysis and crunching petabytes of data is no longer fiction. But the MapReduce framework does have two major downsides: query latency and data freshness.

    At the same time, businesses have started to exchange more and more data through REST APIs, leveraging HTTP verbs (GET, POST, PUT, DELETE) and URIs (for instance http://company/api/v2/domain/identifier), pushing the need to read data in a random-access style, from simple key/value lookups to complex queries.

    Enhancing the BigData stack with real-time search capabilities is the next natural step for the Hadoop ecosystem, because the MapReduce framework was not designed with synchronous processing in mind.

    There is a lot of traction in this area today, and this project will try to answer the question of how to fill this gap with specific open-source components and build a dedicated platform enabling real-time queries on an Internet-scale data set. This project will be carried out in cooperation with VeriSign Inc.

  • Contact: Philippe Cudre-Mauroux

Automating high-quality translations for Mobile Apps (not available)

  • Level: M.Sc.
  • Prerequisites: Java, Android, Lucene is a plus
  • Description:

    Every day, hundreds of mobile applications are added to stores such as Google Play or the App Store. Many of them are intended to be used internationally, and thus require translation of the interface. At the same time, many more mobile apps are already available for download in these stores. By leveraging the translation bases of existing applications, we could immediately provide high-quality translations for new apps without the need to go through a human-translation process. This project aims to extract and parse translations of existing applications to see if they can be used to translate new ones.

  • Contact: Roman Prokofyev

(Big) Data Scepticism in Practice

  • Level: M.Sc.
  • Prerequisites: Java, SQL, R, Hadoop
  • Description:

    “Doubt everything or believe everything: these are two equally convenient strategies. With either we dispense with the need for reflection.” - Henri Poincare

    Any database system represents a certain view on the universe. Tangible data objects in such systems are often called facts, but how much truth is behind those facts? This project aims at setting up a concrete framework for data quality testing, oriented towards quantifying how much one can trust the data stored within a database system. We foresee at least two concrete use cases from different domains, specifically CRM-oriented business processes and mobile network signaling, for which a data quality testing process has to be designed and implemented using the Big Data technology stack (Hadoop 2.x, Apache Spark, Apache Solr, etc.). This will require not only relying on data-agnostic approaches for anomaly detection, such as event counting, but also incorporating the business logic associated with a given business process, plus the ability to perform validation tests using various ground-truth data sources by applying, inter alia, Bayesian statistics techniques. This project will be carried out in cooperation with Swisscom’s Data Science team.

  • Contact: Philippe Cudre-Mauroux
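
As a taste of the data-agnostic event-counting approach mentioned in the description, the following toy sketch flags periods whose record counts deviate strongly from the historical mean; a real pipeline would run an equivalent job on Hadoop or Spark and combine it with business-logic checks:

```python
# Count-based anomaly detection: flag periods whose event count is
# farther than k standard deviations from the mean count.

from statistics import mean, stdev

def anomalous_periods(counts, k=2.0):
    """Return indices of counts deviating more than k*stdev from the mean."""
    m, s = mean(counts), stdev(counts)
    if s == 0:
        return []
    return [i for i, c in enumerate(counts) if abs(c - m) > k * s]
```

Note that a single large outlier inflates the standard deviation, so robust statistics (e.g., median-based) would likely replace this in practice.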

Comparing Big Graph Databases

  • Level: M.Sc. / B.Sc.
  • Prerequisites: good programming skills
  • Description:

    A growing number of new systems are capable of storing and managing very large graphs, e.g., for social networks or for the Web of Data. This project aims to compare the different systems, both from a feature perspective and from an empirical perspective, by developing, deploying and measuring an application on top of the different systems. Some of the systems envisioned for this task are: Neo4j http://www.neo4j.org/ , Titan http://thinkaurelius.github.io/titan/ , Giraph http://giraph.apache.org/ and AsterData Graph.

  • Contact: Philippe Cudre-Mauroux

DNA_DB: A Database System to Manage Very-Large 3D DNA Data

  • Level: M.Sc.
  • Prerequisites: C++ or Java
  • Description:

    This project deals with the design, implementation and testing of a new Data Management system to store 3D DNA data. The idea is to leverage recent developments in 2D/3D data management (e.g., our previous TrajStore project) but to build a new system tailored for DNA analysis. This project will be carried out in cooperation with McGill University.

  • Contact: Philippe Cudre-Mauroux

Multi-document Summary Generation Personalized by the Query

  • Level: M.Sc.
  • Prerequisites: good programming skills (Python, Go, or Scala desirable), algorithmic background; C++ would be a plus
  • Description:

    Multi-document summarization is an automatic procedure aimed at extracting information from multiple texts written about the same topic. Multi-document summarization systems complement news aggregators in coping with information overload. We aim to extract a summary over multiple documents that corresponds to a given query. The task is close to automatic answer-generation systems like IBM Watson, but of course much simpler.

    This project is a great opportunity to dive into the full stack of information retrieval! The implementation can vary from a feasibility prototype based on Bayesian inference between sentences up to a fairly complex system that uses Semantic Graphs and Entity Disambiguation.

  • Contact: Artem Lutov
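
A feasibility-prototype version of the pipeline can be sketched as query-biased sentence extraction: split the documents into sentences, score each by word overlap with the query, and keep the top k. Everything beyond this skeleton (redundancy removal, semantic graphs, entity disambiguation) is where the real work lies:

```python
# Query-biased extractive multi-document summarization, toy version.

import re

def tokenize(text):
    """Lowercase word set of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def summarize(documents, query, k=2):
    """Return the k sentences most relevant to the query."""
    q = tokenize(query)
    sentences = []
    for doc in documents:
        sentences.extend(s.strip() for s in re.split(r"[.!?]", doc) if s.strip())
    # Rank sentences by the number of query words they contain.
    ranked = sorted(sentences, key=lambda s: len(tokenize(s) & q), reverse=True)
    return ranked[:k]
```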

Open Source Object storage engine

  • Level: B.Sc. / M.Sc.
  • Prerequisites: Cassandra, Java
  • Description:

    Exoscale is the leading Swiss cloud platform and provides computing to a large base of Swiss and worldwide customers. To complement the public computing offering, a new S3-compatible object storage service will be launched in the near future. The goal of this project is to design the schema of a large distributed system built around Cassandra and to release it both as a commercial offering for Exoscale and as a standalone Open Source project. In particular, the student will design the schema of the Cassandra cluster, assess performance and consistency across a large number of nodes, and perform reliability testing.

  • Contact: Philippe Cudre-Mauroux

Real-time data collection for IDEs (not available)

  • Level: B.Sc. / M.Sc.
  • Prerequisites: general Python knowledge; hands-on experience is a plus
  • Description:

    Integrated development environments (IDEs) have been around for a few decades already, yet none of the modern IDEs has been able to successfully integrate its source code editor with the actual data stream flowing through the code. The ability to display the actual data running through the system promises many potential benefits, including easier debugging and code recall, which results in significantly lower code maintenance costs.

    The goal of this project is to design a proof-of-concept system in a programming language that allows full code instrumentation (such as Python). This system should be able to seamlessly capture all values of all variables in source code and store them, with the further possibility to easily retrieve the saved values. The system should also provide an API to the storage in order to make the data accessible for navigation and display in third-party applications.

  • Contact: Roman Prokofyev
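
A minimal proof of concept of the capture step is possible with Python's standard sys.settrace hook: record the local variables of a traced function at every executed line. The in-memory list below stands in for the storage backend and retrieval API the project would actually build:

```python
# Capture local-variable values on every executed line of a function.

import sys

captured = []  # (function name, line number, {var: value}) records

def tracer(frame, event, arg):
    if event == "call":
        return tracer  # keep tracing inside the newly entered frame
    if event == "line":
        captured.append(
            (frame.f_code.co_name, frame.f_lineno, dict(frame.f_locals))
        )
    return tracer

def demo(a, b):
    total = a + b
    return total * 2

sys.settrace(tracer)
result = demo(3, 4)
sys.settrace(None)
```

After running, `captured` contains a snapshot of `demo`'s locals for each executed line, e.g. a record where `total` is 7 just before the return.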

Recognizing User's Activity for the case of Public Transportation

  • Level: M.Sc.
  • Prerequisites: Java (Android) or Objective-C/Swift (iOS), Machine Learning; experience with Geo-APIs is a plus
  • Description:

    Every day, around two million people use public transport in Switzerland. Despite the presence of various types of flat-rate subscriptions, there is still a substantial number of people buying single-fare tickets on a regular basis. Most of these tickets are still bought at vending machines; however, that number is rapidly decreasing in favor of a more convenient distribution channel: smartphones.

    The goal of this project is to design, build and evaluate prediction models for recognizing human activities, such as “Riding a bus” or “Walking”, in the context of a user traveling with a mobile phone. The results can further be used to implement an automatic ticket-buying system for public transport. The project will involve developing an application that collects data from various mobile device sensors, such as the accelerometer and gyroscope, as well as performing feature extraction to derive meaningful values from the raw signals. The extracted features are then used to build a supervised classifier that should correctly predict activities for new data samples.

  • Contact: Roman Prokofyev
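
The feature-extraction step described above can be illustrated on raw tri-axial accelerometer windows: compute the mean and standard deviation of the acceleration magnitude, then feed them to a classifier. The threshold rule at the end is a hypothetical stand-in for a real trained supervised model:

```python
# Accelerometer feature extraction plus a toy threshold classifier.

from math import sqrt
from statistics import mean, stdev

def magnitude(sample):
    """Euclidean norm of one (x, y, z) accelerometer sample."""
    x, y, z = sample
    return sqrt(x * x + y * y + z * z)

def extract_features(window):
    """window: list of (x, y, z) samples -> summary statistics."""
    mags = [magnitude(s) for s in window]
    return {"mean_mag": mean(mags), "std_mag": stdev(mags)}

def classify(features, threshold=1.0):
    """Toy rule: jittery signal -> 'Walking', steady -> 'Riding a bus'."""
    return "Walking" if features["std_mag"] > threshold else "Riding a bus"
```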

Scalable Human-based Grammar Errors Detection and Correction

  • Level: M.Sc.
  • Prerequisites: Java/Javascript, Hadoop, (Python is a plus)
  • Description:

    Automatic English grammar correction is a complex problem that requires advances in multiple disciplines such as language modeling and machine learning. However, a native English speaker (given enough time and concentration) can correct and possibly enhance any piece of text in exchange for a monetary reward, and when many such individuals collaborate simultaneously they can even exceed the performance (speed) and match the quality of an expert English proofreader. The Soylent paper [1] introduced the idea of proofreading text with the help of the crowd; in this project, we aim at minimizing the number of requests sent to the crowd by trying to identify the sentences that are already correct.

    [1] Bernstein, Michael S., et al. “Soylent: a word processor with a crowd inside.” Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology. ACM, 2010.

  • Contact: Djellel Difallah

Smarter Cities Array Data Management

  • Level: M.Sc.
  • Prerequisites: C++
  • Description:

    This project deals with the design, implementation and testing of a new Data Management system for future (smarter) cities; the system will handle Big Data problems for critical infrastructures such as water networks, energy grids, etc. It will be based on the SciDB open-source array data system. This project will be carried out in cooperation with the new IBM Research Smarter Cities Center in Dublin.

  • Contact: Philippe Cudre-Mauroux

Social Marketing FootPrint

  • Level: M.Sc. / B.Sc.
  • Prerequisites: noSQL
  • Description:

    Also known as: Social Data Acquisition and Social Graph Processing

    Today, social networks are the first choice for marketing campaigns. They promise to serve well-targeted, viral, highly customizable advertisements while getting direct customer feedback and engagement. The numbers generated by the Internet companies serving digital advertisements are astronomical: Google $43B (2012), Facebook $6B, etc.

    In this context of online marketing through social networks, the tasks of this project are split in two parts, the first being more pragmatic (hands-on) and the second more theoretical:

    1. Social Data Acquisition
      • Crawl a number of public APIs (Facebook, G+, Twitter, etc.)
      • Store the data in a database
      • Build a web interface to search through the data
    2. Social Graph Processing
      • Starting from one given node in the graph (i.e., a particular company), the social graph will be quantified and analyzed.
      • The different data sources (Facebook, Twitter, etc.) will be correlated to discover non-obvious links (such as sister entities) and interactions.
      • For instance, if A always retweets B, which always retweets C, then A, B and C are part of the same cluster.

    The goal of the project is to quantify and classify the marketing footprint of companies on social networks.

    The student choosing this project will have the opportunity to acquire in-depth hands-on experience on state-of-the-art APIs, storage (graph or NoSQL database), graph processing and data mining.

  • Contact: Philippe Cudre-Mauroux
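
The retweet-cluster example in the description can be sketched with a union-find over an undirected "always retweets" graph, so a chain A -> B -> C collapses into one cluster. The edges below are invented for illustration; a real pipeline would derive them from the crawled API data:

```python
# Group accounts into clusters = connected components of a retweet graph.

def clusters(edges):
    """Union-find over retweet edges; returns a set of frozen clusters."""
    parent = {}

    def find(n):
        parent.setdefault(n, n)
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)  # union the two components

    groups = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return {frozenset(g) for g in groups.values()}
```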

Optimal Partitioning of Semantic DBs into Shards and Queries Routing

  • Level: M.Sc.
  • Prerequisites: good programming skills in C++, Java, or Go (knowing more than one would be a plus); understanding of database architectures (especially Graph DBs), RDF, and probability theory; algorithmic background and Hadoop/Spark would be a plus
  • Description:

    Load balancing in a cluster is a complex task that requires advanced knowledge in multiple disciplines to devise optimal solutions and overcome bottlenecks at different levels. It requires an understanding of hardware architectures, networking principles, distributed heterogeneous systems and databases.

    In this project we aim to determine the optimal number of shards for a semantic DB, which defines how the system scales and the required number of nodes in the cluster, yielding efficient hardware usage and minimal response time. Besides the static definition of the cluster topology, dynamic routing of queries within the produced topology has to be implemented.

  • Contact: Artem Lutov
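
To make the routing part concrete, here is a sketch under one simple assumption, hash partitioning of RDF triples by subject, which is only one of many possible sharding schemes: any query bound on the subject can then be routed to a single shard instead of being broadcast to the whole cluster:

```python
# Hash-partition triples by subject and route subject-bound queries.

def shard_of(subject, n_shards):
    """Deterministic shard id for a subject (stable across runs)."""
    return sum(subject.encode()) % n_shards

def partition(triples, n_shards):
    """Distribute (s, p, o) triples into n_shards buckets by subject."""
    shards = [[] for _ in range(n_shards)]
    for s, p, o in triples:
        shards[shard_of(s, n_shards)].append((s, p, o))
    return shards

def route_query(subject, shards):
    """Answer a subject-bound query by touching exactly one shard."""
    return [t for t in shards[shard_of(subject, len(shards))] if t[0] == subject]
```

Optimal partitioning, as the project description notes, would go well beyond hashing and take the query workload and graph structure into account.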

Completed M.Sc. Projects

  • Statistical Type Inference, Soheil Roshankish, September 2017 [thesis] [presentation] [StaTIX project] [TInfES benchmarking]
  • Real-Time Centroid Decomposition of Streams of Time Series, Oliver Stapleton, February 2017 [pdf]
  • Implementation of Centroid Decomposition Algorithm on Big Data Platforms—Apache Spark vs. Apache Flink, Qian Liu, February 2016 [pdf]
  • Online Anomaly Detection over Big Data Streams, Laura Rettig, October 2015 [pdf]
  • Real-Time Anomaly Detection in Water Distribution Networks using Spark Streaming, Stefan Nüesch, November 2014 [pdf]
  • HDFS Blocks Placement Strategy, Phokham Nonava, October 2014 [pdf]
  • Crowdsourced Product Descriptions and Price Estimations, Steve Aschwanden, July 2014 [pdf]
  • Real Time Data Analysis for Water Distribution Network using Storm, Simpal Kumar, May 2014 [pdf]
  • Crowd-Flow Designer: An Open-Source Toolkit to Design and Run Complex Crowd-Sourced Tasks, Dani Rotzetter, February 2014 [pdf]
  • Geographical Impact of Microblogging Social Networks, Roger Kohler, February 2014 [pdf]
  • Building a full-text index on a NoSQL Store, Thi Thu Hang Nguyen, August 2013 [pdf]
  • Big Data analytics on high velocity streams, Thibaud Chardonnens, July 2013 [pdf]
  • Know your crowd: The drivers of success in reward- based crowdfunding, Jonas Wechsler, July 2013 [pdf]
  • A Comparison of Different Data Structures to Store RDF Data, Rashmi Bakshi, March 2013 [pdf]
  • Analysis of Mobile Data Services and Internet in Switzerland, India and Tanzania, Ahmed Shams, February 2013 [pdf]
  • Moji - The advent of large identifiers and how to conquer them as humans, Michael Luggen, September 2012 [pdf]
  • Unconventional Store Systems for RDF Data - Comparison between Registry Systems used as Semantic Web RDF Data Stores, Iliya Enchev, September 2012 [pdf]
  • GetSound - A noise level storage and visualization system using models to generate probabilistic data, Mariusz Wisniewski, September 2012 [pdf]
  • DNS^3: a DNS Semantic-Aware Service for Security and Authoritative Information, Ahmed S. Mostafa, May 2012 [pdf]
  • Using the Deep Web to Extend the Linked Data Web, Michal Musial, February 2012 [pdf]