“Doubt everything or believe everything: these are two equally convenient strategies. With either we dispense with the need for reflection.” - Henri Poincaré
Any database system represents a certain view of the universe. Tangible data objects in such systems are often called facts, but how much truth is behind those facts? This project aims to set up a concrete framework for data quality testing, oriented towards quantifying how much one can trust the data stored within a database system. We foresee at least two concrete use cases from different domains, specifically CRM-oriented business processes and mobile network signaling, for which a data quality testing process has to be designed and implemented using the Big Data technology stack (Hadoop 2.x, Apache Spark, Apache Solr, etc.). This will require not only relying on data-agnostic approaches to anomaly detection, such as event counting, but also incorporating the business logic associated with a given business process, plus the ability to perform validation tests against various ground truth data sources by applying, inter alia, Bayesian statistics techniques. This project will be carried out in cooperation with Swisscom’s Data Science team.
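To make the two ingredients named above more tangible, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the project's actual method. The function names, the uniform Beta(1, 1) prior, the median baseline, and the threshold of 3 are all hypothetical choices. The first function flags event counts that deviate strongly from a typical level, using a Poisson-style spread; the second turns a validation sample (records checked against a ground truth source) into a Bayesian estimate of the fraction of trustworthy records.

```python
from math import sqrt
from statistics import median


def flag_count_anomalies(counts, threshold=3.0):
    """Return indices of counts that deviate from the median by more than
    threshold * sqrt(median), i.e. a rough Poisson-style spread.
    (Assumed heuristic; the threshold is an illustrative choice.)"""
    baseline = median(counts)
    spread = sqrt(baseline) if baseline > 0 else 1.0
    return [i for i, c in enumerate(counts)
            if abs(c - baseline) > threshold * spread]


def trust_posterior(matches, checked, prior_a=1.0, prior_b=1.0):
    """Beta-Binomial update: given `matches` records that agree with the
    ground truth out of `checked` sampled records, return the posterior
    mean and standard deviation of the fraction of correct records.
    The default Beta(1, 1) prior is an assumption (uniform prior)."""
    if not 0 <= matches <= checked:
        raise ValueError("matches must lie between 0 and checked")
    a = prior_a + matches
    b = prior_b + (checked - matches)
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(variance)


if __name__ == "__main__":
    # Hypothetical daily event counts with one suspicious drop.
    daily_events = [100, 98, 103, 101, 40, 99]
    print("anomalous days:", flag_count_anomalies(daily_events))

    # Hypothetical validation sample: 95 of 100 records match ground truth.
    mean, std = trust_posterior(95, 100)
    print(f"estimated trust: {mean:.3f} +/- {std:.3f}")
```

At the scale the project targets, the same logic would presumably be expressed as aggregations in Apache Spark rather than over in-memory lists. The posterior standard deviation is useful because it shrinks as more records are validated, so the trust score carries its own uncertainty.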