ERC Project


Project on extracting and integrating data from unstructured content using probabilistic graphs and knowledge graphs

Integrating heterogeneous content has become a key hurdle in the deployment of Big Data applications, due to the meteoric rise of user-generated data stored in a variety of formats. Traditional integration techniques, which clean up, fuse, and then map heterogeneous data onto rigid abstractions, fall short of accurately capturing the complexity and wild heterogeneity of today’s information.

GraphInt proposes an ambitious overhaul of information integration techniques that embraces the scale and heterogeneity of today’s data. We propose the use of expressive and heterogeneous graphs of entities to continuously and dynamically interrelate disparate pieces of content while capturing their idiosyncrasies. Our project focuses on three core issues related to extremely large and heterogeneous information graphs:

  1. the effective extraction of fine-grained information from unstructured sources and its proper integration into large-scale heterogeneous and probabilistic graphs,
  2. the design and implementation of a declarative back-end system to durably and efficiently manage the profusion of data considered by such graphs using clusters of commodity machines, and
  3. the design of advanced query capabilities (including hypothesis or discursive queries) to effectively take advantage of the extracted content.

This project is supported by a generous grant from the ERC and will run from 2016 to 2020.