GraphInt
Integrating heterogeneous content has become a key hurdle in deploying Big Data applications, driven by the meteoric rise of user-generated data that stores information in a wide variety of formats. Traditional integration techniques, which clean up, fuse, and then map heterogeneous data onto rigid abstractions, fall short of accurately capturing the complexity and wild heterogeneity of today’s information.
GraphInt proposes an ambitious overhaul of information integration techniques, one that embraces the scale and heterogeneity of today’s data. We propose using expressive, heterogeneous graphs of entities to continuously and dynamically interrelate disparate pieces of content while capturing their idiosyncrasies. Our project focuses on three core issues related to extremely large and heterogeneous information graphs:
This project is supported by a generous grant from the ERC.
Publications
2019
- Natalia Ostapuk, Jie Yang, and Philippe Cudré-Mauroux. ActiveLink: Deep Active Learning for Link Prediction in Knowledge Graphs. In Proceedings of the Web Conference (WWW 2019), 2019.
- Dingqi Yang, Bingqing Qu, Jie Yang, and Philippe Cudré-Mauroux. Revisiting User Mobility and Social Relationships in LBSNs: A Hypergraph Embedding Approach. In Proceedings of the Web Conference (WWW 2019), 2019.
- Jie Yang, Alisa Smirnova, Dingqi Yang, Gianluca Demartini, Yuan Lu, and Philippe Cudré-Mauroux. Scalpel-CD: Leveraging Crowdsourcing and Deep Probabilistic Modeling for Debugging Noisy Training Data. In Proceedings of the Web Conference (WWW 2019), 2019.
- Dingqi Yang, Bingqing Qu, and Philippe Cudré-Mauroux. Privacy-Preserving Social Media Data Publishing for Personalized Ranking-Based Recommendation. IEEE Transactions on Knowledge and Data Engineering (TKDE) 31, no. 3 (2019): 507–20.
- Alisa Smirnova and Philippe Cudré-Mauroux. Relation Extraction Using Distant Supervision: A Survey. ACM Computing Surveys 51, no. 5 (2019): 106:1–106:35.
- Alberto Lerner, Rana Hussein, and Philippe Cudré-Mauroux. The Case For Network Accelerated Query Processing. In CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, 2019.
- Artem Lutov, Mourad Khayati, and Philippe Cudré-Mauroux. Accuracy Evaluation of Overlapping and Multi-Resolution Clustering Algorithms on Large Datasets. In IEEE International Conference on Big Data and Smart Computing (BigComp), 2019.
2018
- Rana Hussein, Dingqi Yang, and Philippe Cudré-Mauroux. Are Meta-Paths Necessary? Revisiting Heterogeneous Graph Embeddings. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM’18), 437–46, 2018.
- Dingqi Yang, Terence Heaney, Alberto Tonon, Leye Wang, and Philippe Cudré-Mauroux. CrimeTelescope: Crime Hotspot Prediction Based on Urban and Social Media Data Fusion. World Wide Web 21, no. 5 (2018): 1323–47.
- Dingqi Yang, Bin Li, Laura Rettig, and Philippe Cudré-Mauroux. D2HistoSketch: Discriminative and Dynamic Similarity-Preserving Sketching of Streaming Histograms. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018.
- Artem Lutov, Soheil Roshankish, Mourad Khayati, and Philippe Cudré-Mauroux. StaTIX — Statistical Type Inference on Linked Data. In IEEE International Conference on Big Data, Big Data 2018, Seattle, WA, USA, December 10–13, 2018, 2253–62, 2018.
- Artem Lutov, Mourad Khayati, and Philippe Cudré-Mauroux. Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling Clustering Algorithms on NUMA Architectures. In 2018 IEEE International Conference on Data Mining Workshops, ICDM Workshops, Singapore, Singapore, November 17–20, 2018, 1481–86, 2018.
- Leye Wang, Gehua Qin, Dingqi Yang, Xiao Han, and Xiaojuan Ma. Geographic Differential Privacy for Mobile Crowd Coverage Maximization. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2018.
- Longbiao Chen, Dingqi Yang, Daqing Zhang, Cheng Wang, Jonathan Li, and Thi Mai Trang Nguyen. Deep Mobile Traffic Forecast and Complementary Base Station Clustering for C-RAN Optimization. Journal of Network and Computer Applications 121 (2018): 59–69.
- Leye Wang, Daqing Zhang, Dingqi Yang, Animesh Pathak, Chao Chen, Xiao Han, Haoyi Xiong, and Yasha Wang. SPACE-TA: Cost-Effective Task Allocation Exploiting Intra- and Inter-Data Correlations in Sparse Crowdsensing. ACM Transactions on Intelligent Systems and Technology (TIST) 9, no. 2 (2018): 20:1–20:28.
- Jie Yang, Carlo van der Valk, Tobias Hossfeld, Judith Redi, and Alessandro Bozzon. How Do Crowdworker Communities and Microtask Markets Influence Each Other? A Data-Driven Study on Amazon Mechanical Turk. In Proceedings of the Sixth AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 193–202, 2018.
- Guanliang Chen, Jie Yang, Claudia Hauff, and Geert-Jan Houben. LearningQ: A Large-Scale Dataset for Educational Question Generation. In Proceedings of the Twelfth International Conference on Web and Social Media (ICWSM), 481–90, 2018.
2017
- Dingqi Yang, Bin Li, Laura Rettig, and Philippe Cudré-Mauroux. HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms with Concept Drift. In Proceedings of the IEEE International Conference on Data Mining (ICDM’17). New Orleans, USA, 2017.
- Roman Prokofyev, Djellel Difallah, Michael Luggen, and Philippe Cudré-Mauroux. SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous Labels. In Proceedings of the 13th International Conference on Semantic Systems (SEMANTICS2017). Amsterdam, The Netherlands, 2017.
- Julia Proskurnia, Ruslan Mavlyutov, Carlos Castillo, Karl Aberer, and Philippe Cudré-Mauroux. Efficient Document Filtering Using Vector Space Topic Expansion And Pattern-Mining: The Case of Event Detection in Microposts. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, Singapore, November 06–10, 2017, 457–66, 2017.
- Leye Wang, Dingqi Yang, Xiao Han, Tianben Wang, Daqing Zhang, and Xiaojuan Ma. Location Privacy-Preserving Task Allocation for Mobile Crowdsensing with Differential Geo-Obfuscation. In Proceedings of the 26th International Conference on World Wide Web (WWW), 627–36, 2017.