Two new papers accepted: HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms (ICDM 2017 [acceptance 9%]) and Efficient Document Filtering Using Vector Space Topic Expansion (CIKM 2017 [acceptance 21%]). Also, Julien and Akansha join our lab; welcome!
Two new journal papers accepted: Storing, Tracking, and Querying Provenance in Linked Data (IEEE Transactions on Knowledge and Data Engineering) and Managing Big Interval Data with CINTIA: the Checkpoint INTerval Array (IEEE Transactions on Big Data). Also, Rana, Inès and Giuse join our lab; welcome!
Two new papers accepted at IEEE BigData 2015: Online Anomaly Detection over Big Data Streams and CINTIA: a Distributed, Low-Latency Index for Big Interval Data [acceptance rate: 17%].
Pooling-Based Continuous Evaluation of Information Retrieval Systems has been accepted for publication in the Information Retrieval journal. Also, Phil’s keynote at ICDAR 2015 on Entity-Centric Data Management is now available.
A Comparison of Data Structures to Manage URIs on the Web of Data has been accepted at ESWC 2015 (acceptance rate: 23%). And our friend Mike Stonebraker from MIT wins the Turing Award! Huge.
Best way to start 2015: two research papers accepted at the 24th International World Wide Web Conference [acceptance rate: 14%]! Executing Provenance-Enabled Queries over Web Data and The Dynamics of Micro-Task Crowdsourcing: The Case of Amazon MTurk. Full PDFs coming soon…
The eXascale Infolab (U. of Fribourg, Switzerland) is hiring! We are looking for a highly qualified postdoctoral researcher in Computer Science interested in designing and developing novel information infrastructures to manage big data. See the full job description here.
New Smarter Cities paper: TRISTAN: Real-Time Analytics on Massive Time Series Using Sparse Dictionary Compression has been accepted at IEEE BigData 2014 [acceptance rate: 18%]! Joint work w/ IBM Research. Details here: https://exascale.info/node/286
Our paper on fixing grammatical errors using large n-gram corpora and preposition ranking has been accepted at CIKM (IR track)! Also, TransactiveDB has been accepted at PVLDB. PDFs coming soon…
Scaling-up the Crowd: Micro-Task Pricing Schemes for Worker Retention and Latency Improvement has been accepted at HCOMP 2014. See you in Pittsburgh this fall!
Best way to start this new year: two XI papers accepted at WWW 2014! Effective Named Entity Recognition for Idiosyncratic Web Collections, and TripleProv: Efficient Processing of Lineage Queries over a Native RDF Store [acceptance rate 12.9%].
Our new articles on Large-Scale Linked Data Integration Using Probabilistic Reasoning and Crowdsourcing and on Scalable Anomaly Detection for Smart City Infrastructure Networks [joint work w/ the IBM Research Smarter Cities Centre] have been accepted for publication by the VLDB Journal and by IEEE Internet Computing, respectively.
Also, Gianluca’s tutorial slides on Crowdsourcing for the Semantic Web are available.
New paper accepted at WWW 2013 (acceptance rate: 15%): Pick-A-Crowd: Tell Me What You Like, and I’ll Tell You What to Do.
New ScienceWise paper on Ontology-Based Word Sense Disambiguation for Scientific Literature accepted at ECIR 2013.
We are ecstatic to have won one of the two global research grants from Verisign Inc. Press release here.
Living in Switzerland? Then don’t miss Roman’s Android app offering timetables for Swiss public transportation. Available for free on Google Play.
Several new papers: Combining Inverted Indices and Structured Search for Ad-hoc Object Retrieval has been accepted at SIGIR; an overview of HYRISE appears in the IEEE Data Engineering Bulletin; Downscaling Entity Registries (with VUA and Verisign) has been accepted at DOWNSCALE; and Graph Data Management Techniques for the Large-Scale Deployment of Semantic Web Technologies is an invited paper at GDM.
Want to benchmark relational or cloud databases? Here is our one-stop, open-source solution: OLTPBench. We hope you’ll like it as much as we do! This is joint work w/ Carlo Curino [Yahoo! Research] and Andy Pavlo [Brown].
Our latest foray into Online Entity territory was accepted at the World Wide Web conference! [acceptance rate: 12%]
ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking
We tackle the problem of entity linking for large collections of online pages. Our system, ZenCrowd, identifies entities from natural language text using state-of-the-art techniques and automatically connects them to the Linked Open Data cloud. We show how one can take advantage of human intelligence to improve the quality of the links by dynamically generating micro-tasks on an online crowdsourcing platform. We develop a probabilistic framework to make sensible decisions about candidate links and to identify unreliable human workers. We evaluate ZenCrowd in a real deployment and show how a combination of both probabilistic reasoning and crowdsourcing techniques can significantly improve the quality of the links, while limiting the amount of work performed by the crowd.
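The abstract mentions deciding on candidate links while identifying unreliable workers. As a rough illustration only (this is not ZenCrowd's actual probabilistic framework), here is a minimal sketch assuming each worker gives a yes/no answer per candidate link, and using a simple iterative reliability-weighted vote: link probabilities are computed from current worker reliabilities, then each worker's reliability is re-estimated as their expected agreement with those probabilities. The function name `aggregate_links`, the 0.8 prior reliability, and the clamping bounds are hypothetical choices.

```python
from collections import defaultdict


def aggregate_links(votes, iterations=10, prior=0.8):
    """Hypothetical sketch: votes is a list of (worker, link, answer) tuples,
    where answer is True if the worker says the candidate link is correct."""
    # Group answers by candidate link.
    by_link = defaultdict(list)
    for worker, link, answer in votes:
        by_link[link].append((worker, answer))

    # Every worker starts with the same assumed prior reliability.
    reliability = {worker: prior for worker, _, _ in votes}
    prob = {}

    for _ in range(iterations):
        # Step 1: probability that each link is correct, starting from a
        # uniform 0.5 prior and treating worker answers as independent.
        for link, answers in by_link.items():
            p_true, p_false = 0.5, 0.5
            for worker, answer in answers:
                r = reliability[worker]
                p_true *= r if answer else 1 - r
                p_false *= 1 - r if answer else r
            prob[link] = p_true / (p_true + p_false)

        # Step 2: reliability = average expected agreement with current beliefs.
        agree = defaultdict(float)
        count = defaultdict(int)
        for worker, link, answer in votes:
            agree[worker] += prob[link] if answer else 1 - prob[link]
            count[worker] += 1
        for worker in count:
            # Clamp away from 0 and 1 so no single worker dominates.
            reliability[worker] = min(max(agree[worker] / count[worker], 0.01), 0.99)

    return prob, reliability


if __name__ == "__main__":
    votes = [
        ("a", "L1", True), ("b", "L1", True), ("c", "L1", False),
        ("a", "L2", False), ("b", "L2", False), ("c", "L2", True),
    ]
    prob, reliability = aggregate_links(votes)
    print(prob)
    print(reliability)
```

In this toy run, workers `a` and `b` agree with each other while `c` contradicts them, so `L1` ends up with a probability close to 1, `L2` close to 0, and `c` is flagged with a low reliability. A real system would also use the automatic linker's own confidence scores as priors rather than a flat 0.5.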
Gianluca Demartini, Djellel Eddine Difallah, Philippe Cudré-Mauroux
21st International World Wide Web Conference (WWW2012), Lyon (France), April 16-20, 2012.