A Comparison of Data Structures and Hash Functions to Manage URIs on the Web of Data
An Empirical Evaluation of Data Structures to Manage URIs on the Web of Data
Uniform Resource Identifiers (URIs) are one of the corner stones of the Web; They are also exceedingly important on the Web of data, since RDF graphs and Linked Data both heavily rely on URIs to uniquely identify and connect online entities. Due to their hierarchical structure and their string serialization, sets of related URIs typically contain a high degree of redundant information and are typically dictionary-encoded and managed through dedicated structures at the back-end (e.g., in the triplestore). Inserting and querying such data often account for a significant portion of query processing time. This paper presents, to the best of our knowledge, the first systematic comparison of the most common data structures used to handle URI data. We empirically evaluate a series of data structures (such as sparse hashes, hash tables, maps, B+trees, and lexicographic trees) in terms of their read/write performance and storage consumption.
Experiments results
Code for the experiments can be pulled from GitHub.
Download datasets from Google Drive.
Basic data structures
DS1 | DS2 | DS3 | DS4 | DS5 | DS6 | DS7 |
---|---|---|---|---|---|---|
Marginal Load Time (orderded datasets) | ||||||
Marginal Load Time (unordered datasets) | ||||||
Total Load Time | ||||||
Incremental Memory consumption (ordered datasets) | ||||||
Incremental Memory consumption (unordered datasets) | ||||||
Total Memory Consumption | ||||||
Marginal Query Time (ordered datasets) | ||||||
Marginal Query Time (unordered datasets) | ||||||
DS1 | DS2 | DS3 | DS4 | DS5 | DS6 | DS7 |
RDF3X
DS1 | DS2 | DS3 | DS4 | DS5 | DS6 |
---|---|---|---|---|---|
Load Time | |||||
Memory consumption | |||||
Key query time | |||||
Id query time | |||||
DS1 | DS2 | DS3 | DS4 | DS5 | DS6 |
HDT
DS1 | DS2 | DS3 | DS4 | DS5 | DS6 |
---|---|---|---|---|---|
Load Time | |||||
Memory consumption | |||||
Key query time | |||||
Id query time | |||||
DS1 | DS2 | DS3 | DS4 | DS5 | DS6 |