RecovDB
Overview
RecovDB shows how to extend a relational database to support the recovery of missing blocks in large time series data. Our approach represents the input time series as a relation and maps it into loading and relevance vectors that best account for the correlation across the series. The loading vectors (L) expose the rank of the time series matrix, which is used to accurately recover the missing blocks. The mapping is computed using our memory-efficient Centroid Decomposition (CD) technique [1, 3]. The recovery algorithm is tightly integrated into the open-source analytical RDBMS MonetDB as a native UDF (in C). Our empirical evaluation on real-world time series [2] shows that RecovDB is 5x faster and 8x more accurate than the state-of-the-art recovery system ImputeDB (J. Cambronero et al., Query optimization for dynamic imputation, PVLDB’17).
RecovDB offers the following salient features:
- Efficient recovery of large missing blocks in multiple time series
- Parameter free and correlation-aware recovery
- Full-fledged DBMS support
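The recovery idea described in the overview can be illustrated with a small, self-contained sketch. This is not the MonetDB UDF shipped with RecovDB. It is a minimal pure-Python illustration written under several assumptions: the function names (`sign_vector`, `centroid_decomposition`, `recover`) are hypothetical, the sign vector is found with a simple greedy search, missing values are initialized with the column mean, and the truncation rank `k` is fixed by hand rather than chosen automatically.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sign_vector(X):
    """Greedy local search for z in {-1,+1}^n that locally maximizes ||X^T z||."""
    n, m = len(X), len(X[0])
    z = [1.0] * n
    s = [sum(X[i][j] for i in range(n)) for j in range(m)]  # s = X^T z
    improved = True
    while improved:
        improved = False
        for i in range(n):
            # Flipping z[i] changes ||s||^2 by 4 * (||x_i||^2 - z_i * <s, x_i>).
            if dot(X[i], X[i]) - z[i] * dot(s, X[i]) > 1e-12:
                for j in range(m):
                    s[j] -= 2.0 * z[i] * X[i][j]
                z[i] = -z[i]
                improved = True
    return z

def centroid_decomposition(X, k):
    """Return k loading vectors (length n) and k relevance vectors (length m)."""
    n, m = len(X), len(X[0])
    resid = [row[:] for row in X]
    loadings, relevances = [], []
    for _ in range(k):
        z = sign_vector(resid)
        c = [sum(resid[i][j] * z[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(dot(c, c))
        if norm < 1e-12:          # nothing left to explain
            break
        r = [v / norm for v in c]                  # relevance vector
        l = [dot(resid[i], r) for i in range(n)]   # loading vector
        for i in range(n):
            for j in range(m):
                resid[i][j] -= l[i] * r[j]
        loadings.append(l)
        relevances.append(r)
    return loadings, relevances

def recover(X, k=1, tol=1e-9, max_iter=200):
    """Iteratively replace missing entries (None) with a rank-k reconstruction.

    Assumes every column has at least one observed value.
    """
    n, m = len(X), len(X[0])
    missing = [(i, j) for i in range(n) for j in range(m) if X[i][j] is None]
    filled = [row[:] for row in X]
    for j in range(m):  # assumption: initialize gaps with the column mean
        obs = [X[i][j] for i in range(n) if X[i][j] is not None]
        mean = sum(obs) / len(obs)
        for i in range(n):
            if X[i][j] is None:
                filled[i][j] = mean
    for _ in range(max_iter):
        L, R = centroid_decomposition(filled, k)
        delta = 0.0
        for i, j in missing:
            new = sum(l[i] * r[j] for l, r in zip(L, R))
            delta = max(delta, abs(new - filled[i][j]))
            filled[i][j] = new
        if delta < tol:  # stop once the recovered block no longer changes
            break
    return filled

# Three perfectly correlated series (columns); the second has a missing block.
series = [[float(a), 2.0 * a, 3.0 * a] for a in range(1, 7)]
series[2][1] = None
series[3][1] = None
recovered = recover(series, k=1)
print(round(recovered[2][1], 3), round(recovered[3][1], 3))  # prints 6.0 8.0
```

In this toy input the three series are exact multiples of each other, so a single loading/relevance pair captures all of the correlation and the missing block is recovered from the other two series. Real data would need a higher rank and a more careful initialization.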
Graphical UI
RecovDB is also available as a GUI through the ReVival tool. The GUI allows users to perform (batch and online) recovery of missing blocks on real-world time series data. Users can select one or multiple time series from a set of datasets, delete a percentage of data from the selected time series, and then recover the missing blocks. The tool also illustrates how the correlation across time series can be used to recover missing values.
Example:
Input: Three water discharge time series, each of which has a missing block.
Query: Recover all the incomplete time series in one pass.
Result: The following figure illustrates the result of the recovery. The missing values are shown as dashed lines, while the recovered blocks are shown as red dashed lines.
Code:
The source code is available on GitHub. It contains compilation and installation instructions as well as sample recovery queries.
Research Papers
- [1] Mind the Gap: An Experimental Evaluation of Imputation of Missing Values Techniques in Time Series. In Proceedings of the VLDB Endowment (VLDB 2020).
- [2] RecovDB: Accurate and Efficient Missing Blocks Recovery for Large Time Series. In IEEE International Conference on Data Engineering (ICDE 2019).
- [3] Memory-Efficient Centroid Decomposition for Long Time Series. In IEEE International Conference on Data Engineering (ICDE 2014).
Invited Talks
- Efficient and Accurate Time Series Imputation using Correlation, at CWI 2018 (Amsterdam, the Netherlands).
Contributors
- Ines Arous
- Mourad Khayati
- Philippe Cudré-Mauroux