RecovDB shows how to extend a relational database to support the recovery of missing blocks in large time series data. Our approach represents the input time series as a relation and maps it into loading and relevance vectors that capture the correlation across the series. The loading vectors (L) expose the rank of the matrix, which is used to accurately recover the missing blocks. This mapping is computed with our memory-efficient Centroid Decomposition (CD) technique. The recovery algorithm is tightly integrated into the open-source analytical RDBMS MonetDB as native User Defined Functions (UDFs). RecovDB offers the following salient features:
- Parameter-free, correlation-aware recovery
- Recovery of large missing blocks in multiple time series
- Full-fledged DBMS support
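To make the idea concrete, the following is a minimal Python sketch of recovery based on a centroid decomposition. It is not the RecovDB implementation, and several parts are assumptions made for illustration: the function names (`find_sign_vector`, `centroid_decomposition`, `recover`), the greedy sign vector search (a simple stand-in for the memory-efficient search used by CD), the initialization of gaps by linear interpolation, and the explicit `rank`, `tol`, and `max_iter` arguments (RecovDB itself is parameter-free). The toy data are three exactly proportional sine series, each with its own missing block.

```python
import numpy as np


def find_sign_vector(X):
    """Greedy local search for z in {-1, +1}^n that maximizes ||X.T @ z||."""
    n = X.shape[0]
    z = np.ones(n)
    s = X.T @ z                                  # running value of X.T @ z
    row_norms = np.einsum("ij,ij->i", X, X)      # squared norm of each row
    for _ in range(10 * n):                      # safety cap on the number of flips
        # Increase of ||X.T @ z||^2 (divided by 4) if z_i were flipped.
        gains = row_norms - z * (X @ s)
        i = int(np.argmax(gains))
        if gains[i] <= 1e-12:                    # no flip improves the objective
            break
        z[i] = -z[i]
        s += 2.0 * z[i] * X[i]
    return z


def centroid_decomposition(X, n_components=None):
    """Return L (n x k) and R (m x k) such that X is approximated by L @ R.T."""
    X = np.array(X, dtype=float)                 # work on a copy
    n, m = X.shape
    k = m if n_components is None else n_components
    L, R = np.zeros((n, k)), np.zeros((m, k))
    for j in range(k):
        z = find_sign_vector(X)
        c = X.T @ z                              # centroid vector
        norm = np.linalg.norm(c)
        if norm < 1e-12:                         # residual is numerically zero
            break
        R[:, j] = c / norm                       # relevance vector
        L[:, j] = X @ R[:, j]                    # loading vector
        X -= np.outer(L[:, j], R[:, j])          # remove the extracted component
    return L, R


def recover(X, rank=1, tol=1e-6, max_iter=100):
    """Fill NaN entries of X by iterating a truncated centroid decomposition."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    rows = np.arange(X.shape[0])
    for j in range(X.shape[1]):                  # initialize gaps by linear interpolation
        gap = missing[:, j]
        X[gap, j] = np.interp(rows[gap], rows[~gap], X[~gap, j])
    for _ in range(max_iter):
        L, R = centroid_decomposition(X, rank)
        approx = L @ R.T                         # low-rank approximation of X
        change = np.linalg.norm(approx[missing] - X[missing])
        X[missing] = approx[missing]             # only missing entries are updated
        if change < tol:
            break
    return X


# Toy input: three proportional series, each with one missing block.
t = np.linspace(0, 4 * np.pi, 120)
base = np.sin(t)
truth = np.column_stack([1.0 * base, 0.8 * base, -1.2 * base])
observed = truth.copy()
observed[10:30, 0] = np.nan
observed[50:70, 1] = np.nan
observed[90:110, 2] = np.nan

recovered = recover(observed)                    # all three series in one pass
mask = np.isnan(observed)
rmse = np.sqrt(np.mean((recovered[mask] - truth[mask]) ** 2))
print(f"RMSE on the missing blocks: {rmse:.6f}")
```

Here `rmse` measures the error on the removed blocks only; `recover` never modifies observed values. Because the columns are exact multiples of one another, a rank-1 approximation suffices in this toy case; real data would need a larger or automatically chosen truncation.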
The source code is available on GitHub. It contains compilation and installation instructions, as well as sample recovery queries that use SQL Python UDFs.
RecovDB is also available as a GUI through the ReVival tool. The GUI allows users to perform (batch and online) recovery of missing blocks on real-world time series data. Users can select one or multiple time series from a set of datasets, delete a percentage of data from the selected time series, and then recover the missing blocks. The tool also illustrates how the correlation across time series can be used to recover missing values.
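The delete-then-recover workflow can be imitated outside the GUI with a few lines of Python. The sketch below is not ReVival code: `remove_block` and `block_rmse` are hypothetical helpers, the block is removed from a toy quadratic series, and plain linear interpolation stands in for the actual recovery step.

```python
import numpy as np


def remove_block(series, fraction, start):
    """Return a copy of `series` with a contiguous block of values set to NaN."""
    series = np.array(series, dtype=float)
    length = int(round(fraction * len(series)))
    if not 0 <= start <= len(series) - length:
        raise ValueError("block does not fit inside the series")
    series[start:start + length] = np.nan
    return series


def block_rmse(truth, recovered, incomplete):
    """Root mean squared error, measured only where `incomplete` is NaN."""
    mask = np.isnan(incomplete)
    diff = np.asarray(recovered, dtype=float)[mask] - np.asarray(truth, dtype=float)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


truth = np.arange(10.0) ** 2                     # toy series: 0, 1, 4, ..., 81
incomplete = remove_block(truth, fraction=0.3, start=4)

# Stand-in recovery: linear interpolation across the gap (an assumption,
# used here only so that there is something to score).
idx = np.arange(len(truth))
gap = np.isnan(incomplete)
filled = incomplete.copy()
filled[gap] = np.interp(idx[gap], idx[~gap], incomplete[~gap])

score = block_rmse(truth, filled, incomplete)
print(f"RMSE on the removed block: {score:.3f}")
```

Scoring only the removed positions keeps the error from being diluted by values that were never missing.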
Input: Three water discharge time series, each with a missing block.
Query: Recover all the incomplete time series in one pass.
Result: The following figure illustrates the result of the recovery. The missing values are shown in dashed lines while the recovered blocks are shown in red dashed lines.
- Mind the Gap: An Experimental Evaluation of Imputation of Missing Values Techniques in Time Series. In Proceedings of the VLDB Endowment (VLDB 2020).
- RecovDB: Accurate and Efficient Missing Blocks Recovery for Large Time Series. In IEEE International Conference on Data Engineering (ICDE 2019).
- Memory-Efficient Centroid Decomposition for Long Time Series. In IEEE International Conference on Data Engineering (ICDE 2014).
- Efficient and Accurate Time Series Imputation using Correlation, at CWI 2018 (Amsterdam, the Netherlands).