RecovDB shows how to extend a relational database to support the recovery of missing blocks in large time series data. Our approach represents the input time series as a relation and maps them into loading and relevance vectors that best account for the correlation across the series. The loading vectors (L) expose the rank of the time series matrix, which is used to accurately recover the missing blocks. The mapping is computed using our memory-efficient Centroid Decomposition (CD) technique. The recovery algorithm is tightly integrated into the open-source analytical RDBMS MonetDB as native User-Defined Functions (UDFs). RecovDB offers the following salient features:
- Parameter-free and correlation-aware recovery
- Recovery of large missing blocks in multiple time series
- Full-fledged DBMS support
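The idea of recovering missing blocks through loading and relevance vectors can be illustrated with a small sketch. The code below is not the RecovDB implementation: it assumes a plain greedy sign-vector search instead of the memory-efficient CD, takes a user-chosen truncation `k` (RecovDB itself is parameter-free), and initializes missing values with the mean of each series. All function names are hypothetical.

```python
import math

def sign_vector(X, max_sweeps=100):
    """Greedy local search for z in {-1, +1}^n that increases ||X^T z||."""
    n, m = len(X), len(X[0])
    z = [1.0] * n
    v = [sum(X[i][j] for i in range(n)) for j in range(m)]  # v = X^T z
    for _ in range(max_sweeps):
        flipped = False
        for i in range(n):
            dot = sum(X[i][j] * v[j] for j in range(m))
            sq = sum(x * x for x in X[i])
            # Flipping z[i] changes ||v||^2 by 4 * (sq - z[i] * dot).
            if sq - z[i] * dot > 1e-12:
                for j in range(m):
                    v[j] -= 2.0 * z[i] * X[i][j]
                z[i] = -z[i]
                flipped = True
        if not flipped:
            break
    return z

def centroid_decomposition(X, k):
    """Return the first k loading vectors (length n) and relevance vectors (length m)."""
    n, m = len(X), len(X[0])
    residual = [row[:] for row in X]
    loadings, relevances = [], []
    for _ in range(k):
        z = sign_vector(residual)
        c = [sum(residual[i][j] * z[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in c))
        if norm == 0.0:
            break
        r = [x / norm for x in c]  # relevance vector
        l = [sum(residual[i][j] * r[j] for j in range(m)) for i in range(n)]  # loading vector
        for i in range(n):
            for j in range(m):
                residual[i][j] -= l[i] * r[j]
        loadings.append(l)
        relevances.append(r)
    return loadings, relevances

def recover(X, missing, k=1, tol=1e-6, max_iter=100):
    """Fill the (row, column) positions in `missing` by iterating a rank-k CD."""
    X = [row[:] for row in X]
    n = len(X)
    # Assumed initial guess: mean of the observed values in the same series.
    for j in {j for _, j in missing}:
        observed = [X[i][j] for i in range(n) if (i, j) not in missing]
        mean = sum(observed) / len(observed)
        for i in range(n):
            if (i, j) in missing:
                X[i][j] = mean
    for _ in range(max_iter):
        loadings, relevances = centroid_decomposition(X, k)
        change = 0.0
        for i, j in missing:
            new = sum(l[i] * r[j] for l, r in zip(loadings, relevances))
            change += (new - X[i][j]) ** 2
            X[i][j] = new
        if math.sqrt(change / len(missing)) < tol:
            break
    return X

# Three perfectly correlated synthetic series (rows are time points, columns are series).
truth = [[a, 2 * a, 3 * a] for a in (2 + math.sin(t / 3) for t in range(30))]
missing = {(t, 0) for t in range(10, 16)}  # one missing block in the first series
damaged = [[None if (t, j) in missing else v for j, v in enumerate(row)]
           for t, row in enumerate(truth)]
recovered = recover(damaged, missing)
```

In this toy setting the series are exact multiples of each other, so a single loading/relevance pair is enough to rebuild the block. Real data would need a larger truncation and a more careful initialization.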
The source code is available on GitHub. It contains compilation and installation instructions as well as sample recovery queries through SQL Python-UDFs.
RecovDB is also available as a GUI through the ReVival tool. The latter allows users to perform batch and online recovery of missing blocks on real-world time series data. Users can select one or multiple time series from a particular dataset, drop a percentage of data from the selected time series, and then recover these missing values.
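The drop-then-recover workflow described above can be mimicked offline with a few lines of code. The sketch below is not part of ReVival: the helper names are hypothetical, and linear interpolation stands in as a placeholder for the actual recovery algorithm so that the example stays self-contained. It removes a contiguous block covering a given fraction of a series, fills it, and scores the result with the root mean square error (RMSE) over the removed positions.

```python
import math

def drop_block(series, fraction, start):
    """Return a copy with a contiguous block of round(fraction * len) values set to None."""
    length = round(fraction * len(series))
    indices = list(range(start, min(start + length, len(series))))
    damaged = list(series)
    for i in indices:
        damaged[i] = None
    return damaged, indices

def interpolate(damaged):
    """Placeholder recovery: linear interpolation between the nearest observed neighbours."""
    filled = list(damaged)
    known = [i for i, v in enumerate(damaged) if v is not None]
    for i, v in enumerate(damaged):
        if v is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None:
                filled[i] = damaged[right]
            elif right is None:
                filled[i] = damaged[left]
            else:
                w = (i - left) / (right - left)
                filled[i] = (1 - w) * damaged[left] + w * damaged[right]
    return filled

def rmse(truth, recovered, indices):
    """Root mean square error restricted to the dropped positions."""
    return math.sqrt(sum((truth[i] - recovered[i]) ** 2 for i in indices) / len(indices))

truth = [0.5 * t for t in range(50)]
damaged, dropped = drop_block(truth, fraction=0.2, start=20)
error = rmse(truth, interpolate(damaged), dropped)
```

Swapping `interpolate` for another recovery function makes it possible to compare techniques on the same dropped block.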
Input: Three water discharge time series, each with a missing block.
Query: Recover all the incomplete time series in one pass.
Result: The following figure illustrates the result of the recovery. The missing values are shown in dashed lines while the recovered blocks are shown in red dashed lines.
- RecovDB: Accurate and Efficient Missing Blocks Recovery for Large Time Series in IEEE International Conference on Data Engineering (ICDE 2019).
- Memory-Efficient Centroid Decomposition for Long Time Series in IEEE International Conference on Data Engineering (ICDE 2014).
- Efficient and Accurate Time Series Imputation using Correlation, at CWI 2018 (Amsterdam, the Netherlands).