Title | Data Imputation for Multivariate Time Series Sensor Data With Large Gaps of Missing Data |
Publication Type | Journal Article |
Year of Publication | 2022 |
Authors | Wu, R, Hamshaw, SD, Yang, L, Kincaid, DW, Etheridge, R, Ghasemkhani, A |
Journal | IEEE Sensors Journal |
Volume | 22 |
Start Page | 10671 |
Issue | 11 |
Pagination | 10671 - 10683 |
Date Published | 2022/06 |
ISSN | 1530-437X |
Abstract | Imputation of missing sensor-collected data is often an important step prior to machine learning and statistical data analysis. One particular data imputation challenge is filling large data gaps when the only related data comes from the same sensor station. In this paper, we propose a framework that improves the popular multivariate imputation by chained equations (MICE) method for handling missing data. One key strategy we use to improve model accuracy is to reshape the original sensor data to leverage the correlation between the missing data and the observed data. We demonstrate our framework using data from continuous water quality monitoring stations in Vermont. Because the time series may contain irregularly spaced peaks, the reshaped data is split into extreme and normal values, and a separate MICE model is built for each. We also recommend transforming sensor-collected data to meet the assumptions of the machine learning model. According to our experimental results, these strategies improve MICE imputation accuracy by at least 23% for large data gaps, based on R² values, and are promising for application to other data imputation algorithms. |
URL | https://ieeexplore.ieee.org/document/9755143 |
DOI | 10.1109/JSEN.2022.3166643 |
Short Title | IEEE Sensors J. |
Refereed Designation | Refereed |
Status: Published
Attributable Grant: BREE
Grant Year: Year7
Acknowledged VT EPSCoR: Ack-Yes
2nd Attributable Grant: NEWRnet
2nd Grant Year: 2nd_Post_Grant
2nd Acknowledged Grant: 2nd_Ack-Yes