Abstract
In principle, connected sensors allow effortless long-term self-monitoring of health and wellness, which can help maintain health and quality of life. However, data collected in the “wild” may be noisy and contain outliers, e.g., from uncontrolled sources or from different persons using the same device. Removing these outliers is therefore critical for accurate interpretation of the data. In this paper, we study the detection and elimination of outliers in self-weighing time series data obtained from connected weight scales. We examined three techniques: (1) a method based on autoregressive integrated moving average (ARIMA) time series modelling, (2) a method based on the median absolute deviation (MAD) scale estimate, and (3) a method based on Rosner statistics. We applied these methods to both a data set with real outliers and a clean data set corrupted with simulated outliers. The results suggest that the simple MAD algorithm and the ARIMA method performed well on both test sets, while the method based on Rosner statistics was significantly less effective. In addition, the ARIMA approach appeared to be significantly less sensitive to long periods of missing data than the MAD and Rosner methods.
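To make the MAD idea concrete, the following is a minimal sketch of a MAD-based outlier flag for a series of weight readings. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function name `mad_outlier_flags`, the cut-off `threshold=3.0`, and the use of a single global median (rather than a moving window over time) are all hypothetical choices. The factor 1.4826 is the usual constant that makes the MAD comparable to a standard deviation for normally distributed data.

```python
from statistics import median


def mad_outlier_flags(weights, threshold=3.0):
    """Return one boolean per reading, True where the reading is flagged.

    Assumptions (not taken from the paper): a single global median is
    used, and threshold=3.0 is an illustrative cut-off.
    """
    if not weights:
        return []
    center = median(weights)
    # Median absolute deviation from the median.
    mad = median(abs(w - center) for w in weights)
    # Scale the MAD so it is comparable to a standard deviation
    # under normality.
    scale = 1.4826 * mad
    if scale == 0:
        # Degenerate case: at least half of the readings are identical.
        # Flag nothing rather than divide by zero.
        return [False] * len(weights)
    return [abs(w - center) / scale > threshold for w in weights]


# Hypothetical readings in kg; the fifth value mimics another person
# stepping on the same scale.
readings = [80.1, 80.3, 79.9, 80.0, 62.5, 80.2, 80.4]
flags = mad_outlier_flags(readings)
cleaned = [w for w, is_outlier in zip(readings, flags) if not is_outlier]
```

Because both the centre and the scale are medians, a single extreme reading barely shifts either, which is what makes this kind of rule attractive for data that may include measurements from other users of the device.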