A generic machine learning framework for fully-unsupervised anomaly detection with contaminated data
License
CC BY 3.0: Attribution 3.0 Unported
Authors
Editors
Supervisors
Inventors
Patent applicants
Filing date
Publication date
26 January 2024
Department
School of Engineering
Organizational unit
Institut für Datenanalyse und Prozessdesign (IDP)
Publication type
Article in scientific journal
Review
Peer review (publication)
Conference
Parent work
International Journal of Prognostics and Health Management
Proceedings
Citation
Volume – Issue – Page numbers – Article number
15(1)
Series
Publisher
Prognostics and Health Management Society
ISBN
Patent number
Published as
Abstract
Anomaly detection (AD) tasks have been solved using machine learning algorithms in various domains and applications. The great majority of these algorithms use normal data to train a residual-based model, and assign anomaly scores to unseen samples based on their dissimilarity with the learned normal regime. The underlying assumption of these approaches is that anomaly-free data is available for training. This is, however, often not the case in real-world operational settings, where the training data may be contaminated with a certain fraction of abnormal samples. Training with contaminated data, in turn, inevitably degrades the AD performance of residual-based algorithms.
In this paper, we introduce a framework for a fully unsupervised refinement of contaminated training data for AD tasks. The framework is generic and can be applied to any residual-based machine learning model. We demonstrate the application of the framework to two public datasets of multivariate time series machine data from different application fields. We show its clear superiority over the naive approach of training with contaminated data without refinement. Moreover, we compare it to the ideal, unrealistic reference in which anomaly-free data is available for training. Since the approach exploits information from the anomalies, and not only from the normal regime, it is comparable to, and often outperforms, the ideal baseline as well.
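The residual-based scoring that the abstract refers to can be illustrated with a minimal sketch. It is not the paper's framework or refinement procedure. It only shows the general idea of fitting a model to normal data and scoring unseen samples by their reconstruction residual. The PCA model, the synthetic data, and all names (`fit_pca`, `residual_scores`, `sample_normal`, `sample_anomalous`) are assumptions chosen for illustration.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit a linear model of the normal regime: mean plus top principal directions."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def residual_scores(X, mean, components):
    """Anomaly score per sample: squared reconstruction error under the fitted model."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)


rng = np.random.default_rng(0)

# Hypothetical synthetic data: normal samples lie close to a 2-D subspace of a
# 5-D space, anomalous samples are spread isotropically and leave that subspace.
basis = rng.normal(size=(2, 5))


def sample_normal(n):
    return rng.normal(size=(n, 2)) @ basis + 0.05 * rng.normal(size=(n, 5))


def sample_anomalous(n):
    return 3.0 * rng.normal(size=(n, 5))


# Idealized setting: the training set is anomaly-free.
train_clean = sample_normal(500)
mean, components = fit_pca(train_clean, n_components=2)

scores_normal = residual_scores(sample_normal(200), mean, components)
scores_anomalous = residual_scores(sample_anomalous(200), mean, components)

print("median score, normal test samples:   ", np.median(scores_normal))
print("median score, anomalous test samples:", np.median(scores_anomalous))
```

In this idealized setting the anomalous samples receive much larger residuals than the normal ones. Replacing `train_clean` with a mixture of normal and anomalous samples is one way to explore the contaminated-training scenario the abstract describes.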
Description
Keywords
Deep learning, Machine learning, Anomaly detection, Fully unsupervised learning, Contaminated data, Time series, Data refinement, Fault detection, Acoustic sensor data, Aircraft engine