Musaj, Z., & Hassani, M. (2025). Unsupervised Anomaly Detection of Prefixes in Event Streams Using Online Autoencoders. In M. Comuzzi, D. Grigori, M. Sellami, & Z. Zhou (Eds.), Cooperative Information Systems: 30th International Conference, CoopIS 2024, Porto, Portugal, November 19–21, 2024, Proceedings (pp. 93–110). Springer. https://doi.org/10.1007/978-3-031-81375-7_6
Abstract
In this work, we address the problem of unsupervised online detection of anomalies in the traces of event logs. Our input is an event log containing multiple traces, where each trace is a finite, ordered sequence of activities. The problem is challenging because abnormal sequence patterns must be identified without labeled data, and because individual events cannot simply be forgotten: an instance is represented by a specific sequence of events. This requires methods that adapt to evolving data streams and provide timely, accurate anomaly detection while managing limited memory efficiently. This paper presents an efficient unsupervised learning method for online anomaly detection. We use a size-limited data structure to store prefixes. Event stream prefixes are transformed into vector representations using word2vec or one-hot encoding and fed into an online autoencoder. The discrepancy between the autoencoder's input and output yields a reconstruction error, which serves as the anomaly score. We also introduce Progressive Anomaly Labelling (PAL), a dynamic method for real-time anomaly detection that labels suffixes as anomalous once their prefix has been labelled as such. Our approach excels at detecting control-flow and data-flow anomalies, identifies anomalies early, and reduces execution time, outperforming state-of-the-art online anomaly detection techniques. The implementation and the datasets are publicly available at https://github.com/zyrako4/sequence-online-ad.
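The pipeline described in the abstract (bounded prefix store, one-hot encoding, online autoencoder, reconstruction error as anomaly score, progressive labelling) can be illustrated with a minimal sketch. It is not the authors' implementation, which is available at the repository above. Everything here is an assumption made for illustration: the activity alphabet `ACTIVITIES`, the window length `MAX_LEN`, the least-recently-updated eviction policy, the tiny pure-Python autoencoder, the fixed `threshold`, and the choice to train only on prefixes that were not flagged.

```python
import math
import random
from collections import OrderedDict

ACTIVITIES = ["A", "B", "C", "D"]  # hypothetical activity alphabet
MAX_LEN = 3                        # hypothetical window: last 3 activities of a prefix


def one_hot_prefix(prefix):
    """Flat one-hot encoding of the last MAX_LEN activities, zero-padded."""
    vec = [0.0] * (MAX_LEN * len(ACTIVITIES))
    for pos, act in enumerate(prefix[-MAX_LEN:]):
        vec[pos * len(ACTIVITIES) + ACTIVITIES.index(act)] = 1.0
    return vec


class OnlineAutoencoder:
    """Toy autoencoder (tanh hidden layer, linear output) updated one sample at a time."""

    def __init__(self, n_in, n_hidden, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        self.w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w_enc]
        y = [sum(w * hj for w, hj in zip(row, h)) for row in self.w_dec]
        return h, y

    def score(self, x):
        """Mean squared reconstruction error, used here as the anomaly score."""
        _, y = self._forward(x)
        return sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / len(x)

    def learn_one(self, x):
        """One SGD step on the squared reconstruction error of a single sample."""
        h, y = self._forward(x)
        err = [yi - xi for yi, xi in zip(y, x)]
        # Backpropagate through the decoder before its weights are changed.
        grad_h = [
            sum(err[i] * self.w_dec[i][j] for i in range(len(x))) * (1.0 - h[j] ** 2)
            for j in range(len(h))
        ]
        for i in range(len(x)):
            for j in range(len(h)):
                self.w_dec[i][j] -= self.lr * err[i] * h[j]
        for j in range(len(h)):
            for k in range(len(x)):
                self.w_enc[j][k] -= self.lr * grad_h[j] * x[k]


def process_stream(events, model, threshold, capacity=1000):
    """Label each (case_id, activity) event; returns a list of (case_id, is_anomalous)."""
    prefixes = OrderedDict()  # case id -> recent activities; least recently updated first
    flagged = set()           # cases whose prefix was already labelled anomalous
    labels = []
    for case_id, activity in events:
        prefix = prefixes.setdefault(case_id, [])
        prefixes.move_to_end(case_id)
        prefix.append(activity)
        del prefix[:-MAX_LEN]                 # keep only the encoded window
        if len(prefixes) > capacity:          # bounded memory: evict the stalest case
            evicted, _ = prefixes.popitem(last=False)
            flagged.discard(evicted)
        if case_id not in flagged:
            x = one_hot_prefix(prefix)
            if model.score(x) > threshold:
                flagged.add(case_id)          # later events of this case inherit the label
            else:
                model.learn_one(x)            # assumption: train on unflagged prefixes only
        labels.append((case_id, case_id in flagged))
    return labels
```

A call such as `process_stream(events, OnlineAutoencoder(n_in=12, n_hidden=4), threshold=0.1)` consumes an iterable of `(case_id, activity)` pairs. The threshold value is arbitrary here; in practice it would need to be calibrated against the observed distribution of reconstruction errors.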