PrefixCDD: Effective Online Concept Drift Detection over Event Streams using Prefix Trees.

Huete, J., Qahtan, A. A., & Hassani, M. (2023). PrefixCDD: Effective Online Concept Drift Detection over Event Streams using Prefix Trees. In H. Shahriar, Y. Teranishi, A. Cuzzocrea, M. Sharmin, D. Towey, AKM. J. A. Majumder, H. Kashiwazaki, J.-J. Yang, M. Takemoto, N. Sakib, R. Banno, & S. I. Ahamed (Eds.), COMPSAC (pp. 328-333). https://doi.org/10.1109/COMPSAC57700.2023.00051

Abstract

Process mining focuses on applying data mining techniques to business process data. Recently, with improvements in the sensing, collection, and storage of event data, strong demand has arisen for both shorter mining times and adaptive models of streaming process events. This has increased interest in streaming process mining. Some techniques within this field attempt to identify drifts (change points) in evolving process data streams. Existing work on supervised and unsupervised-learning approaches over data streams has several limitations with regard to the nature of the drifts, the excessive storage required to store and process the stream, and the performance on real-world datasets. This paper contributes PrefixCDD, a novel and efficient unsupervised-learning approach for online concept drift detection (CDD) over event streams. Our proposed approach uses a data structure in which the components of the data stream are stored in a set of prefix trees. It then transforms the discrete data into continuous data by applying Principal Component Analysis (PCA) over the trees. ADWIN is then used to focus on up-to-date information, which makes it a good fit for the decaying mechanism behind our algorithm. On six artificial and three real-life datasets, PrefixCDD outperforms state-of-the-art techniques in detecting existing drifts of different natures, in discovering them shortly after they appear, and in overall execution time.
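To make the prefix-tree idea concrete, the following is a minimal sketch of how traces from an event stream might be stored in a prefix tree with per-node visit counts. It is an illustration only, not the authors' implementation: the names `Node`, `insert_trace`, and `prefix_counts`, the activity labels, and the use of plain visit counts are all assumptions, and the PCA and ADWIN steps mentioned in the abstract are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One activity in the prefix tree, with a visit counter (hypothetical layout)."""
    count: int = 0
    children: dict = field(default_factory=dict)


def insert_trace(root, trace):
    """Walk the trace from the root, creating nodes as needed and counting visits."""
    node = root
    for activity in trace:
        node = node.children.setdefault(activity, Node())
        node.count += 1


def prefix_counts(root, prefix=()):
    """Flatten the tree into a dict mapping each prefix (a tuple) to its visit count."""
    out = {}
    for activity, child in root.children.items():
        path = prefix + (activity,)
        out[path] = child.count
        out.update(prefix_counts(child, path))
    return out


# Toy traces with made-up activity labels (assumption, not from the paper).
root = Node()
for trace in [("a", "b", "c"), ("a", "b", "d"), ("a", "c")]:
    insert_trace(root, trace)

counts = prefix_counts(root)
```

Shared prefixes are stored once, so `counts` holds one entry per distinct prefix rather than one per trace. A change in how these counts are distributed between two periods of the stream is the kind of signal a drift detector could monitor.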
