Syamsiyah, A., & Leemans, S. J. J. (2020). Process discovery using in-database minimum self distance abstractions. In 35th Annual ACM Symposium on Applied Computing, SAC 2020 (pp. 26-35). Association for Computing Machinery, Inc. https://doi.org/10.1145/3341105.3373846
Process executions generate event data that are typically stored in legacy information systems, such as databases. However, process discovery, which requires such event data, is performed in main memory. To bridge this gap, existing techniques must extract and transform the event data, which can be expensive steps. This issue has been addressed by processing the event data directly where they are stored. However, existing methods rely only on the simplest event data abstraction: the Directly Follows (DF) abstraction. This paper improves upon these existing works by considering another abstraction, the Minimum Self Distance (MSD) abstraction, which enables the discovery of a larger class of models than the DF abstraction alone. Specifically, we propose IMw, a process discovery technique that works without logs and uses both the MSD and DF abstractions. Furthermore, this work proposes an approach to compute the MSD abstraction in-database, thus avoiding the need to transform and move event data. We evaluate IMw with real-life logs, and the experimental results show that IMw with in-database abstraction is faster than the traditional approach, is aware of dynamic updates to the event data, and discovers models with Pareto-optimal results compared to existing techniques.
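The abstract names the DF and MSD abstractions without defining them. As a rough illustration only, the sketch below computes both in main memory from a toy set of traces. It assumes the common reading that the DF abstraction counts how often one activity is immediately followed by another, and that the minimum self distance of an activity is the smallest number of events between two of its occurrences in any trace. The function names `directly_follows` and `minimum_self_distances` and the example traces are hypothetical; this is not the in-database computation or the IMw algorithm proposed in the paper.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    df = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    return df


def minimum_self_distances(traces):
    """Map each repeated activity to the fewest events between two of its occurrences.

    Assumed definition: the distance counts the events strictly between the
    two occurrences. Activities that never repeat within a trace are omitted.
    """
    msd = {}
    for trace in traces:
        last_seen = {}  # activity -> index of its most recent occurrence
        for i, activity in enumerate(trace):
            if activity in last_seen:
                gap = i - last_seen[activity] - 1
                if activity not in msd or gap < msd[activity]:
                    msd[activity] = gap
            last_seen[activity] = i
    return msd


# Hypothetical toy log: each trace is one process execution.
traces = [["a", "b", "c", "a"], ["a", "c", "a", "b"]]
df = directly_follows(traces)
msd = minimum_self_distances(traces)
```

Only consecutive occurrences of an activity need to be compared, since any larger span between two occurrences contains a shorter one. In this toy log only `a` repeats, so `b` and `c` have no entry in `msd`.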