Using graph data structures for event logs

Esser, S., & Fahland, D. (2019). Using graph data structures for event logs. https://doi.org/10.5281/zenodo.3333831

Abstract

Process mining, as described by Wil van der Aalst, combines data mining and business process management into a new discipline. The general purpose of process mining is to derive process insights from event data captured by information systems in the form of an event log. Its three main applications are process discovery, process enhancement, and conformance checking. All techniques in these applications rely on event logs at some point. Event logs form the basis of most techniques and are often assembled from multiple sources involved in the process under study. Today, an event log usually consists of a table in which every row represents a single event with an activity name, a case identifier, a timestamp, and optionally event attributes and case attributes. Event logs usually come in .csv or .xes format and range from basic (activity, case id, timestamp) to extensive forms with different attributes for various activities and cases, enriched with workflow life-cycle data. Event logs in table format can be queried with SQL, but queries that are interesting in the process context, like “cases where activity A is followed by activity B”, cannot be formulated directly.
This form of storing data is not considered optimal: it does not satisfy the second normal form of database normalization, and it generally does not allow basic process mining techniques, such as creating a directly-follows graph of activities, to be applied directly while retaining the specific event data. This report explores graph data structures as an alternative format for storing event logs as a basis for process mining.
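
To make the idea concrete, the following is a minimal sketch of how a tabular event log could be turned into a graph in which events are nodes and directly-follows relations are edges. It is not the report's implementation. The toy log, the tuple layout (case id, activity, timestamp), and all function names are assumptions made for illustration, and "followed by" is read here as "directly followed by".

```python
from collections import defaultdict

# Hypothetical toy event log; each row is (case_id, activity, timestamp).
EVENT_LOG = [
    ("c1", "A", 1),
    ("c1", "B", 2),
    ("c1", "C", 3),
    ("c2", "A", 1),
    ("c2", "C", 2),
    ("c2", "B", 3),
]


def build_event_edges(log):
    """Link each event to its direct successor within the same case.

    Events are identified by their row index in `log`, so every edge
    still points back to the full event data (case, activity, timestamp).
    """
    by_case = defaultdict(list)
    for idx, (case_id, _activity, timestamp) in enumerate(log):
        by_case[case_id].append((timestamp, idx))

    edges = []
    for events in by_case.values():
        events.sort()  # order the events of one case by timestamp
        for (_, src), (_, dst) in zip(events, events[1:]):
            edges.append((src, dst))
    return edges


def directly_follows_counts(log, edges):
    """Aggregate the event-level edges into activity-level counts."""
    counts = defaultdict(int)
    for src, dst in edges:
        counts[(log[src][1], log[dst][1])] += 1
    return dict(counts)


def cases_where_directly_followed(log, edges, first, second):
    """Return the cases in which `first` is directly followed by `second`."""
    return sorted(
        {
            log[src][0]
            for src, dst in edges
            if log[src][1] == first and log[dst][1] == second
        }
    )


edges = build_event_edges(EVENT_LOG)
dfg = directly_follows_counts(EVENT_LOG, edges)
matching_cases = cases_where_directly_followed(EVENT_LOG, edges, "A", "B")
```

Once the successor relation is stored explicitly as edges, the process-oriented query becomes a simple filter over edges, and the activity-level directly-follows graph is an aggregation that leaves the underlying event data untouched.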
