Tax, N., van Zelst, S.J. & Teinemaa, I. (2018). An experimental evaluation of the generalizing capabilities of process discovery techniques and black-box sequence models. In Palash Bera, Jens Gulden, Iris Reinhartz-Berger, Wided Guédria, Sérgio Guerreiro & Rainer Schmidt (Eds.), Enterprise, Business-Process and Information Systems Modeling (pp. 165-180). (Lecture Notes in Business Information Processing). Dordrecht: Springer Netherlands.
A plethora of automated process discovery techniques have been developed that aim to discover a process model from event data originating from the execution of business processes. The discovered process models are meant to describe the control flow of the underlying business process. At the same time, a variety of sequence modeling techniques have been developed in the machine learning domain that aim to find an accurate, though not necessarily interpretable, model of sequence data. Both approaches ultimately aim to find a model that generalizes the observed behavior, i.e., a model that describes behavior likely to be part of the underlying distribution while disallowing unlikely behavior. While the generalizing capabilities of process discovery algorithms have been studied before, they have not yet been compared with those of sequence models. In this paper we present an experimental evaluation of the generalizing capabilities of automated process discovery techniques and black-box sequence models, based on next-activity prediction. We compare a range of process discovery and sequence modeling techniques on several real-life datasets from the business process management domain. Our results indicate that LSTM neural networks describe previously unseen traces (i.e., test traces) more accurately than existing process discovery methods.
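The evaluation idea described above scores a model by how often it correctly predicts the next activity at each position of held-out traces. The sketch below illustrates only that scoring idea with a deliberately simple stand-in. It is not one of the techniques compared in the paper, because it uses neither a process discovery algorithm nor an LSTM. It assumes a first-order (bigram) predictor, hypothetical `<start>`/`<end>` trace markers, and plain accuracy over all positions of every test trace. All function names are hypothetical.

```python
from collections import Counter, defaultdict

START, END = "<start>", "<end>"  # hypothetical trace boundary markers


def fit_bigram(train_traces):
    """Count which activity follows each activity (first-order Markov assumption)."""
    counts = defaultdict(Counter)
    for trace in train_traces:
        padded = [START] + list(trace) + [END]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never seen."""
    if prev not in counts:
        return None
    # Ties are broken by first occurrence (Counter.most_common ordering).
    return counts[prev].most_common(1)[0][0]


def next_activity_accuracy(counts, test_traces):
    """Fraction of test positions where the predicted next activity is correct."""
    correct = total = 0
    for trace in test_traces:
        padded = [START] + list(trace) + [END]
        for prev, nxt in zip(padded, padded[1:]):
            total += 1
            correct += int(predict_next(counts, prev) == nxt)
    return correct / total if total else 0.0


if __name__ == "__main__":
    train = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
    test = [["a", "b", "c"], ["a", "d"]]
    model = fit_bigram(train)
    print(next_activity_accuracy(model, test))
```

On this toy data the predictor misses only the step `a -> d` in the second test trace, because it always predicts `b` after `a`. A model that generalizes better to unseen traces would score higher on such held-out positions, which is the property the evaluation is designed to measure.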