An Experiment on Transfer Learning for Suffix Prediction on Event Logs

van Luijken, M., Ketykó, I., & Mannhardt, F. (2024). An Experiment on Transfer Learning for Suffix Prediction on Event Logs. In J. De Weerdt, & L. Pufahl (Eds.), Business Process Management Workshops – BPM 2023 International Workshops, Utrecht, The Netherlands, September 11–15, 2023, Revised Selected Papers (pp. 31-43). (Lecture Notes in Business Information Processing; Vol. 492 LNBIP). Springer. https://doi.org/10.1007/978-3-031-50974-2_3

Abstract

Predicting future activity occurrences for a process instance is a key challenge in predictive process monitoring. Sequential deep learning models have steadily improved prediction accuracy for this suffix prediction task. Training such models with many parameters on large event logs requires expensive hardware and is often time-consuming. Transfer learning addresses this issue by using a pre-trained model as the starting point for training on other data sets, thereby reducing training time or improving accuracy within a given time budget. Transfer learning has been shown to be very effective for natural language processing and image classification. However, research on transfer learning for predictive process monitoring is scarce, and it is missing altogether for suffix prediction. This paper contributes an experimental study on the effectiveness of transfer learning for suffix prediction using two sequential deep learning architectures (GPT and LSTM). Base models are trained on two public event logs and used as the starting point for transfer learning on eight event logs from different domains. The experiments show that, even with half of the available training budget and without using very large event logs for the base model, the results obtained in the transfer learning setting are often better than, and in some cases competitive with, those obtained when training from random initialization. A notable exception is an event log with a very large vocabulary of activity labels. This seems to indicate that transfer learning depends on specific data properties such as vocabulary size, which warrants further research.
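
The idea of starting from a pre-trained base model can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how differing activity vocabularies are handled, so the sketch assumes one common approach, in which layers whose shapes match the base model are copied and vocabulary-dependent layers keep a fresh random initialization. The layer names, shapes, and vocabulary sizes are hypothetical, and plain Python lists stand in for real model weights.

```python
import random

def init_params(shapes, seed):
    """Randomly initialize a parameter dict: layer name -> 2-D list of floats."""
    rng = random.Random(seed)
    return {
        name: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        for name, (rows, cols) in shapes.items()
    }

def shape_of(matrix):
    """Return (rows, cols) of a 2-D list."""
    return (len(matrix), len(matrix[0]))

def transfer_init(base_params, target_shapes, seed=0):
    """Initialize a target model, reusing base weights where shapes match.

    Layers whose shape differs from the base model (here: the layers tied
    to the activity vocabulary) keep their fresh random initialization.
    """
    target = init_params(target_shapes, seed)
    transferred = []
    for name, shape in target_shapes.items():
        if name in base_params and shape_of(base_params[name]) == shape:
            target[name] = [row[:] for row in base_params[name]]  # copy rows
            transferred.append(name)
    return target, transferred

# Hypothetical layer shapes as (rows, cols); vocabulary sizes 10 and 25 are made up.
HIDDEN = 8
base_shapes = {
    "embedding": (10, HIDDEN),
    "recurrent": (HIDDEN, HIDDEN),
    "output": (HIDDEN, 10),
}
target_shapes = {
    "embedding": (25, HIDDEN),
    "recurrent": (HIDDEN, HIDDEN),
    "output": (HIDDEN, 25),
}

# Random weights stand in for a base model already trained on a public event log.
base_params = init_params(base_shapes, seed=1)
target_params, transferred = transfer_init(base_params, target_shapes, seed=2)
print(transferred)
```

In this sketch only the vocabulary-independent layer is carried over. Under that assumption, a target log with a very large activity vocabulary leaves a larger share of the parameters to be learned from scratch, which is one possible reading of the exception noted above.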
