Quantifying the Re-identification Risk in Published Process Models

Maatouk, K., & Mannhardt, F. (2022). Quantifying the Re-identification Risk in Published Process Models. In J. Munoz-Gama, & X. Lu (Eds.), Process Mining Workshops – ICPM 2021 International Workshops, Eindhoven, The Netherlands, October 31 – November 4, 2021, Revised Selected Papers (Vol. 433, pp. 382-394). (Lecture Notes in Business Information Processing (LNBIP)). Springer. https://doi.org/10.1007/978-3-030-98581-3_28


Event logs are the basis of process mining operations such as process discovery, conformance checking, and process optimization. Adversaries may obtain sensitive information by re-identifying the individuals to whom the traces of an event log relate. This re-identification risk depends on the background information the attacker is assumed to have. Multiple techniques have been proposed to quantify the re-identification risk for published event logs. However, in many scenarios there is no need to release the full event log: a discovered process model annotated with frequencies suffices. This raises the question of how to quantify the re-identification risk in published process models. We propose a method based on generating sample traces to quantify this risk for process trees annotated with frequencies. The method was applied to several real-life event logs and to process trees discovered by the Inductive Miner. Our results show that there can still be a significant re-identification risk when publishing a process tree; however, this risk is often lower than that of releasing the original event log.
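
To make the idea of generating sample traces from a frequency-annotated process tree more concrete, the following is a minimal sketch and not the authors' implementation. The tuple-based tree encoding, the function names `sample_trace` and `uniqueness`, the example activities and frequencies, and the choice of "share of sampled traces whose activity sequence occurs exactly once" as the risk indicator are all assumptions made for illustration; the paper's actual sampling procedure and risk measure may differ.

```python
import random
from collections import Counter

# Hypothetical tree encoding (an assumption, not taken from the paper):
#   ("act", name)                       leaf activity
#   ("seq", [children])                 execute the children in order
#   ("xor", [(child, frequency), ...])  execute one child, chosen in
#                                       proportion to its annotated frequency


def sample_trace(node, rng):
    """Draw one trace (a list of activity names) from the tree."""
    kind = node[0]
    if kind == "act":
        return [node[1]]
    if kind == "seq":
        trace = []
        for child in node[1]:
            trace.extend(sample_trace(child, rng))
        return trace
    if kind == "xor":
        children = [child for child, _ in node[1]]
        weights = [freq for _, freq in node[1]]
        chosen = rng.choices(children, weights=weights, k=1)[0]
        return sample_trace(chosen, rng)
    raise ValueError(f"unknown operator: {kind!r}")


def uniqueness(traces):
    """Share of traces whose activity sequence occurs exactly once."""
    counts = Counter(tuple(trace) for trace in traces)
    unique = sum(1 for count in counts.values() if count == 1)
    return unique / len(traces)


# Illustrative tree with made-up frequencies on the exclusive choice.
tree = ("seq", [
    ("act", "register"),
    ("xor", [(("act", "approve"), 90), (("act", "reject"), 10)]),
    ("act", "close"),
])

rng = random.Random(0)  # fixed seed so that runs are repeatable
sampled = [sample_trace(tree, rng) for _ in range(100)]
risk = uniqueness(sampled)
```

In this sketch a higher `uniqueness` value would mean that more sampled traces are one of a kind, which is one simple proxy for how easily an individual case could be singled out. Loops, parallel branches, and attacker background knowledge are deliberately left out.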
