Query languages are essential for exploring and working with data, and for directly answering questions about it. SQL is the prime example for answering questions on relational data. Behavioral data is recorded in the form of events with timestamps. Techniques such as Process Mining use this data in the form of event logs to aggregate it and derive insights into the data as a whole. However, no generic techniques exist to “drill down” into the data to identify specific subsets or features. So far, each analysis tool has had to re-implement filtering, aggregation, and selection functionality on its own home-brewed data model.
In prior research, we developed a standardized event data model to store a large variety of datasets in graph databases such as Neo4j. While Neo4j’s query language Cypher enables querying of such event data in a graph database, its query constructs are not a “natural” fit for querying behavioral properties. As a result, many queries are long and complex, and small variations in a query can lead to drastic differences in running time.
The objective of this project is to survey existing literature for typical patterns in behavioral querying on event data and on behavioral models, to identify how these patterns can be realized efficiently in Cypher, to provide query templates that make query writing easier for the user, and to evaluate query performance in benchmarks of varying load and complexity.
The project is co-supervised by George Fletcher of the Database group.
Contact: Dirk Fahland <d.fahland@tue.nl>