Spenrath, Y., Hassani, M., van Dongen, B. F., & Tariq, H. Why did my Consumer Shop? Learning an Efficient Distance Metric for Retailer Transaction Data. In Y. Dong, D. Mladenic, & C. Saunders (Eds.), Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track – European Conference, ECML PKDD 2020, Proceedings (pp. 323-338). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12461 LNAI). Springer. https://doi.org/10.1007/978-3-030-67670-4_20
Abstract
Transaction analysis is an important part of studies aiming to understand consumer behaviour. The first step is defining a proper measure of similarity, or more specifically a distance metric, between transactions. Existing distance metrics on transactional data are built on retailer-specific information, such as extensive product hierarchies or a large product catalog. In this paper we propose a new distance metric that is retailer-independent by design, allowing cross-retailer and cross-country analysis. The metric comes with a novel method of finding the importance of categories of products, alternating between unsupervised learning techniques and importance calibration. We test our methodology on a real-world dataset and show that we can identify clusters of consumer behaviour.