A novel trace-based sampling method for conformance checking

It is crucial for organizations to ensure that their business processes are executed accurately and comply with internal policies and requirements. Process mining is a discipline of data science that exploits business process execution data to analyze and improve business processes. It provides a da...

Bibliographic Details
Main Authors: Heidy M. Marin-Castro, Miguel Morales-Sandoval, José Luis González-Compean, Julio Hernandez
Format: Article
Language: English
Published: PeerJ Inc. 2024-12-01
Series: PeerJ Computer Science
Subjects: Process mining, Conformance checking, Trace sampling, Event log, Dispersion level
Online Access: https://peerj.com/articles/cs-2601.pdf
_version_ 1846114248312225792
author Heidy M. Marin-Castro
Miguel Morales-Sandoval
José Luis González-Compean
Julio Hernandez
collection DOAJ
description It is crucial for organizations to ensure that their business processes are executed accurately and comply with internal policies and requirements. Process mining is a discipline of data science that exploits business process execution data to analyze and improve business processes. It provides a data-driven approach to understanding how processes actually work in practice. Conformance checking is one of the three most relevant process mining tasks. It consists of determining the degree of correspondence or deviation between the expected (or modeled) behavior of a process and the real behavior observed in the historical events recorded in an event log during the execution of each process instance. In a big data scenario, traditional conformance checking methods struggle to analyze the instances, or traces, in large event logs, which increases the associated computational cost. In this article, we study and address the conformance checking task with a trace selection approach that uses a representative sample of the event log and thus reduces processing time and computational cost without losing confidence in the obtained conformance value. As main contributions, we present a novel conformance checking method that (i) takes into account the data dispersion in the event log using a statistical measure, (ii) determines the size of the representative sample of the event log for the conformance checking task, and (iii) establishes trace selection criteria based on the dispersion level. The method was validated and evaluated using fitness, precision, generalization, and processing time metrics in experiments on three real event logs from the health domain and two synthetic event logs. The experimental results showed the effectiveness of our method in coping with the problem of checking conformance between a process model and its corresponding large event log.
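The description names three steps (a dispersion statistic, a sample size, and dispersion-based trace selection) but gives no formulas, so the following is only an illustrative sketch and not the authors' method. It assumes the coefficient of variation of trace-variant frequencies as the dispersion statistic, Cochran's formula with finite population correction for the sample size, and a threshold of 1.0 to switch selection strategies. All function names and parameters are hypothetical.

```python
import math
import random
from collections import defaultdict
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    mu = mean(values)
    return pstdev(values) / mu if mu else 0.0


def cochran_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size with finite population correction (assumed choice)."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def sample_traces(log, cv_threshold=1.0, seed=0):
    """Select case ids from `log`, a dict mapping case id -> list of activities.

    Returns (chosen_case_ids, dispersion). The threshold is an assumption.
    """
    rng = random.Random(seed)

    # Group cases by trace variant (the exact sequence of activities).
    variants = defaultdict(list)
    for case_id, trace in log.items():
        variants[tuple(trace)].append(case_id)

    cv = coefficient_of_variation([len(ids) for ids in variants.values()])
    n = cochran_sample_size(len(log))

    if cv < cv_threshold:
        # Low dispersion: variants are similarly frequent, so a simple
        # random sample of cases is used.
        return rng.sample(sorted(log), n), cv

    # High dispersion: a few variants dominate. Take one case per variant
    # first (most frequent variants first), then fill the rest at random.
    chosen = []
    for ids in sorted(variants.values(), key=len, reverse=True):
        if len(chosen) == n:
            break
        chosen.append(rng.choice(ids))
    remaining = sorted(set(log) - set(chosen))
    chosen += rng.sample(remaining, n - len(chosen))
    return chosen, cv
```

With these default parameters, a log of 10,000 traces yields a sample of 370 cases. The selected case ids could then be passed to any conformance checking routine in place of the full log.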
format Article
id doaj-art-86f2b3cdecdd403aacaf2fe9069b96e5
institution Kabale University
issn 2376-5992
language English
publishDate 2024-12-01
publisher PeerJ Inc.
record_format Article
series PeerJ Computer Science
spelling doaj-art-86f2b3cdecdd403aacaf2fe9069b96e5
2024-12-20T15:05:12Z
eng
PeerJ Inc.
PeerJ Computer Science
2376-5992
2024-12-01
Volume 10, e2601
DOI: 10.7717/peerj-cs.2601
A novel trace-based sampling method for conformance checking
Heidy M. Marin-Castro (0)
Miguel Morales-Sandoval (1)
José Luis González-Compean (2)
Julio Hernandez (3)
(0) Universidad de las Américas, Cholula, Puebla, Mexico
(1) Computer Science, Instituto Nacional de Astrofísica, Óptica y Electrónica, Tonantzintla, Puebla, Mexico
(2) Cinvestav Tamaulipas, Ciudad Victoria, Tamaulipas, Mexico
(3) Universidad de las Américas, Cholula, Puebla, Mexico
https://peerj.com/articles/cs-2601.pdf
Process mining; Conformance checking; Trace sampling; Event log; Dispersion level
title A novel trace-based sampling method for conformance checking
topic Process mining
Conformance checking
Trace sampling
Event log
Dispersion level
url https://peerj.com/articles/cs-2601.pdf