Efficient Time and Space Representation of Uncertain Event Data
<p>The behavior graph of the trace in <a href="#algorithms-13-00285-t001" class="html-table">Table 1</a>. Every node represents an event; the labels in the nodes represent the activity, or set of activities, associated with the event. The arcs represent the partial order relationship between events as defined by their timestamps. The indeterminate event, which might not have occurred, is represented by a dashed node.</p> "> Figure 2
<p>A normative model for the healthcare process case in the running example. The initial marking is displayed; the gray “token slot” represents the final marking.</p> "> Figure 3
<p>The behavior net representing the behavior of the uncertain trace in <a href="#algorithms-13-00285-t001" class="html-table">Table 1</a> and obtained thanks to its behavior graph. The initial marking is displayed; the gray “token slot” represents the final marking. This artifact is necessary to perform conformance checking between uncertain traces and a reference model.</p> "> Figure 4
<p>The <span class="html-italic">Uncertain Directly-Follows Graph</span> (UDFG) computed based on the uncertain event log <math display="inline"><semantics> <msup> <mrow> <mo>〈</mo> <mi>a</mi> <mo>,</mo> <mi>b</mi> <mo>,</mo> <mi>e</mi> <mo>,</mo> <mi>f</mi> <mo>,</mo> <mi>g</mi> <mo>,</mo> <mi>h</mi> <mo>〉</mo> </mrow> <mn>80</mn> </msup> </semantics></math>, <math display="inline"><semantics> <msup> <mrow> <mo>〈</mo> <mi>a</mi> <mo>,</mo> <mrow> <mo>{</mo> <mi>b</mi> <mo>,</mo> <mi>c</mi> <mo>}</mo> </mrow> <mo>,</mo> <mrow> <mo>[</mo> <mi>e</mi> <mo>,</mo> <mi>f</mi> <mo>]</mo> </mrow> <mo>,</mo> <mi>g</mi> <mo>,</mo> <mi>i</mi> <mo>〉</mo> </mrow> <mn>15</mn> </msup> </semantics></math>, <math display="inline"><semantics> <msup> <mrow> <mo>〈</mo> <mi>a</mi> <mo>,</mo> <mrow> <mo>{</mo> <mi>b</mi> <mo>,</mo> <mi>c</mi> <mo>,</mo> <mi>d</mi> <mo>}</mo> </mrow> <mo>,</mo> <mrow> <mo>[</mo> <mi>e</mi> <mo>,</mo> <mi>f</mi> <mo>]</mo> </mrow> <mo>,</mo> <mi>g</mi> <mo>,</mo> <mover> <mi>j</mi> <mo>¯</mo> </mover> <mo>〉</mo> </mrow> <mn>5</mn> </msup> </semantics></math>. The arcs are labeled with the minimum and maximum number of directly-follows relationship observable between activities in the corresponding trace. Notice the large amount of connections extracted from a single and rather short trace. Uncertain directly-follows relationships are inferred from the behavior graphs of the traces in the log. The construction of this object is necessary to perform automatic process discovery over uncertain event data.</p> "> Figure 5
<p>Three different process models for the uncertain event log <math display="inline"><semantics> <msup> <mrow> <mo>〈</mo> <mi>a</mi> <mo>,</mo> <mi>b</mi> <mo>,</mo> <mi>e</mi> <mo>,</mo> <mi>f</mi> <mo>,</mo> <mi>g</mi> <mo>,</mo> <mi>h</mi> <mo>〉</mo> </mrow> <mn>80</mn> </msup> </semantics></math>, <math display="inline"><semantics> <msup> <mrow> <mo>〈</mo> <mi>a</mi> <mo>,</mo> <mrow> <mo>{</mo> <mi>b</mi> <mo>,</mo> <mi>c</mi> <mo>}</mo> </mrow> <mo>,</mo> <mrow> <mo>[</mo> <mi>e</mi> <mo>,</mo> <mi>f</mi> <mo>]</mo> </mrow> <mo>,</mo> <mi>g</mi> <mo>,</mo> <mi>i</mi> <mo>〉</mo> </mrow> <mn>15</mn> </msup> </semantics></math>, <math display="inline"><semantics> <msup> <mrow> <mo>〈</mo> <mi>a</mi> <mo>,</mo> <mrow> <mo>{</mo> <mi>b</mi> <mo>,</mo> <mi>c</mi> <mo>,</mo> <mi>d</mi> <mo>}</mo> </mrow> <mo>,</mo> <mrow> <mo>[</mo> <mi>e</mi> <mo>,</mo> <mi>f</mi> <mo>]</mo> </mrow> <mo>,</mo> <mi>g</mi> <mo>,</mo> <mover> <mi>j</mi> <mo>¯</mo> </mover> <mo>〉</mo> </mrow> <mn>5</mn> </msup> </semantics></math> obtained through inductive mining over an uncertain directly-follows graph. The different filtering parameters for the UDFG yield models with distinct features.</p> "> Figure 5 Cont.
<p>Three different process models for the uncertain event log <math display="inline"><semantics> <msup> <mrow> <mo>〈</mo> <mi>a</mi> <mo>,</mo> <mi>b</mi> <mo>,</mo> <mi>e</mi> <mo>,</mo> <mi>f</mi> <mo>,</mo> <mi>g</mi> <mo>,</mo> <mi>h</mi> <mo>〉</mo> </mrow> <mn>80</mn> </msup> </semantics></math>, <math display="inline"><semantics> <msup> <mrow> <mo>〈</mo> <mi>a</mi> <mo>,</mo> <mrow> <mo>{</mo> <mi>b</mi> <mo>,</mo> <mi>c</mi> <mo>}</mo> </mrow> <mo>,</mo> <mrow> <mo>[</mo> <mi>e</mi> <mo>,</mo> <mi>f</mi> <mo>]</mo> </mrow> <mo>,</mo> <mi>g</mi> <mo>,</mo> <mi>i</mi> <mo>〉</mo> </mrow> <mn>15</mn> </msup> </semantics></math>, <math display="inline"><semantics> <msup> <mrow> <mo>〈</mo> <mi>a</mi> <mo>,</mo> <mrow> <mo>{</mo> <mi>b</mi> <mo>,</mo> <mi>c</mi> <mo>,</mo> <mi>d</mi> <mo>}</mo> </mrow> <mo>,</mo> <mrow> <mo>[</mo> <mi>e</mi> <mo>,</mo> <mi>f</mi> <mo>]</mo> </mrow> <mo>,</mo> <mi>g</mi> <mo>,</mo> <mover> <mi>j</mi> <mo>¯</mo> </mover> <mo>〉</mo> </mrow> <mn>5</mn> </msup> </semantics></math> obtained through inductive mining over an uncertain directly-follows graph. The different filtering parameters for the UDFG yield models with distinct features.</p> "> Figure 6
<p>The behavior graph of the trace in <a href="#algorithms-13-00285-t001" class="html-table">Table 1</a> before applying the transitive reduction. All the nodes in the graph are pairwise connected based on precedence relationships; pairs of nodes for which the order is unknown are not connected.</p> "> Figure 7
<p>The same behavior graph after the transitive reduction. The arc between <math display="inline"><semantics> <msub> <mi>e</mi> <mn>1</mn> </msub> </semantics></math> and <math display="inline"><semantics> <msub> <mi>e</mi> <mn>4</mn> </msub> </semantics></math> is removed, since they are reachable through <math display="inline"><semantics> <msub> <mi>e</mi> <mn>2</mn> </msub> </semantics></math>. This graph has a minimal number of arcs while conserving the same reachability relationship between nodes.</p> "> Figure 8
<p>A Gantt diagram visualizing the time perspective of the events in <a href="#algorithms-13-00285-t004" class="html-table">Table 4</a>. The horizontal blue bars represent the interval of possible timestamps of uncertain events: such interval is ample for the event with activity label “c”, which has an uncertain timestamp, and is narrow to indicate a precise point in time for the other events. This diagram can show the order relationship between events in a trace, as well as the dimensions of their interval of possible timestamps in scale.</p> "> Figure 9
<p>The behavior graph of the trace in <a href="#algorithms-13-00285-t004" class="html-table">Table 4</a>.</p> "> Figure 10
<p>Construction of the class of behavior graphs isomorphic to a complete bipartite graph and an instantiated example. For any <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>=</mo> <mn>2</mn> <mi>k</mi> </mrow> </semantics></math>, it is possible to have a behavior graph isomorphic to the graph <math display="inline"><semantics> <msub> <mi>K</mi> <mrow> <mi>k</mi> <mo>,</mo> <mi>k</mi> </mrow> </msub> </semantics></math>, which thus has a number of edges quadratic in the number of vertices.</p> "> Figure 11
<p>Time in seconds for the creation of the behavior graphs for synthetic logs with traces of length <math display="inline"><semantics> <mrow> <mi>l</mi> <mo>=</mo> <mn>20</mn> </mrow> </semantics></math> events and <math display="inline"><semantics> <mrow> <mi>p</mi> <mo>=</mo> <mn>0.5</mn> </mrow> </semantics></math> of uncertain events, with increasing number of traces <span class="html-italic">n</span>. The solid blue line indicates the time needed for the naïve construction; the dashed red line shows the building time of the improved algorithm, and is labeled with the relative time variation (in percentage).</p> "> Figure 12
<p>Time in seconds for the creation of the behavior graphs for synthetic logs with <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>=</mo> <mn>100</mn> </mrow> </semantics></math> traces and <math display="inline"><semantics> <mrow> <mi>p</mi> <mo>=</mo> <mn>0.5</mn> </mrow> </semantics></math> of uncertain events, with increasing trace length <span class="html-italic">l</span>.</p> "> Figure 13
<p>Time in seconds for the creation of the behavior graphs for synthetic logs with <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>=</mo> <mn>100</mn> </mrow> </semantics></math> traces of length <math display="inline"><semantics> <mrow> <mi>l</mi> <mo>=</mo> <mn>100</mn> </mrow> </semantics></math> events, with increasing percentages of timestamp uncertainty <span class="html-italic">p</span>.</p> "> Figure 14
<p>Memory consumption in bytes needed to store the behavior graphs for synthetic uncertain event logs with traces of length <math display="inline"><semantics> <mrow> <mi>l</mi> <mo>=</mo> <mn>10</mn> </mrow> </semantics></math> events and timestamp uncertainty of <math display="inline"><semantics> <mrow> <mi>p</mi> <mo>=</mo> <mn>0.5</mn> </mrow> </semantics></math>, with an increasing number of traces <span class="html-italic">n</span>.</p> "> Figure 15
<p>Execution times in seconds for real-life event logs with increasing percentages <span class="html-italic">p</span> of timestamp uncertainty.</p> "> Figure 16
<p>Relative variation in computation time obtained through the improved behavior graph construction when applied to the computation of conformance bounds between a synthetic uncertain log and a Petri net with an increasing number of transitions. The synthetic uncertain logs have <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>=</mo> <mn>500</mn> </mrow> </semantics></math> traces and timestamp uncertainty has been introduced with <math display="inline"><semantics> <mrow> <mi>p</mi> <mo>=</mo> <mn>0.1</mn> </mrow> </semantics></math>.</p> ">
Abstract
:1. Introduction
2. Conformance Checking over Uncertain Data
3. Process Discovery over Uncertain Data
4. Materials and Methods
4.1. Preliminaries
- Irreflexivity: is false.
- Transitivity: and imply .
- Antisymmetry: implies that is false. It is implied by irreflexivity and transitivity [10].
4.2. Efficient Construction of Behavior Graphs
Algorithm 1:TimestampList() |
Algorithm 2:BehaviorGraph(TimestampList()) |
- an identifier, which in the list construction is an integer representing the rank of the uncertain event by minimum timestamp (computed in line 3);
- the activity labels associated with the event ;
- the attribute , which will carry the information regarding indeterminate events;
- the type of timestamp that generated this entry—if it is a minimum or maximum of an interval.
Algorithm 3:ProcessUncertainLog |
5. Asymptotic Complexity
6. Experimental Results
6.1. Performance of Behavior Graph Construction
- Q1: how does the computation time of the two methods compare when run on logs with an increasing number of traces?
- Q2: how does the computation time of the two methods compare when run on logs with increasing trace lengths?
- Q3: how does the computation time of the two methods compare when run on logs with increasing percentages of events with uncertain timestamps?
- Q4: what degree of reduction in memory consumption for the representation of an uncertain log can we attain with the novel method?
- Q5: do the answers obtained for Q3 hold when simulating uncertainty on real-life event data?
6.2. Applications of the Behavior Graph Construction
7. Related Work
8. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Van der Aalst, W.M.P. Process Mining: Data Science in Action; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
- Pegoraro, M.; van der Aalst, W.M.P. Mining uncertain event data in process mining. In Proceedings of the 2019 International Conference on Process Mining (ICPM), Aachen, Germany, 24–26 June 2019; pp. 89–96. [Google Scholar]
- Van Der Aalst, W.; Adriansyah, A.; De Medeiros, A.K.A.; Arcieri, F.; Baier, T.; Blickle, T.; Bose, J.C.; Van Den Brand, P.; Brandtjen, R.; Buijs, J.; et al. Process mining manifesto. In International Conference on Business Process Management; Springer: Berlin/Heidelberg, Germany, 2011; pp. 169–194. [Google Scholar]
- Kurniati, A.P.; Rojas, E.; Hogg, D.; Hall, G.; Johnson, O.A. The assessment of data quality issues for process mining in healthcare using Medical Information Mart for Intensive Care III, a freely available e-health record database. Health Inform. J. 2019, 25, 1878–1893. [Google Scholar] [CrossRef] [PubMed]
- Pegoraro, M.; Uysal, M.S.; van der Aalst, W.M.P. Efficient construction of behavior graphs for uncertain event data. In International Conference on Business Information Systems (BIS); Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
- Berti, A.; van Zelst, S.J.; van der Aalst, W.M.P. Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science. arXiv 2019, arXiv:1905.06169. [Google Scholar]
- Adriansyah, A.; van Dongen, B.F.; van der Aalst, W.M.P. Towards robust conformance checking. In International Conference on Business Process Management; Springer: Berlin/Heidelberg, Germany, 2010; pp. 122–133. [Google Scholar]
- Pegoraro, M.; Uysal, M.S.; van der Aalst, W.M.P. Discovering Process Models from Uncertain Event Data. In International Conference on Business Process Management; Springer: Berlin/Heidelberg, Germany, 2019; pp. 238–249. [Google Scholar]
- Leemans, S.J.J.; Fahland, D.; van der Aalst, W.M.P. Discovering block-structured process models from event logs-a constructive approach. In International Conference on Applications and Theory of Petri Nets and Concurrency; Springer: Berlin/Heidelberg, Germany, 2013; pp. 311–329. [Google Scholar]
- Flaška, V.; Ježek, J.; Kepka, T.; Kortelainen, J. Transitive closures of binary relations. I. Acta Univ. Carol. Math. Phys. 2007, 48, 55–69. [Google Scholar]
- Kalvin, A.D.; Varol, Y.L. On the generation of all topological sortings. J. Algorithms 1983, 4, 150–162. [Google Scholar] [CrossRef]
- Aho, A.V.; Garey, M.R.; Ullman, J.D. The transitive reduction of a directed graph. SIAM J. Comput. 1972, 1, 131–137. [Google Scholar] [CrossRef] [Green Version]
- Strassen, V. Gaussian elimination is not optimal. Numer. Math. 1969, 13, 354–356. [Google Scholar] [CrossRef]
- Coppersmith, D.; Winograd, S. Matrix multiplication via arithmetic progressions. J. Symb. Comput. 1990, 9, 251–280. [Google Scholar] [CrossRef] [Green Version]
- Stothers, A.J. On the Complexity of Matrix Multiplication. Ph.D. Thesis, University of Edinburgh, Edinburgh, UK, 2010. [Google Scholar]
- Williams, V.V. Multiplying matrices faster than Coppersmith-Winograd. In Proceedings of the ACM Symposium on Theory of Computing (STOC), New York, NY, USA, 19–22 May 2012; Volume 12, pp. 887–898. [Google Scholar]
- Le Gall, F. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th international symposium on symbolic and algebraic computation, Kobe, Japan, 23–25 July 2014; pp. 296–303. [Google Scholar]
- D’Alberto, P.; Nicolau, A. Using recursion to boost ATLAS’s performance. In High-Performance Computing; Springer: Berlin/Heidelberg, Germany, 2005; pp. 142–151. [Google Scholar]
- Le Gall, F. Faster algorithms for rectangular matrix multiplication. In Proceedings of the 53rd Annual Symposium on Foundations of Computer Science, New Brunswick, NJ, USA, 20–23 October 2012; pp. 514–523. [Google Scholar]
- Lee, W.L.J.; Verbeek, H.M.W.; Munoz-Gama, J.; van der Aalst, W.M.P.; Sepúlveda, M. Replay using recomposition: Alignment-based conformance checking in the large. In Proceedings of the BPM Demo Track and BPM Dissertation Award co-located with 15th International Conference on Business Process Management (BPM 2017), Barcelona, Spain, 13 September 2017. [Google Scholar]
- Aggarwal, C.C.; Philip, S.Y. A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng. 2008, 21, 609–623. [Google Scholar] [CrossRef] [Green Version]
- Suciu, D.; Olteanu, D.; Ré, C.; Koch, C. Probabilistic databases. Synth. Lect. Data Manag. 2011, 3, 1–180. [Google Scholar] [CrossRef] [Green Version]
- Chui, C.K.; Kao, B.; Hung, E. Mining frequent itemsets from uncertain data. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Nanjing, China, 22–25 May 2007; pp. 47–58. [Google Scholar]
- Al-Mutawa, H.A.; Dietrich, J.; Marsland, S.; McCartin, C. On the shape of circular dependencies in Java programs. In Proceedings of the 2014 23rd Australian Software Engineering Conference, Milsons Point, Australia, 7–10 April 2014; pp. 48–57. [Google Scholar]
- Bayes, T. LII. An Essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F.R.S. communicated by Mr. Price, in a letter to John Canton, A.M.F.R.S. Philos. Trans. R. Soc. Lond. 1763, 53, 370–418. [Google Scholar]
- Mariappan, M.; Vora, K. GraphBolt: Dependency-Driven Synchronous Processing of Streaming Graphs. In Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, 25–28 March 2019; p. 25. [Google Scholar]
- Aho, A.; Lam, M.; Sethi, R.; Ullman, J.; Cooper, K.; Torczon, L.; Muchnick, S. Compilers: Principles, Techniques and Tools; Addison Wesley: Boston, MA, USA, 2007. [Google Scholar]
- Mokhov, A.; Carmona, J.; Beaumont, J. Mining conditional partial order graphs from event logs. In Transactions on Petri Nets and Other Models of Concurrency XI; Springer: Berlin/Heidelberg, Germany, 2016; pp. 114–136. [Google Scholar]
- Lu, X.; Fahland, D.; van der Aalst, W.M.P. Conformance checking based on partially ordered event data. In International Conference on Business Process Management; Springer: Berlin/Heidelberg, Germany, 2014; pp. 75–88. [Google Scholar]
- Lu, X.; Mans, R.S.; Fahland, D.; van der Aalst, W.M.P. Conformance checking in healthcare based on partially ordered event data. In Proceedings of the 2014 IEEE Emerging Technology and Factory Automation (ETFA), Barcelona, Spain, 16–19 September 2014; pp. 1–8. [Google Scholar]
- Genga, L.; Alizadeh, M.; Potena, D.; Diamantini, C.; Zannone, N. Discovering anomalous frequent patterns from partially ordered event logs. J. Intell. Inf. Syst. 2018, 51, 257–300. [Google Scholar] [CrossRef] [Green Version]
- Van der Aa, H.; Leopold, H.; Weidlich, M. Partial order resolution of event logs for process conformance checking. Decis. Support Syst. 2020, 136, 113347. [Google Scholar] [CrossRef]
- Leemans, S.J.J.; Polyvyanyy, A. Stochastic-Aware Conformance Checking: An Entropy-Based Approach. In International Conference on Advanced Information Systems Engineering; Springer: Berlin/Heidelberg, Germany, 2020; pp. 217–233. [Google Scholar]
- Rogge-Solti, A.; van der Aalst, W.M.P.; Weske, M. Discovering stochastic petri nets with arbitrary delay distributions from event logs. In International Conference on Business Process Management; Springer: Berlin/Heidelberg, Germany, 2013; pp. 15–27. [Google Scholar]
Case ID | Event ID | Timestamp | Activity | Indet. Event |
---|---|---|---|---|
ID327 | 5 | NightSweats | ? | |
ID327 | 8 | {PrTP, SecTP} | ! | |
ID327 | [4, 10] | Splenomeg | ! | |
ID327 | 12 | Adm | ! |
≫ | NightSweats | Splenomeg | ≫ | PrTP | Adm |
---|---|---|---|---|---|
NightSweats | Splenomeg | PrTP | Adm | ||
≫ | SecTP | ≫ | Splenomeg | ≫ | ≫ | Adm |
---|---|---|---|---|---|---|
≫ | NightSweats | Splenomeg | PrTP | Adm | ||
Case ID | Event ID | Activity | Timestamp | Event Type |
---|---|---|---|---|
872 | a | 5 December 2011 | ! | |
872 | b | [6 December 2011, 10 December 2011] | ! | |
872 | c | 7 December 2011 | ! | |
872 | d | [8 December 2011, 11 December 2011] | ! | |
872 | e | 9 December 2011 | ! | |
872 | f | [12 December 2011, 13 December 2011] | ! |
Event | List Entry (Minimum Timestamp) | List Entry (Maximum Timestamp) |
---|---|---|
(5 December 2011, 1, , !, ‘MIN’) | (5 December 2011, 1, , !, ‘MAX’) | |
(6 December 2011, 2, , !, ‘MIN’) | (10 December 2011, 2, , !, ‘MAX’) | |
(7 December 2011, 3, , !, ‘MIN’) | (7 December 2011, 3, , !, ‘MAX’) | |
(8 December 2011, 4, , !, ‘MIN’) | (8 December 2011, 4, , !, ‘MAX’) | |
(9 December 2011, 5, , !, ‘MIN’) | (9 December 2011, 5, , !, ‘MAX’) | |
(12 December 2011, 6, , !, ‘MIN’) | (13 December 2011, 6, , !, ‘MAX’) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Pegoraro, M.; Uysal, M.S.; van der Aalst, W.M.P. Efficient Time and Space Representation of Uncertain Event Data. Algorithms 2020, 13, 285. https://doi.org/10.3390/a13110285
Pegoraro M, Uysal MS, van der Aalst WMP. Efficient Time and Space Representation of Uncertain Event Data. Algorithms. 2020; 13(11):285. https://doi.org/10.3390/a13110285
Chicago/Turabian StylePegoraro, Marco, Merih Seran Uysal, and Wil M. P. van der Aalst. 2020. "Efficient Time and Space Representation of Uncertain Event Data" Algorithms 13, no. 11: 285. https://doi.org/10.3390/a13110285