Quickening Data-Aware Conformance Checking through Temporal Algebras †
<p>Table of Contents.</p> "> Figure 2
<p>KnoBAB Architecture for Breast Cancer patients. Each trace ➀–➂ represents one single patient’s clinical history, represented with unique colouring, while each Declare clause Ⓐ–Ⓒ prescribes a temporal condition that such traces shall satisfy. Please observe that the atomisation process does not consider data distribution but rather partitions the data space as described by the data activation and target conditions. In the query plan, green arrows indicate access to shared sub-queries as in [<xref ref-type="bibr" rid="B9-information-14-00173">9</xref>], and thick red ellipses indicate which operators are untimed.</p> "> Figure 3
<p>We can express a cyber-security scenario by considering (<bold>a</bold>) possible situations in a Cyber Kill Chain, than are then (<bold>b</bold>) represented in the activity labels’ names associated to the events.</p> "> Figure 4
<p>Two exemplifying clauses distinguishing <monospace>Response</monospace> and <monospace>Precedence</monospace> behaviours. Traces are represented as temporally ordered events associated with activity labels (boxed). Activation (or target) conditions are circled here (or ticked/crossed). Ticks (or crosses) indicate a (un)successful match of a target condition. For all activations, there must be an un-failing target condition; for precedence, we shall consider at most one activation. These conditions require the usage of multiple join tests per trace.</p> "> Figure 5
<p>In-depth representation of the query plan associated with the model described in Example 15.</p> "> Figure 6
<p>Assessing a high-level use case of an intrusion attack on a software system through a declarative model.</p> "> Figure 7
<p>Results for the fast set operations <xref ref-type="sec" rid="sec6dot1-information-14-00173">Section 6.1</xref> against the traditional logical implementation.</p> "> Figure 8
<p>Results for the custom declarative clause implementations <xref ref-type="sec" rid="sec6dot2-information-14-00173">Section 6.2</xref> against the traditional logical implementation.</p> "> Figure 9
<p>Results for the <sc><bold>Until</bold></sc> operator (<xref ref-type="sec" rid="sec6dot3-information-14-00173">Section 6.3</xref>).</p> "> Figure 10
<p>Results for the derived operators <sc><bold>TimedAndFuture</bold></sc> and <sc><bold>TimedAndGlobally</bold></sc> <xref ref-type="sec" rid="sec6dot4-information-14-00173">Section 6.4</xref>. We include both variants of the fast implementations to analyse the environments where each thrive.</p> "> Figure 11
<p>Results for relational temporal mining <xref ref-type="sec" rid="sec7dot2-information-14-00173">Section 7.2</xref>.</p> "> Figure 12
<p>Results for parallelisation <xref ref-type="sec" rid="sec7dot3-information-14-00173">Section 7.3</xref>. <inline-formula><mml:math id="mm1144"><mml:semantics><mml:mi>ω</mml:mi></mml:semantics></mml:math></inline-formula> indicates the set of threads in the thread pool, and the red dashed horizontal lines indicate running times for single threaded instances.</p> "> Figure 13
<p>Running times over different models (<xref ref-type="app" rid="app1-information-14-00173">Table S1a</xref>) for different atomisation strategies.</p> "> Figure 14
<p>Running times for data-aware conformance checking.</p> ">
Abstract
:1. Introduction
- As an extension from our previous work, we fully formalise the logical data model (Section 3.1) and characterise the physical one (Section 4) in order to faithfully represent our log. This will prelude the full formalisation of the xtLTLf algebra;
- Contextually, we also show for the first time that the xtLTLf algebra (Section 3.2) can not only express declarative languages such as Declare [7] as in our previous work but can express the semantics of LTLf formula by returning any non-empty finite trace satisfying the latter if loaded in our relational representation (see Appendix A.2). We also show for the first time a formalisation for data correlation conditions associated with binary temporal operators;
- Differently from our previous work, where we just hinted at the implementation of each operator with some high level, we now propose different possible algorithms for some xtLTLf operators (Section 6), and we then discuss both theoretically (Supplement II.2) and empirically (Section 7.1) which might be preferred under different trace length or log size conditions. This leads to the definition of hybrid algorithms [8];
- Our benchmarks demonstrate that our implementation outperforms conformance checking techniques running on both relational databases (Section 7.2) and on tailored solutions (Section 7.5) when customary algorithms are chosen for implementing xtLTLf operators;
- Finally, this paper considerably extends the experimental section from our previous work. First, we show (Section 7.3) how the query plan’s execution might be parallelised, thus further improving with super-linear speed-up our previous running time results. Then, we also discuss (Section 7.4) how different data accessing strategies achievable through query rewriting might affect the query’s running time.
1.1. Case Studies
1.1.1. Cyber Security
- Reconnaissance (rec): An attacker observes the situation from the outside in order to identify targets and tactics. As the attacker mainly collects information regarding the system’s vulnerabilities, this is the hardest part to detect.
- Weaponisation (weap): After gathering the information, the cybercriminal implements his strategy through a software artefact. This detection will have greater chances of success in the future after post-mortem analysis, when either a temporal model is mined over the collected attack data or the strategy is directly inferred from available artefacts (e.g., malaware).
- Payload or Delivery (del): The cybercriminal devises a way to infiltrate the host system that hides the previously produced artefact (e.g., a Trojan). This must sound as harmless as possible to fool the system.
- Exploitation (expl): The cybercriminal exploits the system’s vulnerabilities and infiltrates it through the previous “cover”. At this stage, the defensive system should raise the alarm if any kind of unusual behaviour is detected while increasing the security level.
- Installation (inst): The weapon escapes the payload and gets installed into the host computer system. At this point, any kind of suspected behaviour might be detected by malicious system calls.
- Command & Control (comm): The weapon establishes a communication with the cybercriminal for receiving orders from the attacker. The system should detect any kind of suspicious network communication and should attempt to break the communication channel.
- Action (act): The intruder starts the attack on the system. At this stage, the attack should be more evident, and the Industrial IoT Shields (iiot_sh), such as network devices protection, should be activated.
1.1.2. Industry 4.0
1.1.3. Healthcare
2. Preliminaries
- An event satisfies the activity label iff. its activity labels is : ;
- An event satisfies the negated formula iff. the same event does not satisfy the non-negated formula: ;
- An event satisfies the disjunction of LTLf sub-formulæ iff. the event satisfies one of the two sub-formulæ: ;
- An event satisfies the conjunction of LTLf formulæ iff. the event satisfies all of the sub-formulæ: ;
- An event satisfies a formula at the next step iff. the formula is satisfied in the incoming event if present: ;
- An event globally satisfies a formula iff. the formula is satisfied in all the following events, including the current one: ;
- An event eventually satisfies a formula iff. the formula is satisfied in either the present or in any future event: ;
- An event satisfies until holds iff. holds at least until becomes true, which must hold at the current or a future position:
- The clause provides no target condition and it exists at least one activating event;
- The clause provides a target condition but no binary (payload) predicate , and the declarative clause establishes a temporal correlation between (at least one) activating event and (at least one) targeting one;
- The clause provides both a target condition and a binary predicate , while the activating and targeting events satisfying the temporal correlation as in the previous case also satisfy a binary predicate over their payloads; in this situation, we state that the activating and targeting event match as they jointly satisfy the correlation condition .
Algorithm 1 Atomisation: -encoding pipeline. |
|
- 1.
- 2.
- 3.
- 4.
3. Logical Model
3.1. Model Definition
- ;
Algorithm 2 Populating the Knowledge Base (Section 4.2) |
|
3.2. eXTended LTLf Algebra (xt LTLf)
3.2.1. Base Operators
3.2.2. Unary Operators
3.2.3. Binary Operators
3.2.4. Derived Operators
4. Physical Database Design
4.1. Primary Memory Data Structures
4.2. Populating the Database
4.2.1. Bulk Insertion
4.2.2. Loading and Indexing
5. Query Processing and Optimisation
5.1. Query Compiler
5.1.1. Atomisation Pipeline
Algorithm 3 Atomisation Pipeline (Section 5.1.1) |
|
5.1.2. Query Optimiser
Algorithm 4 Query Optimiser |
|
5.1.3. Enabling Intraquery Parallelism
Algorithm 5 Query Scheduler (Section 5.1.3) |
|
5.2. Execution Engine
Algorithm 6 Execution Engine (Section 5.2) |
|
5.2.1. Basic Operators’ Execution
5.2.2. Results Propagation
5.2.3. Conjunctive and Aggregation Queries
6. Algorithmic Implementations
6.1. Timed and Untimed Or/And
Algorithm 7 xtLTLf pseudocode implementation for And and Or operators |
|
6.2. Choice and Untimed Or
Algorithm 8 xtLTLf pseudocode implementation for Future and Globally |
|
6.3. Untimed Until(s)
Algorithm 9 Two implementations for the untimed xtLTLf Until. |
|
6.4. Derived Operators
7. Results and Discussion
7.1. Comparing Different Operators’ Algorithms
Algorithm 10 Hybrid Algorithms |
|
7.2. Relational Temporal Mining
7.3. Query Plan Parallelisation
- BLOCKED STATIC: aims to balance the chunk sizes per thread by distributing any leftover iterations;
- BLOCK-CYCLIC STATIC. Does not utilise balancing as the former. Instead, work blocks are cyclically allocated over the threads;
- GUIDED DYNAMIC: aims to distribute large chunks when there is a lot of work still to be completed; tasks are split into smaller chunks as the work load diminishes;
- MONOTONIC DYNAMIC: uses a single centralised counter that is incremented when a thread performs an iteration of work. The schedule issues iterations to threads in an increasing manner.
- TASK SIZE PREDICTION BLOCK STATIC provides an estimation of work required per chunk. Then, these chunks are sorted in ascending work load, with the last providing the greatest amount of computation. Threads are then assigned chunks through a distribution algorithm, distributing the first and last chunk of the sorted work to the first thread, the second and penultimate to the second, etc.. The algorithm aims to distribute equal amounts of work to each thread, though assumes that the workload is strictly increasing while workload sizes are evenly distributed;
- TASK SIZE PREDICTION UNBALANCED DYNAMIC: unlike the former, we assume that the incoming work is not balanced. Instead, a chunk is taken, its work size estimated and assigned to a thread. Then, the next thread will recursively receive chunks until the summed work load is approximate to that of the former. The next thread is then pulled from the pool and the process repeated until all chunks are assigned.
7.4. -Encoding Atomisation Strategies
7.5. Data-Aware Conformance Checking
8. Conclusions
Supplementary Materials
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Sample Availability
Abbreviations
DAG | Direct Acyclic Graph |
KnoBAB | KNOwledge Base for Alignments and Business process modelling |
LTLf | Linear Temporal Logic over finite traces |
RDBMS | Relational Database Management System |
XES | eXtensible Event Stream |
xtLTLf | eXTended Linear Temporal Logic over finite traces |
Appendix A
Appendix A.1
Appendix A.2
- :
- By applying the aforementioned rewriting lemma (from now on simply referred to as by construction of ActivityTable), we can immediately close the goal by choosing as the model will only return data associated with the log of choice:
- :
- If the compound condition is also atomic for which q can be expressed as an interval query for some payload key , we can follow a similar proof from the former case and choose the atom , thus closing the goal as follows:
- :
- by inductive hypothesis, we know the xtLTLf expression returning , which contains when . For this, we choose as , which also guarantees that j never exceeds the trace’s length (). We can therefore expand the definition of our proposed operator by obtaining:
- :
- The application of the induction is similar to the former and, similarly to the former case, we also proceed by expanding the definition of the relational operator. We can hereby choose where the induction is applied over and . We can close the goal as follows:
- :
- Similarly to globally, we obtain for a corresponding to by inductive hypothesis.
- :
- Similarly to the previous unary operators, we choose as xtLTLf operator where the inductive hypothesis links to . We can therefore close the goal as follows:This is doable as stating is equivalent to where the latter can be rewritten as .
- :
- As we have that two inductive hypotheses associate and respectively to and , we choose the xtLTLf formula to be associated with . For this xtLTLf operator, we can state that a result is returned by such an operator if and only if and per definition of operators never returning explicit activation or target condition. We close the goal as follows:
- :
- We can firstly observe that in the classical semantics is equivalent to for any possible proposition A and B (OrRwLem). After observing that the current operator is defined by extension of the previously proved one, we can exploit the previous one as a rewriting lemma. As we have that two inductive hypothesis associating and respectively to and , we choose the xtLTLf formula to be associated with . We close the goal as follows:
- :
- as both the results from the third element of the intermediate results are always empty by construction and preliminary assumption, and we have inductive hypothesis associating and respectively to and , we can immediately close the goal after choosing the xtLTLf formula to be associated with .
- :
- By definition of the Init operator, it is sufficient to consider ;
- :
- Under the assumption that the compound condition corresponds to an atomic query with , we can formulate the former as follows: ;
- :
- By rewriting this definition, this implies to prove that . As the Next operator is a timed one and we cannot assess from the beginning of the trace, we cannot exploit the inductive hypothesis for , but we need to apply the previously proven lemma for the conditions happening at any point in the trace. From the application of the previous lemma, we have that for some xtLTLf expression returning . From this, it follows that . By its definition, returns all events preceding the ones stated in , while, for , we are only interested in restricting all of the possible results of to the ones also corresponding to the beginning of the trace. For this reason, we need to consider as ;
- :
- Similarly to the previous operator, is timed and should be checked for all events of interest within the trace . Even in this case, we need to apply the previous lemma for , thus guaranteeing that an xtLTLf expression exists containing whenever . As globally requires that all of the events satisfy , we have that responds by the intended semantics, and therefore we choose this as our ;
- :
- Similarly to the previous operator, we choose when is linked to the evaluation of for any possible trace event by the previous lemma;
- :
- In this other scenario, we can directly apply the previous lemma, as the evaluation of will always start from the beginning of the trace. After recalling that , we rewrite the definition of while applying the inductive hypothesis for the present lemma over some semantically linked to as follows:Per inductive hypothesis, contains all of the records for which ; as the untimed negation will return a record if and only if there is no event associated with the trace in the provided operand, we can choose and close the goal as follows:
- :
- Similarly to the former operators, both and required a timed evaluation of the events along the trace of interest, for which we need to exploit the former lemma, thus obtaining timed xtLTLf expressions and . We can immediately close the lemma by choosing
- :
- Similarly to the negation operator, we can directly apply the inductive hypothesis on and , as these sub-operators will also be assessed from the beginning of a trace; these will be associated respectively to the xtLTLf expressions and having and as we exploit neither activation nor target conditions. As per construction and will contain no record for some natural number , we chose
- :
- By exploiting similar consideration from the former operator, we chose for some and respectively associated by inductive hypothesis to and .
- This can be trivially closed by choosing ;
- This can be trivially closed by choosing ;
- :
- After observing that , we obtain the following condition by operator’s expansion, where is evaluated over as per inductive hypothesis:
- :
- By following similar consideration as per the former operator, we have:
- :
- By rewriting the definition of the timed And operator, we obtain the following:If And contains for both of its operands an event , it follows that there should be at least one match over the corresponding untimed operator evaluated over . For the latter operator, we can therefore ensure that a j exists and a being and L as well as for which the following condition holds:
- :
- As this operator is derived from the definition of , we can directly close the goal by the previous inductive step if the result represents a match between the elements of the first and second operand. If there were no events that might have been matched, the data come either from the first or from the second operand. As the two cases are symmetric, we just provide proof for the former case. In this situation, we have a corresponding to a for which there is no such that . If there still exists a and such that for which there might be a match between L and , then this case falls under the untimed over , and we still have some for which the latter returns ; if match is never possible or no of such exists, then the untimed operator will return a by definition;
- :
- This is a mere rewriting exercise, as the untimed version of Until is a mere instantiation of the latter where only the case is considered.
Appendix A.3
References
- Agrawal, R.; Imieliński, T.; Swami, A. Mining Association Rules between Sets of Items in Large Databases. SIGMOD Rec. 1993, 22, 207–216. [Google Scholar] [CrossRef]
- Bergami, G.; Maggi, F.M.; Montali, M.; Peñaloza, R. Probabilistic Trace Alignment. In Proceedings of the 2021 3rd International Conference on Process Mining (ICPM), Eindhoven, The Netherlands, 31 October–4 November 2021; pp. 9–16. [Google Scholar] [CrossRef]
- Schön, O.; van Huijgevoort, B.; Haesaert, S.; Soudjani, S. Correct-by-Design Control of Parametric Stochastic Systems. In Proceedings of the 2022 IEEE 61st Conference on Decision and Control, Cancun, Mexico, 6–9 December 2022. [Google Scholar]
- Appleby, S.; Bergami, G.; Morgan, G. Running Temporal Logical Queries on the Relational Model. In Proceedings of the International Database Engineered Applications Symposium (IDEAS’22), Budapest, Hungary, 22–24 August 2022; pp. 222–231. [Google Scholar]
- Schönig, S.; Rogge-Solti, A.; Cabanillas, C.; Jablonski, S.; Mendling, J. Efficient and Customisable Declarative Process Mining with SQL. In Advanced Information Systems Engineering, Proceedings of the 28th International Conference, CAiSE 2016, Ljubljana, Slovenia, 13–17 June 2016; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
- Burattin, A.; Maggi, F.M.; Sperduti, A. Conformance checking based on multi-perspective declarative process models. Expert Syst. Appl. 2016, 65, 194–211. [Google Scholar] [CrossRef] [Green Version]
- Pesic, M.; Schonenberg, H.; van der Aalst, W.M.P. DECLARE: Full Support for Loosely-Structured Processes. In Proceedings of the 11th IEEE International Enterprise Distributed Object Computing Conference, Annapolis, MA, USA, 15–19 October 2007; pp. 287–300. [Google Scholar]
- Musser, D.R. Introspective Sorting and Selection Algorithms. Softw. Pract. Exp. 1997, 27, 983–993. [Google Scholar] [CrossRef]
- Bellatreche, L.; Kechar, M.; Bahloul, S.N. Bringing Common Subexpression Problem from the Dark to Light: Towards Large-Scale Workload Optimizations. In Proceedings of the 25th International Database Engineering & Applications Symposium, Montreal, QC, Canada, 14–16 July 2021. [Google Scholar]
- Naldurg, P.; Sen, K.; Thati, P. A Temporal Logic Based Framework for Intrusion Detection. In Proceedings of the Formal Techniques for Networked and Distributed Systems—FORTE 2004: 24th IFIP WG 6.1 International Conference, Madrid, Spain, 27–30 September 2004; Núñez, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2004; Volume 3235, pp. 359–376. [Google Scholar]
- Ray, I. Security Vulnerabilities in Smart Contracts as Specifications in Linear Temporal Logic. Master’s Thesis, University of Waterloo, Waterloo, ON, Canada, 2021. [Google Scholar]
- Buschjäger, S.; Hess, S.; Morik, K. Shrub Ensembles for Online Classification. In Proceedings of the the AAAI Conference on Artificial Intelligence 2022, Virtual, 22 February–1 March 2022; pp. 6123–6131. [Google Scholar]
- Huo, X.; Hao, K.; Chen, L.; song Tang, X.; Wang, T.; Cai, X. A dynamic soft sensor of industrial fuzzy time series with propositional linear temporal logic. Expert Syst. Appl. 2022, 201, 117176. [Google Scholar] [CrossRef]
- Bergami, G.; Francescomarino, C.D.; Ghidini, C.; Maggi, F.M.; Puura, J. Exploring Business Process Deviance with Sequential and Declarative Patterns. arXiv 2021, arXiv:2111.12454. [Google Scholar]
- Zhou, H.; Milani Fard, A.; Makanju, A. The State of Ethereum Smart Contracts Security: Vulnerabilities, Countermeasures, and Tool Support. J. Cybersecur. Priv. 2022, 2, 358–378. [Google Scholar] [CrossRef]
- Szabo, N. Smart contracts: Building blocks for digital markets. Extropy J. Transhumanist Thought 1996, 18, 28. [Google Scholar]
- Fionda, V.; Greco, G.; Mastratisi, M.A. Reasoning About Smart Contracts Encoded in LTL. In Proceedings of the AIxIA 2021—Advances in Artificial Intelligence: 20th International Conference of the Italian Association for Artificial Intelligence, Virtual Event, 1–3 December 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 123–136. [Google Scholar]
- Bank, H.S.; D’souza, S.; Rasam, A. Temporal Logic (TL)-Based Autonomy for Smart Manufacturing Systems. Procedia Manuf. 2018, 26, 1221–1229. [Google Scholar] [CrossRef]
- Mao, X.; Li, X.; Huang, Y.; Shi, J.; Zhang, Y. Programmable Logic Controllers Past Linear Temporal Logic for Monitoring Applications in Industrial Control Systems. IEEE Trans. Ind. Informatics 2022, 18, 4393–4405. [Google Scholar] [CrossRef]
- Boniol, P.; Linardi, M.; Roncallo, F.; Palpanas, T.; Meftah, M.; Remy, E. Unsupervised and scalable subsequence anomaly detection in large data series. Vldb J. 2021, 30, 909–931. [Google Scholar] [CrossRef]
- Xu, H.; Pang, J.; Yang, X.; Yu, J.; Li, X.; Zhao, D. Modeling clinical activities based on multi-perspective declarative process mining with openEHR’s characteristic. BMC Med. Inform. Decis. Mak. 2020, 20-S, 303. [Google Scholar] [CrossRef]
- Rovani, M.; Maggi, F.M.; de Leoni, M.; van der Aalst, W.M.P. Declarative process mining in healthcare. Expert Syst. Appl. 2015, 42, 9236–9251. [Google Scholar] [CrossRef] [Green Version]
- Bertini, F.; Bergami, G.; Montesi, D.; Veronese, G.; Marchesini, G.; Pandolfi, P. Predicting Frailty Condition in Elderly Using Multidimensional Socioclinical Databases. Proc. IEEE 2018, 106, 723–737. [Google Scholar] [CrossRef]
- De Giacomo, G.; Maggi, F.M.; Marrella, A.; Patrizi, F. On the Disruptive Effectiveness of Automated Planning for LTLf-Based Trace Alignment. In Proceedings of the AAAI Conference on Artificial Intelligence 2017, San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
- Bergami, G.; Maggi, F.M.; Marrella, A.; Montali, M. Aligning Data-Aware Declarative Process Models and Event Logs. In Business Process Management; Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 235–251. [Google Scholar]
- Bergami, G. A Logical Model for joining Property Graphs. arXiv 2021, arXiv:2106.14766. [Google Scholar]
- Zhu, S.; Pu, G.; Vardi, M.Y. First-Order vs. Second-Order Encodings for LTLf-to-Automata Translation. arXiv 2019, arXiv:1901.06108. [Google Scholar]
- Ceri, S.; Gottlob, G. Translating SQL Into Relational Algebra: Optimization, Semantics, and Equivalence of SQL Queries. IEEE Trans. Software Eng. 1985, 11, 324–345. [Google Scholar] [CrossRef]
- Calders, T.; Lakshmanan, L.V.S.; Ng, R.T.; Paredaens, J. Expressive power of an algebra for data mining. ACM Trans. Database Syst. 2006, 31, 1169–1214. [Google Scholar] [CrossRef] [Green Version]
- Li, J.; Pu, G.; Zhang, Y.; Vardi, M.Y.; Rozier, K.Y. SAT-based explicit LTLf satisfiability checking. Artif. Intell. 2020, 289, 103369. [Google Scholar] [CrossRef]
- Petermann, A.; Junghanns, M.; Müller, R.; Rahm, E. FoodBroker-Generating Synthetic Datasets for Graph-Based Business Analytics. In Proceedings of the 5th International Workshop, WBDB 2014, Potsdam, Germany, 5–6 August 2014. [Google Scholar]
- Bergami, G. On Declare MAX-SAT and a finite Herbrand Base for data-aware logs. arXiv 2021, arXiv:2106.07781. [Google Scholar]
- Pichler, P.; Weber, B.; Zugal, S.; Pinggera, J.; Mendling, J.; Reijers, H.A. Imperative versus Declarative Process Modeling Languages: An Empirical Investigation. In Proceedings of the BPM 2011 International Workshops, Clermont-Ferrand, France, 29 August 2011; pp. 383–394. [Google Scholar]
- Codd, E.F. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 1970, 13, 377–387. [Google Scholar] [CrossRef]
- Idreos, S.; Groffen, F.; Nes, N.; Manegold, S.; Mullender, K.S.; Kersten, M.L. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Eng. Bull. 2012, 35, 40–45. [Google Scholar]
- Boncz, P.A.; Manegold, S.; Kersten, M.L. Database Architecture Evolution: Mammals Flourished long before Dinosaurs became Extinct. Proc. VLDB Endow. 2009, 2, 1648–1653. [Google Scholar]
- Roth, M.A.; Korth, H.F.; Silberschatz, A. Extended Algebra and Calculus for Nested Relational Databases. ACM Trans. Database Syst. 1988, 13, 389–417. [Google Scholar] [CrossRef]
- Wang, J.; Ntarmos, N.; Triantafillou, P. GraphCache: A Caching System for Graph Queries. In Proceedings of the International Conference on Extending Database Technology (EDBT) 2017, Venice, Italy, 21–24 March 2017; pp. 13–24. [Google Scholar]
- Keller, A.M.; Basu, J. A Predicate-based Caching Scheme for Client-Server Database Architectures. VLDB J. 1996, 5, 35–47. [Google Scholar] [CrossRef] [Green Version]
- Davey, B.A.; Priestley, H.A. Introduction to Lattices and Order, 2nd ed.; Cambridge University Press: Cambridge, UK, 2002. [Google Scholar]
- de Berg, M.; Cheong, O.; van Kreveld, M.J.; Overmars, M.H. Computational Geometry: Algorithms and Applications, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
- Elmasri, R.; Navathe, S.B. Fundamentals of Database Systems, 7th ed.; Pearson: Upper Saddle River, NJ, USA, 2015. [Google Scholar]
- Polyvyanyy, A.; ter Hofstede, A.H.M.; Rosa, M.L.; Ouyang, C.; Pika, A. Process Query Language: Design, Implementation, and Evaluation. arXiv 2019, arXiv:1909.09543. [Google Scholar]
- Coffman, E.G.; Graham, R.L. Optimal Scheduling for Two-Processor Systems. Acta Inform. 1972, 1, 200–213. [Google Scholar] [CrossRef]
- Sugiyama, K.; Tagawa, S.; Toda, M. Methods for Visual Understanding of Hierarchical System Structures. IEEE Trans. Syst. Man. Cybern. 1981, 11, 109–125. [Google Scholar] [CrossRef]
- Bergami, G. On Efficiently Equi-Joining Graphs. In Proceedings of the 25th International Database Engineering & Applications Symposium 2021, Montreal, QC, Canada, 14–16 July 2021. [Google Scholar]
- Dittrich, J. Patterns in Data Management: A Flipped Textbook; CreateSpace Independent Publishing Platform: Charleston, SC, USA, 2016. [Google Scholar]
- Schönig, S. SQL Queries for Declarative Process Mining on Event Logs of Relational Databases. arXiv 2015, arXiv:1512.00196. [Google Scholar]
- Shoshany, B. A C++17 Thread Pool for High-Performance Scientific Computing. arXiv 2021, arXiv:2105.00613. [Google Scholar] [CrossRef]
- Klemm, M.; Cownie, J. 8 Scheduling parallel loops. In High Performance Parallel Runtimes; De Gruyter Oldenbourg: Berlin, Germany; Boston, MA, USA, 2021; pp. 228–258. [Google Scholar]
- Ristov, S.; Prodan, R.; Gusev, M.; Skala, K. Superlinear speedup in HPC systems: Why and when? In Proceedings of the 2016 Federated Conference on Computer Science and Information Systems (FedCSIS), Gdańsk, Poland, 11–14 September 2016; pp. 889–898. [Google Scholar]
- Yan, B.; Regueiro, R.A. Superlinear speedup phenomenon in parallel 3D Discrete Element Method (DEM) simulations of complex-shaped particles. Parallel Comput. 2018, 75, 61–87. [Google Scholar] [CrossRef]
- Nagashima, U.; Hyugaji, S.; Sekiguchi, S.; Sato, M.; Hosoya, H. An experience with super-linear speedup achieved by parallel computing on a workstation cluster: Parallel calculation of density of states of large scale cyclic polyacenes. Parallel Comput. 1995, 21, 1491–1504. [Google Scholar] [CrossRef]
- Anselma, L.; Bottrighi, A.; Montani, S.; Terenziani, P. Extending BCDM to Cope with Proposals and Evaluations of Updates. IEEE Trans. Knowl. Data Eng. 2013, 25, 556–570. [Google Scholar] [CrossRef]
- Bergami, G.; Bertini, F.; Montesi, D. Hierarchical embedding for DAG reachability queries. In Proceedings of the IDEAS 2020: 24th International Database Engineering & Applications Symposium, Seoul, Republic of Korea, 12–14 August 2020; Desai, B.C., Cho, W., Eds.; ACM: New York, NY, USA, 2020; pp. 24:1–24:10. [Google Scholar]
- Revesz, P.Z. Introduction to Databases—From Biological to Spatio-Temporal; Texts in Computer Science; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
- Revesz, P. Geographic Databases. In Introduction to Databases: From Biological to Spatio-Temporal; Springer: London, UK, 2010; pp. 81–109. [Google Scholar]
- Zaki, N.M.; Helal, I.M.A.; Awad, A.; Hassanein, E.E. Efficient Checking of Timed Order Compliance Rules over Graph-encoded Event Logs. arXiv 2022, arXiv:2206.09336. [Google Scholar]
- Rost, C.; Gómez, K.; Täschner, M.; Fritzsche, P.; Schons, L.; Christ, L.; Adameit, T.; Junghanns, M.; Rahm, E. Distributed temporal graph analytics with GRADOOP. VLDB J. 2022, 31, 375–401. [Google Scholar] [CrossRef]
Type | Exemplifying Clause () | Natural Language Specification for Traces | LTLf Semantics () |
---|---|---|---|
Simple | Init() | The trace should start with an activation | |
Exists() | Activations should occur at least n times | ||
Absence() | Activations should occur at most n times | ()〛 | |
Precedence() | Events preceding the activations should not satisfy the target | ||
(Mutual) Correlation | ChainPrecedence() | The activation is immediately preceded by the target. | |
Choice() | At least one of the two activation conditions must appear. | ||
Response() | The activation is either followed by or simultaneous to the target. | ||
ChainResponse() | The activation is immediately followed by the target. | ||
RespExistence() | The activation requires the existence of the target. | ||
ExclChoice() | Only one activation condition must happen. | ||
CoExistence() | RespExistence, and vice versa. | ||
Succession() | The target should only follow the activation. | ||
ChainSuccession() | Activation immediately follows the target, and the target immediately preceeds the activation. | ||
AltResponse() | If an activation occurs, no other activations must happen until the target occurs. | ||
AltPrecedence() | Every activation must be preceded by an target, without any other activation in between | ||
Not. | NotCoExistence() | The activation nand the target happen. | |
NotSuccession() | The activation requires that no target condition should follow. |
Referral | ||
Mastectomy | ||
FollowUp | ||
Lumpectomy | ||
(a) Interval decomposition in basic intervals . | |||||
(b) Atom generation by partitioning the data space with . | |||||
Symbol () | Definition () | Type () | Comments |
---|---|---|---|
Set Theory | |||
∅ | Set | An empty set contains no items. | |
A partially ordered set (poset) is a relational structure for which ⪯ is a partial ordering over S [40]. ⪯ over S might be represented as a lattice, referred to as the Hasse diagram. | |||
S | Given a poset , is the unique greatest element of S. | ||
Set | Complement set: given an universe , the complement returns all of the elements that do not belong to C. | ||
Generalised cross product for ordered sets K where | |||
The cardinality of a finite set indicates the number of contained items. | |||
Set | The powerset of C is the set whose elements are all of the subsets of C. | ||
XES Model & LTLf | |||
Set | Finite set of activity labels | ||
K | Set | Finite set of ordered (payload) keys, | |
V | Set | Finite set of (payload) values | |
p | Tuple (or finite function) mapping keys to values in | ||
⊥ | NULL value | ||
Event | |||
Sequence | Trace, sequence of temporarily ordered events. | ||
Set | Log, set of traces. | ||
Bijection mapping each activity label to its unique identifier. | |||
Equation (1) | Expression | An LTLf expression. | |
⊧ | denotes that is satisfied for the world/environment . | ||
xtLTLf | |||
Section 3.2 | Expression | eXTended LTLf Algebra expression. | |
Marks associated with activation/target/matching conditions. | |||
Intermediate representation returned by each xtLTLf operator | |||
Accessing the i-th record of a sequence T. | |||
Binary Predicate | Correlation condition between activated and targeted events. | ||
Binary Predicate | Inverted/Flipped correlation condition. | ||
Binary Predicate | Always-true binary predicate. | ||
Equation (S1) | Algorithm 7 | Existential matching condition for which there exists at least one event in providing a match. | |
Equation (S2) | Algorithm 9 | Universal matching condition returning a non-empty set if each event expressed in the maps provides a match. | |
Equation (S3) | Testing functor returning False iff., despite the maps containing activated and targeted events, the matching condition is empty. It returns otherwise. | ||
Pseudocode | |||
↑ | Null pointer or terminated iterator. | ||
Pointer | On non-empty, it returns the iterator pointing to the first record in | ||
Dereference | Element pointer by the pointer/iterator it. | ||
LowerBound() | Binary Search | Given a beginning b and end e iterator range within a sequential and sorted data structure by increasing order, LowerBound returns either the first location in this range pointing at a value greater or equal to or e otherwise. | |
UpperBound() | Binary Search | Given a beginning b and end e iterator range within a sequential and sorted data structure by increasing order, UpperBound returns either the first location in this range pointing to a value strictly less to or e otherwise. | |
Time Complexity | |||
Maximum trace length. | |||
ℓ | Maximum length of the third component of the intermediate representation. |
Exemplifying Clause () | xtLTLf Semantics | |
---|---|---|
Otherwise (e.g., Atomisation) | ||
Init() | ||
ChainPrecedence() where | ||
Choice() | ||
where | ||
ChainResponse() where | ||
where | ||
ExclChoice() | ||
CoExistence() where | s.t. and | |
Succession() where | s.t. | |
ChainSuccession() where | ||
AltResponse() where | ||
AltPrecedence() where | ||
NotCoExistence() where | ||
NotSuccession() |
(a) Metric calculations per trace. | ||
Trace | MAX-SAT | in Conjunctive Query |
true | ||
false | ||
false | ||
(b) Metric calculations per clause. | ||
Clause | Support | Confidence |
Ⓐ | ||
Ⓑ | ||
Ⓒ |
Competitor | Dataset | Traces | Events | Distinct Activities |
---|---|---|---|---|
SQL Miner | BPIC 2011 (original) | 1143 | 150,291 | 624 |
BPIC 2011 (10) | 10 | 2613 | 158 | |
BPIC 2011 (100) | 100 | 12,195 | 276 | |
BPIC 2011 (1000) | 1000 | 133,935 | 607 | |
Declare Analyzer | BPIC 2012 (original) | 13,087 | 262,200 | 24 |
Operator | LTLf Rewriting | Optimised |
---|---|---|
Choice | ||
TimedAndFuture | ||
TimedAndGlobally |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bergami, G.; Appleby, S.; Morgan, G. Quickening Data-Aware Conformance Checking through Temporal Algebras. Information 2023, 14, 173. https://doi.org/10.3390/info14030173
Bergami G, Appleby S, Morgan G. Quickening Data-Aware Conformance Checking through Temporal Algebras. Information. 2023; 14(3):173. https://doi.org/10.3390/info14030173
Chicago/Turabian StyleBergami, Giacomo, Samuel Appleby, and Graham Morgan. 2023. "Quickening Data-Aware Conformance Checking through Temporal Algebras" Information 14, no. 3: 173. https://doi.org/10.3390/info14030173
APA StyleBergami, G., Appleby, S., & Morgan, G. (2023). Quickening Data-Aware Conformance Checking through Temporal Algebras. Information, 14(3), 173. https://doi.org/10.3390/info14030173