Abstract
Maximizing the use of digitally captured data is a key requirement for many of the late adopters of digital infrastructure. One of the newcomers is the chemical industry in the area of digitized laboratories. Here, tools and services that satisfy individual needs still need to be developed and distributed within the community. This work explores the potential of using graph databases, specifically those modeled via ontological knowledge graphs, to describe complex data linkages and draw logical conclusions. While knowledge graphs are not widely utilized in catalysis research, this study introduces a methodology to highlight their usability for semantic description and integration into diverse value chains that touch on the domain of (bio)chemistry and catalysis.
This work demonstrates how ontologies and their knowledge graphs can be applied to perform essential functions of semantic annotation for chemical reactions, which are difficult to model relationally. Traditional data description methods can be set aside in favor of description logic, showing how logical inferences at the machine level can enrich data. This work also illustrates the seamless integration of this enhanced data into process simulations, connecting semantic description with practical applications. The immediate benefits for catalysis research are emphasized and the development of new tools and services is envisioned. By clarifying how these graphs can be integrated into existing workflows, researchers are empowered to make the most of digitally acquired data in catalytic processes. This practical methodology lays the foundation for improved decision-making and innovation, fostering advancements in the field of catalysis research.
1 Introduction
As the field of catalysis and chemical engineering is quite diverse and research data are hardly ever disseminated in the catalysis disciplines, there are many drivers for increased interest in ontologies and correspondingly higher data quality. Examples include the need to create interfaces for generative AI for faster catalyst design and the interconnectivity of applications from laboratory experiments to process simulations [1]. As digitalization continues to advance, there is a constant need for methods that simplify and accelerate data processing by removing hurdles. A prominent approach in this domain is the FAIR principles [2], which advocate enhanced data processing through metadata utilization and standardized data structures. The FAIR principles prioritize findability, accessibility, interoperability, and reusability, aiming to standardize data descriptions via metadata, thus facilitating improved data set reusability and critical data selection.
Moreover, discussions often revolve around leveraging ontologies and knowledge graphs alongside FAIR principles to enrich metadata value through structural organization. Ontologies serve as decentralized standards, simplifying data handling by employing world modeling based on description logic, thereby avoiding excessive content intricacies. This work explores ontologies for modeling reactions and catalysis as a primary data description option; their benefits are explored further below.
Datasets described using ontologies and accessible in knowledge graphs are inherently interoperable and reusable at the processing level, benefiting from a uniform domain model. The extensive toolkit surrounding ontologies and knowledge graphs allows for data quality and integrity checks, including Shapes Constraint Language (SHACL) shape validation [3], SPARQL queries [4], and consistency checks via inference engines [5].
Ontologies and knowledge graphs are increasingly integrated into the realm of research data management, offering significant advantages. Their knowledge modeling, akin to object-oriented approaches, simplifies both data processing and comprehension. Notably, tools like the Python package owlready2 [6] facilitate modeling ontologies and their semantic artifacts as objects within programming languages, enabling streamlined processing.
In computer science, particularly within the semantic web, ontologies serve to describe the real world in a machine-interpretable manner [7]. They conceptualize the world as a series of delineated mathematical spaces, governed by description logic and interconnected by rules. While maintaining semantic clarity, ontologies ensure human-readable interpretations through definitions, descriptions, and references.
However, the widespread adoption of ontologies faces challenges, primarily concerning their availability across different domains and the varying degrees of semantic richness within each application area. Discrepancies in ontology expressivity arise not solely from their developmental stage or domain specificity but also from factors such as intended application and the desired benefits of description logic [8]. For instance, applications range from simple thesauri employing Web Ontology Language (OWL) syntax [9] to complex ontologies incorporating additional description logic. Furthermore, while ensuring logical consistency is desired, the question of whether this validation should occur solely at the ontology level or also extend to the knowledge graph level remains unanswered.
This work not only explores the use of ontologies and corresponding metadata for describing data records but also delves into the direct utilization of OWL syntax for inference purposes.
Thus, it is appropriate to provide an introductory overview of description logic. Description logics exist in various forms [10], varying in expressivity and inferencing options, ranging from simple markup languages like RuleML [11] to more sophisticated ones like SWRL [12] or SPIN [13]. Among the most significant ones for the semantic network are OWL [9] and its associated SHACL. OWL and SHACL differ in their “world view,” based on either an open-world assumption or a closed-world assumption [14, 15], respectively. The former assumes that further knowledge, data, and relations can be introduced, whereas the latter assumes that all knowledge, and hence all data, is available. These assumptions are processed using different inference engines, taking into account not only the performance of description logic inference but also the type of description logic and the acceptance of mixed description logic [16].
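The difference between the two world views can be illustrated with a minimal plain-Python sketch. It is not OWL or SHACL tooling; the triples and the function names `holds_closed_world` and `holds_open_world` are assumptions made for illustration, and the open-world case is simplified to "unknown unless asserted".

```python
from typing import Optional

# Hypothetical asserted facts as (subject, predicate, object) triples.
FACTS = {
    ("Reaction_1", "hasEductComponent", "N2"),
    ("Reaction_1", "hasEductComponent", "H2"),
}


def holds_closed_world(triple: tuple) -> bool:
    """Closed world: anything that is not asserted is treated as false."""
    return triple in FACTS


def holds_open_world(triple: tuple) -> Optional[bool]:
    """Open world: anything that is not asserted is unknown (None), not false."""
    return True if triple in FACTS else None


query = ("Reaction_1", "hasEductComponent", "CO2")
print(holds_closed_world(query))  # False
print(holds_open_world(query))    # None
```

Under the closed-world reading the missing statement counts as a violation, whereas under the open-world reading it may simply not have been recorded yet.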
Semantic artifacts, ranging from XML [17] to RDF [18] and OWL syntax, are interconnected and can be combined with related artifacts. For instance, axioms in an ontology can be written using SWRL rules, necessitating the use of an SWRL-compatible reasoner for inference [16]. It’s essential to note that ontologies inferred with different reasoners may not be compatible, especially when one reasoner lacks features present in another. To the best of the author’s knowledge, there are no ontologies that model the process of chemical reactions with regard to reactants, products and catalytic components in description logic, as also addressed in previous work [19]. As the application domain of the ontology presented in this work revolves around the modeling of chemical reaction networks in the context of catalysis research, the following section gives a brief introduction to the topic.
2 Methods
2.1 Modeling Reaction Networks with Ontologies
Before delving into the description of a catalyzed reaction and its ontology-based data, it’s crucial to clarify a few fundamental aspects that serve as constraints for the modeling process, which will be recurrently addressed directly or indirectly in subsequent discussions.
In the context of the ontology presented here, efforts were made to establish robust connections with existing ontologies to leverage and interconnect with pre-existing knowledge. Simultaneously, the aim was to maintain the ontology’s adherence to factual information. This approach is essential not only for aligning with most top-level ontologies, such as the BFO [20], chosen for its extensive repository of reusable ontologies, but also for ensuring logical consistency.
This requires that all information contained within the ontology is at least based on empirical evidence. However, given the intangible nature of many concepts in catalysis research, certain compromises must be made. For instance, defining a reaction can be approached either as a single molecular interaction or as the aggregate of numerous such interactions. Thus, it’s crucial to address whether modeling reactions in ontologies is necessary and what benefits such modeling should entail.
Presently, several databases catalog reaction conditions, catalysts, and related information, albeit with limited automation. Researchers and digital agents typically obtain necessary information through targeted search queries, requiring manual or semi-automated evaluation by the researcher or programmer.
To address this challenge, the ontology aims to facilitate answering various competency questions more easily, such as “Which side reactions can I expect in a mixture consisting of my specified components?”, “Which of my materials have a catalytic effect for a reaction in my system?”, or “Which reactants cause which side reaction?”. Table 1 provides a list of the competency questions, allowing for both simplified classification of primary and secondary reactions and enhanced automated evaluations.
Having outlined the competency questions the ontology seeks to address, the focus now turns to the concept of a reaction within the ontology. Given that an ontology delineates specific, individually defined mathematical spaces, it becomes necessary to define reactions both conceptually and mathematically/logically. The general understanding of an observed reaction, a mixture of reactants reacting over time under defined conditions to produce a product, is therefore used to define a reaction.
Furthermore, establishing the boundary for what constitutes a chemical reaction is crucial. To accommodate all reactions, including those involving often very low concentrations in biocatalysis, users are entrusted with the task of reaction selection. To facilitate this, the class hierarchy of the designed ontology is specifically structured in such a way that all information is modeled at the “data level” of the ontology using individuals.
With the framework of functions and objectives in place, a review of the ontologies used is warranted before proceeding further. The RXNO [21] and its successor, the MOP [22], serve as the foundation for describing chemical processes, while the ChEBI [23] ontology forms the basis for characterizing chemical substances.
All three mentioned ontologies belong to the OBO Foundry [24], utilize the BFO as the top-level ontology, and follow the OBO Foundry guidelines for ontology development. Although the ontology presented here incorporates semantic artifacts from the OBO community, such as relations from RO [25], it does not claim conformity with the OBO community guidelines [26]. This divergence primarily stems from the intention to integrate the concept into ontologies that may not strictly adhere to OBO standards or that employ a different top-level ontology, such as the EMMO [27]. However, ChEBI, RXNO, and MOP are particularly valuable, as they already encompass a wide array of substance and reaction classifications. Consequently, there is no need to recreate these classifications independently when reusing them.
To elucidate the ontology’s workings further, an example based on a Haber-Bosch reaction is employed, illustrating how individual components within the ontology operate. The Haber-Bosch reaction is defined as the reaction of nitrogen and hydrogen to ammonia, in which nitrogen is reduced by hydrogen, catalyzed most often by nickel and iron.
Although this example provides only a simplified illustration, it demonstrates the ontology’s capability for more intricate modeling of reactions beyond those of explicitly named reactions. Notably, the ontology facilitates the representation of not only explicitly named reactions but also unnamed reactions and entire reaction groups.
For a description of a reaction within an ontology, it’s essential to note the relatively open definition of a reaction as “a process in which a mixture of reactants reacts partially over a certain period under defined conditions to form products”. This definition inherently implies a unidirectional process, given the constraints of OWL syntax, which primarily supports unary operations. Given the desire to assign various reaction types to an experiment, this must be modeled accordingly. To automate this process, “reaction roles” have been devised, which can be assigned to an experiment. For instance, if “Reaction_1” satisfies all conditions indicative of a Haber-Bosch reaction, it should be assigned the “HaberBoschReactionRole”. This role also encompasses specific information, which will be elaborated upon later.
As direct association of reactants to products in a given reaction experiment is often not possible, a reaction experiment is initially modeled to indicate only the mixture subjected to the reaction, termed the “EductMixture”, and the resulting “ProductSet” that could be measured. Given the limitations of the OWL syntax, which was not designed for complex calculations [28], critical component quantities are not considered; instead, all measured and thereby modeled components are considered inside the logic. The terms “EductMixture” and “ProductSet” refer to individuals that can be categorized under the “ProcessMixture” class and are just related differently to the individual representing a reaction experiment. This modeling necessitates assigning individual components not directly to a reaction but to a mixture instead. This approach, representing a mixture as an N‑ary structure [29], prevents incorrect substance assignments in a reaction and facilitates the reuse of “EductMixture” and “ProductSet” as a “ProcessMixture” for measurement series, among other applications. Each substance individual can then be assigned to the respective classes as single individuals. For instance, “EductMixture_1,” representing the reactant mixture of “Reaction_1,” would include substances like “H2” and “N2”, denoted as individuals of the corresponding classes in ChEBI.
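The mixture-centered modeling can be pictured with a small plain-Python data structure. This is only a sketch of the idea, not the ontology itself; the class and attribute names mirror the terms used above, while “ProductSet_1”, “Reaction_1b” and “ProductSet_1b” are hypothetical individuals introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessMixture:
    """A mixture individual; substances attach here, not to the reaction."""
    name: str
    components: set = field(default_factory=set)


@dataclass
class ReactionExperiment:
    """A reaction experiment linked to two mixtures by two different relations."""
    name: str
    educt_mixture: ProcessMixture  # plays the "EductMixture" part
    product_set: ProcessMixture    # plays the "ProductSet" part


educt_mixture_1 = ProcessMixture("EductMixture_1", {"H2", "N2"})
product_set_1 = ProcessMixture("ProductSet_1", {"NH3"})
reaction_1 = ReactionExperiment("Reaction_1", educt_mixture_1, product_set_1)

# The same mixture individual can be reused, for example in a measurement series.
reaction_1b = ReactionExperiment(
    "Reaction_1b", educt_mixture_1, ProcessMixture("ProductSet_1b")
)
print(reaction_1b.educt_mixture is reaction_1.educt_mixture)  # True
```

Because substances hang on the mixture rather than on the experiment, a component cannot be attached to a reaction without stating on which side it occurs.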
Fig. 1 illustrates parts of the class hierarchy, known as the Terminology Box (TBox), together with the classes and individuals utilized in the example. The upper section illustrates the reactant and product aspects of the reaction, while the lower left section relates to additional materials influencing the reaction. The lower right segment exhibits the reference to the “HaberBoschReactionRoleIndividual” along with additional relations indicating which components can catalyze a Haber-Bosch reaction.
To keep the illustration as clear as possible, more complex class axioms and rules are only shown as a dotted line, direct hierarchical assignments are shown with a labeled arrow and nested hierarchical assignments whose complete depth should not be shown are represented with an arrow with a double head. Relations that have an exact object property have the relation written directly on the arrows. Finally, the dashed arrow represents the relation “hasReactionRole” which is to be automatically inferred.
Automatically inferring that a reaction experiment embodies a specific role, thus representing a distinct reaction, can be accomplished through the utilization of a logic approach rarely employed in ontologies, known as left-hand-side logic. In this framework, rather than assigning a complex relation to a predefined class or individual, an object satisfying a complex axiom is designated with a class, individual, or straightforward relation. Because of its formulation, this technique is typically less prevalent in serialization formats like Turtle syntax (TTL) [30] or ontology editing tools such as Protégé [31], where it is referred to as General Class Axiom (GCA). Within this ontology, left-hand side logic is utilized to directly deduce [32], via a reasoner, that a reaction possessing all requisite reactants and products for a given role is indeed assigned that particular role. The intention of this is to avoid naming the precise concept, as the number of named reactions in context of catalysis as well as the number of substances that can be used to model a named reaction are quite high.
However, since the reactants and products are initially only linked to the actual reaction experiment via the “EductMixture” and “ProductSet” individuals, the “hasEductComponent” or “hasProductComponent” subproperty chain is used to link them if the concatenation “hasEductComponentMixture” followed by “hasComponent” or, respectively, “hasProductComponentSet” followed by “hasComponent” is possible. The left-hand-side logic can now check for all individuals (this should only be applicable for reaction experiments) whether they have all educts and products to be assigned to a respective reaction. In the Haber-Bosch reaction, for example, this would be “N2” and “H2” as reactants and “NH3” as product. This means that a reaction experiment that fulfills this left-hand side should have the relation “hasReactionRole” assigned to a “HaberBoschReactionRoleIndividual”. In Protégé, this GCA can then be written as:
(hasEductComponent some dinitrogen) and (hasEductComponent some dihydrogen) and (hasProductComponent some ammonia) SubClassOf hasReactionRole some ({HaberBoschReacRole_Ind})
Unfortunately, the inferencing cannot be written in a more generalized way here, as the open world assumption of the OWL syntax would not generate a unique inferencing. The curly brackets in the GCA indicate that this is not a reference to a class but to the individual “HaberBoschReacRole_Ind”.
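The interplay of the subproperty chains and the GCA can be emulated in plain Python to make the inference steps explicit. The following is a minimal sketch under stated assumptions: it is not a reasoner, the example triples are hypothetical, and substance individuals such as “N2” stand in directly for membership in the corresponding ChEBI classes.

```python
# Asserted triples (hypothetical example data in the spirit of Reaction_1).
triples = {
    ("Reaction_1", "hasEductComponentMixture", "EductMixture_1"),
    ("Reaction_1", "hasProductComponentSet", "ProductSet_1"),
    ("EductMixture_1", "hasComponent", "N2"),
    ("EductMixture_1", "hasComponent", "H2"),
    ("ProductSet_1", "hasComponent", "NH3"),
}

# Property chains: (first relation, second relation) -> inferred relation.
CHAINS = {
    ("hasEductComponentMixture", "hasComponent"): "hasEductComponent",
    ("hasProductComponentSet", "hasComponent"): "hasProductComponent",
}


def apply_chains(facts):
    """Return the facts plus every triple implied by the two-step chains."""
    inferred = set(facts)
    for (s, p1, mid) in facts:
        for (mid2, p2, o) in facts:
            if mid == mid2 and (p1, p2) in CHAINS:
                inferred.add((s, CHAINS[(p1, p2)], o))
    return inferred


def objects(facts, subject, predicate):
    """Collect all objects linked to a subject via one predicate."""
    return {o for (s, p, o) in facts if s == subject and p == predicate}


def apply_haber_bosch_gca(facts):
    """Assign the role to every subject that satisfies the left-hand side."""
    result = set(facts)
    for subject in {s for (s, _, _) in facts}:
        educts = objects(facts, subject, "hasEductComponent")
        products = objects(facts, subject, "hasProductComponent")
        if {"N2", "H2"} <= educts and "NH3" in products:
            result.add((subject, "hasReactionRole", "HaberBoschReacRole_Ind"))
    return result


closed = apply_haber_bosch_gca(apply_chains(triples))
print(("Reaction_1", "hasReactionRole", "HaberBoschReacRole_Ind") in closed)  # True
```

Note that the condition is written out once per role, which mirrors the restriction described above that the axiom cannot be generalized.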
This poses the challenge that all GCAs that are to show a similar structure must be brought into the ontology either manually or automatically. How this is realized will be discussed later in the context of automation.
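One conceivable way to produce such structurally identical GCAs automatically is to compose them as Manchester-syntax strings from a table of reaction roles. The sketch below only illustrates that idea; it is not the code from the repository, and the function name `build_role_gca` and the table layout are assumptions.

```python
def build_role_gca(educt_classes, product_classes, role_individual):
    """Compose one reaction-role GCA in Manchester syntax as a plain string."""
    parts = [f"(hasEductComponent some {c})" for c in educt_classes]
    parts += [f"(hasProductComponent some {c})" for c in product_classes]
    lhs = " and ".join(parts)
    # Triple braces yield literal curly brackets around the individual name.
    return f"{lhs} SubClassOf hasReactionRole some ({{{role_individual}}})"


# Hypothetical role table; further roles would be added as extra entries.
ROLE_TABLE = {
    "HaberBoschReacRole_Ind": (["dinitrogen", "dihydrogen"], ["ammonia"]),
}

for role, (educts, products) in ROLE_TABLE.items():
    print(build_role_gca(educts, products, role))
```

For the single entry shown, the printed string reproduces the Haber-Bosch axiom given above; loading such strings into an ontology would still require an editor or library step that is not shown here.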
Since catalyzing materials, as in the example of a Haber-Bosch reaction, are not always found in the reactants or products, it is also important to model catalyst and reactor materials as influencing materials. The example shown in Fig. 1 does not exhaust all possibilities for modeling catalyzing effects. Nevertheless, iron, as the material of the reactor wall, and nickel, as the material in a catalyst sample, are listed here. The individual “HaberBoschReactionRoleIndividual” already defines that nickel and iron can have a catalyzing effect on a Haber-Bosch reaction. To be able to classify a reaction as a catalyzed variant of itself, both the relation “isCatalyzedBy” of the reaction role and a relation based on the reaction experiment itself must refer to one and the same substance individual. Similar to the “hasEductComponent” and “hasProductComponent” relations, the “hasCatalystSampleComponent” and “hasReactionVesselComponent” relations are set up for this purpose. All relations are integrated as sub-relations of the “hasReactionComponent” relation.
The relationships described are illustrated in Fig. 2, offering a simplified overview of the entities involved. Blue arrows represent the “hasReactionComponent” relations discussed earlier, while green arrows indicate additional relations inferred within the reaction experiment using GCAs. Determining which component functions as a catalyst can be deduced through an additional GCA. This GCA can be interpreted as follows: “If a reaction experiment includes a component that acts as both an effective reaction component and is linked to the experiment via the isCatalyzedBy followed by the hasReactionRole relation, then this component functions as an active catalyst.” In Protégé, for example, this relation might be expressed as:
(hasReactionComponent some ({Sub_Fe})) and (isPotentiallyCatalyzedBy some ({Sub_Fe})) SubClassOf hasCatalyst some ({Sub_Fe})
It’s important to note that explicit reference is made to an individual rather than using a generalization in the form of a class, as the OWL syntax prohibits passing variables from the left-hand side to the right-hand side. Therefore, only an explicit statement can establish this relation.
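Because each catalyst GCA must name its individual explicitly, one axiom per candidate substance is needed. The following sketch, given under stated assumptions, composes such axioms as strings and emulates the resulting inference as a set intersection; the function names and the individuals other than “Sub_Fe” are hypothetical.

```python
def build_catalyst_gca(substance_individual: str) -> str:
    """Compose the catalyst GCA for one explicitly named substance individual."""
    ind = f"({{{substance_individual}}})"
    return (
        f"(hasReactionComponent some {ind}) and "
        f"(isPotentiallyCatalyzedBy some {ind}) "
        f"SubClassOf hasCatalyst some {ind}"
    )


def infer_catalysts(reaction_components: set, potential_catalysts: set) -> set:
    """Plain-Python stand-in for what a reasoner derives from those GCAs."""
    return reaction_components & potential_catalysts


# Hypothetical individuals, named after the "Sub_Fe" pattern used in the text.
components = {"Sub_N2", "Sub_H2", "Sub_Fe"}
potential = {"Sub_Fe", "Sub_Ni"}

gcas = [build_catalyst_gca(s) for s in sorted(potential)]
print(gcas[0])
print(infer_catalysts(components, potential))  # {'Sub_Fe'}
```

The intersection expresses in one line what the ontology has to spell out separately for every substance, which is why the number of axioms grows with the number of candidate catalysts.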
However, the “hasCatalyst” relation enables each reaction experiment to be reclassified from the non-catalyzed reaction class into its catalyzed subclass. For this purpose, an additional GCA is used, which can be written in Protégé as:
'Haber Bosch reaction' and (hasCatalyst some (('material entity' or 'chemical entity') and ( inverse (hasReactionComponent) some 'Haber Bosch reaction'))) SubClassOf 'catalysed Haber Bosch reaction'
This GCA checks whether a reaction experiment has a component individual connected via the “hasCatalyst” relation and if this component is also connected back to the reaction experiment via the “hasReactionComponent” relation.
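Read literally, the axiom requires the catalyst to be a reaction component of some individual of the class 'Haber Bosch reaction'. A plain-Python emulation of that reading might look as follows; it is a sketch with hypothetical example data, and it omits the 'material entity' or 'chemical entity' type check for brevity.

```python
def is_catalysed_haber_bosch(name: str, reactions: dict) -> bool:
    """Emulate the GCA for one reaction experiment in a small dictionary model."""
    reaction = reactions[name]
    if "Haber Bosch reaction" not in reaction["classes"]:
        return False
    # Components of any experiment classified as a Haber Bosch reaction.
    hb_components = set()
    for other in reactions.values():
        if "Haber Bosch reaction" in other["classes"]:
            hb_components |= other["components"]
    return bool(reaction["catalysts"] & hb_components)


# Hypothetical experiments; "catalysts" holds inferred hasCatalyst targets.
reactions = {
    "Reaction_A": {
        "classes": {"Haber Bosch reaction"},
        "components": {"Sub_N2", "Sub_H2", "Sub_Fe"},
        "catalysts": {"Sub_Fe"},
    },
    "Reaction_B": {
        "classes": {"Haber Bosch reaction"},
        "components": {"Sub_N2", "Sub_H2"},
        "catalysts": set(),
    },
}

print(is_catalysed_haber_bosch("Reaction_A", reactions))  # True
print(is_catalysed_haber_bosch("Reaction_B", reactions))  # False
```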
By using the above-mentioned relationships, axioms, classes, and individuals, an ontology is built that is able to model the knowledge structures behind the aforementioned competency questions and answer them using, for example, simple SPARQL queries. To further showcase the use of this, a real-world example of a knowledge graph is combined with this approach.
2.2 Reaction in a Knowledge Graph for Process Simulation and Laboratory Data
As reaction networks can be modeled quite universally, they can also be used to model biocatalytic reactions. In the process industry, (bio)chemical processes involve the controlled manipulation and conversion of substances so that the reactions can be transferred from the laboratory to a larger scale and thus brought into widespread use. The reactor is a central part of these processes, typically requiring precise control of parameters such as concentrations or pressure to optimize the yield, purity, and efficiency of the process.
Developing new bioprocesses that integrate these biocatalytic reactions into industrial production processes is a complex task. To aid in this task, process simulators can be used to accelerate the development phase of such industrial processes, saving both time and costs. The open-source process simulator DWSIM [33] facilitates the desired computation of process streams and thereby enables the user to model experiments before their execution in the laboratory. However, process simulation requires input from real-world experiments to calculate realistic results. In these real-world experiments, structured data uptake ensures FAIR research data integration. For biocatalytic experiments, the XML-based data exchange format EnzymeML [34] utilizes ontology classes from the Systems Biology Ontology (SBO) [35].
Thus, laboratory data was integrated into process simulation in previous work [36], utilizing both EnzymeML and DWSIM. Here, data from laboratory experiments regarding the process design of a biocatalytic redox reaction with Laccase in a flow reactor was taken up. Part of this data was recorded in spreadsheets based on EnzymeML [34], thus complying with the SBO. The pyEnzyme module of the EnzymeML framework allows for direct import of the recorded data into Python objects [37], and with it the SBO-related data contained in the spreadsheets. As not all concepts necessary for the description of flow (bio-)chemistry are covered by the aforementioned spreadsheets, a second one was set up for ease of data recording. This second spreadsheet mapped the data according to its object properties and several ontologies, such as metadata4ing [38] and the OBO Relation Ontology [25]. With this, the process of mixing two liquids and consecutive biocatalytic reactions in a flow reactor is described sufficiently by the concepts presented in this work and in [36]. Hence, a partially automated workflow for integrating laboratory data into a process simulation using standardized ontological concepts is facilitated. Enzyme-catalyzed reaction data and process simulation results are parsed into an ontology-based knowledge graph. Fig. 3 shows an excerpt of the resulting knowledge graph revolving around the reaction and process streams of the process simulation.
The class “Biochemical Reaction” is assigned as a general class of the specific reaction taking place in the reactor. As this reaction is a “Redox reaction with Laccase”, the approach presented in the previous section is applied to the knowledge graph presented here to assert a more detailed classification of the reaction. To achieve this, the Reac4Cat ontology is imported manually into the knowledge graph using the ontology editor Protégé. The necessary GCAs are then implemented automatically via Python code, to support future automation of this method. Finally, to accelerate reasoning, the knowledge graph is stripped of unused classes that are included by ontology imports, using the OBO ROBOT tool [39]. With this automated workflow, the semantic implementations of the modeling of reaction networks with ontologies are coupled with a real-world knowledge graph on laboratory and process simulation data.
3 Results
The Reac4Cat ontology presented in this work, the knowledge graph of the laboratory and simulation data, and the code to automatically implement the necessary GCAs with Python are found in the GitHub repository at https://github.com/AleSteB/Reac4Cat.
3.1 Inferring Knowledge of Reaction Networks
In the context of ontologies, it is of course interesting to generate a representation of the world that is as complete and error-free as possible, but this usually comes with high costs in aspects such as required computing capacities, extensibility of the model and intuitive understanding. Therefore, one usually limits oneself to a simplified representation of the world and prefers to check whether it is logically error-free and simple to implement. For this reason, let’s take another look at the ontology with a somewhat larger data set. In the ontology provided with examples [40], two reactions can be found in catalyzed and non-catalyzed form, as well as permutations of reactant mixture compositions and different product mixtures. This ontology (with 844 axioms) needs 718 milliseconds with the reasoner HermiT [41] to perform all inferences. To test whether any inferencing problems occur, several permutations were set up from the examples. Thus, Reaction_1 should represent a Haber-Bosch reaction, Reaction_2 both a catalyzed Haber-Bosch reaction and a catalyzed methanation reaction, Reaction_3 and Reaction_4 a regular methanation reaction, and Reaction_5 again a catalyzed methanation reaction. Similar permutations were also carried out in the reactants and products, for example, to test whether a catalyst can also be present in the product or reactant. An example excerpt from these consistency tests is found in Fig. 4, which shows an excerpt from the Protégé software. The axioms highlighted in white and written in bold are asserted relations, while the axioms highlighted in yellow and written in thin type are axioms independently inferred by the ontology. Fig. 5 shows the knowledge graph created in the reasoning step with the most relevant relations. In order not to overfill the representation with relations, some were intentionally hidden as can be seen in the legend on the right. 
As can be seen from the relation “hasCatalyst” between the individual “Reaction_2” and the individuals of the substances iron and nickel, this was correctly inferred and thus, as can be seen in Fig. 4, the “Reaction_2” was also correctly classified as catalyzed methanation and catalyzed Haber-Bosch reaction.
3.2 Application of Reac4Cat on Process Simulation-related Knowledge Graphs
To show the benefit of the Reac4Cat ontology, it is applied to the knowledge graph containing laboratory and process simulation data of the Laccase-catalyzed redox reaction. With this, the knowledge graph is refined and the reasoning of the ontology leads to new inferred axioms, thus classifying the individuals of reactions accordingly.
Besides the conceptual work done to implement the semantics as presented, Python code was generated to automate the creation of the necessary GCAs to ease the process of the ontology creation.
Focusing on the classification of the reaction individuals, Fig. 6 shows an excerpt of the inferred knowledge graph with HermiT, which took 360.7 seconds to infer. Here, the relation “hasReactionRole” is assigned to the reaction individual, pointing to the correct reaction role of a redox reaction with Laccase. This helps to automatically classify data in a knowledge graph with regards to specific reactions that took place in a reactor.
3.3 Limitations of the Current Reaction Model
After outlining the structure of the ontology, it is crucial to briefly address its known limitations. Most of these limitations stem from the inherent constraints of OWL syntax, primarily its predominantly unary Description Logic nature and the restricted capacity to handle complex mathematical expressions. Reactions and catalysts, for instance, are conceptual entities that manifest only under specific environmental conditions, which can only suitably be described with mathematics.
While there exist methods to model such environmental conditions, they often entail a significant increase in the number of axioms, leading to heightened computational demands [42]. Given that the modeling of reactions within this ontology is already axiom-intensive, the cost of incorporating additional conditions could outweigh the benefits. Additionally, these mathematical expressions [28] would currently require the addition of other logic syntaxes such as SWRL or more sophisticated reasoning engines, and would thereby interfere with some reasoning engines [16]. While reasoners exist that can infer this compounded description logic, ways to model mathematical relations, which are quite important for limiting reactions and catalysts to certain reaction conditions, are not covered here.
Furthermore, it is essential to reassess the fundamental aspects of modeling. Since inference engines operating with OWL syntax can only identify logical loops to a limited extent, explicitly setting up ring closures is not advisable. This limitation poses a challenge, especially in the context of complex reactions, where a reaction system may consist of multiple sub-reactions exhibiting cyclic behavior. The issue is exemplified by simple equilibrium reactions, which inherently entail ring closures. Consequently, modeling subsequent reactions becomes problematic, as they can also result in ring closures. Many useful modeling options thereby cannot be effectively represented.
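Since ring closures should not be asserted explicitly, it can be useful to check a planned reaction network for cycles before writing it into a knowledge graph. The depth-first search below is one generic way to do this; it is a sketch independent of the ontology, and the adjacency dictionaries with their node names are hypothetical.

```python
def find_cycle(network: dict):
    """Return one cycle as a list of nodes, or None if the network is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in network}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in network.get(node, ()):
            if color.get(nxt, WHITE) == GREY:
                # nxt is already on the current path: a ring closure.
                return stack[stack.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(network):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# An equilibrium modeled in both directions closes a ring.
equilibrium = {"EductMixture_1": ["ProductSet_1"], "ProductSet_1": ["EductMixture_1"]}
one_way = {"EductMixture_1": ["ProductSet_1"]}

print(find_cycle(equilibrium))  # ['EductMixture_1', 'ProductSet_1', 'EductMixture_1']
print(find_cycle(one_way))      # None
```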
Lastly, the foundational principle that the ontology should only store factual information will be re-discussed. It becomes apparent upon close examination that certain aspects, such as intermediate reactions, elude effective modeling, implying inherent limitations to the factual data that can be represented. Consequently, users intending to construct a knowledge graph using this ontology bear the responsibility to include information adequate for their current level of detail.
4 Summary and Outlook
Different goals can be achieved with the ontology and the code provided for it. On the one hand, a knowledge graph can be created for reaction and catalysis research that helps researchers answer their questions quickly and easily, similar to the competency questions described. On the other hand, the ontology can also be used in automation processes or in the semantic network for the interaction of and with digital agents.
The use of the ontology and its application of left-hand-side logic show great potential to simplify automation but also to enable digital process intensification through extensions to adjacent domains. Domains into which these semantic structures can be introduced are, for example, experiment planning or the modeling of chemical processes. Extensions could include, for example, a chemical unit operation ontology that automatically suggests separation processes and the associated media and process conditions.
As shown by the Python implementation, the left-hand-side logic and its GCA can be introduced automatically, which could find application in future uses of the semantics. However, since reasoning over large knowledge graphs can currently still be very computationally intensive, it may not be advisable to establish a single knowledge graph encompassing the entire realm of reaction and catalysis. Instead, by tailoring specialized knowledge graphs to specific domains within catalysis, such as biocatalysis, computational resources can be optimized. For instance, a biocatalysis-focused knowledge graph may choose not to include modeling of heterogeneous catalysts to conserve computing capacity. Some of the explicit GCA could be introduced into the ontology via a combination of semantic artifacts, for example by creating mergeable graphs via SPARQL queries or by inference via SHACL rules.
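The idea of a domain-specific knowledge graph can be sketched with plain subject-predicate-object triples. The example below is a minimal illustration only: the function `extract_domain_graph`, the individuals, and the class and property names are hypothetical, and a real workflow would more likely operate on an RDF store, for example via a SPARQL CONSTRUCT query.

```python
def extract_domain_graph(triples, excluded_classes):
    """Drop every individual typed with an excluded class and all triples touching it."""
    excluded = {
        s for s, p, o in triples
        if p == "rdf:type" and o in excluded_classes
    }
    return [t for t in triples if t[0] not in excluded and t[2] not in excluded]


# Hypothetical mixed graph containing a biocatalyst and a heterogeneous catalyst.
triples = [
    ("lipase_1", "rdf:type", "Enzyme"),
    ("zeolite_1", "rdf:type", "HeterogeneousCatalyst"),
    ("rxn_1", "hasCatalyst", "lipase_1"),
    ("rxn_2", "hasCatalyst", "zeolite_1"),
]

# Keep only what a biocatalysis-focused graph would need.
biocatalysis_graph = extract_domain_graph(triples, {"HeterogeneousCatalyst"})
for triple in biocatalysis_graph:
    print(triple)
```

The smaller graph contains fewer individuals for a reasoner to classify, which is the resource saving the text refers to.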
In the future, this approach could be used in connected databases such as DataVerses to enhance the value of the stored data through automated ontology-based classification. Furthermore, querying of the resulting knowledge graphs would be improved, as the GCAs implement relations in a structured way. To extend the presented approach further, LinkML [43] should be implemented for data uptake. LinkML maps the data directly to ontology concepts, which streamlines the overall data workflow. Finally, this would enable automated classification of metadata in the realm of catalysis and reaction engineering.
Code availability
All code and data addressed in this work are available in a GitHub repository. [44]
References
Wulf C, Beller M, Boenisch T, Deutschmann O, Hanf S, Kockmann N, Kraehnert R, Oezaslan M, Palkovits S, Schimmler S, Schunk SA, Wagemann K, Linke D (2021) A unified research data infrastructure for catalysis research – challenges and concepts. ChemCatChem 13(14):3223–3236. https://doi.org/10.1002/cctc.202001974
Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten J-W, Da Santos Silva LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJG, Groth P, Goble C, Grethe JS, Heringa J, ’t Hoen PAC, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone S-A, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B (2016) The FAIR guiding principles for scientific data management and stewardship. SciData 3:160018. https://doi.org/10.1038/sdata.2016.18
Kontokostas D, Knublauch H (2017) Shapes constraint language (SHACL). W3C recommendation, W3C. https://www.w3.org/TR/2017/REC-shacl-20170720/. Accessed 15 Feb 2024
(2013) SPARQL 1.1 overview. W3C recommendation, W3C. https://www.w3.org/TR/2013/REC-sparql11-overview-20130321/. Accessed 15 Feb 2024
Horrocks I, Patel-Schneider PF, McGuinness DL, Welty CA (2010) OWL: a description-logic-based ontology language for the semantic web. In: Baader F, Calvanese D, McGuinness DL, Nardi D, Patel-Schneider PF (eds) The Description Logic Handbook. Cambridge University Press, Cambridge, pp 458–486. https://doi.org/10.1017/CBO9780511711787.016
Lamy J-B (2017) Owlready: ontology-oriented programming in Python with automatic classification and high level constructs for biomedical ontologies. Artif Intell Med 80:11–28. https://doi.org/10.1016/j.artmed.2017.07.002
Hitzler P, Patel-Schneider P, Krötzsch M, Rudolph S, Parsia B (2012) OWL 2 web ontology language primer (second edition). W3C recommendation, W3C. https://www.w3.org/TR/2012/REC-owl2-primer-20121211/. Accessed 15 Feb 2024
Deborah LJ, Karthika R, Audithan S, Bala BK (2015) Enhanced expressivity using deontic logic and reuse measure of ontologies. Procedia Comput Sci 54:318–326. https://doi.org/10.1016/j.procs.2015.06.037
Krötzsch M, Patel-Schneider P, Hitzler P, Parsia B, Rudolph S (2012) OWL 2 web ontology language primer (second edition). W3C recommendation, W3C. https://www.w3.org/TR/2012/REC-owl2-primer-20121211/. Accessed 15 Feb 2024
Baader F, Calvanese D, McGuinness DL, Nardi D, Patel-Schneider PF (eds) (2010) The description logic handbook. Cambridge University Press, Cambridge https://doi.org/10.1017/CBO9780511711787
Hutchison D, Kanade T, Kittler J, Kleinberg JM, Mattern F, Mitchell JC, Naor M, Nierstrasz O, Pandu Rangan C, Steffen B, Sudan M, Terzopoulos D, Tygar D, Vardi MY, Weikum G, Antoniou G, Boley H (2004) Rules and rule markup languages for the semantic web vol 3323. Springer Berlin Heidelberg, Berlin, Heidelberg https://doi.org/10.1007/b102922
Horrocks I, Boley H, Tabet S, Grosof B, Dean M, Patel-Schneider PF (2004) SWRL: a semantic web rule language combining OWL and RuleML. W3C member submission, W3C. https://www.w3.org/submissions/SWRL/ (Created 05.2004). Accessed 15 Feb 2024
Knublauch H SPIN: overview and motivation. https://www.w3.org/submissions/spin-overview/. Accessed 15 Feb 2024
Bogaerts B, Jakubowski M, van den Bussche J SHACL: a description logic in disguise. http://arxiv.org/pdf/2108.06096.pdf. Accessed 15 Feb 2024
Knublauch H SHACL and OWL compared. https://spinrdf.org/shacl-and-owl.html. Accessed 15 Feb 2024
Dentler K, Cornet R, ten Teije A, de Keizer N (2011) Comparison of reasoners for large ontologies in the OWL 2 EL profile. Semant Web 2(2):71–87. https://doi.org/10.3233/SW-2011-0034
Sperberg-McQueen M, Yergeau F, Maler E, Paoli J, Bray T (2008) Extensible markup language (XML) 1.0 (fifth edition). W3C recommendation, W3C. https://www.w3.org/TR/2008/REC-xml-20081126/. Accessed 15 Feb 2024
Raimond Y, Schreiber G (2014) RDF 1.1 primer. W3C note, W3C. https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/. Accessed 15 Feb 2024
Behr AS, Borgelt H, Kockmann N (2024) Ontologies4cat: investigating the landscape of ontologies for catalysis research data management. J Cheminform. https://doi.org/10.1186/s13321-024-00807-2
Arp R, Smith B, Spear AD (2015) Building ontologies with basic formal ontology. Massachusetts Institute of Technology, Cambridge, Massachusetts
Batchelor C (2012) Chemical Reactions Ontology (RXNO). https://github.com/rsc-ontologies/rxno. Accessed 15 Feb 2024
Batchelor C (2012) Molecular Process Ontology (MOP). https://github.com/rsc-ontologies/rxno. Accessed 15 Feb 2024
ChEBI (2016) Improved services and an expanding collection of metabolites. Nucleic Acids Res 44(D1):1214–1219. https://doi.org/10.1093/nar/gkv1031
OBO Foundry. http://obofoundry.org/. Accessed 15 Feb 2024
Mungall C, Matentzoglu N, Balhoff J, Osumi-Sutherland D, Duncan B, Tan pgaudet S, Hoyt CT, Pilgrim C, Overton JA, Caron Lauren A, Harris N, Moxon S, lschriml Vasilevsky N, Toro S, Goutte-Gattat D, Brush M, Touré V, Bretaudeau A, Cain S, Haendel M, Zhang diatomsRcool B, Dowland C, Dooley D, actions-user Hammock J (2023) oborel/obo-relations: 2023-08-18 Release. Zenodo. https://doi.org/10.5281/zenodo.8263469
Jackson R, Matentzoglu N, Overton JA, Vita R, Balhoff JP, Buttigieg PL, Carbon S, Courtot M, Diehl AD, Dooley DM, Duncan WD, Harris NL, Haendel MA, Lewis SE, Natale DA, Osumi-Sutherland D, Ruttenberg A, Schriml LM, Smith B, Stoeckert CJ, Vasilevsky NA, Walls RL, Zheng J, Mungall CJ, Peters B (2021) OBO foundry in 2021: operationalizing open data principles to evaluate ontologies. Database. https://doi.org/10.1093/database/baab069
Hashibon A, Ghedini E, Schmitz G, Goldbeck G, Friis J (2022) Elementary Multiperspective Material Ontology. EMMC ASBL. http://emmo.info/emmo
Sattler U, Parsia B (2012) OWL 2 web ontology language data range extension: linear equations. W3C note, W3C. https://www.w3.org/TR/2012/NOTE-owl2-dr-linear-20121211
Hammar K (2014) Ontology design patterns: improving findability and composition. In: Blomqvist E, Troncy R, Papadakis I, Tordai A, Presutti V, Sack H (eds) The Semantic Web: ESWC 2014 Satellite Events. Lecture Notes in Computer Science, vol 8798. Springer, Cham, pp 3–13 https://doi.org/10.1007/978-3-319-11955-7_1
Kellogg G, Tomaszuk D (2023) RDF 1.2 turtle. W3C working draft, W3C. https://www.w3.org/TR/2023/WD-rdf12-turtle-20231104/. Accessed 15 Feb 2024
Musen MA (2015) The Protégé project: a look back and a look forward. AI Matters 1(4):4–12. https://doi.org/10.1145/2757001.2757003
Sattler U, Stevens R Being complex on the left-hand-side: general concept inclusions. https://ontogenesis.knowledgeblog.org/1288/. Accessed 15 Feb 2024
Medeiros D (2021) DWSIM – open source process simulator. https://dwsim.org/ (Version 6.5.3). Accessed 15 Feb 2024
Pleiss J (2021) Standardized data, scalable documentation, sustainable storage – EnzymeML as a basis for FAIR data management in biocatalysis. ChemCatChem 13(18):3909–3913. https://doi.org/10.1002/cctc.202100822
Juty N, Le Novère N (2013) Systems biology ontology. In: Dubitzky W, Wolkenhauer O, Cho K-H, Yokota H (eds) Encyclopedia of Systems Biology. Springer, New York, p 2063 https://doi.org/10.1007/978-1-4419-9863-7
Behr AS, Surkamp J, Abbaspour E, Häußler M, Lütz S, Pleiss J, Kockmann N, Rosenthal K (2024) Fluent integration of laboratory data into biocatalytic process simulation using EnzymeML, DWSIM, and ontologies. Processes 12(3):597. https://doi.org/10.3390/pr12030597
Range J, Bergmann F, Rohwer J, benjaminhadzovic, Swainston N, AnnaReisch, Dienhart H, Pleiss J, Max Häußler SL (2023) EnzymeML/PyEnzyme: v1.1.5. Zenodo. https://doi.org/10.5281/zenodo.10156616
Arndt S, Farnbacher B, Fuhrmans M, Hachinger S, Hickmann J, Hoppe N, Horsch MT, Iglezakis D, Karmacharya A, Lanza G, Leimer S, Munke J, Terzijska D, Theissen-Lipp J, Wiljes C, Windeck J (2023) Metadata4Ing: an ontology for describing the generation of research data within a scientific activity. Zenodo. https://doi.org/10.5281/zenodo.5957103
Jackson RC, Balhoff JP, Douglass E, Harris NL, Mungall CJ, Overton JA (2019) ROBOT: a tool for automating ontology workflows. BMC Bioinform 20(1):407. https://doi.org/10.1186/s12859-019-3002-3
Behr AS, Borgelt H Reac4Cat ontology with examples. https://github.com/AleSteB/Reac4Cat/blob/main/reac4cat_with_examples.owl. Accessed 15 Feb 2024
Glimm B, Horrocks I, Motik B, Stoilos G, Wang Z (2014) HermiT: an OWL 2 reasoner. J Autom Reason 53(3):245–269. https://doi.org/10.1007/s10817-014-9305-1
Parsia B, Matentzoglu N, Gonçalves RS, Glimm B, Steigmiller A (2017) The OWL reasoner evaluation (ORE) 2015 competition report. J Autom Reason 59(4):455–482. https://doi.org/10.1007/s10817-017-9406-8
Moxon S, Unni D, Vaidya G, Hegde H, Patil S, Schafer K, Kalita P, Harris N, Putman T, Solbrig H, Haendel M, Mungall C LinkML. https://github.com/linkml/linkml. Accessed 15 Feb 2024
Behr AS, Borgelt H Reac4Cat Ontology Repository. https://github.com/AleSteB/Reac4Cat. Accessed 15 Feb 2024
Acknowledgements
A.S.B. thanks the networking program ‘Sustainable Chemical Synthesis 2.0’ (SusChemSys 2.0) for the support and fruitful discussions across disciplines.
Funding
The research was funded via the Deutsche Forschungsgemeinschaft (DFG) as part of the Nationale Forschungsdateninfrastruktur (NFDI) initiative (grant No.: NFDI/2-1-2021) for catalysis Research (NFDI4Cat).
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
A.S.B.: Conceptualization, Data Curation, Methodology, Software, Validation, Investigation, Writing – Original Draft, Writing – Review & Editing, Visualization. H.B.: Conceptualization, Methodology, Software, Validation, Investigation, Writing – Original Draft, Writing – Review & Editing, Visualization. N.K.: Conceptualization, Funding acquisition, Supervision, Writing – Review & Editing. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Alexander S. Behr and Hendrik Borgelt contributed equally to this work.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Behr, A.S., Borgelt, H. & Kockmann, N. Reac4Cat-Ontology: Harnessing the Power of Ontological Description Logic in Catalysis Research as a Practical Approach to Knowledge Inferences. Datenbank Spektrum 24, 139–150 (2024). https://doi.org/10.1007/s13222-024-00476-3
DOI: https://doi.org/10.1007/s13222-024-00476-3