Reasoning & Inference

This chapter describes Stardog’s reasoning capabilities and how to use them. This page provides an overview of those capabilities.

Page Contents
  1. Overview
  2. What is reasoning?
  3. How does reasoning work?
    1. Why Query Rewriting
  4. Stardog Reasoners
    1. Blackout Reasoner
    2. Stride Reasoner (Alpha)
  5. Query Answering with Reasoning
  6. Reasoning Schemas
    1. Schema Versioning
  7. Chapter Contents

Overview

In this chapter, we describe how to use Stardog’s reasoning capabilities; we address some common problems and known issues. We also describe Stardog’s approach to query answering with reasoning in some detail, as well as a set of guidelines that contribute to efficient query answering with reasoning. Throughout this chapter, the terms “reasoning” and “inference” are used interchangeably to mean the same capability; that is, the ability to infer implicit knowledge from explicit data. Similarly, “reasoner” and “inference engine” are used interchangeably to refer to the Stardog component that implements this capability.

Stardog performs reasoning in a lazy and late-binding fashion: it does not materialize inferences; rather, reasoning is performed at query time. This means inferences are visible in query results, but they are not explicitly stored within the Stardog database. This is how Stardog can do reasoning over virtual graphs. We start with a high-level introduction to reasoning with some examples and explain the details of query-time reasoning and how to use reasoning for queries.

What is reasoning?

At the most basic level, reasoning is the process of inferring new types and relationships from existing data, given a schema. A schema, sometimes called a “data model”, “ontology”, or “TBox”, is a set of RDFS or OWL axioms plus user-defined rules. Schemas contain the information the reasoner needs to compute inferences. As a result, queries with reasoning will return additional results compared to queries that do not use reasoning.

Let’s start with a simple example where we define a class Person and its two subclasses Employee and Customer in our schema. We also have our instance data where we see one instance for each subclass. To follow along with this example, insert the schema and data like so:

INSERT DATA {
    GRAPH <tag:stardog:api:context:schema> {
        :Person a owl:Class .
        :Customer a owl:Class ;
            rdfs:subClassOf :Person .
        :Employee a owl:Class ;
            rdfs:subClassOf :Person .
    }
} ;
INSERT DATA {
   :Alice a :Employee .
   :Bob a :Customer .
}

The following query retrieves all Person instances:

SELECT ?person {
   ?person a :Person
}

This query would return no results by default since there are no explicit :Person-type triples in the data. But enabling reasoning for this query will return the results:

person
:Alice
:Bob

The following example shows a user-defined rule that infers a :coworker relationship between two people if they work for the same organization:

IF {
   ?person1 :worksFor ?organization .
   ?person2 :worksFor ?organization .
   FILTER (?person1 != ?person2)
}
THEN {
   ?person1 :coworker ?person2 .
}
   
:Alice :worksFor :ACME .
:Charlie :worksFor :ACME .

If we enable reasoning, we can execute any of the following queries:

SELECT ?person { ?person :coworker :Alice }
SELECT ?person { :Alice :coworker ?person }
SELECT ?person ?coworker { ?person :coworker ?coworker }

and get Alice and Charlie as coworkers in the results.
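
To make the meaning of the rule concrete, here is a minimal Python sketch that evaluates the rule body naively over the two :worksFor triples above. It only illustrates what the rule entails; it is not how Stardog evaluates rules (Stardog rewrites queries rather than deriving triples), and the function name infer_coworkers is hypothetical.

```python
# The two explicit triples from the example above, as (subject, predicate, object).
triples = {
    (":Alice", ":worksFor", ":ACME"),
    (":Charlie", ":worksFor", ":ACME"),
}


def infer_coworkers(data):
    """Hypothetical helper: apply the rule body as a self-join on the organization."""
    inferred = set()
    for person1, pred1, org1 in data:
        for person2, pred2, org2 in data:
            same_org = pred1 == pred2 == ":worksFor" and org1 == org2
            # The inequality check mirrors FILTER (?person1 != ?person2) in the rule.
            if same_org and person1 != person2:
                inferred.add((person1, ":coworker", person2))
    return inferred


coworkers = infer_coworkers(triples)
for triple in sorted(coworkers):
    print(*triple)
```

Because the rule body is symmetric in ?person1 and ?person2, the relationship is derived in both directions, which is why all three queries above return results.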

How does reasoning work?

Stardog computes inferences as needed, on-the-fly, using a query rewriting approach: Stardog rewrites the user’s query with respect to a schema, and then executes the resulting expanded query against the data in the normal way. This process is completely automated and requires no intervention from the user.

If we consider the example schema above, the input query:

SELECT ?person {
   ?person a :Person
}

would be rewritten to a query equivalent to:

SELECT DISTINCT ?person {
    { ?person a :Person } UNION
    { ?person a :Customer } UNION
    { ?person a :Employee }
}

There are various optimizations involved in the query rewriting process that would simplify the rewritten query. For example, it is very common that some classes will have no explicit instances in the data (think about abstract super classes). In our very simple dataset above, there are no instances with an explicit Person type. The reasoner will detect these cases, and in this example, not include the pattern ?person a :Person in the expanded query.

The expanded queries created by Stardog are not directly expressed in SPARQL syntax, but you can see the effect of reasoning in the query plans. The query plan for the above example would look like this:

Distinct [#2]
`─ Projection(?person) [#2]
   `─ Union [#2]
      +─ Scan[POSC](?person, rdf:type, :Customer) [#1]
      `─ Scan[POSC](?person, rdf:type, :Employee) [#1]
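
The rewriting step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Stardog's implementation: the schema is reduced to a subclass map, the set of classes with explicit instances is given up front, and the names subclasses_of and rewrite_type_query are hypothetical.

```python
# Schema from the example above: subclass -> direct superclass.
SUBCLASS_OF = {":Customer": ":Person", ":Employee": ":Person"}

# Assumption: classes known to have at least one explicit instance in the data.
CLASSES_WITH_INSTANCES = {":Customer", ":Employee"}


def subclasses_of(cls):
    """Return cls plus every class that is transitively a subclass of it."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUBCLASS_OF.items():
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result


def rewrite_type_query(var, cls):
    """Expand `var a cls` into a UNION, skipping classes without explicit instances."""
    candidates = sorted(subclasses_of(cls) & CLASSES_WITH_INSTANCES)
    branches = [f"{{ {var} a {c} }}" for c in candidates]
    return f"SELECT DISTINCT {var} {{ " + " UNION ".join(branches) + " }"


rewritten = rewrite_type_query("?person", ":Person")
print(rewritten)
```

The :Person branch is dropped because no instance is explicitly typed as :Person, which matches the two scans in the query plan shown above.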

Reasoning in Stardog is primarily founded on the Datalog formalism. RDFS and OWL axioms along with user-defined rules are (basically) Datalog rules over a graph. Stardog translates the schemas into an internal Datalog representation and then performs the query rewriting process.

Why Query Rewriting

Query rewriting has several advantages over materialization. The query rewriting approach allows for maximum flexibility while maintaining excellent performance; you only pay for the reasoning you use - no more and no less. In materialization, on the other hand, the data gets expanded with respect to the schema, not with respect to any actual query. And it’s the data – all of the data – that gets expanded, whether any subsequent query actually requires reasoning or not. The schema is used to generate new triples, typically when data is added or removed from the system. However, materialization introduces several thorny issues:

  • data freshness. Materialization has to be performed every time the data or the schema changes. This is particularly unsuitable for applications where the data changes frequently or where data is stored externally and accessed via a virtualization layer.
  • data size. Depending on the schema, materialization can significantly increase the size of the data. The cost of this data size blowup may be applied to every query (in terms of increased I/O).
  • fixed schema. Materialization is computed based on a fixed schema. If there are different applications that require different kinds of inference rules, there will not be the flexibility to switch between different schemas.
  • resources. Depending on the size of the original data and the complexity of the schema, materialization may be computationally expensive. And truth maintenance, which materialization requires, is always computationally expensive, especially after deletions.
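
For contrast, the following Python sketch shows what materialization would do with the Person example from earlier in this chapter. It is a toy forward-chaining loop over rdfs:subClassOf only, written for illustration; the function name materialize is hypothetical and nothing here reflects a Stardog feature, since Stardog does not materialize inferences.

```python
# Schema and instance data from the Person example (subclass -> superclass).
SUBCLASS_OF = {":Customer": ":Person", ":Employee": ":Person"}
data = {(":Alice", "a", ":Employee"), (":Bob", "a", ":Customer")}


def materialize(triples, subclass_of):
    """Forward-chain over the subclass map until no new type triples appear."""
    closure = set(triples)
    frontier = set(triples)
    while frontier:
        derived = {
            (s, "a", subclass_of[o])
            for s, p, o in frontier
            if p == "a" and o in subclass_of
        }
        # Only triples not seen before need another round.
        frontier = derived - closure
        closure |= frontier
    return closure


materialized = materialize(data, SUBCLASS_OF)
print(len(data), "explicit triples ->", len(materialized), "after materialization")
```

Even in this tiny case the stored data doubles, and the loop would have to be rerun (or the derived triples retracted) whenever the data or the schema changes.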

Stardog Reasoners

As of version 9.0, Stardog comes with two different reasoner implementations, both providing query-time reasoning capability with some differences:

  • Blackout is the more mature reasoner implementation that supports more of the RDFS and OWL specifications but has limitations with respect to user-defined rules.
  • Stride (alpha) is the next-generation reasoner implementation that supports more expressive user-defined rules, including negation and aggregation, but only a smaller subset of RDFS and OWL.

Users can switch between the two reasoner implementations by setting the database configuration option reasoning.stride to true or false. The default value for this option is false, which means the Blackout reasoner will be used. No other changes are required after setting this option, and the corresponding reasoner will be used automatically behind the scenes.

Blackout Reasoner

The Blackout reasoner supports the expressivity of OWL 2 profiles, which means schemas can contain complex OWL axioms. Furthermore, OWL axioms can be filtered automatically by setting the reasoning.type database option. The default value of reasoning.type is SL, and for the most part, users don’t need to worry too much about which reasoning type is necessary since SL covers all of the OWL 2 profiles, as well as user-defined rules. This value may be set to a different value:

  • RDFS for RDF Schema, mainly subclass, subproperty, domain, and range axioms
  • QL for the OWL 2 QL axioms
  • RL for the OWL 2 RL axioms
  • EL for the OWL 2 EL axioms
  • SL for a combination of RDFS, QL, RL, and EL axioms, plus SWRL rules.

Any axiom outside the selected type will be ignored by the reasoner.

The following table lists patterns (and the corresponding restrictions) which can be used in the body of a user-defined rule supported by Blackout:

Rule Features | Limitations
Triple patterns | No variables in the predicate position, or in the object position if the predicate is rdf:type. No property path operators *, + or ?.
FILTER | EXISTS, NOT EXISTS, or non-deterministic functions (e.g. RAND) cannot be used in filters.
BIND | EXISTS, NOT EXISTS, or non-deterministic functions (e.g. RAND) cannot be used in bind expressions.
UNION | No limitations.

In addition to the above restrictions, Blackout supports only limited forms of recursive rules: a recursive rule is supported only if it can be translated to a SPARQL property path.

Stride Reasoner (Alpha)

The Stride reasoner was introduced in Stardog 9.0 and is currently in alpha state. It is designed to support more expressive rules and exhibit more robust performance, but it is currently not recommended for production usage.

Stride only supports the following RDFS and OWL constructs and ignores any other axiom, regardless of the reasoning.type option value:

Terms | Description
rdfs:subClassOf, owl:equivalentClass | Class hierarchies and inheritance between named classes
rdfs:subPropertyOf, owl:equivalentProperty | Property hierarchies and inheritance between properties
owl:inverseOf, owl:SymmetricProperty | Inverse properties
owl:TransitiveProperty | Transitive properties

The following table lists patterns (and the corresponding restrictions) which can be used in the body of a user-defined rule supported by Stride:

Rule Features | Limitations
Triple patterns | No variables in the predicate position, or in the object position if the predicate is rdf:type. No property path operators *, + or ?.
FILTER | Non-deterministic functions (e.g. RAND) cannot be used in filters.
BIND | Non-deterministic functions (e.g. RAND) cannot be used in bind expressions.
UNION | No limitations.
VALUES | No UNDEF values.
GROUP BY | No cyclic dependencies between rules involving GROUP BY.

Stride behaves differently than Blackout if there is an invalid rule in the schema. Blackout logs such problematic rules or axioms and performs reasoning with the valid rules and axioms. This might cause subtle issues, as errors in the Stardog log can easily go unnoticed. Stride, on the other hand, will refuse to do any reasoning if there is an invalid rule or axiom, requiring the user to fix the issue first. Note that if multiple schemas are being used, errors in one schema will not affect reasoning with other schemas. Rules causing problems can be moved to named graphs outside the schema graphs so they can be fixed without preventing reasoning with other rules.

In its alpha state, the Stride reasoner does not support reasoning for triple patterns that have a variable in the predicate position, or that have rdf:type in the predicate position and a variable in the object position. Such triple patterns will be answered without reasoning, as if the #pragma reasoning off hint had been used for that triple pattern. This limitation will be lifted in a future release.

Query Answering with Reasoning

As explained above, Stardog uses a query-time reasoning approach. This means you do not need to do anything up front when you create your database or add data to it if you want to use reasoning. You merely need to enable reasoning for your queries. All of Stardog’s interfaces (API, network, and CLI) support reasoning during query evaluation. All types of queries (that is, SELECT, ASK, CONSTRUCT, PATHS, DESCRIBE, VALIDATE) can be evaluated with reasoning. When reasoning is enabled, it applies to all query patterns in WHERE and VIA blocks.

When reasoning is enabled, the query execution will take into account the axioms and rules in the schema. There is one default schema associated with a database, but there can also be multiple named schemas (as explained in the next section). Reasoning queries will use the default schema by default, but a different reasoning schema can be selected for queries.

When reasoning is enabled for a query, it is possible to selectively disable reasoning for certain parts of the query using the #pragma reasoning hint. See Reasoning Query Hints.

CLI

In order to evaluate queries using reasoning via the command line, use the --reasoning flag in the query execute command:

$ stardog query execute --reasoning myDB "SELECT ?s { ?s a :Employee }"

This will use the default reasoning schema for the database. A named schema can be specified using the --schema option:

$ stardog query execute --schema schema-1.0 myDB "SELECT ?s { ?s a :Employee }"

HTTP

For HTTP, the reasoning flag is specified either with the other HTTP request parameters:

$ curl -u admin:admin -X GET "http://localhost:5820/myDB/query?reasoning=true&query=..."

or, as a segment in the URL:

$ curl -u admin:admin -X GET "http://localhost:5820/myDB/query/reasoning?query=..."

See the HTTP API for a detailed look at how to perform a SPARQL query with reasoning enabled.
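
The first curl call above can also be assembled programmatically. The following Python sketch uses only the standard library and builds the request without sending it. The endpoint, database name, and admin:admin credentials are copied from the curl example; the helper name build_reasoning_request and the Accept header value are assumptions made for illustration.

```python
import base64
import urllib.parse
import urllib.request


def build_reasoning_request(base_url, database, query, user, password):
    """Hypothetical helper: build a GET request with reasoning=true as a parameter."""
    params = urllib.parse.urlencode({"reasoning": "true", "query": query})
    url = f"{base_url}/{database}/query?{params}"
    request = urllib.request.Request(url, method="GET")
    # HTTP Basic authentication, equivalent to curl's -u user:password.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    # Assumption: ask for SPARQL JSON results; adjust to the format you need.
    request.add_header("Accept", "application/sparql-results+json")
    return request


request = build_reasoning_request(
    "http://localhost:5820", "myDB", "SELECT ?s { ?s a :Employee }", "admin", "admin"
)
print(request.full_url)
# To run the query against a live server: urllib.request.urlopen(request)
```

Encoding the query with urlencode avoids the manual escaping that the "query=..." placeholder in the curl examples leaves to the reader.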

Programmatically

See the chapter on Programming for the details of how to use reasoning in the various programming languages Stardog supports.

Reasoning Schemas

A reasoning schema is simply one or more named graphs that contain RDFS/OWL axioms and user-defined rules. The schema elements stored in the corresponding named graphs are automatically identified and extracted by Stardog. There is a default schema associated with each database, which is configured by the reasoning.schema.graphs database configuration option. The default value for this option is the special named graph tag:stardog:api:context:schema, which is initially empty, so the reasoner will see an empty schema. You can load your schema into this named graph or change this option to point to a named graph that you create.

It is best practice to store your reasoning schema in specific named graphs and specify the named graphs explicitly in database configuration. This makes management of schemas easier and allows Stardog to extract schema elements more efficiently.

Prior to Stardog version 10, the default reasoning schema was set to be tag:stardog:api:context:local which is a built-in wildcard for all local graphs, including the default graph. Using wildcards for reasoning schemas is deprecated in Stardog 10. Wildcards will continue working for Stardog 10 but support for reasoning schema wildcards is scheduled to be removed in version 11.

No additional operations are needed when schema named graphs are updated. Stardog will automatically detect when schemas are updated and use the new versions of schemas going forward. Since schemas are represented as RDF triples, loading and unloading schemas into Stardog is done by following the regular instructions for adding data.

There are certain use cases where one might need to use different schemas to answer different queries. Some examples:

  • There are two different versions of a schema that evolved over time. Older legacy applications need to use the previous version of the schema, whereas the newer applications need to use the newer version.
  • Different applications require different rules and business logic; e.g., the threshold for a concept like Low or High might change based on the context.
  • There could be a very large number of axioms and rules in the domain that can be partitioned into smaller schema subsets for performance reasons.

Starting with version 7.0, Stardog supports schema multi-tenancy: reasoning with multiple schemas and specifying a schema to be used for answering a query. Each schema has a name and a set of named graphs associated with it. When the schema is selected for answering a query, the axioms and the rules stored in the associated graphs will be taken into account. A named schema can be selected for a query using the --schema parameter in the query execute command:

$ stardog query execute --schema employeeSchema myDB "SELECT ?s { ?s a :Employee }"

When the --schema parameter is used, the --reasoning parameter does not need to be specified and will have no effect. Using the --reasoning flag without a --schema parameter is equivalent to specifying --schema default.

The named schemas are defined via the reasoning.schemas configuration option, which is a set of schema name and graph IRI pairs. The CLI and the Java API provide convenience functionality to manage schemas. The named graphs for a new or an existing schema can be set as follows (using stored namespaces or full IRIs):

$ stardog reasoning schema --add employeeSchema --graphs :employeeGraph :personGraph -- myDB

The schemas can be removed using the reasoning schema command with the --remove flag. The --list option will list all the defined schemas and their named graphs:

$ stardog reasoning schema --list myDB
+----------------+----------------------------------+
|     Schema     |              Graphs              |
+----------------+----------------------------------+
| default        | <tag:stardog:api:context:schema> |
| employeeSchema | :personGraph, :employeeGraph     |
| customerSchema | :personGraph, :customerGraph     |
| personSchema   | :personGraph                     |
+----------------+----------------------------------+

Stardog does not follow ontology owl:imports statements automatically. Any schema information that is relevant for reasoning should be loaded into Stardog explicitly.


Schema Versioning

Stardog 10 introduces a new capability to track versions of schema graphs automatically. Tracking changes to schema graphs is an optimization used by the reasoner to avoid reloading and reprocessing the schema unless the schema graphs have been updated. Schema versioning is also exposed to end users, so external applications can easily check whether a schema has been updated without inspecting the contents of the schema graphs.

The database option reasoning.schema.versioning.enabled needs to be set to true for schema versioning to be active. When this option is enabled, Stardog will automatically compute a 64-bit hash from the contents of the reasoning schema graphs. This hash is updated every time any of the reasoning schemas are updated. The hash value for a schema can be checked to determine if any of the associated named graphs have been modified.

The configuration option reasoning.precompute.non_empty.predicates should be set to false before reasoning.schema.versioning.enabled can be set to true.

Care should be taken when enabling schema versioning with very large schemas, e.g. if there are millions of triples in schema graphs, especially if the schema contents are updated frequently. Computing the version hash requires iterating over the schema graphs, so performing this operation frequently on large graphs could have a noticeable performance overhead. With smaller schemas there should not be any noticeable performance overhead.

Schema version hashes can be retrieved using the following SPARQL query:

SELECT ?schema ?version {
   SERVICE stardog:schema:service {
        [] stardog:schema:schema ?schema ;
           stardog:schema:version ?version
   }
}

This query will return all the schemas associated with the database and their version hashes. Different filters can be used within the query to retrieve the version hash for specific schemas. The name “default” can be used for the default schema.

The following example shows how updating any named graph associated with a schema will cause the version to be updated as a result. But updating a non-schema graph will not change the schema version.

$ stardog reasoning schema --list myDB
+---------+--------------------------------+
| Schema  |             Graphs             |
+---------+--------------------------------+
| default | :customerGraph, :employeeGraph |
+---------+--------------------------------+
$ stardog query myDB 'SELECT ?schema ?version {
   SERVICE stardog:schema:service {
        [] stardog:schema:schema ?schema ;
           stardog:schema:version ?version
   }
}'
+-----------+--------------------+
|  schema   |      version       |
+-----------+--------------------+
| "default" | "defce74405837cbb" |
+-----------+--------------------+

Query returned 1 results in 00:00:00.210
$ stardog query myDB 'INSERT DATA { GRAPH :customerGraph { :Customer a owl:Class } }'
Transaction committed successfully in 00:00:00.156
$ stardog query myDB 'SELECT ?schema ?version {
   SERVICE stardog:schema:service {
        [] stardog:schema:schema ?schema ;
           stardog:schema:version ?version
   }
}'
+-----------+------------------+
|  schema   |     version      |
+-----------+------------------+
| "default" | "aa14af549c5661" |
+-----------+------------------+

Query returned 1 results in 00:00:00.226
$ stardog query myDB 'INSERT DATA { GRAPH :employeeGraph { :Employee a owl:Class } }'
Transaction committed successfully in 00:00:00.137
$ stardog query myDB 'SELECT ?schema ?version {
   SERVICE stardog:schema:service {
        [] stardog:schema:schema ?schema ;
           stardog:schema:version ?version
   }
}'
+-----------+--------------------+
|  schema   |      version       |
+-----------+--------------------+
| "default" | "b0d1a9c49cc9e825" |
+-----------+--------------------+

Query returned 1 results in 00:00:00.214
$ stardog query myDB 'INSERT DATA { GRAPH :dataGraph { :JohnDoe a :Customer } }'
Transaction committed successfully in 00:00:00.148
$ stardog query myDB 'SELECT ?schema ?version {
   SERVICE stardog:schema:service {
        [] stardog:schema:schema ?schema ;
           stardog:schema:version ?version
   }
}'
+-----------+--------------------+
|  schema   |      version       |
+-----------+--------------------+
| "default" | "b0d1a9c49cc9e825" |
+-----------+--------------------+

Query returned 1 results in 00:00:00.204

Updating the reasoning schema configuration to add or remove a non-empty named graph will also cause the schema version to change, since the contents of a schema are the union of all its named graphs.
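
An external application could use these hashes to decide when to refresh anything it has derived from a schema. The sketch below assumes the application has already run the version query and collected the results into dictionaries mapping schema name to version hash; the function name changed_schemas is hypothetical, and the hash strings are copied from the transcript above.

```python
def changed_schemas(previous, current):
    """Hypothetical helper: names of schemas that are new or whose hash differs."""
    return sorted(
        name for name, version in current.items()
        if previous.get(name) != version
    )


# Version hashes as reported at each step of the transcript above.
initial = {"default": "defce74405837cbb"}
after_customer_update = {"default": "aa14af549c5661"}
after_employee_update = {"default": "b0d1a9c49cc9e825"}
after_data_update = {"default": "b0d1a9c49cc9e825"}

# Updates to schema graphs change the version.
print(changed_schemas(initial, after_customer_update))
print(changed_schemas(after_customer_update, after_employee_update))
# An update to a non-schema graph leaves the version unchanged.
print(changed_schemas(after_employee_update, after_data_update))
```

Comparing hashes this way requires one small query per check and no inspection of the schema graphs themselves.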

Chapter Contents