RDF-DEV Community Group

RDF-DEV, for developments relating to W3C RDF, including collaboration around applications, schemas, and past/present/future related standards. Successor to SWIG/RDFIG.

w3c/rdf-star

Note: Community Groups are proposed and run by the community. Although W3C hosts these conversations, the groups do not necessarily represent the views of the W3C Membership or staff.

Final reports / licensing info

Date Name Commitments
RDF-star and SPARQL-star Licensing commitments

RDF-star patterns for provenance

The RDF-star specification was published as a final community group report in December 2021. For a little more than a year, some participants of the RDF-DEV community group have joined forces to provide a consolidated description of RDF-star and an associated test suite. The goal is to foster the convergence of existing implementations, and the emergence of interoperable new ones.

The development of RDF-star has been surrounded by a lot of enthusiasm and expectations, which is both a blessing and a curse. Many people, with different backgrounds and needs, seem to expect RDF-star to be the perfect and straightforward solution to their specific problem. To enable multiple use cases, the group has striven to make RDF-star a generic enough toolbox, out of which most use cases can be solved, while sometimes requiring the user to go the extra mile.

In this post, we present some lessons learned by the group through discussions and exchanges. This is meant to give some insight into the rationale behind RDF-star, and some guidelines on how best to use it for modeling provenance data.

There’s only so much a quoted triple can carry

The first example of the RDF-star specification is repeated below:

PREFIX : <http://www.example.org/>
 
:employee38 :familyName "Smith" .
<< :employee38 :jobTitle "Assistant Designer" >> :accordingTo :employee22 .

The intended meaning of this small RDF-star graph is: “employee #38 is named Smith, and employee #22 claims that employee #38 is an assistant designer”. This example illustrates, in particular, how a quoted triple (between double angle brackets) can be used (here, as the subject of another triple) without being asserted: we (the authors of the graph) do not endorse the claim made by employee #22. By quoting, we are referring to the triple without making the triple itself part of the graph.
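One way to picture the difference between quoting and asserting is to model a graph as a set of (subject, predicate, object) tuples, where a quoted triple is simply a tuple nested in the subject position. The Python sketch below is only an illustration under that assumption, not an RDF-star implementation; the helper names is_asserted and is_quoted are hypothetical.

```python
# A graph is modeled as a set of (subject, predicate, object) tuples.
# A quoted triple is a nested tuple used as the subject of another triple.
quoted = (":employee38", ":jobTitle", "Assistant Designer")

graph = {
    (":employee38", ":familyName", "Smith"),
    (quoted, ":accordingTo", ":employee22"),
}

def is_asserted(triple, graph):
    """A triple is asserted only if it is itself a member of the graph."""
    return triple in graph

def is_quoted(triple, graph):
    """A triple is quoted if it occurs as the subject or object of a member."""
    return any(triple == s or triple == o for s, _, o in graph)

print(is_quoted(quoted, graph))    # the job-title triple is referred to...
print(is_asserted(quoted, graph))  # ...but it is not part of the graph
```

In this model the job-title triple is reachable only through the :accordingTo triple, which mirrors the idea that the graph's authors refer to the claim without endorsing it.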

This example could be extended as follows:

PREFIX : <http://www.example.org/>
 
<< :employee38 :jobTitle "Assistant Designer" >>
    :accordingTo :employee22, :employee38 ;
    :confidence 0.8 .

In this new example, both employee #22 and employee #38 are making an identical claim, still not endorsed by us. Furthermore, we assign a confidence score to the statement that employee #38’s job title is “Assistant Designer”.

To illustrate how this kind of modeling could be useful, imagine an RDF store containing a collection of claims, described as above with claimers and confidence levels. The following SPARQL-star query could be used to retrieve, for each claimer, the minimum confidence we have in the statements they claimed about themselves.

PREFIX : <http://www.example.org/>
SELECT ?claimer (MIN(?conf) as ?minConfidence)
{
    << ?claimer ?p ?o >> :accordingTo ?claimer; :confidence ?conf
}
GROUP BY ?claimer
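To make the matching logic of this query concrete, here is a rough Python equivalent over a graph modeled as a set of tuples (a quoted triple being a nested tuple). This is a sketch under that modeling assumption, not how a SPARQL-star engine evaluates the query; the second quoted triple and its 0.95 score are hypothetical data added so that the MIN aggregate has more than one value to compare.

```python
ACCORDING_TO = ":accordingTo"
CONFIDENCE = ":confidence"

t1 = (":employee38", ":jobTitle", "Assistant Designer")
t2 = (":employee38", ":familyName", "Smith")  # hypothetical extra claim

graph = {
    (t1, ACCORDING_TO, ":employee22"),
    (t1, ACCORDING_TO, ":employee38"),
    (t1, CONFIDENCE, 0.8),
    (t2, ACCORDING_TO, ":employee38"),
    (t2, CONFIDENCE, 0.95),
}

def min_self_confidence(graph):
    """For each claimer, the minimum confidence among the quoted triples
    they claimed and whose subject is the claimer themselves."""
    result = {}
    for s, p, claimer in graph:
        # Match << ?claimer ?p ?o >> :accordingTo ?claimer
        if p != ACCORDING_TO or not isinstance(s, tuple) or s[0] != claimer:
            continue
        # Match the :confidence triples sharing the same quoted subject
        for s2, p2, conf in graph:
            if s2 == s and p2 == CONFIDENCE:
                result[claimer] = min(conf, result.get(claimer, conf))
    return result

print(min_self_confidence(graph))
```

Employee #22 does not appear in the result, because the only triple they claim is about someone else.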

It is however important to understand that this basic design has limitations. Namely, each statement made about a particular triple must be interpretable independently of the other statements made about that triple. (This is actually a general feature of RDF, not just RDF-star: two statements about the same subject must always be interpretable independently from each other. On the open web, if we assume that another triple that we have not yet discovered could change the meaning of the triples that we know, then reasoning with what we know would become much more hazardous.)

Therefore, while it could be tempting to extend the examples above as follows, it would be a bad design, as we will show.

# ⚠ YOU MUST NOT DO THIS
PREFIX : <http://www.example.org/>
 
<< :employee38 :jobTitle "Assistant Designer" >>
    :accordingTo :employee22; :confidence 0.2 .
    # we don’t trust employee22 about someone else’s job title


<< :employee38 :jobTitle "Assistant Designer" >>
    :accordingTo :employee38; :confidence 0.8 .
    # we quite trust employee38 about their own job title

First, note that the example above changes the meaning of the :confidence predicate. It is no longer used to represent the general confidence we have in the triple itself, but to represent the confidence that we have in a particular person claiming the triple. If we were to use an actual ontology, those two different notions of “confidence” would require two distinct IRIs.

But most importantly, the problem with the example above is that it does not accurately capture the intended meaning, because it is equivalent to:

PREFIX : <http://www.example.org/>
 
<< :employee38 :jobTitle "Assistant Designer" >>
    :accordingTo :employee22;
    :accordingTo :employee38;
    :confidence 0.2;
    :confidence 0.8 .

The four triples asserted by this graph have the same subject (namely, the quoted triple << :employee38 :jobTitle "Assistant Designer" >>), and there is no way to know which claimer is associated with which confidence score.

This contrasts RDF-star with (some implementations of) Property Graphs, which allow multiple identical edges to co-exist between two nodes, and to carry different properties. Note that this “impedance mismatch” was recognized as early as 2014, and that some solutions were already envisioned then.
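The loss of pairing can be reproduced with a small Python sketch that models a graph as a set of tuples, the quoted triple being a nested tuple. The representation and the helper candidate_pairings are assumptions made for illustration only.

```python
t = (":employee38", ":jobTitle", "Assistant Designer")

# The two annotation blocks, written separately as in the discouraged example.
block_1 = {(t, ":accordingTo", ":employee22"), (t, ":confidence", 0.2)}
block_2 = {(t, ":accordingTo", ":employee38"), (t, ":confidence", 0.8)}

# A graph is a set of triples: merging the blocks keeps no trace of
# which triples were written together.
graph = block_1 | block_2

def candidate_pairings(graph, quoted):
    """Every (claimer, confidence) combination consistent with the graph."""
    claimers = sorted(o for s, p, o in graph
                      if s == quoted and p == ":accordingTo")
    scores = sorted(o for s, p, o in graph
                    if s == quoted and p == ":confidence")
    return [(c, v) for c in claimers for v in scores]

for pairing in candidate_pairings(graph, t):
    print(pairing)
```

All four combinations are equally compatible with the merged graph, so the intended pairing of (:employee22, 0.2) and (:employee38, 0.8) cannot be recovered from it.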

More complex provenance modeling

The problem with the last example above is that we are not talking about the triple << :employee38 :jobTitle "Assistant Designer" >> (which is uniquely identified by its subject, predicate and object). We want to talk about two similar but distinct claims, each claim with its own identity, and its own properties. Let us introduce a new property linking a given triple to one or several of its claims. A correct version of the previous example would now be:

PREFIX : <http://www.example.org/>
 
<< :employee38 :jobTitle "Assistant Designer" >> :hasClaim <#c1>, <#c2>.


<#c1> :claimer :employee22; :claimConfidence 0.2 .
<#c2> :claimer :employee38; :claimConfidence 0.8 .

As an autonomous entity, each claim can have any number of properties that will no longer be confused with the properties of other claims of the same triple. We could for instance extend this example by adding to each claim a date, a source document…

With such a design, the SPARQL-star query above needs to be updated, and would become:

PREFIX : <http://www.example.org/>
SELECT ?claimer (MIN(?conf) as ?minConfidence)
{
    << ?claimer ?p ?o >> :hasClaim [
        :claimer ?claimer; :claimConfidence ?conf
    ]
}
GROUP BY ?claimer
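The same lookup can be sketched in Python over a tuple-based model of the graph, to show that each claim node keeps its claimer and its confidence together. As before, the set-of-tuples representation and the helper names are assumptions for illustration, and the strings "#c1" and "#c2" stand in for the relative IRIs of the example.

```python
t = (":employee38", ":jobTitle", "Assistant Designer")

graph = {
    (t, ":hasClaim", "#c1"),
    (t, ":hasClaim", "#c2"),
    ("#c1", ":claimer", ":employee22"),
    ("#c1", ":claimConfidence", 0.2),
    ("#c2", ":claimer", ":employee38"),
    ("#c2", ":claimConfidence", 0.8),
}

def objects(graph, subject, predicate):
    """All objects of the triples with the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def min_self_claim_confidence(graph):
    """For each claimer, the minimum confidence among their claims on
    quoted triples whose subject is the claimer themselves."""
    result = {}
    for quoted, p, claim in graph:
        if p != ":hasClaim" or not isinstance(quoted, tuple):
            continue
        for claimer in objects(graph, claim, ":claimer"):
            if claimer != quoted[0]:
                continue  # keep only claims made about oneself
            for conf in objects(graph, claim, ":claimConfidence"):
                result[claimer] = min(conf, result.get(claimer, conf))
    return result

print(min_self_claim_confidence(graph))
```

Because the confidence is read from the claim node rather than from the quoted triple, the 0.2 score attached to employee #22's claim is never confused with employee #38's own claim.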

Epilogue

Note that it could be argued that we have always been talking about claims, even in the first two examples of this post, and so that these two examples were badly designed and should have used the :hasClaim property as well. We argue that the design of the first two examples is sufficient when the properties recorded about claims are simple enough. A balance always has to be found between, on the one hand, simplicity and usability, and on the other hand, purity and scalability. Following George Box’s aphorism that “all models are wrong, but some are useful”, we consider that the design of the first two examples is useful enough in some situations.

Acknowledgement

Thanks to the members of the RDF-star group for their reviews and feedback on this post.

New public draft of the RDF-star report

The RDF-star “task force” is proud to announce the third public draft of its report RDF-star and SPARQL-star. RDF-star (previously known as RDF*) extends RDF with a compact way of annotating triples, using them as the subject or object of other triples. This makes expressing, e.g., provenance or qualified relationships, easier than with standard RDF. SPARQL-star makes it possible to query such data in the same style.

Most of the report is now stable. Areas that remain controversial include a discussion of the pros and cons that have been raised inside the group. Implementers are encouraged to run the test suite and submit implementation reports.

The group has started working on a draft charter for a future RDF-star working group. The goal is to promote RDF-star as a proper W3C Recommendation. Stay tuned.

Olaf Hartig
Pierre-Antoine Champin
Gregg Kellogg
Andy Seaborne

Progress towards RDF*/SPARQL* Community Report

Since the middle of 2019, there has been an active community of researchers, implementers, and practitioners discussing a small extension to RDF that had become known under the name RDF* (pronounced “RDF star”). The aim of this extension is to support various use cases related to annotations and statements about individual RDF triples. Recently, the community has started having regular telecons and moved up a gear to produce a community report and test cases for RDF* and its corresponding extension to SPARQL, called SPARQL*. With this short update we want to highlight the progress and the direction of this effort. Please note that this work does not affect the formal status or stability of W3C’s RDF-related standards, and is not currently being proposed as a recommendation-track activity.

Starting from the experience of supporting the RDF*/SPARQL* approach in several implementations, the group is working on material needed to consolidate the definition of the approach and to provide a formal as well as a practical foundation for an ecosystem of interoperable implementations. This material includes documents and test cases, which are developed through contributions on a corresponding GitHub project.

The general direction is to extend the RDF data model by allowing triples to be used as the subject or object of other triples. Concrete serialization formats for RDF are then extended accordingly. For instance, by the corresponding extension of the RDF Turtle format, a triple with a triple in its subject position is represented as follows.

<<:bob a :Doctor>> :accordingTo :alice .

We call a triple that is in the subject or object position of another triple an embedded triple. To support the widest set of use cases, such an embedded triple on its own does not assert this triple (i.e., the example above does not assert that Bob is indeed a doctor). However, there is also an annotation syntax now, which makes it convenient to assert a triple and also refer to that triple. For instance, in the extended version of Turtle, we may write the following.

:bob a :Doctor {| :accordingTo :alice |} .

This is equivalent to:

:bob a :Doctor.
<<:bob a :Doctor>> :accordingTo :alice .

SPARQL is extended accordingly so that embedded triples and annotations can also be queried. For instance, the following query retrieves all the alleged doctors, and who made the respective claims.

SELECT * { << ?d a :Doctor >> :accordingTo ?a }
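The annotation shorthand and the query above can be sketched together in Python, modeling a graph as a set of tuples in which an embedded triple is a nested tuple. This is an illustrative model under that assumption, not the behavior of any particular RDF*/SPARQL* implementation; the function names expand_annotation and alleged_doctors are hypothetical.

```python
def expand_annotation(triple, annotations):
    """Expand `s p o {| p1 o1 ; ... |}` into the asserted triple plus
    one triple per annotation, each with the embedded triple as subject."""
    expanded = {triple}
    for predicate, obj in annotations:
        expanded.add((triple, predicate, obj))
    return expanded

def alleged_doctors(graph):
    """Rough mirror of: SELECT * { << ?d a :Doctor >> :accordingTo ?a }"""
    return sorted(
        (s[0], o)
        for s, p, o in graph
        if p == ":accordingTo"
        and isinstance(s, tuple)
        and s[1:] == ("a", ":Doctor")
    )

bob_is_doctor = (":bob", "a", ":Doctor")
graph = expand_annotation(bob_is_doctor, [(":accordingTo", ":alice")])

print(alleged_doctors(graph))
```

Note that the query only inspects the embedded triple: it would return the same result whether or not the triple stating that Bob is a doctor is also asserted in the graph.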

The relevant extensions to the abstract data model of RDF, to concrete syntaxes, and to other parts of the stack (such as SPARQL query semantics and result formats) are being documented in a Community Group report and in a corresponding set of test suites. The community home page with links to all this material can be found at https://w3c.github.io/rdf-star/.

Please join the effort if you are interested in helping the group to complete this work.

Pierre-Antoine Champin
Olaf Hartig
Gregg Kellogg
Andy Seaborne

Call for Participation in RDF-DEV Community Group

The RDF-DEV Community Group has been launched:


RDF-DEV, for developments relating to W3C RDF, including collaboration around applications, schemas, and past/present/future related standards. Successor to SWIG/RDFIG.


In order to join the group, you will need a W3C account. Please note, however, that W3C Membership is not required to join a Community Group.

This is a community initiative. This group was originally proposed on 2018-10-16 by Dan Brickley. The following people supported its creation: Dan Brickley, Nathan Rixham, Gregg Kellogg, Simon Cox, Masahide Kanzaki, Raúl García Castro, Emidio Stani, Dimitris Kontokostas, Bart Hanssens, Bill Roberts, brandon whitehead, Andrea Wei-Ching Huang, Jen Williams, Franck Cotton, Hugh Glaser, Andy Seaborne. W3C’s hosting of this group does not imply endorsement of the activities.

The group must now choose a chair. Read more about how to get started in a new group and good practice for running a group.

We invite you to share news of this new group in social media and other channels.

If you believe that there is an issue with this group that requires the attention of the W3C staff, please email us at site-comments@w3.org

Thank you,
W3C Community Development Team