Linked Data means connecting individual pieces of data on the Web, so that automated clients can interpret them more easily. Servers can offer access to such data through different standardized and non-standardized interfaces, the properties of which profoundly influence the characteristics of clients and servers during interactions. This document defines Linked Data Fragments, a uniform view on all possible interfaces to publish Linked Data. This view allows us to analyze the properties of existing interfaces, and to define new interfaces with different combinations of properties. Additionally, this document explains how existing interfaces fit into this uniform view.
This specification was published by the Hydra W3C Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.
To participate in the development of this specification, please join the Hydra W3C Community Group. If you have questions, want to suggest a feature, or raise an issue, please send a mail to the public-linked-data-fragments@w3.org mailing list.
A gigantic amount of digital information exists, and new documents are created every day. Most of them are written in natural languages, which machines cannot fully interpret yet. And even if a document contains machine-interpretable information, the appropriate context is often missing. For instance, what do thousands of numbers in a comma-separated file mean?
Machines prefer structured data using unambiguous identifiers. Linked Data [[LINKED-DATA]] combines both to make it easier for machines to process and integrate data from different sources. URLs, the unambiguous identifiers of the Web, not only identify a resource but also allow clients to retrieve a representation of it. Structured data becomes machine-interpretable through the triple-based model of RDF [[RDF11-CONCEPTS]].
All RDF triples have a subject, predicate, and object,
and in the case of Linked Data, these components are typically dereferenceable URLs (an object can also be a literal value, such as a name or a date).
For example, the following triple expresses that Walt Disney is a person:
<http://dbpedia.org/resource/Walt_Disney>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Person>.
Linked Data is thus linked on two levels:
on one level, we link “Walt Disney” and “Person” together with the “type” relation;
on another level, each of those three components is a link toward more information.
This combination of structure and URLs is the essence of Linked Data:
if you don't know what
http://dbpedia.org/resource/Walt_Disney
or
http://xmlns.com/foaf/0.1/Person
mean,
you can look up information about those topics through their URLs.
You can convey Linked Data in the RDF model through various concrete forms:
The most straightforward way to access Linked Data
is to follow the URL of a Linked Data document.
In other words, we use the HTTP protocol [[RFC7230]]
to retrieve a representation
of the resource identified by that URL.
This process is called dereferencing.
For example, you can copy and paste the URL
http://dbpedia.org/resource/Walt_Disney
into your browser, which will lead to an HTML document with triples in RDFa.
Automated clients might ask for other representations of this resource,
for instance, in JSON-LD or Turtle.
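As a hedged illustration of dereferencing with content negotiation, the sketch below uses Python's standard urllib to build (without sending) a GET request for the Walt Disney URL. The request asks for Turtle first and JSON-LD second through the Accept header. The preference weights and the helper name build_dereference_request are assumptions for illustration. Whether and how a particular server honors them (for example, by redirecting to a separate data document) is up to that server.

```python
from urllib.request import Request

# Assumed preference order: Turtle first, then JSON-LD, then HTML (e.g. with RDFa).
RDF_ACCEPT = "text/turtle, application/ld+json;q=0.9, text/html;q=0.5"


def build_dereference_request(url: str, accept: str = RDF_ACCEPT) -> Request:
    """Build, but do not send, an HTTP GET request that dereferences `url`
    and asks the server for an RDF representation via content negotiation."""
    return Request(url, headers={"Accept": accept}, method="GET")


req = build_dereference_request("http://dbpedia.org/resource/Walt_Disney")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
# To actually dereference the URL (network access required):
#   from urllib.request import urlopen
#   body = urlopen(req).read()  # urlopen follows HTTP redirects
```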
However, such an interface based on Linked Data documents and dereferencing
has its limitations.
For example, while the URL
http://xmlns.com/foaf/0.1/Person
describes the notion of “a person”,
it does not give access to a list of all persons.
This would in fact be impossible:
the Web server at xmlns.com
is not supposed to know
which resources from dbpedia.org
use this type.
The alternative, scanning all documents on dbpedia.org
and extracting this information, would be highly impractical.
Therefore, if we want to retrieve the members of this list efficiently,
we need another interface.
An alternative interface is a data dump,
which is a typically large file that contains all triples from a certain dataset.
Using a data dump of dbpedia.org,
we could find the list of all people.
Unfortunately, this would involve downloading a lot of information,
even though we are only interested in a small fraction.
SPARQL endpoints [[SPARQL11-PROTOCOL]] offer an interface
that allows clients to select data at a much finer granularity.
This is more convenient for clients,
but individual requests are considerably more expensive for servers.
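In a SPARQL 1.1 Protocol request, the client sends the entire query, and the endpoint must evaluate it. That evaluation is where the server-side cost comes from. The minimal sketch below only encodes a CONSTRUCT query as the query parameter of a GET request. The endpoint URL is an assumption used for illustration, and the query (listing up to ten persons) is a hypothetical example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint URL, used only for illustration.
ENDPOINT = "https://dbpedia.org/sparql"

# Hypothetical query: up to 10 resources typed as foaf:Person.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { ?person a foaf:Person }
WHERE     { ?person a foaf:Person }
LIMIT 10"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode `query` as the `query` parameter of a SPARQL Protocol GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Round trip: the server decodes the parameter back into the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
assert decoded == QUERY
```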
The above indicates that each type of interface to Linked Data comes with its own characteristics, which can lead to advantages or disadvantages in particular situations.
The goal of Linked Data Fragments is to provide a uniform view on all possible interfaces to Linked Data. Thereby, we want to provide a conceptual framework to characterize all Linked Data interfaces in order to enable qualitative and quantitative comparisons. Furthermore, we want to stimulate the development of new kinds of interfaces that address the current and emerging needs of the Semantic Web.
This document defines Linked Data Fragments, and specifies what clients and servers of Linked Data Fragments are. It does not redefine existing interfaces or introduce new ones. Instead, it explains how these interfaces can be seen from the Linked Data Fragments perspective.
If you want to analyze existing Linked Data interfaces or define new interfaces, we encourage you to read this document. If instead you want to implement one of the discussed interfaces, the individual specifications (which are linked from this document) will serve you better.
We write triples in this document in the Turtle RDF syntax [[!TURTLE]] using the following namespace prefixes:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
Any piece of data always occurs in a certain context; it never stands on its own. Unsurprisingly, this also applies to data structured as RDF triples [[!RDF11-CONCEPTS]]. In order to refer to collections of RDF triples, we introduce the following definition, derived from the VoID Vocabulary [[VOID]]:
A Linked Data dataset is a collection of RDF triples that are published, maintained or aggregated by a single provider.
Often, we are interested in specific parts of a dataset. Such parts can range in size from an empty part to the whole dataset. To be able to define what a specific part looks like, we introduce the following concept.
A selector is a boolean function that decides whether or not a certain triple (or graph of triples) belongs to a part of a dataset.
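The sketch below illustrates this definition under simplifying assumptions. Triples are modeled as Python named tuples of IRI strings, and literals and blank nodes are ignored. A selector is any function from a triple to a boolean, and applying it to a dataset yields the corresponding part. The names Triple, select, and is_person_statement are hypothetical. This sketch only covers selectors that judge individual triples, not selectors over graphs of triples.

```python
from typing import Callable, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# A selector decides, for a single triple, whether it belongs to a part.
Selector = Callable[[Triple], bool]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def select(dataset: set[Triple], selector: Selector) -> set[Triple]:
    """Return the part of `dataset` consisting of triples accepted by `selector`."""
    return {t for t in dataset if selector(t)}


def is_person_statement(t: Triple) -> bool:
    """Example selector: accept triples stating that a resource is a foaf:Person."""
    return t.predicate == RDF_TYPE and t.object == FOAF_PERSON


dataset = {
    Triple("http://dbpedia.org/resource/Walt_Disney", RDF_TYPE, FOAF_PERSON),
    Triple("http://dbpedia.org/resource/Walt_Disney",
           "http://dbpedia.org/ontology/birthPlace",
           "http://dbpedia.org/resource/Chicago"),
}

people = select(dataset, is_person_statement)
print(len(people))
```

The two extremes are selectors as well: `lambda t: False` selects the empty part, and `lambda t: True` selects the whole dataset.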
Some selectors are more closely related than others. For instance, a group of selectors might have a similar structure or computational complexity. The following definition allows us to talk about them collectively.
A selector type is a class of selectors with similar structural characteristics.
Apart from the triples that describe data in a (part of a) dataset, some triples capture data about it. They do not belong to the dataset as such, but they can nonetheless be helpful to understand properties of this dataset.
Metadata of a dataset, or a part thereof, consists of RDF triples that describe data about that dataset or part, but that do not belong to the dataset itself.
Pieces of data and information on the Web can be connected to each other. This is because the Web is filled with hypermedia controls: most HTML pages contain several hyperlinks, and some pages also contain forms with text fields and buttons. HTML is not the only format with hypermedia support; specific RDF vocabularies can be used to express hypermedia controls as well. Regardless of format, all hypermedia controls on the Web have one thing in common: they somehow lead to a URL that a client can visit. The following definition generalizes this notion.
A hypermedia control is a function that generates an IRI [[!RFC3987]] based on zero or more arguments. In particular, a hyperlink is a zero-argument function (i.e., an IRI), and a hypermedia form is a multi-argument function.
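Read literally, this definition treats a hyperlink as a constant function and a form as a function that fills in its fields. The sketch below assumes a hypothetical form at http://example.org/dataset whose fields are named subject, predicate, and object. Neither the URL nor the field names come from any specification.

```python
from urllib.parse import urlencode

BASE = "http://example.org/dataset"  # hypothetical resource URL


def next_page_link() -> str:
    """A hyperlink: a zero-argument hypermedia control that always yields the same IRI."""
    return BASE + "?page=2"


def triple_pattern_form(subject: str = "", predicate: str = "", obj: str = "") -> str:
    """A hypermedia form: a multi-argument control that maps filled-in field
    values to an IRI. Empty fields are omitted from the query string."""
    fields = {name: value
              for name, value in (("subject", subject),
                                  ("predicate", predicate),
                                  ("object", obj))
              if value}
    query = urlencode(fields)
    return BASE + ("?" + query if query else "")


print(next_page_link())
print(triple_pattern_form(predicate="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))
```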
The read aspect of each interface to Linked Data is characterized by its possible set of responses. We therefore introduce a concept to capture such responses.
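A Linked Data Fragment of a Linked Data dataset is a set of RDF triples that consists of three parts:
data: all triples of this dataset that match a specific selector;
metadata: triples that describe the dataset and/or the Linked Data Fragment;
controls: hypermedia controls that lead to other Linked Data Fragments.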
The selector, the elements of the metadata set, and the elements of the control set are specific to each Linked Data Fragment. Each of the three parts is allowed to be empty. Any (proper or improper) subset of a Linked Data dataset, regardless of how this subset was created, is thus by definition a Linked Data Fragment.
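The following is a minimal data-structure sketch of a fragment. It assumes the same simplified model of triples as plain strings, with literals also represented as plain strings. A fragment holds its selected data, its metadata, and its control triples, and the fragment as a whole is their union. The class and function names, the fragment IRI, and the control predicate under example.org are hypothetical. void:triples is a VoID property used here to state a triple count.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


@dataclass
class LinkedDataFragment:
    data: set[Triple]                                    # dataset triples accepted by the selector
    metadata: set[Triple] = field(default_factory=set)   # triples describing the dataset or fragment
    controls: set[Triple] = field(default_factory=set)   # hypermedia controls expressed as triples

    def triples(self) -> set[Triple]:
        """The fragment as a single set of RDF triples: the union of its three parts."""
        return self.data | self.metadata | self.controls


def make_fragment(dataset: Iterable[Triple],
                  selector: Callable[[Triple], bool],
                  metadata: Iterable[Triple] = (),
                  controls: Iterable[Triple] = ()) -> LinkedDataFragment:
    """Apply a selector to a dataset and attach (possibly empty) metadata and controls."""
    return LinkedDataFragment({t for t in dataset if selector(t)}, set(metadata), set(controls))


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FRAGMENT = "http://example.org/fragments/persons"  # hypothetical fragment IRI

dataset = [
    Triple("http://dbpedia.org/resource/Walt_Disney", RDF_TYPE, FOAF_PERSON),
    Triple("http://dbpedia.org/resource/Chicago", RDF_TYPE, "http://dbpedia.org/ontology/City"),
]

fragment = make_fragment(
    dataset,
    selector=lambda t: t.object == FOAF_PERSON,
    # Literal count simplified to a plain string.
    metadata=[Triple(FRAGMENT, "http://rdfs.org/ns/void#triples", "1")],
    # Hypothetical control predicate pointing to another fragment IRI.
    controls=[Triple(FRAGMENT, "http://example.org/vocab#next", FRAGMENT + "?page=2")],
)
print(len(fragment.triples()))
```

Because every part may be empty, `make_fragment(dataset, lambda t: False)` is also a valid (empty) fragment.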
In the general definition of a Linked Data Fragment, there are no restrictions on what selectors should look like. They could be triple patterns, basic graph patterns, SPARQL queries, or even natural language queries. Like selectors, Linked Data Fragments can be organized in types.
A Linked Data Fragment type is a class of Linked Data Fragments that share the same selector type and whose metadata and control sets have similar characteristics.
We can analyze existing and new Linked Data interfaces by characterizing their responses as a specific Linked Data Fragment type.
Linked Data Fragment types of existing interfaces are listed in the next section.
The data part of some Linked Data Fragments can become quite large. For instance, the fragment that contains all triples of a dataset can contain millions of triples. To make such large fragments more manageable, their data can be split across multiple pages.
A Linked Data Fragments page contains a subset of all data triples of a Linked Data Fragment, together with all of its metadata and control triples.
Conceptually speaking, each fragment remains one whole, but its data can be retrieved through several requests. This also allows clients to retrieve the metadata and control set without having to download a disproportionately large part of the dataset. Not all fragments support paging.
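The sketch below splits a fragment's data into fixed-size pages and repeats the full metadata and control sets on every page, as the definition of a page requires. Several details are assumptions made for illustration: the page size, the ?page=N URL scheme, and the extra per-page navigation links. Actual interfaces define their own paging controls. The data is sorted first so that repeated requests see a stable page order.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


class Page(NamedTuple):
    number: int
    data: list[Triple]        # a slice of the fragment's data
    metadata: list[Triple]    # all metadata of the fragment
    controls: list[Triple]    # all controls of the fragment
    links: list[str]          # assumed per-page navigation IRIs (previous/next)


def paginate(fragment_iri: str, data: set[Triple], metadata: list[Triple],
             controls: list[Triple], page_size: int) -> list[Page]:
    """Split a fragment's data into pages of at most `page_size` triples."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    ordered = sorted(data)                               # stable order across requests
    page_count = max(1, -(-len(ordered) // page_size))   # ceiling division; at least one page
    pages = []
    for n in range(1, page_count + 1):
        links = []
        if n > 1:
            links.append(f"{fragment_iri}?page={n - 1}")
        if n < page_count:
            links.append(f"{fragment_iri}?page={n + 1}")
        chunk = ordered[(n - 1) * page_size:n * page_size]
        pages.append(Page(n, chunk, list(metadata), list(controls), links))
    return pages


data = {Triple(f"http://example.org/s{i}", "http://example.org/p", "http://example.org/o")
        for i in range(5)}
meta = [Triple("http://example.org/f", "http://rdfs.org/ns/void#triples", "5")]
pages = paginate("http://example.org/f", data, meta, [], page_size=2)
print([len(p.data) for p in pages])
```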
A Linked Data Fragments server is a server that offers all possible Linked Data Fragments of one or more specific Linked Data Fragment types of one or more datasets. It MUST support at least one RDF-based representation for each fragment.
Servers can choose what types of Linked Data Fragments they offer, whether or not they support paging, and what representations they provide.
A Linked Data Fragments client is a client that can access Linked Data Fragments of at least one specific Linked Data Fragment type. It MUST be able to consume at least one RDF-based representation of the fragments it supports.
Since the goal of Linked Data Fragments is to provide a uniform view on Linked Data interfaces on the Web, this section describes how existing types of Linked Data interfaces fit into the Linked Data Fragments definition. Basically, each interface offers Linked Data Fragments of a specific type, which is thus characterized by its data selector, metadata set, and control set.
A data dump of a dataset is an instance of a Linked Data Fragment type with the following characteristics:
its selector is f(triple) = true for all triples.
In other words,
a data dump is an RDF representation
of all triples of its dataset.
Many publishers of Linked Data offer such downloadable data dumps of their datasets. They can be used to set up a local triple store, but are not fit for live querying because of their typically large file size.
A Linked Data document of a dataset is an instance of a Linked Data Fragment type with the following characteristics:
its data consists of the triples matching { <entity> ?predicate ?object. }
and possibly triples matching { ?subject ?predicate <entity>. }.
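The data of a Linked Data document for an entity can be sketched under the same simplified triple model. It is a selection on the subject position, optionally extended with triples that mention the entity as object. The function name and the include_incoming flag are hypothetical, and whether a given server includes incoming triples is its own choice.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def linked_data_document(dataset: set[Triple], entity: str,
                         include_incoming: bool = False) -> set[Triple]:
    """Triples matching { <entity> ?predicate ?object. }, plus, if requested,
    triples matching { ?subject ?predicate <entity>. }."""
    outgoing = {t for t in dataset if t.subject == entity}
    incoming = {t for t in dataset if t.object == entity} if include_incoming else set()
    return outgoing | incoming


DBR = "http://dbpedia.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
dataset = {
    Triple(DBR + "Walt_Disney", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    Triple(DBR + "Walt_Disney", "http://dbpedia.org/ontology/birthPlace", DBR + "Chicago"),
    Triple(DBR + "Chicago", RDF_TYPE, "http://dbpedia.org/ontology/City"),
}

print(len(linked_data_document(dataset, DBR + "Chicago")))                         # outgoing only
print(len(linked_data_document(dataset, DBR + "Chicago", include_incoming=True)))  # both directions
```

Without incoming triples, the document for Chicago does not reveal who was born there. This mirrors the limitation of dereferencing discussed in the introduction.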
Linked Data documents can be used to browse a dataset, or to execute queries using link-traversal-based query execution.
A SPARQL query result of a dataset is an instance of a Linked Data Fragment type with the following characteristics:
its selector is a SPARQL CONSTRUCT query;
the data consists of those triples that result from executing this query.
SPARQL results allow clients to extract very specific fragments of a dataset.
The fact that a SPARQL query result is a Linked Data Fragment means that each SPARQL endpoint is, by definition, a Linked Data Fragments server.
Only results of CONSTRUCT (and thus not SELECT or ASK) SPARQL queries
are considered Linked Data Fragments.
This is because only the execution of CONSTRUCT queries
results in data triples.
However, the CONSTRUCT query can contain SELECT subqueries.
A triple pattern fragment (also known as basic Linked Data Fragment) is an instance of a Linked Data Fragment type with the following characteristics:
its selector is a triple pattern { ?subject ?predicate ?object. },
in which each of the three components can be variable or constant;
the data consists of those triples that match the triple pattern.
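The sketch below makes this selector concrete. A triple pattern is represented as a tuple in which strings starting with ? are variables and all other strings are constants. This is a simplification, because real RDF terms need proper parsing. The sketch assumes SPARQL semantics for a variable that occurs twice, so both occurrences must bind to the same value. The triple pattern fragments specification may treat repeated variables differently, and the function names are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def matches(pattern: tuple[str, str, str], triple: Triple) -> bool:
    """Check whether `triple` matches `pattern`; '?name' terms are variables."""
    bindings: dict[str, str] = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A repeated variable must be bound to the same value each time.
            if bindings.setdefault(term, value) != value:
                return False
        elif term != value:
            return False
    return True


def triple_pattern_fragment(dataset: set[Triple], pattern: tuple[str, str, str]) -> set[Triple]:
    """Data of a triple pattern fragment: all dataset triples matching the pattern."""
    return {t for t in dataset if matches(pattern, t)}


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
DBR = "http://dbpedia.org/resource/"
dataset = {
    Triple(DBR + "Walt_Disney", RDF_TYPE, FOAF_PERSON),
    Triple(DBR + "Walt_Disney", "http://dbpedia.org/ontology/birthPlace", DBR + "Chicago"),
    Triple(DBR + "Chicago", RDF_TYPE, "http://dbpedia.org/ontology/City"),
}

# Selecting on the object position: every resource typed as foaf:Person.
persons = triple_pattern_fragment(dataset, ("?s", RDF_TYPE, FOAF_PERSON))
print(sorted(t.subject for t in persons))
```

This sketch answers the question from the introduction (which resources are persons) that a Linked Data document for foaf:Person cannot answer, because it selects on the object position rather than only the subject.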
Triple pattern fragments can be used to browse a dataset with more flexibility than Linked Data documents, because they can also select based on predicates and objects (instead of only subjects).
Triple pattern fragments are described in detail in a separate document.