Thursday, May 10, 2012

W3C Launches Linked Data Platform (LDP) Working Group

Following the W3C Member Submission Linked Data Basic Profile 1.0 and the Linked Enterprise Data Workshop, which explored the need for standardization around Linked Data, the W3C has created the Linked Data Platform Working Group to look into this problem and produce a specification (a W3C Recommendation) that addresses the items stated in its charter.

This is great news. It builds on what we have been leveraging and learning while developing OSLC specifications that use Linked Data as their architectural basis. Many of the items in the charter, and the use cases the working group will face, are the result of work by the OSLC community and community members such as IBM Rational. Taking this effort to W3C enables a broader set of applications and data to interoperate. How does it do this? By prescribing a single, simple, consistent way for applications to support Linked Data, we can expect more and more compliant servers exposing their data. That makes life easier for client implementers, who can write a client once and use it across multiple server vendors; this is a standard reason we have done standards and continue to do them. There is much value in this alone, no doubt. We gain additional value by building applications that pull data from a variety of servers and sources, leveraging the relationships (links) between the data and the meaning behind those links, expressed with standard and common vocabularies. Reading the data is all good, but we also need a simple way for these clients to create and update the data. That improves the quality of the data at its source, and everyone benefits from a social, distributed set of clients and applications operating over this data.

As the charter points out, "The combination of RDF and RESTful APIs is therefore natural." Why is this natural? Because we operate on resources on the Web (we get them, create them, update them and remove them), we need a way to identify these things, for which URIs provide a nice mechanism, and a way to access them over a network, for which HTTP URIs provide a nice mechanism. The things these resources describe are often not really on the Web at all: a URI that produces a representation of a toaster isn't the toaster, but a Web resource representing some information about it. So it is useful to have a model (RDF) for representing the state of these things. Even though these concepts seem simple, there are many variations in how existing specifications are interpreted, in which best practice to follow when multiple options exist, and so on. The Linked Data Platform WG looks to address a number of questions such as:

What set of RDF datatypes must be supported?
Currently, implementations are left to select their own set, hopefully a common and compatible one.
Which RDF formats must be supported?
RDF is a "Resource Description Framework", not a format; it can be serialized in multiple formats: RDF/XML, Turtle, N3, JSON-LD, etc. It would be nice for clients to be able to rely on a minimal common subset of these representations.
Which vocabulary terms are recommended?
When pulling together resource representations from separate servers, you gain value by analyzing the two resources to see which links they have in common and what other interesting relationships exist. If both representations use common, well-understood and standardized vocabulary terms to describe their data and links, applications can correlate meaning from these shared terms without needing extensive mappings.
What URL should be used to create new resources?  What should I receive back from that URL?
A common best practice is to enable resource creation by POSTing a representation to a given URL, say a container URL. A subsequent GET of that container URL would list the new resource along with any previously created resources that still exist in that container. The same model applies to deletion: issuing a DELETE on the created resource's URL would both delete the resource and remove it from the container.
How can a client ensure it can create and update resources that have constraints applied by the server?
There are a number of existing applications, and cases for future applications, where constraints may need to be placed on the validity of a resource representation before the server creates the resource or accepts a new revision of it. It would be helpful to have a representation of these constraints that a client could easily fetch and check its representation against before submitting it to the server.
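
On the formats question, the usual HTTP way for a client to say which serializations it can read is the Accept header, with the server picking one it can produce. The Python sketch below is a simplified, hypothetical server-side chooser: the supported list (Turtle, RDF/XML, JSON-LD) and the function names are assumptions for illustration, and it honors only the q parameter plus the HTTP rule that a more specific media range overrides a less specific one.

```python
from typing import List, Optional, Tuple

# Hypothetical: the RDF serializations this server can produce, in order of preference.
SUPPORTED = ["text/turtle", "application/rdf+xml", "application/ld+json"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media, q))
    return ranges


def quality(media_type: str, ranges: List[Tuple[str, float]]) -> float:
    """q for one media type, taken from the most specific matching range."""
    main_type = media_type.split("/")[0]
    best_specificity, q = -1, 0.0
    for media_range, range_q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, q = specificity, range_q
    return q


def choose(accept: Optional[str], supported: List[str] = SUPPORTED) -> Optional[str]:
    """Pick the supported type with the highest q; None means nothing is acceptable."""
    ranges = parse_accept(accept or "*/*")  # no Accept header: anything goes
    best, best_q = None, 0.0
    for candidate in supported:  # ties go to the earlier (preferred) type
        q = quality(candidate, ranges)
        if q > best_q:
            best, best_q = candidate, q
    return best


if __name__ == "__main__":
    print(choose("application/rdf+xml;q=0.5, text/turtle"))  # text/turtle
    print(choose("*/*, text/turtle;q=0"))                   # application/rdf+xml
    print(choose("text/html"))                              # None
```

When choose returns None, a server might answer 406 Not Acceptable. If every compliant server were required to offer at least one common serialization, a client could always list that one and be sure of getting a usable response.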
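
The container practice for creating and deleting resources can be sketched as a small in-memory model. Everything here is hypothetical (the class name, the example.org URLs, the numeric member IDs, the Response stand-in); no real HTTP server is involved. The model only mirrors the behavior described: a POST to the container creates a member and returns its URL, a GET on the container lists the members that still exist, and a DELETE on a member removes both the resource and its entry in the container.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (status, headers, body)."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: Optional[object] = None


class InMemoryContainer:
    """Hypothetical container: POST creates members, GET lists them,
    DELETE on a member URL removes the member and its container entry."""

    def __init__(self, url: str) -> None:
        self.url = url.rstrip("/") + "/"
        self.members: Dict[str, str] = {}  # member URL -> stored representation
        self._next_id = 1

    def post(self, representation: str) -> Response:
        # Mint a new URL under the container and store the representation.
        member_url = f"{self.url}{self._next_id}"
        self._next_id += 1
        self.members[member_url] = representation
        # 201 Created plus a Location header is the usual HTTP reply to a creating POST.
        return Response(201, {"Location": member_url})

    def get(self, url: str) -> Response:
        if url == self.url:
            # The container's state: members created and still existing, in creation order.
            return Response(200, body=list(self.members))
        if url in self.members:
            return Response(200, body=self.members[url])
        return Response(404)

    def delete(self, url: str) -> Response:
        # Removing the entry deletes the resource and its membership in one step.
        if self.members.pop(url, None) is None:
            return Response(404)
        return Response(204)


if __name__ == "__main__":
    bugs = InMemoryContainer("http://example.org/bugs/")
    first = bugs.post('<> <http://purl.org/dc/terms/title> "Toaster overheats" .')
    bugs.post('<> <http://purl.org/dc/terms/title> "Toaster jams" .')
    print(first.status, first.headers["Location"])   # 201 http://example.org/bugs/1
    bugs.delete(first.headers["Location"])
    print(bugs.get("http://example.org/bugs/").body)  # ['http://example.org/bugs/2']
```

Keeping membership as a property of the container, rather than a separate index, is what lets a single DELETE keep the resource and the container listing consistent.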
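
For the constraints question, one possible approach is a server-published list of per-property cardinality rules that a client checks its draft against before submitting. The format below is a hypothetical simplification (loosely in the spirit of OSLC Resource Shapes, but not that vocabulary); triples are plain string tuples, the empty subject stands for the resource about to be created, and the Dublin Core terms are used only as familiar example predicates.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


@dataclass(frozen=True)
class PropertyConstraint:
    """Hypothetical constraint: how many values a predicate may have."""
    predicate: str
    min_count: int = 0
    max_count: Optional[int] = None  # None means unbounded


def check(subject: str, triples: List[Triple],
          constraints: List[PropertyConstraint]) -> List[str]:
    """Return human-readable violations; an empty list means the client may submit."""
    counts: Dict[str, int] = {}
    for s, p, _ in triples:
        if s == subject:  # only count values of the resource being validated
            counts[p] = counts.get(p, 0) + 1
    problems = []
    for c in constraints:
        n = counts.get(c.predicate, 0)
        if n < c.min_count:
            problems.append(f"{c.predicate}: expected at least {c.min_count}, found {n}")
        if c.max_count is not None and n > c.max_count:
            problems.append(f"{c.predicate}: expected at most {c.max_count}, found {n}")
    return problems


if __name__ == "__main__":
    DCT = "http://purl.org/dc/terms/"
    # Constraints a server might publish for new bug reports (illustrative only).
    shape = [PropertyConstraint(DCT + "title", min_count=1, max_count=1),
             PropertyConstraint(DCT + "creator", min_count=1)]
    draft = [("", DCT + "title", "Toaster overheats")]  # "" = resource being created
    print(check("", draft, shape))
    # ['http://purl.org/dc/terms/creator: expected at least 1, found 0']
```

The server would still have to enforce the same rules on receipt; the client-side check only saves a failed round trip.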

These are just a sample of the issues that the charter highlights and that the working group looks to tackle. The Member Submission for a Linked Data Basic Profile addresses a subset of these issues, and a future derivative or version will likely be introduced to handle the rest. However, in the spirit of making progress and getting value out of something in the near future, it would seem logical for the working group to focus on a subset in its version 1.0 specification.

The working group charter outlines a timetable that is an initial guess at when a final "Recommendation" will be available: currently early 2014, with a first draft for review later in 2012. I highly encourage everyone to get involved: provide use cases and requirements, review material, give implementation feedback, write specification text. The more review and collaboration we have, the better our chances of success. This has proven true in both OSLC and W3C.