What is JSON-LD? A Talk with Gregg Kellogg

by Aaron Bradley on September 10, 2014



JSON-LD is everywhere.

Okay, perhaps not everywhere, but JSON-LD loomed large at the 2014 Semantic Web Technology and Business Conference in San Jose, where it was on many speakers' lips, and could be seen in the code examples of many presentations.

I've read much about the format – and have even provided a thumbnail definition of JSON-LD in these pages – but I wanted to take advantage of the conference to learn more about JSON-LD, and to better understand why this very recently-developed standard has been such a runaway hit with developers.

In this quest I could not have been more fortunate than to sit down with Gregg Kellogg, one of the editors of the W3C Recommendation for JSON-LD, to learn more about the format, its promise as a developmental tool, and – particularly important to me as a search marketer – its role in the evolution of schema.org.

I was also fortunate enough to have two semantic web luminaries join us for this conversation.

Stéphane Corlosquet has been the driving force behind incorporating RDFa into Drupal, and has patiently provided me with invaluable help in my efforts to better understand RDFa and related technologies.

Phil Archer is the indomitable Data Activity Lead at W3C, and had the day before delivered a rousing keynote on "10 years of success and achievement" of the semantic web.

Our discussion – transcribed below – has been lightly edited for clarity and length.

– – – – – – – – – – – – – – – – – – – –

Aaron: Most developers are well-acquainted with JSON. So rather than starting with the question "what is JSON-LD" let me ask, what's the difference between JSON and JSON-LD?

Gregg: I think the interesting thing about JSON as a representational format is that it is really composed of simple syntactic structures – arrays and dictionaries or hashes, simple string values. That's the real simplicity there, and of course it derives from JavaScript – JavaScript Object Notation – so as a developer it makes great sense to use something like that when you're feeding your web application. When you're extracting data from a service into your web application it automatically just updates your JavaScript runtime; you can access it quite easily.

JSON-LD, then, really looks to leverage this, but adds a lot of what is otherwise missing in JSON. So JSON being a very simple structure – it has strings, it has keys, it has strings as values, or it can have objects or hashes as values – it is very convenient but it has no inherent meaning. It has only the meaning that the person that's describing their API – or sometimes not describing it – put in it. What we were striving to do in JSON-LD is to be able to take advantage of that syntactic simplicity, but provide unambiguous meaning.

In Linked Data meaning is ascribed using URLs (IRIs). So if I see a URL – let's say schema.org/name – that is defined to mean, when used as a URL describing a property, that the property is used for a name. So anyone who wants to know what it is can dereference the URL and find out that information. But using schema.org/name in a key within a JSON dictionary is obviously not very convenient.

So what we want to be able to do is to be able to give people the notation they're used to using, such as just "name", but have "name" mean that full thing. And this is true in a variety of different places within JSON.
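[Editor's note: A minimal Python sketch of the idea Gregg describes here, in which the short key "name" stands in for the full schema.org IRI. The person, the example.org URL, and the helper term_iri are hypothetical and serve only as illustration.]

```python
import json

# A plain JSON record, the kind any web API might return (hypothetical data).
plain = {"name": "Jane Doe", "url": "http://example.org/people/jane"}

# The same record as JSON-LD: the @context maps each short key to a full IRI.
linked = {
    "@context": {
        "name": "http://schema.org/name",
        "url": {"@id": "http://schema.org/url", "@type": "@id"},
    },
    **plain,
}


def term_iri(doc, term):
    """Look up the IRI that a short term stands for in an inline @context."""
    definition = doc["@context"][term]
    # A term definition is either a bare IRI string or an object with "@id".
    return definition if isinstance(definition, str) else definition["@id"]


print(term_iri(linked, "name"))  # http://schema.org/name
print(json.dumps(linked, indent=2))
```

The keys a developer reads and writes stay exactly as they were in the plain JSON; only the added @context says what they mean.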

Aaron: Syntactically, then, how would a developer recognize the differences between JSON and JSON-LD?

Gregg: It's possible that you might not, actually. But typically you do because there's the presence of various keys that use an @ sign – so @id or @type or @context. In fact context is the only one that you absolutely need. Everything else you can actually alias away using the context – that's the chicken-egg reason we didn't allow context to be aliased.

JSON-LD example showing encoding of description, opening hours, telephone number, and menu

JSON-LD code employing @context and @type. From the Google Developers page for schema.org examples for location pages.
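[Editor's note: To illustrate the aliasing Gregg mentions, here is a small Python sketch in which @id and @type are aliased to "id" and "type" through the context, leaving @context as the only key with an @ sign. The document contents and both helper functions are hypothetical.]

```python
doc = {
    "@context": {
        "id": "@id",      # alias for the @id keyword
        "type": "@type",  # alias for the @type keyword
        "name": "http://schema.org/name",
        "Person": "http://schema.org/Person",
    },
    "id": "http://example.org/people/jane",
    "type": "Person",
    "name": "Jane Doe",
}


def keyword_aliases(context):
    """Return {alias: keyword} for context entries whose value is a keyword."""
    return {
        term: value
        for term, value in context.items()
        if isinstance(value, str) and value.startswith("@")
    }


def visible_keywords(document):
    """List the top-level keys that still carry an @ sign."""
    return [key for key in document if key.startswith("@")]


print(keyword_aliases(doc["@context"]))  # {'id': '@id', 'type': '@type'}
print(visible_keywords(doc))             # ['@context']
```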

It's possible to not use a context, actually, if it's provided separately. So you can, for instance, treat regular JSON as JSON-LD. If it's served up with application/json, say, you can add a link header that allows you to find the associated context. And for some people that's a nice compromise. The downside of that is that does divorce the actual JSON-LD file from the context.
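[Editor's note: The JSON-LD Recommendation defines the link relation http://www.w3.org/ns/json-ld#context for this purpose. The Python sketch below shows how a client might pull the context URL out of such a Link header. The parsing is deliberately simplified, and the function name and example.org URL are assumptions made for illustration.]

```python
import re

JSONLD_CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def context_url_from_link_header(link_header):
    """Return the URL of the linked JSON-LD context, or None if absent.

    Simplified parsing: assumes commas only separate links and never
    occur inside a URL or a quoted parameter value.
    """
    for link in link_header.split(","):
        match = re.match(r"\s*<([^>]*)>(.*)", link, re.S)
        if not match:
            continue
        url, params = match.groups()
        rel = re.search(r';\s*rel\s*=\s*"([^"]*)"', params)
        # A rel value may hold several space-separated relation types.
        if rel and JSONLD_CONTEXT_REL in rel.group(1).split():
            return url
    return None


# Header as it might accompany a response served as application/json.
header = (
    "<http://example.org/contexts/person.jsonld>; "
    'rel="http://www.w3.org/ns/json-ld#context"; '
    'type="application/ld+json"'
)
print(context_url_from_link_header(header))
# http://example.org/contexts/person.jsonld
```

Because the context travels in the HTTP response rather than in the body, saving the JSON to disk loses it, which is the downside Gregg points out.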

Aaron: Manu Sporny [an editor – along with Gregg and Markus Lanthaler – of the JSON-LD Recommendation] said that "[the] desire for better Web APIs is what motivated the creation of JSON-LD, not the Semantic Web." How does JSON-LD facilitate the construction of "better Web APIs"?

Gregg: It allows you to construct JSON the way that you always would. There are some constraints, but they're really very easy to work around. So for a developer looking at it, it just looks like JSON such as GitHub might publish. But we can ascribe it meaning by attaching this context. So that's why there's a fair amount of complexity in the processing algorithms of JSON-LD: in order to allow for a variety of different ways in which the actual JSON might be structured, but ensure that it gets down to having unambiguous meaning.

The way that the algorithms do this is by taking the original JSON structure, applying the context which now expands all of the shortened terms, such as "name" – into an unusable mess, perhaps, that starts to look very much like RDF (for those that are familiar with expanded RDF forms such as N-Triples). But then I can now compact it, so it's round-tripable given the right context.
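[Editor's note: A toy Python sketch of the expand-then-compact round trip. This is not the actual JSON-LD processing algorithm: real expansion also wraps values in arrays and value objects, resolves types, handles nesting, and much more. The functions and sample data are hypothetical.]

```python
def expand(doc):
    """Toy expansion: swap each short term for the IRI its context gives it."""
    context = doc["@context"]
    return {context.get(k, k): v for k, v in doc.items() if k != "@context"}


def compact(expanded, context):
    """Toy compaction: swap IRIs back to the short terms of a given context."""
    iri_to_term = {iri: term for term, iri in context.items()}
    compacted = {iri_to_term.get(k, k): v for k, v in expanded.items()}
    return {"@context": context, **compacted}


context = {
    "name": "http://schema.org/name",
    "telephone": "http://schema.org/telephone",
}
doc = {"@context": context, "name": "Example Diner", "telephone": "+1-555-0100"}

expanded = expand(doc)
print(expanded)
# {'http://schema.org/name': 'Example Diner', 'http://schema.org/telephone': '+1-555-0100'}

# Compacting with the same context restores the original document.
print(compact(expanded, context) == doc)  # True
```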

And through other algorithms such as framing you can shape it, so if for instance I'm describing a volume in a library that has books, and chapters, and sub-chapters, these form a structure that if appropriately typed I can use that type, and I can basically create a frame – which is itself a JSON document – and it says this includes this type, includes this type, includes this type using some properties, and you apply that to a document and now your whole document achieves that structure. It's like XSLT. It's also, in some respects, like doing a query. It's like querying into the JSON.
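[Editor's note: A much-simplified Python sketch of the framing idea: a flat list of nodes is reshaped into a nested tree by a frame that names the type wanted at each level. The real framing algorithm handles far more than this. The "contains" property, the node data, and apply_frame are all hypothetical.]

```python
# A flat graph: nodes refer to each other by @id only.
nodes = [
    {"@id": "_:ch1", "@type": "Chapter", "name": "Getting Started"},
    {"@id": "_:lib", "@type": "Library", "contains": [{"@id": "_:book"}]},
    {"@id": "_:book", "@type": "Book", "name": "Example Book",
     "contains": [{"@id": "_:ch1"}]},
]

# The frame is itself a JSON document describing the desired shape.
frame = {
    "@type": "Library",
    "contains": {"@type": "Book", "contains": {"@type": "Chapter"}},
}


def apply_frame(nodes, frame):
    """Return top-level nodes matching the frame, with references embedded."""
    index = {node["@id"]: node for node in nodes}

    def shape(node, sub_frame):
        out = {}
        for key, value in node.items():
            nested = sub_frame.get(key)
            if isinstance(nested, dict):
                # Follow each reference and embed nodes of the framed type.
                out[key] = [
                    shape(index[ref["@id"]], nested)
                    for ref in value
                    if index[ref["@id"]]["@type"] == nested["@type"]
                ]
            else:
                out[key] = value
        return out

    return [shape(n, frame) for n in nodes if n["@type"] == frame["@type"]]


framed = apply_frame(nodes, frame)
print(framed[0]["contains"][0]["contains"][0]["name"])  # Getting Started
```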

Aaron: In April 2014 the actions vocabulary was added to schema.org, and all of the examples of Actions on the site are provided in JSON-LD. What's the relationship between schema.org actions and JSON-LD?

Gregg: I think the schema.org powers that be wisely saw that JSON was much easier for people to produce, so that's one aspect of why it's interesting. Versus, say, trying to take the underlying data model, because typically these pages are built out of databases for large systems, so I think to structure that information within HTML using RDFa or microdata is error-prone, and they were noticing a lot of people making errors. Whereas taking something from a data model and doing it as JSON is something that developers easily know how to do. So to try to find a way to marry that into your HTML seems like an obvious choice – using the <script> tag in this case, giving it a type of "application/ld+json".
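[Editor's note: A minimal Python sketch of going from a data model straight to embedded JSON-LD, as Gregg describes. The record stands in for a database row, and the function name and sample values are hypothetical.]

```python
import json


def jsonld_script_tag(data):
    """Serialize a dict as a JSON-LD <script> block for embedding in HTML."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so a string value cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical row pulled from the site's database.
record = {"name": "Example Diner", "telephone": "+1-555-0100"}

markup = {
    "@context": "http://schema.org",
    "@type": "Restaurant",
    **record,
}

print(jsonld_script_tag(markup))
```

The structured data is produced in one step from the model, rather than being threaded through HTML templates attribute by attribute as with RDFa or microdata.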

For the action case what they were looking for is an easy way of describing markup that you might not see on the web – perhaps it's markup within an enhanced email message – and allow you to take action on that. So for instance if it's an airline reservation, a little blob in the <script> tag within the email describes the JSON-LD, describes the reservation, and by using their action vocabulary extension it allows you to do things directly from an email message.


Extracted structured data for a flight reservation encoded with JSON-LD. Code from the flight reservation page for Google's Actions in the Inbox, as rendered by Google's Email Markup Tester.

So I think there's two parts to that. One is to use things in a context where you're not on a website. And two, so that you have a way of describing information other than by marking up your HTML.
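[Editor's note: A hedged Python sketch of the "little blob in the script tag" inside an HTML email, and of how a mail client might read it back out. The reservation values are invented, the property names follow the schema.org FlightReservation and Flight types, and extract_jsonld is a hypothetical helper. No action markup is shown here.]

```python
import json
import re
from email.message import EmailMessage

# Invented reservation data using schema.org FlightReservation properties.
reservation = {
    "@context": "http://schema.org",
    "@type": "FlightReservation",
    "reservationNumber": "ABC123",
    "underName": {"@type": "Person", "name": "Jane Doe"},
    "reservationFor": {
        "@type": "Flight",
        "flightNumber": "110",
        "departureAirport": {"@type": "Airport", "iataCode": "SFO"},
        "arrivalAirport": {"@type": "Airport", "iataCode": "JFK"},
    },
}

html = (
    "<html><body>\n"
    '<script type="application/ld+json">\n'
    f"{json.dumps(reservation, indent=2)}\n"
    "</script>\n"
    "<p>Your flight is confirmed.</p>\n"
    "</body></html>"
)

message = EmailMessage()
message["Subject"] = "Your flight reservation"
message.set_content(html, subtype="html")


def extract_jsonld(html_text):
    """Parse every JSON-LD script block found in an HTML string."""
    pattern = r'<script type="application/ld\+json">(.*?)</script>'
    return [json.loads(block) for block in re.findall(pattern, html_text, re.S)]


found = extract_jsonld(message.get_content())
print(found[0]["reservationFor"]["flightNumber"])  # 110
```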

Stéphane: You can also take actions with those JSON-LD embedded in the email. For example, when Dropbox sends you an email inviting you to join a folder, in your preview you have a little box where you can accept or refuse from there – it will send back a ping or something to Dropbox.

Aaron: And that's powered by actions and JSON-LD?

Stéphane: Yes.

Gregg: And Alex Passant had a really interesting early example of using actions by taking existing markup and inserting an action which would allow you to take a description of a playlist and actually play it. By just adding an action that goes out to – I forget exactly which site he used [Spotify] – in order to find the track and play it.

Aaron: Aside from those schema.org examples, can you describe any situations where JSON-LD is now being employed, and/or any future use cases where you think it's a good technology?

Gregg: I think for anyone that's writing a modern web service JSON is an obvious choice, and increasingly Linked Data is important. Even though people don't necessarily know they're using it, the amount of – I think [Google's Ramanathan V.] Guha had a statistic about 25% of the web is marked up?

Phil: 21% of web pages, 6% of domain names.

[Editor's note: Phil is accurately citing numbers provided by R.V. Guha earlier at the Conference, based on a sample of 12 billion web pages. Thanks to Kenichi Suzuki, here are the pertinent slides.]

Gregg: It's pretty remarkable. The penetration out there using these technologies is just stunning. The fact that people don't know that it's RDF under the hood, that's great, you know. So what? It's not about RDF. It's about the solution. I think JSON-LD really serves that quite well, because it follows the natural progression of the way that web applications are developed. It just works.

Stéphane: I think any front-end application querying a service for data could also use that. So it could be like a mobile application, or just a single page application.

Aaron: I'm seeing that a lot – applications using AJAX or AngularJS where there's all sorts of difficulties in rendering the HTML with all that JavaScript, so again that seems a natural choice in those situations.

Gregg: I was very surprised at how little AngularJS you need to do some very interesting things. For instance I'm working with YourSports now who is employing JSON-LD as a fundamental part of their API, and embracing the principles of linked data.

Which to some degree is at odds with what people feel the existing practice of APIs are, where there's templated URIs that are described. I think there was a concession on this yesterday, which is actually very handy for Linked Data, oddly – we're embracing the linked data meme using JSON-LD and some other upcoming technologies for being able to describe our own site and to be able to get access into it. In some cases it's simply a matter of embedding JSON-LD in the web page using a little Angular template which can now interpret that. Now you've got a model described in JSON you can turn that into an actual model that you can respond to from within a little web application. So, I was able to do a vocabulary browser in about a day, which is otherwise quite a sophisticated piece of work.

Aaron: Awesome. A good segue into what I'm about to ask – perhaps a little redundant because we've already discussed some of the developmental problems – but Phil, yesterday you said, "JSON-LD puts the semantic web in the hands of people who don't know what they have, and probably don't care." Can you elaborate a bit on what you meant by that, or perhaps speak to the developmental promise of JSON-LD in that context?

Phil: As Gregg's been saying, the whole point about it is, it is JSON first and RDF second. And the fact that it carries RDF is simply unimportant. And it's particularly unimportant to people who are JSON users – which is basically every web developer these days. Web developers don't want you to send them XML, even though they have the tools to handle it, and they certainly don't want you to start sending them triples and stuff, so it just makes it available in that way.

For historical reasons we could talk about forever there is this antipathy towards the whole semantic web, the whole RDF thing – it raises antibodies in people. Because it's the fashion, a lot of the time. They said it's crap, so I'm going to say it's crap, because it's the right thing to say.

I got a cheap laugh yesterday. Anyone can get a cheap laugh by just talking about RDF/XML. It's easy to do. You don't have to do a lot of work to get a cheap laugh. "Oh RDF/XML's really bad isn't it," because everyone knows that's when you're supposed to laugh.

So there are people who – sometimes for good reason but very often simply because of that kind of baggage – don't want anything to do with anything that looks like RDF. They don't know. They literally do not know that JSON-LD is a full serialization of RDF, just like triples.

Stéphane: That's why Manu was so adamant about not mentioning RDF in the spec. I think it ended up having to mention RDF somewhere, but I know he fought very hard to not mention RDF.

Phil: A wise choice. The danger in what I'm saying is that it comes across as we're a bunch of smug gits and everyone else is stupid, and I have to be careful to make sure that isn't perceived as being a reality, because it quite clearly isn't.

It's horses for courses, people know about different things, nobody knows about everything, and I'm someone who talks about this stuff rather than does it. To do what Gregg just did in a day would take me a damn sight longer than a day. So I don't pretend to have that kind of hands-on knowledge that a lot of people I work with do have.

People don't need to know everything, they can create really cool applications, and if they find JSON-LD useful – fantastic. If they don't know that it's RDF, I don't care. One or two people do, one or two people say no, no we should tell them, we should tell them "ah, we've got you now, did you know actually what you're doing…." I don't think we should do that at all, I think we should be very happy they're using it. I think, great if you're doing that that means we can work together and we can achieve more.


Phil Archer's keynote address at #SemTechBiz 2014. Photo via Bernadette Hyland on Twitter.

Aaron: Because it is so task-oriented, and successful from that vantage point – this is almost certainly hyperbolic, but nonetheless – in combination with schema.org actions is it possible that JSON-LD in some way represents the much sought-for "killer app" of the semantic web?

Phil: Ah … no schema.org is the killer app of the semantic web. Google is the killer app of the semantic web.

Aaron: On a side note, not directly related to JSON-LD, that's what I always say. To say, "oh, when is the semantic web going to finally make it?" – it's like, "have you never Googled something?"

Gregg: Actually I think I disagree a bit. Neither schema.org nor JSON are applications, they are tools. And they're tools to enable killer apps. I think we've seen quite a number of killer apps come out that are just making use of these tools.

Stéphane: Do you [Gregg] have examples? You said there are examples of killer apps that were enabled with schema.org or JSON-LD?

Gregg: Well, I think there's a number of businesses now that completely depend on being able to mark up data. We wouldn't see the adoption rates we do if businesses weren't finding value in doing schema.org markup. So in that sense all of those businesses are succeeding because of schema.org. Increasingly, perhaps, because of JSON-LD, certainly because of semantic technologies – RDFa and microdata.

Stéphane: I'm thinking of Parse.ly, a New York-based company that sells tracking tools for news companies. They were using their own vocabulary, but now they're using schema.org.

[Editor's note: Parse.ly's integration guide provides a metadata example where both schema.org and JSON-LD are employed.]

Aaron: A final question. In June of 2013 schema.org officially recognized JSON-LD as one of the recommended formats for use with schema.org. But JSON-LD, unlike RDFa and microdata, isn't a mechanism for marking up existing visible HTML content, but instead is what Kingsley Idehen called "structured data islands in HTML documents." This is problematic for Google and Bing because, if they were to accept JSON-LD-provided data at face value, it would be possible for spammers to provide malicious structured data in HTML documents that sported benign presentation layers. Is this an intractable issue of trust, or can you envision mechanisms by which search engines will be able to consume JSON-LD-provided data with confidence?

Gregg: Well, there's a variety of ways to approach that. One way would be to perhaps use some of the upcoming graph signatures work that Manu Sporny's been working on in the context of web payments, to be able to sign JSON that's being sent, so you have a digital certificate way of being able to trust the data.

I think for the purpose of let's say a search engine that is bringing in data that includes JSON-LD, they have a number of mechanisms they currently use to try and determine whether the page is being fraudulent at all. I think that JSON-LD content is just another one.

I think a good practice is that the JSON-LD you provide in the page should actually describe what's on the page. And in the case of a web app, now the page might not have anything other than JavaScript, which built it out, in which case there's no contradiction.

In the case where we have a page which, say, visually looks like a recipe for my wife's magic mineral broth, it perhaps has secretly encoded a recipe for something you might not want to otherwise publish. It should be obvious with a high level semantic analysis that the two are in conflict, and therefore the data's unreliable.
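[Editor's note: Purely as an illustration of the conflict Gregg describes, here is a naive Python word-overlap check between a JSON-LD name and a page's visible text. It is a stand-in for real semantic analysis, not a description of how any search engine works. The function names and sample strings are hypothetical.]

```python
import re


def words(text):
    """Lowercase alphabetic words in a string, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_ratio(structured_name, visible_text):
    """Fraction of the structured name's words that also appear on the page."""
    name_words = words(structured_name)
    if not name_words:
        return 0.0
    return len(name_words & words(visible_text)) / len(name_words)


visible = "Magic Mineral Broth: simmer the vegetables for two hours."

# Structured data that matches the page scores high...
print(overlap_ratio("Magic Mineral Broth", visible))  # 1.0
# ...while structured data about something else entirely scores low.
print(overlap_ratio("Discount Watches", visible))     # 0.0
```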

– – – – – – – – – – – – – – – – – – – –

My heartfelt thanks to Gregg, Stéphane and Phil for taking the time to talk with me at length (and so early in the day, to boot)! All three are truly scholars and gentlemen.
