JSON-LD, the Google Knowledge Graph and schema.org SEO

by Aaron Bradley on March 13, 2014

in Semantic Web, SEO

Finally got your head around using RDFa or microdata for marking up HTML documents with schema.org? Prepare to come to terms with a new protocol that will almost certainly become a vital item in the contemporary digital marketer's tool kit: JSON-LD.

Google today announced two types of information that will be integrated into their Knowledge Graph results.

Official tour date information will now start appearing in the Knowledge Graph, as will contact phone numbers for companies.

In both cases the route to getting into the Knowledge Graph is by employing existing schema.org item types on official websites: ContactPoint nested within Organization for contact phone numbers, and MusicEvent for band concert dates.
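To make the nesting concrete, here is a minimal Python sketch that builds an Organization with a nested ContactPoint and wraps it in a JSON-LD script island. The company URL, phone number and contactType value are placeholders, and the helper name jsonld_island is hypothetical; Google's help article remains the authority on which properties it actually reads.

```python
import json


def jsonld_island(data: dict) -> str:
    """Wrap a schema.org object in a <script type="application/ld+json"> element."""
    # Escape "</" so a value containing "</script>" cannot close the element early;
    # "\/" is a valid JSON escape for "/", so the JSON still parses identically.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical company: the URL, phone number and contactType are placeholders.
organization = {
    "@context": "http://schema.org",
    "@type": "Organization",
    "url": "http://www.example.com",
    "contactPoint": [
        {
            # ContactPoint nested inside Organization
            "@type": "ContactPoint",
            "telephone": "+1-800-555-0199",
            "contactType": "customer service",
        }
    ],
}

print(jsonld_island(organization))
```

A MusicEvent would be built the same way, as a dictionary with its own @type and properties, passed through the same wrapper.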

These are well-established and well-understood schema.org types. And the appearance of band tour dates in the Knowledge Graph has received extensive coverage in search engine marketing circles – although the business contact numbers have flown a bit lower under the radar, as only the help article referenced above has so far been published (thanks Manu Sporny for the heads-up on that).

The bigger news here is that this is the first time that Google has officially endorsed JSON-LD as a way of providing schema.org information.

It's big news because JSON-LD is a significantly different method of providing structured data to search engines (and other data consumers).

RDFa and microdata – the only two methods of adding schema.org to a website previously sanctioned by Google – are both markup syntaxes. That is, they rely on adding schema.org information directly to the HTML code already present on a page.

JSON-LD (JavaScript Object Notation for Linked Data), by contrast, is an alternative to using HTML markup. JSON-LD is a "JSON-based format to serialize Linked Data," meaning it relies on JSON to provide that same schema.org information to data consumers.

So while RDFa and microdata require HTML, JSON-LD can be provided as islands embedded in HTML, or used directly with data-based web services and in application environments.

Here, for example, is some HTML code containing schema.org authorship information marked up with microdata:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
    <title>What's in a Name?</title>
  </head>
  <body itemscope itemtype="http://schema.org/Article">
    <div>
      <h1 itemprop="name">What's in a Name?</h1>
      <p>By <span itemprop="author" itemscope itemtype="http://schema.org/Person"><a href="/author/samuel-jones-md.html" itemprop="url"><span itemprop="name">Samuel Jones</span></a></span></p>
      <p>A name is a terrible thing to waste.</p>
    </div>
  </body>
</html>

Here are those same data provided using JSON-LD:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
    <script type="application/ld+json">
    {
      "@context": "http://schema.org/",
      "@type": "Article",
      "name": "What's in a Name?",
      "author": {
        "@type": "Person",
        "url": "/author/samuel-jones-md.html",
        "name": "Samuel Jones"
      }
    }
    </script>
    <title>What's in a Name?</title>
  </head>
  <body>
    <div>
      <h1>What's in a Name?</h1>
      <p>By <a href="/author/samuel-jones-md.html">Samuel Jones</a></p>
      <p>A name is a terrible thing to waste.</p>
    </div>
  </body>
</html>

As you can see, the JSON-LD – unlike the microdata – is entirely separate from the HTML code where the schema.org values are found, although at the end of the day the same property/value pairs are provided to Google with both protocols.
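One way to see this separation is from the data consumer's side: a parser can pull the JSON-LD out of a page without interpreting any of the surrounding markup. The following is a minimal sketch using Python's standard-library html.parser; the class and function names are hypothetical, and this is not a description of how Google's own parser works.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD object in the page; the rest of the markup is ignored."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


page = """<html><head><script type="application/ld+json">
{"@context": "http://schema.org/", "@type": "Article", "name": "What's in a Name?",
 "author": {"@type": "Person", "name": "Samuel Jones"}}
</script></head><body><h1>What's in a Name?</h1></body></html>"""

print(extract_jsonld(page)[0]["author"]["name"])  # Samuel Jones
```

Nothing in the extractor looks at the <h1> or any other visible element: the property/value pairs come entirely from the script island.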

This represents both a challenge and an opportunity for SEO. The challenge is keeping the JSON-LD data in sync with what appears on the page: it is important to the search engines that the data you provide to them (via JSON-LD) is the same as the data you provide to humans (via HTML).
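A rough way to catch drift between the two is to check that selected JSON-LD values actually appear in the visible page text. The sketch below is a hypothetical helper, not a Google tool: it compares only the property names you pass (here name and telephone), does plain substring matching, and will miss paraphrases or be thrown off by a word split across inline tags.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = ("script", "style", "title")


class VisibleText(HTMLParser):
    """Collect text a visitor would see, skipping script, style and title contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def iter_pairs(value, key=None):
    """Yield (property, string value) pairs from a nested JSON-LD object."""
    if isinstance(value, dict):
        for k, v in value.items():
            yield from iter_pairs(v, k)
    elif isinstance(value, list):
        for v in value:
            yield from iter_pairs(v, key)
    elif isinstance(value, str):
        yield key, value


def missing_from_page(jsonld, html, keys=("name", "telephone")):
    """Return (property, value) pairs whose value does not appear in the visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # collapse whitespace
    return [(k, v) for k, v in iter_pairs(jsonld) if k in keys and v not in text]


page_html = "<body><h1>What's in a Name?</h1><p>By <a href='/a'>Samuel Jones</a></p></body>"
article = {"@type": "Article", "name": "What's in a Name?",
           "author": {"@type": "Person", "name": "Samuel T. Jones"}}
print(missing_from_page(article, page_html))  # [('name', 'Samuel T. Jones')]
```

Run against each template's output, a check like this flags the JSON-LD values that no longer match what visitors see.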

It's an opportunity insofar as SEOs are freed from including structured data within HTML documents. It could conceivably be provided directly in JSON-LD without HTML, or as <script>-encoded islands within documents that might be difficult to mark up (such as AJAX-based web pages).

With this nascent JSON-LD support also come two new structured data markup testing tools, both of which accept, parse and provide feedback on JSON-LD code: one for music events (the Events Markup Tester) and another for corporate contact information (the Corporate Contacts Markup Tester).

This is what the Events Markup Tester returns when the first block of example JSON-LD music event code from the Google Webmaster Tools help page on music events is run through it (complete with a helpful suggestion to include a ticket price):

Figure: Sample output from the Google Events Markup Tester

For Google, this is a limited foray into JSON-LD support for schema.org. Aside from these two narrow categories of data, Google has not indicated that JSON-LD is a method of providing them with structured data that they'll respect. And in part, they are almost certainly using these initial integrations in order to test how well JSON-LD works in this context, and to what degree webmasters avail themselves of this (relatively recently developed) protocol.

However, it's a pretty clear sign that JSON-LD is going to loom larger on the SEO stage (it has already been embraced in a major way by the semantic web community, and especially among developers in that community).

And – as a concluding aside – the integration of music events and contact phone numbers in the Knowledge Graph demonstrates, again, how being an early adopter of structured data technologies can pay off.

19 comments

1 Chris McCoy March 13, 2014 at 9:10 pm

Very excited to see this. Thanks for all the help Aaron.

2 Hortense Soulier March 14, 2014 at 10:47 am

Always appreciate learning about the more technical aspects of the semantic web and how to integrate them into SEO strategies. I’m not a developer, so I’m surely wrong, but this seems even more complicated than using schema.org with microdata markup.

3 Krystian Szastok March 14, 2014 at 3:20 am

Thanks for a great article!

I do agree – the fact that JSON can be inserted without disrupting current content is really great and should make implementation a lot easier.

I’ll give it a go on my blog too as an experiment. Thanks again.

4 Brandon Grimes February 18, 2015 at 1:53 pm

How did that work for you? Did rich snippets show up in Google SERPS?

5 Aaron Bradley February 19, 2015 at 12:01 pm

Thanks for your comment Brandon, but I’m afraid I don’t know what “that” refers to, as I didn’t outline any procedure I was trying out, nor did I reference rich snippets. Can you be more specific? Thanks.

6 Brandon Grimes February 19, 2015 at 12:06 pm

Hi Aaron,

Well, I guess what I’m trying to figure out is this: if I create schema.org markup using JSON-LD and implement the code via Google Tag Manager, will Google read it and show rich snippets in their SERPs? I did this for one of my clients about 4 weeks ago and nothing has appeared in the SERPs, so I was wondering if I was doing something wrong. I used the Google structured data tool and everything looks great; however, as I mentioned, it’s been a month and nothing has appeared in the SERPs yet.

Thanks. Hopefully that clears up my question.

7 Aaron Bradley February 19, 2015 at 12:39 pm

That’s helpful indeed Brandon – thanks.

First, JSON-LD will generate rich snippets in Google for only a limited number of types; they’re outlined here (summarized near the end of the post if you want to skip directly there).

Second, I’m not an expert in Google Tag Manager (or JavaScript), so I don’t know whether the Tag Manager-deployed code you’re implementing is consumable by Google. Fortunately Simo Ahava is such an expert, and has written on this subject here.

In regard to verifying whether Google is correctly indexing your JSON-LD, you should be able to see the relevant data in the Webmaster Tools structured data report, as I’ve outlined here. Again, though, as per my first point, the fact that Google acknowledges your JSON-LD-encoded data doesn’t (yet) mean that they’re going to use it to generate rich snippets (and even if a rich snippet type is officially acknowledged by Google to be supported by JSON-LD, that doesn’t necessarily mean they’ll produce one, just as proper microdata- or RDFa-encoded schema.org that supports rich snippets doesn’t always actually result in a rich snippet being generated).

8 Adam Lapsley March 14, 2014 at 4:35 am

Very useful, thanks Aaron; this is so much easier to achieve than integrating semantic markup in the code. The problem with that method has always been small errors creeping in; it is almost impossible to keep it straight on a large site where the CMS had semantic data grafted on after the event. Having the markup in one place makes it easy to check and test. Will get some corporate contacts set up and see how it goes.

9 Bryant Jaquez March 17, 2014 at 10:24 pm

this is really cool. I am going to have to learn more about JSON

10 Paul Watson March 19, 2014 at 6:00 am

I think JSON-LD will be excellent as a way to send structured schema.org data from one application to another via a web service.

Where it falls down for SEO and website performance is in the extra file size it adds to every page.

For example, if you have a 10,000-word article then you have all 10,000 words in the HTML. If you are using RDFa or microdata with schema.org/Article then you simply put an “articleBody” property (itemprop in microdata, property in RDFa) on the div surrounding the 10,000 words of article content (and the other properties on the title, author, etc.). However, if you are using JSON-LD for SEO then you would need to repeat all 10,000 words of the article body in the JSON as well as having them in the HTML for visitors to read. 10,000 words is around 60 KB, so when using JSON-LD in this manner your web page would be 60 KB larger and therefore slower to load (especially on mobile devices using non-wifi connections).

So, for web pages that contain large amounts of text that need to be flagged as a structured data property, JSON-LD can cause significant performance problems on mobile devices due to increased page sizes. For smaller chunks of structured data it would be fine, and as a format in which to send data via a web service it’s brilliant. It’s a great new tool, when used appropriately.
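The arithmetic in the comment above can be checked with a short Python sketch. It assumes a hypothetical 10,000-word body made of five-letter words (about 6 bytes per word including the space) and measures how many bytes duplicating it as articleBody adds to a JSON-LD object.

```python
import json

# Hypothetical 10,000-word body: five-letter words separated by single spaces,
# i.e. roughly 6 bytes per word, as in the estimate above.
article_body = " ".join(["words"] * 10_000)

base = {"@context": "http://schema.org", "@type": "Article", "name": "Example"}
with_body = dict(base, articleBody=article_body)  # same object plus a copied body


def size_in_bytes(obj) -> int:
    """Size of the serialized JSON-LD as UTF-8 bytes (uncompressed)."""
    return len(json.dumps(obj).encode("utf-8"))


extra = size_in_bytes(with_body) - size_in_bytes(base)
print(len(article_body.encode("utf-8")))  # 59999
print(f"{extra / 1000:.1f} KB added")  # 60.0 KB added
```

The figure counts uncompressed bytes, taking 1 KB as 1,000 bytes; HTTP compression such as gzip would shrink the transferred difference, which this sketch does not model.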

11 Aaron Bradley March 20, 2014 at 9:55 am

Interesting and valid point Paul.

Nothing more clearly brings into view the difference between marking up existing data, and the direct provision of property/value pairs.

I can perhaps see more value to a data consumer in the former than in the latter. That is, I think one of the ways Google benefits from having articleBody declared in markup is that it allows it to better understand document structure. In some ways it might be more important to Google to know where that data resides, so it can access and parse it, than to have the actual data itself, if that makes sense. (And as articleBody is a text type, as “data” it is perhaps less useful to Google than the article body in context, which includes non-textual elements like headings, images and their alt attributes, and – critically – hyperlinks.)

To use an analogy, if there’s a 2MB image on a page, declaring it with JSON-LD rather than in microdata or RDFa markup obviously doesn’t increase the document size by 2MB, because it’s only a URL – a pointer to the image rather than the image content itself. Thinking out loud, it might be more useful (at least for an HTML document) for JSON-LD to refer to that page data by an ID, rather than provide it directly in the JSON-LD. How? I’ve no idea, just thinking out loud. :)

And I am still thinking: I may pose this as a question to some people I know who are more knowledgeable about JSON-LD than I am, and report what they have to say.

12 Ali April 8, 2014 at 12:53 am

I think if this metadata is used only by search engines, there is no reason to expose it to user clients (e.g. mobile clients). We can use content negotiation to expose schema.org metadata only to search engines.

13 lol April 1, 2014 at 8:55 pm

templating…

14 Tahir Fayyaz April 7, 2014 at 3:49 pm

I wonder what your thoughts are on the similarities between this and a dataLayer for tag management systems (eg Google Tag Manager).

I ask as the format is slightly different but I can see how this markup and the dataLayer could be planned at the same time especially as they are both geared towards marketing.

You can read more about what a dataLayer is here.

There is also an official W3C standard now.

It will be great to see a synergy between this all instead of having totally different naming conventions.

15 Aaron Bradley May 8, 2014 at 4:16 pm

Interesting analogy Tahir – but I think it is only an analogy. A data layer (at least in the Tag Manager context) is basically a JavaScript variable, whereas JSON is the backbone technology for actually transmitting data objects that consist of key:value pairs (and, building on that, JSON-LD is a way of exchanging linked data in JSON). So I don’t think it’s only a difference in naming conventions: the data layer of a tag management system and JSON (and JSON-LD) are different animals.

16 John Biundo April 22, 2014 at 12:40 pm

Nice article Aaron.

I landed here during my search for a better understanding of what Google might have in mind for a wider implementation of JSON-LD across more schema.org types.

I’m struggling with a large client site that grafts schema.org microdata markup into a web page that has many dynamically generated components, including stuff generated by a CMS-like system. It is difficult to get them to isolate the semantic markup, and particularly to get complex nesting organized properly. I find myself in the awkward position of parsing the markup into JSON to understand and validate it, then translating it back into a skeleton schema.org microdata structure that represents the target of what I want the final rendered page to look like.

Then the developers have to manipulate the page production process to generate the target structure. The whole process is sluggish, indirect, and very hard to iterate on. Embedding JSON-LD directly as islands in the HTML would provide a far superior and direct path.

Of course we may be trading one demon for another, having to then ensure that the generated markup does in fact match the embedded JSON. I’m sure one of the issues that is holding Google up is a concern over structured data spamming! While our intentions would be to map the structured data one to one with the markup, this disconnection does seem to open the door for spamming if Google doesn’t carefully match up the two. And aside from that, it means that we would have to maintain both sets of data in parallel.

Anyway, those objections (which seem solvable) aside, I am excited by the prospect of using JSON-LD as a direct way to communicate structure on web pages. Thanks for following this development and providing some visibility into where things are headed. I’ll be keeping an eye on seoskeptic for more on this topic!

17 sunny February 25, 2015 at 4:39 am

Great post and I think JSON-LD will be excellent as a way to send structured schema.org data from one application to another via a web service.

18 Evert July 20, 2015 at 6:00 am

There seems to be no explicit link between a @type and an HTML element. How does a bot make that link? (With microdata the relationship is obvious because the data resides inside the tag; not so with JSON-LD.)

19 Aaron Bradley July 20, 2015 at 2:37 pm

Thanks for your comment Evert. And you’re right, there’s no link between @type (or any other JSON-LD property/value pair) and the HTML of the page on which that JSON-LD may appear.

In the case of microdata, RDFa (or, for its own classes, microformats) a bot needs to parse HTML to extract meaning because the data is expressed using the attributes of HTML tags. But in the case of structured data the bot doesn’t actually care about the HTML, as the HTML is just a carrier of the meaning (the data) the bot is endeavoring to uncover.

JSON-LD simply provides this data to the search engines directly: there’s no link between the JSON-LD code and the HTML, as there’s no relationship between the two. A search engine may choose to look at the HTML so it can compare it against the data that’s been declared with JSON-LD to try and judge if that JSON-LD accurately reflects the content of the page, but there’s no intrinsic reason why the bot needs to look at the HTML at all in its ingestion of structured data provided by JSON-LD.

Accordingly, from a purely functional perspective, one need not provide any “matching” HTML content alongside JSON-LD at all. For example, when specifying corporate contact numbers to Google, Google recommends placing JSON-LD on a site’s actual “contact” page “for ease of maintenance as phone numbers change over time” but it does not require it there: “any other page (including your home page) is also acceptable.”
