JSON-LD, the Google Knowledge Graph and schema.org SEO

by Aaron Bradley on March 13, 2014

in Semantic Web, SEO

Finally got your head around using RDFa or microdata for marking up HTML documents with schema.org? Prepare to come to terms with a new protocol that will almost certainly become a vital item in the contemporary digital marketer's tool kit: JSON-LD.

Google today announced two types of information that will be integrated into their Knowledge Graph results.

Official tour date information will now start appearing in the Knowledge Graph, as will contact phone numbers for companies.

In both cases the route to getting into the Knowledge Graph is by employing existing schema.org item types on official websites: ContactPoint nested within Organization for contact phone numbers, and MusicEvent for band concert dates.

These are well-established and well-understood schema.org types. And the appearance of band tour dates in the Knowledge Graph has received extensive coverage in search engine marketing circles – although the business contact numbers have flown a bit lower under the radar, as only the help article referenced above has so far been published (thanks Manu Sporny for the heads-up on that).

The bigger news here is that this is the first time that Google has officially endorsed JSON-LD as a way of providing schema.org information.

It's big news because JSON-LD is a significantly different method of providing structured data to search engines (and other data consumers).

RDFa and microdata – the only two methods of adding schema.org to a website previously sanctioned by Google – are both markup syntaxes. That is, they rely on adding schema.org information directly to the HTML code already present on a page.

JSON-LD (JavaScript Object Notation for Linked Data), by contrast, is an alternative to using HTML markup. JSON-LD is a "JSON-based format to serialize Linked Data," meaning it relies on JSON to provide that same schema.org information to data consumers.

So while RDFa and microdata require HTML, JSON-LD can be provided as islands embedded in HTML, or used directly with data-based web services and in application environments.

Here, for example, is some HTML code containing schema.org authorship information marked up with microdata:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
    <title>What's in a Name?</title>
  </head>
  <body itemscope itemtype="http://schema.org/Article">
    <div>
    <h1 itemprop="name">What's in a Name?</h1>
    <p>By <span itemprop="author" itemscope itemtype="http://schema.org/Person"><a href="/author/samuel-jones-md.html" itemprop="url"><span itemprop="name">Samuel Jones</span></a></span></p>
    <p>A name is a terrible thing to waste.</p>
    </div>
  </body>
</html>

Here are those same data provided using JSON-LD:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
    <script type="application/ld+json">
    {
      "@context": "http://schema.org/",
      "@type": "Article",
      "name": "What's in a Name?",
      "author": {
        "@type": "Person",
        "url": "http://authors.airshock.com/samuel-jones-rl.html", 
        "honorificPrefix": "Dr.",
        "name": "Samuel Jones",
        "honorificSuffix": "PhD"
      }
    }
    </script>
    <title>What's in a Name?</title>
  </head>
  <body>
    <div>
    <h1>What's in a Name?</h1>
    <p>By <a href="/author/samuel-jones-jsonld.html">Dr. Samuel Jones, PhD</a></p>
    <p>A name is a terrible thing to waste.</p>
    </div>
  </body>
</html>

As you can see, the JSON-LD – unlike the microdata – is entirely separate from the HTML code where the schema.org values are found, although at the end of the day the same property/value pairs are provided to Google with both protocols.

This represents both a challenge and opportunity for SEO. The challenge is keeping the JSON-LD data in sync with what appears on the page, as it is important for the search engines that the data you're providing to them (via JSON-LD) is the same as the data you're providing to humans (via HTML).

It's an opportunity insofar as SEOs are freed from including structured data within HTML documents. It could conceivably be provided directly in JSON-LD without HTML, or as <script>-encoded islands within documents that might be difficult to mark up (such as AJAX-based web pages).
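
To make the "island" idea concrete, here is a rough sketch of a page whose visible content is rendered client-side, leaving no static HTML to annotate, while the JSON-LD sits in the head of the page source. The container id and the /js/app.js script are hypothetical, and whether any search engine actually processes JSON-LD in this arrangement is an assumption rather than something Google has confirmed.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>What's in a Name?</title>
    <!-- Static JSON-LD island: present in the page source even though the visible article is not. -->
    <script type="application/ld+json">
    {
      "@context": "http://schema.org/",
      "@type": "Article",
      "name": "What's in a Name?",
      "author": {
        "@type": "Person",
        "name": "Samuel Jones"
      }
    }
    </script>
  </head>
  <body>
    <!-- Hypothetical: the article text is injected into this container by /js/app.js at load time. -->
    <div id="article"></div>
    <script src="/js/app.js"></script>
  </body>
</html>
```

The synchronization caveat still applies: the values in the island need to match whatever the script eventually renders for human visitors.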

With this nascent JSON-LD support also come two new structured data markup testing tools, both of which accept, parse and provide feedback for JSON-LD code: one for musical events (the Events Markup Tester), and another for corporate contact information (the Corporate Contacts Markup Tester).

This is what the Events Markup Tester returns when the first block of example JSON-LD music event code from the Google Webmaster Tools help page on music events is run through it (complete with a helpful suggestion to include a ticket price):

Sample Output from the Google Event Markup Tester
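
For reference, JSON-LD for the two supported types, an Organization with a nested ContactPoint and a MusicEvent, might look roughly like the sketch below. Every name, URL, date, price and phone number is a made-up placeholder, and the particular set of properties shown is an assumption based on the schema.org vocabulary; Google's help pages remain the authority on what each feature actually requires.

```html
<!-- Sketch only: all values below are placeholders, not real data. -->
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Organization",
  "url": "http://www.example.com",
  "contactPoint": [{
    "@type": "ContactPoint",
    "telephone": "+1-401-555-1212",
    "contactType": "customer service"
  }]
}
</script>

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "MusicEvent",
  "name": "Example Band at Example Hall",
  "startDate": "2014-06-21T20:00",
  "url": "http://www.example.com/tour/example-hall",
  "location": {
    "@type": "Place",
    "name": "Example Hall",
    "address": "123 Example Street, Springfield"
  },
  "performer": {
    "@type": "MusicGroup",
    "name": "Example Band"
  },
  "offers": {
    "@type": "Offer",
    "url": "http://www.example.com/tickets/example-hall",
    "price": "25.00",
    "priceCurrency": "USD"
  }
}
</script>
```

The offers block is where a ticket price would go, which is the kind of detail the tester's suggestion points at.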

For Google, this is a limited foray into JSON-LD support for schema.org. Aside from these two narrow categories of data, Google has not indicated that JSON-LD is a method of providing them with structured data that they'll respect. And in part, they are almost certainly using these initial integrations in order to test how well JSON-LD works in this context, and to what degree webmasters avail themselves of this (relatively recently developed) protocol.

However, it's a pretty clear sign that JSON-LD is going to loom larger on the SEO stage (it has already been embraced in a major way by the semantic web community, and especially among developers in that community).

And – as a concluding aside – the integration of music events and contact phone numbers in the Knowledge Graph demonstrates, again, how being an early adopter of structured data technologies can pay off.

12 comments

1 Chris McCoy March 13, 2014 at 9:10 pm

Very excited to see this. Thanks for all the help Aaron.

2 Hortense Soulier March 14, 2014 at 10:47 am

Always appreciate learning about the more technical aspects of the semantic web and how to integrate that into SEO strategies. I’m not a developer so I’m surely wrong but this seems even more complicated than using schema.org with microdata markup though.

3 Krystian Szastok March 14, 2014 at 3:20 am

Thanks for a great article!

I do agree – the fact that JSON can be inserted without disrupting current content is really great and should make implementation a lot easier.

I’ll give it a go on my blog too as an experiment. Thanks again.

4 Adam Lapsley March 14, 2014 at 4:35 am

Very useful thanks Aaron, this is so much easier to achieve than integrating semantic markup in the code. The problem with that method has always been small errors creeping in, it is almost impossible to keep it straight on a large site where the CMS had semantic data grafted on after the event. Having the markup in one place makes it easy to check and test. Will get some corporate contacts set up and see how it goes.

5 Bryant Jaquez March 17, 2014 at 10:24 pm

this is really cool. I am going to have to learn more about JSON

6 Paul Watson March 19, 2014 at 6:00 am

I think JSON-LD will be excellent as a way to send structured schema.org data from one application to another via a web service.

Where it falls down for SEO and website performance is in the extra file size it adds to every page.

For example, if you have a 10,000 word article then you have all 10,000 words in the HTML. If you are using RDFa or Microdata with schema/Article then you simply put a property attribute of “articleBody” on the div surrounding the 10,000 words of article content (and the other properties on the title, author, etc). However if you are using JSON-LD for SEO then you would need to repeat all 10,000 words of the article body in the JSON as well as having them in the HTML for visitors to read. 10,000 words is around 60 KB, so when using JSON-LD in this manner your web page would be 60 KB larger and therefore slower to load (especially on mobile devices using non-wifi connections).

So, for web pages that contain large amounts of text that need to be flagged as a structured data property, JSON-LD can cause significant performance problems on mobile devices due to increased page sizes. For smaller chunks of structured data it would be fine, and as a format in which to send data via a web service it’s brilliant. It’s a great new tool, when used appropriately.

7 Aaron Bradley March 20, 2014 at 9:55 am

Interesting and valid point Paul.

Nothing more clearly brings into view the difference between marking up existing data, and the direct provision of property/value pairs.

I can perhaps see more value to a data consumer with the former than the latter. That is, I think one of the ways Google benefits by having articleBody declared in markup is that it allows it to better understand document structure. In some ways it might be more important to Google to know where that data resides so it can access and parse it rather than the actual data itself, if that makes sense. (And as articleBody is a text type, as “data” it is perhaps less useful to Google than the article body in context, which includes non-textual elements like headings, images and their alt attributes, and – critically – hyperlinks.)

To use an analogy, if there’s a 2MB image on a page, it obviously doesn’t increase the document size by 2MB to declare it with JSON-LD rather than in microdata or RDFa markup, because it’s only a URL – a pointer to the image rather than the image content itself. Thinking out loud it might be – at least for an HTML document – more useful for JSON-LD to refer to that page data by an ID, rather than provide that directly in the JSON-LD. How? I’ve no idea, just thinking out loud. :)

And am still thinking: I may pose this as a question to some I know more knowledgeable about JSON-LD than I and report what they have to say.

8 Ali April 8, 2014 at 12:53 am

I think, if this metadata is used only by search engines, there is no reason to expose it to user clients (e.g. mobile clients). We can use content negotiation to only expose Schema.org metadata to search engines.

9 lol April 1, 2014 at 8:55 pm

templating…

10 Tahir Fayyaz April 7, 2014 at 3:49 pm

I wonder what your thoughts are on the similarities between this and a dataLayer for tag management systems (eg Google Tag Manager).

I ask as the format is slightly different but I can see how this markup and the dataLayer could be planned at the same time especially as they are both geared towards marketing.

You can read more about what a dataLayer is here.

There is also an official W3C standard now.

It will be great to see a synergy between this all instead of having totally different naming conventions.

11 Aaron Bradley May 8, 2014 at 4:16 pm

Interesting analogy Tahir – but I think it is an analogy. A data layer (at least in the Tag Manager context) is basically a JavaScript variable, whereas JSON is the backbone technology for actually transmitting the data objects that consist of key:value pairs (and from that JSON-LD is a way of exchanging data in JSON). So I don’t think it’s only a difference in naming conventions: the data layer of a tag management system and JSON (and JSON-LD) are different animals.

12 John Biundo April 22, 2014 at 12:40 pm

Nice article Aaron.

I landed here during my search for a better understanding of what Google might have in mind for a wider implementation of JSON-LD across more schema.org types.

I’m struggling with a large client site that grafts schema.org microdata markup into a web page that has many dynamically generated components, including stuff generated by a CMS-like system. It is difficult to get them to isolate the semantic markup, and particularly to get complex nesting organized properly. I find myself in the awkward position of parsing the markup into JSON to understand and validate it, then translating it back into a skeleton schema.org microdata structure that represents the target of what I want the final rendered page to look like.

Then the developers have to manipulate the page production process to generate the target structure. The whole process is sluggish, indirect, and very hard to iterate on. Embedding JSON-LD directly as islands in the HTML would provide a far superior and direct path.

Of course we may be trading one demon for another, having to then ensure that the generated markup does in fact match the embedded JSON. I’m sure one of the issues that is holding Google up is a concern over structured data spamming! While our intentions would be to map the structured data one to one with the markup, this disconnection does seem to open the door for spamming if Google doesn’t carefully match up the two. And aside from that, it means that we would have to maintain both sets of data in parallel.

Anyway, those objections (which seem solvable) aside, I am excited by the prospect of using JSON-LD as a direct way to communicate structure on web pages. Thanks for following this development and providing some visibility into where things are headed. I’ll be keeping an eye on seoskeptic for more on this topic!
