Google Data Highlighter: Markup-Free Structured Data for Google

by Aaron Bradley on December 12, 2012

in Search Engines, Semantic Web

Google today announced a new, potentially powerful addition to its Webmaster Tools options:  the Data Highlighter.

The Data Highlighter allows webmasters to select text on a web page and associate it with properties for a particular data type.  At this time the only supported data type is for events, but (perhaps based on lessons learned from this initial rollout) expect more to follow.  Indeed, in the Data Highlighter help section on Google, there is already a page listing "data types supported by Data Highlighter."

While marking up – I mean "highlighting" – a single page with data related to an event without needing to add any additional code to a site is pretty useful in itself, what makes the Data Highlighter a potentially important leap forward in the provision of structured information to Google is the ability to highlight patterns that pertain to similar pages on a site.  In the sphere of events, this means that if your site has multiple, similarly structured pages containing event data, you can highlight event properties on a single page and have Google apply the same logic to pages that have "a consistent format."

The utility of this move is obvious:  it allows webmasters to inform Google of structured information on a site without the need to add additional markup to a page, or even without knowledge of any vocabularies that pertain to the information at hand.   This in turn facilitates the production of rich snippets in Google search results, and provides Google with de facto structured data that it can use in all sorts of situations.

Put another way, if yesterday I wanted to reliably provide Google with structured information about an upcoming event listed on my website, I would need to know a fair amount about schema.org/Event and its properties, encode that information using either microdata or RDFa, and publish or republish the page.  Today, I can trot on over to Webmaster Tools, spend a few minutes highlighting some text and press "publish."
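To give a sense of what the hand-coded route involves, here is a minimal Python sketch that renders a schema.org Event as a microdata fragment. The function name `event_microdata` and the sample event values are hypothetical, chosen only for illustration, and the fragment covers just three properties (name, start date and location) rather than everything Google may require.

```python
from datetime import datetime
from html import escape


def event_microdata(name: str, start: datetime, venue: str) -> str:
    """Render a schema.org Event as an HTML microdata fragment (illustrative only)."""
    # Machine-readable ISO 8601 value goes in a non-displaying <meta> tag.
    iso_start = start.isoformat(timespec="minutes")
    # Human-readable date shown to visitors.
    display_start = start.strftime("%B %d, %Y")
    return (
        '<div itemscope itemtype="http://schema.org/Event">\n'
        f'  <span itemprop="name">{escape(name)}</span>\n'
        f'  <meta itemprop="startDate" content="{iso_start}">{display_start}\n'
        '  <span itemprop="location" itemscope itemtype="http://schema.org/Place">\n'
        f'    <span itemprop="name">{escape(venue)}</span>\n'
        '  </span>\n'
        '</div>'
    )


if __name__ == "__main__":
    # Hypothetical event details, not taken from any real page.
    print(event_microdata("Semantic Search Meetup", datetime(2013, 1, 15, 19, 0), "Example Hall"))
```

Even this small fragment assumes knowledge of the vocabulary, the microdata attributes and the expected date format, which is precisely the knowledge the Data Highlighter makes optional.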

Conceptually, I personally find the Data Highlighter fascinating because the tool replicates for webmasters what Google has been doing behind the scenes for many, many years:  making sense of unstructured and semi-structured data by uncovering consistent patterns on web pages.  Now everyday webmasters have the ability to "train" Google about the presence of consistently-formatted data on their site.
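The idea of tagging one page and reusing the pattern on similarly structured pages can be sketched in a few lines of Python. This is purely an illustration of the concept and not a description of how Google implements it: the regular expression, the class names (`title`, `date`, `venue`) and the sample pages are all assumptions invented for the example.

```python
import re
from typing import Dict, Optional

# Hypothetical template shared by every event page on a site.
# Each named group plays the role of one "tag" assigned in the highlighter.
EVENT_PATTERN = re.compile(
    r'<h1 class="title">(?P<name>.*?)</h1>.*?'
    r'<span class="date">(?P<date>.*?)</span>.*?'
    r'<span class="venue">(?P<location>.*?)</span>',
    re.DOTALL,
)


def extract_event(html: str) -> Optional[Dict[str, str]]:
    """Apply the single learned pattern to a page; None if the page does not fit."""
    match = EVENT_PATTERN.search(html)
    return match.groupdict() if match else None


# Two made-up pages that share "a consistent format".
PAGES = [
    '<h1 class="title">Winter Concert</h1>\n<span class="date">December 20, 2012</span>\n'
    '<span class="venue">Town Hall</span>',
    '<h1 class="title">Spring Fair</h1>\n<span class="date">April 6, 2013</span>\n'
    '<span class="venue">City Park</span>',
]

if __name__ == "__main__":
    for page in PAGES:
        print(extract_event(page))
```

A page whose layout departs from the template simply fails to match, which is one plausible reason a pattern-based approach depends on pages having a consistent format.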

Before I get to a walk-through of the Highlighter, a couple of quick initial thoughts about some of the Highlighter features and benefits.

  • The Data Highlighter explicitly names required properties for events.  As schema.org is parser-agnostic, previously the only way webmasters could learn what properties Google required for any given data type was to run a page or some code through the Structured Data Testing Tool and try to decipher the resulting error messages.
  • While the event properties available in the Data Highlighter are clearly based on schema.org/Event, the mapping is apparently not a faithful one-to-one (they say that the "information that Data Highlighter can extract for events is slightly different from the event data that you can specify with HTML markup").  In their enumeration of the differences between the two, though, the only thing that appears radically different is the range of date formats that Data Highlighter allows.  This makes sense, as it's unlikely that many websites reliably publish dates on-page in ISO 8601 format (when encoded in microdata or RDFa, date information is almost always placed in a non-displaying <meta> tag).
  • Knowledge Graph sources are being extended.  That is, in their video and documentation Google specifically references the Knowledge Graph, and even provides a graphic of how successfully-extracted event information will appear in a Knowledge Graph vertical.  This may not mean a change in Google's Knowledge Graph aim – in the words of Google's Emily Moxley – to "purposely try to show things that are definitely true," but it does represent a vast expansion of what sources Google might consider to be "definitely true."
  • From a trust and proof point of view, "data highlighting" may circumvent the ability of websites to game Google with inaccurate structured data.  While the tool allows for the addition of "missing data" it is still largely predicated on the highlighting of existing, visible content on a website (and one would certainly think that Google would have a higher confidence in highlighted, visible on-page data than that supplied through the "missing tags" interface).  Which may be why (to my point above) Google is willing to push Data Highlighter-supplied information to the Knowledge Graph.
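The point above about date formats is easier to see with an example. The following Python sketch converts a few common visible date formats into the ISO 8601 form that markup expects. It is not how the Data Highlighter parses dates; the list of candidate formats and the function name `to_iso_date` are assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional

# Hypothetical set of visible date formats a site might publish.
CANDIDATE_FORMATS = ("%B %d, %Y", "%d %B %Y", "%m/%d/%Y", "%Y-%m-%d")


def to_iso_date(text: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            # This format did not fit; try the next one.
            continue
    return None


if __name__ == "__main__":
    for sample in ("December 12, 2012", "12 December 2012", "12/12/2012"):
        print(sample, "->", to_iso_date(sample))
```

With hand-coded markup the webmaster does this conversion and hides the result in the page; with highlighting, interpreting the visible date becomes Google's job.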

Adding event information using the Data Highlighter

To access the Data Highlighter, select "Data Highlighter" in the "Optimization" section of Google Webmaster Tools.

Google Data Highlighter - Initial Webmaster Tools Interface

When you click on the big blue "Start Highlighting" button you're required to enter the URL of a page, and to select whether you want to tag just the page in question, or the page "and others like it."  Here I selected the single page option.

Google Data Highlighter - URL Input and Tagging Options

Once the page is loaded and a portion of text is highlighted, you assign it to a predefined tag (equivalent to a schema.org property).  As you'll see, I used what isn't really an event page per se (and it happened in the past), but had most of the elements of a "proper" event page.

Google Data Highlighter - Associating On-Page Data with a Tag

Once a piece of information has been tagged, a label appears next to the highlighted text, and the tag content appears in a sidebar.  Sidebar items that Google regards as problematic appear with an "attention" icon next to them.

Google Data Highlighter - Display Once Data Has Been Assigned to a Tag

The Data Highlighter will allow data to be selected from any portion of the page.  Here I've selected my name to populate the "Performer" tag by using the anchor text of my linked WordPress author byline.

Google Data Highlighter - Example Performer Data Selection

Data that isn't visually present on the page can be added to a tag by selecting "Add missing tags" from the settings icon at the upper right, and then following the prompts.

Google Data Highlighter - Initial Interface for Adding a Missing Tag

Google Data Highlighter - Adding Missing Tag Data

Once you click "Publish," information about the page or pages you've tagged using the Data Highlighter appears in the Data Highlighter section of Webmaster Tools.  What data "will become available" once Google has recrawled the site remains to be seen (for example, whether or not successfully published and crawled Highlighter-entered data will appear in the Structured Data report).

Google Data Highlighter - Interface Showing Published Pages

The interface for tagging multiple, pattern-based pages is similar, but includes a progress meter and additional steps (I haven't fully explored this yet).

Google Data Highlighter - Initial Interface for Tagging Multiple, Pattern-Based Pages

At first blush the highlighting interface is intuitive and seems to work well.  Now it's a waiting game to see when and if rich snippets appear for marked up events, how long it takes for rich snippets or Webmaster Tools data about an event to appear, and whether or not an event tagged through the Highlighter makes its way to the Knowledge Graph.

All of this is unlikely with my example event, which occurred in the past, but I have many other (and arguably more important) pages to tag with event properties.  I'll keep you posted.

{ 7 comments }

1 Dean Cruddace December 13, 2012 at 2:40 am

Thanks for taking the time to write this up Aaron, I did notice very early on that the Data Highlighter does not want to play with recently published non indexed pages.

For me I may well be waiting for this tool to bed in and incorporate other properties before letting it loose on a client site and stick to hard coding Schema.org, I will however be playing with it across other personal sites of mine.

2 Aaron Bradley December 13, 2012 at 1:28 pm

Thanks Dean! And when it’s possible to do so, actual coding using schema.org is absolutely the way to go as – of course – this makes the data available to other search engines and other data consumers aside from Google.

3 Dan December 21, 2012 at 3:37 pm

Thanks for the info.

Can't wait to see this tool working for other schema.org classes.

4 Dan January 7, 2013 at 9:18 am

Thanks for the info, I’ve had a crack at it. The interface was pretty slick. Hitting F5 until something happens in Google now haha.

5 Pri May 22, 2014 at 12:19 am

Interesting and I recently discovered this tool. However how much help does this tool provide in Search rankings for a website like http://www.cinnamon-kitchen.com? I tried highlighting an events page but am unsure if it’s really useful in Search marketing strategies, and should I be doing it for a selective few pages or all?

6 Aaron Bradley May 22, 2014 at 10:39 am

Upcoming events are good to mark up, but given the nature of the site I’d say that the “Local Businesses” and “Restaurants” data highlighter categories are the most relevant to it.

7 Maury Markowitz July 1, 2014 at 2:03 pm

I’ve had mixed results with this tool. There are a couple of interacting problems that seem to be biting me.

For one, the tool only shows you a subset of your pages to mark up. I suppose it’s supposed to figure out the pattern from the samples, but in my experience it had a hard time doing that. So I don’t really know if it’s working on all my pages, and there’s no way to check.

Another issue is that it seems to get very confused by content that moves. On my layout the header information isn’t nicely presented like your own, so things like the categories sometimes move down a line. Google doesn’t seem to understand vertical whitespace, and I get all sorts of failures. Same if the title of the article is more than one line, causing the date to push down a line too.

Finally, in spite of Google almost always finding the “image”, they aren’t showing up in search results. This is *really* annoying. If I go to the highlighter, there’s the image, and it’s the right one every time. But none of them show up in search results. In the past it would default this out by putting in your G+ profile pic, but they changed that recently so now my listings are image-free. :-(

This is a very good tool, but needs tweaking!
