Google Data Highlighter: Markup-Free Structured Data for Google

by Aaron Bradley on December 12, 2012

in Search Engines, Semantic Web

Google today announced a new, potentially powerful addition to its Webmaster Tools options:  the Data Highlighter.

The Data Highlighter allows webmasters to select text on a web page and associate it with properties for a particular data type.  At this time the only supported data type is for events, but (perhaps based on lessons learned from this initial rollout) expect more to follow.  Indeed, in the Data Highlighter help section on Google, there is already a page listing "data types supported by Data Highlighter."

While marking up (I mean "highlighting") a single page with event data, without needing to add any code to a site, is pretty useful in itself, what makes the Data Highlighter a potentially important leap forward in the provision of structured information to Google is the ability to highlight patterns that pertain to similar pages on a site.  In the sphere of events, this means that if your site has multiple, similarly structured pages containing event data, you can highlight event properties on a single page and have Google apply the same logic to pages that have "a consistent format."

The utility of this move is obvious:  it allows webmasters to inform Google of structured information on a site without the need to add additional markup to a page, or even without knowledge of any vocabularies that pertain to the information at hand.   This in turn facilitates the production of rich snippets in Google search results, and provides Google with de facto structured data that it can use in all sorts of situations.

Put another way, if yesterday I wanted to reliably provide Google with structured information about an upcoming event listed on my website, I would need to know a fair amount about schema.org/Event and its properties, encode that information using either microdata or RDFa, and publish or republish the page.  Today, I can trot on over to Webmaster Tools, spend a few minutes highlighting some text and press "publish."
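To give a sense of what "yesterday's" approach involved, here is a minimal sketch that renders an event as schema.org microdata.  The function name, the event details and the example.com URL are all made up for illustration, and the property set (name, url, startDate, location) is just a plausible subset of schema.org/Event, not a statement of what Google requires.

```python
from html import escape


def event_microdata(name, start_iso, venue, url):
    """Render a schema.org Event as an HTML microdata snippet.

    Hypothetical helper: the chosen properties are an illustrative
    subset of schema.org/Event, not Google's list of required fields.
    """
    return (
        '<div itemscope itemtype="http://schema.org/Event">\n'
        f'  <a itemprop="url" href="{escape(url, quote=True)}">'
        f'<span itemprop="name">{escape(name)}</span></a>\n'
        # The machine-readable date lives in a non-displaying <meta> tag.
        f'  <meta itemprop="startDate" content="{escape(start_iso, quote=True)}">\n'
        '  <span itemprop="location" itemscope '
        'itemtype="http://schema.org/Place">\n'
        f'    <span itemprop="name">{escape(venue)}</span>\n'
        '  </span>\n'
        '</div>'
    )


# Invented example values.
snippet = event_microdata(
    name="Example Search Meetup",
    start_iso="2013-01-15T19:00",
    venue="Example Hall",
    url="http://www.example.com/events/meetup",
)
print(snippet)
```

Every one of those itemscope, itemtype and itemprop attributes is something a webmaster previously had to know about and add to a template by hand.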

Conceptually, I personally find the Data Highlighter fascinating because the tool replicates for webmasters what Google has been doing behind the scenes for many, many years:  making sense of unstructured and semi-structured data by uncovering consistent patterns on web pages.  Now everyday webmasters have the ability to "train" Google about the presence of consistently formatted data on their site.
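As a toy illustration of the general idea (and emphatically not a description of how Google or the Data Highlighter actually works), the sketch below defines one pattern against a hypothetical page template and then applies it to several similarly structured pages.  The class names, the template and the sample pages are all invented.

```python
import re

# Hypothetical template shared by every event page on an imaginary site.
EVENT_PATTERN = re.compile(
    r'<h1 class="title">(?P<name>.*?)</h1>.*?'
    r'<span class="when">(?P<date>.*?)</span>.*?'
    r'<span class="where">(?P<location>.*?)</span>',
    re.DOTALL,
)


def extract_event(html):
    """Return the event fields found in a page, or None if the pattern fails."""
    match = EVENT_PATTERN.search(html)
    return match.groupdict() if match else None


# Two invented pages that follow the same consistent format.
pages = [
    '<h1 class="title">Jazz Night</h1>'
    '<p><span class="when">Jan 5, 2013</span> at '
    '<span class="where">Blue Room</span></p>',
    '<h1 class="title">Poetry Slam</h1>'
    '<p><span class="when">Feb 9, 2013</span> at '
    '<span class="where">Corner Cafe</span></p>',
]

events = [extract_event(page) for page in pages]
for event in events:
    print(event)
```

The pattern is "taught" once and reused on every page that shares the format, and a page that doesn't fit the template simply yields nothing.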

Before I get to a walk-through of the Highlighter, a couple of quick initial thoughts about some of the Highlighter features and benefits.

  • The Data Highlighter explicitly names required properties for events.  As schema.org is parser-agnostic, previously the only way webmasters could learn what properties Google required for any given data type was to run a page or some code through the Structured Data Testing Tool and try to decipher the resulting error messages.
  • While the event properties available in the Data Highlighter are clearly based on schema.org/Event, it is apparently not a faithful one-to-one mapping (they say that the "information that Data Highlighter can extract for events is slightly different from the event data that you can specify with HTML markup").  In their enumeration of the differences between the two, though, the only thing that appears radically different is the date formats that Data Highlighter allows.  This makes sense, as it's unlikely that many websites reliably publish dates on-page in ISO 8601 format (when encoded in microdata or RDFa, date information is almost always placed in a non-displaying <meta> tag).
  • Knowledge Graph sources are being extended.  That is, in their video and documentation Google specifically references the Knowledge Graph, and even provides a graphic of how successfully-extracted event information will appear in a Knowledge Graph vertical.  This may not mean a change in Google's Knowledge Graph aim – in the words of Google's Emily Moxley – to "purposely try to show things that are definitely true," but it does represent a vast expansion of what sources Google might consider to be "definitely true."
  • From a trust and proof point of view, "data highlighting" may circumvent the ability of websites to game Google with inaccurate structured data.  While the tool allows for the addition of "missing data" it is still largely predicated on the highlighting of existing, visible content on a website (and one would certainly think that Google would have a higher confidence in highlighted, visible on-page data than that supplied through the "missing tags" interface).  Which may be why (to my point above) Google is willing to push Data Highlighter-supplied information to the Knowledge Graph.
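On the date format point above, here is a rough sketch of the gap between a date as a visitor sees it and the ISO 8601 value that markup expects.  The display format string and the sample date are assumptions chosen for the example (and the month and AM/PM parsing assumes an English locale); real pages publish dates in many other shapes.

```python
from datetime import datetime


def to_iso8601(display_date, fmt="%B %d, %Y %I:%M %p"):
    """Convert a human-readable date string to ISO 8601 (minute precision).

    The default format is an assumed example; it is not a list of the
    date formats the Data Highlighter accepts.
    """
    return datetime.strptime(display_date, fmt).isoformat(timespec="minutes")


visible = "December 12, 2012 7:30 PM"   # what a visitor reads on the page
iso = to_iso8601(visible)               # what microdata or RDFa expects

# With markup, the ISO value typically goes in a non-displaying <meta> tag.
meta_tag = f'<meta itemprop="startDate" content="{iso}">'
print(visible, "->", iso)
print(meta_tag)
```

With the Data Highlighter, this conversion step is what Google takes on: the webmaster highlights the visible date and leaves the interpretation to the tool.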

Adding event information using the Data Highlighter

To access the Data Highlighter, select "Data Highlighter" in the "Optimization" section of Google Webmaster Tools.

Google Data Highlighter - Initial Webmaster Tools Interface

When you click on the big blue "Start Highlighting" button you're required to enter the URL of a page, and to select whether you want to tag just the page in question, or the page "and others like it."  Here I selected the single page option.

Google Data Highlighter - URL Input and Tagging Options

Once the page is loaded and a portion of text is highlighted, you assign it to a predefined tag (equivalent to a schema.org property).  As you'll see, I used what isn't really an event page per se (and it happened in the past), but had most of the elements of a "proper" event page.

Google Data Highlighter - Associating On-Page Data with a Tag

Once a piece of information has been tagged, a label appears next to the highlighted text, and the tag content appears in a sidebar.  Sidebar items that Google regards as problematic appear with an "attention" icon next to them.

Google Data Highlighter - Display Once Data Has Been Assigned to a Tag

The Data Highlighter will allow data to be selected from any portion of the page.  Here I've selected my name to populate the "Performer" tag by using the anchor text of my linked WordPress author byline.

Google Data Highlighter - Example Performer Data Selection

Data that isn't visually present on the page can be added to a tag by selecting "Add missing tags" from the settings icon at the upper right, and then following the prompts.

Google Data Highlighter - Initial Interface for Adding a Missing Tag

Google Data Highlighter - Adding Missing Tag Data

Once you click "Publish," information about the page or pages you've tagged using the Data Highlighter appears in the Data Highlighter section of Webmaster Tools.  What data "will become available" once Google has recrawled the site remains to be seen (for example, whether or not successfully published and crawled Highlighter-entered data will appear in the Structured Data report).

Google Data Highlighter - Interface Showing Published Pages

The interface for tagging multiple, pattern-based pages is similar, but includes a progress meter and additional steps (I haven't fully explored this yet).

Google Data Highlighter - Initial Interface for Tagging Multiple, Pattern-Based Pages

At first blush the highlighting interface is intuitive and seems to work well.  Now it's a waiting game to see when and if rich snippets appear for marked up events, how long it takes for rich snippets or Webmaster Tools data about an event to appear, and whether or not an event tagged through the Highlighter makes its way to the Knowledge Graph.

All of this is unlikely with my example event, which occurred in the past, but I have many other (and arguably more important) pages to tag with event properties.  I'll keep you posted.