With the release of schema.org v2.0 (an edition too large to fully describe in a single post) came a new property that I think deserves the spotlight, as it resolves some long-outstanding issues. That property is mainEntityOfPage, which schema.org defines as follows:
“Indicates a page (or other CreativeWork) for which this thing is the main entity being described.”
I recommend reading the property’s full description at http://schema.org/mainEntityOfPage, as it goes on to describe its intended use and how it relates to http://schema.org/about, http://schema.org/sameAs and http://schema.org/url: properties to which mainEntityOfPage is very closely related and which are among schema.org’s fundamental building blocks.
Unfortunately, though, the reason why one should use this property is only described as follows:
“Many (but not all) pages have a fairly clear primary topic, some entity or thing that the page describes. For example a restaurant's home page might be primarily about that Restaurant, or an event listing page might represent a single event. The mainEntity and mainEntityOfPage properties allow you to explicitly express the relationship between the page and the primary entity.”
This is illustrated by the following markup example further down that page (also available in RDFa and JSON-LD):
<div itemscope itemtype="http://schema.org/Restaurant" itemid="#thecafe">
  <a itemprop="mainEntityOfPage" href="http://cathscafe.example.com/"><h1 itemprop="name">Cath's Cafe</h1></a>
  <p>
    Open: <time itemprop="openingHours" datetime="Mo,Tu,We,Th,Fr,Sa,Su 11:00-20:00">Daily from 11:00am till 8pm</time>
  </p>
  <p>
    Phone: <span itemprop="telephone" content="+155501003344">555-0100-3344</span>
  </p>
  <p>
    View <a itemprop="menu" href="/menu">our menu</a>.
  </p>
</div>
I think this is an odd example, as it doesn’t illustrate the property’s true value: how it can help webmasters deal with some very nasty, hard-to-resolve situations.
Before explaining this, though, let me point out some schema.org basics.
Let’s say you’re a producer of goods (ACME Corporation) and you’re selling your products on your website. One day you decide to add some http://schema.org/Product markup to your product listing pages, so each of the products listed on those pages now contains markup along these lines:
<li itemscope itemtype="http://schema.org/Product">
  <a itemprop="url" href="http://example.com/explosive-tennis-balls/">
    <span itemprop="name">Explosive tennis balls</span>
  </a>
  ...
</li>
But were you aware that by specifying the Product URLs you not only told the search engines where the products can be found, but also that they can be found on a http://schema.org/WebPage?
If you visit http://schema.org/WebPage you can read a description that says (emphasis mine):
“Every web page is implicitly assumed to be declared to be of type WebPage, so the various properties about that webpage, such as breadcrumb may be used. We recommend explicit declaration if these properties are specified, but if they are found outside of an itemscope, they will be assumed to be about the page.”
This means that even if you don’t specify that an individual product page is of type http://schema.org/WebPage (or one of its subtypes, such as ItemPage), the search engines will assume it is, treating markup like:
<body>
  <main>
    <article itemscope itemtype="http://schema.org/Product">
      <h1 itemprop="name">Explosive tennis balls</h1>
      ...
    </article>
  </main>
</body>
as if you had specified:
<body itemscope itemtype="http://schema.org/WebPage">
  <main>
    <article itemscope itemtype="http://schema.org/Product">
      <h1 itemprop="name">Explosive tennis balls</h1>
      ...
    </article>
  </main>
</body>
You can also be more specific by telling the search engines which type of page they’re dealing with:
<body itemscope itemtype="http://schema.org/ItemPage">
  <main>
    <article itemscope itemtype="http://schema.org/Product">
      <h1 itemprop="name">Explosive tennis balls</h1>
      ...
    </article>
  </main>
</body>
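The implicit-WebPage rule quoted from schema.org can be modeled in a few lines of Python. This is a simplified sketch, not any search engine’s actual code: `WEBPAGE_TYPES` is an assumed subset of WebPage and its subtypes, and `with_implicit_page` is a hypothetical helper that receives the types of a page’s top-level items.

```python
# Assumed subset of schema.org's WebPage type and a few of its subtypes.
WEBPAGE_TYPES = {"WebPage", "ItemPage", "AboutPage", "CollectionPage",
                 "ContactPage", "ProfilePage", "SearchResultsPage"}


def with_implicit_page(top_level_types: list[str]) -> list[str]:
    """Hypothetical helper: the top-level types a consumer would work with,
    adding an implicit WebPage when no page type was declared."""
    if any(t in WEBPAGE_TYPES for t in top_level_types):
        return list(top_level_types)  # an explicit WebPage (sub)type is present
    return ["WebPage"] + list(top_level_types)  # every page is assumed a WebPage


print(with_implicit_page(["Product"]))              # ['WebPage', 'Product']
print(with_implicit_page(["WebPage", "Product"]))   # ['WebPage', 'Product']
print(with_implicit_page(["ItemPage", "Product"]))  # ['ItemPage', 'Product']
```

The first call corresponds to the markup without a page-level itemscope, the second to the explicit WebPage version and the third to the ItemPage version.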
Now you might think “What’s so special about that?”, but are you also aware that the rules of HTML don’t apply to structured data, and that parsers treat the two very differently?
An important difference between HTML and structured data
In HTML, an <article> nested within <main>, which is in turn nested within <body>, forms a hierarchy. Yet from a structured data point of view, no statement about the relation between the specified entities has been made: in microdata, a nested itemscope only becomes a property of its parent item when it carries an itemprop attribute. So from a graph point of view, all you’ve created are two top-level entities that have no relation to each other, namely a WebPage and a Product.
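To see why HTML nesting alone creates no relation, here is a deliberately simplified microdata reader built on Python’s standard `html.parser`. It is a sketch, not a spec-complete parser (it ignores `itemref`, `itemid` and text values), and `MicrodataReader` is a hypothetical name. It applies the core microdata rule: an element with `itemscope` and no `itemprop` becomes a top-level item, while an `itemscope` element that carries an `itemprop` becomes a property value of the enclosing item.

```python
from html.parser import HTMLParser

# Elements without an end tag; they must not be pushed onto the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataReader(HTMLParser):
    """Hypothetical, simplified microdata reader: records items and nesting."""

    def __init__(self):
        super().__init__()
        self.stack = []      # one entry per open element: the item it opened, or None
        self.top_level = []  # items that are not a property of another item

    def _current_item(self):
        # The innermost open element that opened an item, if any.
        for item in reversed(self.stack):
            if item is not None:
                return item
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        parent = self._current_item()
        item = None
        if "itemscope" in a:
            item = {"type": (a.get("itemtype") or "").rsplit("/", 1)[-1],
                    "props": {}}
            if "itemprop" not in a:
                self.top_level.append(item)            # top-level entity
            elif parent is not None:
                parent["props"][a["itemprop"]] = item  # nested entity = property
        elif "itemprop" in a and parent is not None:
            # Text values are not captured; only href/content attributes are.
            parent["props"][a["itemprop"]] = a.get("href") or a.get("content")
        if tag not in VOID:
            self.stack.append(item)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()


markup = """<body itemscope itemtype="http://schema.org/WebPage"> <main>
<article itemscope itemtype="http://schema.org/Product">
<h1 itemprop="name">Explosive tennis balls</h1> ... </article> </main> </body>"""

reader = MicrodataReader()
reader.feed(markup)
reader.close()
print([item["type"] for item in reader.top_level])  # ['WebPage', 'Product']
print(reader.top_level[0]["props"])                 # {}
```

The empty property dictionary of the WebPage is the point: the Product sits inside the `<body>` element, yet nothing in the structured data connects the two items.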
Search engines deal with this by treating the Product as the main entity of the page by default; only when no second top-level entity has been specified will they treat the http://schema.org/WebPage itself as the actual subject.
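The default behavior described above can be sketched as a simple rule. Search engines don’t publish their heuristics, so the hypothetical `guess_main_entity` function below only models the behavior as described in this post: a single non-page top-level entity is taken as the main entity, no such entity means the page itself is the subject, and anything more is left undecided. `WEBPAGE_TYPES` is an assumed subset of page types.

```python
from typing import Optional

WEBPAGE_TYPES = {"WebPage", "ItemPage"}  # assumed subset of page types


def guess_main_entity(top_level_types: list[str]) -> Optional[str]:
    """Hypothetical model of the default heuristic; None means 'ambiguous'."""
    others = [t for t in top_level_types if t not in WEBPAGE_TYPES]
    if not others:
        # Nothing else declared: the (possibly implicit) page is the subject.
        return next((t for t in top_level_types if t in WEBPAGE_TYPES), "WebPage")
    if len(others) == 1:
        return others[0]  # a single second top-level entity is taken as main
    return None           # several candidates: each engine decides its own way


print(guess_main_entity(["WebPage", "Product"]))           # Product
print(guess_main_entity(["WebPage"]))                      # WebPage
print(guess_main_entity(["WebPage", "Event", "Product"]))  # None
```

The last call is the interesting one: as soon as a page carries more than one non-page top-level entity, a rule like this has no principled answer.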
This method works flawlessly and doesn’t seem to justify adding a new property like mainEntityOfPage. So why did they come up with it?
Having multiple top level entities can cause trouble
Over the years I’ve encountered plenty of situations where multiple top-level entities caused serious headaches because they [a] resulted in the wrong type of rich snippet being shown in Google’s search results and/or [b] caused pages to rank for the wrong types of search queries, which can hurt, for example, an ecommerce site’s CTR in search results (and thus the income the site generates).
An example of a real-life problem
Imagine all individual product pages on ACME Corporation’s site have a sidebar that, amongst other things, contains an event widget, and that the event in that widget has some http://schema.org/Event markup:
<body itemscope itemtype="http://schema.org/WebPage">
  <aside>
    <section itemscope itemtype="http://schema.org/Event">
      <h2 itemprop="name">Acme Product Launch</h2>
      ...
    </section>
  </aside>
  <main>
    <article itemscope itemtype="http://schema.org/Product">
      <h1 itemprop="name">Explosive tennis balls</h1>
      ...
    </article>
  </main>
</body>
Since it isn’t common knowledge that the types of HTML elements used, their order and the way they are nested don’t necessarily matter for structured data, people make the understandable mistake of assuming that, because the http://schema.org/Event is nested within an <aside>, search engines will understand it should be considered secondary content, and that the http://schema.org/Product should be considered the page’s main content because it’s nested within the <main> element.
However, from a graph point of view, what actually got created were three top-level entities with no relation to each other. The only thing they have in common is that they can all be found at the same URL:
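Written out as (subject, predicate, object) triples, the sidebar example gives a graph along these lines. This is an illustrative sketch: the blank-node ids (`_:page`, `_:event`, `_:product`) and the `schema:` prefix are assumptions, and all triples are taken to come from the same page URL. Collecting the edges whose object is itself one of the entities shows there are none.

```python
# Hypothetical blank-node ids for the three top-level items in the sidebar markup;
# all of these triples are assumed to be extracted from the same page URL.
triples = [
    ("_:page", "rdf:type", "schema:WebPage"),
    ("_:event", "rdf:type", "schema:Event"),
    ("_:event", "schema:name", "Acme Product Launch"),
    ("_:product", "rdf:type", "schema:Product"),
    ("_:product", "schema:name", "Explosive tennis balls"),
]

entities = {s for s, _, _ in triples}

# An edge between entities is a triple whose object is itself an entity.
links = [(s, p, o) for s, p, o in triples if o in entities]

print(sorted(entities))  # ['_:event', '_:page', '_:product']
print(links)             # []
```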
And it’s exactly this type of situation that can produce the opposite of what you set out to accomplish. You marked up additional information so the search engines would better understand what the content on your page is about, and your ‘reward’ is that they now show a http://schema.org/Event rich snippet where you were hoping for a http://schema.org/Product rich snippet.
And if you’re really out of luck, then you’ll also discover your Product pages all of a sudden only rank for event-related queries.
Oh, and then there’s the case where destiny really has some fun at your expense: this behavior turns out to differ (inconsistently) between search engines.
#@!&***%_! What happened?
Well, at least this part is easy enough to explain: each search engine simply has its own rules (heuristics) for dealing with multiple top-level entities (or any other form of structured data, for that matter), and they’ll be damned before they align their methods for treating structured data, as that’s how they differentiate themselves and part of how they compete for market share.
If you’re the victim of such a situation, though, you might be inclined to be less understanding. I know I wasn’t the first couple of times I ran into this, and it took me what seemed like forever to figure out what was happening.
So I took some of my issues to the schema.org mailing list, where, over a period of 1.5 to 2 years and several discussions, it became clear not only that individual search engine heuristics were causing these issues, but also that this was an unwanted situation that should be resolved.
mainEntityOfPage – a solution for a real-life problem
A solution was found in the form of the mainEntityOfPage property, which works because “every web page is implicitly assumed to be declared to be of type WebPage”.
So by simply adding a <link itemprop="mainEntityOfPage"> element within the element you want to mark as the page’s main entity, with its href value set to the page’s URL, you can tell the search engines which entity they should consider the primary topic of the page:
<html>
  <head>
    <link rel="canonical" href="http://example.com/explosive-tennis-balls/">
  </head>
  <body itemscope itemtype="http://schema.org/WebPage">
    <aside>
      <section itemscope itemtype="http://schema.org/Event">
        <h2 itemprop="name">Acme Product Launch</h2>
        ...
      </section>
    </aside>
    <main>
      <article itemscope itemtype="http://schema.org/Product">
        <h1 itemprop="name">Explosive tennis balls</h1>
        <link itemprop="mainEntityOfPage" href="http://example.com/explosive-tennis-balls/">
        ...
      </article>
    </main>
  </body>
</html>
Which from a graph point of view looks like this:
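As (subject, predicate, object) triples, the mainEntityOfPage version differs from the sidebar example by one edge: the Product now points at the page URL. The sketch assumes that the page node is identified by its URL (following the implicit-WebPage rule); the blank-node ids and the `main_entity` helper are hypothetical, not any search engine’s code. The helper simply looks for subjects whose mainEntityOfPage is the page’s URL.

```python
# Assumed: the page node is identified by its canonical URL.
PAGE = "http://example.com/explosive-tennis-balls/"

triples = [
    (PAGE, "rdf:type", "schema:WebPage"),
    ("_:event", "rdf:type", "schema:Event"),
    ("_:event", "schema:name", "Acme Product Launch"),
    ("_:product", "rdf:type", "schema:Product"),
    ("_:product", "schema:name", "Explosive tennis balls"),
    ("_:product", "schema:mainEntityOfPage", PAGE),  # from the added <link>
]


def main_entity(triples, page_url):
    """Hypothetical helper: subjects that declare page_url as their page."""
    return [s for s, p, o in triples
            if p == "schema:mainEntityOfPage" and o == page_url]


print(main_entity(triples, PAGE))  # ['_:product']
```

The Event is still present, but it is no longer a candidate: only the Product is explicitly tied to the page.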
Being able to specify the relation in both directions
In the original discussions on this topic, the first property to be proposed was actually the inverse of what I’ve described so far, namely mainEntity. Only in late January this year did schema.org’s Dan Brickley suggest it might be a good idea to add a property in the opposite direction as well, which led to the addition of mainEntityOfPage.
This means there’s another property I haven’t highlighted yet, as so far I’ve only illustrated what can happen when a secondary entity (the Event in this case) is specified before the primary entity (the Product).
Before I continue, be aware that the same issues can also occur when the main entity is specified first (e.g., because the HTML was written in a different order). Again, this is due to each search engine’s own heuristics and has little to do with the order of the DOM.
Being able to specify this relation in both directions ensures it can still be expressed when the order of the HTML, or programmatic constraints, make a relation in one direction impossible. So you can also specify the relation in the opposite direction by using mainEntityOfPage’s inverse, http://schema.org/mainEntity:
<html>
  <head>
    <link rel="canonical" href="http://example.com/explosive-tennis-balls/">
  </head>
  <body itemscope itemtype="http://schema.org/WebPage">
    <main>
      <article itemprop="mainEntity" itemscope itemtype="http://schema.org/Product">
        <h1 itemprop="name">Explosive tennis balls</h1>
        ...
      </article>
    </main>
    <aside>
      <section itemscope itemtype="http://schema.org/Event">
        <h2 itemprop="name">Acme Product Launch</h2>
        ...
      </section>
    </aside>
  </body>
</html>
Which looks like this in a graph:
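Because mainEntity and mainEntityOfPage are inverses, a consumer can rewrite one into the other and end up with the same edge. In the sketch below, the `normalize` helper and the node ids are hypothetical, and the page node is again assumed to be identified by its URL. It turns every mainEntity triple into its mainEntityOfPage form by swapping subject and object, so a single lookup covers both directions.

```python
# Assumed: the page node is identified by its canonical URL.
PAGE = "http://example.com/explosive-tennis-balls/"

# Triples for the mainEntity variant: the edge now points from page to Product.
triples = [
    (PAGE, "rdf:type", "schema:WebPage"),
    (PAGE, "schema:mainEntity", "_:product"),
    ("_:product", "rdf:type", "schema:Product"),
    ("_:event", "rdf:type", "schema:Event"),
]

# Each property mapped to its inverse.
INVERSE = {"schema:mainEntity": "schema:mainEntityOfPage"}


def normalize(triples):
    """Hypothetical helper: rewrite mainEntity triples as mainEntityOfPage."""
    out = []
    for s, p, o in triples:
        if p in INVERSE:
            out.append((o, INVERSE[p], s))  # swap subject and object
        else:
            out.append((s, p, o))
    return out


print([t for t in normalize(triples) if t[1] == "schema:mainEntityOfPage"])
# [('_:product', 'schema:mainEntityOfPage', 'http://example.com/explosive-tennis-balls/')]
```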
Let me finish by saying that everything I just demonstrated can also be achieved with RDFa or JSON-LD; I chose microdata because it’s still the most commonly used syntax out there right now.
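As a rough illustration of the JSON-LD route, the sketch below uses Python’s standard `json` module to build JSON-LD versions of both directions for the Explosive tennis balls page. Only the name, types, properties and URL come from the microdata examples; using a plain URL as the mainEntityOfPage value and giving the WebPage an `@id` equal to its URL are modeling choices made for this sketch.

```python
import json

# The page URL used in the microdata examples.
PAGE = "http://example.com/explosive-tennis-balls/"

# Product -> page, like <link itemprop="mainEntityOfPage" href="...">.
product = {
    "@context": "http://schema.org",
    "@type": "Product",
    "name": "Explosive tennis balls",
    "mainEntityOfPage": PAGE,  # a URL is one of the allowed values
}

# Page -> Product, like itemprop="mainEntity" on the <article>.
page = {
    "@context": "http://schema.org",
    "@type": "WebPage",
    "@id": PAGE,  # identify the WebPage node by its URL
    "mainEntity": {"@type": "Product", "name": "Explosive tennis balls"},
}

# Either document would go inside a <script type="application/ld+json"> element.
for doc in (product, page):
    print(json.dumps(doc, indent=2))
```

As with microdata, either direction on its own is enough to express the relation.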
I hope you enjoyed the read and that you’ll start using the mainEntityOfPage and mainEntity properties to resolve any ambiguity caused by multiple top-level entities. And if you’re interested in knowing more about what schema.org v2.0 encompasses, make sure to read the article written by my partner in semantic crime, Aaron Bradley.