I'm hesitant to either use superlatives or make predictions concerning search engine innovations (I'm the first to deride commentators that use the phrase "game changer" in almost any context), but the joint announcement by Google, Yahoo and Bing introducing schema.org is, in my opinion, pretty big news. Schema.org at once provides a mechanism by which semantic web technologies can become a lot more mainstream, and at the same time offers the possibility of superior search visibility for search marketers that embrace this standardized, structured on-page markup.
Both searchers and publishers of quality content (by which, in this context, I really mean "quality data") stand to gain by the introduction of schema.org. If schema.org is adopted widely, search engine users will potentially have much better answers to more complex queries, and publishers will have a mechanism to provide the search engines with much more detailed information than the engines are currently able (or, in some cases, willing) to digest. This promise rests on the power of structured data.
An Extraordinarily Brief Introduction to Structured Data
Structured data is a mechanism by which relationships between things can be expressed in a machine-readable format. "This computer mouse has the price $29.99," "the father of Jane Doe is John Doe," "a mouse is a mammal," and so on. Structured data separates the presentation layer (what a web user sees) from the data layer (what a computer robot sees): machines consuming structured data don't have to "guess" what a web resource is about, but are provided with very exact information that can be parsed and queried.
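To make the idea concrete, here is a minimal sketch of how the three statements above might be held as subject/predicate/object triples and queried. The predicate names (hasPrice, hasFather, isA) and the query helper are invented for illustration and are not drawn from any real vocabulary or library.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The example statements from the text, as (subject, predicate, object).
# Predicate names are hypothetical, chosen only for readability.
TRIPLES = [
    Triple("computer mouse", "hasPrice", "$29.99"),
    Triple("Jane Doe", "hasFather", "John Doe"),
    Triple("mouse", "isA", "mammal"),
]


def query(triples, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return every triple matching the given fields; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # "Who is the father of Jane Doe?" becomes a lookup, not a guess.
    for match in query(TRIPLES, subject="Jane Doe", predicate="hasFather"):
        print(match.obj)
```

The point of the sketch is only that a machine can answer the question by matching fields rather than by interpreting prose.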
The bedrock of structured data is the Resource Description Framework (RDF), "a standard model for data interchange on the Web" that permits data to be shared across different applications, and supports the evolution of different schemas over time. RDFa provides a set of attributes that allow the embedding of rich metadata within web documents: that is, the addition of machine-readable attributes to standard XHTML. Microformats allow publishers to add specific attributes to existing HTML/XHTML for a topical realm defined by the microformat, providing machine-readable information for things such as recipes, events and products. Microdata is a (proposed) HTML5 specification that allows for the nesting of semantic information within the code of existing web pages. Like microformats, microdata relies on a supporting vocabulary to describe an item; unlike microformats, microdata allows for (relatively) extensible vocabularies, and presents no risk of conflicting with CSS attributes.
Schema.org is microdata. More specifically, it is a structured data markup schema with a shared vocabulary that easily allows webmasters to embed machine-readable information in their HTML5 code.
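As a rough illustration of what that embedding looks like, the sketch below holds a small, invented HTML fragment marked up with the schema.org Book type and pulls the properties back out using Python's standard html.parser module. The book details are placeholders, and the parser is a deliberate simplification: it assumes a single, non-nested item whose property values are plain text, which is far less than a real microdata processor handles.

```python
from html.parser import HTMLParser

# Invented sample page fragment; the book details are placeholders.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Book">
  <h1 itemprop="name">An Example Book</h1>
  by <span itemprop="author">A. N. Author</span>
  <span itemprop="numberOfPages">320</span> pages,
  ISBN <span itemprop="isbn">0000000000</span>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collect the itemtype and text-valued itemprops of one non-nested item."""

    def __init__(self):
        super().__init__()
        self.itemtype = None
        self.properties = {}
        self._current_prop = None  # itemprop whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.itemtype = attrs.get("itemtype")
        if "itemprop" in attrs:
            self._current_prop = attrs["itemprop"]
            self.properties[self._current_prop] = ""

    def handle_data(self, data):
        if self._current_prop is not None:
            self.properties[self._current_prop] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the property being read.
        if self._current_prop is not None:
            value = self.properties[self._current_prop]
            self.properties[self._current_prop] = value.strip()
        self._current_prop = None


def extract(html):
    """Return (itemtype, {property: text}) for the fragment."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.itemtype, parser.properties


if __name__ == "__main__":
    item_type, props = extract(SAMPLE_HTML)
    print(item_type)
    for name, value in props.items():
        print(f"{name}: {value}")
```

The visible page is unchanged by the extra attributes; they simply label what each piece of text is.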
Structured data falls under the broad topical umbrella of the semantic web, for which I've previously compiled a list of resources for the beginner. Those interested in learning more about the semantic web as it pertains specifically to search engine optimization might want to check out a presentation I gave on the subject.
Major Implications of Schema.org
The introduction of schema.org, with its support from the major search engines, accomplishes two important things. It normalizes the structured markup supported by the search engines, and it extends the topical domains of presently supported microformats and structured vocabularies.
From a search engine optimization perspective, the normalization of markup vocabulary now makes it easy for webmasters and SEOs to decide upon which vocabulary to implement if they're starting from scratch. It seems unlikely that the search engines are ever going to abandon structured data, and the flavor of the month is now unequivocally clear. And there is greater incentive to invest the time and effort into producing structured markup, because Bing (which previously, unlike Google, had no acknowledged support for either microformats or RDFa) is on board with the initiative.
Perhaps even more importantly, the adoption of a standard vocabulary by the biggest conduits of web-based information queries – the major search engines – will almost certainly accelerate the adoption of structured data across the web. This is analogous in some ways to the introduction of Facebook's Open Graph protocol. While Open Graph wasn't particularly warmly received by the semantic web community, the benefits of sharing structured data with Facebook resulted in rapid and widespread adoption of Open Graph in the form of the now-ubiquitous Facebook "like" button.
Most importantly, OpenGraph is one component in a wider ecosystem. Its deployment benefits are apparent to the consumer and the developer: add the metatags, get the "likes," know your customers.
Such consumer causality is critical to the adoption of any semantic mark-up. We've seen it before with microformats, whose eventual popularity was driven by their ability to improve how a page is represented in search engine listings, and not by an abstract desire to structure the unstructured. Successful adoption will often entail sacrificing standardization and semantic purity for pragmatic ease-of-use; this is where the semantic web appears to have stumbled, and where linked data will most likely succeed.
Despite the relatively limited topical domains of microformats supported by Google, adoption has been widespread where there's been a demonstrable benefit for search engine visibility (notably with hRecipe). Now there's a standardized vocabulary respected by all the major search engines, and one covering a much broader topical range than that supported by microformats.
While it's anything but an ontology of everything, schema.org does vastly extend the vocabulary available with Google-supported microformats. RDFa, as the schema.org FAQ acknowledges, is much more extensible than either microdata or microformats, "but the substantial complexity of the language has contributed to slower adoption." Certainly the relatively few search marketers I've known with an interest in structured data have almost all focused on microformats, and marking up pages in schema.org microdata is far easier for non-specialists than RDFa.
To the extent that marketers are willing to deploy relatively simple structured data, the schema.org types now supported potentially extend the use of structured data into much more diverse topical realms than microformats. There was, for example, no microformat available by which one could specify the URL of a movie theatre. And schema.org often provides a richer vocabulary than that available with microformats: the schema.org organization type will be welcomed by anyone who has tried to mark up complex information about a company using hCard.
Current Search Engine Support for Structured Data and the Road Ahead
The search engines have long been consumers of structured and semi-structured data. In 2008 Google started to display reviews from such sites as Yelp and Citysearch directly in search results as a result of parsing review data, in what was probably the first broad-based appearance of rich snippets in the SERPs. Since then Google, especially, has supported more and more structured data types, including various microformats, RDFa, and the product vocabulary GoodRelations. This is in addition to other types of structured and semi-structured data submitted directly from publishers to the search engines, such as product feeds (such as feeds for Google Products), RSS and XML sitemaps.
As noted, the most obvious manifestation of the search engines' consumption of structured data has been the appearance of rich snippets in the search results: a search engine snippet that provides more information directly in the SERPs than the traditional linked title, description and URL of a web resource. An example is a product snippet in the Web SERPs that includes price, availability and review information.
Despite the broad range of microformats and structured vocabularies for which Google has professed support, what's most typified Google's use of structured data to date is the extremely uneven appearance of rich snippets. Google will sometimes display a rich snippet in web results for a page and sometimes not, despite the availability of specifically Google-supported structured data for that resource.
In the example above, the URL referenced in the shopping results and the URL referenced in the web results contain identical GoodRelations markup, but only the shopping result appears as a rich snippet. While this might be ascribed to the degree of trust Google accords to a given source, just because a rich snippet appears for one result doesn't mean it will appear for a similar result from that same domain.
In short, given the amount of structured data being offered to Google, one would expect to see a far greater number of rich snippets appearing in Google than has actually been the case.
This situation changed somewhat with Google's announced support for improved recipe rich snippets based on RDFa or hRecipe in April 2010, and the introduction of "Recipe View" in February 2011. While Recipe View provided ways for users to refine their searches based on attributes made available with structured data, Google's consumption of structured recipe data has resulted in the generation of far more rich snippets for recipes than in any other topical realm.
Google's embrace of structured data for recipes can be seen as something of a precursor to schema.org, especially as it pertains to Google's confidence in the veracity of structured data. I don't know how successful Recipe View itself has been, but I'm willing to bet that the creation and consumption of structured recipe data has resulted in "better" recipe search results in Google, whether that success is based on the metric of higher CTRs on top results or some other measure of search satisfaction.
What's really interesting about recipe search results is that, unlike things like consumer products, rich snippets are being fairly consistently displayed in the SERPs. Google seems to have a high degree of trust for recipes coded with hRecipe/RDFa, and there are reasons to think that this trust in data may extend to documents marked up with schema.org types and properties. Schema.org microdata may offer the search engines a superior methodology for evaluating the veracity of structured data. A comment from Alan Bleiweiss on the first Search Engine Land report about schema.org summarizes this admirably:
I can already see scenarios where the engines look at content within these and say “does this belong here, or is this a spammy use of this area of the page?” I know they already evaluate such things to a certain degree, but with the new uniform elements, breaking down pages into consistent uniform blocks will make it much easier for them to do that evaluation within an individual page, across a site, and across competitive sites.
This is, in my opinion, an extremely important point. Providing structured data to search engines is of little use if there's a low probability that the search engines will use it. It is likely that Google's evaluation of recipe structured data concluded that those data were trustworthy, for the simple reason that there's not a lot of incentive for publishers to go to the trouble of producing hRecipe markup unless the resource is actually a recipe. It will be very interesting to see if Google, Yahoo and Bing express the same confidence in schema.org markup: if more varied rich snippets start to appear quickly in the SERPs, this will be an indication that Google et al. have been able to successfully roll in trust measures with their roll-out of schema.org.
The Impact of Schema.org on Search Results and SEO
As suggested above, the likeliest impact of schema.org data on search results will be the appearance of rich snippets for a much broader range of topics. For example, a result for a book search might include the display of the number of pages and ISBN directly in the search results. Related to this is a possible increase in the number of custom search refinements facilitated by microdata, such as those currently offered in Recipe View. One way or another, wide scale adoption of schema.org markup certainly opens up the potential for the search engines to be able to provide more exact answers to a broader range of very specific queries.
Because schema.org microdata allows web publishers to provide attributes for sections of a web page, this will make it easier for the search engines to extract specific information from the content of web pages with less guessing. This is likely to result in the inclusion of more information directly in the search results, as opposed to forcing the user to visit the linked resource. This is not, as per the Einstein query above, uncommon at present, but answers delivered directly in the SERPs could become much more prevalent with schema.org. Certainly the existence of schema.org markup on a page will make it much, much easier for search engines to parse the information that appears on a web page. This, of course, offers something of a challenge for web publishers that want to encourage click-throughs from the SERPs to their web page: there's less reason for a searcher to leave Google if the information they're seeking is displayed directly in the search results.
From this perspective, it seems likely that a web page containing schema.org-compliant markup will have greater visibility in the search results than a page containing similar information, but lacking structured data. So all things being equal, web publishers that include schema.org markup in their code should have a competitive SEO advantage over those that don't. "All things being equal" is a pretty big caveat, and the degree to which this is an actual competitive advantage revolves around the degree of trust that the search engines put in schema.org data. However, it seems unlikely that the search engines have collaborated on a structured data schema without a fair degree of confidence that this schema will pay off in the form of better search results for users.
A bigger conundrum faces publishers that have already been employing microformats or RDFa in their code. Google says that while "it’s OK to use the new schema.org markup or continue to use existing microformats or RDFa markup, you should avoid mixing the formats together on the same web page, as this can confuse our parsers." This puts publishers between a rock and a hard place: it may not be advisable simply to add schema.org markup to existing code because of this confusion, but leaving things as is fails to realize the benefits of Bing's adoption of schema.org. As one of the stated goals of schema.org is to offer a common vocabulary that the search engines agree upon, the prospects for continued non-schema.org structured data support (let alone search engine support for new microformats or structured data schemas) seem slim.
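For publishers wondering whether an existing page already mixes formats, a crude check is possible with nothing more than Python's standard html.parser. The sketch below is a heuristic of my own, not anything the search engines have published: the attribute and class names it looks for are a small, non-exhaustive selection, and the RDFa check will also fire on Open Graph meta tags, which use the property attribute.

```python
from html.parser import HTMLParser

# Heuristic markers chosen for illustration; not an exhaustive or official list.
RDFA_ATTRS = {"typeof", "property", "about", "vocab"}
MICROFORMAT_CLASSES = {"vcard", "hrecipe", "hreview", "vevent", "hproduct"}


class FormatSniffer(HTMLParser):
    """Record which structured data formats appear to be present on a page."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs or "itemprop" in attrs:
            self.found.add("microdata")
        # Note: Open Graph <meta property="..."> tags will also match here.
        if RDFA_ATTRS.intersection(attrs):
            self.found.add("rdfa")
        classes = set((attrs.get("class") or "").split())
        if classes & MICROFORMAT_CLASSES:
            self.found.add("microformats")


def formats_in(html):
    """Return the set of format names detected in the HTML string."""
    sniffer = FormatSniffer()
    sniffer.feed(html)
    sniffer.close()
    return sniffer.found


if __name__ == "__main__":
    page = '<div class="vcard" itemscope><span itemprop="name">Acme</span></div>'
    detected = formats_in(page)
    if len(detected) > 1:
        print("Mixed formats on one page:", ", ".join(sorted(detected)))
```

A result containing more than one format name is only a prompt to look more closely at the page, not proof that a parser will be confused.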
One may also expect to see an increase in the amount of semantic spam being fed to the search engines. I explored this to some degree in a previous post on the subject of trust in the semantic web, but schema.org potentially makes it much more attractive for nefarious publishers to misrepresent their data in the interests of increased traffic from search engines. The degree to which the search engines are able to readily evaluate the veracity of schema.org data will be a determining factor in whether it's actually worthwhile to try to spam the search engines in this manner, which in turn may have bearing on how much the search engines trust (and so draw upon) microdata attributes in general.
Initial Reactions from the Semantic Web Community
Even early on, it's clear that schema.org has evoked two very different reactions from those in the semantic web community, many of whom have been working on structured data for a long time. On one hand, schema.org may be the equivalent of a "killer app" for the semantic web that finally results in the wide scale adoption of structured data that most semantic web researchers think is long overdue. This is best summarized by the opening paragraph of a blog post from Structured Dynamics CEO Michael K. Bergman about schema.org (the post title, Structured Web Gets Massive Boost, is a pretty good summary in itself):
In my opinion, perhaps the most important event for the structured Web since RDF was released a dozen years ago was today’s joint announcement by the search engine triumvirate of Google, Bing and Yahoo! releasing Schema.org. Schema.org is a vendor specification for nearly 300 mini-schema (or structured record definitions) that can be used to tag information in Web pages. These schema are organized into a clean little hierarchy and cover many of the leading things — from organizations to people to products and creative works — that can be written about and characterized on the Web.
On the other hand, as Bergman acknowledges in his post, those that spent years on RDF and RDFa see this as the rejection of a superior set of structured data standards in favor of an inferior schema, and bemoan how these efforts are now potentially undervalued (Jay Myers, the trail-blazing web developer that brought GoodRelations to Best Buy, tweeted that "there's just nothing quite like throwing away years of vocabulary/ontology work").
Following up on a tweet by Italian semantic web researcher Irene Celino where she said she was "astonished & disappointed" by schema.org, I asked her about the reason for these feelings, to which she was kind enough to reply:
Bear in mind this is only *my* very personal point of view, and other Semantic Web-ers could partially or totally disagree.
I was already quite disappointed by the W3C to standardize microdata instead of RDFa within HTML5, since the latter is (1) much more expressive and (2) strictly connected to the Linked Data efforts of the Semantic Web community.
The fact that Schema.org FAQ explicitly suggest to drop RDFa is even worse, especially after Yahoo and Google supported the adoption of GoodRelations for product description. Of course they are free to choose the format they like, but somehow they are saying "if you want to appear in search results follow our rules". Instead, as Web site owner I'd say "dear major search engines, do your best to keep up with what the Web is offering you to index and do not restrict the natural evolution of the Web and the _data_ Web sites offer, whatever their format".
This has so far been a common reaction from semantic web researchers, and the two viewpoints taken together are bittersweet in aggregate: isn't it great that the search engines have made this massive stride toward embracing the semantic web, but isn't it lousy that the specific standard they've adopted is microdata. The debate within the semantic web community will be interesting to watch as schema.org markup starts to appear. It will also be interesting to see what this means for the future of microformats, given that there's basically no longer any reason to employ them for SEO purposes (as of time of writing, there's been no reaction yet on the microformats blog or from their Twitter account).
[Update, 6 June: I'd be remiss not to mention here Manu Sporny's eloquent polemic The False Choice of Schema.org, which I discovered after publishing this post.]
Search engine marketers are a lot more accustomed to prescriptive directives from the search engines, so the reaction from the SEO community so far has generally been favorable. As a long time advocate of leveraging structured data for search engine visibility, I'm personally pleased that there's now a common structured data vocabulary for SEOs to stand behind (as little as a month ago, I was asking about whether to employ hCard or RDFa for the best representation of company information in search, now a moot question). And while, like Celino, I'm disappointed by the limitations of schema.org compared to RDFa, I certainly do anticipate an easier time of it when I try to sell the semantic web to web publishers concerned with their search engine visibility.
Should coding pages using schema.org markup now become the priority task for on-page optimization? Perhaps not, but schema.org certainly can't long be ignored by SEOs seeking a competitive advantage and superior search engine visibility.