I've just come from the Semantic Technology and Business Conference in San Francisco, and it was both thoroughly enjoyable and highly instructive.
I was joined there on the panel "Search Marketing in a Semantic World" by my esteemed colleagues Matthew J. Brown and Barbara Starr. Our panel discussion was lively and, I hope, interesting and elucidatory for those that attended: thanks Matt and Barbara!
The conference sessions were not only interesting in themselves: taken together, they helped consolidate some thoughts about search marketing and the semantic web that have been on my mind for some time, and gave me new perspectives on the relationship between the two disciplines.
Insofar as these ruminations led to a eureka moment it is this: an understanding of the semantic web is not simply a useful accessory for SEOs, but is now required by any search marketer who wants to excel in their chosen profession.
Semantic web technologies, I was reminded time and time again, are now intrinsic to almost every operation performed by the search engines. Bing and – especially – Google would be radically different and decidedly inferior products without this technological foundation.
It follows, then, that any search marketer serious about maximizing the visibility of specific websites in the current search engine environment needs a decent working knowledge of the relevant semantic web technologies that now in large part fuel the query results we see today.
"The semantic web" is an unwieldy label, and incorporates a large number of specific, highly complex technologies that 99% of search marketers are likely to neither understand nor ever use. But some of the semantic web concepts that are now core to how the search engines work are readily within the mental grasp of most search marketers, and are critical to their future success.
Let me try to explain the most important of these to you.
Welcome your new overlord, the entity
Keyword optimization, at least as traditionally practiced, is soon to gasp its last breath. However much Google and Bing's robots may parse a bunch of text, what they're interested in, take great pains to understand, and ultimately store are entities: the people, places and concepts that make up the underlying meaning of a page. In short, search is no longer about words, but about the things to which the words on a web page make reference.
And when I say "web page" I am actually using this as accessible shorthand for "web resource." Images, videos and even individual pieces of data may be described in terms of the entities they contain and make reference to. Indeed, this is one of the fundamental reasons why the search engines' shift in focus from keywords to entities is so transformational and so powerful: a resource no longer needs to be associated with keywords to be useful for the search engines and to be used by them.
Why is this important? Because when Google receives a user query it's increasingly not trying to match the query keywords, but (informed, whenever possible, by the context of the query) to understand the meaning underlying the query, and then return information about the entities it has identified.
Google even has a snappy new mantra to summarize this seismic shift, first used when it launched its Knowledge Graph: "things not strings."
Your new overlord has a unique identifier
"But Bradley, aren't entities ultimately referred to by a bunch of words? Don't you need words – whether or not you call these 'keywords' – to label an entity?"
In the world of the semantic web entities are stored as unique, web-accessible identifiers. These identifiers take the form of URIs (uniform resource identifiers): to all intents and purposes you can think of these as URLs.
That's the difference between Wikipedia and (for those of you who are familiar with it) DBpedia: Wikipedia is made up of a bunch of words, whereas DBpedia is made up of a bunch of entities, each with a unique identifier. The singer Tom Jones has this URI:
That this happens to include the name of the entity is incidental to how DBpedia encodes URIs: an identifier can be composed of any alphanumeric string that resolves via HTTP. The important point is that it is a permanent URI that uniquely identifies an entity (if you want to get really confused look into the related concept of dereferencing, and ponder that hapless redirect, the 303).
One of the major strengths of such an identifier is that it allows search engines to disambiguate entities and collapse references to them. A single URI identifies the singer Tom Jones, T. Jones and even "that guy who sang 'What's New Pussycat'" – the last of these, of course, lacking any keywords that form part of the singer's name.
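As a rough illustration of what "collapsing references" means, here is a minimal Python sketch. The identifier uses the placeholder domain example.org and is purely hypothetical (it is not a real DBpedia or Google identifier), and a real search engine resolves labels using context and statistics rather than a hand-written lookup table.

```python
# Hypothetical identifier: example.org is a placeholder, not a real DBpedia URI.
TOM_JONES = "http://example.org/entity/tom-jones-singer"

# Several different labels (strings) collapse onto one entity (thing).
LABEL_TO_URI = {
    "tom jones": TOM_JONES,
    "t. jones": TOM_JONES,
    "the singer of 'what's new pussycat'": TOM_JONES,
}

def resolve(label):
    """Return the entity URI for a label, or None if the label is unknown."""
    return LABEL_TO_URI.get(label.strip().lower())

print(resolve("Tom Jones") == resolve("T. Jones"))  # True
```

Note that the third label shares no words with the singer's name, yet it resolves to the same identifier as the other two.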
From a practical perspective having a grasp on this concept is helpful because you can then understand that what you need to help Google identify on a web page isn't keywords, but the entities underlying the page's textual content.
For those of you already using schema.org this is exactly the sort of helpful information you're providing to Google by employing the vocabulary. When you mark up the phrase "Paris Hilton" using schema.org/Hotel it knows you're referring to a hotel in Paris, France rather than the celebrity Paris Hilton (which, if it was she you meant, you would mark up with schema.org/Person).
Google's Knowledge Graph is through and through fueled by entities, to each of which Google has assigned a unique identifier (although this identifier is not accessible to non-Googlers). That is why if you ask the question "what is the GDP of France?" Google is able to determine your intent and present the figure. The context of the query in combination with entity disambiguation allows it to understand that "GDP" means "gross domestic product" (there's an identifier for that), and that you want to know the value of the gross domestic product for the country France (there's an identifier for that).
Because these entities are stored as triples (of which I'll have more to say later), it is also likely to display the GDP for the United States, as well as the population for France, because it knows that users that search for the GDP of France frequently look for these figures as well – and is able to do so because it understands all of these entities and the relationships between them.
While I want to stress that this is a loose analogy, when it comes to unique entity identifiers SEOs are already familiar with the similar concept of canonicalization. Just as an SEO works to ensure that different URLs don't point to identical pages, Google works to ensure that different URIs in its backend don't point to the same entity (one of the main day-to-day tasks undertaken in support of Google's Knowledge Graph is consolidating duplicates).
The revolution is fueled by mobile
To paraphrase one thing that Jason Douglas of Google said early in his talk on the Knowledge Graph, search is changing because the lives of Google's users are changing: they now carry their computers with them.
This embrace of mobile computing impacts search behavior in a number of important ways.
First, it makes the process of refining search queries much more tiresome. As anyone who uses search engines knows, it's anything but uncommon to find that an initial query produces unsatisfactory results. Users are required to modify their search terms – adding or removing words – in order to arrive at the desired set of search results. This process is known as refinement, and often several refinements are necessary to produce decent results.
While refining queries is never a great user experience, on a mobile device (and particularly on a mobile phone) it is especially onerous. This has provided the search engines with a compelling incentive to ensure that the right search results are delivered to users on the first go, freeing them of laborious refinements.
Second, the process of navigating to web pages, and moving back and forth between search results and the web pages they reference, is trivial on desktop computers but a royal pain on a hand-held mobile device.
This situation provides a compelling incentive for the search engines to circumvent additional web page visits altogether, and instead present answers to queries – especially straightforward informational queries – directly in the search results.
While many in the search marketing field have suggested that the search engines have increasingly introduced direct answers in the search results to rob publishers of clicks, there's more than a trivial case to be made that this is in the best interest of mobile users. Is it really a good thing to compel an iPhone user to browse to a web page – which may or may not be optimized for mobile – and wait for it to load in order to learn the height of the Eiffel Tower?
This also sheds considerable light on the usability impetus behind Google+ Local. A well-formed Google+ Local Page enables Google to display things like business hours and an interactive map in a mobile-friendly fashion in response to a query like "Jones Aquarium Supplies" (which is, of course, an entity).
Finally, mobile devices make lookup functions like airline flight information more difficult. Semantic technologies facilitate linking information about a personal entity (like you) and information about another relevant entity (like Air Canada flight AC632) in a mobile-friendly fashion. Like the Knowledge Graph, Google Now would not be possible without semantic technology, and this also explains Google's push behind allowing actions to be encoded as structured data in Gmail.
Some new acronyms and a bit of a learning curve are in your future
As I suggested earlier, search marketers don't need to become semantic web experts, but they do need to come to terms with some basic semantic web concepts and technologies if they don't want to become yesterday's news.
Key concepts for search marketers
All search marketers, I think, need at least a conceptual grasp of two key concepts.
Entities are … oh, wait, I think I may have mentioned entities once or twice above. Oh well, it bears repeating, and if you can only be bothered to give some thought to a single subject relevant to both the semantic web and search engines this is it.
To provide one further perspective on the subject, and to forestall a comment from a particular user, here's another way of looking at entities compared to keywords. In computer science lingo keywords are literals – Google's "strings." You'll be pleased to hear that entities – Google's "things" – can be formally described with a word with which you should already be well acquainted: "hyperlinks" (because, of course, they are identified by URIs you can resolve using HTTP).
Underpinning the semantic web is a standard called RDF: the resource description framework.
For the everyday SEO, technical knowledge of RDF is not required. What's important is the concept of the triple that underlies RDF.
A triple is the mechanism by which a relationship between any two entities may be expressed. It consists of a subject, a predicate and an object. A triple can be as simple as this statement:
Aaron lives in Vancouver
Triples are immensely powerful things because they allow you not only to describe any relationship between any two things, but then to map relationships between other statements about those things.
Because entities in the semantic web have unique identifiers, there's no ambiguity about what Aaron or what Vancouver is being referenced. And information about those entities from other triples – like Aaron knows John, Aaron knows Bob, John lives in Vancouver, Bob lives in Toronto – enables queries like "who does Aaron know who also lives in Vancouver" to be easily and accurately answered (yes, the correct answer is "John").
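The Aaron and John example can be sketched in a few lines of Python. This is only an illustration: the short names stand in for the unique identifiers a real system would use, and the predicate names (knows, lives_in) and the helper function are made up for the example.

```python
# Each triple is (subject, predicate, object).
triples = {
    ("Aaron", "lives_in", "Vancouver"),
    ("Aaron", "knows", "John"),
    ("Aaron", "knows", "Bob"),
    ("John", "lives_in", "Vancouver"),
    ("Bob", "lives_in", "Toronto"),
}

def known_by_in_city(person, city):
    """People that `person` knows who also live in `city`."""
    known = {o for s, p, o in triples if s == person and p == "knows"}
    residents = {s for s, p, o in triples if p == "lives_in" and o == city}
    return sorted(known & residents)

print(known_by_in_city("Aaron", "Vancouver"))  # ['John']
```

The answer falls out of intersecting two sets of statements, with no keyword matching involved.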
Think of an ecommerce-related query like "canary diamond bangle." In the world of keywords without entities or triples, a search engine would be forced to rely largely on the presence or absence of those keywords in the content in order to return a relevant product result.
But with statements stored as triples like "canary is the same as yellow" and "bangle is a type of bracelet" a search engine can readily understand that the product page titled "Yellow Diamond Bracelet" is a good match for the query "canary diamond bangle."
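To make the bangle example concrete, here is a small Python sketch of matching a query to a product title through triples. The predicate names (same_as, type_of) and both functions are assumptions for illustration, and this is far simpler than whatever a real search engine or ecommerce platform does.

```python
# Statements about terms, stored as (subject, predicate, object).
triples = {
    ("canary", "same_as", "yellow"),
    ("bangle", "type_of", "bracelet"),
}

def expand(term):
    """Return the term plus any terms it is linked to by same_as or type_of."""
    terms = {term}
    for s, p, o in triples:
        if s == term and p in ("same_as", "type_of"):
            terms.add(o)
        if o == term and p == "same_as":  # same_as works in both directions
            terms.add(s)
    return terms

def matches(query, title):
    """True if every query word, or a related term, appears in the title."""
    title_words = set(title.lower().split())
    return all(expand(word) & title_words for word in query.lower().split())

print(matches("canary diamond bangle", "Yellow Diamond Bracelet"))  # True
```

A pure keyword comparison of those two strings would find only "diamond" in common.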
Internal ecommerce search has had these capabilities for a long time, though of late they have been based more and more on formal semantic web technologies, ultimately because of the power of triples. (The act of product classification falls to people called taxonomists, who were in the past only found in disciplines like biology and library science. Guess who I ran into at one of the conference lunches? A couple of taxonomists from REI.)
If you poke around the semantic web a bit you'll eventually run into the term "triplestore." Since you now know about triples, you'll find triplestores easy to understand. A triplestore means exactly what you'd expect it to mean: a collection of triples. They are the databases of the semantic web.
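For the curious, a toy triplestore can be sketched in Python as a set of triples plus a pattern-matching lookup. The class and method names here are hypothetical, and production triplestores add indexing, persistence and a query language on top of this basic idea.

```python
class TripleStore:
    """A toy in-memory collection of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

store = TripleStore()
store.add("Aaron", "lives_in", "Vancouver")
store.add("John", "lives_in", "Vancouver")
store.add("Bob", "lives_in", "Toronto")

# Everything the store knows about who lives in Vancouver.
print(store.match(predicate="lives_in", obj="Vancouver"))
```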
For an overview of the semantic web (that either directly or indirectly references these key concepts) I cannot recommend enough this video produced by Manu Sporny, embedded below. You've made it this far: what's another six minutes?
Key technologies for technical SEOs and SEO developers
In order to make the most of their search marketing efforts in the future, technical search marketers will need to have at least a working knowledge of some of the key semantic web technologies that – in some form or another – are now employed by the search engines.
I won't go into any of these in great detail, as I want to quickly get to some concluding thoughts relevant to all SEOs (this brevity also saving me – I'm not a developer – from misrepresenting any of the technical details).
As previously discussed, RDF is the main framework used by the semantic web, facilitating the encoding and linking of triples (in more technical terms it is a graph data model that uses URIs).
RDFa and microdata
These are both attribute-based syntaxes that allow semantic data to be marked up directly in HTML (RDFa stands for "the Resource Description Framework in attributes"). From a practical standpoint knowing RDFa and microdata will be more important to most search marketers than knowing RDF, as these are the chief mechanisms for marking up code with schema.org – but a good grasp of RDF is required for any developer who wants or needs to dive deeper into semantic web technologies.
Again, you'll find a good textual introduction to RDFa from Juan Sequeda and an excellent video on RDFa from Manu Sporny (you might appreciate the fact that this video's subtitle is "how to procrastinate at work by learning RDFa"). You'll also want to check out a simpler version of RDFa recently developed – that is especially well-suited to marking up code with schema.org – called RDFa Lite.
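To give a feel for what attribute-based markup looks like to a machine, here is a Python sketch that reads the type and properties out of a microdata snippet using only the standard library. The snippet and the MicrodataSketch class are illustrative assumptions: this is not how any search engine parses pages, and it ignores nesting and most of the microdata specification.

```python
from html.parser import HTMLParser

# Illustrative snippet: the itemtype says that "Paris Hilton" names a hotel
# here, not a person.
SNIPPET = """
<div itemscope itemtype="http://schema.org/Hotel">
  <span itemprop="name">Paris Hilton</span>
</div>
"""

class MicrodataSketch(HTMLParser):
    """Collects itemtype values and the text of itemprop elements."""

    def __init__(self):
        super().__init__()
        self.types = []
        self.props = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.types.append(attrs["itemtype"])
        # Remember the property name so the next text node can be stored under it.
        self._current_prop = attrs.get("itemprop")

    def handle_data(self, data):
        if self._current_prop and data.strip():
            self.props[self._current_prop] = data.strip()
            self._current_prop = None

parser = MicrodataSketch()
parser.feed(SNIPPET)
print(parser.types)  # ['http://schema.org/Hotel']
print(parser.props)  # {'name': 'Paris Hilton'}
```

Swap the itemtype for http://schema.org/Person and the same visible text describes an entirely different entity.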
Just as SQL is the standardized query language for relational databases, SPARQL is the standardized query language for RDF. It is specifically designed for querying triples.
Yet again, you'll find a useful introduction from Juan Sequeda.
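To show roughly what "querying triples" means, the Python sketch below puts a SPARQL query next to a toy matcher that answers the same question. The ex: prefix, the example.org namespace and the function names are assumptions for illustration. The query string is shown for reference only and is not executed, and real SPARQL engines do far more than this simple pattern matching.

```python
# The question "who does Aaron know who lives in Vancouver?" as SPARQL
# (for reference only; this string is never run).
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?friend WHERE {
  ex:Aaron ex:knows ?friend .
  ?friend ex:livesIn ex:Vancouver .
}
"""

triples = [
    ("Aaron", "knows", "John"),
    ("Aaron", "knows", "Bob"),
    ("John", "livesIn", "Vancouver"),
    ("Bob", "livesIn", "Toronto"),
]

def unify(pattern, triple, bindings):
    """Match one pattern against one triple; return new bindings or None."""
    new = dict(bindings)
    for want, have in zip(pattern, triple):
        if want.startswith("?"):          # a variable: bind it or check it
            if new.setdefault(want, have) != have:
                return None
        elif want != have:                # a constant: must match exactly
            return None
    return new

def solve(patterns):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = unify(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

print(solve([("Aaron", "knows", "?friend"), ("?friend", "livesIn", "Vancouver")]))
# [{'?friend': 'John'}]
```

Each line inside the WHERE clause is itself a triple with a blank to fill in, which is why SPARQL and triples fit together so naturally.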
The squeaky wheel gets the schema.org/Grease
The adoption rate of schema.org has exceeded the most optimistic expectations of the search engines, and schema.org came up time and time again at SemTechBiz, usually in contexts that had nothing to do with search engines.
Based in large part on its initial success, the people behind schema.org are eager to see it expanded and refined. While initial reactions to schema.org were mixed, the semantic web community has largely rallied around it because they acknowledge that it has resulted in large-scale adoption of a semantic web technology.
Search marketers can help themselves, the SEO community and the semantic web community by being more active in helping to develop this resource. If you can identify a need in your topical domain for the addition of a new schema.org type, or additional properties for an existing type, don't hesitate to pursue it.
The primary platform for this is the public-vocabs (a.k.a. web schemas) mailing list on W3C. You can subscribe directly from that list or find out more about it and the Web Schemas task force (the group that "runs" schema.org) on the Web Schemas Wiki page.
The Wiki is also the place where you'll find the list of current and past proposals for schema.org. If you have an idea for a schema.org extension, it's useful to review this proposal list to learn more about the process of extending schema.org, which is anything but formal. In general, you'll have better traction with schema.org extension proposals if you can bring together several players in your industry to build the extension collaboratively (and so present a united front), and if you can provide real-life use cases that demonstrate the potential usefulness of the extension.
Before diving into public-vocabs directly – especially if your vocabulary idea is in the nascent phase – starting a W3C community group is a great way to get the effort under way and find more collaborators. At the conference panel on schema.org Sandro Hawke also highlighted the support that's available for vocabulary development at W3C.
Don't be intimidated by the public-vocabs mailing list format, the lack of clear procedures for proposing an extension, the highly technical discussions that often take place on the list, or the, um, strong personalities that you'll sometimes encounter. The search marketing community has been one of the key drivers of schema.org adoption, but has had virtually no presence when it comes to higher-level schema.org discussions: this has to change.
Modify your conference calendar
I started out by mentioning that I served with Matthew J. Brown and Barbara Starr on a panel where we discussed search marketing and the semantic web. The number of search marketers on that panel – three – also represented the total number of search marketers present among the 500 or so attendees at SemTechBiz San Francisco.
The Semantic Technology & Business Conference series is an excellent way of dipping your toe in the semantic web water, and getting better acquainted with the technologies that you'll need to excel as an SEO in the years ahead.
As an immediate step I also urge you to join your local semantic web Meetup Group, if one exists in your area (I can't put together a reliable query string for you – go to meetup.com and search for "semantic web"). This is obviously a good thing too for those without the wherewithal or inclination to take in a semantic technology conference.
This outreach cuts both ways. While this post is directed at search marketers, I think anyone in the semantic web community would get a lot of value by attending a search marketing conference. SMX and SES are both great search marketing series that have an international scope. And there's no shortage of SEO Meetup groups out there.
Given the considerable overlap between interests and activities, the chasm between the search marketing and semantic web communities is almost bizarrely vast. Any efforts that result in better communication between these two communities will ultimately benefit both.
A parting definition
What is semantic search? I've saved this excellent definition provided by Tamas Doszkocs of WebLib, as – for those of you not previously familiar with concepts like entities and triples – it will now make a lot more sense:
Semantic search is a search or a question or an action that produces meaningful results, even when the retrieved items contain none of the query terms, or the search involves no query text at all.
As someone who grits his teeth whenever the "SEO is dead" meme makes an appearance, I make this statement in a somewhat tongue-in-cheek fashion, but I'm sure you'll appreciate (now having made it to the end of this post – thanks!) the sentiment behind it: the keyword is dead.