Semantic SEO: Making the Shift from Strings to Things

by Aaron Bradley on October 2, 2013

in Semantic Web, SEO


Semantic SEO is the nascent art of optimizing web sites and other web-based resources for semantic search. But, strictly speaking, it's unnecessary to speak of "semantic SEO" or "semantic search" because the reality of contemporary search engines has made the qualifier redundant.

Semantic web technologies are now intrinsic to the way modern search engines work, and organic search marketing strategies need to address this reality.

This is a version of a talk I gave at SMX East in New York City on 2 October, 2013, under the slightly different title "Strategic Semantic SEO".

I think that the subject Mike, Jon and I will be addressing today is a vitally important one for SEO. In fact, I think that the changes wrought by semantic web technologies on the way search engines operate signal a turning point in the way in which SEO is practiced.

Before I start to explore the nature of this change, a quick word about me. I've been a student of semiotics, a librarian, a web designer and an SEO. I don't know if that background makes me specially qualified to talk about the intersection of search marketing and the semantic web, but it's certainly what's led to my interest in this field.

And it's a field that's exploding! From the Google Knowledge Graph to Bing Snapshots to Google Hummingbird, all of a sudden semantic search is popping up all over the place, and people are taking notice because, well, it's hard to ignore how search has changed.

How search, I think, has fundamentally changed.

What is the nature of this seismic shift?

Fundamentally it is a shift from strings to things.

“Things, not strings”
The phrase coined by Google to highlight the shift to real-life things in search

From keywords to entities. From the words that are used to describe things to the thing being described.

As a catchphrase used to promote the Knowledge Graph "strings to things" is a minor marketing triumph. But as actionable information for SEO the phrase is pretty useless, at least on its own.

Not that SEOs have been idle in their efforts to optimize for semantic search.

In particular, search marketers have been quick to embrace schema.org and Google authorship.

Semantic search and the SEO strategy gap
So far most semantic SEO work has been carried out in a strategic void

These have been relatively easy sells for search marketers, because the reward is obvious and demonstrable: get rich snippets, get a higher CTR from the SERPs, make more money.

These are fine tactics, and they produce good results, but they've mostly been carried out in a strategic void.

As search becomes ever more semantic, marketers need to understand the context of these tactics, so they can develop strategies for semantic SEO, and from there develop effective and innovative tactics of their own.

That's what I hope to help provide today – context, and some strategies that emerge from better understanding that context.

And that context is all about the semantic web technologies without which Google and Bing wouldn't look remotely like they do today.

But don't be alarmed. I don't think semantic web stuff is anywhere near as difficult as people make out, and in the interest of putting my money where my mouth is I'm going to talk about just two really simple principles of the semantic web and, after that, provide you with a three-word definition of semantic SEO that, unlike "strings to things," will be actionable in buckets if I'm successful in explaining it to you.

So what about those strings and things?

The first semantic web technology I want to talk about has to do very much with strings, not things, so a quick word on the differences between the two.

In search engine and semantic parlance those "things" are called "entities."

Entities are different than keywords. They're what keywords are used to identify.

Keywords themselves are imprecise.

Different keywords can be used to refer to the same entity.

Different keywords, same entity
Keywords are not a reliable way of designating real-life things

"Dean Martin," "Jerry Lewis' sidekick" and "the hardest-drinking member of the Rat Pack" all refer to the same named personal entity.

And the same keywords can be used to refer to different entities. I used to pass through Paris from time to time as a kid – but not the Paris that probably comes to mind when you hear the word.

Same keyword, different entities
Keywords are not a reliable way of designating real-life things

There are many types of entities. "Dean Martin" is a named personal entity. "Paris" is a geographic entity. But entities need not be proper nouns: concepts like "cat" and "desk" are also entities – topical entities.

So how – now finally arriving at the first of those two simple principles I want to discuss – are entities handled in the world of semantic search?

In semantic web applications each entity is assigned a unique identifier.

Unique identifiers
For a given domain an identifier, not keywords, represents the thing itself

Unique identifiers allow computers to talk about things: a unique identifier represents the actual thing that a word is talking about. Not a keyword, but the meaning underlying a keyword.

This is a critical distinction, because in the keyword universe there's no canonical "aubergine" or "eggplant" word that can be used to reliably and unambiguously refer to the concept of that particular vegetable, but in the entity universe – the formal universe of the semantic web – there is.

In the semantic web world those unique identifiers tend to be URLs – URIs if you want to get all fancy and semantic-webby – like the Wikipedia and IMDb addresses on the slide.

And URLs make awesome identifiers for a whole whack of reasons, including the fact that they're readily accessible on the web, and that you can provide useful information about the thing at that URL.
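To make the distinction concrete, here is a minimal Python sketch of a keyword-to-identifier lookup. The table and the `resolve` function are hypothetical, and the Wikipedia URLs are used only as example identifiers (the second "Paris" is an arbitrary pick, not necessarily the one I passed through as a kid).

```python
from typing import Dict, List

# Hypothetical lookup table: keyword -> candidate entity identifiers.
# The URLs are example identifiers only.
KEYWORD_TO_ENTITIES: Dict[str, List[str]] = {
    "aubergine": ["https://en.wikipedia.org/wiki/Eggplant"],
    "eggplant": ["https://en.wikipedia.org/wiki/Eggplant"],
    "paris": [
        "https://en.wikipedia.org/wiki/Paris",
        "https://en.wikipedia.org/wiki/Paris,_Ontario",
    ],
}


def resolve(keyword: str) -> List[str]:
    """Return every entity identifier a keyword might refer to."""
    return KEYWORD_TO_ENTITIES.get(keyword.strip().lower(), [])


# Different strings, same thing: both keywords resolve to one identifier.
same_thing = resolve("Aubergine") == resolve("eggplant")

# Same string, different things: context is needed to choose between them.
paris_candidates = resolve("Paris")
```

The point of the sketch is only that the identifier, not the keyword, is what stays stable: synonyms collapse to one URL, and an ambiguous keyword fans out to several.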

What sort of information? That brings us to the other semantic web fundamental that I want to talk about.

The semantic web has a standard for describing things, a description framework that's based on a triple.

As the name suggests a triple is a three part statement about something.

Triples: subject > predicate > object
Semantic web applications use a standard model for describing resources

A triple is composed of a subject, a predicate and an object.

The subject is what's being described – in the first example Mr. Dean Martin.

The predicate states what thing about the subject is being described – that would be his height.

The object is the value – which can be a piece of text or a number – of the thing about the subject described by the predicate. Here that's the value of Dean Martin's height – 5'10".

This framework enables you to describe pretty much anything about any entity in a format that computers can readily understand. That format is the structure that's referred to in the phrase "structured data."
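A triple is simple enough to sketch in a few lines of Python. In the sketch below the `Triple` type, the `describe` helper and the plain-string predicates ("height", "name") are illustrative assumptions rather than any real vocabulary, and the Wikipedia URL simply stands in for the subject's unique identifier.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """A three-part statement: subject > predicate > object."""
    subject: str
    predicate: str
    obj: str


# Example identifier for the subject (assumed for illustration).
DEAN_MARTIN = "https://en.wikipedia.org/wiki/Dean_Martin"

triples: List[Triple] = [
    Triple(DEAN_MARTIN, "name", "Dean Martin"),
    Triple(DEAN_MARTIN, "height", "5'10\""),
]


def describe(subject: str, store: List[Triple]) -> Dict[str, str]:
    """Collect every predicate/object pair stated about one subject."""
    return {t.predicate: t.obj for t in store if t.subject == subject}


facts = describe(DEAN_MARTIN, triples)
```

Because every statement has the same three-part shape, adding a new fact about an entity is just appending another triple; nothing about the structure has to change.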

And if you've ever marked up HTML using microformats, or schema.org, or Open Graph, you've been using triples.

Triples in action
Properties express relationships between a subject and an object

When you publish the code above you're saying to Google, very unambiguously, "this product has the name 'Acme 8 Gigabyte USB Drive'."

And this, in turn, allows you to say other very unambiguous things about the named USB drive, like it has a price of ten bucks, or that 32 people have reviewed it, or that it's blue.
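As a stand-in for the kind of markup being described, here is a sketch that builds a schema.org-style description of that USB drive as a Python dictionary and serializes it as JSON-LD. The choice of JSON-LD (rather than microdata or RDFa), the USD currency and the "10.00" price format are my assumptions; only the name, the ten-dollar price, the 32 reviews and the color come from the example.

```python
import json

# A schema.org-style Product description. Each key/value pair is effectively
# a triple: this product > has property > value.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Acme 8 Gigabyte USB Drive",
    "color": "blue",
    "offers": {
        "@type": "Offer",
        "price": "10.00",
        "priceCurrency": "USD",  # assumed; the example only says "ten bucks"
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "reviewCount": 32,
        # ratingValue is left out because the example gives no rating.
    },
}

# Serialize for embedding in a page or feeding to another system.
json_ld = json.dumps(product, indent=2)
```

A real deployment would need whatever additional properties the consuming search engine requires; the sketch is only meant to show how each unambiguous statement maps to a property and a value.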

And when you put that description framework together with unique URL identifiers, it all starts to get terribly exciting.

No, really!

Because when you're able to figure out what things are and how to find them – unique identifiers – and understand information provided to you about those things – the description framework – you're able to make all sorts of meaningful connections between all sorts of things.

The GDP of France ... and a lot more
From structured sources, and/or sources Google has parsed and structured

Take a look at these Knowledge Graph results for the query "GDP of France." Aside from the desired answer, you'll see that the results are awash in other entities and information about them.

Among the facts displayed you'll find things like the population of France and the GDP of the UK. Why is Google displaying this information?

It knows through its query logs that people who searched for "GDP of France" also searched for these other figures. But it doesn't know this by simply adding up the occurrences of keywords and keyword phrases of queries executed in the same session.

Related queries: strings vs. things
Semantic search allows Google to determine the meaning behind queries

It knows this because it has used those keywords and the context of the queries to extract and disambiguate entities (to a unique identifier), and to store statements about them (as triples). Results like this – or, say, the comparison feature of Google Hummingbird – simply wouldn't be possible without the technologies I've described.
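To illustrate why counting things works better than counting strings, here is a toy Python sketch. It is in no way a description of how Google actually processes its query logs: the `RESOLVED` table stands in for a hypothetical entity resolver, and the sessions are made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical resolver output: query string -> (entity, property).
RESOLVED = {
    "gdp of france": ("France", "GDP"),
    "france gdp": ("France", "GDP"),
    "population of france": ("France", "population"),
    "how many people live in france": ("France", "population"),
    "uk gdp": ("United Kingdom", "GDP"),
}

# Made-up search sessions, each a list of queries from one user.
sessions = [
    ["gdp of france", "population of france"],
    ["france gdp", "how many people live in france"],
    ["gdp of france", "uk gdp"],
]


def co_occurrences(sessions, key):
    """Count how often two items appear in the same session."""
    counts = Counter()
    for session in sessions:
        items = sorted({key(query) for query in session})
        counts.update(combinations(items, 2))
    return counts


# Counting raw strings: every pairing looks like a one-off.
by_string = co_occurrences(sessions, key=lambda q: q)

# Counting resolved things: the France GDP / France population link repeats.
by_thing = co_occurrences(sessions, key=lambda q: RESOLVED[q])
```

With strings, the first two sessions share nothing; once the queries are resolved to entities and properties, they turn out to be the same pairing seen twice.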

This is the semantic web at work, and it's the new face of search.

It changes web pages from isolated islands to islands joined by billions of bridges. It's a search environment that doesn't only try to provide answers about things, but about the connections between things. And it's the environment for which search marketers require an optimization strategy.

And with that we're back to strings and things.

SEO strategy to date has been focused on keywords – strings describing things.

Traditional definition of “SEO”
For most of its history SEO has been focused on keywords

While keywords will continue to play a central role in search – precisely because they do describe things – strategies developed for keywords alone, for strings, are inadequate for the dynamic world of things.

Toward a definition of “semantic SEO”
Entities are a necessary component – but only a component – of semantic SEO

That entities are important for semantic SEO is obvious, but simply replacing "keywords" with "entities" as your optimization target isn't particularly helpful, and it doesn't address what makes semantic search so powerful.

That power is the ability to understand what things are and how they're connected – and it's those relationships you want your web page, or video, or email, or tweet, or pin, or picture, or post to play a role in.

You want your site to make an appearance just at the moment Google connects the dots for a searcher.

You need your search engine optimization strategy to include not just nouns, but verbs.

Semantic SEO is not about optimizing for strings, or for things, but for the connections between things.

“Semantic SEO”
A shift in focus from matching strings to the relationships between things

Semantic SEO is optimizing for relationships.

The relationships between entities facilitated by the ability to uniquely and unambiguously identify them, and to provide unambiguous data about them.

And if you're successful in this, your presence in search will be extended, and you'll be connected to searchers looking for very specific things. You'll appear not just for "blender," but for "blender recommendations," "good blenders under $200," and "blender under 18 inches tall," along with implicit queries that the search engines are increasingly able to work out from the query context and information about the user, like "blenders recommended by my friends" or "machine for crushed ice margaritas" or "compare blenders and juicers."

As semantic SEO is rooted in the world of things, a logical starting point for semantic SEO strategy is the identification of things, and in particular the things found on your website.

Semantic SEO strategy: identify and disambiguate entities
Extraction utilities and APIs can aid in identifying named and topical entities

There are powerful tools that you can use – like entity extraction APIs – to identify the entities present in your content. Many of these APIs, in fact, lean on the same resources, like Wikipedia and Freebase, used by the Google Knowledge Graph or Bing's Snapshots.

But identifying entities is not unlike the tried-and-true task of identifying keywords to target in search, and a lot of the techniques and tools used in keyword research can be applied to the task – though with the critical difference that entities are actual things that keywords are used to describe.

Just as identifying entities is not unlike keyword research, entity disambiguation is not unlike hunting down and consolidating pages that cannibalize each other – keyword cannibalization – and it isn't conceptually dissimilar to specifying the canonical version of a URL.

Entities: only as many identifiers as necessary
Entity cannibalization is the new keyword cannibalization: just say no

However, a site free of pages that cannibalize keywords may have several pages that refer to the same entity – different strings that point to the same thing. In the age of semantic search, using multiple pages to cover off synonyms referring to the same underlying thing is exactly the wrong approach.
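A quick way to spot this kind of entity cannibalization is to group pages by the entity they are about. The Python sketch below assumes you have already tagged each page with an identifier for its main entity (for instance with an entity extraction API); the page paths and the `find_entity_cannibalization` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical site inventory: page path -> identifier of its main entity.
pages: Dict[str, str] = {
    "/vegetables/eggplant": "https://en.wikipedia.org/wiki/Eggplant",
    "/vegetables/aubergine": "https://en.wikipedia.org/wiki/Eggplant",
    "/vegetables/zucchini": "https://en.wikipedia.org/wiki/Zucchini",
}


def find_entity_cannibalization(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Return entities that are the main subject of more than one page."""
    by_entity = defaultdict(list)
    for path, entity in pages.items():
        by_entity[entity].append(path)
    return {
        entity: sorted(paths)
        for entity, paths in by_entity.items()
        if len(paths) > 1
    }


duplicates = find_entity_cannibalization(pages)
```

Here the "eggplant" and "aubergine" pages never compete on keywords, yet they surface as candidates for consolidation because they point at the same thing.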

Another important approach is to start thinking of your content – or rather the data that resides in that content – in the same way that a search engine does.

Semantic SEO strategy: identify items and properties
Structuring your content will help when it comes to structuring it for search engines

With your entities identified you can then work out the properties associated with them, the types of values you'd expect to see for those properties and – most importantly – the properties and values that are shared between entities.
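As a small illustration of that exercise, the Python sketch below compares two items to find the properties they have in common and the values they share. The items, property names and values are all invented for the example.

```python
from typing import Any, Dict, List

# Invented items with their properties and values.
items: Dict[str, Dict[str, Any]] = {
    "usb-drive": {"brand": "Acme", "color": "blue", "price": 10.00},
    "blender": {"brand": "Acme", "color": "red", "price": 149.00, "height_in": 17},
}


def shared_properties(a: Dict[str, Any], b: Dict[str, Any]) -> List[str]:
    """Property names that both items declare."""
    return sorted(set(a) & set(b))


def shared_values(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Any]:
    """Properties where both items have the same value."""
    return {p: a[p] for p in shared_properties(a, b) if a[p] == b[p]}


common_props = shared_properties(items["usb-drive"], items["blender"])
common_values = shared_values(items["usb-drive"], items["blender"])
```

Shared property names suggest how items can be compared or filtered; shared values (here, the brand) suggest where two entities are actually connected.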

You may or may not end up creating triples of your own – like marking up code with schema.org – but understand that Google and Bing are going to use triples in storing your stuff and processing queries whether or not the data on your page is structured.

And approaching your content from this vantage point will help you immensely for all sorts of tasks, from query targeting to site architecture.

I have no time to go into this in detail, but I think this sort of data organization is the keyword analysis of the future – and, indeed, keyword analysis is a crucial tool for organizing data in this fashion.

Of course, a primary means of ensuring that search engines unambiguously understand your entities is to formally declare them and provide information about them.

Semantic SEO strategy: declare your data
Provide your data in a structured format for machine consumption

And the most obvious way of doing this is with structured data markup. This includes marking up existing code with schema.org (using microdata or RDFa), microformats, and Open Graph meta tags.

If a particular type of entity is important to your business, but it isn't a part of any readily usable schema, find a way – any way – of declaring those entities and their properties. Leverage an existing structured vocabulary or, better yet, extend schema.org and work at getting the new schema or schemas added to the vocabulary.

Missing in action ... but not necessarily forever
If a resource is absent in schema.org you can add it as an extension

Did you know that there is no schema available for video games? I know it's now a mere $15 billion industry in the US, but I nonetheless think that a well-thought-out extension that supports the markup of video games would be favorably received.

But is it worthwhile marking up entities for which search result rich snippets aren't currently generated?

In a word, yes.

Search engines are going out of their way to get webmasters to feed them structured data, which suggests that they find it useful for reasons other than producing rich snippets.

No snippet? No problem!
Google still wants your data

Where's the rich snippet generated by the "musicBy" property? Where's the rich snippet when you tell Google about a restaurant's cuisine with the Data Highlighter? Is Google ignoring this information?

No, it's using the data to get a better understanding of the resources being described. And while the promise of rich snippets continues to be the carrot dangled in front of webmasters to encourage the use of structured data markup, ultimately this markup – in the words of the Data Highlighter – helps search engines "understand your site's data."

Finally, if – as I've argued – semantic SEO is about optimizing for relationships, then you need to know how things are connected across your site or sites.

Fortunately, the mechanism for exposing the relationships that exist between things on the web is not a mysterious one: it's the hyperlink.

Structured data provides a method of explicitly declaring relationships between things, but for any type of resource a search engine won't connect the dots when there's nothing connecting them, so you need to ensure that your content is sensibly linked.

Take a product page on an ecommerce site.

Semantic SEO strategy: map and declare your relationships
Hyperlink relevant entities and resources, preferably with structured data

Is it linked to similar types of items? To products that belong to that same brand? To an upper-level page that represents the brand on that domain? Does a company blog link to this same page when discussing that brand? Are share buttons on that page connected to the verified accounts of the company? And so on.

It is necessary, but not sufficient, to identify, provide information about and make connections between things on and beyond your pages. For search engines, you must also demonstrate that the data you're providing is trustworthy.

Keywords – strings – aren't judged by the search engines on data quality, since they're only indirectly related to data. But when semantically declared entities are in play it's all about data – after all, it's not called structured data for nothing.

So while the search engines can judge how relevant a resource like, say, a web page or video might be for a particular keyword by looking at the keyword universe of that resource, for semantic search they're also concerned about the veracity of data that's been offered.

How do you demonstrate to the search engines that your data is trustworthy?

Semantic SEO strategy: build trust in your data
Trust by verification: Google+, Google’s verified identity network

Certainly use verification methods when they're available, and as they become available.

What makes Google Authorship perhaps the killer search application is Google+. While I'm sure Google has every hope that Google+ will evolve into everyone's favorite social network, it would have enormous value to Google if it contained exactly zero posts, photos and videos by zero contributors. It is a verified identity network that allows Google to disambiguate individuals, businesses and other corporate entities, and connect all of these to websites and website pages.

From a data point of view what a byline says is, "this article was written by so-and-so." When that byline is linked to a verified identity, Google knows exactly who so-and-so is, including, possibly, the people, organizations, social networks, websites and topics to which they have a connection.

Avail yourself of all verification methods
Verify identities and data with search engines AND social networks

Bing Tags, Twitter Cards and Pinterest Rich Pins are all similar methods of verifying identities, and in turn help search engines and other data consumers see your data as more trustworthy.

In addition to verifying data – and especially in the absence of verification methods – you should ensure that your data is consistent between sources, and even go out of your way to demonstrate that data fidelity.

Building trust with data consistency
The search engines will trust your data more if it’s consistent across domains

In an ecommerce environment, this means that the product information displayed on your site should match the information encoded in the structured data on your site, listed in your search engine product feeds, and shown anywhere else you might display it – like Facebook, Twitter or Pinterest.

Building trust in ecommerce environments
Products should be uniquely identified, and ecommerce data kept consistent

Google Shopping now requires unique product identifiers in merchant feeds. Why? To "continue improving data quality on Google Shopping." And Bing, in a move that explicitly demonstrates the principle of data fidelity, has now started offering "Rich Captions" that display product price and availability information if – and only if – the information displayed on the merchant's site is identical to the information provided to Bing in a Product Ads feed. Tied together, of course, by a unique identifier, the product URL.
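You can run this sort of fidelity check yourself before a search engine does it for you. The Python sketch below compares the data published on product pages with the data sent in a feed, keyed by product URL. The record layout, field names, `example.com` URLs and the `find_mismatches` helper are all assumptions made for illustration; they do not reflect any actual Google or Bing feed specification.

```python
from typing import Dict, List, Sequence

Record = Dict[str, str]

# Hypothetical data as published on the site, keyed by product URL.
site_data: Dict[str, Record] = {
    "https://example.com/usb-drive": {"price": "10.00", "availability": "in stock"},
    "https://example.com/blender": {"price": "149.00", "availability": "in stock"},
}

# Hypothetical data as submitted in a product feed, keyed by the same URLs.
feed_data: Dict[str, Record] = {
    "https://example.com/usb-drive": {"price": "10.00", "availability": "in stock"},
    "https://example.com/blender": {"price": "159.00", "availability": "in stock"},
}


def find_mismatches(
    site: Dict[str, Record],
    feed: Dict[str, Record],
    fields: Sequence[str] = ("price", "availability"),
) -> Dict[str, List[str]]:
    """List, per product URL, the fields where site and feed disagree."""
    mismatches: Dict[str, List[str]] = {}
    for url, site_record in site.items():
        feed_record = feed.get(url)
        if feed_record is None:
            mismatches[url] = ["missing from feed"]
            continue
        diffs = [f for f in fields if site_record.get(f) != feed_record.get(f)]
        if diffs:
            mismatches[url] = diffs
    return mismatches


problems = find_mismatches(site_data, feed_data)
```

The product URL does the same job here that it does for the search engines: it is the unique identifier that lets two separate data sources be compared record by record.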

If these are all good strategies for semantic SEO, what are some expected outcomes of employing them?

Outcomes: increased search visibility
From authorship to answer boxes, rich snippets are everywhere – even email!

First, you should see improved search visibility in the form of "rich snippets." I use the phrase "rich snippets" in quotes because I mean any sort of enhanced search result, call-out, answer box, vertical and anything else that looks like or is related to Google's Knowledge Graph or Hummingbird, or Bing's Snapshots.

Pete Myers of Moz recently identified 85 – count 'em, 85 – different types of "rich SERPs," and it's likely we'll see even more diversity with time. The era of 10 blue links is well and truly dead, and semantic search killed it.

The less readily visible outcome of effective semantic SEO – and, I think in the long run, the more important one – is that the search engines will come to understand your content much better. You'll "rank" better in the sense that you'll be better associated with the entities referenced by your content – making timely appearances in the SERPs as the search engines make connections on behalf of their users.

For both of these outcomes, measuring success is – alas – currently problematic.

Measurement woes: it’s still all about keywords
Conventional analytics make it difficult to assess semantic SEO success

Reporting on organic search success has, until recently, focused on keywords. Any efforts to classify traffic by the things referenced – as opposed to strings referencing them – require a lot of manual heavy lifting, and there's virtually no way of reliably tracing back a click to an enhanced search result, let alone the type of the answer box or vertical or rich snippet where that result appeared.

And even when it comes to strings, semantic search is rendering keyword data less and less reliable because it facilitates information discovery.

Strings and things don’t always mix
Visits from semantic search can obscure the original query

But I no longer need to walk you through this slide showing how keyword data is being muddied by semantic search – based largely on an excellent and prescient presentation titled "Breaking Up With Your Keyword Data" by our Q&A coordinator Annie Cushing – because in the two or so weeks since I created this deck Google has announced its intention to break up with all of us.

At a general level, successful semantic SEO should result in increased traffic from search, insofar as your content supports it.

But even if that's not the case, you should expect the quality of search traffic to improve because the search engines are better matching the things present in user queries to the things present on your site.

So you'd expect to see higher conversion rates, fewer bounces, increased engagement and more return visits for search-derived traffic.

I'm hopeful that the coming (not provided) apocalypse will stimulate the development of reporting tools and techniques, but it's likely that producing metrics for semantic search will remain a challenge for the foreseeable future.

To conclude, semantic search is all about search engines connecting users with data. Make those connections your targets, and let the search engines be your matchmakers.
