A Website's Structured Data Success Story

by Jarno van Driel on August 16, 2015

in Semantic Web, SEO


The topic of structured data has a rapidly growing audience looking into options beyond rich snippets. But some webmasters, if not most, are not sure what to do next. Many have questions like "What are the benefits of publishing structured data?", "Does it influence rankings?" and "Can its impact be measured?"

The slide deck of a presentation I gave during SmartData Conference 2015 tries to answer these types of questions by telling the success story of a website that was semantically optimized by using schema.org and other protocols.

It details what was optimized, presents data showing that semantically enriched web pages generated substantially more search-engine-referred traffic than non-enriched or less-enriched control groups, and shares insights gained by re-using structured data to measure its impact on search.

Slide deck transcription

#002 – Welcome to A website's structured data success story.

#003 – My name is Jarno van Driel and I'm an in-house SEO for Sanoma Digital The Netherlands.

In 1995 I earned a degree in mechanical engineering, but when I was introduced to the World Wide Web in 1997 I found it far more exciting than mechanical engineering, so in 1998 I switched careers and became a web designer.

During the early 2000s I specialized in building accessible websites and applications. Early in that period I noticed that sites built according to accessibility best practices and guidelines quickly started to do well in search.

That realization sparked an interest in SEO, and it didn't take me long to switch from building sites to optimizing them for search.

Then, around late 2007 or early 2008, news broke that the Drupal CMS would introduce a semantic technology called ‘RDFa’ into its core. This immediately drew my attention, since I had plenty of experience with how much could be achieved using semantic HTML, and I couldn't wait to find out what I might achieve with RDFa.

Unfortunately, getting familiar with RDFa at that time was still very tricky, mainly because there were very few resources on the web explaining how to use it properly, while search engine support for it was experimental at best.

Luckily, just as I was about to throw in the towel on the whole idea of structured data, I ran into the Google+ community Semantic Search Marketing, started by Aaron Bradley.

#004 – As I got involved in this community, I discovered it had a growing group of members interested in various aspects of semantic technologies and the possibilities these offer for the future. The community helped me learn a lot, and I give back by regularly helping folks with their structured data issues.

#005 – That is how I ran into cosmeticsurg.net in autumn 2013, when Cosmeticsurg's Leeza Rodriguez joined Semantic Search Marketing and asked if anybody could advise her on how to publish valid schema.org/BlogPosting markup from within their WordPress CMS.

I helped her out by showing how they could modify their WYSIWYG editor's configuration so that it would allow microdata attributes as well as some additional HTML elements, which made it possible for her to start publishing proper schema.org/BlogPosting markup.

#006 – Then, in late January 2014, a request came in asking whether I was available to write some microdata plus schema.org markup for their medical procedure pages.

#007 – It was a pretty straightforward request, and since I was still working as a freelance SEO at the time, I was always looking for new projects to start.

#008 – So we called to discuss what she had in mind.

#009 – But wouldn't you know it… During that call it quickly became clear this project would be anything but straightforward.

#010 – I learned that in spring 2012 the people involved in the site had noticed that the site's traffic and leads were steadily declining.

#011 – This Google Analytics graph illustrates it: from late February 2012 the site's traffic followed a steadily declining curve all the way through to January 2014.

#012 – In this type of situation I tend to turn quickly to an SEO tool from a company called Searchmetrics, which SEOs can use to monitor and analyze a range of different metrics.

My favorite feature is their Visibility Score Graph, which indicates how well a website is doing in Google Search according to formulas Searchmetrics developed. I like using it because it is quite good at picking up on algorithmic changes made by Google, as illustrated by the steep fall-off in mid-October 2011.

#013 – Such a steep fall-off usually implies one of two things: either the site has been hit by a Google penalty (which wasn't the case here), or it has been hit by an algorithmic change, possibly even a series of algorithmic changes. Either way, reason enough for some serious concern.

#014 – The client confirmed those concerns and immediately offered a candidate for the algorithmic hit: Panda.

#015 – Now with Panda I don't mean this one.

#016 – But this one: the new and important site-quality algorithms Google introduced in late February 2011, created so Google could better recognize and reward high-quality websites while devaluing low-quality websites in its search results.

#017 – These algorithmic changes seemed so successful that Google deployed them worldwide less than two months after introducing them in the US.

#018 – Of course, with any major algorithmic change Google makes there will always be winners and losers. So to help webmasters with any Panda-related issues they might have, Google engineer Amit Singhal wrote a 23-point list in which he tried to provide insight into the ideas, research and mindset behind these algorithmic changes.

#019 – The people involved in the site took this list to heart and responded with a range of actions.

#020 – As you can see, they deployed quite a comprehensive list of actions, yet despite their efforts the site took another big hit in May 2013. At that point the site's Chief of Marketing decided to reach out to Google engineer John Mueller, hoping he could provide some help or insights.

John Mueller was very helpful, and after some time had passed, the most important remark he made was that the Panda algorithms were having a hard time recognizing the overall quality of the website and that they should keep working on the issues mentioned in Amit Singhal's 23-point list.

#021 – I went through the activities they had deployed since summer 2012, yet the only thing I could find was that the site kept getting hit by nearly every Panda update Google rolled out, and that none of the client's activities had produced any positive outcome.

#022 – They had reached that conclusion themselves as well, which is why they came up with the idea of trying to turn things around by marking up pages with structured data.

#023 – Now, such an idea may sound good if you say it fast enough, but in reality it doesn't make any sense at all.

This is because the average Panda victim tends to have all kinds of serious content issues that should be addressed before even thinking about applying any form of structured data markup.

If a site's content is a mess, adding structured data markup to it won't yield any positive result at all. You'll only end up adding another layer of complexity to content Google already has a problem with.

And if you do end up doing a good job of adding markup, the most likely outcome is that you will have helped Google understand even better that your content is not what it needs to be to rank well.

#024 – In the weeks that followed, I spent some time studying their actions, the site's topical hierarchy and its content, and came to the conclusion that I couldn't pinpoint why the Panda algorithms weren't able to determine the overall quality of the site.

Sure, every SEO finds something when first looking at a site, even when others have worked on it before, and of course I also found some things that needed to be fixed, such as the way they had implemented structured data.

But overall the site came across as a pretty healthy site with loads of in-depth content to work with.

#025 – With that in mind I started thinking about what I might be able to do for this site, formed a rough outline and proposed it to the client.

#026 – First off, I suggested they stop everything they had been doing for a period of one year! A radical proposal, but the client needed to agree to it if we wanted to be able to measure the effects of reworking the site's structured data.

The reason is that Google unfortunately doesn't provide any insight into whether structured data markup helps produce better organic traffic results.

And since the client's request was to try to turn things around by means of structured data, we needed to create conditions under which we could hopefully generate some actionable data: with no other changes being made to the site, any shift in traffic would be easier to attribute to the markup.

Second, I suggested we focus on getting their structured data fundamentals in place first, so that we would have a proper foundation to build on later in the project.

Third, I suggested that once the first half of the project was done, we would take a break to analyze the outcome so far and try to form a hypothesis as to why the site had been hit by Panda.

That would hopefully lead to ideas for how we could attempt to turn things around by means of structured data.

#027 – We quickly came to an agreement, and thus the time had come to move ahead.

#028 – The first order of business was getting an idea of the state Google's Knowledge Graph was in after everything that had been done so far.

#029 – The first thing I noticed was that when you searched for the branded term ‘cosmeticsurg’, only one business entity showed up in Google's search results, where there should have been two.

#030 – Cosmeticsurg consists of two business entities at the same location: a practice and a surgery center. Google didn't seem to pick up on this relationship, as it only showed the practice in its search results, almost as if the surgery center didn't exist.

This seemed strange, given that both business entities had their own correctly set-up Google Business page, meaning Google had access to the correct information yet somehow wasn't able to display it in its search results.

#031 – Getting multiple businesses located at the same address to show up in search results has long been a hot topic for SEOs, especially among those specializing in local search. And unfortunately, even to this day, it can still lead to situations that are extremely difficult to resolve.

#032 – And as if having two different types of businesses at the same address wasn't difficult enough already, we also had to deal with a third entity.

#033 – A third entity that isn't a business entity but a person, namely the surgeon.

This is because a physician's office doesn't perform any surgery; that's something a surgeon attached to the physician's office does.

At the same time a surgeon doesn't perform surgery at a physician's office but at a hospital or surgery center.

So for Cosmeticsurg's surgeon to be able to perform surgery at their location, they need their own certified surgery center, so that the surgeon can be attached to the surgery center as well and perform medical procedures there.

This means that Cosmeticsurg as an organization is actually a triptych consisting of a practice, a surgery center and a surgeon.

#034 – Which led to the question: What can we do about this?

#035 – Looking at how search engines build knowledge graphs by aggregating information from all kinds of sources spread across the web and bringing it together in their own data warehouses, I came up with the idea of doing something similar within the website: we would build our own mini knowledge graph of the three main entities that make up Cosmeticsurg.

#036 – Since the site already contained quite a lot of microdata + schema.org markup, we decided it would be easiest to continue down that path. And so I started building a knowledge graph in microdata + schema.org.

I chose to build this graph in the site's footer because of the way the site's layout was set up and because the footer already contained most of the information I needed.

A second reason for using the footer was that it would open up the possibility of other entities connecting to that data later in the project.

#037 – And so I marked up the three main entities in the site's footer by specifying:

  • The practice as the leading entity in the graph, specified as a schema.org/Physician, a subtype of schema.org/LocalBusiness.
  • The surgery center as the second entity in the graph, specified as a schema.org/MedicalClinic, also a subtype of schema.org/LocalBusiness.
  • The surgeon as the third entity in the graph, specified as a schema.org/Person.

Each of the three entities has its own unique property-value pairs while also sharing some of its data with the others. This chains all three together in a way that reflects the actual state of affairs and should make sense to search engines.

#038 – To provide additional information about the three main entities and to strengthen the validity of the on-site statements, I referred to all kinds of external enumerations by specifying multiple schema:sameAs properties for each of the three entities: Freebase, DBpedia, the American Society of Plastic Surgeons and a whole range of other resources. This also made it possible to specify which social media account belonged to which entity.
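To make the shape of this footer graph easier to see, here is a minimal sketch of a comparable graph. The site itself used microdata; this version expresses the graph as JSON-LD-style Python dictionaries instead, because the nesting and the @id references are easier to read that way. Every URL, name and address value is a hypothetical placeholder. Using `worksFor` to connect the surgeon to both businesses is an assumption, not necessarily the property used on the live site.

```python
import json

BASE = "https://www.example.com/"  # hypothetical site URL

# Second-level entity shared by both businesses (placeholder values, not real NAP data).
shared_address = {
    "@type": "PostalAddress",
    "streetAddress": "1 Example Street",
    "addressLocality": "Example City",
}

practice = {
    "@id": BASE + "#practice",
    "@type": "Physician",
    "name": "Example Practice",
    "address": shared_address,
    "sameAs": ["https://www.example.org/practice-profile"],  # placeholder external reference
}

surgery_center = {
    "@id": BASE + "#surgery-center",
    "@type": "MedicalClinic",
    "name": "Example Surgery Center",
    "address": shared_address,
    "sameAs": ["https://www.example.org/clinic-profile"],  # placeholder external reference
}

surgeon = {
    "@id": BASE + "#surgeon",
    "@type": "Person",
    "name": "Example Surgeon",
    # Assumed linking property: chain the person to both businesses by reference.
    "worksFor": [{"@id": practice["@id"]}, {"@id": surgery_center["@id"]}],
    "sameAs": ["https://www.example.org/surgeon-profile"],  # placeholder external reference
}

footer_graph = {"@context": "https://schema.org", "@graph": [practice, surgery_center, surgeon]}


def unresolved_refs(graph):
    """Return @id-only references in top-level properties that match no node in the graph."""
    known = {node["@id"] for node in graph["@graph"]}
    missing = []
    for node in graph["@graph"]:
        for value in node.values():
            for item in (value if isinstance(value, list) else [value]):
                is_ref = isinstance(item, dict) and set(item) == {"@id"}
                if is_ref and item["@id"] not in known:
                    missing.append(item["@id"])
    return missing


print(json.dumps(footer_graph, indent=2))
print(unresolved_refs(footer_graph))  # → []
```

The reference check is one simple way to confirm that every entity the graph points to actually exists in it. That matters because the whole point of chaining is that the links resolve.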

#039 – This resulted in an on-site knowledge graph that looks something like this:

  • The orange discs represent the three main entities: schema:Physician, schema:MedicalClinic and schema:Person, and their property value pairs
  • The blue discs represent second level entities like for example schema:PostalAddress, a type that both the schema:Physician and schema:MedicalClinic share by means of their individual schema:address properties
  • The green disc represents a third level entity, which in this case is a schema:BusinessFunction with an enumerated value of ‘ProvideService’ for the schema:Offer both the schema:Physician and schema:MedicalClinic make
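The disc colors in this visualization correspond to nesting depth. The hypothetical helper below is a sketch, not a tool used in the project. It walks a nested JSON-LD-style dictionary and reports each typed entity together with its level. Modelling ‘ProvideService’ as a typed node with a GoodRelations-style @id is an assumption made here so that it shows up as a third-level entity. All other values are placeholders.

```python
def entity_levels(node, level=1, found=None):
    """Return (level, type) pairs for every typed entity in a nested dict.

    Top-level entities are level 1; an entity nested under a property of a
    level-n entity is level n + 1. Untyped dicts do not add a level.
    """
    if found is None:
        found = []
    if isinstance(node, list):
        for item in node:
            entity_levels(item, level, found)
        return found
    if not isinstance(node, dict):
        return found
    child_level = level
    if "@type" in node:
        found.append((level, node["@type"]))
        child_level = level + 1
    for key, value in node.items():
        if not key.startswith("@"):
            entity_levels(value, child_level, found)
    return found


physician = {
    "@type": "Physician",
    "name": "Example Practice",  # placeholder
    "address": {"@type": "PostalAddress", "addressLocality": "Example City"},
    "makesOffer": {
        "@type": "Offer",
        "businessFunction": {
            "@type": "BusinessFunction",
            "@id": "http://purl.org/goodrelations/v1#ProvideService",
        },
    },
}

print(entity_levels(physician))
# → [(1, 'Physician'), (2, 'PostalAddress'), (2, 'Offer'), (3, 'BusinessFunction')]
```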

#040 – But just as we were about to release, Google updated Panda on May 16th 2014. Given the partial recovery the site made, this was good news, but since we were also running a case study, it would probably interfere with our traffic stats.

#041 – So we adjusted our plans and delayed our update by roughly six weeks, hoping the site's traffic stats would have stabilized by then, so that we would have enough of a baseline to compare future data against.

#042 – Late June 2014 we finally released our update and calmly waited to see if something would happen.

#043 – Then we hit the jackpot! This happened roughly two weeks before Bill Slawski gave a relevant presentation, right here at this venue last year, in which he touched on some of the issues we were trying to resolve for Cosmeticsurg.

#044 – When we searched for the branded term ‘cosmeticsurg’, the SERP's local pack suddenly contained two business entities with the same NAP (name, address, phone number) information, while the Knowledge Panel showed a Google map with two markers.

#045 – When activating the surgery center's search result listing the Knowledge Panel would show its Google Business page information.

#046 – And when activating the practice's search result listing the Knowledge Panel would show the practice's Google Business page information.

#047 – One of the new features Google released over the last year is the ability to easily find a business address by searching for a company's name together with the term ‘address’, as shown in this example.

A new type of featured snippet that immediately worked for Cosmeticsurg.

#048 – Another new feature Google released is the ability to easily find a business telephone number by searching for the company's name together with the term ‘phone’, as shown in this example.

Again, a new type of featured snippet that immediately worked for Cosmeticsurg.

An interesting detail about this featured snippet is that the phone number in it is actually a link that, when selected, opens a Google Hangout that immediately starts dialing the number.

#049 – And not long after Freebase was closed down for any further editing, Google's search result pages changed again.

This time, when I searched for the query ‘ricardo l rodriguez’ in combination with the prefix ‘dr’ and the suffix ‘md’, the search result page contained a Knowledge Panel showing the practice's Google Business page.

Below that, a Knowledge Card showed up containing two different entities, the surgeon and the practice, because the search query overlapped with information both entities contain.

#050 – When I selected the practice in the Knowledge Card, I got a new search result page in which the query had changed to the branded term ‘cosmeticsurg’, while the Knowledge Panel contained some of the information Google has in its Knowledge Graph about the practice.

#051 – Yet when I performed a search containing the practice's legal name, I got a search result page showing a Knowledge Panel with the practice's Google Business page information and no Knowledge Card below it.

#052 – Though when I searched for the term ‘ricardo l rodriguez' without any prefix or suffix, Google would still show a Knowledge Panel filled with the practice's Google Business page information, but the Knowledge Card below it only contained one entity, namely the surgeon.

#053 – And finally, when I selected the surgeon's entity contained in the Knowledge Card, I would get yet another search result page in which the query changed to ‘dr ricardo rodriguez', while the Knowledge Panel contained some of the information Google has in its Knowledge Graph about the surgeon.

#054 – All in all, these are enough examples of search result pages to state that building our own Knowledge Graph achieved what we were aiming for.

Biggest lesson we learned from this?
Triples rule the Knowledge Graph!
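To make the "triples" point concrete: any nested graph like the ones described in this presentation reduces to subject-predicate-object statements. The function below is a minimal, hypothetical flattener for JSON-LD-style dictionaries. It is not a full JSON-LD processor (no @context expansion, no value objects). The blank-node labels such as `_:b0` are an illustrative choice, and the identifiers are placeholders.

```python
import itertools


def to_triples(node, counter=None):
    """Flatten a nested JSON-LD-style dict into (subject, predicate, object) triples.

    Returns (subject_id, triples). Nodes without an @id get a blank-node label
    like '_:b0'. @type values become 'rdf:type' triples; nested dicts become
    their own subjects, linked to the parent by the property name.
    """
    if counter is None:
        counter = itertools.count()
    subject = node.get("@id") or f"_:b{next(counter)}"
    triples = []
    types = node.get("@type", [])
    for t in (types if isinstance(types, list) else [types]):
        triples.append((subject, "rdf:type", t))
    for predicate, value in node.items():
        if predicate.startswith("@"):
            continue
        for item in (value if isinstance(value, list) else [value]):
            if isinstance(item, dict):
                child, child_triples = to_triples(item, counter)
                triples.append((subject, predicate, child))
                triples.extend(child_triples)
            else:
                triples.append((subject, predicate, item))
    return subject, triples


surgeon = {
    "@id": "https://www.example.com/#surgeon",  # hypothetical identifier
    "@type": "Person",
    "name": "Example Surgeon",
    "worksFor": {"@id": "https://www.example.com/#practice"},
}

for triple in to_triples(surgeon)[1]:
    print(triple)
```

A reference such as `{"@id": ...}` produces a single linking triple and no data of its own. This is how the same entity can be connected from many places without being repeated.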

#055 – In case you'd like to know more about how Knowledge Graphs work and how triples fit into them, I suggest you read Aaron Bradley's summary of SemTechBiz 2013, as it has some great additional information about some of the topics I've covered so far.

#056 – Sorting out the main organizational entities by creating our own Knowledge Graph concluded the first part of getting the structured data fundamentals in place. The second item on the list was adding social media markup.

#057 – The site's blog already had some markup on its article pages thanks to a plugin, but there were still some very big gaps to fill. And therefore…

#058 – Markup was added for Twitter's Summary Card

#059 – and Summary Card with Large Image

#060 – Open Graph article markup was added for Facebook
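Twitter Cards and Open Graph markup are plain `<meta>` tags in the page head. The helper below is a minimal sketch that renders a typical set of them for one article. The function name and the selection of tags are illustrative assumptions, not the plugin or templates the site actually used. The platforms' own documentation lists the full set of supported properties.

```python
from html import escape


def social_meta_tags(title, url, description, image, large_image=True):
    """Render Twitter Card and Open Graph <meta> tags for one article page."""
    card = "summary_large_image" if large_image else "summary"
    tags = [
        # Twitter Cards use the name attribute.
        ("name", "twitter:card", card),
        ("name", "twitter:title", title),
        ("name", "twitter:description", description),
        ("name", "twitter:image", image),
        # Open Graph uses the property attribute.
        ("property", "og:type", "article"),
        ("property", "og:title", title),
        ("property", "og:url", url),
        ("property", "og:description", description),
        ("property", "og:image", image),
    ]
    # Escape values so quotes and ampersands can't break the attribute.
    return "\n".join(
        f'<meta {attr}="{key}" content="{escape(value, quote=True)}">'
        for attr, key, value in tags
    )


print(social_meta_tags(
    title="Tummy Tuck Recovery & Aftercare",  # hypothetical article
    url="https://www.example.com/blog/tummy-tuck-recovery/",
    description="What to expect after surgery.",
    image="https://www.example.com/images/recovery.jpg",
))
```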

#061 – But getting Google+ right needed some sorting out

#062 – Because the site's blog no longer really served as a blog but had grown into a repository of articles mostly about all kinds of medical topics, I decided the markup should reflect this and thus schema.org/BlogPosting became schema.org/Article.

This required reworking the CMS's article templates, so I also took the opportunity to chain each article to the data in the site's footer, as planned earlier on.
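Chaining an article to the footer entities amounts to referencing their identifiers rather than repeating their data. The sketch below assumes the footer entities carry the hypothetical @id values shown here, and that `author` and `publisher` are the linking properties. The text does not say which properties the reworked templates actually used.

```python
FOOTER_IDS = {  # hypothetical identifiers of the footer entities
    "https://www.example.com/#practice",
    "https://www.example.com/#surgery-center",
    "https://www.example.com/#surgeon",
}


def article_node(url, headline, date_published):
    """Build an Article node (formerly BlogPosting) that points at footer entities by @id."""
    return {
        "@context": "https://schema.org",
        "@id": url + "#article",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        # Assumed linking properties; only the references are stored here.
        "author": {"@id": "https://www.example.com/#surgeon"},
        "publisher": {"@id": "https://www.example.com/#practice"},
    }


def dangling_references(node, known_ids):
    """List @id-only references in a node that do not match a known entity."""
    return [
        value["@id"]
        for value in node.values()
        if isinstance(value, dict) and set(value) == {"@id"} and value["@id"] not in known_ids
    ]


article = article_node(
    "https://www.example.com/blog/tummy-tuck-recovery/",  # hypothetical URL
    "Tummy Tuck Recovery",
    "2014-07-01",  # placeholder date
)
print(dangling_references(article, FOOTER_IDS))  # → []
```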

#063 – This resulted in schema.org/Article graphs that look something like this:

  • The purple disc represents the schema.org/Article, together with its property value pairs
  • The orange discs represent second level entities like schema:Physician, schema:MedicalClinic, schema:Person, schema.org/Comment and schema.org/Article, each of which also has its own property value pairs
  • The blue discs represent third level entities like for example schema:MedicalSpecialty, an entity that both the schema:Physician and schema:MedicalClinic share by means of their individual schema:medicalSpecialty properties
  • The green disc represents a fourth level entity, which in this case is a schema:BusinessFunction with a value of ‘ProvideService' for the schema:Offer both the schema:Physician and schema:MedicalClinic make.

#064 – Which led to the desired results for Google+ shares.

#065 – But overall this didn't lead to any results that would help the site fully recover.

Looking at the traffic spikes throughout the year, one might get the feeling the markup did help achieve some results that hadn't been achieved before. But given the small number of traffic spikes coming from social media, it's nearly impossible to tell whether they would have happened without the markup in place.

There simply was too little data to base any conclusions on, but one…

#066 – Social Media requires more than just adding some markup – shocking, right?

#067 – And thus, with the first part of the project done, it was time to look back at what we had done so far, before going after Panda.

#068 – First we looked at the outcome of our initial update. As you can see, Sessions didn't move, even though we had succeeded with Google's Knowledge Graph and the site was showing up in social media as it should.

#069 – By this time the people on the client's side had been beating Amit Singhal's 23-point list to death for over two years. Since I was aware of everything they had done during that period, and had become closely familiar with the website and its content, I decided to dig into Google Search Console, a free tool Google provides to webmasters to give them some insight into how Google crawls, indexes and understands a website.

#070 – One of the things I noticed in Google Search Console was that it reported many overlapping queries for which Google was returning multiple pages of the site in its search results.

From an SEO point of view you normally don't want to see this happen a lot, because it means multiple pages on your site are competing to rank for the same queries, at the expense of their overall ranking positions in Google's search results.

Of course every website has some overlap here and there, but in Cosmeticsurg's case it seemed to be the rule rather than the exception, so I decided to see if I could pinpoint the exact reasons for it.

#071 – When I compared pages that were showing up for related search queries, this suddenly made sense.

When I looked at search queries including the term ‘Tummy Tuck’, for example, I could easily find more than 16 different URLs competing against each other for overlapping search queries.
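Once query and page data are exported from Search Console, spotting this kind of overlap is a simple grouping exercise. The sketch below assumes the export has been reduced to (query, URL) pairs. The function names and the sample rows are hypothetical and only illustrate the shape of the analysis.

```python
from collections import defaultdict


def overlapping_queries(rows):
    """Return {query: sorted URLs} for queries where more than one site URL appears."""
    urls_per_query = defaultdict(set)
    for query, url in rows:
        urls_per_query[query].add(url)
    return {q: sorted(urls) for q, urls in urls_per_query.items() if len(urls) > 1}


def urls_for_term(rows, term):
    """Return the distinct URLs that appear for any query containing term (case-insensitive)."""
    term = term.lower()
    return sorted({url for query, url in rows if term in query.lower()})


rows = [  # hypothetical export rows, not the site's real data
    ("tummy tuck cost", "/tummy-tuck/"),
    ("tummy tuck cost", "/blog/tummy-tuck-prices/"),
    ("tummy tuck recovery", "/blog/recovery-after-tummy-tuck/"),
    ("breast lift", "/breast-lift/"),
]

print(overlapping_queries(rows))
# → {'tummy tuck cost': ['/blog/tummy-tuck-prices/', '/tummy-tuck/']}
print(len(urls_for_term(rows, "Tummy Tuck")))  # → 3
```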

#072 – Which made me wonder whether the Panda algorithms might be struggling to tell Things apart.

#073 – The moment when one bullet in Amit Singhal's 23-point list finally started to make sense:

Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?

Nobody had picked up on this point before, because the site's articles gave the impression they were about different topics.

I suspected the Panda algorithms weren't able to establish that, because most of the site's articles highlight a topic surrounding a medical procedure, or a specific property of a medical procedure.

#074 – Which, again, led to the question: What can we do about this?

#075 – Since I suspected the Panda algorithms were having difficulty telling Things apart, I came up with the idea of running a test to see whether it would help to specify what the articles are about.

#076 – So we selected a test group of 15 articles and specified what they were about by adding schema:about property-value pairs, which, visualized, looks like this.

#077 – But alas, as this Google Analytics Graph shows, Sessions still didn't move.

#078 – And then I reflected on the big lesson we learned by publishing our own Knowledge Graph: that one can help search engines connect the dots by means of triples.

#079 – And so I came up with the hypothesis that:

Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?

Should be read and treated as if it said:

Does the site have duplicate, overlapping, or redundant entities on the same or similar topics with slightly different property variations?

#080 – This meant I had some serious work cut out for me, as I would have to analyze each and every article and add quite an impressive amount of markup to the markup we had already put in place during the first half of the project.

#081 – For a control group of 98 of the 168 blog articles, most of the schema:about triples I specified contained a multi-type entity and/or some schema.org/Thing entities, each with its own property-value pairs.

I had to use a multi-type entity (a very useful type of entity declaration most publishers aren't aware of) because, strangely enough, the schema.org/MedicalProcedure type doesn't have a schema.org/provider or schema.org/seller property, even though in real life a procedure is considered a service.

So to specify that the procedures have a provider, just like any other service, I had to extend the entity with a second type, schema.org/Service, which does have a schema:provider property. That made it possible to express that the medical procedures the articles are about are provided by Cosmeticsurg's practice.
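Here is a minimal sketch of the multi-type idea. An entity typed as both MedicalProcedure and Service may use a property that either type defines. The property table is a hypothetical, hand-picked subset. It reflects only the text's statement that MedicalProcedure (at the time) had no provider property while Service does. It is not the full schema.org vocabulary, and the identifiers are placeholders.

```python
# Hypothetical subset of properties per type, following the text's description;
# NOT the complete schema.org definitions.
ALLOWED = {
    "MedicalProcedure": {"name", "sameAs", "procedureType", "bodyLocation"},
    "Service": {"name", "sameAs", "provider", "serviceType"},
}


def unsupported_properties(entity, allowed=ALLOWED):
    """Return properties not defined on any of the entity's types."""
    types = entity["@type"] if isinstance(entity["@type"], list) else [entity["@type"]]
    permitted = set().union(*(allowed.get(t, set()) for t in types))
    return sorted(k for k in entity if not k.startswith("@") and k not in permitted)


procedure = {
    "@type": ["MedicalProcedure", "Service"],  # multi-type entity
    "name": "Abdominoplasty",
    "provider": {"@id": "https://www.example.com/#practice"},  # hypothetical @id
}

print(unsupported_properties(procedure))  # → []
print(unsupported_properties({**procedure, "@type": "MedicalProcedure"}))  # → ['provider']
```

The second call shows why the extra type is needed: with MedicalProcedure alone, `provider` has no type to attach to.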

#082 – And just like we did when we built our Knowledge Graph in the footer of the site, we provided additional information about all entities by referring to all kinds of external enumerations by specifying multiple schema:sameAs properties.

#083 – Meaning the schema:about relations I put in place look something like this:

  • The purple disc represents the schema.org/Article – the Subject of the statements
  • Which has a schema:about property – the predicate of the statements
  • And the orange discs represent the entities that are the Objects of the statements, each with its own property value pairs.
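Here is the same subject-predicate-object structure expressed in code. The sketch below extracts the schema:about statements from article nodes and then inverts them, so you can see which articles point at each object. That inversion is the "Object side" of the statements. The article and procedure identifiers are hypothetical. Objects without an @id fall back to their name, which is an illustrative simplification.

```python
from collections import defaultdict


def about_statements(article):
    """Return (subject, 'about', object) statements for one article node.

    Objects are identified by @id when present, otherwise by name.
    """
    objects = article.get("about", [])
    if not isinstance(objects, list):
        objects = [objects]
    return [(article["@id"], "about", obj.get("@id") or obj.get("name")) for obj in objects]


def articles_by_object(articles):
    """Invert the statements: map each object to the articles that are about it."""
    index = defaultdict(list)
    for article in articles:
        for subject, _predicate, obj in about_statements(article):
            index[obj].append(subject)
    return dict(index)


articles = [  # hypothetical articles
    {
        "@id": "https://www.example.com/blog/a/#article",
        "about": [
            {"@id": "https://www.example.com/tummy-tuck/#procedure",
             "@type": ["MedicalProcedure", "Service"]},
            {"@type": "Thing", "name": "Scar"},
        ],
    },
    {
        "@id": "https://www.example.com/blog/b/#article",
        "about": {"@id": "https://www.example.com/tummy-tuck/#procedure"},
    },
]

print(articles_by_object(articles))
```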

#084 – Making the complete graph of an article look something like this.

#085 – This time around though we were anything but calm after launching the first round of updates. A serious amount of work had gone into this and so we were very nervous to find out if this would work.

#086 – Luckily we didn't have to wait long to see the effects of what I had done. Sessions took off and went up as fast as I was able to add the additional markup to the articles.

#087 – And even after I finished adding markup to the control group in mid-December 2014, things kept going up, ultimately reaching heights the site had never reached before.

#088 – This held even through a new Panda update in mid-October 2014, meaning that by then the site had come through a second Panda update unharmed while continuing to grow. Something that hadn't happened in years!

#089 – Normally a presentation like this would end here, because Google doesn't provide any data I could show you to underpin how the structured data improvements influenced the site's traffic stats.

That would leave me no way to satisfy any skeptics in the room.

#090 – That is, until an SEO named Mike Arnesen woke up one day in October 2014 with an idea that would revolutionize how SEOs and data scientists can work together. He came up with a method for getting the structured data present on a page to show up in Google Analytics by using Google Tag Manager's Tags, Rules and Custom JavaScript Macros.
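Arnesen's method runs as JavaScript inside Google Tag Manager. The Python sketch below is only a rough offline analogue, not his code. It reads a page's microdata and derives the kind of page-level values one might send to Google Analytics as custom dimensions: the first `itemtype` in the document, as a crude stand-in for the page's main entity, and the number of `itemprop="about"` properties. The dimension names and the "first itemtype" heuristic are assumptions.

```python
from html.parser import HTMLParser


class MicrodataSummary(HTMLParser):
    """Collect the first itemtype on a page and count itemprop="about" occurrences."""

    def __init__(self):
        super().__init__()
        self.main_type = None
        self.about_count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes such as itemscope map to None
        itemtype = attrs.get("itemtype")
        if itemtype and self.main_type is None:
            # Keep only the type name, e.g. "http://schema.org/Article" -> "Article".
            self.main_type = itemtype.rstrip("/").rsplit("/", 1)[-1]
        if "about" in (attrs.get("itemprop") or "").split():
            self.about_count += 1


def summarize(html):
    """Return hypothetical dimension values for one page."""
    parser = MicrodataSummary()
    parser.feed(html)
    parser.close()
    return {"mainEntityType": parser.main_type, "aboutCount": parser.about_count}


page = (
    '<article itemscope itemtype="http://schema.org/Article">'
    '<h1 itemprop="headline">Example</h1>'
    '<div itemprop="about" itemscope itemtype="http://schema.org/MedicalProcedure"></div>'
    "</article>"
)
print(summarize(page))  # → {'mainEntityType': 'Article', 'aboutCount': 1}
```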

He wrote the idea up, and Moz, a very well-known company in the world of SEO, published it on October 28th 2014.

#091 – I had been looking for a solution like this for years, as having such information available in Google Analytics provides an enormous amount of new data that can help answer almost any question one can come up with. Simply put, adding schema.org-based data to Google Analytics equals data science tagging on steroids.

#092 – And so I reached out to Mike and asked him to implement his method on cosmeticsurg.net.

#093 – Having this type of data in GA is just the beginning, as Google Analytics is a nice tool for marketers but it doesn't allow you to do any comprehensive analysis.

#094 – The real fun starts when you export and slice & dice Google Analytics' enriched data and get a data scientist involved to do some serious analysis.

#095 – This graph shows all landing page sessions over a period of one year, in which ‘t-1’ represents the period before the Panda 4.0 update that gave cosmeticsurg.net a partial recovery, and ‘t+1’ the period after it.

Now imagine you'd like to know what the effects of adding ‘about’ triples were. Normally you couldn't answer that question without filtering the results for tons of individual URLs and applying insane amounts of manual labor to sort them into pages that have an article as their main entity (subjects) and pages that have a medical procedure as their main entity (objects), while also identifying which articles refer to which procedures via the schema:about property (predicates).

#096 – Yet when such a dataset contains semantic information you can, for example, filter it on URLs whose main entity is a schema.org/Article.

Once you do that, you clearly see that the number of landing page sessions for all schema.org/Article pages started to go up almost immediately after I started adding the schema:about triples.
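Here is a minimal sketch of that filter. It assumes each exported row has already been annotated with the landing page's main entity type, for instance via a custom dimension. The row layout is hypothetical and the numbers are illustrative only, not the case-study data. The cutoff date is the Panda update date given earlier in the text.

```python
from collections import defaultdict
from datetime import date


def sessions_by_period(rows, entity_type, cutoff):
    """Sum landing-page sessions for pages whose main entity is entity_type.

    rows: iterable of (day, landing_page, main_entity_type, sessions).
    Days before cutoff count as 't-1', days on or after it as 't+1'.
    """
    totals = defaultdict(int)
    for day, _page, main_type, sessions in rows:
        if main_type == entity_type:
            totals["t-1" if day < cutoff else "t+1"] += sessions
    return dict(totals)


rows = [  # illustrative numbers only
    (date(2014, 4, 1), "/blog/a/", "Article", 10),
    (date(2014, 6, 1), "/blog/a/", "Article", 15),
    (date(2014, 6, 1), "/tummy-tuck/", "MedicalProcedure", 40),
]
print(sessions_by_period(rows, "Article", cutoff=date(2014, 5, 16)))  # → {'t-1': 10, 't+1': 15}
```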

#097 – With some further filtering, you can also quite easily see how the effect of adding schema:about triples differed between articles that had this property specified and those that didn't.

And as this graph shows, even though the number of landing page sessions for all articles on the site went up, the effect seems to be strongest for the control group of pages that do have their schema:about properties specified.
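This comparison can be sketched as growth per group: total sessions after the cutoff divided by total sessions before it, computed separately for articles with and without schema:about. The row layout and the numbers below are hypothetical. Using a simple ratio of period totals is one possible summary among several, not necessarily the analysis behind the graph.

```python
from collections import defaultdict


def growth_by_group(rows):
    """Ratio of 't+1' to 't-1' sessions for articles with and without schema:about.

    rows: iterable of (period, landing_page, has_about, sessions), where period
    is 't-1' or 't+1'. A group with no 't-1' sessions gets float('inf').
    """
    totals = defaultdict(lambda: {"t-1": 0, "t+1": 0})
    for period, _page, has_about, sessions in rows:
        totals[has_about][period] += sessions
    return {
        group: (t["t+1"] / t["t-1"] if t["t-1"] else float("inf"))
        for group, t in totals.items()
    }


rows = [  # illustrative numbers only
    ("t-1", "/blog/a/", True, 100),
    ("t+1", "/blog/a/", True, 250),
    ("t-1", "/blog/b/", False, 100),
    ("t+1", "/blog/b/", False, 120),
]
print(growth_by_group(rows))  # → {True: 2.5, False: 1.2}
```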

#098 – When you apply the same type of filtering to the average amount of pageviews each landing page session led to, you see that the visitor behavior of people that landed on articles without schema:about properties starts to change once I started adding the schema:about properties.

The thing that's notable here is that the amount of pages people visit, after landing on a page without schema:about properties, starts to go down, while the variance of how many pages people visit becomes less. Yet the overall trend of the amount of pages people visit, after landing on a page with schema:about properties hardly changes.

#099 – And when you apply this filtering to the average duration of each landing page session, you see that the average time visitors spend starts to go down for articles that don't have their schema:about properties specified, yet starts to go up for articles that do.
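One way to compute a comparable average session duration per group is to weight by sessions, dividing total duration by total sessions rather than averaging per-page averages. This is a sketch under that assumption; the row fields and numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: sessions and total session duration (seconds) per
# group and period. Illustrative numbers only.
ROWS = [
    {"has_about": True, "period": "t-1", "sessions": 100, "duration_s": 9000},
    {"has_about": True, "period": "t+1", "sessions": 200, "duration_s": 22000},
    {"has_about": False, "period": "t-1", "sessions": 100, "duration_s": 8000},
    {"has_about": False, "period": "t+1", "sessions": 100, "duration_s": 7000},
]


def avg_session_duration(rows):
    """Session-weighted average duration (seconds) per (has_about, period).

    Dividing summed duration by summed sessions keeps a page with few
    sessions from counting as much as a page with many.
    """
    sessions = defaultdict(int)
    duration = defaultdict(int)
    for row in rows:
        key = (row["has_about"], row["period"])
        sessions[key] += row["sessions"]
        duration[key] += row["duration_s"]
    return {key: duration[key] / sessions[key] for key in sessions}


if __name__ == "__main__":
    for key, seconds in avg_session_duration(ROWS).items():
        print(key, f"{seconds:.0f}s")
```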

#100 – Now of course, as I've mentioned ‘triples' over and over again during this presentation, I couldn't leave out a look at the effects of what I did on the Object side of the schema:about statements.

And as best practice dictates, one should save the best for last.

#101 – The effect that adding the schema:about properties to the blog's articles had on the site's medical procedure pages completely flabbergasted me the first time I saw these stats. From an SEO point of view it's unheard of that adding some metadata to one part of a site can have such a dramatic impact on another part of it, as this last graph of the presentation shows.
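Looking at the Object side means turning the statements around: for each procedure page, count how many articles declare themselves to be about it and set that next to the page's change in sessions. The sketch below assumes hypothetical (article, procedure) pairs and made-up session figures; it only illustrates the join, not the case study's actual numbers.

```python
from collections import Counter

# Hypothetical schema:about pairs (article -> procedure) and procedure-page
# sessions before (t-1) and after (t+1). Illustrative numbers only.
ABOUT = [
    ("/blog/a", "/procedures/rhinoplasty"),
    ("/blog/b", "/procedures/rhinoplasty"),
    ("/blog/c", "/procedures/facelift"),
]
PROCEDURE_SESSIONS = {
    "/procedures/rhinoplasty": {"t-1": 200, "t+1": 260},
    "/procedures/facelift": {"t-1": 100, "t+1": 120},
    "/procedures/liposuction": {"t-1": 150, "t+1": 150},
}


def procedure_change(about, sessions):
    """Per procedure page: number of referring articles and session change t-1 -> t+1."""
    # Count how often each procedure appears as the object of schema:about.
    inbound = Counter(obj for _subject, obj in about)
    return {
        url: {
            "referring_articles": inbound[url],  # Counter returns 0 for missing keys
            "delta": counts["t+1"] - counts["t-1"],
        }
        for url, counts in sessions.items()
    }


if __name__ == "__main__":
    for url, info in procedure_change(ABOUT, PROCEDURE_SESSIONS).items():
        print(url, info)
```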

An SEO normally has very few options for telling search engines which parts of a site should be considered important, besides working on the site's internal link graph and getting relevant links from other sites to point to the pages we care about most.

And from this SEO's perspective, it is simply breathtaking to see structured data yield this type of result. Given how structured data works, it should in theory, but how Google actually uses this type of information is unknown.

#102 – And therefore: Hail, Triples – King of the Graph

#103 – Now of course there's nothing more annoying than having to take someone's word for it. And I also wouldn't be surprised if all of you would rather let the data speak than a Dutch SEO with his accent.

And so, in contrast to what seems to be common practice when it comes to SEO case studies, YOU CAN let the data speak for itself.

All the datasets used for this case study can be downloaded, so anybody who's interested can do their own analysis. It's an example I hope many other SEOs and their clients or employers will follow, as this case study clearly shows we need a lot more public data on the effects of applying structured data markup.

There's much more to it than rich snippets alone, folks: helping search engines understand your content matters!

#104 – Of course I'm not solely responsible for every effort that went into this story, and each of the following people played a part in it that I could not have done without (in alphabetical order):

  • Aaron Bradley, SEO Analyst – Electronic Arts
  • Daniel Bos, Director Data & Analytics – Takeaway.com
  • Leeza Rodriguez, Chief Marketing and Operations – Cosmeticsurg
  • Leigh Aucoin, Web Development Team Lead – Search Influence
  • Lisa Melvin, SEO Consultant – Windy Hills, Inc.
  • Mike Arnesen, Founder & CEO – UpBuild

1 Patrick Coombe August 19, 2015 at 7:14 pm

Wow Jarno, so well done. Digging through all the data in the zip file now; it's very well organized. It looks as though you had a real “entity” problem on your hands here. Google really couldn’t make up its mind, but it seems you got it worked out. A few relevant snippets I found from a “clean slate” (never searched before from this IP):


I had this same exact problem for a very long time and the only thing I could do was structure as much data as possible consistently and of course Freebase was a huge help.

Your GA annotations within the Excel spreadsheet were so helpful and extremely thorough. I think at this point you’ve structured about as much data as you possibly can, and any major changes coming through from the Knowledge Graph have already happened.

Side note: I have 2 projects I am working on in this niche that I think might be mutually beneficial to one another if you want to hit me up.

2 Jarno van Driel August 20, 2015 at 5:15 pm

First off, thanks for the screenshots of the queries you ran through Google. I did a whole bunch myself but at a certain moment gave up, realizing it wouldn’t make sense to add more of those to the deck than I already had. They’re cool to have nevertheless.

An interesting note is that Bing seems to do a better job at it than Google does, something I didn’t put into the slide deck because the case study revolved around Panda. It was very satisfying, though, to see that building our own knowledge graph worked for a search engine other than Google as well.

As for the thorough datasets, that’s something I really felt necessary to do, not only for my own analysis but also to provide others the opportunity of doing extensive and accurate analysis independently.

In my book, case studies are only valuable if you give others the opportunity to question your findings, and for that reason all parties need to have access to exactly the same data.

I can really say this is it, there’s nothing more to share, everything we used is in the datasets as well.

“at this point you’ve structured about as much data as you possibly can”

hell no, there’s still plenty of work to be done on that site. First off, there’s the control group of articles that I didn’t mark up for the case study, and secondly I still have tons of work to do on procedure pages, video pages and image pages.

And if that yields any interesting results I might do another presentation about that next year.

3 Bob van Biezen August 24, 2015 at 1:33 am

Great case study Jarno, thanks a lot for sharing this with the community! I’ve never seen an example of schema markup that made such a difference. It’s also great that your client was willing to share their data. Keep up the good work!

4 Jarno van Driel August 24, 2015 at 4:42 pm

I’m happy to hear you liked it Bob. Thanks for the kind words.

5 Andrew Stocker August 27, 2015 at 5:15 am

Hi Jarno,

Thanks for sharing these insights, really appreciate it. Just to clarify:

You basically used structured data to show Google their content wasn’t duplicate / overlapping. Then when the Panda refresh came they jumped back up because Google knew the content was ‘unique’ due to the structured data.

They got loads more sessions to articles because those articles now ranked due to not being considered duplicate / overlapping?


6 Jarno van Driel August 28, 2015 at 6:11 am

“You basically used structured data to show Google their content wasn’t duplicate / overlapping.”

Sort of, yes. What I did was help Google understand the granularity of the content. This led to fewer pages showing up in the SERPs for overlapping queries, as Google was able to better understand the topical subtleties of the site’s articles.

“Then when the panda refresh came they jumped back up”

No, the Panda 4.0 update caused the first jump up, but that was only a partial recovery. The things I did to that site after the Panda 4.0 update seem to have caused it to fully recover and even go beyond previous records.

7 Andrew Stocker August 28, 2015 at 6:17 am

Right got it, thanks! 🙂

8 kevin thompson October 30, 2016 at 7:37 pm

Helpful suggestions – I loved the specifics! Does someone know where my assistant could access a fillable OK ODH 805 example to use?

9 Jarno van Driel November 8, 2016 at 3:22 pm

Glad to hear you liked it Kevin. As for your question, alas I have no idea what you mean by “a fillable OK DH 805 example”. Could you explain what you mean?

10 Andy C December 14, 2016 at 4:22 am

Hi Jarno,
I have been trying to learn about structured data and its effect on SEO, and this is by far the best article I’ve read by a long shot, no contest. Congrats! Reading it gave me a lot of ideas.
I am an amateur, retired guy giving away free software that fits a quite obscure application category, and my thought was to try structured data as a way of getting Google to understand the purpose of my (very 1990s looking) static site better. I don’t know if it’s going to work, but at least I’m giving it a try.

I watched the structured data hangout with the Google guys on 13 Dec 2016 to see how they answered your questions, and was astonished at their answers. If one were to believe what they said about whether structured data was worthwhile beyond rich snippets, the things you described in such detail and with so much evidence probably would have never happened! I wonder if these guys just aren’t aware of what’s going on, or maybe they are just trying to keep things under wraps as much as possible? It was very strange watching that video.

11 Rick Burgers (LEQUAL) January 19, 2018 at 1:39 am

Hi Jarno,

Very nice case (despite the fact that it’s a case from 2015)!
Thanks for sharing and most of all, keep sharing! 🙂

Do you have updated tests/cases?

12 Shashi Kumar June 26, 2018 at 9:58 am

Hi everyone, I am facing a problem related to structured data on my website Govt Jobs. It shows this: “4:13 PM Cannot understand the value 4:13 PM as a date and time. Learn more about date/time formats.” I would like to know more about my website’s issues and how I can improve it. Please tell me in detail how I can solve it.

13 Aaron Bradley June 26, 2018 at 11:15 am

If you provide your date and time in the prescribed format you should be good to go.
