HTML5, SEO and Microdata

By Kai Gittens | reading time: 5 min

MUCH thanks to Oli Studholme at HTML5 Doctor for helping me understand this!

Update July 28, 2013: Not only are many points in this post erroneous, but the correct points are a bit outdated. The schema.org vocabulary is now recommended for marking up microdata instead of the older data-vocabulary.org one.

Update February 21, 2011: Oli looked at this post and suggested some code & semantics changes. Simply put, there are a lot of semantic mistakes in this article. The code below was changed as per his suggestions, but the semantic issues were many. So many that it was easier to create a new post listing them instead of editing this article. Review the code below as it contains his edits, then read my post listing his semantic suggestions. – k

I’ve learned a few things about how HTML5 handles search engine optimization, or SEO. The main thing I’ve learned is that we all need to fully understand microdata, since Google uses it to collect detailed information about your web page.

While I’m still learning about microdata, I understand 95% of it…and let me be clear from the beginning about what I do understand:

Often called "HTML5’s best kept secret," microdata allows you to place a custom vocabulary of data onto your web page. "If the microdata uses a Google 'rich snippet' vocabulary, it may also be used by Google."

Let’s see it in action:

I recently created this test page with the following code:


<section itemscope itemtype="http://data-vocabulary.org/Person">
  <!-- itemscope + itemtype declare a Person item; each itemprop below is one of its properties -->
  <img itemprop="photo" class="me" width="80" height="80" src="http://en.gravatar.com/userimage/4528928/87cc8430c1f9a5c3b809cdde885f565a.jpg" alt="[Kai Gittens, circa 2010]">

  <h1 class="entry-title">About Kai Gittens, AKA Kaidez</h1>
  <br />
  <h2>Posted by Kai Gittens on January <abbr>24th</abbr></h2>
  <br />
  <h2 class="updated"><time datetime="2011-01-24">January 24, 2011</time></h2>
  <dl>
    <dt>Name</dt>
    <dd itemprop="name">Kai Gittens</dd>
    <dt>Position</dt>
    <dd><span itemprop="title">Founder</span> of <span itemprop="affiliation">kaidez.com</span></dd>
    <dd><span itemprop="title">Web Designer for Revlon/Almay</span></dd>

    <dt>Mailing address</dt>
    <!-- A nested item: the address property is itself an Address item -->
    <dd itemprop="address" itemscope itemtype="http://data-vocabulary.org/Address">
      <span itemprop="street-address">237 Park Ave.</span>
      <span itemprop="locality">New York</span>, <span itemprop="region">NY</span> <span itemprop="postal-code">10017</span>
      <span itemprop="country-name">USA</span>
    </dd>
  </dl>
  <h2>Social Networking Info</h2>
  <ul>
    <li><a href="https://kaidez.com/" itemprop="url">Blog</a></li>
    <li><a href="http://facebook.com/kaidez" itemprop="url">Facebook Profile</a></li>
    <li><a href="http://www.twitter.com/kaidez" itemprop="url">Twitter Page</a></li>
  </ul>
</section>

Let’s break down the code…
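
As a rough guide to reading the markup above, three attributes do the work: itemscope starts an item, itemtype names the vocabulary that item follows, and itemprop labels a value inside it. Here is a stripped-down sketch that reuses only values from the test page (the comments are annotations added for illustration):

```html
<!-- itemscope: everything inside this element describes one item.
     itemtype: the vocabulary (here, data-vocabulary.org's Person) that defines the property names. -->
<section itemscope itemtype="http://data-vocabulary.org/Person">
  <!-- itemprop: the element's text becomes the value of the "name" property. -->
  <span itemprop="name">Kai Gittens</span>

  <!-- On an <a>, the property value is the href, not the link text. -->
  <a href="https://kaidez.com/" itemprop="url">Blog</a>

  <!-- A property can itself be an item: add itemscope + itemtype to nest an Address. -->
  <div itemprop="address" itemscope itemtype="http://data-vocabulary.org/Address">
    <span itemprop="locality">New York</span>,
    <span itemprop="region">NY</span>
  </div>
</section>
```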

If you need more proof of this result, see what information comes back when my test page is plugged into Google’s Rich Snippets Testing Tool.

I’ve done a variety of Google searches trying to get this snippet to come up…no luck yet. But I’m confident that it will eventually and know that the microdata is still doing things behind the scenes.
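
The July 28, 2013 update at the top of this post points to schema.org as the recommended vocabulary. As a rough sketch of what the same profile might look like with schema.org types, assuming Person, Organization and PostalAddress are the right matches (the property mapping below is an illustration, not markup taken from the test page):

```html
<section itemscope itemtype="http://schema.org/Person">
  <!-- data-vocabulary's "photo" roughly corresponds to schema.org's "image" -->
  <img itemprop="image" width="80" height="80" src="http://en.gravatar.com/userimage/4528928/87cc8430c1f9a5c3b809cdde885f565a.jpg" alt="[Kai Gittens, circa 2010]">
  <dl>
    <dt>Name</dt>
    <dd itemprop="name">Kai Gittens</dd>
    <dt>Position</dt>
    <!-- data-vocabulary's "title" roughly corresponds to schema.org's "jobTitle" -->
    <dd><span itemprop="jobTitle">Founder</span> of
      <span itemprop="affiliation" itemscope itemtype="http://schema.org/Organization"><span itemprop="name">kaidez.com</span></span></dd>
    <dt>Mailing address</dt>
    <!-- Address becomes PostalAddress, with camelCase property names -->
    <dd itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
      <span itemprop="streetAddress">237 Park Ave.</span>
      <span itemprop="addressLocality">New York</span>,
      <span itemprop="addressRegion">NY</span>
      <span itemprop="postalCode">10017</span>
      <span itemprop="addressCountry">USA</span>
    </dd>
  </dl>
  <a href="https://kaidez.com/" itemprop="url">Blog</a>
</section>
```

The structure is unchanged from the data-vocabulary.org version; only the itemtype URLs and property names differ.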

Microdata isn’t really that new of a concept: it’s similar to existing technologies such as RDFa and microformats. But RDFa was originally specified for XHTML (HTML5 support only arrived later, with RDFa 1.1), while microformats store their data in class values, the same hook CSS uses, so they can get tangled up with your styling classes. Getting microdata to work requires writing standard HTML5 attributes and nothing else.
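
To make the comparison concrete, here is roughly how the name and job title from the test page might look as an hCard microformat next to the microdata version. This is a sketch for contrast only; the class names (vcard, fn, title, org) come from the hCard specification, not from the test page:

```html
<!-- hCard microformat: the data lives in class values, the same attribute CSS selectors target. -->
<div class="vcard">
  <span class="fn">Kai Gittens</span>,
  <span class="title">Founder</span> of <span class="org">kaidez.com</span>
</div>

<!-- Microdata: the same data in dedicated itemscope/itemtype/itemprop attributes,
     leaving class free for styling. -->
<div itemscope itemtype="http://data-vocabulary.org/Person">
  <span itemprop="name">Kai Gittens</span>,
  <span itemprop="title">Founder</span> of <span itemprop="affiliation">kaidez.com</span>
</div>
```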

Speaking of microformats using CSS classes, here’s a quick FYI: placing the above code into a Twenty Ten-themed WordPress page will still give you a positive result in the Rich Snippets tool, but will also generate a warning saying that certain things are missing. That’s because Twenty Ten, which is HTML5-ready, uses a lot of the same CSS classes as the hAtom microformat, which marks up posts much as an Atom or RSS feed would.

I plugged the code into this blog’s About page and got that warning. Since this blog design is based on Twenty Ten and uses hAtom classes like entry-title and entry-content, the presence of these classes forces the Rich Snippets tool to look for hAtom content in my About page. And as the lack of an author class makes the hAtom data incomplete, the warning shows up. I could fix this by putting a tag with a class named author somewhere on my post pages, but I’m happy with my design so I’m not going to do this.
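
For anyone who does want to clear that warning, a minimal sketch of the fix might look like the following, assuming the tool is checking for a complete hAtom entry. The hentry wrapper, the nested hCard and the abbr date pattern are assumptions based on the hAtom and hCard conventions, not markup from this blog’s theme:

```html
<div class="hentry">
  <h1 class="entry-title">About Kai Gittens, AKA Kaidez</h1>
  <!-- "author" is the class the warning reports as missing;
       hAtom expects it to wrap an hCard (vcard + fn). -->
  <p>Posted by <span class="author vcard"><span class="fn">Kai Gittens</span></span> on
    <abbr class="updated" title="2011-01-24">January 24, 2011</abbr></p>
  <div class="entry-content">
    <!-- page content goes here -->
  </div>
</div>
```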

If you want to get a feel for how much microdata is out there, check out the Operator plug-in for Firefox. It inserts a toolbar that detects microdata along with RDFa and microformats, showing you what data is being collected and its potential use…especially for e-commerce. Plug in Operator, then do some general web surfing while paying attention to the toolbar…you’d be surprised what you’ll find out.

For further reading, HTML5 Doctor’s microdata article is a great starting point on the subject. After that, read what Mark Pilgrim has to say about microdata and definitely read Google’s microdata documentation.

Some other HTML5 SEO things...

In closing, remember that microdata works if used properly. So let’s all now take a blood oath and promise not to use it to create spam bait and ruin the party for everyone.