Mythical Differences: RDFa Lite vs. Microdata

Full disclosure: I’m the current chair of the standards group at the World Wide Web Consortium that created the newest version of RDFa.

RDFa 1.1 became an official Web specification last month. Google started supporting RDFa in Google Rich Snippets some time ago and has recently announced that they will support RDFa Lite for schema.org as well. These announcements have led to a weekly increase in the number of times the following question is asked by Web developers on Twitter and Google+:

“What should I implement on my website? Microdata or RDFa?”

This blog post attempts to answer the question once and for all. It dispels some of the myths around the Microdata vs. RDFa debate and outlines how the two languages evolved to solve the same problem in almost exactly the same way.

 

Here’s the short answer for those of you who don’t have time to read this entire blog post: Use RDFa Lite – it does everything important that Microdata does, it’s an official standard, and it has the strongest deployment of the two.

Functionally Equivalent

Microdata was initially designed as a simple subset of RDFa and Microformats, focusing primarily on the core features of RDFa. Unfortunately, when this was done, the choice was made to break compatibility with RDFa and effectively fork the specification. In contrast, RDFa Lite highlights the same subset of RDFa that Microdata adopted, but in a way that does not break backwards compatibility with RDFa. This was done on purpose, so that Web developers wouldn’t face a hard decision.

RDFa Lite contains all of the simplicity of Microdata coupled with the extensibility of, and compatibility with, RDFa. This is an important point that is often lost in the debate: there is no longer a solid technical reason for choosing Microdata over RDFa Lite. There may have been one a year ago, but RDFa Lite made a few tweaks to achieve feature parity with Microdata today, while still being able to do much more than Microdata if you ever need the flexibility. If you don’t want to code yourself into a corner, use RDFa Lite.

To examine why RDFa Lite is a better choice, let’s take a look at the markup attributes for Microdata and the functionally equivalent ones provided by RDFa Lite:

Microdata 1.0 | RDFa Lite 1.1 | Purpose
itemid | resource | Used to identify the exact thing that is being described using a URL, such as a specific person, event, or place.
itemprop | property | Used to identify a property of the thing being described, such as a name, date, or location.
itemscope | not needed | Used to signal that a new thing is being described.
itemtype | typeof | Used to identify the type of thing being described, such as a person, event, or place.
itemref | not needed | Used to copy-paste a piece of data and associate it with multiple things.
not supported | vocab | Used to specify a default vocabulary that contains terms that are used by markup.
not supported | prefix | Used to mix different vocabularies in the same document, like ones provided by Facebook, Google, and open source projects.

As you can see above, both languages have exactly the same number of attributes. There are nuanced differences in what each attribute allows one to do, but Web developers only need to remember one thing from this blog post: Over 99% of all Microdata markup in the wild can be expressed in RDFa Lite just as easily. This is a provable fact – replace all Microdata attributes with the equivalent RDFa Lite attributes, add vocab="http://schema.org/" to the markup block, and you’re done.
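As a rough illustration of that search-and-replace, here is a minimal Python sketch that rewrites Microdata start tags as RDFa Lite. It is a hedged sketch, not a real HTML parser: it assumes simple markup with no "<" or ">" inside attribute values, assumes every itemscope carries a schema.org itemtype written as a full http://schema.org/ URL, and puts vocab on every typed element rather than only the outermost one (redundant, but harmless). It refuses itemref, which has no RDFa Lite 1.1 attribute in the table above. The function names are hypothetical.

```python
import re

SCHEMA = "http://schema.org/"

# A start tag: "<name", the attribute text, an optional "/" and ">".
# Simplification: assumes no "<" or ">" inside attribute values.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*?)(\s*/?)>")

# One attribute: a name, optionally followed by a quoted or bare value.
ATTR = re.compile(r"""([^\s"'=/>]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'=<>`]+))?""")

# One-to-one attribute renames from the table above.
RENAMES = {"itemprop": "property", "itemid": "resource"}


def convert_tag(match):
    """Rewrite the Microdata attributes of one start tag as RDFa Lite."""
    name, attr_text, close = match.groups()
    attrs = []
    for attr, value in ATTR.findall(attr_text):
        key = attr.lower()
        if key == "itemscope":
            continue  # dropped: in RDFa Lite, typeof alone starts a new item
        if key == "itemref":
            raise ValueError("itemref has no RDFa Lite 1.1 attribute")
        if key == "itemtype":
            type_url = value.strip("\"'")
            if not type_url.startswith(SCHEMA):
                raise ValueError(f"not a schema.org type: {type_url}")
            # The full type URL becomes vocab plus a short typeof term.
            attrs.append(f'vocab="{SCHEMA}"')
            attrs.append(f'typeof="{type_url[len(SCHEMA):]}"')
            continue
        key = RENAMES.get(key, attr)  # unrelated attributes pass through
        attrs.append(f"{key}={value}" if value else key)
    end = " />" if "/" in close else ">"
    return "<" + name + "".join(" " + a for a in attrs) + end


def microdata_to_rdfa_lite(html):
    """Apply convert_tag to every start tag; end tags and text are untouched."""
    return START_TAG.sub(convert_tag, html)
```

Run on the Dell monitor example later in this post, it should produce the RDFa Lite version shown there, character for character.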

At this point, you may be asking yourself why the two languages are so similar. There are almost 8 years of history here, but to summarize: RDFa was created around the 2004 time frame, and Microdata came much later and used RDFa as a design template. Microdata chose a subset of the original RDFa design to support, but did so in an incompatible way. RDFa Lite then highlighted the same subset of functionality that Microdata did, but in a way that is backwards compatible with RDFa. RDFa Lite did this while keeping the flexibility of the original RDFa intact.

That leaves us where we are today – with two languages, Microdata and RDFa Lite, that accomplish the same things using the same markup patterns. The reason both exist is a very long story involving politics, egos, and a fair amount of dysfunction between various standards groups, none of which has any impact on the actual functionality of either language. The bottom line is that we now have two languages that do almost exactly the same thing. One of them, RDFa Lite 1.1, is currently an official standard. The other one, Microdata, probably won’t become a standard until 2014.

Markup Similarity

The biggest deployment of Microdata on the Web is Google’s implementation of the schema.org vocabulary. Recently, with the release of RDFa Lite 1.1, Google has announced its intent to “officially” support RDFa as well. To see what this means for Web developers, let’s take a look at some markup. Here is a side-by-side comparison of two markup examples – one in Microdata and another in RDFa Lite 1.1:

Microdata 1.0:

<div itemscope itemtype="http://schema.org/Product">
  <img itemprop="image" src="dell-30in-lcd.jpg" />
  <span itemprop="name">Dell UltraSharp 30" LCD Monitor</span>
</div>

RDFa Lite 1.1:

<div vocab="http://schema.org/" typeof="Product">
  <img property="image" src="dell-30in-lcd.jpg" />
  <span property="name">Dell UltraSharp 30" LCD Monitor</span>
</div>

If the markup above looks similar to you, that was no accident. RDFa Lite 1.1 is designed to function as a drop-in replacement for Microdata.
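One way to sanity-check the drop-in claim is to pull the same facts out of both snippets. The Python sketch below is a toy extractor built on the standard library's html.parser, not a conforming Microdata or RDFa processor: it assumes a single flat item like the example above, takes an img value from src and every other value from the element's text, and ignores nesting, itemid/resource, and prefixes entirely. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ToyItemExtractor(HTMLParser):
    """Collects the type and (property, value) pairs of one flat item,
    reading either Microdata or RDFa Lite attributes."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.pairs = []
        self._prop = None  # property waiting for its text content
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemtype" in a:  # Microdata: full type URL
            self.item_type = a["itemtype"]
        elif "typeof" in a:  # RDFa Lite: vocab + short type term
            self.item_type = (a.get("vocab") or "") + a["typeof"]
        prop = a.get("itemprop") or a.get("property")
        if prop is None:
            return
        if tag == "img":
            self.pairs.append((prop, a.get("src")))  # value comes from src
        else:
            self._prop, self._text = prop, []

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._prop is not None:
            self.pairs.append((self._prop, "".join(self._text).strip()))
            self._prop = None


def extract(html):
    parser = ToyItemExtractor()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.pairs


MICRODATA = '''<div itemscope itemtype="http://schema.org/Product">
  <img itemprop="image" src="dell-30in-lcd.jpg" />
  <span itemprop="name">Dell UltraSharp 30" LCD Monitor</span>
</div>'''

RDFA_LITE = '''<div vocab="http://schema.org/" typeof="Product">
  <img property="image" src="dell-30in-lcd.jpg" />
  <span property="name">Dell UltraSharp 30" LCD Monitor</span>
</div>'''

print(extract(MICRODATA) == extract(RDFA_LITE))  # True
```

Both snippets yield the type http://schema.org/Product plus the same image and name values, which is the practical sense in which one can replace the other.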

The Bits that Don’t Matter

Only two features of Microdata aren’t supported by RDFa Lite: itemref and itemscope. Regarding itemref, the RDFa Working Group discussed adding that attribute and, upon reviewing Microdata markup in the wild, saw almost no use of itemref in production code. The schema.org examples steer clear of using itemref as well, so it was fairly clear that itemref is, and will continue to be, an unused feature of Microdata. The itemscope attribute is redundant in RDFa Lite, because typeof already signals that a new thing is being described, and is thus unnecessary.

5 Reasons

For those of you who are still not convinced, here are the top five reasons that you should pick RDFa Lite 1.1 over Microdata:

  1. RDFa is supported by all of the major search crawlers, including Google (and schema.org), Microsoft, Yahoo!, Yandex, and Facebook. Microdata is not supported by Facebook.
  2. RDFa Lite 1.1 is feature-equivalent to Microdata. Over 99% of Microdata markup can be expressed easily in RDFa Lite 1.1. Converting from Microdata to RDFa Lite is as simple as a search and replace of the Microdata attributes with RDFa Lite attributes. Conversely, Microdata does not support a number of the more advanced RDFa features, like being able to tell the difference between feet and meters.
  3. You can mix vocabularies with RDFa Lite 1.1, supporting both schema.org and Facebook’s Open Graph Protocol (OGP) using a single markup language. You don’t have to learn Microdata for schema.org and RDFa for Facebook – just use RDFa for both.
  4. RDFa Lite 1.1 is fully upward-compatible with RDFa 1.1, allowing you to seamlessly migrate to a more feature-rich language as your Linked Data needs grow. Microdata does not support any of the more advanced features provided by RDFa 1.1.
  5. RDFa deployment is greater than Microdata deployment, and it continues to grow at a rapid pace.
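Reason 3 rests on how RDFa resolves a property value: bare terms such as name are resolved against vocab, while CURIEs such as og:title are expanded using the prefix attribute, so markup like <html vocab="http://schema.org/" prefix="og: http://ogp.me/ns#"> can carry property="name og:title" on a single element. The Python sketch below models just that resolution step under simplifying assumptions: it handles only prefixes declared in the prefix attribute (a real RDFa 1.1 processor also predefines some), treats a token with an undeclared prefix as an absolute IRI, and silently skips bare terms when no vocab is set. The function names are hypothetical.

```python
import re


def parse_prefix_attr(value):
    """Parse a prefix attribute such as "og: http://ogp.me/ns#" into a
    {prefix: IRI} dict. Prefix names are lower-cased."""
    return {p.lower(): iri for p, iri in re.findall(r"(\S+):\s+(\S+)", value)}


def expand_property(value, vocab=None, prefixes=None):
    """Expand each whitespace-separated token of a property attribute."""
    prefixes = prefixes or {}
    iris = []
    for token in value.split():
        if ":" in token:
            prefix, reference = token.split(":", 1)
            if prefix.lower() in prefixes:
                # A CURIE such as og:title
                iris.append(prefixes[prefix.lower()] + reference)
            else:
                # Undeclared prefix: treat the token as an absolute IRI
                iris.append(token)
        elif vocab is not None:
            # A bare term such as "name", resolved against vocab
            iris.append(vocab + token)
        # Bare terms with no vocab are skipped in this sketch
    return iris


prefixes = parse_prefix_attr("og: http://ogp.me/ns#")
print(expand_property("name og:title", vocab="http://schema.org/", prefixes=prefixes))
# ['http://schema.org/name', 'http://ogp.me/ns#title']
```

One property attribute thus produces a schema.org statement and an Open Graph statement at the same time, which is what lets a single markup language serve both consumers.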

Hopefully the reasons above are enough to convince most Web developers that RDFa Lite is the best bet for expressing Linked Data in web pages, boosting your search engine ranking, and future-proofing your website as your data markup needs grow over the next several years. If they’re not, please leave a comment below explaining why you’re still not convinced.

If you’d like to learn more about RDFa, try the rdfa.info website. If you’d like to see more RDFa Lite examples and play around with the live RDFa editor, check out RDFa Play.

Thanks to Tattoo Tabatha for the artwork in this blog piece.

41 Comments


  1. 1. Why did you skip the topmost reason? Namespace scoping is a basic requirement for any functionality more complex than “Hello World” apps! Microdata undoes the xmlns feature of XHTML as well as the prefix feature of RDFa in HTML/XHTML, on the assumption that newbies should only ever do Hello World-equivalent work and should stay dumb forever rather than mature as developers.
    2. Why are you ignoring that schema.org only addresses the cross-cutting concern of search, while the real meat of domain-specific aspects is covered mainly by older vocabs from the RDFa profile initial contexts, such as gr for commercial organisations, cc for the licensing legal framework, and so on?
    3. Most critically, having an entire vocab dedicated to a mere cross-cutting concern keeps ordinary people like me, who can’t yet afford overly expensive, untrustworthy web-development services (judging by the poor quality of celebrity websites), from building provenance through self-help, because separating the NOT APPLICABLE portion of the schema.org vocab from the useful part isn’t worth the time. Using freebies like fb, Stack Overflow, and G+ without anything I “own” makes me look like I lack credibility. Does the politics of some engineer lacking social understanding justify hurting the technical interests of other contributors (particularly other domain-specific experts) in our community who want to depend on the Web as another medium? Instead of taking RDFa to the next level, the wheel is reinvented with the goal of glorifying search brands.

    • ManuSporny says: (Author)

      re #1: Yes, the prefixes enable decentralized extensibility.

      The argument from the Microdata folks is that you shouldn’t allow just anybody to do decentralized extensibility, which is why there is an aversion to xmlns and prefix. Microdata folks tend to believe that a vocabulary should have to go through a standards body if you want to define it on the Web. This limits vocabulary explosion and leads to better-designed vocabularies. The downside is that you turn away everybody who doesn’t want to join a standardization body to create their domain-specific vocabulary.

      The RDFa position is that you shouldn’t have to go through a standards body to define your own vocabulary. This will lead to vocabulary explosion and some really badly designed vocabularies. However, everybody is on an even playing field and knowledge about how to write a standard is not a prerequisite for creating a vocabulary. The barrier to entry is very, very low.

      re #2: Yes, schema.org is a lowest-common-denominator, everything-and-the-kitchen-sink vocabulary. It was fairly badly designed at first and is improving slowly. The great thing about schema.org is that it got non-Semantic-Web folks interested in Linked Data, so it was a huge win for the Semantic Web. The downside is how Google went about developing it and releasing it to the public. Yes, schema.org isn’t going to scale. It’s certainly not going to scale to specific disciplines like anthropology, bioinformatics, drug research, etc. That’s why we need decentralized extensibility, and that’s why RDFa chose to go that route.

      re #3: Yes, there was a great deal of politics being played when Microdata came onto the scene. I’ve written about it extensively on this blog:

      http://manu.sporny.org/2012/microdata-cr/
      http://manu.sporny.org/2011/those-six-guys/
      http://manu.sporny.org/2011/case-for-curies/
      http://manu.sporny.org/2011/false-choice/

  2. bruce lawson says:

    Hi Manu

    “Over 99% of all Microdata markup in the wild can be expressed in RDFa Lite just as easily”

    What is the 1% that can’t be?

    bruce

    • ManuSporny says: (Author)

      When I wrote the blog post, the 1% was anything using the @itemref attribute in Microdata. Last we checked, there isn’t much markup out there that uses it.

      However, that was then, and this is now. We’re putting in an @itemref equivalent into HTML+RDFa 1.1 and we have agreement from almost every implementer to implement the feature. Don’t worry, we’ll rename the feature as it’s fairly obtuse. We’ll probably call the feature something like ‘Object Prototyping’:

      http://www.w3.org/2010/02/rdfa/sources/rdfa-in-html/Overview.html#rdfa-reference-folding

      So, that means that every feature of Microdata is now available in RDFa Lite 1.1. Personally, I don’t think the feature is necessary. Most of the Working Group disagrees and will probably approve the feature in the next week.

  3. bruce lawson says:

    Gotcha, thanks.

    I have to say, though, that the bit of the spec you link to does nothing to explain to me what it means. Neither does “Object Prototyping”. How about using language and examples similar to the ones Oli uses here? http://html5doctor.com/microdata/#itemref

    • ManuSporny says: (Author)

      re: bad explanation

      Yes, it’s confusing right now. I will clean up the text before we go into Last Call (later this month). We’ll also try to use better examples, as the current ones are pretty confusing.

      The Microdata @itemref example on HTML5Doctor is not good either, here’s why:

      1. You’ve modeled a band, but haven’t used itemtype="http://schema.org/MusicGroup".
      2. You’re attempting to model an event, but haven’t used itemtype="http://schema.org/Event".
      3. You’ve used the ‘date’ property, which probably should be associated with the Event, but it’s associated with the band.
      4. There is no such thing as the ‘date’ property in schema.org. Even if there were, its value is “next \n week”, which is pretty meaningless: next week from what date?
      5. You’ve used @itemref to reference the band members when there is no reason to do so; you might as well put them in the MusicGroup block.

      I suggest that you change the example to a list of tour dates, which should be modeled as events. Make the band a separate block and @itemref the entire band into each event… that’s a fairly appropriate use of @itemref.

      Also keep in mind that you’re effectively copy-pasting the band’s info into each event, and thus duplicating all of the band’s information once per event. If there are 100 dates, you’re copying the band’s information 100 times in the output data. I think that’s a very bad way to model information, and it is why I think @itemref is a bad feature. Ideally, you’d have just one band description on the page and link to it from each event instead of copying all of the information into each event. Both Microdata and RDFa can do that pretty easily, except that in the case of Microdata, you can’t tell whether it’s a link without doing some regex’ing or making assumptions about the data model. Both Microdata approaches are bad for a variety of reasons.
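To put rough numbers on the duplication argument, here is a toy Python model that treats the output data as (subject, property, value) triples. It assumes itemref pulls each referenced band property straight into every event; the band data, the #band and #event identifiers, and the function names are all hypothetical. performer is the schema.org Event property for linking an event to whoever performs at it.

```python
def copy_into_each_event(events, band):
    """itemref-style: every band property is repeated on every event."""
    return [(event, prop, value) for event in events for prop, value in band.items()]


def link_each_event(events, band_id, band):
    """Link-style: the band is described once and each event points at it."""
    triples = [(band_id, prop, value) for prop, value in band.items()]
    triples += [(event, "performer", band_id) for event in events]
    return triples


# Hypothetical data: one band with three properties, 100 tour dates.
band = {"name": "The Example Band", "genre": "Rock", "url": "http://example.com/band"}
events = [f"#event-{n}" for n in range(100)]

print(len(copy_into_each_event(events, band)))      # 300
print(len(link_each_event(events, "#band", band)))  # 103
```

With three band properties and 100 events, copying yields 300 triples while linking yields 103, and the gap widens with every extra band property and every extra date.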

      In any case – glad you brought this up as we do need to fix the HTML+RDFa 1.1 spec and make it easier to understand. We’ll probably use this example there as well.

  4. I pointed out decentralized vocab development because it is part of human development. Researchers with expertise in a domain come up with terms in seminal work; whether the ACM, in my case, bestows recognition is a separate matter. An actor might introduce a style or accent that kids find fun to imitate in the short run, and that’s not vocab. Standardization deals with effective adoption practices optimized for efficiency. (Okay, I’ll stop negotiating over standardization, given my limited commitment to that process.) To make an analogy: programming exists regardless of the OOP or aspect-oriented paradigm, and domain-specific terms always exist before terminology is derived. THE BIG POINT is that my concern isn’t about making “my” own vocab, because terminology comes with very few terms per contributor per vocab (otherwise newborn infants’ noises would count as vocabs). The concern remains: how can six ape-men’s vocab overrule domain-specific advances from multiple (aspect-oriented) contributors in human society? Bias towards schema.org and plagiarism by the Microdata spec run counter to advancement. To quantitatively improve Google’s indexing of content, should the HTML spec itself sink low enough to standardize a compromise on the “rich”-ness of content? More eyeballs on Linked Data, yes, but at the cost of stooping how much lower?
    I researched families of bots (the correct term is swarm, not family, but you’re not from my domain, so excuse me), not at the advanced statistical level of a PhD, but nevertheless in enough depth to prototype an application. The related part is Intelligent System functionality when computing is split between a cloud service provider (my study’s backend tier) and the personal computer (my frontend), which might be something like a Google Android tablet. The prototype was specific to the digital forensics domain. The aspect-oriented model isn’t constrained to C-like functional programming or C++ OOP; although limited, there was still a substantial relation between disjoint domains such as cloud and forensics at play. Mine might be just one opinion, and I’ll respect expert opinions, but Hixie, with Microdata, is opening PANDORA’S BOX for FUTURE SPEC BACKWARD COMPATIBILITY, better known in software as code rot plus multiple wheel reinvention, because his bias for an all-in-one vocab plus an unreasonably over-simplified syntax for search indexing is a direct attack on adopting pre-existing vocabs from unrelated domains, postponing enrichment. He should’ve stuck to physics, as his newbie obsession with prefixing everything with “item” is comparable to my former DBMS students’ craze for naming the attributes of, say, an Employee table EmpName, EmpSal, etc., so that they themselves got fed up later addressing them as Employee.EmpName, Employee.EmpSal. If Google totally failed to spot silly newbie behavior in usability testing of the new syntax, then Google should be embarrassed and clean up some of its staff, in the extreme case of monopolizing HTML syntax over the whole world. Everyone makes mistakes, but Galileo being guillotined was the evil of mob mentality! Domain experts from other domains, of far higher standing than little me in Intelligent Systems, will respect the spec, but it’s a direct attack in poor taste. Yes, politics at the W3C might’ve been bad, but getting carried away into extremism and becoming the very demon you despise is worse.
    I just wanted to clarify “my own vocab”, but ended up speaking my mind. Anyhow, best of luck, and may your work flourish! :)

  5. Doug Chase says:

    Thanks for this, Manu, you have cleared up a lot for me. I do have a question, however, with regard to the doctype. Do you need to use the HTML 4.01+RDFa Lite 1.1 doctype with schema.org?

    Or could I use XHTML 1.1+RDFa 1.1, as my site is written in XHTML?

    Thank You

    • ManuSporny says: (Author)

      You can use just about any DOCTYPE that you want – HTML4, HTML5, XHTML1, etc. The schema.org processors (and all conforming RDFa processors) will pick up the markup. If you’re writing in XHTML1, then use the XHTML+RDFa 1.1 DOCTYPE:

      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">

  6. “RDFa deployment is greater than Microdata. RDFa deployment continues to grow at a rapid pace.”

    Do you know of any other (ideally more recent) studies supporting this besides the LDOW paper, which is based on January 2012 data? And, if possible, ones that assess the proportion of websites using each format rather than counting raw numbers of triples?

  7. Julie Setbon says:

    Hello Manu,

    Could you please give me a use case for a public-facing website where a developer would want to extract and process RDFa markup in a Web document? For example, an e-commerce or service-marketing website?

    Also, http://rdfa.info/dev/ lists a C processor but not a C# one. If a developer is working on an ASP.NET 4.5 MVC 4 (or MVC 5) website, what should they use?

    Best regards,
    Julie

    • ManuSporny says: (Author)

      Hi Julie, for a public facing site, a developer would probably want to use RDFa to show up higher in the Google search rankings (or have a more accurate search profile created for them). RDFa is best when you’re publishing data that you want other developers to be able to easily extract from your website. You may want to read more about this at http://schema.org/ or http://ogp.me/

      The bottom line with both of those is that when you put RDFa in your web page, you make it easier for companies like Google, Microsoft (Bing), Yahoo!, Yandex, and Facebook to categorize your pages and offer them as destinations for people doing searches or reading social networking messages.

      For e-commerce, the PaySwarm payment protocol uses RDFa heavily to list items for sale. It’s still early days for the protocol, but that’s another place where RDFa is used heavily: https://payswarm.com/

      As for a C# RDFa processor, as far as I can tell, none exists. However, there are many RDFa processors that can be run by calling an external program. So if you have Python or Ruby available to your application, you could use those processors to extract RDFa from web pages. If you have JavaScript available to your application, you could use the JavaScript RDFa processor to extract data.

  8. I recently implemented rich snippets/schema.org on a site that gets 2 million visits a month. How long does it take before my client’s logo appears in the search results?

    • ManuSporny says: (Author)

      Unfortunately, it’s hard to say. It depends on hundreds of variables and Google won’t say what goes into the decision to put the rich snippet on the search page. That said, it seems like the average time is somewhere between 6 weeks and 6 months. However, don’t take my word for it, I don’t do SEO for a living.

  9. Lennart Borgman says:

    I would be glad to see an update or clarification. Doesn’t Google say that it will focus on Microdata?

    schema.org FAQ – Webmaster Tools Help https://support.google.com/webmasters/answer/1211158

Trackbacks for this post

  1. RDFa versus schema.org | Digital Curation – the Class Blog
  2. Objection to Microdata Candidate Recommendation | The Beautiful, Tormented Machine
  3. Bruce Lawson’s personal site  : Talismanic fight between RDFa and microdata
  4. Bruce Lawson’s personal site  : Why I changed my mind about the element
  5. Say goodbye to Bloat « VAGCON
  6. Write less, do more: CSS Style | Wordpress Webdesigner
  7. Dati strutturati in Google: quali recepisce? Schema.org, RDFa e microdati | Casual.info.in.a.bottle
  8. Interview with Ian Hickson, HTML editor | Html5 Tutorials
  9. Reuze.me – css { bloat: none; } | Average Web Guy
  10. Write less, do more: CSS Style
  11. Interview with Ian Hickson, HTML editor | 简约码
  12. Semantic HTML5 with Schema.org for Paintings
  13. Google adds JSON-LD support to Gmail | The Beautiful, Tormented Machine
  14. Is Your Library Using Schema.org? | Eduhacker
  15. Markup for Advanced SEOs (Live fromt #SMXSeattle) - State of Search
  16. Microdata and Schema.org for all your Google rich snippet needs | Stephan Spencer
  17. La sémantique douce de schema.org | Mondeca - Leçons de Choses
  18. Interview with Ian Hickson, HTML editor - Abstract PHP
  19. The Downward Spiral of Microdata | The Beautiful, Tormented Machine
  20. Bruce Lawson’s personal site  : Reading List
  21. Interview with Ian Hickson, HTML editor - InfoLogs
  22. IP491F13A HTML5 2: The Semantic Web – Assignment Three | JimUBC
  23. Basic Vocabulary for schema.org and Structured Data
  24. HTML5, alla ricerca della semantica | Infolet - Informatica e letteratura
