Objection to Microdata Candidate Recommendation

Full disclosure: I’m the current chair of the standards group at the World Wide Web Consortium that created the newest version of RDFa, editor of the HTML5+RDFa 1.1 and RDFa Lite 1.1 specifications, and I’m also a member of the HTML Working Group.

Edit: 2012-12-01 – Updated the article to rephrase some things, and include rationale and counter-arguments at the bottom in preparation for the HTML WG poll on the matter.

The HTML Working Group at the W3C is currently trying to decide if they should transition the Microdata specification to the next stage in the standardization process. There has been a call for consensus to transition the spec to the Candidate Recommendation stage. The problem is that we already have a set of specifications that are official W3C recommendations that do what Microdata does and more. RDFa 1.1 became an official W3C Recommendation last summer. From a standards perspective, this is a mistake and sends a confused signal to Web developers. Officially supporting two specification that do almost exactly the same thing in almost exactly the same way is, ultimately, a failure to standardize.

The fact that RDFa already does what Microdata does has been elaborated upon before:

Mythical Differences: RDFa Lite vs. Microdata
An Uber-comparison of RDFa, Microdata, and Microformats

Here’s the problem in a nutshell: The W3C is thinking of ratifying two completely different specifications that accomplish the same thing in basically the same way. The functionality of RDFa, which is already a W3C Recommendation, overlaps Microdata by a large margin. In fact, RDFa Lite 1.1 was developed as a plug-in replacement for Microdata. The full version of RDFa can also do a number of things that Microdata cannot, such as datatyping, associating more than one type per object, embed-ability in languages other than HTML, ability to easily publish and mix vocabularies, etc.

Microdata would have easily been dead in the water had it not been for two simple facts: 1) The editor of the specification works at Google, and 2) Google pushed Microdata as the markup language for schema.org before also accepting RDFa markup. The first enabled Google and the editor to work on schema.org without signalling to the public that it was creating a competitor to Facebook’s Open Graph Protocol. The second gave Microdata enough of a jump start to establish a foothold for schema.org markup. There have been a number of studies that show that Microdata’s sole use case (99% of Microdata markup) is for the markup of schema.org terms. Microdata is not widely used outside of that context, we now have data to back up what we had predicted would happen when schema.org made their initial announcement for Microdata-only support. Note that schema.org now supports both RDFa and Microdata.

It is typically a bad idea to have two formats published by the same organization that do the same thing. It leads to Web developer confusion surrounding which format to use. One of the goals of Web standards is to reduce, or preferably eliminate, the confusion surrounding the correct technology decision to make. The HTML Working Group and the W3C is failing miserably on this front. There is more confusion today about picking Microdata or RDFa because they accomplish the same thing in effectively the same way. The only reason both exist is due to political reasons.

If we step back and look at the technical arguments, there is no compelling reason that Microdata should be a W3C Recommendation. There is no compelling reason to have two specifications that do the same thing in basically the same way. Therefore, as a member of the HTML Working Group (not as a chair or editor of RDFa) I object to the publication of Microdata as a Candidate Recommendation.

Note that this is not a W3C formal objection. This is an informal objection to publish Microdata along the Recommendation track. This objection will not become an official W3C formal objection if the HTML Working Group holds a poll to gather consensus around whether Microdata should proceed along the Recommendation publication track. I believe the publication of a W3C Note will continue to allow Google to support Microdata in schema.org, but will hopefully correct the confused message that the W3C has been sending to Web developers regarding RDFa and Microdata. We don’t need two specifications that do almost exactly the same thing.

The message sent by the W3C needs to be very clear: There is one recommendation for doing structured data markup in HTML. That recommendation is RDFa. It addresses all of the use cases that have been put forth by the general Web community, and it’s ready for broad adoption and implementation today.

If you agree with this blog post, make sure to let the HTML Working Group know that you do not think that the W3C should ratify two specifications that do almost exactly the same thing in almost exactly the same way. Now is the time to speak up!

Summary of Facts and Arguments

Below is a summary of arguments presented as a basis for publishing Microdata along the W3C Note track:

  1. RDFa 1.1 is already a ratified Web standard as of June 7th 2012 and absorbed almost every Microdata feature before it became official. If the majority of the differences between RDFa and Microdata boil down to different attribute names (property vs. itemprop), then the two solutions have effectively converged on syntax and W3C should not ratify two solutions that do effectively the same thing in almost exactly the same way.
  2. RDFa is supported by all of the major search crawlers, including Google (and schema.org), Microsoft, Yahoo!, Yandex, and Facebook. Microdata is not supported by Facebook.
  3. RDFa Lite 1.1 is feature-equivalent to Microdata. Over 99% of Microdata markup can be expressed easily in RDFa Lite 1.1. Converting from Microdata to RDFa Lite is as simple as a search and replace of the Microdata attributes with RDFa Lite attributes. Conversely, Microdata does not support a number of the more advanced RDFa features, like being able to tell the difference between feet and meters.
  4. You can mix vocabularies with RDFa Lite 1.1, supporting both schema.org and Facebook’s Open Graph Protocol (OGP) using a single markup language. You don’t have to learn Microdata for schema.org and RDFa for Facebook – just use RDFa for both.
  5. The creator of the Microdata specification doesn’t like Microdata. When people are not passionate about the solutions that they create, the desire to work on those solutions and continue improve upon them is muted. The RDFa community is passionate about the technology that they have created together and have strived to make it better since the standardization of RDFa 1.0 back in 2008.
  6. RDFa Lite 1.1 is fully upward-compatible with RDFa 1.1, allowing you to seamlessly migrate to a more feature-rich language as your Linked Data needs grow. Microdata does not support any of the more advanced features provided by RDFa 1.1.
  7. RDFa deployment is broader than Microdata. RDFa deployment continues to grow at a rapid pace.
  8. The economic damage generated by publishing both RDFa and Microdata along the Recommendation track should not be underestimated. W3C should try to provide clear direction in an attempt to reduce the economic waste that a “let the market sort it out among two nearly identical solutions” strategy will generate. At some point, the market will figure out that both solutions are nearly identical, but only after publishing and building massive amounts of content and tooling for both.
  9. The W3C Technical Architecture Group (TAG), which is responsible for ensuring that the core architecture of the Web is sound, has raised their concern about the publication of both Microdata and RDFa as recommendations. After the W3C TAG raised their concerns, the RDFa Working Group created RDFa Lite 1.1 to be a near feature-equivalent replacement for Microdata that was also backwards-compatible with RDFa 1.0.
  10. Publishing a standard that does almost exactly the same thing as an existing standard in almost exactly the same way is a failure to standardize.

Counter-arguments and Rebuttals

[This is a] classic case of monopolistic anti-competitive protectionism.

No, this is an objection to publishing two specifications that do almost exactly the same thing in almost exactly the same way along the W3C Recommendation publication track. Protectionism would have asked that all work on Microdata be stopped and the work scuttled. The proposed resolution does not block anybody from using Microdata, nor does it try to stop or block the Microdata work from happening in the HTML WG. The objection asks that the W3C decide what the best path forward for Web developers is based on a fairly complicated set of predicted outcomes. This is not an easy decision. The objection is intended to ensure that the HTML Working Group has this discussion before we proceed to Candidate Recommendation with Microdata.

<manu1> I'd like the W3C to work as well, and I think publishing two specs that accomplish basically 
        the same thing in basically the same way shows breakage.
<annevk> Bit late for that. XDM vs DOM, XPath vs Selectors, XSL-FO vs CSS, XSLT vs XQuery, 
         XQuery vs XQueryX, RDF/XML vs Turtle, XForms vs Web Forms 2.0, 
         XHTML 1.0 vs HTML 4.01, XML 1.0 4th Edition vs XML 1.0 5th Edition, 
         XML 1.0 vs XML 1.1, etc.

[link to full conversation]

While W3C does have a history of publishing competing specifications, there have been features in each competing specification that were compelling enough to warrant the publication of both standards. For example, XHTML 1.0 provided a standard set of rules for validating documents that was aligned with XML and a decentralized extension mechanism that HTML4.01 did not. Those two major features were viewed as compelling enough to publish both specifications as Recommendations via W3C.

For authors, the differences between RDFa and Microdata are so small that, for 99% of documents in the wild, you can convert a Microdata document to an RDFa Lite 1.1 document with a simple search and replace of attribute names. That demonstrates that the syntaxes for both languages are different only in the names of the HTML attributes, and that does not seem like a very compelling reason to publish both specifications as Recommendations.

Microdata’s processing algorithm is vastly simpler, which makes the data
extracted more reliable and, when something does go wrong, makes it easier for 1) users to debug their own data, and 2) easier for me to debug it if they can’t figure it out on their own.

Microdata’s processing algorithm is simpler for two major reasons:

The complexity of implementing a processor has little bearing on how easy it is for developers to author documents. For example, XHTML 1.0 had a simpler processing model which made the data that was extracted more reliable and when something went wrong, it was easier to debug. However, HTML5 supported more use cases and recovers from errors in cases where it can, which made it more popular with Web developers in the long-run.

Additionally, authors of Microdata and RDFa should be using tools like RDFa Play to debug their markup. This is true for any Web technology. We debug our HTML, JavaScript, and CSS by loading it into a browser and bringing up the debugging tools. This is no different for Microdata and RDFa. If you want to make sure your markup does what you want, make sure to verify it by using a tool and not by trying to memorize the processing rules and running them through your head.

For what it is worth, I personally think RDFa is generally a technically better solution. But as Marcos says, “so what”? Our job at W3C is to make standards for the technology the market decides to use.

If we think one of these technologies is a technically better solution than the other one, we should signal that realization at some level. The most basic thing we could do is to make one an official Recommendation, and the other a Note. I also agree that our job at W3C is to make standards that the technology market decides to use, but clearly this particular case isn’t that cut-and-dried. Schema.org’s only option in the beginning was to use Microdata, and since authors didn’t want to risk not showing up in the search engines, they used Microdata. This forced the market to go in one direction.

This discussion would be in a different place had Google kept the playing field level. That is not to say that Google didn’t have good reasons for making the decisions that they did at the time, but those reasons influenced the development of RDFa, and RDFa Lite 1.1 was the result. The differences between Microdata and RDFa have been removed and a new question is in front of us: given two almost identical technologies, should the W3C publish two specifications that do almost exactly the same thing in almost exactly the same way?

… the [HTML] Working Group explicitly decided not to pick a winner between HTML Microdata and HTML+RDFa

The question before the HTML WG at the time was whether or not to split Microdata out of the HTML5 specification. The HTML Working Group did not discuss whether the publishing track for the Microdata document should be the W3C Note track or the W3C Recommendation track. At the time the decision was made, RDFa Lite 1.1 did not exist, RDFa Lite 1.1 was not a W3C Recommendation, nor did the RDFa and Microdata functionality so greatly overlap as they do now. Additionally, the HTML WG decision at that time states the following under the “Revisiting the issue” section:

“If Microdata and RDFa converge in syntax…”

Microdata and RDFa have effectively converged in syntax. Since Microdata can be interpreted as RDFa based on a simple search-and-replace of attributes that the languages have effectively converged on syntax except for the attribute names. The proposal is not to have work on Microdata stopped. Let work on Microdata proceed in this group, but let it proceed on the W3C Note publication track.

Closing Statements

I felt uneasy raising this issue because it’s a touchy and painful subject for everyone involved. Even if the discussion is painful, it is a healthy one for a standardization body to have from time to time. What I wanted was for the HTML Working Group to have this discussion. If the upcoming poll finds that the consensus of the HTML Working Group is to continue with the Microdata specification along the Recommendation track, I will not pursue a W3C Formal Objection. I will respect whatever decision the HTML Working Group makes as I trust the Chairs of that group, the process that they’ve put in place, and the aggregate opinion of the members in that group. After all, that is how the standardization process is supposed to work and I’m thankful to be a part of it.

16 Comments

Got something to say? Feel free, I want to hear from you! Leave a Comment

  1. Ora Lassila says:

    Well said! I agree with your position. What can I do to help?

    • ManuSporny says: (Author)

      Well, the low-hanging fruit is to send an e-mail to public-html-comments@w3.org stating that you would like to see Microdata published as a W3C Note instead of a W3C Rec. You might also want to alert the W3C Advisory Board that this discussion is happening as we’d like to get some feedback on them since these sorts of decisions affect W3C Policy.

      Further help would be to help us make the http://rdfa.info/ website better (more tutorials, tech demos, etc.) – don’t know if you have any designers/engineers that would be interested in helping out on that front? We have a github repository here that you can submit patches for (and we’re very happy with even the smallest content or code submission): https://github.com/rdfa/rdfa-website

  2. Pete B says:

    I think that web developers are already confused.

    As far as I’ve understood there is a good reason for using Microdata over RDF because the latter is incompatible with HTML.

    If RDF can be made compatible with HTML, as I see there are some efforts in the works, I’d expect a hard fight trying to convince the developer community after the wholesale shift away from XHTML and its stylings.

  3. Dennis Erny says:

    I agree whole-heartedly — RDFa Lite just seems the more elegant solution to me and I am using it extensively at http://reuze.me.

    Not having to litter markup with itemscope(?) should be enough of a reason to get behind RDFa Lite on it’s own.

    If not for the head-start Microdata had with Google, this conversation would be moot.

  4. Shelley says:

    Bottom line is consistency.

    How can we depend on anything in the W3C if it allows a drastic change of state whenever one member organization demands it?

    Metadata annotation is for the long term. Because it’s for the long term, we need a specification that we know is going to be around for the long term. There has been, consistently, a strong core of people behind RDF, and then eventually RDFa and now RDFa Lite that demonstrates a level of consistent support. Microdata was a one-man spec developed in a weekend in a fit of spite. But that one man works for a powerful tech company, and that seems to be enough.

    More importantly, if the W3C allows Microdata, does this mean we’ll end up with a third such spec in the future whenever some other powerful company demands it? A fourth?

    If people have problems with RDF/RDFa/RDFa Lite, work with the group to fix them. Don’t just have kittens and create your own spec.

  5. William Greenly says:

    Dropped a note on the public mailing list. Can’t see how they can ignore the RDFa standard. Hope its not a case of the ‘not invented here’ mentality or worse, ignorance.

    • ManuSporny says: (Author)

      Thanks William, much appreciated. I don’t think they’re ignoring RDFa, nor is it ignorance. There might be a bit of NIH going on, but typically when things are this controversial, W3C shies away from picking something and instead pushes two options in an effort to get the market to decide what the final standard should be. The down-side is the economic waste of folks having to rewrite their systems when the final standard emerges. The whole point of standardization is to standardize on /something/… not two things that do almost exactly the same thing. We’ll see what the final decision is going to be, but your e-mail to the mailing list helps direct that decision.

  6. H.E.A.T. says:

    @shelly hit the nail right on the head.

    RDFa is indeed a far superior technology than Microdata. The primary reason is because of its vocabulary extensibility. As with XHTML 1.1, the problem was never in the specification, but in the implementation.

    By implementing Microdata over RDFa, schema.org made a LOUD statement about RDFa to the developer community; a community heavily reliant on SEO (for whatever reason). By leveraging its position in the search market, schema.org basically told everyone that Microdata is the better technology.

    Is this true? No, but it doesn’t matter. XHTML 1.1 is better than HTML 5, but XHTML 1.1 was never implemented fully by the browser vendors thus making the specification appear flawed. So a rebel group created HTML 5, which is like a glass container as compared to XHTML 1.1.

    RDFa should have been positioned to create another slice in the separation principle for web authoring: the separation of semantics from content. With the Role attribute for adding semantics to structural elements, the age of truly professional web authoring was just over the horizon.

    Instead, one member faction of the W3C grabs its basketball and goes home to the WHATWG and another member faction grabs the rim and goes home to schema.org. The problem is not in objecting to the movement of the Microdata specification, but cowering to the usurpers within the consortium.

  7. mattur says:

    Perhaps W3C TAG should setup a Microdata/RDFa Task Force.

  8. Over the semantic Web popularized by TBL TED Talk of Linked Data, search is just cross-cutting concern and schema.org vocab giving it same weightage as domain specific aspects covered by other vocabs is forcing ready-made garments one size fits all by abusing Google’s internet presence for defending the ego of it’s employee working as WHATWG editor. WHATWG needs mature human editor instead of wheels re-inventor. As an engineer, I’m shamed by such unethical rip-off of RDFa, for self-glorifying cross-cutting concern over domain specific aspects, by my peer. Intelligence should serve purpose such as taking wheels to next level or must exhibit novelty over rip-off.

  9. Lin Clark says:

    In this post, you take a quote from an email I wrote, “Microdata’s processing algorithm is vastly simpler…” You respond by saying:

    “The complexity of implementing a processor has little bearing on how easy it is for developers to author documents… Additionally, authors of Microdata and RDFa should be using tools like RDFa Play to debug their markup.”

    As I pointed out in that same email (in a part which you didn’t include), the RDFa 1.1 parsing tools give different results with a relatively high frequency. For example, you state that authors should use RDFa Play to test their data. However, RDFa Play doesn’t work for pages that Drupal outputs, as pointed out in an issue 8 months ago [1].

    This means that users will get vastly different results if they try the same page in RDFa Play vs the W3C Distiller. You can try this with an example [2]. What is a content author supposed to do with those results?

    As long as the tools give results that are as inconsistent as they currently are, we cannot pretend that authoring RDFa is as easy as authoring microdata.

    [1] https://github.com/rdfa/rdfa-website/issues/10
    [2] http://openspring.net/blog/2011/09/30/schemaorg-rich-snippets-drupal-7-rdfa

Trackbacks for this post

  1. Microdata, RDFa, 语义超摩尔定律 « 语义噪声
  2. Should Microdata Become a W3C Standard? - semanticweb.com

Leave a Comment

Let us know your thoughts on this post but remember to play nicely folks!