Full disclosure: I’m the current chair of the standards group at the World Wide Web Consortium that created the newest version of RDFa, editor of the HTML5+RDFa 1.1 and RDFa Lite 1.1 specifications, and I’m also a member of the HTML Working Group.
Edit: 2012-12-01 – Updated the article to rephrase some things, and include rationale and counter-arguments at the bottom in preparation for the HTML WG poll on the matter.
The HTML Working Group at the W3C is currently trying to decide if they should transition the Microdata specification to the next stage in the standardization process. There has been a call for consensus to transition the spec to the Candidate Recommendation stage. The problem is that we already have a set of specifications that are official W3C recommendations that do what Microdata does and more. RDFa 1.1 became an official W3C Recommendation last summer. From a standards perspective, this is a mistake and sends a confused signal to Web developers. Officially supporting two specification that do almost exactly the same thing in almost exactly the same way is, ultimately, a failure to standardize.
The fact that RDFa already does what Microdata does has been elaborated upon before:
Here’s the problem in a nutshell: The W3C is thinking of ratifying two completely different specifications that accomplish the same thing in basically the same way. The functionality of RDFa, which is already a W3C Recommendation, overlaps Microdata by a large margin. In fact, RDFa Lite 1.1 was developed as a plug-in replacement for Microdata. The full version of RDFa can also do a number of things that Microdata cannot, such as datatyping, associating more than one type per object, embed-ability in languages other than HTML, ability to easily publish and mix vocabularies, etc.
Microdata would have easily been dead in the water had it not been for two simple facts: 1) The editor of the specification works at Google, and 2) Google pushed Microdata as the markup language for schema.org before also accepting RDFa markup. The first enabled Google and the editor to work on schema.org without signalling to the public that it was creating a competitor to Facebook’s Open Graph Protocol. The second gave Microdata enough of a jump start to establish a foothold for schema.org markup. There have been a number of studies that show that Microdata’s sole use case (99% of Microdata markup) is for the markup of schema.org terms. Microdata is not widely used outside of that context, we now have data to back up what we had predicted would happen when schema.org made their initial announcement for Microdata-only support. Note that schema.org now supports both RDFa and Microdata.
It is typically a bad idea to have two formats published by the same organization that do the same thing. It leads to Web developer confusion surrounding which format to use. One of the goals of Web standards is to reduce, or preferably eliminate, the confusion surrounding the correct technology decision to make. The HTML Working Group and the W3C is failing miserably on this front. There is more confusion today about picking Microdata or RDFa because they accomplish the same thing in effectively the same way. The only reason both exist is due to political reasons.
If we step back and look at the technical arguments, there is no compelling reason that Microdata should be a W3C Recommendation. There is no compelling reason to have two specifications that do the same thing in basically the same way. Therefore, as a member of the HTML Working Group (not as a chair or editor of RDFa) I object to the publication of Microdata as a Candidate Recommendation.
Note that this is not a W3C formal objection. This is an informal objection to publish Microdata along the Recommendation track. This objection will not become an official W3C formal objection if the HTML Working Group holds a poll to gather consensus around whether Microdata should proceed along the Recommendation publication track. I believe the publication of a W3C Note will continue to allow Google to support Microdata in schema.org, but will hopefully correct the confused message that the W3C has been sending to Web developers regarding RDFa and Microdata. We don’t need two specifications that do almost exactly the same thing.
The message sent by the W3C needs to be very clear: There is one recommendation for doing structured data markup in HTML. That recommendation is RDFa. It addresses all of the use cases that have been put forth by the general Web community, and it’s ready for broad adoption and implementation today.
If you agree with this blog post, make sure to let the HTML Working Group know that you do not think that the W3C should ratify two specifications that do almost exactly the same thing in almost exactly the same way. Now is the time to speak up!
Summary of Facts and Arguments
Below is a summary of arguments presented as a basis for publishing Microdata along the W3C Note track:
- RDFa 1.1 is already a ratified Web standard as of June 7th 2012 and absorbed almost every Microdata feature before it became official. If the majority of the differences between RDFa and Microdata boil down to different attribute names (property vs. itemprop), then the two solutions have effectively converged on syntax and W3C should not ratify two solutions that do effectively the same thing in almost exactly the same way.
- RDFa is supported by all of the major search crawlers, including Google (and schema.org), Microsoft, Yahoo!, Yandex, and Facebook. Microdata is not supported by Facebook.
- RDFa Lite 1.1 is feature-equivalent to Microdata. Over 99% of Microdata markup can be expressed easily in RDFa Lite 1.1. Converting from Microdata to RDFa Lite is as simple as a search and replace of the Microdata attributes with RDFa Lite attributes. Conversely, Microdata does not support a number of the more advanced RDFa features, like being able to tell the difference between feet and meters.
- You can mix vocabularies with RDFa Lite 1.1, supporting both schema.org and Facebook’s Open Graph Protocol (OGP) using a single markup language. You don’t have to learn Microdata for schema.org and RDFa for Facebook – just use RDFa for both.
- The creator of the Microdata specification doesn’t like Microdata. When people are not passionate about the solutions that they create, the desire to work on those solutions and continue improve upon them is muted. The RDFa community is passionate about the technology that they have created together and have strived to make it better since the standardization of RDFa 1.0 back in 2008.
- RDFa Lite 1.1 is fully upward-compatible with RDFa 1.1, allowing you to seamlessly migrate to a more feature-rich language as your Linked Data needs grow. Microdata does not support any of the more advanced features provided by RDFa 1.1.
- RDFa deployment is broader than Microdata. RDFa deployment continues to grow at a rapid pace.
- The economic damage generated by publishing both RDFa and Microdata along the Recommendation track should not be underestimated. W3C should try to provide clear direction in an attempt to reduce the economic waste that a “let the market sort it out among two nearly identical solutions” strategy will generate. At some point, the market will figure out that both solutions are nearly identical, but only after publishing and building massive amounts of content and tooling for both.
- The W3C Technical Architecture Group (TAG), which is responsible for ensuring that the core architecture of the Web is sound, has raised their concern about the publication of both Microdata and RDFa as recommendations. After the W3C TAG raised their concerns, the RDFa Working Group created RDFa Lite 1.1 to be a near feature-equivalent replacement for Microdata that was also backwards-compatible with RDFa 1.0.
- Publishing a standard that does almost exactly the same thing as an existing standard in almost exactly the same way is a failure to standardize.
Counter-arguments and Rebuttals
No, this is an objection to publishing two specifications that do almost exactly the same thing in almost exactly the same way along the W3C Recommendation publication track. Protectionism would have asked that all work on Microdata be stopped and the work scuttled. The proposed resolution does not block anybody from using Microdata, nor does it try to stop or block the Microdata work from happening in the HTML WG. The objection asks that the W3C decide what the best path forward for Web developers is based on a fairly complicated set of predicted outcomes. This is not an easy decision. The objection is intended to ensure that the HTML Working Group has this discussion before we proceed to Candidate Recommendation with Microdata.
<manu1> I'd like the W3C to work as well, and I think publishing two specs that accomplish basically the same thing in basically the same way shows breakage. <annevk> Bit late for that. XDM vs DOM, XPath vs Selectors, XSL-FO vs CSS, XSLT vs XQuery, XQuery vs XQueryX, RDF/XML vs Turtle, XForms vs Web Forms 2.0, XHTML 1.0 vs HTML 4.01, XML 1.0 4th Edition vs XML 1.0 5th Edition, XML 1.0 vs XML 1.1, etc.
While W3C does have a history of publishing competing specifications, there have been features in each competing specification that were compelling enough to warrant the publication of both standards. For example, XHTML 1.0 provided a standard set of rules for validating documents that was aligned with XML and a decentralized extension mechanism that HTML4.01 did not. Those two major features were viewed as compelling enough to publish both specifications as Recommendations via W3C.
For authors, the differences between RDFa and Microdata are so small that, for 99% of documents in the wild, you can convert a Microdata document to an RDFa Lite 1.1 document with a simple search and replace of attribute names. That demonstrates that the syntaxes for both languages are different only in the names of the HTML attributes, and that does not seem like a very compelling reason to publish both specifications as Recommendations.
Microdata’s processing algorithm is vastly simpler, which makes the data
extracted more reliable and, when something does go wrong, makes it easier for 1) users to debug their own data, and 2) easier for me to debug it if they can’t figure it out on their own.
Microdata’s processing algorithm is simpler for two major reasons:
- Microdata does not support as many features and use cases as RDFa does.
- RDFa 1.1 is backwards-compatible with RDFa 1.0, which complicates the processing rules. The same is true for HTML5.
The complexity of implementing a processor has little bearing on how easy it is for developers to author documents. For example, XHTML 1.0 had a simpler processing model which made the data that was extracted more reliable and when something went wrong, it was easier to debug. However, HTML5 supported more use cases and recovers from errors in cases where it can, which made it more popular with Web developers in the long-run.
For what it is worth, I personally think RDFa is generally a technically better solution. But as Marcos says, “so what”? Our job at W3C is to make standards for the technology the market decides to use.
If we think one of these technologies is a technically better solution than the other one, we should signal that realization at some level. The most basic thing we could do is to make one an official Recommendation, and the other a Note. I also agree that our job at W3C is to make standards that the technology market decides to use, but clearly this particular case isn’t that cut-and-dried. Schema.org’s only option in the beginning was to use Microdata, and since authors didn’t want to risk not showing up in the search engines, they used Microdata. This forced the market to go in one direction.
This discussion would be in a different place had Google kept the playing field level. That is not to say that Google didn’t have good reasons for making the decisions that they did at the time, but those reasons influenced the development of RDFa, and RDFa Lite 1.1 was the result. The differences between Microdata and RDFa have been removed and a new question is in front of us: given two almost identical technologies, should the W3C publish two specifications that do almost exactly the same thing in almost exactly the same way?
… the [HTML] Working Group explicitly decided not to pick a winner between HTML Microdata and HTML+RDFa
The question before the HTML WG at the time was whether or not to split Microdata out of the HTML5 specification. The HTML Working Group did not discuss whether the publishing track for the Microdata document should be the W3C Note track or the W3C Recommendation track. At the time the decision was made, RDFa Lite 1.1 did not exist, RDFa Lite 1.1 was not a W3C Recommendation, nor did the RDFa and Microdata functionality so greatly overlap as they do now. Additionally, the HTML WG decision at that time states the following under the “Revisiting the issue” section:
“If Microdata and RDFa converge in syntax…”
Microdata and RDFa have effectively converged in syntax. Since Microdata can be interpreted as RDFa based on a simple search-and-replace of attributes that the languages have effectively converged on syntax except for the attribute names. The proposal is not to have work on Microdata stopped. Let work on Microdata proceed in this group, but let it proceed on the W3C Note publication track.
I felt uneasy raising this issue because it’s a touchy and painful subject for everyone involved. Even if the discussion is painful, it is a healthy one for a standardization body to have from time to time. What I wanted was for the HTML Working Group to have this discussion. If the upcoming poll finds that the consensus of the HTML Working Group is to continue with the Microdata specification along the Recommendation track, I will not pursue a W3C Formal Objection. I will respect whatever decision the HTML Working Group makes as I trust the Chairs of that group, the process that they’ve put in place, and the aggregate opinion of the members in that group. After all, that is how the standardization process is supposed to work and I’m thankful to be a part of it.