The False Choice of Schema.org

Full disclosure: I am the current Chair of the group at the World Wide Web Consortium that created RDFa. That said, all of this is my personal opinion – I am not speaking on behalf of the W3C or my company, Digital Bazaar. I am biased, but also have been around long enough to know when freedom of choice on the Web is being threatened.

Some of you may have heard that Microsoft, Google and Yahoo have just released a new uber-vocabulary for the Web. As the site explains, if you use schema.org, you will get better-looking search listings on Bing, Google and Yahoo. While this may sound good on the surface, it is very bad news for choice on the Web. There are a few points that I’d like to make in this post:

  1. RDFa and Microdata markup are similar for the schema.org use cases – they should both be supported.
  2. Microdata doesn’t scale as easily as RDFa – early successes will be followed by stagnation and vocabulary lock-in.
  3. All of us have the power to change this as the Web community – let’s do that. We will release a plan shortly.

The schema.org site makes it appear as if you must pick sides and use Microdata if you want preferential treatment. This is a false choice! They even state that you cannot use RDFa and Microdata and Microformats on the same page as it will confuse their parsers – forcing Web designers to exclusively use Microdata or be lost in the morass of search listings. [Edit: Google has since retracted this statement.] The entire Web community should decide which features should be supported – not just Microsoft or Google or Yahoo. We must not let the rug be pulled out from under us; we must band together and make our voices heard. We must make it very clear that we want to use what is best for us, not what a few people think is best for three large corporations.

Google and Yahoo already support Microformats, Microdata and RDFa in their advanced search services (Google Rich Snippets and Yahoo Search). So, why is it that we cannot continue to use what has been working for our organizations? Of the three, RDFa supports far more communities and is currently used far more heavily than Microdata. So, what possible reasons could they have to now exclude RDFa? Why exclude Microformats?

The patent licensing section alone sent shivers down my spine, but the most glaring concern is the reasoning to use Microdata.

Complexity

Q: Why microdata? Why not RDFa or microformats?
Focusing on microdata was a pragmatic decision. Supporting multiple syntaxes makes documentation for webmasters more complex and introduces more overhead in terms of defining new formats.

Being pragmatic is about balance. It’s true that supporting multiple syntaxes makes documentation and management more complex, so reduction can be good, but at what cost?

RDFa is extensible and very expressive, but the substantial complexity of the language has contributed to slower adoption.

Yes, it is extensible and very expressive. However, I don’t buy the “more complex” argument at all for the schema.org use case. The RDFa 1.1 community has been extremely focused on Web developer feedback and simplifying the markup. For example, take this Microdata snippet from schema.org:

<div itemscope itemtype="http://schema.org/CreativeWork">
   <img itemprop="image" src="videogame.jpg" />
   <span itemprop="name">Resistance 3: Fall of Man</span>
   by <span itemprop="author">Sony</span>,
   Platform: Playstation 3
   Rated:<span itemprop="contentRating">Mature</span>
</div>

and compare against the RDFa 1.1 equivalent:

<div vocab="http://schema.org/" typeof="CreativeWork">
   <span rel="image"><img src="videogame.jpg" /></span>
   <span property="name">Resistance 3: Fall of Man</span>
   by <span property="author">Sony</span>,
   Platform: Playstation 3
   Rated:<span property="contentRating">Mature</span>
</div>

The complexity difference between the two languages for the simple use cases is negligible. Make no mistake – there are politics being played here and we will eventually get to the bottom of this. When you get to more advanced use cases, such as mixing vocabularies, RDFa really shines. In Microdata, your choice in vocabulary is exclusive. In RDFa, your choice in vocabulary is inclusive. That is, you can mix-and-match vocabularies that suit your organization far more easily in RDFa than you can in Microdata. Vocabulary mixing will become far more prevalent as structured data grows on the Web.
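To make the mix-and-match point concrete, here is a rough Python sketch of how property names resolve when a second vocabulary is mixed into schema.org markup. It is a toy, not a conforming RDFa processor: the `TermResolver` class and the `MARKUP` fragment are made up for this illustration, pairing schema.org's name with Dublin Core's title is just one plausible mix, and the sketch ignores scoping, @rel, @about and datatypes.

```python
from html.parser import HTMLParser

# Hypothetical fragment: schema.org is the default vocabulary (@vocab) and
# Dublin Core terms are mixed in through a declared prefix (@prefix).
MARKUP = """
<div vocab="http://schema.org/" prefix="dc: http://purl.org/dc/terms/" typeof="CreativeWork">
   <span property="name dc:title">Resistance 3: Fall of Man</span>
   by <span property="author">Sony</span>
</div>
"""


class TermResolver(HTMLParser):
    """Expands @property terms to full IRIs (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.vocab = None
        self.prefixes = {}
        self.properties = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("vocab"):
            self.vocab = attrs["vocab"]
        if attrs.get("prefix"):
            # @prefix is a whitespace-separated list of "name: IRI" pairs.
            tokens = attrs["prefix"].split()
            for name, iri in zip(tokens[0::2], tokens[1::2]):
                self.prefixes[name.rstrip(":")] = iri
        for term in (attrs.get("property") or "").split():
            self.properties.append(self.expand(term))

    def expand(self, term):
        prefix, sep, local = term.partition(":")
        if not sep:
            # Plain term: resolve against the default vocabulary.
            return (self.vocab or "") + term
        if prefix in self.prefixes:
            # Prefixed term: resolve against the declared prefix.
            return self.prefixes[prefix] + local
        return term  # assume it is already a full IRI


resolver = TermResolver()
resolver.feed(MARKUP)
for iri in resolver.properties:
    print(iri)
```

Running it prints three full IRIs, two from schema.org and one from Dublin Core, all taken from a single block of markup.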

Some have argued that some of the more involved features are complex, but the counter argument has always been: Well, don’t use those features. Those features aren’t just there to be purely complex – they were specifically requested by the Web community when building RDFa. Microdata is lacking many of those community-requested features, which does make it simpler, but it also makes it so that it doesn’t solve the problems that the “complex” features were designed for. RDFa is designed to solve a wider range of problems than just those of the search companies. Yes, complexity is bad – but so is cutting features that the Web community has specifically requested and needs to make structured data on the Web everything that it can be.

Adoption

RDFa is extensible and very expressive, but the substantial complexity of the language has contributed to slower adoption.

The “slower adoption” statement is pure bunk. Of Microformats, RDFa and Microdata – RDFa is the only one that has experienced triple digit growth in the last year – 510% growth over the last year, to be exact. There are no such figures for Microdata. If you are going to claim that something has slow adoption, then you have to measure it against something else. Where is the public, hard data to demonstrate that Microdata is growing faster than Microformats and RDFa? Both the Microformats and RDFa communities have provided hard numbers in a public forum. I would suspect that these numbers have not been published for Microdata because they do not exist. If the numbers do exist, they should be made public so that we may check the veracity of this claim.

So it leaves us guessing, slower adoption compared to what? Since when did triple digit adoption figures become not good enough? With claims that go counter to publicly available hard data, it makes it seem as if something fishy is going on here. These numbers will probably not matter in the long run. If Google, Microsoft and Yahoo all said that you need to embed their proprietary markup in pages, people would do it if it meant higher search ranking. The adoption rate of any markup would increase if Google, Microsoft and Yahoo mandated it. That doesn’t mean that it would result in something that is better for the Web.

The False Choice

We will also be monitoring the web for RDFa and microformats adoption and if they pick up, we will look into supporting these syntaxes.

Since Google and Microsoft and Yahoo have said that the new schema.org vocabulary expressed in RDFa isn’t supported, people won’t use it. There are no RDFa examples on the schema.org entity pages and it’s not even clear if they will index RDFa that expresses schema.org structured data. They’ve created a catch-22 situation. RDFa and Microformats adoption for schema.org will not pick up because they go out of their way to not support it. Even if they were to support extracting the schema.org vocabulary as RDFa, I don’t know how much more RDFa and Microformats would have to “pick up” to qualify. If triple-digit growth isn’t enough, then what is?

Microformats were created in an open and community-driven way. RDFa was created in an open and community-driven way. Schema.org was not, and if it catches on, expect to see it not scale over the long term and an increase in vocabulary lock-in to the major search companies. Which are you going to choose? Facebook’s Like button markup, or Google/Microsoft/Yahoo’s Microdata markup – you are being put into the position of choosing one of those exclusively.

We the publishers, developers and authors of the Web have the power to change this. We need to make it clear that we want to be able to express structured data in whatever language we choose. We create the content that Google, Microsoft and Yahoo index, it is a two-way conversation. Google and Microsoft do not tell us what is worth indexing, that is our choice – our freedom to decide.

Action

Don’t let this freedom be taken away from us and from the rest of the Web. Schema.org is the work of only a handful of people under the guise of three very large companies. It is not the community of thousands of Web Developers that RDFa and Microformats relied upon to build truly open standards. This is not how we do things on the Web.

The feedback form for schema.org is below. Let them know that you want RDFa supported for schema.org as a first class language. Tell them that you want Microformats to continue to be supported if you use them. Let them know that you want to see data backing up the claim that Microdata is the best and only choice. Let them know that you want the vocabularies provided on the schema.org site to go through a public review process. Ask them why they aren’t reusing the good work done by the Microformats community or the many Web vocabulary authors that have already put multiple years into creating solid Web vocabularies. Let them know that you don’t think that a handful of people should decide what will be used by hundreds of millions of people. We should be a part of this decision – let them know that.

We’re getting a plan of action together for those that care about freedom of choice on the Web. I’ll tweet a call to action via @manusporny when it is ready, roughly 1-2 weeks from now.

Thanks to B, T, M, D, and D for reviewing this post and suggesting changes.

79 Comments


  1. Z M Gehlke says:

    What I don’t understand, and perhaps never will, is why this is needed at all.

    Wouldn’t it be possible to specify a format for individual sites to build structures of related tags without any underlying meaning to the tags and then have search indexes build abstract structures out of those tags and derive meaning through comparisons to other abstract structures and similarly named fields within them to look for synonyms?

    And didn’t we do the first half and call it XML?

    So why do we need another system of tagging up things with names? Why shouldn’t this be on the companies to be able to analyze graphs of tags while checking for common terms to identify synonyms? They certainly can do that with content, so why would it be any harder of a problem to solve over the space of tags rather than words?

    • ManuSporny says: (Author)

      You’re describing the field of Knowledge Representation. Yes, there are many ways to do what you’re saying. XML is a form of knowledge representation. See here for why this is needed:

      http://www.youtube.com/watch?v=OGg8A2zfWKg

      The questions that we are publicly contemplating are:

      1. What should the standard way to publish this information be?
      2. Should large companies like Google, Microsoft and Yahoo effectively pick what syntax we use to do it or should Web authors decide?
      3. Should large companies like Google, Microsoft and Yahoo control the Web vocabularies we use to express concepts that go into a search engine, or should the process be a public one, driven by Web authors?

      Schema.org draws a line in the sand that didn’t need to be drawn. It creates a them vs. us mentality and chooses a winner before the community has really had a chance to test-drive the technologies. Especially Microdata – it’s not even fully baked yet – there are still major design flaws that have not been addressed.

      As for not understanding why we need this stuff, maybe this will help:

      http://www.w3.org/TR/rdfa-primer/

  2. André Luís says:

    Hang on… aren’t we missing the bigger picture? Yes, there are already existing formats out there being used. But with microdata coming out, we knew there were going to be new formats proposed… the bad news is for tool creators, who have to add support for other schemas. As web authors we have more choice with different advantages.

    It lacks the knowledge of a community behind it, but I’ve heard someone mentioning it’s actually far more practical than other efforts.

    In the end, don’t we all want more semantic data on the web? Is it really such a catastrophe to have several vocabs to the same concepts? Let authors choose…

    Or build an alternative website documenting other ways of adding semantics that search engines will still get. Lots of people will prefer the microformats approach, which is already recommended via rich snippets. Others will prefer rdfa… others microdata. This schema.org was built with microdata in mind. In the end, authors will have the last word.

    Facebook also decided how to build open graph and it didn’t end up being such a bad thing…

    • ManuSporny says: (Author)

      It depends on what you mean by the “big picture”. It is a fairly big announcement that the search companies have effectively given up on pure natural language processing for classifying web pages. That is huge. Up to this point they were saying that structured data wasn’t necessary for search accuracy. It turns out that it helps, so the search companies are adopting structured data in a big way. That is good.

      Now, what is bad is that they’re going a step further and saying that they know which syntax and vocabularies are best for Web authors. Web authors don’t get to choose. They go even further and effectively state that it’s either their way or the highway. You either exclusively use their vocabulary and Microdata to mark up your pages, or you won’t show up in search listings. That is bad.

      The other bad thing is that vocabulary design for the Web shouldn’t be created by a handful of people at large companies – it needs to be reviewed publicly by the Web. Centralized vocabulary design is bad because usually you don’t have the domain experts you need at the search companies to do a good job with the vocabulary. For example, the schema.org vocabulary has entries for http://schema.org/Distance where the measurement is provided as a special text microsyntax. In order for someone to read that value in a library, they would have to understand every measurement in every language. That is, 7 meters would have the following possibilities: “7 m”, “7 m.”, “7 meters”, “7 metres”, “7 méter”, “7 metra”, “7 メートル”, “7 metri” and so on. It’s quite obvious that the vocabulary designers were not aware of this issue at all. Or if they were, they didn’t care about the other people – the developers on the web – that needed to use the data. What’s worse, Microdata doesn’t have support for datatypes, so this becomes even more difficult to fix because there are language issues that need to be overcome.
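To show why a free-text distance is painful for consumers, here is a minimal Python sketch of the kind of parser a developer ends up writing. The `METRE_SPELLINGS` table and `parse_distance` function are hypothetical, written for this comment only; schema.org does not define them. The table holds some of the spellings listed above and deliberately leaves out the Japanese one, to show what happens to any spelling nobody thought to add.

```python
import re

# Hypothetical unit table: every spelling a consumer wants to accept
# has to be listed by hand, language by language.
METRE_SPELLINGS = {"m", "m.", "meters", "metres", "méter", "metra", "metri"}


def parse_distance(text):
    """Return the distance in metres, or None if the unit is not recognised."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(\S+)\s*", text)
    if match is None:
        return None
    value, unit = match.groups()
    if unit.lower() in METRE_SPELLINGS:
        return float(value)
    return None  # unknown spelling: the data is there, but unreadable


for sample in ["7 m", "7 metres", "7 metri", "7 メートル"]:
    print(repr(sample), "->", parse_distance(sample))
```

The first three samples parse and the last one comes back as None, even though all four say the same thing. A value published as a number plus a separate, agreed unit code would not need the table at all.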

      The problem with schema.org and Microdata is that they still have design flaws. For example, the RDF that you get out of Microdata is garbage for nested elements.

      As for Facebook – they did the right thing. They chose a subset to use and used it and they were not exclusive about what type of markup you could and could not have on your web page. They also consult with the standards community quite often and get feedback on technical changes they plan to do. They’ve also been very receptive to advice and have wanted to stay compatible with the standards community. We took much of their advice and wrapped it into RDFa 1.1. The reason that Facebook’s OGP work didn’t end up being such a bad thing is because they went about it in an inclusive way – they didn’t force people to choose between RDFa or Microdata. While OGP is only available for RDFa, it doesn’t prevent you from marking up different items using Microdata on the page. Schema.org forces people to choose between RDFa and Microdata – and that is bad for the Web. The unnecessary elimination of choice has always been bad for the Web.

      • Doc Sheldon says:

        “It is a fairly big announcement that the search companies have effectively given up on pure natural language processing for classifying web pages. That is huge. Up to this point they were saying that structured data wasn’t necessary for search accuracy. It turns out that it helps, so the search companies are adopting structured data in a big way. That is good.”
        I agree that it is a good thing, Manu. In fact, I had to wonder when it would dawn on them that although they were still able to handle the massive amount of data analysis via some impressive engineering on their parts (particularly Google, as they are manipulating the lion’s share), they were on borrowed time. It’s nothing short of amazing to me that they’ve done as well as they have, given the overwhelming volume, but eventual scalability issues surely must have been evident to them.
        It has also always puzzled me why they would not be thrilled to get specific semantic input, as their search quality could only improve as a result.
        Puts me in mind of someone so set on controlling things that they rip the steering wheel from the column in desperation.

      • Yannick says:

        I agree that the facebook approach was better than the one of schema.org, but I also regret that they defined their own vocabulary instead of reusing existing ones.

        • ManuSporny says: (Author)

          Yes, it was unfortunate that they created their own vocabulary. They were concerned that if they didn’t do that, people would get confused by the use of multiple vocabularies or more complex markup. It was a design trade-off that made sense from their standpoint, but wasn’t necessarily clean from a design perspective.

  3. Doc Sheldon says:

    An excellent post, Manu, which echoes my sentiments precisely.

    Even back when Google began pushing microformats in lieu of RDFa, I wondered what motive they could possibly have for championing a subset that was far less extensible and scalable than RDFa. As one of many on the outside, looking in, the only possible conclusion for me was that it was a political decision, deliberately taken in opposition to W3C’s support of RDFa.

    When Google finally embraced Good Relations, I thought it might be an indication that their corporate mindset was changing. Foolish me, it would seem.

    While I agree with you that over 500% growth in implementation is quite respectable (particularly as compared to microformats or microdata), I have felt for some time that the growth was still much slower than it should reasonably be. The benefits to the site owners, the search engines and the users of utilizing RDFa are so great that I’ve felt as though adoption was moving at a snail’s pace. However, I have my own opinion of the major reason for that pace:

    Put aside the fact that many developers still are not adequately familiar with the technology to make implementation soar overnight, with such disparity regarding which, if any, will become the “standard”, what fool would want to invest a great deal of energy into upgrading a site, only to have it rendered obsolete by a shift in emphasis? Google, in my opinion, is most guilty of prolonging that agony, and by this latest action, effectively undermining all the effort previously expended in W3C’s process, they have drawn a line in the sand.

    Like you, I feel that the most effective vocabularies and ontologies should be chosen by those that use them. It is the librarian’s job to keep the books in order, NOT to mandate which of them each of us should be allowed to read!

    So, yes… it is time to stand up and scream blue murder! Thanks for bringing this to the forefront, Manu.

    • ManuSporny says: (Author)

      I agree with many of your sentiments. Perhaps it’s time for the Web community to make the hard decision to standardize on a single syntax. Up to this point we’ve taken a live and let live mentality when it comes to specifications in the W3C – let the market decide between Microformats, Microdata and RDFa. The market was choosing RDFa. However, we can’t do that when a set of large corporations attempt to force the market one way or the other. We can continue to do that as long as there is an even playing field. Up to this point, there has been one. Google should support Microdata, RDFa, and Microformats. If the search companies don’t want to do that, then perhaps the W3C should kill off RDFa and Microdata (there is still plenty of time to do this) and force the three communities (Microformats, RDFa and Microdata) to decide on a single standard.

      So, the Microdata folks don’t think that the RDFa use cases are important. The RDFa folks have attempted to address the Microdata folks’ use cases (see the RDFa ISSUEs [66 and 67]), but we can only do so much before backwards-compatibility is broken. That has resulted in us not being able to remove some of the features that the Microdata folks wanted to be removed. The features, like prefix declaration, that the Microdata folks want removed have some alternative solutions, but we’d need our charter changed to make those changes. We know that there is a subset that will work for both communities, but neither community has been forced to work with the other. We’ve extended our hand several times, but to no avail. Perhaps this will force the three communities to come together – I would love to see that happen. I’ve been trying to do just that for the past 4 years.

      The other possible path is to just have Google support RDFa. We’ll write all of their documentation and examples for them if they need that to happen. We’d be happy to do it.

    • Pritesh says:

      Thanks a lot for your examples Pravir! However, I still don’t see a solid implementation of the review (aggregaterating) Microdata in snippets, maybe I’m interpreting things wrong? The eBay example gives this preview in the rich snippet tool: https://img.skitch.com/20110926-qspaxgwnk4h8d3h5i668n9mytb.png. As I’ve written in the blogpost, only the number of reviews is displayed and not the rating itself or the price or anything else that is in the microdata. If you look at this page in Google.nl it looks even worse: https://skitch.com/rubzie/fhduy/www.ebay.com-ctg-logitech-revue-97019743-google-zoeken. (here’s the SERP) As you see, only one review is recognized and the reviewer is shown in the SERP. So the aggregaterating is ignored, together with all the other 37 reviews in the microdata, it seems? I checked it at Google.com to be sure, and there it at least looks like the preview: https://img.skitch.com/20110926-8h1enakmr77pdcabcxyup1pa9y.png So apparently at .com it does work, partly, as the preview tool suggests. So is it safe to conclude that it’s not working as one would expect (i.e. displaying the same data as sites marked up with Microformats)? I’m sorry for being so detailed about it but we’re just stressed because we hope to get our snippets back and want to make sure we’re not doing anything wrong. This all still feels like going with Schema.org’s recommendations was a bad idea, although we of course would like to support the standards as proposed.

  4. Mikael says:

    Good article, very interesting and persuasive !

  5. Hello,

    I’m a French guy who has been trying to use semantic data for more than a year. The thing is that it’s hard to set up at first, because every search engine tries to choose one format instead of another. I agree with your post, but I think you should look a bit further: with only one format, supported by the 3 most important search engines in France, I think that I will use this one instead of RDFa. This is not against you; it is only because it’s supported by the 3 largest search engines in my country, and I’m sure that it will impact my rankings.

    • ManuSporny says: (Author)

      Yes, and this is exactly my point. If each search engine would support all three languages going forward, you would have a choice. Since they have ignored what the community was doing and picked their favorite, you don’t have a choice. You have to use Microdata exclusively, or you will be ignored by the search companies.

      If you choose to use Microdata – that’s great, it’s your choice. If you are forced to use Microdata – that is bad, because you are not being allowed to choose.

  6. André Luís says:

    From your comment, I think I missed the part where they said it’s their way or the highway.

    From their homepage: “So, in the spirit of sitemaps.org, Bing, Google and Yahoo! have come together to provide a shared collection of schemas that webmasters *can* use.” (emphasis mine)

    I haven’t found any indication they will stop supporting microformats or rdfa via rich snippets. Have they said that? link pls?

    I agree with the shortcomings you mentioned, but won’t authors have the last word? I agree there’s reason to be concerned with the advertised weight behind schema.org, but that’s still our (developers) choice, isn’t it?

    • ManuSporny says: (Author)

      They are not going to stop supporting the Microformats and RDFa Rich snippets that they already support (at least for the near future). So, that is good.

      However, see item #3 on this blog post: http://googlewebmastercentral.blogspot.com/2011/06/introducing-schemaorg-search-engines.html

      One caveat to watch out for: while it’s OK to use the new schema.org markup or continue to use existing microformats or RDFa markup, you should avoid mixing the formats together on the same web page, as this can confuse our parsers.

      That’s an either/or statement. Basically, use their older stuff OR, if you want to use the new stuff, you have to use Microdata and schema.org.

      That post says not to mix RDFa, Microformats and Microdata on the same page. It also says to migrate to Microdata if you want to use schema.org. It also only shows examples in Microdata. The vast majority of web developers aren’t going to fully grasp these requirements up front. If they were given a choice between RDFa and Microdata, at least that would level the playing field.

      Web authors do have the last word – but many of them don’t think they do. If Google/Microsoft/Yahoo says jump, most Web authors jump. However, at this stage we should pause for a second and think about the ramifications of what they’re asking us to do… because it impacts choice on the Web.

      • André Luís says:

        Well, to be honest, that’s something I’d never do, double the semantic layer. But I see your point.

        Let’s see how this evolves. I’m fairly more optimistic than most, but I can understand the fears.

        Cheers for writing the post.

  7. Brian Peterson says:

    Why not document a Micro-RDFa, the subset of RDFa that covers Microdata (as you did in your example). Then it would be trivial for schema.org to support the alternate indexing. Then Micro-RDFa has the benefit of allowing for more complex cases and benefiting from all the work that has already been done with RDFa on websites (eg BestBuy, GoodRelations, Drupal, NY Times, BBC, …). These are corporations, so make the business case. I agree with your sentiments, but I can’t see a public uprising happening on the scale that would be needed to make Microsoft and Google make a course correction. Unless the other big companies (BestBuy, BBC, NYT, Drupal, …) joined in your efforts to get schema.org to adopt a Micro-RDFa.

    BP

  8. Dan Thies says:

    Manu,

    Thanks for publishing this – my initial reaction was “cool, they’re all going to support RDFa” but when I read on, I saw that their real goal was to prevent the use of RDFa in favor of very limited microformats supported by a cartel of 3 search engines.

    Why would a cartel want to do that? To stifle innovation. They’re willing to make the web worse for years to come, to preserve their dominant position in the market.

    This whole “you can’t use both because we can’t parse it” thing is just ridiculous.

    • ManuSporny says: (Author)

      Yes, there is no technical reason that they shouldn’t be able to parse both RDFa and Microdata from the same document. The two languages were designed to be able to work within the same HTML5 document. We’ve been able to have documents containing all three types of markup with no problem for quite some time. If there is a technical problem, we’ll have to address it in both the Microdata and RDFa specs as there is currently no knowledge of any technical problems with interference between Microdata and RDFa.
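As a rough illustration of why the two syntaxes need not collide, here is a small Python sketch that reads the same fragment twice, once for Microdata's itemprop attribute and once for RDFa's property attribute. The `PAGE` fragment, `AttributeCollector` and `collect` are hypothetical names invented for this comment, and this is neither a conforming Microdata nor RDFa processor. It says nothing about how any search engine's parser is actually built; it only shows that the attribute sets are disjoint, so one reader can ignore the other's markup.

```python
from html.parser import HTMLParser

# Hypothetical fragment carrying both syntaxes on the same elements.
PAGE = """
<div itemscope itemtype="http://schema.org/CreativeWork"
     vocab="http://schema.org/" typeof="CreativeWork">
   <span itemprop="name" property="name">Resistance 3: Fall of Man</span>
   by <span itemprop="author" property="author">Sony</span>
</div>
"""


class AttributeCollector(HTMLParser):
    """Collects (name, text) pairs for one attribute and ignores all others."""

    def __init__(self, attribute):
        super().__init__()
        self.attribute = attribute
        self.pending = None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Remember the attribute value (if any) until the element's text arrives.
        self.pending = dict(attrs).get(self.attribute)

    def handle_data(self, data):
        if self.pending:
            self.pairs.append((self.pending, data.strip()))
            self.pending = None


def collect(attribute, html):
    collector = AttributeCollector(attribute)
    collector.feed(html)
    return collector.pairs


microdata = collect("itemprop", PAGE)
rdfa = collect("property", PAGE)
print(microdata)
print(rdfa)
```

Both passes produce the same name and author pairs from one document, each without looking at the other syntax's attributes.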

  9. Yes, I am angry the same way…

    • ManuSporny says: (Author)

      Let’s not be angry. To be clear – I’m not angry – I’m concerned. Ideally, I would like the Microformats, RDFa and Microdata communities to come together under a unified banner and propose a single solution that works for all of us. This is where W3C excels – bringing together diverging viewpoints and creating a unified standard from those viewpoints that addresses the majority of the issues. I know we can do that in this case, we just need to convince the communities to come together. I’ve been trying to do so for some time – maybe this is our chance to finally make that happen.

      • Shelley says:

        Actually, I’m angry, and have no intention of being anything but angry.

        Perhaps it’s time the W3C does get a little more aggressive, before Google et al completely takes over its operation, making it totally ineffective. It’s already come close with HTML5.

        I have never forgotten how Microdata came about. With schema.org, you see the same arrogant, dictatorial action taking place. If the W3C wants to join hands in a friendship circle and hum, “I’m OK, You’re OK, let’s just be friends”, fine. Personally, I prefer meeting such bold faced closed door actions with assertiveness. Stringent assertiveness.

  10. Thank you Manu for taking the time to write this post.

    It has been enlightening for me and raised some concerns that, in all honesty, I would not have considered myself. I would have carried on in a sheep-like fashion assuming schema.org was the correct and only way to go.

    At a very basic level, it concerns me that this is a complex subject and I would imagine there are a lot of people in the industry who would struggle to know which direction to take let alone the site owner who is relying on others to provide guidance… yikes!

    I hope your argument gets support.

  11. Phil Archer says:

    I sympathise greatly, but I fear that the search engines are not required to do anything other than look after their shareholders.

    In an ideal world, sure, RDFa would be the horse everyone backed. It’s self-evidently the superior method of embedding structured data in Web pages for all the reasons you say and know inside out. But is that what the search engines are trying to achieve with schema.org? I fear that it isn’t. They want a way to make it as easy as possible for content producers to make their job, the search engines’ job, as easy as possible. And that’s where their interest stops. A legitimate answer, if a deeply unhelpful one, would be “if you want search engines with more support for RDFa than we’re offering, you’re free to build one.” I’m not saying that’s the right approach. Again, RDFa is much the superior system. But a campaign to get the search engines to adopt it is unlikely to succeed, however ‘right’ the campaign might be.

    It seems to me that schema.org and Microdata are useful tools for disambiguating terms used in text for the benefit of search engines as they operate today. If schema.org survives, and I suspect it will, then those terms will evolve over time, no doubt. The visceral hatred that some folk have for namespaces is akin, in my view, to the codswallop spouted by creationists, anti-vaxxers and climate change deniers – and it may come back to bite them one day (I certainly hope so). But that doesn’t mean that Microdata is a substitute for RDFa which is the way to include actual data within Web pages, data that links in with/uses data published elsewhere as RDF.

    When I began working on what became POWDER, I was told by Google not to expect them to start parsing it immediately as a boost for the uptake of the technology. Rather they said, if there’s a sufficient quantity and quality of data published in any format, the search engines won’t need to be told to look for it. They’ll do it anyway – for their own reasons.

    RDFa has so much to offer. The success of the Good Relations ontology is a really important use case and that’s the kind of thing that those of us that care about the Web should put effort into promoting, rather than trying to dissuade commercial companies from acting in their own interests.

    More on this from a different angle at http://philarcher.org/diary/2011/schemaorg/

    • ManuSporny says: (Author)

      My main purpose with the blog post was to let people know that there was an issue and to empower them to say something about it. The article is resonating with some people, which tells us that this issue extends beyond the RDFa community into the general web developer community. Sure, we may not convince Google/Microsoft and Yahoo to change their minds, but what we can do is highlight why this isn’t necessarily a good thing for the Web. At the very least, it is a teaching moment.

      Thanks for your blog post – you make a number of very good points there. The only thing I would differ with you on is the notion that there is “RDF data”… it’s just data – structured data. If you have it, you can publish it with RDFa in a way that is more flexible than with Microdata.

  12. As you’ve so aptly argued, RDFa is superior to microdata by almost any measure. Chief among the problems with microdata, compared to RDFa, is being locked into a vocabulary. A vocabulary, as you point out, that is not the result of an open – let alone a community-driven – process.

    Like Open Graph, what irks me most about schema.org is the extraordinarily arbitrary nature of the vocabulary. Why these types and properties, and not others? Are these considered the most important for the publisher, the data consumer, or the end user? What of those topical realms that have no representation in the schema?

    Having said all of this, I think there are some reasons why microdata might make sense for the search engines, and that their justifications for favoring microdata over competing languages are not all smoke and mirrors.

    In regard to the adoption of RDFa and difficulties in implementing it, I think the search engines’ claims deserve some further examination, because they’re nuanced. The Mika figures show the explosion in growth of RDFa based on the number of pages deployed, but are silent on the number of domains or datasets this represents: if this RDFa mostly comes from big publishers and there’s been little uptake among smaller publishers, then the search engines may regard its adoption as “slow”.

    And yes, the RDFa developer community may have been making strides in simplifying implementation, but for those smaller publishers especially RDFa is more complex than microdata or microformats (it’s interesting to see the relatively large number of hCard pages in Mika’s chart). Drupal 7 aside, publishing platforms haven’t kept pace with easy ways of publishing RDFa (the WordPress hRecipe plugin has had 10,773 downloads to date, as opposed to 1,773 for the RDFa plugin).

    The fact that the schema.org vocabulary is not readily extensible may actually be a benefit as far as the search engines are concerned.

    First of all, a prescriptive vocabulary does actually make implementation easier for everyday webmasters. Either there’s a type and property available for the data a publisher wants to describe, or there isn’t. In this way schema.org microdata is analogous to microformats and Open Graph.

    But perhaps more importantly, a fixed vocabulary surely makes it easier for the search engines to evaluate the veracity of the data offered up in the microdata. A big part of what a search engine does, in contrast to structured data processing in more closed environments, is spam detection and filtering (along with, of course, ranking of any resources that make their way into the index). A limited structured data vocabulary allows for the development of algorithms that can evaluate the content of microdata attributes against topical profiles. In short, the “trust and proof” layers of the cake are vital to generating decent results in an open indexing environment, and a static vocabulary may make it easier for the search engines to evaluate the trustworthiness of structured data.

    Again, this is not to say schema.org is better than RDFa in this respect, or that Google et al. aren’t capable of spam detection with RDFa and microformats (though for the reasons just stated they may have an easier time of it with microformats), or don’t already have decent methods of verifying these formats. It does, however, highlight the fact that the search engines’ interests may differ from those of the semantic web community.

    Can this community “demand” that the search engines support any given language that expresses structured data? Of course not, any more than, say, the developer community can demand that Apple-made devices run Flash. The search engines – for-profit companies – are free to set whatever rules they want, even if they make bad choices (something search marketers have always known). Compelling the search engines to support a given language means making that language compelling for them: if Google knows it can improve its search results by embracing a given language or protocol, it probably will.

    • BT says:

      | Chief among the problems with microdata, compared to RDFa, is being locked into a vocabulary.

      You are not locked into the schema.org vocabulary, you can use any vocab you want. http://dev.w3.org/html5/md/

      You could use the schema.org vocabulary with RDFa if you wanted to http://schema.rdfs.org/

      | The fact that the schema.org vocabulary is not readily extensible

      what is wrong with this mechanism http://www.schema.org/docs/extension.html ?

      This seems like a vi/emacs debate, but after reading both specs microdata seems easier to me (if not as flexible).

      • ManuSporny says: (Author)

        | You are not locked into the schema.org vocabulary, you can use any vocab you want. http://dev.w3.org/html5/md/

        You can use any vocab that you want, as long as it is declared using a full URL, but will it be indexed by Google, Microsoft and Yahoo? That’s what I mean by vocabulary lock-in. Anything that implies “Use our vocabulary or you won’t show up in our search listings” is not a good thing. This hasn’t been Google, Microsoft or Yahoo’s modus operandi to this point – but that’s always true in the beginning. The best position that the search companies could take is: “We will index whatever is out there and try to make sense of it, including our own vocabulary.”

        I will also point out that in order to mix vocabularies in Microdata, you have to type out the entire URL for every term that you want to mix. For example, in Microdata you have to do this: itemprop="http://purl.org/dc/elements/1.1/title http://www.w3.org/2000/01/rdf-schema#label". RDFa allows you to use compact URIs, and thus shorten it to property="dc:title rdfs:label". The latter is much easier to not screw up when authoring content.
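        To make the difference concrete, here is a minimal Python sketch of the compact URI idea. It is only an illustration under stated assumptions, not how any RDFa processor is actually implemented: the `PREFIXES` table and the `expand_curies` helper are hypothetical names, and the two namespace URLs are the ones from the example above.

```python
# Hypothetical prefix table; the two namespace URLs are taken from the
# dc:title / rdfs:label example above.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand_curies(value, prefixes=PREFIXES):
    """Expand each whitespace-separated prefix:reference term into a full URL.

    Terms whose prefix is not in the table (including terms that are
    already full URLs) are passed through unchanged.
    """
    expanded = []
    for term in value.split():
        prefix, sep, reference = term.partition(":")
        if sep and prefix in prefixes:
            expanded.append(prefixes[prefix] + reference)
        else:
            expanded.append(term)
    return " ".join(expanded)


# The short form an author would type in RDFa ...
rdfa_value = "dc:title rdfs:label"
# ... and the long form they would have to type by hand in Microdata.
microdata_value = expand_curies(rdfa_value)
print(microdata_value)
```

        The author types the short form once per term, and the mapping from prefix to full URL is declared in one place rather than repeated in every attribute.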

        | You could use the schema.org vocabulary with RDFa if you wanted to http://schema.rdfs.org/

        It won’t be indexed, so what’s the point?

        | what is wrong with this mechanism http://www.schema.org/docs/extension.html ?

        It duplicates a mechanism that already exists – RDF Schema. It re-invents the wheel. They could’ve just re-used that, even if they wanted to still use Microdata. There is more than 10 years of research behind RDF Schema. The same cannot be said for this new extension mechanism.

        | This seems like a vi/emacs debate, but after reading both specs microdata seems easier to me (if not as flexible).

        Then I’m afraid I failed to make my point to you, or you are misunderstanding what I wrote. Here’s an analogy:

        The reason there is a vi/emacs debate is because there is choice in the market. vi works for a subset of people, emacs works for another subset. Google, Microsoft and Yahoo are not saying: choose vi or emacs… they’re saying, you don’t get to choose emacs, you must use vi.

    • ManuSporny says: (Author)

      Aaron, I agree with many of the points that you make in your thoughtful response. Note that I didn’t use the word “demand” anywhere in the post. Ultimately, it is the decision of the search companies – but the Web community can help guide that decision. We’re all just people on the web after all – we can learn a great deal from each other. That includes the people working on schema.org and the people in the Microformats, Microdata and RDFa communities.

      • Thanks for your response and, again, for your thought-provoking post. At the risk of being pedantic, you did indeed use the word “demand” (I would hate to think that I had previously quoted you in error!):
        “We the publishers, developers and authors of the Web have the power to change this. We need to demand that support is added for whatever language we want to use to express structured data.”

        • ManuSporny says: (Author)

          You are correct, my apologies! :)

          Upon re-reading that sentence, I don’t like it. It comes off as an ultimatum and I was hoping to have more of an open, collaborative discussion. I’ve updated the sentence and removed the word “demand”: We need to make it clear that we want to be able to express structured data in whatever language we choose.

  13. Tester says:

    STFU! We don’t respect W3C; it’s freaking still debating what the dickens to define as HTML5! Let’s get things done and leave the busybodies behind.

    • ManuSporny says: (Author)

      I wish there was something here that I could respond to… maybe you could start by learning about what the W3C is and what it does: http://www.w3.org/Consortium/mission#principles

      Or understanding that the W3C is a “consortium” – that is, they are just a collection of all of the technology companies that build stuff for the Web. Attacking them is like saying that you hate what all of the tech companies on the Web are doing. W3C has 325 technology companies that are a part of it: http://www.w3.org/Consortium/Member/List and has thousands of participants, both from the companies and from the general Web community. Standards take time to ensure that they’re right. It is a slow, deliberate process because once a standard is out there – it lives for 10-25+ years if it is successful.

  14. John Locke says:

    Many researchers have been dreaming of a “semantic web” where everybody would use tools such as Protégé to write ontologies with RDF or OWL for their website, everybody would share their ontologies, and some inference engine and semantic agents would scour the web to find information and do some reasoning. Well, this is utopic. In practice, no one is interested in writing complicated ontologies or in using complicated tools like Protégé and complex languages like OWL and RDF. Moreover, even an inference engine such as the one for description logics/OWL would not scale to the whole web. This kind of work has wasted too much funding in universities already. All these researchers have been exploiting the buzzword “semantic web” for more than 10 years with few results. Only a few websites are using their technology. Now it is enough!

    What Google, Yahoo and Microsoft did is very good. They decided to scrap all the useless research about OWL, description logics and RDF and use a simplified approach that would actually be easier to use and do what they need and nothing else. Fu*k Protégé. F*ck OWL and RDF and f*ck the inference engines and semantic agents. And also f*ck the semantic web vision of Tim B.-Lee. And Google, Yahoo and Microsoft are not just three corporations. They are the three main search engines. Therefore, if they chose a format, then this format will become the de facto standard.

    Thanks Google, Yahoo and Microsoft. You did a great job. Hope all these researchers working on some utopic research projects will have to look somewhere else to waste their research funding.

    • ManuSporny says: (Author)

      | Only a few websites are using their technology.

      Do you have figures to back this statement up? We have some hard data here: http://tripletalk.wordpress.com/2011/01/25/rdfa-deployment-across-the-web/

      While I agree that not many people want to do reasoning and design vocabularies, that doesn’t mean that there aren’t other benefits to RDF.

      I tend to think that bashing scientists that spend their lives solving very hard problems that could better society is not the most effective way to make a point. There are reasons that they get funding – they are pushing the limits of what we can do as a society. They are building the theoretical models that will allow us to deal with the ever-expanding set of information that we are generating today.

      I do agree with you that OWL and logics and all that other stuff don’t really have much use for Web developers today. I personally don’t have much of an interest in them either. However, what happens if we convince everyone to publish structured data? I think we’ll need a way of crawling over that data and reasoning about it. OWL and some of these logics may not be the best way to do that – but you have to start somewhere when tackling a very difficult problem.

  15. David Gerard says:

    With all due respect for your work on this issue, I can’t help but pattern-match this to you asking the fundamental question of the Web: “Why wasn’t I consulted?”

    FWIW, Wikimedia specifically held off adding microdata or RDFa to Wikipedia by default until there was an actual consumer of the data, as Wikipedia would be an 800lb gorilla picking a side – but with schema.org, it looks like there might be an actual consumer of the data. (Nothing has been decided yet, AFAIK. But a substantial use case beats quite a lot of theorising.)

    • ManuSporny says: (Author)

      There are negative technical and societal impacts to this decision – that’s what I’m attempting to bring to the fore. That’s primarily what I’m concerned about. I’m going to write another blog post outlining some of the technical and societal concerns with Microdata. There are other concerns with centralized vocabulary development that the Microformats folks will probably write about soon.

      I think that the Wikipedia community would do a better job of creating vocabularies for the Web than the search companies. The reason being that vocabularies are best designed by domain experts in a collaborative environment – you guys have that in spades. I know you’ve probably heard of dbpedia.org: http://dbpedia.org/About . They turn the Wikipedia infoboxes into semantic identifiers on the Web. That’s brilliant. We need more of that because it’s designed and published by a collaborative community that contains domain experts.

      I’ll elaborate on the technical and societal impacts in a blog post further on in the week.

  16. Michael Houghton says:

    I understand your position. I have no real position on the issue myself. But I would suggest that if you say you aren’t speaking for your committee or your employer and then say:

    “All of us have the power to change this as the Web community – let’s do that. We will release a plan shortly.”

    “We’re getting a plan of action together for those that care about freedom of choice on the Web. I’ll tweet a call to action via @manusporny when it is ready, roughly 1-2 weeks from now.”

    We? Who is we? If you’re calling for people to comment on schema.org’s pages and then use this unspecified “We” for your own actions, this ceases to look like personal opinion to me, and deserves some clarification. If you aren’t speaking for the W3C or your employer, I suggest you avoid using collective pronouns.

    • ManuSporny says: (Author)

      “We” are the W3C, companies involved with semantic web deployment, the RDFa community and the Microformats community. I am not speaking for them in the blog post, but that doesn’t mean that “we” are not coordinating. I hope that clarifies what is meant by “we”.

  17. Dan Brickley says:

    Re “It duplicates a mechanism that already exists – RDF Schema. It re-invents the wheel. They could’ve just re-used that, even if they wanted to still use Microdata. There is more than 10 years of research behind RDF Schema. The same cannot be said for this new extension mechanism.”

    This is mistaken. The guy who basically designed RDFS designed this stuff too – Guha. You might argue with the design choices and lack of concern for standards, but you can’t seriously question his credentials.

    • ManuSporny says: (Author)

      I don’t feel that I did question Guha’s credentials. If it seemed like that, then I apologize – I certainly didn’t mean to imply that his credentials were lacking. In fact, I didn’t even know that it was Guha that designed the vocabulary for schema.org.

      That said, I believe that the first four sentences still stand. I retract the last sentence because it is misleading at best, insulting at worst.

      I question the design choices, but then again – most everyone that supports open, collaborative vocabulary development and standardization is doing that. I am concerned with the lack of concern for standards, if that’s what happened. I believe it’s a bit too early to even say that as we don’t know if that was a motivating factor for the decision. In other words, we are postulating about a lack of concern for standards, but we don’t know for sure because the design process on the vocabulary wasn’t public.

      The point I was attempting to make with that sentence is that, at least with standards, there are many people involved from many different areas. I don’t think the same thing happened with the vocabulary on schema.org, and you seem to confirm that. I think that makes the vocabulary weaker. I don’t think that any one person is smart enough to catch all of the issues in a vocabulary designed from scratch, regardless of their credentials.

    • Shelley says:

      Doesn’t matter who was behind schema.org, or what his credentials are – there is a difference between a mechanism that has seen use and been vetted by many people, and one that was created in a fit of pique, has been rarely tested or used, and certainly hasn’t had the benefit of critique and improvement by more than one person.

  18. Dzonatas says:

    “Microdata doesn’t scale as easily as RDFa – early successes will be followed by stagnation and vocabulary lock-in.”

    I’ve made complete programming languages, virtual worlds, natural language processors, dynamic compilers, and more based on schemes such as schema.org, yet schema.org allows the strictness of XHTML all in one. I hardly find your statement true.

  19. Matt Harris says:

    manusporny, could you speak about the efforts here: http://schema.rdfs.org/

    Is this a possible solution that you think will gain some speed?

    • ManuSporny says: (Author)

      I think it’s important that schema.org has a mapping to RDF. The schema.rdfs.org site provides it – both Michael and Richard have done an excellent job putting that site together. We need to get some examples up there as a community. However, if the search companies aren’t indexing it, then people won’t see the benefit of doing so.

      What the community should ask schema.org to do, if they’re serious about measuring RDFa adoption vs. Microdata adoption, is to do the following:

      1. Make it clear to the public that RDFa and Microdata markup on the same page are compatible, by removing the statements on schema.org that make it seem like they are incompatible.
      2. Allow both Microdata and RDFa for schema.org in their Rich Snippets testing tool – this will allow people to test both types of markup ensuring that people can choose whichever markup they prefer.
      3. Publicly state that they support RDFa in schema.org – this will give people the impression that they can choose whichever they prefer.
      4. Provide examples of RDFa in schema.org – this would give people easy examples to copy, both for RDFa and for Microdata.

      If they do that, then I think that schema.rdfs.org can take the RDFa community the rest of the way if schema.org doesn’t do #4 above.

  20. I’m glad the big 3 came out with Schema.org. I’m sick and tired of the B.S. open-source standards process. Sure, it’s an ideal scenario. When it works properly. Except it’s totally borked. How long have competing data identification types been out there? Years. And years. W3C is all but a joke when it comes to how long it takes for this kind of issue to be resolved. Quite often with many things NEVER becoming a true standard.

    Some people demand open, community-based decisions, but those are pie in the sky sometimes. People need to wake up and smell the coffee. Now that Schema.org is here, bickering developers are going to be required to do what actually generates revenue for their company owners, rather than develop things “because I like this method instead”.

  21. Nick says:

    Manu, I think you hit on a very large part of the idea behind the reason for this website without even knowing it when you mentioned:

    “The patent licensing section alone sent shivers down my spine…”

    This could work to G/M/Y’s benefit by essentially making them immune to patent infringement suits (i.e., trolls) using logic like: “if you sue any of us, your webpage will disappear from the internet when we de-license your use of our patents.” Combine that threat with the existing threat of a patent-based counter-suit, and nobody should ever be stupid enough to sue those three companies ever again.

  22. Eric says:

    I’m new to many of these concepts and most certainly naive, but I think that might be a useful perspective to offer. I work for a commercial company that profits from traffic (90+% referred from the big 3) and that isn’t interested in the religious battles or even what’s best long term with respect to structured data forms. Schema.org reads as a low friction recommendation and I can almost immediately see the short term ROI potential. When I look at the W3C offerings, I can’t seem to get very far as I feel like I’m about to hit a wall of specs and acronyms and terse legalese… perhaps that’s unfair and perhaps that’s the result of years of trying to wade through RFCs. The net result is that investment in the structured representation of our data has remained a low priority, or at least has not been considered low-hanging enough fruit. Schema.org near instantly changed that priority and its schema has near 100% coverage of our structural needs. Am I the exception?

    As an idle observer of “Structured Data” headlines and blogs, to me schema.org feels like a means and not an end. If I’m not the exception, I see significant short term value in this closed world schema that gets me one step closer to the open world schema that extensible frameworks promise.

  23. Lazer says:

    If you like Schema.org microdata, use that. If you like to use RDF, use that. Nobody is forcing anybody to do anything.

    I am more than happy with schema.org. If you want to continue to use RDF, the search engines have stated that you can continue to do so. It is clearly stated on schema.org that “If you have already done markup and it is already being used by Google, Microsoft, or Yahoo!, the markup format will continue to be supported”. It’s about time that someone came out with a straightforward, clear standard that covers exactly what is required. Just my 2c :)

  24. Zak says:

    I don’t think “there are no choices, so this is bad” is an entirely good argument.

    Making the argument that another vocabulary should have been considered would be different, but at some point you need a Dewey decimal system. For years the knock on Microsoft on the web surrounded standards and Microsoft suffered the clowning of its browsers.

    That the companies who are handling search sat down and came to an agreement on something, a standard, means there is clarity for a web master. I know what to do for me and my customers, and thinking that I lost a choice because of this is just silly in my opinion.

    I think the only important thing you said is that, temporarily, some expressiveness is missing. But from what I read, there is no immediate abandonment for other formats, meaning there is time for deeper expression to be worked out in the vocabulary.

    Since these guys are indexing things, it makes a lot of sense that they come to a conclusion. Otherwise, I suppose, we can picket libraries for how they organize themselves too.

  25. nimmot says:

    this sounds like the BETA vs VHS debate.

  26. croozie says:

    I just find that RDFa is actually easier to implement – especially using .NET.
    Schema.org has failed to return the desired effects, whereas RDFa has always been successful.

  27. H.E.A.T. says:

    Yes, I’m late to the party, but I brought the beer.

    The issue I have with schema.org (and The Big Three behind its creation) is a matter of group solidarity. Before Google, Microsoft, and Yahoo were a part of schema.org, they were (and still are for some strange reason) members of the W3C.

    Now, the W3C, on its Semantic Web Page, laid out a vision statement for the use of RDF and its derivatives. RDFa already has a Recommendation and is, for lack of a better way of putting it, more genetically matched to XML, which is the foundation of the open and semantic web.

    If The Big Three could get together and formulate the use of a non-standard technology (Microdata), then why could they not have applied that effort to implement RDFa, which is a recommendation? The syntax of RDFa is more expressive than Microdata’s and the ability to apply many vocabularies far exceeds the capabilities of Microdata.

    Simply put, Microdata is a closed and highly restrictive version of RDFa using different syntax terminology. I agree that the W3C, as a whole, can be extremely slow when it comes to its recommendation process, but I have to wonder about the politics at play in holding up new technology.

    Is RDFa slow on the uptake (and RDFa 1.1 slow in development) because of technical issues, or because alliances within the W3C are pushing their own agenda? Is RDFa a weak format, in fact, or is using Microdata supporting some not-so-obvious plan of The Big Three?

    RDFa would make the web more open and interoperable. schema.org (and its use of Microdata) essentially makes the web proprietary to whomever holds the market share in search.

    I believe that anyone who objectively examines both the RDFa and schema.org specifications will conclude that RDFa is indeed the better option. The problem with RDFa, in my opinion, is a matter of implementation across the board. With Google, Microsoft, and Yahoo connecting Microdata with their search engines, schema.org is essentially acting in dissent against their own consortium.

    And the fact that Google, Microsoft, and Yahoo can act in such a disloyal way towards the W3C and still remain members of the consortium says something about the leadership sitting near the top of the consortium.

    small note: Microdata was developed by the WHATWG at the same time when one of the current members of schema.org was over at the WHATWG. A nefarious strategy maybe?

Trackbacks for this post

  1. quite a decent essay from @manusporny that debunks a lot about #schemaorg: http://manu.sporny.org/2011/false-choice/ | 香港新媒體協會
  2. schema.org – Not Too Impressive | My Blog about netlabs.org
  3. What Schema.org Means for SEO and Beyond
  4. Schema.org – Threat or Opportunity? « consultramy
  5. Links 7/6/2011: Platform 11, New Wine | Techrights
  6. Chris Mavergames: Schema.org, Drupal and linked data/web of data (RDFa) – Exquisite Web Creations
  7. Why I won’t adopt Schema.org | Amasijo
  8. Can you use schema.org with Facebook Open Graph? » Malcolm Coles
  9. Intermediate Java | Quibbling over semantics
  10. Thursday Threads: Machine-Meaningful Web Content and Successful IPv6 Test | Disruptive Library Technology Jester
  11. The Future of SEO (as of 2011) - State of Search
  12. The Future of SEO – seo
  13. Reactions to schema.org « Web Three Oh
  14. Will Schema.org Would Limit Web Developer Choices? : Beyond Search
  15. Schema.org - Die wunderbare Welt von Isotopp
  16. The Bogus Call to Arms Against Schema.org | Search Marketing Wisdom
  17. Article: The Future of SEO (as of 2011) | New Fangled Stuff - Small Business Consulting
  18. Schema.org – One Month In - semanticweb.com
  19. Linked Data Posts of the Month: New Microdata overlords | AppzData
  20. The End of Expedia – A Semantic Follow-up « Tsunami Wisdom
  21. Future-Ready Content
  22. Subverting the Open Web: Schema.org’s Scheme to Control Structured Data
  23. Still holding, not selling: Dublin Core vs. Schema.org « Leala Abbott
