An Uber-comparison of RDFa, Microdata and Microformats

Full disclosure: I am the current Chair of the group at the World Wide Web Consortium that created RDFa. That said, all of this is my personal opinion. I am not speaking on behalf of the W3C or my company, Digital Bazaar. This is just my personal take on the recent events that are unfolding. If you would like to keep up with these events as they happen, you can follow me on Twitter.

There has been a recent discussion at the World Wide Web Consortium (W3C) about the state of RDFa, Microdata and Microformats. The Technical Architecture Group (TAG) is concerned about the W3C publishing two specifications that achieve effectively the same thing in incompatible ways. They are suggesting that both RDFa 1.1 and Microdata, in their current state, should not proceed as official specifications until they become more compatible with one another. The W3C intends to launch a quick examination of the situation to determine whether or not there is room for convergence of these technologies.

To those that are not following this stuff closely, it can be difficult to understand all of the technical reasons this issue has been raised. This post attempts to clarify those technical issues by providing an easy-to-read list of similarities and differences between RDFa, Microdata and Microformats. A simple table summarizing all features across each structured data syntax is listed below. Each feature is linked to a brief explanation of the feature toward the bottom of the page.

Thanks to Jeni Tennison for doing a separate technical analysis for the W3C TAG. This article builds upon her hard work, but has been heavily modified and thus should not be considered as her thoughts on the matter. Writing this article was a fairly large undertaking and there are bound to be issues with parts of the article. Please let me know if there are errors by commenting on the post and I will do my best to fix them and clarify when necessary.

Structured Data in a Nutshell

Note: This post frequently uses the term IRI. For those not familiar with the term IRI, it means “Internationalized Resource Identifier” which is basically a fancy way of saying “a URL that allows western language characters as well as characters from any language in the world, such as Arabic, Japanese Katakana, Chinese ideograms, etc”. The URL in the location bar in your browser is a valid IRI.

Feature | RDFa 1.1 | Microdata 1.0 | Microformats 1.0
Relative Complexity | High | Medium | Low
Data Model | Graph | Tree | Tree
Item optionally identified by IRI | Yes | Yes | No
Item type optionally specified by IRI | Yes | Yes | No
Item properties specified by IRI | Yes | Yes | No
Multiple objects per page | Yes | Yes | Yes
Overlapping objects | Yes | Yes | No
Plain Text properties | Yes | Yes | Yes
IRI properties | Yes | Yes* | No
Typed Literal properties | Yes | No | No
XML Literal properties | Yes | No | No
Language tagging | Yes | Yes | Inconsistent
Override text and IRI content | Yes | No | Text only
Clear mapping to RDF | Yes | Problematic | No
Target Languages | 8 (XHTML1, HTML4, HTML5, XHTML5, XML, SVG, ePub, OpenDocument) | 2 (HTML5, XHTML5) | 4 (XHTML1, HTML4, HTML5, XHTML5)
New Attributes | 8 (about, datatype, profile, prefix, property, resource, typeof, vocab) | 5 (itemid, itemprop, itemref, itemscope, itemtype) | 0
Re-used Attributes | 5 (content, href, rel, rev, src) | 5 (content, src, href, data, datetime) | 4 (class, title, rel, href)
Multiple IRI types per object | Yes | RDF only | No
Multiple statements per element | Yes | No | Yes
“Locally scoped” vocabulary terms | Yes, via vocab | Yes, via itemscope | No
Item Chaining | Yes | Basic | No
Transclusion | No | Yes | Yes, via include pattern
Compact IRIs | Yes | No | No
Prefix rebinding | Yes | No | No
Vocabulary Mashups | Yes | No | No
HTML5 time element support | Not yet | Yes | No
Different attributes for different property types | Yes (property for text, rel/rev for URLs, resource/content for overrides) | No | Yes (class for text and rel for URLs)
Transform to JSON | Yes (RDFa API) | Yes (Parser and Microdata DOM API) | No
DOM API | Yes | Yes | No
Unified Parser | Yes | Yes | No

Relative Complexity

Relative Complexity is a fuzzy measure of how difficult it is to achieve mastery of a particular structured data syntax. Microformats is by far the easiest to pick up and use. Microdata is a big step up and a bit more complex. RDFa is the most complex to master. There is a design trade-off: the simpler the syntax, the fewer structured data markup scenarios it supports. The more complex the syntax, the more scenarios it supports, but at the cost of making it more difficult for Web developers to master.

Data Model

The Web is a graph of information. There are nodes (web pages) and edges (links) that connect all of the information together. RDFa uses a graph to model the Web. Microdata and Microformats use a special subset of a graph called a rooted graph, or tree. There are benefits and drawbacks to each approach.
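To make the graph versus tree distinction concrete, here is a minimal Python sketch. The triples and the `is_tree` helper are hypothetical illustrations, not part of any of the three specifications. It only shows that a set of statements containing a cycle (Manu knows Ivan, Ivan knows Manu) is a perfectly good graph but cannot be arranged as a rooted tree.

```python
# Hypothetical statements in (subject, property, object) form.
graph_triples = [
    ("#manu", "knows", "#ivan"),
    ("#ivan", "knows", "#manu"),  # a cycle: fine in a graph, impossible in a tree
]
tree_triples = [
    ("#manu", "knows", "#ivan"),
    ("#manu", "knows", "#sandro"),
]


def is_tree(triples):
    """Return True if the statements form a single rooted tree."""
    parents = {}
    nodes = set()
    for subject, _prop, obj in triples:
        nodes.update((subject, obj))
        if obj in parents:  # a second incoming edge breaks the tree shape
            return False
        parents[obj] = subject
    roots = [node for node in nodes if node not in parents]
    if len(roots) != 1:  # a tree has exactly one root
        return False
    for node in nodes:  # walking upwards must never loop
        seen = set()
        while node in parents:
            if node in seen:
                return False
            seen.add(node)
            node = parents[node]
    return True


tree_ok = is_tree(tree_triples)
graph_ok = is_tree(graph_triples)
```

Here `tree_ok` is true and `graph_ok` is false, which is the practical meaning of "a tree is a special subset of a graph".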

Item optionally identified by IRI

Being able to identify an item on the Web is very useful. If we weren’t able to identify web pages in a universal way, the Web wouldn’t exist as it does today. That is, we couldn’t send someone a link, have them open it and find the same information that we found. The same concept applies to “things” described in Web pages. If we identify these things with IRIs, it becomes easier to be specific about the “thing” we’re talking about.

RDFa example:

<div about="http://example.com/people#manu">...

Microdata example:

<div itemscope itemtype="http://example.com/types/Person" itemid="http://example.com/people#manu">...

Microformats example:

Not supported

Item type optionally specified by IRI

The ability to identify the type of an item on the Web is useful. In Object Oriented Programming (OOP) parlance, this is the concept of a Class. Using an IRI to specify the type of an item lets us universally identify that type on the Web. Instead of a machine having to guess whether an item of type “Person” specified on a Web page is the same type that is familiar to it, we can instead give the item a type of http://example.org/types/Person. Giving the item an IRI type allows us to be sure that two machines are using the same type information.

RDFa example:

<div typeof="http://example.com/types/Person">...

Microdata example:

<div itemscope itemtype="http://example.com/types/Person">...

Microformats example:

Not supported

Item properties specified by IRI

The ability to identify a property, also known as a vocabulary term, associated with an item on the Web is useful. In Object Oriented Programming (OOP) parlance, this is the concept of a member variable. Using an IRI to specify the property of an item lets us universally identify that property on the Web. Instead of a machine having to guess whether a property of type “name” specified on a Web page is the same property that is familiar to it, we can instead refer to the property using an IRI, like http://example.org/terms/name. Giving the property an IRI allows us to be sure that two machines are using the same vocabulary term in a program.

RDFa example:

<span property="http://example.org/terms/name">Manu Sporny</span>

Microdata example:

<span itemprop="http://example.org/terms/name">Manu Sporny</span>

Microformats example:

Not supported

Multiple objects per page

Web pages often describe multiple “things” on a page. The ability to express this information as structured data is a natural extension of a Web page.

RDFa example:

<div about="#person1">...</div>
...
<div about="#person2">...</div>

Microdata example:

<div itemscope itemtype="http://example.com/types/Person" itemid="#person1">...</div>
...
<div itemscope itemtype="http://example.com/types/Person" itemid="#person2">...</div>

Microformats example:

<div class="vcard">...</div>
...
<div class="vcard">...</div>

Overlapping objects

At times, the HTML markup on a page will contain two pieces of overlapping information. For example, two people may be marked up on a web page. Ensuring that the structured data syntax is able to specify which person is being described by the HTML is important because the syntax should not force a Web developer to change the layout of their page.

RDFa example:

<div about="#person1">... Information about Person 1 ...
   <div about="#person2">... Information about Person 2 ...</div>
</div>

Microdata example:

<div itemscope itemtype="http://example.com/types/Person" itemid="#person1">
   ... Information about Person 1 ...
   <div itemscope itemtype="http://example.com/types/Person" itemid="#person2">
      ... Information about Person 2 ...
   </div>
</div>

Microformats example:

Not supported

Plain Text properties

Most item attributes, such as a person’s name, can be expressed using plain text. It is important that these text attributes can be picked up from the page.

RDFa example:

<span property="name">Manu Sporny</span>

Microdata example:

<span itemprop="name">Manu Sporny</span>

Microformats example:

<span class="fn">Manu Sporny</span>

IRI properties

At times it is important to differentiate between an IRI and plain text. For example, the text string sip:msporny@digitalbazaar.com could be a text string or it could be a valid IRI. While the ability to differentiate may seem trivial, guessing what a valid IRI is and isn’t will never be future proof. It is helpful to be able to understand if a value is a piece of text or an IRI in the data model.

RDFa example:

<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">CC-BY-SA-3.0</a>

While Microdata does allow one to differentiate between IRIs and strings in the syntax, the JSON-based serialization converts all IRIs to string values. This is problematic because it is impossible to differentiate between a string that looks like an IRI and an actual IRI in the JSON serialization. IRI properties are preserved correctly in the RDF serialization of Microdata.

Microdata example:

<a itemprop="license" href="http://creativecommons.org/licenses/by-sa/3.0/">CC-BY-SA-3.0</a>
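The loss of the IRI/string distinction in JSON can be illustrated with a small Python sketch. The dictionary below only approximates the shape of Microdata's JSON output, and the "note" property is invented for the illustration: one value came from an href attribute, the other from plain element text, yet after serialization both are ordinary strings.

```python
import json

# Approximate, assumed shape of a Microdata-to-JSON result.
extracted = {
    "items": [{
        "properties": {
            # value taken from an href attribute (an IRI in the markup)
            "license": ["http://creativecommons.org/licenses/by-sa/3.0/"],
            # hypothetical property whose value was plain text that looks like an IRI
            "note": ["http://creativecommons.org/licenses/by-sa/3.0/"],
        }
    }]
}

round_tripped = json.loads(json.dumps(extracted))
props = round_tripped["items"][0]["properties"]

# Nothing in the JSON tells a consumer which value was an IRI.
indistinguishable = (
    type(props["license"][0]) is type(props["note"][0])
    and props["license"] == props["note"]
)
```

A consumer of this JSON would have to guess, for example by pattern matching on the string, which is exactly the guesswork the section warns about.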

While Microformats allow you to use IRI information, there is no official data model or mapping to RDF or JSON. Everything is treated as a text string and application logic must be written to determine if a particular data item is meant to be an IRI or text. So, while the markup below is valid – the IRI will be expressed as a text string, not an IRI.

Microformats example:

<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">CC-BY-SA-3.0</a>

Typed Literal properties

Typed literals allow you to express typing information about a property. This is important when you need to specify things like units of measure, or specific kinds of numbers, in a way that doesn’t depend on understanding the language in the unit of measure. For example: Is “+353872206327” an integer or a phone number? Is “.1E-1” a float or a text string? Is “false” a boolean value or a part of a sentence? Another example concerns measurements like the kilogram, a unit of weight measurement that can be displayed in a variety of different ways around the world. Being able to express this unit of measurement in structured data in a language-neutral and measurement-neutral way makes it easier for machines to understand the unit of measurement without having to understand the underlying language.

RDFa example:

<span property="measure:weight" datatype="measure:kilograms">40</span> килограммов

Microdata example:

Not supported

Microformats example:

Not supported
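To show why the datatype matters to a consumer, here is a minimal Python sketch of how a processor might turn a typed literal into a native value. The `CONVERTERS` table and `to_python` helper are hypothetical; only the XML Schema datatype IRIs are real, and a real processor would handle many more datatypes and error cases.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical converter table keyed by datatype IRI.
CONVERTERS = {
    XSD + "integer": int,
    XSD + "double": float,
    XSD + "boolean": lambda text: text in ("true", "1"),
}


def to_python(lexical, datatype=None):
    """Convert a literal using its datatype, or leave it as text if untyped."""
    convert = CONVERTERS.get(datatype)
    return convert(lexical) if convert else lexical


as_text = to_python("false")                     # no datatype: stays a string
as_bool = to_python("false", XSD + "boolean")    # boolean False
as_float = to_python(".1E-1", XSD + "double")    # the float 0.01
as_int = to_python("+353872206327", XSD + "integer")
```

Without the datatype, every one of these values would have to stay a string, and the application would be left guessing.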

XML Literal properties

XML Literals are used for properties that contain markup, such as the content of a blog post, SVG or MathML markup that should be preserved in the final output of the structured data parser. This is useful when you want to preserve all markup.

The Quadratic Formula: x = (−b ± √(b² − 4ac)) / (2a)

The formula above is expressed like so in RDFa and MathML:

<span property="math:formula" datatype="rdf:XMLLiteral">
<math mode="display" xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>x</mi>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mo form="prefix">−<!-- − --></mo>
        <mi>b</mi>
        <mo>±<!-- ± --></mo>
        <msqrt>
          <msup>
            <mi>b</mi>
            <mn>2</mn>
          </msup>
          <mo>−<!-- − --></mo>
          <mn>4</mn>
          <mo>⁢<!-- &InvisibleTimes; --></mo>
          <mi>a</mi>
          <mo>⁢<!-- &InvisibleTimes; --></mo>
          <mi>c</mi>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn>
        <mo>⁢<!-- &InvisibleTimes; --></mo>
        <mi>a</mi>
      </mrow>
    </mfrac>
  </mrow>
</math>
</span>

Microdata example:

Not supported

Microformats example:

Not supported

Language tagging

The ability to specify language information for plain text is important when pulling data in from the Web. At times, words that are spelled the same in western character sets can mean different things. For example, the word “chat” in English (to have a conversation) has a very different meaning from the word “chat” (cat) in French.

RDFa example:

<span property="name" lang="en">Manu Sporny</span>

Microdata example:

<span itemprop="name" lang="en">Manu Sporny</span>

Language information support is only on a per-microformat basis. Some Microformats do not make any statements about supporting multiple language tags.

Microformats example:

<span class="fn" lang="en">Manu Sporny</span>

Override text and IRI content

At times, the text content in the page is not what you want the machine to extract when reading the structured data. It is important to have a way to override both the text content, and the URL content in an element.

RDFa example:

<span property="candles" content="14">fourteen</span>
...
<a rel="homepage" href="http://example.org/short-url" 
      resource="http://example.org/2011/path-to-real-url">My Homepage</a>

Microdata example:

Not supported

Microformats only supports overriding text content in an element.

Microformats example:

<abbr class="candles" title="14">fourteen</abbr>
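As a rough illustration of the text override described in this section, the following Python sketch uses the standard library's `html.parser` to pull out property values, preferring a `content` attribute over the element's text when one is present. The `PropertyExtractor` class is a toy written for this post, not a conforming RDFa processor: it ignores nesting, IRIs, and everything else in the specification.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collects (property, value) pairs, preferring @content over element text."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._pending = None  # property name still waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # The override wins; the visible text is ignored.
            self.values.append((prop, attrs["content"]))
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending is not None:
            self.values.append((self._pending, data))
            self._pending = None


extractor = PropertyExtractor()
extractor.feed(
    '<span property="candles" content="14">fourteen</span>'
    '<span property="name">Manu Sporny</span>'
)
```

After feeding the markup, `extractor.values` holds the machine-readable "14" for candles and the visible text for name.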

Clear mapping to RDF

The Resource Description Framework, or RDF, has been the standard model for the Semantic Web for over a decade. At times it can be overkill for simple structured data projects, but it is often necessary for more involved or advanced structured data use cases. There is a fairly large, well-developed set of tools for RDF. It is beneficial if the structured data mechanism has a clear way of mapping the syntax to the RDF data model in a way that is useful to the set of existing RDF processing tools.

Since RDFa is built on RDF, the mapping to RDF is well specified. While it is possible to map Microformats to RDF, there is no standard way of doing so. Microdata does map to RDF, but there are a few bugs that are of concern. Namely, Microdata auto-generates RDF property IRIs in a way that is not useful to many of the existing RDF processing tools. The objections raised in the past concern the usefulness, centralization and dereferenceability of the generated IRIs. It has been argued that the IRIs designated for properties in Microdata are problematic as-is and need to be changed. The following example demonstrates how properties in RDFa map to easy-to-understand IRIs:

<section vocab="http://schema.org/" typeof="Person">
   <h1 property="name">John Doe</h1>
</section>

which results in the following IRI for the “name” property in RDFa:

http://schema.org/name

This URI is not centrally controlled. It fits in well with the RDF stack. De-referencing the URI leads to a location that is under the vocabulary maintainer’s control. The Microdata mapping to RDF is a bit less straightforward:

<section itemscope itemtype="http://schema.org/Person">
   <h1 itemprop="name">John Doe</h1>
</section>

The following URI is generated for the “name” property in Microdata:

http://www.w3.org/1999/xhtml/microdata#http%3A%2F%2Fschema.org%2FPerson%23%3Aname

This URI is centrally controlled. It requires extensive mapping to be useful for most RDF stacks. De-referencing the URI leads to a location not under the vocabulary maintainer’s control.
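The difference between the two mappings can be sketched in a few lines of Python. This is only a reading of the two example outputs shown in this section, not the conversion algorithm from either specification, and the function names are made up for illustration. It assumes the Microdata property URI is the fixed W3C prefix followed by the percent-encoded item type, a "#:" separator and the property name.

```python
from urllib.parse import quote

MICRODATA_PREFIX = "http://www.w3.org/1999/xhtml/microdata#"


def rdfa_property_iri(vocab, term):
    """RDFa with @vocab: the term is simply appended to the vocabulary IRI."""
    return vocab + term


def microdata_property_uri(item_type, name):
    """Assumed Microdata mapping: percent-encode 'type#:name' under a central prefix."""
    return MICRODATA_PREFIX + quote(item_type + "#:" + name, safe="")


rdfa_iri = rdfa_property_iri("http://schema.org/", "name")
microdata_uri = microdata_property_uri("http://schema.org/Person", "name")
```

With these inputs, `rdfa_iri` is the schema.org IRI for "name", while `microdata_uri` reproduces the long w3.org URI shown in this section. The latter has to be decoded and re-mapped before typical RDF tooling can relate it to the schema.org vocabulary.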

Target Languages

Most structured data syntaxes are meant to be embedded in a variety of different host languages. RDFa is designed and officially specified to work in a variety of different languages including HTML5, XHTML1, HTML4, SVG, ePub and OpenDocument Format. Microdata was built and specified for HTML5. Microformats re-uses attributes in HTML that have been in use for over a decade.

Having a structured data syntax support as many Web document formats as possible is good for the web because it reduces the tooling necessary to support structured data on the Web.

New Attributes

The complexity of a structured data syntax can be viewed, in part, by how many attributes a Web developer needs to understand to properly use the language. New attributes, while providing new functionality, do increase the cognitive load on the Web developer.

Re-used Attributes

All of the structured data languages re-use a subset of attributes that contain information important to structured data on the Web. There is a delicate balance between re-using too many attributes and creating new attributes.

Multiple IRI types per item

Web developers need to be able to specify that an item on a page is associated with more than one type. That is, a business can be both an “AutoPartsStore” and a “RepairShop”.

RDFa example:

<div typeof="AutoPartsStore RepairShop">...

In Microdata, you can only express multiple types for a single object using itemid to tie the information together and then only see the result in the RDF output. The DOM API would generate two separate items for the markup below, while the RDF output would generate only one item.

Microdata example:

<div itemscope itemid="#fixit" itemtype="http://example.com/types/AutoPartsStore">...</div>
<meta itemscope itemid="#fixit" itemtype="http://example.com/types/RepairShop" />

Microformats example:

Not supported
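The difference between the two views of the Microdata markup can be sketched in Python. The list of dictionaries below is a hypothetical stand-in for what a DOM-style API might report (two separate items), and the grouping step mimics what an RDF view does when it keys statements by subject.

```python
from collections import defaultdict

# Two items as a DOM-style API might report them (hypothetical structure).
dom_items = [
    {"id": "#fixit", "type": "http://example.com/types/AutoPartsStore"},
    {"id": "#fixit", "type": "http://example.com/types/RepairShop"},
]

# An RDF view keys statements by subject, so both types land on one node.
types_by_subject = defaultdict(set)
for item in dom_items:
    types_by_subject[item["id"]].add(item["type"])

dom_item_count = len(dom_items)
rdf_subject_count = len(types_by_subject)
```

The DOM-style list still has two entries, while the grouped view has a single subject carrying both types.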

Multiple statements per element

It is advantageous to use as much of the existing information in an HTML document as possible. At times, one element can contain more than a single piece of structured data. For example, a link can contain both the name of a person as well as a link to their homepage. A structured data syntax should re-use as much of this information as possible.

RDFa example:

<a rel="homepage" href="http://manu.sporny.org/" property="name">Manu Sporny</a>

Microdata example:

Not supported

Microformats example:

<a rel="homepage" href="http://manu.sporny.org/" class="fn">Manu Sporny</a>

“Locally scoped” vocabulary terms

Locally scoped vocabulary terms allow you to create new vocabulary terms on-the-fly that are picked up by the structured data parsers. The use case for this is questionable, as it is considered good practice to have a vocabulary that allows any person or machine to dereference the URL and find out more about the vocabulary term.

RDFa example:

<div vocab="http://schema.org/" typeof="Person">
   <span property="favoriteSquash">Butternut Squash</span>
</div>

Microdata example:

<div itemscope itemtype="http://schema.org/Person">
   <span itemprop="favoriteSquash">Butternut Squash</span>
</div>

Microformats example:

Not supported

Item Chaining

Chaining allows the object of a particular statement to become the subject of the next statement. It is often useful when relating multiple items to a single item or when linking multiple items, like social networks, together. For example, “Manu knows Ivan who knows Sandro who knows Mike”.

RDFa example:

<div about="#manu" rel="knows">
   <div about="#ivan" rel="knows">
      <div about="#sandro" rel="knows">
         <div about="#mike">...</div>
      </div>
   </div>
</div>

Microdata supports basic chaining, but doesn’t support hanging-rels or reverse chaining.

Microdata example:

<div itemscope itemid="#manu" itemtype="http://schema.org/Person">
   <div itemscope itemid="#ivan" itemprop="knows">
      <div itemscope itemid="#sandro" itemprop="knows">
         <div itemscope itemid="#mike" itemprop="knows">
         </div>
      </div>
   </div>
</div>

It is questionable whether or not Microformats even supports basic chaining. If somebody has a good chaining example for Microformats, please let me know and I’ll put it below.

Microformats example:

No examples of chaining.
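The statements that chaining is meant to produce can be written out with a short Python sketch. The `chain_to_triples` helper is hypothetical and works on a plain list of identifiers rather than on markup. It only shows the data that falls out of “Manu knows Ivan who knows Sandro who knows Mike”.

```python
def chain_to_triples(people, prop="knows"):
    """Each item is the object of one statement and the subject of the next."""
    return [(subject, prop, obj) for subject, obj in zip(people, people[1:])]


triples = chain_to_triples(["#manu", "#ivan", "#sandro", "#mike"])
```

Four chained items yield three statements, with #ivan and #sandro each appearing once as an object and once as a subject.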

Transclusion

Transclusion allows a Web author to specify a set of properties once in a page, such as a business address, and copy those properties to multiple items in a page. RDFa allows doing this by reference, not by making a copy. Microdata allows transclusion both by reference and by copy. Microformats allows transclusion both by reference and by copy.

RDFa example:

Transclusion by copy not supported.

Microdata example:

<span itemscope itemtype="http://microformats.org/profile/hcard" 
      itemref="home"><span itemprop="fn">Jack</span></span>
<span itemscope itemtype="http://microformats.org/profile/hcard" 
      itemref="home"><span itemprop="fn">Jill</span></span>
<span id="home" itemprop="adr" itemscope><span 
      itemprop="street-address">Bottom of the Hill</span></span>

Microformats example:

<span class="vcard">
  <span class="fn n" id="james-hcard-name">
    <span class="given-name">James</span> <span class="family-name">Levine</span>
  </span>
</span>
...
<span class="vcard">
 <object class="include" data="#james-hcard-name"></object>
 <span class="org">SimplyHired</span>
 <span class="title">Microformat Brainstormer</span>
</span>
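What itemref does to the Jack and Jill example can be approximated in Python. The data structures and the `resolve` helper below are invented for this sketch and skip everything a real Microdata parser does. The point is only that both items end up with the same address properties without the address being written twice.

```python
# Hypothetical in-memory view of the Jack and Jill markup.
subtrees = {"home": {"adr": {"street-address": "Bottom of the Hill"}}}
items = [
    {"properties": {"fn": "Jack"}, "itemref": ["home"]},
    {"properties": {"fn": "Jill"}, "itemref": ["home"]},
]


def resolve(item, subtrees):
    """Return the item's own properties plus those imported via itemref."""
    merged = dict(item["properties"])
    for ref in item["itemref"]:
        merged.update(subtrees[ref])  # import property/value pairs from the subtree
    return merged


resolved = [resolve(item, subtrees) for item in items]
```

Both resolved items carry the "adr" property, even though the address appears only once in the page.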

Compact IRIs

Compact IRIs allow Web developers to compress URLs so that they are easier to author. This allows more compact markup and reduces errors because it is no longer necessary to type out full URLs.

RDFa example:

<div prefix="dc: http://purl.org/dc/terms/">
...
   <span property="dc:title">...
   <span property="dc:creator">...
   <span property="dc:abstract">...
</div>

Microdata example:

Not supported

Microformats example:

Not supported
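How a processor might expand a compact IRI can be sketched in Python. The `expand` function is a simplified illustration, not the CURIE processing rules from the RDFa specification. It follows the rule of thumb that if a prefix is defined for the text before the colon, the value is a compact IRI, and otherwise it is left alone as a full IRI.

```python
def expand(value, prefixes):
    """Expand 'prefix:reference' if the prefix is declared; otherwise return as-is."""
    prefix, separator, reference = value.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + reference
    return value


prefixes = {"dc": "http://purl.org/dc/terms/"}

title_iri = expand("dc:title", prefixes)
untouched = expand("sip:msporny@digitalbazaar.com", prefixes)
```

Because "sip" is not a declared prefix, the second value passes through unchanged, which is why a processor is never confused between a compact IRI and a real IRI scheme.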

Prefix rebinding

Enabling prefix declaration and rebinding supports decentralized vocabulary development and management. Prefix rebinding allows Web developers to create vocabularies that are specific to their domain of expertise and use them in a way that is inter-operable with other RDFa processors. Microdata and Microformats do not specify a prefix declaration and rebinding mechanism. Microdata does allow custom vocabularies using the itemtype attribute and therefore does support decentralized vocabulary development, but not decentralized vocabulary management, unless full IRIs are used to express the vocabulary terms.

RDFa example:

<div prefix="dc: http://purl.org/dc/terms/">
...

Microdata example:

Not supported

Microformats example:

Not supported
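Rebinding can be pictured as nested lookup scopes. The Python sketch below uses `collections.ChainMap` as a stand-in for the way prefix mappings are inherited by child elements and can be overridden further down the tree. The nesting, the second namespace choice and the `expand` helper are assumptions made for the illustration.

```python
from collections import ChainMap

# Prefix mappings declared on an outer element.
outer = ChainMap({"dc": "http://purl.org/dc/terms/"})

# A nested element rebinds "dc"; the new mapping shadows the outer one
# inside that element only.
inner = outer.new_child({"dc": "http://purl.org/dc/elements/1.1/"})


def expand(curie, prefixes):
    """Expand a compact IRI against whichever scope is in effect."""
    prefix, _separator, reference = curie.partition(":")
    return prefixes[prefix] + reference


outer_title = expand("dc:title", outer)
inner_title = expand("dc:title", inner)
```

The same compact IRI resolves to two different full IRIs depending on where it appears, and the outer scope is unaffected by the rebinding.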

Vocabulary Mashups

Enabling multiple Web vocabularies to be mashed together into simple vocabulary terms is useful when creating application specific “vocabulary profiles”. Using a vocabulary profile, these simple vocabulary terms can be re-mapped to full vocabulary term IRIs which is useful to Web developers that need to simplify markup for a particular business unit, but ensure that the data generated maps to the correct Web vocabularies when used on the open Web.

For example, assume that a Web developer wants to map the vocabulary term “name” to “http://schema.org/name”, and “nickname” to “http://xmlns.com/foaf/0.1/nick”, and “hangout” to “http://example.com/myvocab#homebase”. These mappings could be accomplished in a simple-to-use vocabulary profile like so:

RDFa example:

<div profile="http://example.com/my-rdfa-profile">
...
   <span property="name">...
   <span property="nickname">...
   <span property="hangout">...
</div>

Microdata example:

Not supported

Microformats example:

Not supported

HTML5 time element support

There is a new element in HTML5 called time. This element is used to express human-readable dates and times and also contains a machine-readable value. This element was created as a response to the difficulty that the Microdata community was having when marking up dates and times. The only specification that makes use of the element currently is the Microdata specification. However, there is currently an issue logged against HTML5+RDFa that requests the inclusion of this element so that RDFa processors may understand it. Microformats do not use this element yet, partly because it does not exist in HTML4.

RDFa example:

Not supported

Microdata example:

<time datetime="2011-06-25" pubdate>June 25th 2011</time>

Microformats example:

Not supported

Different attributes for different property types

There is a design trade-off in structured data languages. As the number of statements that a single element can express increases, so does the number of attributes used to express statements. As the number of ways that an element’s value can be overridden increases, so does the number of attributes used to perform the override. Microdata keeps things simple by allowing only one statement to be made per element. Microformats allows class for text, rel for IRIs and title to override text content. RDFa uses the property attribute for text, rel and rev to specify URLs, and resource and content to override IRI and text content, respectively.

Transform to JSON

JSON is a heavily used data transport format used on the Web. It fits nicely into programming environments, so it is beneficial if a structured data syntax can be easily transformed into JSON. Microdata has a native mapping from the parser output to JSON, as well as a DOM API that allows items to be retrieved from the page. The RDFa API provides a mechanism to retrieve data from a page and then serialize that data to JSON.
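As a rough idea of what such a transformation involves, the following Python sketch groups a handful of statements by subject and serializes them with the standard `json` module. The triples and the output layout are assumptions chosen for readability. This is neither the Microdata JSON format nor the output of the RDFa API.

```python
import json

# Hypothetical statements extracted from a page.
triples = [
    ("#manu", "http://schema.org/name", "Manu Sporny"),
    ("#manu", "http://schema.org/knows", "#ivan"),
    ("#ivan", "http://schema.org/name", "Ivan"),
]


def triples_to_json(triples):
    """Group statements by subject and property, then serialize to JSON text."""
    subjects = {}
    for subject, prop, obj in triples:
        subjects.setdefault(subject, {}).setdefault(prop, []).append(obj)
    return json.dumps(subjects, indent=2, sort_keys=True)


document = triples_to_json(triples)
parsed = json.loads(document)
```

Once the data is in this shape, any JSON-aware programming environment can consume it without knowing which markup syntax it came from.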

DOM API

The ability to extract and utilize structured data from a web page in a browser setting is useful for improving interfaces and interactive applications. Microdata provides a simple Microdata DOM API for retrieving items from a web page. RDFa provides a more comprehensive RDFa DOM API for retrieving structured data from a web page. Microformats do not provide an API for extracting structured data from a web page.

Unified Parser

Having a solid set of tooling for handling structured data is important. Some of the most important tools are the parsers that process Web documents and extract structured data from them. Both RDFa and Microdata have a unified parser specification, which makes it easier to create inter-operable tools. Microformats require that separate parsers are created for each data format. This may change with the Microformats 2 work, but for now, there is no unified parser specification for Microformats.

Closing

This document will be updated as errors or omissions are found. It can be considered an up-to-date comparison between RDFa, Microdata and Microformats as of June 2011. A follow-up blog post will explain how these structured data languages could be combined into a single structured data language for the Web, achieving the W3C TAG’s goal for unification of the syntaxes used to express structured data on the Web.

59 Comments


  1. If having a feature is generally green even for misfeatures like Compact URIs, why isn’t more New Attributes greener?

    • ManuSporny says: (Author)

      As with all “Feature comparison charts” there are nuances that are lost when color coding whether a feature is “good” or “bad”. The general feeling that I get, and I realize that this “feeling” is skirting very close to over-generalization, is that the number of features are a double-edged sword. There is a spectrum of opinions on each of the features – some think the feature is unnecessary or harmful, some feel the feature is necessary but problematic, some love the feature, and some don’t care. Just because a technology has more features than another technology doesn’t necessarily mean that the technology with more features is better. The question that I tried to ask for each feature was “does the syntax support feature X?”. If a feature is supported, it’s coded green. If it supports a subset, it is coded yellow. If it doesn’t support a feature, it is coded red. If the “feature” was not about something syntax-related, like “Relative Complexity” or “New Attributes”, I tried to capture the general feeling I got from the various communities.

      That is, when we get into the “fuzzier” bits of the comparison chart, I tried to capture what I’ve heard across all structured data communities. For example, the row for complexity was a judgement call, but I don’t think that many would argue with the color coding. “New Attributes” was another such judgement call – in general, I tried to mark languages that required more “New Attributes” as worse since it increases the complexity of the language and thus increases the cognitive load on the Web developer to understand the entire language. I believe that this has been one of your arguments against RDFa and I tried to reflect that in the table.

      However, I also wanted to bring concerns that some in the WHATWG have expressed into focus by outlining that more “New Attributes” is not necessarily better. I think that there are some bits of RDFa that are too complex, but there has not been consensus to drop those features. Many of the folks in the RDFa WG feel this way, but for different features.

      So, New Attributes aren’t color coded greener because in general, many feel that more new attributes does not necessarily mean a better syntax. In fact, many feel the opposite. I realize that is an over-generalization – but then again, that’s what these sorts of comparison charts end up becoming. If you feel that the color should change on a few of the items on the chart, let me know which ones and we can discuss. You may have an argument that I had not considered when making the chart.

  2. Simon Grant says:

    Very helpful & informative!

    The “Compact IRIs” and “Prefix rebinding” features overlap rather a lot don’t they? Perhaps you could factor the difference out more clearly, if they are worth keeping separate? I can see why you’ve put the “Prefix rebinding” in sort-of amber colour, because some people think it is an advantage, others a disadvantage. I think it would be more transparent (honest, clear, plain etc.) if you had a special colour to indicate that it is disputed whether a feature is a good or a bad thing. Perhaps “Compact IRIs” would also be in that colour?

    I shouldn’t ask this here, but here goes anyway: is it really impossible to have a different separator character for Compact IRIs (previously CURIEs)? If it were, it would at a stroke solve the problem of collision with URI/IRI schemes. But I guess there is a good reason why it can’t be so, other than just legacy … ?

    • ManuSporny says: (Author)

      “Compact IRIs” and “Prefix rebinding” do overlap, but not completely. It is possible to have CURIEs without prefix rebinding. That is, if the prefixes are hard-coded you could shorten the IRIs without having any sort of prefix rebinding mechanism. This is done via the RDFa 1.1 Default Profile right now – it pre-defines prefixes for ‘dc’, ‘foaf’, ‘gr’, ‘og’, and a variety of other commonly used prefixes. CURIEs that use these prefixes would still work even if you didn’t use the prefix binding mechanism in RDFa 1.1.

      Yellow was supposed to be the “disputed/problematic” color… but I do see your point. It would help to understand which features are controversial and which ones aren’t. How about this general rule? Anything where there isn’t green across the chart could be seen as a controversial feature. That’s another way to look at the chart. If you don’t see green across the row, it is a controversial feature.

      I shouldn’t ask this here, but here goes anyway: is it really impossible to have a different separator character for Compact IRIs (previously CURIEs)?

      There is no collision, from a determinism sense, w/ URI/IRI schemes. If there is a prefix defined for the text before the colon, it is a CURIE. If there isn’t, it is an IRI. An RDFa processor never gets confused about this. There is an argument that an author may get confused about this, but there is no non-determinism problem here. We are still discussing whether or not an RDFa processor should kick out a warning if it detects a prefix that is a known Internet protocol scheme.

  3. Lin Clark says:

    Thank you for this helpful write up (and of course thanks to Jeni T for the foundational work). I had been trying to find good comparisons of microdata and RDFa since April and had only found Jeni and Bengee’s. I’m glad that there will be more critical thinking and discussions about the different design decisions now.

    I had a question about the data model. I’ve heard most people say that microdata uses a tree data model. However, can’t you use a graph data model with microdata by using itemid and itemref? There was a related discussion in the #whatwg IRC channel in 2010, http://krijnhoetmer.nl/irc-logs/whatwg/20100424

    Is there any discussion available elsewhere that explains in detail why microdata is limited to more of a tree model than RDFa? I assume there should be since I’ve heard it from so many people, but I haven’t been able to find it.

    • Microdata’s @itemref points to @id, not @itemid. It doesn’t point to another item; it points at a subtree of the page and imports any property/value pairs from that subtree into the current item. So it’s quite different from anything in RDFa. It’s inspired by the “include pattern” used in some Microformats.

      • Lin Clark says:

        Didn’t mean to suggest that itemref points to itemid… just that it could point to something outside of the DOM tree for the entity. This fact may or may not be relevant to the discussion of whether or not microdata’s data model can express a graph.

  4. ManuSporny says: (Author)

    I’m glad that there will be more critical thinking and discussions about the different design decisions now.

    Please be careful with wording like this, Lin. It implies that there had not been critical thinking and discussions about the different design decisions behind RDFa, Microdata and Microformats. There have been – it’s just that the various camps have not been communicating as much as they probably should have been over the past several years.

    I’ve heard most people say that microdata uses a tree data model. However, can’t you use a graph data model with microdata by using itemid and itemref?

    Yes, the whole graph vs. tree discussion is a red herring, IMHO. Trees are just rooted graphs. The second you have a circular reference pointing to the head of a tree, you have a graph. You can easily do this in Microdata. You could conceivably do this in Microformats if there were a Microformat that allowed itself to refer to itself.

    The only real post I’ve seen on the topic of Graph vs. Tree is Henri Sivonen’s post here: Schema.org and Communities

    I think what Henri is alluding to in his post is the form that developers use to work with the data. Many RDF people work with a graph database. Henri seems to be stating that many JavaScript developers prefer to work with a JSON object, which is a form of rooted tree. I tend to agree with Henri on this point, as do a number of the RDF Web Apps Working Group participants, which is one of the reasons that the RDFa API extracts information from a page in tree-form (we call it a Projection).

    • I wasn’t speaking about JS developers per se. For implementors of consuming software in any programming language, working with a graph sucks compared to working with just a tree when you don’t need a graph. The RDF graph model makes sense if you want to aggregate the whole Semantic Web into the same model that you use for any given piece of the Semantic Web. However, if you take a Web page that has info on an event, person, etc. and overlay metadata on it, you’ll notice that the human-visible data isn’t organized as a graph. The graph model is over-complex for overlaying onto any given Web page even if the graph model makes sense when you aggregate the data from many pages. Microdata provides a data model on the level of a single Web page. It doesn’t try to provide you with a data model for aggregation. If you want to aggregate data from many Microdata instances, figuring out how to do that is a problem that Microdata doesn’t solve for you (except to the extent it provides a mapping to RDF so you could use the RDF graph model as the aggregate model without inventing a solution customized for your use case).

      • While we’re making generalizations, it also sucks to work with tools that only support trees when you realize that you need to scale up and actually need a graph.

      • ManuSporny says: (Author)

        I don’t disagree that working with a graph sucks. Your statement that it “makes sense if you want to aggregate the whole Semantic Web into the same model” is needless hyperbole, so I’m going to tone your statement down a bit. I’m going to interpret that statement differently and agree with you that RDF makes sense when you want to aggregate data from a number of pages. Really, it’s not even RDF – it’s a graph. A Graph makes sense when you want to aggregate data from a number of pages. I would even go as far as to say that it makes sense when you want to express data in a single page.

        This claim that Microdata uses a tree-based model seems strange to me. For example:

        <div itemscope itemid="#henri" itemtype="http://schema.org/Person">
           <a itemprop="http://xmlns.com/foaf/0.1/knows" href="#manu">Manu</a> knows
        </div>
        <div itemscope itemid="#manu" itemtype="http://schema.org/Person">
           <a itemprop="http://xmlns.com/foaf/0.1/knows" href="#henri">Henri</a>
        </div>
        

        That’s a circular graph expressed in Microdata. The only thing that makes it a tree is when you pick a node to start traversing the tree, or if you flatten the graph into a list of nodes. This is effectively what getItem() does in the Microdata spec. The getProjection() call does effectively the same thing in the RDFa API spec.

        I don’t see this as a differentiator between Microdata and RDFa. Both languages allow you to express graphs of information. Both languages allow you to retrieve a tree-based view of that graph information.

        • Note that while you can create itemref loops like this, doing so is invalid as per the current spec. The spec also goes to a lot of trouble to ensure that such loops are not visible via the API, but it’s quite likely this will change in some way or another after recent implementation feedback from us (Opera).

          • ManuSporny says: (Author)

            No “@itemref loops” were created in the example above – the loop was created using @itemid. Did you mean to say @itemid? Or are you talking about “@itemref loops” and if so – what are those as I’m not familiar with the term?

          • Sorry, I wasn’t paying close enough attention. You’re quite correct that it’s possible to create (almost) any RDF graph using Microdata and itemid. Another way that you could conceivably create graphs is by creating itemref loops, where items include each other by reference. This isn’t valid, though.

        • The example you give doesn’t express a graph. The value created by the a element is an absolute URL. It is *not* an item. The only way to create a value that is an item would be by having itemscope on the a element.

          In Microdata, a URL and an item are two different kinds of value. An item can have properties and values in turn. A URL cannot. This is different from RDFa.

          The difference is in the data model. One *is* a tree, the other *is* a graph. The APIs reflect that.

          • ManuSporny says: (Author)

            You are one level deeper than I am, divining something from the specification that I don’t think holds generally. Come up one level (or go down one level, depending on how you want to look at it). I am using the mathematical/computer science definition for a tree from Wikipedia:

            Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at most one parent node. Furthermore, the children of each node have a specific order.

            The graph of information for the markup above looks like this:

            Diagram of a cyclic graph showing #manu pointing to #henri and #henri pointing to #manu via a foaf:knows relationship.

            … and fails the definition of a tree because that definition requires the graph to be acyclic. You can also make information expressed using Microdata break the definition of a “tree” by making a datum have two parent nodes.
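            To make the definition concrete, here is a small Python sketch that tests a set of directed edges against the quoted definition: acyclic, connected, and at most one parent per node. The function name and the edge-list representation are assumptions for this example only, and the ordering of children is ignored.

```python
def is_tree(edges):
    """Check a directed edge list against the quoted definition of a tree."""
    children, parents, nodes = {}, {}, set()
    for src, dst in edges:
        nodes.update((src, dst))
        children.setdefault(src, []).append(dst)
        parents.setdefault(dst, set()).add(src)

    # Each node may have at most one parent.
    if any(len(p) > 1 for p in parents.values()):
        return False
    # With single parents, a connected structure has exactly one root.
    if len(nodes - set(parents)) != 1:
        return False

    # Depth-first search: meeting a node that is still being visited
    # means there is a cycle.
    state = {}

    def has_cycle(node):
        state[node] = "visiting"
        for child in children.get(node, []):
            if state.get(child) == "visiting":
                return True
            if child not in state and has_cycle(child):
                return True
        state[node] = "done"
        return False

    return not any(has_cycle(n) for n in nodes if n not in state)
```

            Feeding it the two foaf:knows statements, `is_tree([("#manu", "#henri"), ("#henri", "#manu")])`, gives `False`, as does any edge list in which one node has two parents.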

            I say this with my tongue firmly planted in my cheek – don’t take it too seriously: This is all pedantic crap that nobody but the standards geeks care about. Who gives a shit!? RDFa and Microdata both allow the expression of graph data and both allow that data to be coerced into a tree-like structure. That’s the important bit – they are no different from one another in that respect. One of them lets you work with the data as either a graph or a tree (RDFa) and the other lets you work with the data, using the DOM API, as a tree (Microdata) but there is no reason you couldn’t write some JavaScript to work with it as a graph.

          • Lin Clark says:

            You are right, an <a>‘s href is an absolute URL. The following does do what was intended, at least as parsed by microdatajs.

            <div itemscope id="henri" itemid="#henri" itemtype="http://schema.org/Person">
            <span itemprop="http://xmlns.com/foaf/0.1/knows" itemscope itemid="#manu" itemtype="http://schema.org/Person">Manu</span> knows
            </div>
            <div itemscope id="manu" itemid="#manu" itemtype="http://schema.org/Person">
            <span itemprop="http://xmlns.com/foaf/0.1/knows" itemscope itemid="#henri" itemtype="http://schema.org/Person">Henri</span>
            </div>

            Now that the #whatwg IRC logs are back up, I blogged some of my thoughts on this issue (too long to post as a comment).

        • Sylvia says:

          I had this same epiphany a month or so ago. With access to real semantic links in RDFa and Microformats, it really shows how we’ve been overselling false semantics the last few years. That’s not to say we shouldn’t keep making clean, scannable code; we just need to add actual semantics where it’s really needed and not expect our code to speak for itself.

      • Lin Clark says:

        It doesn’t try to provide you with a data model for aggregation. If you want to aggregate data from many Microdata instances, figuring out how to do that is a problem that Microdata doesn’t solve for you

        I disagree on this point.

        The spec says:

        Sometimes, it is desirable to annotate content with specific machine-readable labels… to enable content from a variety of cooperating authors to be processed by a single script in a consistent manner.

        I would definitely read that as support for aggregating data from many microdata instances. Enabling cooperating authors to combine content in the same instance is already a long solved problem, and long solved without microdata, so I can’t imagine why the spec would even mention it.

        Unfortunately, the site with the #whatwg IRC logs is down, but in those logs Hixie is the one saying that the graph made by RDFa is just a list of statements. This same kind of list (with references to urls) can be made using microdata.

  5. Lin Clark says:

    Apologies for any offense, please replace “about” with “comparing” in the above sentence to get the gist. Unfortunately, until now there wasn’t so much comparative evaluation of the design decisions between the different formats. As I said, before I was only able to find Jeni and Bengee’s discussions, even though twitter accounts like LOD2 retweeted my request for suggestions.

  6. Regarding “Override text and IRI content”: Microdata provides the same feature by using the link or meta elements inlined with content. They achieve the same effect as the RDFa content attribute albeit without an end tag showing which part of content text is being overridden.

    • ManuSporny says: (Author)

      Microdata does provide the feature you describe, but that feature is not on parity with the feature described as “Override text and IRIs” by RDFa and Microformats. The key word being “override”. Microdata allows you to express hidden text and hidden links, but they are not “overriding” a particular span of text. RDFa and Microformats, on the other hand, do allow you to at least specify which section of text a particular override relates to.

      • Dhevy says:

        Thanks again Pravir. I saw indeed that aggregaterating > ratingvalue works for Movie, but it doesn’t seem to work for Product, does it? Ebay might have implemented it wrong, but in your examples the aggregate rating is also not shown (only the number of reviews and the price) and we cannot get it to work in tests either. The discrepancy in google.nl and .com is also not a great motivation to stay with microdata, since I do see many examples of sites with microformats that have snippets. And we depend largely on .nl and other local variants of Google.

  7. Francisco says:

    Very helpful, nice work.

  8. Robin Berjon says:

    Silly thought. You mention microdata URIs of the form “http://www.w3.org/1999/xhtml/microdata#http%3A%2F%2Fschema.org%2FPerson%23%3Aname” and indicate that “de-referencing the URI leads to a location not under the vocabulary maintainers control.” If instead of a # it used a ?, could we not state that dereferencing it returns a 301 to the query part? It doesn’t seem like a huge change for microdata (which is most likely concerned with the string value) but it could work nicely for the RDF mapping.

    • ManuSporny says: (Author)

      Yes, but that’s a bit overkill isn’t it? Why not just create a new URI minting mechanism that doesn’t depend on mangling the vocabulary URL? For example, one that tacks the property on to the itemtype URL using the ‘#’ character? At least then you don’t have the unnecessary 301 redirect. Granted, if you do that then you’re creating a “name” property for every single thing that could be named, and that’s equally bad (eg. Dog#name, Person#name, Organization#name, etc.). There are many reasonable ways to fix this issue that don’t require a great deal of work. I was just highlighting that there is an issue that needs to be addressed.
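      For readers who want to see the three shapes side by side, the Python sketch below rebuilds the URI quoted above by percent-encoding the item type and property, shows the ‘?’ variant with the redirect target recovered from the query part, and shows the alternative of tacking the property onto the itemtype URL. The function names are made up for this illustration, and the minting function only mimics the string shown in the comment; it is not a statement of what the Microdata spec requires.

```python
from urllib.parse import quote, unquote

MICRODATA_BASE = "http://www.w3.org/1999/xhtml/microdata"


def mint(itemtype, prop, separator="#"):
    """Build a property URI shaped like the one quoted in the comment."""
    encoded = quote(f"{itemtype}#:{prop}", safe="")
    return f"{MICRODATA_BASE}{separator}{encoded}"


def redirect_target(uri):
    """For the '?' variant: decode the query part, i.e. where a 301 could point."""
    _, _, query = uri.partition("?")
    return unquote(query)


def mint_on_itemtype(itemtype, prop):
    """Alternative from the reply: append the property to the itemtype URL."""
    return f"{itemtype}#{prop}"
```

      With these helpers, `mint("http://schema.org/Person", "name")` reproduces the quoted URI, and `mint_on_itemtype` yields one “name” property per type (Person#name, Dog#name and so on), which is the drawback mentioned in the reply.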

  9. About Vocabulary Mashups, the RDFa example you give only uses a single vocabulary, although I presume it is one which extends and reuses other vocabularies? This is equally supported using Microdata, I could publish http://foolip.org/Person and define it as being the same as http://schema.org/Person except that it also has a givenName and familyName borrowed in from foaf, or some such.

    Initially I assumed that you meant mixing vocabularies in a single item. For the record, this is possible in Microdata, if a bit verbose:

    <div itemscope itemtype="http://schema.org/Person">
    <span itemprop=name><span itemprop="http://xmlns.com/foaf/0.1/givenName">Philip</span> Jägenstedt</span>
    </div>

    • ManuSporny says: (Author)

      Keep in mind that the example provided uses an RDFa Profile, not a vocabulary. You can find out more about this in the RDFa Profiles section of the RDFa Core specification.

      There is a big difference between an RDFa Profile and a Web Vocabulary. I’ve tried to make this more clear by editing the explanation a bit.

      If you were to publish “http://foolip.org/Person”, how would you mark up the equivalence such that a machine could understand that your “Person” and the schema.org Person were the same thing? How would you pull in givenName and familyName in a machine-readable way?

      • Thanks, the distinction wasn’t immediately clear. Both look and feel like a namespace given by a URI, just with somewhat different syntax, scoping and processing requirements. Perhaps you could also show the document at http://example.com/my-rdfa-profile to further clarify?

        Would you expect browsers implementing the RDFa DOM API to fetch http://example.com/my-rdfa-profile? Are you not worried about the DDoS that any popular profile will suffer, kind of similar to the problem with DTD? What should happen if the DOM API is used before that document has been completely fetched and parsed?

  10. Nitpick: datetime is not an element, it is an attribute on the time element

  11. Great work.

    The detailed section corresponding to the “IRI properties” row appears to be missing; the link doesn’t work.

    Not knowing what exactly you mean by “IRI properties”, I have to speculate a bit. I guess the intent is to highlight whether values of IRI-carrying attributes such as @href can be used as data values. I’d say that this is possible in Microformats, and is used heavily in those cases where it’s appropriate, e.g., XFN.

    The example you give for typed literals (unit of measurement as datatype) is an antipattern IMO. A processor is likely to not know the datatype, and will actually be able to do less with it than if you had simply typed it as a number. In fact, “40kg” can be more interoperable than “40”^^something:kilograms because abbreviations for units of measurements are well-standardised, while IRIs for units of measurements are not. Better examples might be: Is “+353872206327” an integer or a phone number? Is “.1E-1” a float? Is “false” *actually* a boolean, or just a funky username or whatever? It’s worth pointing out that JSON has datatypes for numbers and booleans.
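    The ambiguity in those examples is easy to demonstrate. The Python sketch below lists every type that an untyped string could plausibly be read as; the `plausible_types` helper is invented for this illustration and uses Python’s own parsing rules, which are only a stand-in for whatever a consuming application would apply.

```python
def plausible_types(text):
    """List the types an untyped literal could be read as without a datatype."""
    types = ["string"]  # any literal is at least a string
    try:
        int(text)
        types.append("integer")
    except ValueError:
        pass
    try:
        float(text)
        types.append("float")
    except ValueError:
        pass
    if text in ("true", "false"):
        types.append("boolean")
    return types
```

    Whenever the list has more than one entry, the consumer has to guess: “+353872206327” parses as a number even though it may be a phone number, and “false” may be a boolean or just a username.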

    At least datetime typed literals might still make it into microdata:
    http://www.w3.org/Bugs/Public/show_bug.cgi?id=12718

    • ManuSporny says: (Author)

      The detailed section corresponding to the “IRI properties” row appears to be missing; the link doesn’t work.

      Woops – thanks! Fixed. Hope that explains it a bit more clearly.

      The example you give for typed literals (unit of measurement as datatype) is an antipattern IMO. A processor is likely to not know the datatype, and will actually be able to do less with it than if you had simply typed it as a number.

      A “processor” or an “application”. A structured data processor isn’t supposed to know anything about what the data means – but you already know that, so I’m assuming you mean “application”. An application that cares about measurements should be able to deal with them without having an insane list of measurements in every language on the face of the planet. That said, I agree with you that there isn’t a good measurement vocabulary for the Web. That’s a terrible shame – it’s not like it would be that difficult to put together. I don’t agree that “40kg” is “more interoperable” any more than I agree that “40 килограммов” is “more interoperable”. As an application writer, don’t make me guess – be explicit.

      That said, I think your examples are better, so I integrated them into the section – thanks. :)

      Glad to see datetime typed literals might make it into Microdata – as always, keep up the good work, Richard. :)

  12. You’re right of course, I meant application. And thanks for adding the new section! I have a quibble with it: Microdata distinguishes between IRIs and what we RDFers call literals only syntactically, but not in the “data model”. Values taken from @href are turned into full absolute URLs, and values taken from are not allowed to be full absolute URLs, so you can always tell them apart, but only syntactically.

    “40 kилограмм” may not be too interoperable, but “40kg” has been a well-defined standard abbreviation for 200 years; these days it’s part of ISO/IEC 80000. The standard list of abbreviations is officially adopted everywhere in the world, except in Burma, Liberia, and—oh well—the US. It’s not *that* hard to properly define a schema that takes values like “40kg”, and it can certainly be done without requiring IRI-named datatypes.

    QUDT is a pretty decent web vocabulary for units of measurements.

    • ManuSporny says: (Author)

      I have a quibble with it: Microdata distinguishes between IRIs and what we RDFers call literals only syntactically, but not in the “data model”.

      Good point. I added some new text to that section. Does this capture your concern?

      While Microdata does allow one to differentiate between IRIs and strings in the syntax, the JSON-based serialization converts all IRIs to string values. This is problematic because it is impossible to differentiate between a string that looks like an IRI and an actual IRI in the JSON serialization. IRI properties are preserved correctly in the RDF serialization of Microdata.
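      The loss of information can be shown with a toy round trip in Python. The `IRI` marker class and the property names below are assumptions made for this sketch, and the dictionary is not the actual Microdata JSON format; the point is only that JSON has a single string type.

```python
import json


class IRI(str):
    """Hypothetical in-memory marker for values that came from @href or @src."""


item = {
    "homepage": IRI("http://example.org/"),  # an actual IRI
    "note": "http://example.org/",           # text that merely looks like one
}

# JSON has only one string type, so the marker does not survive the trip.
restored = json.loads(json.dumps(item))
```

      Before serialization the two values can be told apart with `isinstance(value, IRI)`; after the round trip both are plain strings with identical content.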

      Regarding the ISO/IEC 80000 standardization of “kg” – I’m concerned that only people that care about proper notation will care to format their text string exactly. Think about representing recipes. I can see many Web developers writing “pounds” or “lbs.” in the US and “kg” or “grams” almost everywhere else. Many people just stick with what is commonplace in their culture and don’t really care about the ISO/IEC 80000 standardization. Now, if there is something that says: “Make sure to remember to put your datatype information with your amounts”, that might actually stick with some people. Sure, there will be folks that ignore it and choose to use “килограмм” instead… but hopefully, those people will be few and far between. I can see this problem being addressed using the following “graceful degradation” approach:

      1. The Web Vocabulary states – “Make sure to include @datatype with your measurements”.
      2. The Web Vocabulary states – “If you don’t specify the @datatype, and the value is a number, then kilograms (or whatever is appropriate) will be assumed”.
      3. An application MAY, upon detecting that the measurement is not a number and no @content or @datatype is specified, attempt to decipher the measurement using ISO/IEC 80000.
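      The three steps above can be sketched in Python roughly as follows. Everything here is hypothetical: the `msr:` datatype names, the default unit, the tiny symbol table (which is not the ISO/IEC 80000 list) and the `interpret` function exist only to illustrate the fallback order.

```python
import re

# Hypothetical default unit that a vocabulary might declare for bare numbers.
DEFAULT_UNIT = "msr:kilogram"

# Tiny illustrative subset of unit symbols; not the ISO/IEC 80000 list.
SYMBOLS = {"kg": "msr:kilogram", "g": "msr:gram", "m": "msr:meter"}


def interpret(value, datatype=None):
    """Return (amount, unit) following the graceful degradation steps."""
    text = value.strip()
    # Step 1: an explicit datatype always wins.
    if datatype is not None:
        return float(text), datatype
    # Step 2: a bare number gets the vocabulary's default unit.
    try:
        return float(text), DEFAULT_UNIT
    except ValueError:
        pass
    # Step 3: optionally try to decipher a trailing unit symbol.
    match = re.fullmatch(r"([-+]?\d+(?:\.\d+)?)\s*([A-Za-z]+)", text)
    if match and match.group(2) in SYMBOLS:
        return float(match.group(1)), SYMBOLS[match.group(2)]
    return None  # the application gives up rather than guessing
```

      For instance, `interpret("40", "msr:pound")` keeps the stated unit, `interpret("40")` falls back to the default, and `interpret("40kg")` is only understood because the symbol happens to be in the table.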

      QUDT is a pretty decent web vocabulary for units of measurements.

      That’s not necessarily what I had in mind. It’s a good meta-vocabulary for defining measurements. What I would like is something like ISO/IEC 80000 in Web Vocabulary form. So, someone could do something like this for the datatype: “msr:kilogram”, “msr:pound”, or “msr:metersPerSecond”.

  13. Hmm, I think I disagree with a few parts of your chart.

    For “Override text and IRI content” you mark Microdata as “no”. Microdata accomplishes the same thing by allowing you to embed a <link> or <meta> containing the data you want right next to the plain-text version. Why is it more acceptable to use an attribute to override rather than an element?

    For “Target Languages”, you mark Microdata as having two while RDFa has 8. I don’t see the essential difference between Microdata and RDFa here that warrants such a distinction. To apply either to SVG, for example, requires you to add new things that aren’t otherwise present in the language. Is it based off a belief that adding attributes (all that is required for RDFa) is better than adding both attributes and elements (probably required for Microdata)?

    The “New Attributes” and “Re-used Attributes” sections are a little fuzzy. For RDFa, arguably ‘content’ is at least partially a “new attribute”, since the pre-existing attribute is only valid on a few (one? don’t recall off the top of my head) elements, while RDFa requires it to be valid on all elements. As well, ‘rev’ is a “new attribute” for HTML5/XHTML5, since it’s no longer a valid attribute in the language.

    I also echo Henri and Philip’s concerns about your color-coding on the “Compact IRIs” and “Vocabulary Mashups” sections, and additionally the “Multiple Statements per Element” section, given that I feel all three are anti-features for complexity and robustness reasons. That’s a much more subjective concern, though.

    (Note: I probably won’t remember to check back here if you respond. If you do so, could you ping me on twitter?)

    • ManuSporny says: (Author)

      Hmm, I think I disagree with a few parts of your chart.

      You wouldn’t be the first. :P

      For “Override text and IRI content” you mark Microdata as “no”. Microdata accomplishes the same thing by allowing you to embed a <link> or <meta> containing the data you want right next to the plain-text version. Why is it more acceptable to use an attribute to override rather than an element?

      One reason is that it allows you to attach the override to the actual text it’s overriding. In general, hidden content is bad. Overriding is less bad than inserting hidden content. It also allows search engines to analyze what content was changed and why it was changed. For example, if you are gleaning structured data from a site, and 100% of it is hidden and doesn’t match up with any other text on the page – at worst, the page is probably spam, at best, the semantics for the page are broken. This analysis is easier to do if you have some association between the structured data and text on the page. So, if someone marks up the word “fourteen dollars” using @content “14” – then, in general, you could reason that the hidden content closely matches what is on the page.

      Looking at it from a purely feature-oriented perspective – the feature was called “Override text and IRI content”, Microdata doesn’t provide a mechanism that allows you to do that.

      For “Target Languages”, you mark Microdata as having two while RDFa has 8. I don’t see the essential difference between Microdata and RDFa here that warrants such a distinction. To apply either to SVG, for example, requires you to add new things that aren’t otherwise present in the language. Is it based off a belief that adding attributes (all that is required for RDFa) is better than adding both attributes and elements (probably required for Microdata)?

      The comment was based off of actual work done by standards bodies to integrate RDFa and Microdata into their languages. XHTML1+RDFa 1.0 integration was done by the Semantic Web Deployment Working Group and the XHTML2 Working Group. HTML4+RDFa1.1, HTML5+RDFa1.1, and XHTML5+RDFa1.1 are being done by the HTML Working Group. XML+RDFa 1.1 is being done by the RDF Web Applications Working Group. SVG+RDFa 1.0 was done by the SVG Working Group. ePub+RDFa 1.1 is being done by International Digital Publishing Forum. OpenDocument+RDFa 1.0 was done by OASIS. HTML5+Microdata is being worked on by HTML WG/WHATWG – no other standards body has picked it up yet. I tried to not make too many “fuzzy calls” in the blog post – tried to stay to statements of fact. So, the basis isn’t what you listed above. The basis is demonstrable work done by standards groups to integrate each structured data syntax into other languages.

      The “New Attributes” and “Re-used Attributes” sections are a little fuzzy. For RDFa, arguably ‘content’ is at least partially a “new attribute”, since the pre-existing attribute is only valid on a few (one? don’t recall off the top of my head) elements, while RDFa requires it to be valid on all elements. As well, ‘rev’ is a “new attribute” for HTML5/XHTML5, since it’s no longer a valid attribute in the language.

      Yeah, good points. I had this definition in my head when putting together the comparisons on “New Attributes” and “Re-Used Attributes”: “Did the attribute exist previously in HTML and if so, is its semantic meaning still roughly the same?”. If the answer to that question was “Yes”, then it was classified as a “Re-Used Attribute”. If it was “No”, then it was marked as a “New Attribute”. I realize there is some fuzziness here. I don’t know how to clearly articulate that via the table. Honestly, I don’t think many folks would care about the nuance – do you?

      I also echo Henri and Philip’s concerns about your color-coding on the “Compact IRIs” and “Vocabulary Mashups” sections, and additionally the “Multiple Statements per Element” section, given that I feel all three are anti-features for complexity and robustness reasons.

      I tried not to mark certain features as “anti-features” as I wanted to avoid making divisive statements. I just listed features because it’s a bit easier to do that as a statement of fact. Regarding your list of anti-features, have you seen this yet?

      http://manu.sporny.org/rdfa/rdfa-core-simplified/diff-20110331.html

      The “Basic” version removes Compact IRIs (namely, prefix rebinding) and “Vocabulary Mashups”. “Multiple Statements per Element” is still there, but is it something you could live with if the other two things were removed?

  14. Tab Atkins says:

    (Here’s hoping I can correctly guess your input format.)

    Looking at it from a purely feature-oriented perspective – the feature was called “Override text and IRI content”, Microdata doesn’t provide a mechanism that allows you to do that.

    That’s my point, actually. I think the name of the feature is biasing here. I don’t see a significant difference between the following two snippets:

    <span property="foo" content="Replacement content">Original content</span>

    <span><meta itemprop="foo" content="Replacement content">Original content</span>

    The comment was based off of actual work done by standards bodies to integrate RDFa and Microdata into their languages.

    Ah, ok. I didn’t realize you were basing the numbers off of how many languages had already integrated one of the technologies; I assumed it was more about how many languages could reasonably support it. In that case, while I don’t quite feel that the number is misleading, I do find it less than useful, as integrating Microdata into several of these languages would be the trivial work of a few hours.

    Honestly, I don’t think many folks would care about the nuance – do you?

    Not particularly. I was just pointing it out as a possible thing requiring fixing.

    The “Basic” version removes Compact IRIs (namely, prefix rebinding) and “Vocabulary Mashups”. “Multiple Statements per Element” is still there, but is it something you could live with if the other two things were removed?

    Profiles aren’t a good thing in general – what matters is what the web supports, which often ends up being the most complex version.

    If we pretended that you eliminated everything not found in the Basic Profile, though, I’ve still got some technical reservations. Do you mind if I soapbox for a bit?

    1. I’m still not very happy about multiple properties per element. You can potentially stuff a single element with three different properties, via @rev, @rel, and @property. @rev/rel grab their value from href/src/resource, while @property grabs its value from @content or the textual content.

    2. Chaining is also ridiculously complex for such a simple concept – an object may be specified by literal text, @content, @resource, or, in some cases, depending on context, @about. Subjects are usually specified with @about, except sometimes they’re done with @resource or @src, depending on context. @about usually means you’re starting a new triple, except when it instead means you’re ending a triple through chaining. You can specify multiple properties on a single element by using different attributes, as long as you want to chain all or none of them. Microdata chains in a simple, consistent way – you always first specify the subject with @itemscope, then complete the triple with @itemprop and the value on a descendant element. If you want to chain items object-to-subject, just put @itemprop on an @itemscope element, thus setting the object of the outer triple to the subject of the inner item.

    3. More generally, there are far too many optional ways to do things, for seemingly no better reason than it was possible to disambiguate them in the processing rules. The page itself is the default subject for triples. You set a new subject for nested elements declaring triples by setting @about or @typeof or @src, or @resource if you’re using chaining, or nothing at all if you’re chaining with an anonymous bnode in the middle. Some attributes can declare a subject *or* an object, depending on context (@src, @about, @resource, others?); since you can put the predicate on either the subject or object-declaring element, you have to be familiar with the processing rules and know the surrounding context to figure out what’s meant. You can put subject and predicate and object all on the same element, and the only way to know which is which is to understand exactly how the processing algorithm works: <img src=foo rel=bar resource=baz> and <img src=foo rel=bar about=baz> put @src on opposite sides of their triples. <a href=foo rel=bar>…</a> can put the @href on opposite sides of a triple depending on context. I’m still not entirely sure what happens in <a href=foo rel=bar><a href=baz rel=qux><img src=fnord></a></a> – does it create triples [ (foo, bar, baz), (baz, qux, fnord) ], or [ (foo, bar, _x), (_x, qux, baz) ]?

    Basically, it’s still very apparent that RDFa grew from the use-case “allow RDF to be written in an XML language”, with a strong emphasis on minimal disruption of the existing source code. This resulted in a fairly strict adherence to the RDF data model, when a slightly different default model (that still maps to RDF) would have been a better fit, and in a lot of clever tricks that let RDFa massively leverage existing data without duplicating anything, but that ended up massively complicating the interpretation of the syntax.

    But now we’re getting farther afield. ^_^

    (Also, your captcha doesn’t like it when I take a couple hours to compose a comment. It expires after a while. It would be cool to change things up so that it lasted forever; for example, by embedding a hashed version of the correct answer in the form, so you can just check the submitted data without having to reference outside data that can expire.)
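
    The captcha suggestion above can be sketched roughly as follows. This is one possible reading of it, not a description of how this site's captcha works. It swaps the plain hash for a keyed hash (HMAC), on the assumption that captcha answers are short enough for an unkeyed hash embedded in the form to be brute-forced. `SECRET_KEY`, `make_token` and `check_answer` are hypothetical names invented for the example.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it must never be sent to the browser.
SECRET_KEY = b"replace-with-a-long-random-server-secret"


def _normalize(answer: str) -> bytes:
    # Compare answers case-insensitively and ignore stray whitespace.
    return answer.strip().lower().encode("utf-8")


def make_token(answer: str, key: bytes = SECRET_KEY) -> str:
    """Return the value to embed in a hidden form field next to the captcha."""
    return hmac.new(key, _normalize(answer), hashlib.sha256).hexdigest()


def check_answer(submitted: str, token: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the token from the submitted answer; no stored state is needed."""
    expected = make_token(submitted, key)
    return hmac.compare_digest(expected, token)


token = make_token("7GkQ2")              # computed when the form is rendered
print(check_answer(" 7gkq2 ", token))    # answer typed hours later
print(check_answer("wrong", token))
```

    Because nothing is stored server-side, the token never expires, which is the behaviour being asked for. The trade-off is that a solved captcha and its token can be replayed indefinitely unless something else limits reuse.
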

  15. Ian Hickson says:

    Describing microdata as “structured data” misses the point. Microdata was designed to address a long series of use cases (documented in detail in the WHATWG list at the time of its development); “structured data” just happened to be helpful for addressing some of those use cases. Framing the discussion as being about “structured data” pretty much summarises all that is wrong with this discussion.

  16. Great comparison – thank you for sharing your insights on RDFa.

  17. @Ian Hickson You might be Google’s greatest technician, but use cases have a human element to them. Your self-glorification through abuse of the standardisation effort is a disgrace to those of your peers whose expertise lies outside the linked data domain. Ripping off RDFa+HTML and rechristening its keywords with an item prefix while truncating anything longer than four characters (property becomes itemprop), complaining that developers never advance far enough in their work to need the scoping prefixes called namespaces, and fabricating a mash-up vocabulary, schema.org, that is poor in domain-specific aspects and restricted to the single cross-cutting concern of search: these are signs of dullness from overwork. Do less intelligent developers like me, who build a reputation through self-help on simple tasks like making a website, a favor by taking a long break.
