The Case for Curies

Should we provide a way for Web developers to shorten URLs? This question is at the core of a super-geeky argument spanning three years about how we can make the Web better. This URL shortening technology was quietly released with RDFa in 2008 and is known as the Compact URI Expression, or CURIE.

We, the Linked Data community, thought that this would be handy for Web developers that want to add things like Google Rich Snippet markup to their web pages. Rich snippets basically allow Web developers to mark up things like movies, people, places and events on their pages so that a search engine can find them and display better search listings for people looking for those particular movies, people, places or events. Basically, RDFa helps people find your web page more easily because the search engines can now better understand what’s on your Web page.

There are over 430 million Web pages that use RDFa today. Based just on Drupal 7’s release numbers (Drupal 7 includes RDFa by default), there will be over 350,000 websites publishing RDFa in 2 years. So, RDFa and CURIEs are already out there, they’re being used successfully, and they’re helping search engines better classify the Web.

It may come as a surprise to you, then, that the HTML Working Group (the people that manage the HTML5 standard) is currently entertaining a proposal to remove CURIEs from HTML+RDFa. Wait, what!? You read that correctly – this would break most of those 430 million web pages out there. Based on the way HTML5 works, if a Web server tells your web browser that it’s sending a text/html document to the browser, the browser is supposed to use HTML5 to interpret the document. If HTML+RDFa doesn’t have CURIE support, that means that no CURIEs are going to be recognized. All web pages that are currently deployed, using CURIEs correctly, and being served as text/html (which is most of them, by the way) will break.

What are CURIEs and why are they useful?

CURIEs are a pretty simple concept. Let’s say that you want to talk about a set of geographic coordinates and you want a search engine to find them easily in your Web page. The example below talks about geographic coordinates for Blacksburg, Virginia, USA. Let’s say that anytime someone searches for something near “Blacksburg” in a search engine that you also want your page to show up in the search results. You would have to tag those coordinates with something like the following in your HTML:

<div about="#bburg">
   <span property="latitude">37.229N</span>
   <span property="longitude">-80.414W</span>
</div>

Unfortunately, it’s not as simple as the HTML code above because the search engine doesn’t know for certain that you mean the same “latitude” and “longitude” that it’s looking for. To solve this problem, the Web uses URLs to uniquely identify these terms, so the HTML code ends up looking like this:

<div about="#bburg">
   <span property="http://www.w3.org/2003/01/geo/wgs84_pos#latitude">37.229N</span>
   <span property="http://www.w3.org/2003/01/geo/wgs84_pos#longitude">-80.414W</span>
</div>

Now the thing that sucks about the code above is that you will have to type those ridiculously long URLs into your HTML every time that you want to talk about latitude and longitude. This is exactly what you have to do in Microdata today. There is a better way in RDFa – by using CURIEs, we can shorten what we have to type like so:

<div prefix="geo: http://www.w3.org/2003/01/geo/wgs84_pos#" about="#bburg">
   <span property="geo:latitude">37.229N</span>
   <span property="geo:longitude">-80.414W</span>
</div>

Notice how we define a prefix called geo and then we use that prefix for both latitude and longitude. Now, we didn’t save a great deal of typing above, but imagine if you had to type out 100 or 200 of these types of URLs during the week. How often do you think you would mistype the long form of the URL vs. the CURIE? How difficult would it be to spot errors in the long form of the URL? How much extra typing would you have to do for the long form of the URL? CURIEs make your life easier by reducing the mistakes that you might make when typing out the long form URL and they also save Web developers time when deploying RDFa pages.
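
To make the expansion concrete, here is a minimal Python sketch of what a consumer of the markup above might do with the prefix declaration. It is an illustration, not how any particular RDFa processor is implemented: the function names parse_prefixes and expand_curie are made up for this example, and the sketch ignores the rest of the RDFa processing rules (scoping, default vocabularies, and so on).

```python
from typing import Dict


def parse_prefixes(prefix_attr: str) -> Dict[str, str]:
    """Parse a prefix attribute value such as 'geo: http://...#' into a mapping.

    Hypothetical helper: assumes the value is whitespace-separated
    'name:' / URL pairs, as in the example markup above.
    """
    tokens = prefix_attr.split()
    prefixes = {}
    # Tokens alternate: a prefix name ending in ':' followed by the URL it stands for.
    for name, url in zip(tokens[0::2], tokens[1::2]):
        prefixes[name.rstrip(":")] = url
    return prefixes


def expand_curie(value: str, prefixes: Dict[str, str]) -> str:
    """Expand 'prefix:reference' to a full URL; return anything else unchanged."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    # No declared prefix matched (for example, a full URL): leave it alone.
    return value


prefixes = parse_prefixes("geo: http://www.w3.org/2003/01/geo/wgs84_pos#")
print(expand_curie("geo:latitude", prefixes))
# prints http://www.w3.org/2003/01/geo/wgs84_pos#latitude
```

The point of the sketch is that the short form and the long form identify exactly the same term; the prefix is only a typing aid for the author.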

In fact, most everyone that we know that uses and deploys RDFa really, really likes CURIEs. Add that to the millions of pages on the Web that use CURIEs correctly and the growing support for them (by deployment numbers) and you would think that this new, useful technology is a done deal.

So, Who Thinks That CURIEs Are Bad?

Ian Hickson, the lead editor of the HTML5 specification, really doesn’t like CURIEs and is leading the charge to remove them from HTML+RDFa. You can read his reasoning here, but in a nutshell, here are his points along with a quick rebuttal to each:

  • “People might carelessly copy-paste RDFa markup.” – People can do this with HTML5, JavaScript, and XML too, quite easily. Just because people may be careless is not a reason to rip out a technology that is currently being used correctly. Besides, there are tools to tell you when something goes wrong with your RDFa.
  • “CURIEs are too difficult for people to understand.” – I have a hard time believing that Web developers are so thick that they can’t understand how to use a CURIE. Web developers are smart and most of them get stuff right, if not the first time, soon thereafter. Sure, some people will get it wrong at first, but that’s a part of the learning process. I would imagine that many Web developers’ first real-world HTML page would be riddled with issues after the first cut – that’s why we have tools to let us know when things go wrong.
  • “Other technologies don’t use prefixes.” – CSS, C++, JavaScript, XHTML – all of these use re-bindable variables (aka: prefixes). But let’s assume that they didn’t; that still wouldn’t mean that prefixes are bad. That’s like saying – your locomotion contraption has this newfangled thing called a “wheel” – none of our horses or riding livestock have this mechanism – therefore it must be bad.
  • “People may forget to define prefixes.” – Yes, and I may forget to put my pants on before I go out of the house. I’ll find out I’ve made a mistake soon enough because there are real-world consequences for making mistakes like this (especially in the winter-time). If someone forgets to define a prefix, their data won’t show up and their search ranking will stay low. That is, if they don’t first use the tools given to them to make sure that they did the right thing.
  • “CURIEs are unnecessary, people don’t have a problem typing out full URLs.” – I don’t know about you, but I have a very big problem remembering this URL: http://www.w3.org/2003/01/geo/wgs84_pos#latitude vs. the CURIE for that URL – geo:latitude. It was this last one, particularly, that pegged my irony meter. I’ll explain below.
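
As a rough illustration of the kind of check a tool could run for the “forgot to define a prefix” case, here is a small Python sketch built on the standard library’s html.parser module. It is not one of the existing RDFa tools mentioned above; the class and function names are hypothetical, and it makes a simplifying assumption that a prefix declared anywhere in the page applies everywhere, whereas a real RDFa processor scopes declarations to an element and its children.

```python
from html.parser import HTMLParser


class PrefixChecker(HTMLParser):
    """Record prefixes declared in 'prefix' attributes and used in 'property' attributes."""

    def __init__(self):
        super().__init__()
        self.declared = set()
        self.used = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "prefix":
                # 'geo: http://...#' -> every other token is a prefix name ending in ':'
                self.declared.update(t.rstrip(":") for t in value.split()[0::2])
            elif name == "property" and ":" in value and "//" not in value:
                # Looks like a CURIE rather than a full URL: remember its prefix.
                self.used.add(value.split(":", 1)[0])


def undefined_prefixes(html: str) -> list:
    """Return the sorted prefixes that are used but never declared (document-wide)."""
    checker = PrefixChecker()
    checker.feed(html)
    checker.close()
    return sorted(checker.used - checker.declared)


page = '<div about="#bburg"><span property="geo:latitude">37.229N</span></div>'
print(undefined_prefixes(page))
# prints ['geo']
```

A page that declares geo with a prefix attribute, or that uses full URLs in its property attributes, comes back with an empty list.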

Now, I’m not saying that these dangers are not out there. They certainly are – language design is always a balance between risk and reward, trade-offs in design purity vs. trade-offs in practicality. The people that are building and improving RDFa believe that we have struck the right balance and we should keep CURIEs, even if there is a chance that someone, somewhere will mess it up at some point.

However, I’d like to point out one fatal flaw in the argument that full URLs are not difficult to work with and that CURIEs are not necessary – it’s based upon some “research” that Ian points to in his last point. It’ll be covered in a blog post that I’ll post tomorrow.

Editorial: The follow-up blog post discussing the irony of the anti-CURIE stance is now available.

5 Comments


  1. For the sake of the argument, let’s momentarily accept that the claim “CURIEs are too difficult for people to understand” is true. I think we also agree that “people” here really refers to Web developers, as opposed to a random person in the population. What bothers me about the reasoning in this statement is best summarized by quoting Douglas Engelbart (inventor of the mouse): “If ease of use were the only requirement, we would all be riding tricycles.”

    What I find interesting is that the reasoning in the “nay to CURIEs” camp is quite similar to that of the “namespaces considered harmful” camp. The nay-camps always had their minds made up, and they’ve been trying to bolster their decisions by gathering only the material which seems to support their conclusions. Simply put:

    1. We think X is true.
    2. We will gather only the evidence that shows X to be true.
    3. We present that X is true.

    Poor science, to say the least.

  2. albert says:

    I found it curious how you are referring to the RDFa community as the Linked Data Community; you may have 430 million RDFa pages live, but hCard alone has 2 billion, not counting the rest of the Microformats.
    Furthermore, every value that you list RDFa as having, Microformats has as well, except they’re based on Web Standards, which makes them far more valuable.

    I’m not trolling the 3rd front here… my point is, clearly this disconnect is on both sides, for neither values the other nor the 800 lb. gorilla in the room that is actually the leader in semweb currently.

    @sarven your list is right, but it applies to both sides.

    • ManuSporny says:

      Hi Albert,

      I don’t know if you know about my involvement in this area, but I was involved in a very big way in the Microformats community years ago. I am the lead editor of the hAudio specification and have contributed to the Microformats grouping, hmedia, measure, hrecipe, hproduct, and hvideo microformats. I have also documented much of the Microformats community’s new microformats process. I am a dyed-in-the-wool Microformatter and it was that experience that I brought into the RDFa Working Group. RDFa 1.1 does not ignore Microformats; it formalizes them in a way that is truly extensible – that was one of the things I’ve been fighting for over the past 3 years. For example, RDFa 1.1 has the concept of “Profiles”. hCard could become one such profile, so Microformats-like markup such as this is supported natively:

      <body profile="http://microformats.org/profiles/hcard"> 
      ...
      <div typeof="vcard">
         <span property="fn">Albert Bowden</span>
      </div>
      

      I was not referring to the RDFa Community as the Linked Data Community. The RDFa Community is just one part of the Linked Data community. I referred to the Linked Data community when referring to the notion that URIs need to be shortened. The notion that URIs need to be shortened is not something that is unique to the RDFa community – it exists in RDF/XML, TURTLE, SPARQL and yes, even the Microformats community and the Microdata community. In Microdata, when you write this:

      <span itemprop="fn">Albert Bowden</span>
      

      That itemprop="fn" property will eventually expand to this URI in many semantic web applications (this is an example straight from the Microdata specification):

      http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23:fn

      So, I don’t think I’m ignoring Microformats, and I do believe that I was being accurate when I used the phrase “Linked Data Community”. I believe the concern is where and how we allow URL shortening, not whether we allow it or not.

      Where are you getting the 2 billion number for hCard? I’m not saying that you’re wrong, I would just like to work with the same data you’re working with.

  3. Ran says:

    I do not get it. Can you not code it both ways, with or without CURIEs?

    And as of January 30th, 2011, I only get 23,082 Drupal 7 sites.
    http://drupal.org/project/usage/1015392

    • ManuSporny says:

      Yes, you can code it both ways. That is, if you hate CURIEs, you don’t have to use them. You can use full URIs everywhere in RDFa 1.1. As for the Drupal 7 install numbers, I see two Drupal installation statistics pages. The page you pointed to shows 23,082 while the other page ( http://drupal.org/project/usage/drupal ) shows 24,042 now. I’m happy to go with your lower, more conservative number – as I don’t think it really affects the point I was attempting to make. I hadn’t seen that page before you pointed it out to me, so thanks for that – I’ll probably quote the lower numbers in the future.
