Should we provide a way for Web developers to shorten URLs? This question is at the core of a super-geeky argument spanning three years about how we can make the Web better. This URL shortening technology was quietly released with RDFa in 2008 and is known as the Compact URI Expression, or CURIE.
We, the Linked Data community, thought that this would be handy for Web developers who want to add things like Google Rich Snippet markup to their web pages. Rich snippets allow Web developers to mark up things like movies, people, places and events on their pages so that a search engine can find them and display better search listings for people looking for those particular movies, people, places or events. Basically, RDFa helps people find your web page more easily because the search engines can now better understand what’s on your Web page.
There are over 430 million Web pages that use RDFa today. Based just on Drupal 7’s release numbers (Drupal 7 includes RDFa by default), there will be over 350,000 websites publishing RDFa in 2 years. So, RDFa and CURIEs are already out there, they’re being used successfully, and they’re helping search engines better classify the Web.
It may come as a surprise to you, then, that the HTML Working Group (the people that manage the HTML5 standard) is currently entertaining a proposal to remove CURIEs from HTML+RDFa. Wait, what!? You read that correctly – this would break most of those 430 million web pages out there. Based on the way HTML5 works, if a Web server tells your web browser that it’s sending a text/html document, the browser is supposed to use HTML5 to interpret the document. If HTML5+RDFa doesn’t have CURIE support, none of those CURIEs will be recognized. All web pages that are currently deployed, using CURIEs correctly, and being served as text/html (which is most of them, by the way) will break.
What are CURIEs and why are they useful?
CURIEs are a pretty simple concept. Let’s say that you want to talk about a set of geographic coordinates and you want a search engine to find them easily in your Web page. The example below talks about geographic coordinates for Blacksburg, Virginia, USA. Let’s say that you also want your page to show up in the search results anytime someone searches for something near “Blacksburg” in a search engine. You would have to tag those coordinates with something like the following in your HTML:
<div about="#bburg"> <span property="latitude">37.229</span> <span property="longitude">-80.414</span> </div>
Unfortunately, it’s not as simple as the HTML code above because the search engine doesn’t know for certain that you mean the same “latitude” and “longitude” that it’s looking for. To solve this problem, the Web uses URLs to uniquely identify these terms, so the HTML code ends up looking like this:
<div about="#bburg"> <span property="http://www.w3.org/2003/01/geo/wgs84_pos#latitude">37.229</span> <span property="http://www.w3.org/2003/01/geo/wgs84_pos#longitude">-80.414</span> </div>
Now the thing that sucks about the code above is that you have to type those ridiculously long URLs into your HTML every time that you want to talk about latitude and longitude. This is exactly what you have to do in Microdata today. There is a better way in RDFa – by using CURIEs, we can shorten what we have to type like so:
<div prefix="geo: http://www.w3.org/2003/01/geo/wgs84_pos#" about="#bburg"> <span property="geo:latitude">37.229</span> <span property="geo:longitude">-80.414</span> </div>
Notice how we define a prefix called geo and then we use that prefix for both latitude and longitude. Now, we didn’t save a great deal of typing above, but imagine if you had to type out 100 or 200 of these types of URLs during the week. How often do you think you would mistype the long form of the URL vs. the CURIE? How difficult would it be to spot errors in the long form of the URL? How much extra typing would you have to do for the long form of the URL? CURIEs make your life easier by reducing the mistakes that you might make when typing out the long form URL and they also save Web developers time when deploying RDFa pages.
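To make the mechanics concrete, here is a minimal Python sketch of what a CURIE-aware processor roughly does with the markup above: read the prefix declaration, then glue the declared URL onto whatever follows the colon. The function names parse_prefix_attribute and expand_curie are made up for this illustration and are not part of any RDFa library, and the sketch ignores details that a real processor handles (such as prefix scoping and default vocabularies).

```python
def parse_prefix_attribute(value: str) -> dict[str, str]:
    """Parse a prefix attribute value such as 'geo: http://...#' into a mapping."""
    tokens = value.split()
    mappings = {}
    # Tokens come in pairs: "name:" followed by the URL it stands for.
    for name, url in zip(tokens[0::2], tokens[1::2]):
        mappings[name.rstrip(":")] = url
    return mappings


def expand_curie(curie: str, prefixes: dict[str, str]) -> str:
    """Expand 'prefix:reference' into a full URL; raise if the prefix is unknown."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undefined prefix in {curie!r}")
    return prefixes[prefix] + reference


# The same declaration used in the HTML example above.
prefixes = parse_prefix_attribute("geo: http://www.w3.org/2003/01/geo/wgs84_pos#")
print(expand_curie("geo:latitude", prefixes))
# prints http://www.w3.org/2003/01/geo/wgs84_pos#latitude
```

The long URL is written once, in the prefix declaration, and every later use is a short name that expands to exactly the same thing.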
In fact, almost everyone we know who uses and deploys RDFa really, really likes CURIEs. Add that to the millions of pages on the Web that use CURIEs correctly and the growing support for them (by deployment numbers) and you would think that this new, useful technology is a done deal.
So, Who Thinks That CURIEs Are Bad?
Ian Hickson, the lead editor of the HTML5 specification, really doesn’t like CURIEs and is leading the charge to remove them from HTML+RDFa. You can read his reasoning here, but in a nutshell, here are his points along with a quick rebuttal to each:
- “CURIEs are too difficult for people to understand.” – I have a hard time believing that Web developers are so thick that they can’t understand how to use a CURIE. Web developers are smart and most of them get stuff right, if not the first time, soon thereafter. Sure, some people will get it wrong at first, but that’s a part of the learning process. I would imagine that many Web developers’ first real-world HTML page would be riddled with issues after the first cut – that’s why we have tools to let us know when things go wrong.
- “People may forget to define prefixes.” – Yes, and I may forget to put my pants on before I go out of the house. I’ll find out I’ve made a mistake soon enough because there are real-world consequences for making mistakes like this (especially in the winter-time). If someone forgets to define a prefix, their data won’t show up and their search ranking will stay low. That is, if they don’t first use the tools given to them to make sure that they did the right thing.
- “CURIEs are unnecessary, people don’t have a problem typing out full URLs.” – I don’t know about you, but I have a very big problem remembering this URL: http://www.w3.org/2003/01/geo/wgs84_pos#latitude vs. the CURIE for that URL – geo:latitude. It was this last one, particularly, that pegged my irony meter. I’ll explain below.
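The rebuttals above lean on the idea that tools can catch a forgotten prefix. As a rough illustration of how little such a tool needs, here is a sketch using only Python's standard html.parser module. The class PrefixChecker and the function find_undefined_curies are hypothetical names for this example, not an existing validator. It also makes two loud simplifications: it treats a prefix as declared for the rest of the document once it has been seen (real RDFa scopes prefixes to an element and its children), and it only looks at the property attribute.

```python
from html.parser import HTMLParser


class PrefixChecker(HTMLParser):
    """Collects property values whose CURIE prefix was never declared."""

    def __init__(self):
        super().__init__()
        self.declared = set()
        self.undefined = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Record every prefix name declared on this element.
        tokens = (attrs.get("prefix") or "").split()
        self.declared.update(t.rstrip(":") for t in tokens[0::2])
        # Flag a property value that uses an undeclared prefix.
        value = attrs.get("property") or ""
        prefix, sep, rest = value.partition(":")
        # Values like "http://..." are full URLs, not CURIEs, so skip them.
        if sep and not rest.startswith("//") and prefix not in self.declared:
            self.undefined.append(value)


def find_undefined_curies(html: str) -> list[str]:
    """Return the property values in html that use an undeclared prefix."""
    checker = PrefixChecker()
    checker.feed(html)
    return checker.undefined


forgot_prefix = '<div about="#bburg"> <span property="geo:latitude">37.229</span> </div>'
print(find_undefined_curies(forgot_prefix))
```

Run against a page where the prefix declaration was left out, the checker reports the offending geo:latitude value, which is the kind of feedback a Web developer would get long before worrying about search rankings.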
Now, I’m not saying that these dangers are not out there. They certainly are – language design is always a balance between risk and reward, trade-offs in design purity vs. trade-offs in practicality. The people that are building and improving RDFa believe that we have struck the right balance and we should keep CURIEs, even if there is a chance that someone, somewhere will mess it up at some point.
However, I’d like to point out one fatal flaw in the argument that full URLs are not difficult to work with and that CURIEs are not necessary – it’s based upon some “research” that Ian cites in his last point. I’ll cover that in a blog post tomorrow.