Blindingly Fast RDFa 1.1 Processing

The fastest RDFa processor in the world just got a big update – librdfa 1.1 has just been released! librdfa is a SAX-based RDFa processor, written in pure C – which makes it very portable to a variety of different software and hardware architectures. It’s also tiny and fast – the binary is smaller than this web page (around 47KB), and it’s capable of extracting roughly 5,000 triples per second per CPU core from an HTML or XML document. If you use Raptor or the Redland libraries, you use librdfa.

The timing for this release coincides with the push for a full standard at W3C for RDFa 1.1. The RDFa 1.1 specification has been in feature-freeze for over a month and is proceeding to W3C vote to finalize it as an officially recognized standard. There are now 5 fully conforming implementations for RDFa in a variety of languages – librdfa in C, PyRDFa in Python, RDF::RDFa in Ruby, Green Turtle in JavaScript, and clj-rdfa in Clojure.

It took about a month of spare-time hacking on librdfa to update it to support RDFa 1.1. It has also been given a new back-end document processor: the parser moved from libexpat to libxml2 in order to better support badly authored HTML documents as well as well-formed XML documents. Support for all of the new features in RDFa 1.1 has been added, including the @vocab, @prefix, and @inlist attributes. Full support for RDFa Lite 1.1 has also been included. A great deal of time was also put into making sure that there were absolutely no memory leaks or pointer issues across all 700+ tests in the RDFa 1.1 Test Suite. There is still some work to be done to add HTML5 @datetime attribute support and fix xml:base processing in SVG files, but that’s fairly small stuff that will be implemented over the next month or two.
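To give a feel for what one of the new RDFa 1.1 features involves, here is a minimal sketch in C of how a @prefix attribute value (a whitespace-separated list of "name: IRI" pairs) could be turned into mappings and then used to expand a compact IRI such as foaf:name. This is not librdfa's actual code or API. The names prefix_mapping, parse_prefix_attr, and expand_curie are hypothetical, the fixed buffer sizes are an assumption made for brevity, and the sketch simply stops at the first malformed pair rather than following the specification's full error-handling rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_PREFIXES 16   /* illustrative limit, not from librdfa */
#define MAX_LEN 256       /* illustrative limit, not from librdfa */

/* One prefix-to-IRI mapping, as declared by an RDFa 1.1 @prefix attribute. */
typedef struct {
    char prefix[MAX_LEN];
    char iri[MAX_LEN];
} prefix_mapping;

/* Copies the next whitespace-delimited token from *cursor into out,
 * truncating if it does not fit. Returns 1 if a token was found, else 0. */
static int next_token(const char **cursor, char *out, size_t out_size)
{
    const char *p = *cursor;
    size_t n = 0;

    while (*p != '\0' && isspace((unsigned char)*p)) p++;
    if (*p == '\0') { *cursor = p; return 0; }
    while (*p != '\0' && !isspace((unsigned char)*p)) {
        if (n + 1 < out_size) out[n++] = *p;
        p++;
    }
    out[n] = '\0';
    *cursor = p;
    return 1;
}

/* Parses "pfx: iri pfx2: iri2 ..." into maps; returns the number stored. */
int parse_prefix_attr(const char *value, prefix_mapping *maps, int max_maps)
{
    char name[MAX_LEN];
    char iri[MAX_LEN];
    int count = 0;

    assert(value != NULL && maps != NULL);
    while (count < max_maps && next_token(&value, name, sizeof name)) {
        size_t len = strlen(name);
        /* A prefix name must end with ':'; stop at malformed input. */
        if (len < 2 || name[len - 1] != ':') break;
        if (!next_token(&value, iri, sizeof iri)) break;
        name[len - 1] = '\0';              /* drop the trailing ':' */
        strcpy(maps[count].prefix, name);  /* safe: both are MAX_LEN */
        strcpy(maps[count].iri, iri);
        count++;
    }
    return count;
}

/* Expands "pfx:ref" to a full IRI using maps. Returns 1 on success,
 * 0 if the prefix is unknown or the result does not fit in out. */
int expand_curie(const char *curie, const prefix_mapping *maps, int count,
                 char *out, size_t out_size)
{
    const char *colon = strchr(curie, ':');
    int i;

    if (colon == NULL) return 0;
    for (i = 0; i < count; i++) {
        size_t plen = strlen(maps[i].prefix);
        if ((size_t)(colon - curie) == plen &&
            strncmp(curie, maps[i].prefix, plen) == 0) {
            int written = snprintf(out, out_size, "%s%s",
                                   maps[i].iri, colon + 1);
            return written >= 0 && (size_t)written < out_size;
        }
    }
    return 0;
}
```

Calling parse_prefix_attr on a value such as "foaf: http://xmlns.com/foaf/0.1/ dc: http://purl.org/dc/terms/" fills two mappings, after which expand_curie can resolve foaf:name against them. A real processor also has to track these mappings per element as it walks the document, which this sketch leaves out.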

Many thanks to Daniel Richard G., who updated the build system to be more cross-platform and pure-C compliant on a variety of different architectures. Thanks also to Dave Longley, who fixed the very last memory leak, which turned out to be a massive pain to find and resolve. This version of librdfa is ready for production use for processing all XML+RDFa and XHTML+RDFa documents, and it supports RDFa 1.0, RDFa 1.1, and RDFa Lite 1.1. Support for HTML5+RDFa is about 95% of the way there, and I expect it to reach 100% in the next month or two.
