All posts in Standards

The State of W3C Web Payments in 2017

Challenges In Building a Sustainable Web Platform by Manu Sporny and Dave Longley

There is a general API pattern that is emerging at the World Wide Web Consortium (W3C) where the browser mediates the sharing of information between two different websites. For the purposes of this blog post, let’s call this pattern the “Web Handler” pattern. Web Handlers are popping up in places like the Web Payments Working Group, Verifiable Credentials Working Group, and Social Web Working Group. The problem is that these groups are not coordinating their solutions and are instead reinventing bits of the Web Platform in ways that are not reusable. This lack of coordination will most likely be bad for the Web.

This blog post is about drawing attention to this growing problem and suggesting a way to address it that doesn’t harm the long-term health of the Web Platform.

The Web Payments Polyfill

Digital Bazaar recently announced a polyfill for the Payment Request and Payment Handler APIs. It enables two things.

The first thing it enables is a “digital wallet” website, called a payment handler, that helps you manage payment interactions. You store your credit cards securely on the site and then provide the card information to merchants when you want to make a purchase. The second thing it enables is for a merchant to collect information from you during a purchase, such as credit card information, billing address, shipping address, email address, and phone number.

When a merchant asks you to pay for something, you typically:

  1. select the card you want to use,
  2. select a shipping address (for physical goods), and
  3. send the information to the merchant.
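For concreteness, here is a minimal sketch of how a merchant page might invoke the Payment Request API to drive the flow above. The payment method, amounts, and options are made up for illustration; a real merchant would send the response details to its server for processing:

```javascript
// Hypothetical merchant checkout values, for illustration only.
const methodData = [{ supportedMethods: 'basic-card' }];

const details = {
  total: { label: 'Order total', amount: { currency: 'USD', value: '64.99' } }
};

// Ask the browser to also collect a shipping address and an email address.
const options = { requestShipping: true, requestPayerEmail: true };

async function checkout() {
  // Construct the request and show the browser's payment UI; the user
  // selects a card (or payment handler) and a shipping address here.
  const request = new PaymentRequest(methodData, details, options);
  const response = await request.show();
  // ...send response.details to the merchant server for processing...
  await response.complete('success'); // dismiss the payment UI
}

// PaymentRequest is only available in supporting browsers (or via polyfill).
if (typeof PaymentRequest !== 'undefined') {
  checkout().catch(console.error);
}
```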

Here is a video demo of the Web Payments polyfill in action:

It’s one thing to mock up something that looks like it works, but implementations must demonstrate their level of conformance by passing tests from a W3C test suite. Below is how Digital Bazaar’s polyfill currently fares against the official W3C test suite:

As you can see, it passes 702 out of 832 tests and we don’t see any barriers to getting very close to passing 100% in the months to come.

The other thing that’s important for a polyfill is ensuring that it works in a wide variety of browsers, including Google Chrome, Apple Safari, Mozilla Firefox, and Internet Explorer. Here is the current state, where blue shows native support in Google Chrome on Android and the polyfill provides the support in the rest of the browsers:

The great news here is that this solution is compatible with roughly 3.3 billion browsers today, which is around 85% of the people using the Web. To take advantage of this opportunity, the polyfill, just like the native implementations, has to be deployed by merchants and payment app providers on their websites.

Credential Handler Polyfill

Digital Bazaar also recently announced that they have created an experimental polyfill for the Verifiable Credentials work. This polyfill enables a “digital wallet” website, called a credential handler, to help you manage your verifiable credential interactions. This feature enables websites to ask you for things like your shipping address, proof of age, professional qualifications, driver’s license, and other sorts of third-party-attested information that you may keep in your wallet. Here is a video demo of the Credential Handler polyfill in action:

Like the Web Payments polyfill, this solution is compatible with roughly 3.3 billion existing browsers, which is around 85% of the people using the Web today. Again, this doesn’t mean that 3.3 billion people are using it today; the polyfill still has to be deployed on issuer, verifier, and digital wallet websites to make that a reality.

General Problem Statement

There is a common Web Handler pattern evident in the above implementations that is likely to be repeated for sharing social information (friends) and data (such as media files) in the coming years. At this point, the general pattern is starting to become clear:

  • There is a website, called a Web Handler, that manages requests for information.
  • There is a website, called the Relying Party, that requests information from you.
  • There is a process, called Web Handler Registration, where the Web Handler asks for authorization to handle specific types of requests for you and you authorize it to do so.
  • There is a process, called Web Handler Request, where the Relying Party asks you to provide a specific piece of information, and the Browser asks you to select from a list of options (associated with Web Handlers) that are capable of providing the information.
  • There is a feature that enables a Web Handler to optionally open a task-specific or contextual window to provide a user interface.
  • There is a process, called Web Handler Processing, that generates an information response that is then approved by you and sent to the Relying Party via the Browser.
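To make the pattern concrete, here is a rough sketch of the pieces the bullets above describe. None of this API exists today: the registration record, the generic request shape, and the browser-side matching function below are all invented purely for illustration.

```javascript
// Everything in this sketch is hypothetical; no browser ships these names.

// Web Handler Registration: a handler declares the request types it can
// service, plus contextual hints the browser can show during selection.
const registration = {
  name: 'Example Wallet',
  origin: 'https://wallet.example',
  acceptedTypes: ['payment', 'verifiable-credential'],
  hints: { icon: '/icon.png' }
};

// Web Handler Request: a Relying Party builds a generic request that the
// browser routes to a user-selected handler capable of servicing `type`.
function buildHandlerRequest(type, payload) {
  return { type, payload, origin: 'https://merchant.example' };
}

// The browser mediates: it matches the request type against registered
// handlers, lets the user pick one, and relays the handler's response.
function matchHandlers(request, registrations) {
  return registrations.filter(r => r.acceptedTypes.includes(request.type));
}

const request = buildHandlerRequest('payment', {
  amount: '10.00',
  currency: 'USD'
});
const candidates = matchHandlers(request, [registration]);
// candidates now holds every registered handler that accepts 'payment'
```

The point of the sketch is that nothing in it is payment-specific: swapping the `type` from 'payment' to 'verifiable-credential' reuses every other moving part.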

If this sounds like the Web Intents API, or the Web Share API, that’s because they also fit this general pattern. There is a good write-up of the reasons the Web Intents API failed to gain traction. I won’t go into the details here, but the takeaway is that Web Intents did not fail because it wasn’t an important problem that needed to be solved, but rather because we needed more data, implementations, and use cases around the problem, and we needed to identify the proper browser primitives, before we could make progress. Hubris also played a part in its downfall.

Fundamentally, we didn’t have a general pattern or the right composable components identified when Web Intents failed in 2015, but we do now with the advent of Web Payments and Verifiable Credentials.

Two Years of the Web Payments WG

For those of you that have seen the Payment Request API, you may be wondering what happened to composability over the first two years of the Working Group’s existence. Some of us did try to solve the Web Payments use cases using simpler, more composable primitives. There is a write-up on why convincing the Web Payments WG to design modular components for e-commerce failed. We had originally wanted at least two layers: one layer to request payment (and that’s all it did), and another layer to provide Checkout API functionality (such as billing and shipping address collection). These two layers could be composed together, but in hindsight, even that was probably not the right level of abstraction. In the end, it didn’t matter, as it became clear that the browser manufacturers wanted to execute upon a fairly monolithic design. Fast forward to today and that’s what we have.

Payment Request: A Monolithic API

When we tried to convince the browser vendors that they were choosing a problematic approach, we were concerned that Payment Request would become a monolithic API. We were concerned that the API would bundle too many responsibilities together in so highly specialized a way that it couldn’t be reused in other parts of the Web Platform.

Our argument was that if the API design did not create core reusable primitives, it could not be slotted in easily amongst other Web Platform features while maintaining a separation of concerns. It would instead create a barrier between the space where Web developers could compose primitives in their own creative ways and a new space where they must ask browser vendors for improvements because of a lack of control and extensibility via existing Web Platform features. We were therefore concerned that an increasing number of requests would be made to add functionality to the API where said functionality already exists, or could exist, in a core primitive elsewhere in the Web Platform.

Now that we have implemented the Payment Request API and pass 702 out of 832 tests, we truly understand what it takes to implement and program to the specification. We are also convinced that some of our concerns about the API have been realized. Payment Request is a monolithic API that confuses responsibilities and is so specialized that it can only ever be used for a very narrow set of use cases. To be clear, this doesn’t mean that Payment Request isn’t useful. It still is a sizeable step forward for the Web, even if it isn’t very composable.

This lack of composability will most likely harm its long-term adoption, and it will eventually be replaced by something more composable, just like AppCache was replaced by Service Workers and XMLHttpRequest is being replaced by Fetch. While developers love these new features, browsers will forever have the dead code of AppCache and XMLHttpRequest rotting in their code bases.

Ignoring Web Handlers at our Peril

We now know that there is a general pattern emerging among the Payments, Verifiable Credentials, and Social Web work:

  1. Relying Party: Request Information
  2. Browser: Select Web Handler
  3. Web Handler: Select and Deliver Information to Relying Party

We know, through implementation work we described above, that the code and data formats look very similar. We also know that there are other W3C Working Groups that are grappling with these cross-origin user-centric data sharing use cases.
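Part of what makes the similarity so striking is the data formats. The shapes below are illustrative only, not the normative messages from either specification, but they show how a payment request and a credential request could share the same envelope, differing only in their type-specific payload:

```javascript
// Illustrative shapes only; not normative formats from either spec.
const paymentRequest = {
  type: 'payment',
  payload: {
    methods: ['basic-card'],
    total: { currency: 'USD', value: '25.00' }
  }
};

const credentialRequest = {
  type: 'verifiable-credential',
  payload: {
    query: { credentialType: 'ProofOfAge' }
  }
};

// A single generic dispatcher could route both by inspecting the envelope.
function envelopeKeys(request) {
  return Object.keys(request).sort();
}
```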

If each of these Working Groups does what the Payment Request API does, we’ll expend three times the effort to create highly specific APIs that are only useful for the narrow set of use cases each Working Group has decided to work on. Compare this to expending far less effort to create the Web Handler API, with appropriate extension points, which would be able to address many more use cases than just Payments.

Components for Web Handlers

There are really only four composable components that we would have to create to solve the generalized Web Handler problem:

  1. Permissions that a user can grant to a website to let it manage information and perform actions for the user (payments, verifiable credentials, friends, media, etc.).
  2. A set of APIs for the Web Handler to register contextual hints that will be displayed by the browser when performing Web Handler selection.
  3. A set of APIs for Relying Parties to use when requesting information from the user.
  4. A task-specific or contextual window the Web Handler can open to present a user interface if necessary.

The W3C Process makes it difficult for Working Groups chartered to work on a more specific problem, like the Web Payments WG, to work at this level of abstraction. However, there is hope as Service Workers and Fetch do exist. Other Working Groups at W3C have successfully created composable APIs for the Web and the Web Payments work should not be an exception to the rule.


It should be illuminating that both the Web Payments API and the Credential Handler API were able to achieve 85% browser compatibility for 3.3 billion people without needing any new features from the browser. So, why are we spending so much time creating specifications for native code in the browser for something that doesn’t need a lot of native code in the browser?

The polyfill implementations reuse existing primitives like Service Worker, iframes, and postMessage. It is true that some parts of the security model and experience, such as the UI that a person uses to select the Web Handler, registration, and permission management, would be best handled by native browser code, but the majority of the other functionality does not need native browser code. We were able to achieve a complete implementation of Payment Request and Payment Handler because there were existing composable APIs in the Web Platform that had nothing to do with Web Payments, and that’s pretty neat.
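As a rough sketch of that approach, a relying-party page can exchange messages with a cross-origin mediator iframe using nothing but postMessage. The mediator origin, message shape, and helper names below are assumptions for illustration, not the polyfill’s actual wire format:

```javascript
// Hypothetical mediator origin; the real polyfill uses its own site.
const MEDIATOR_ORIGIN = 'https://mediator.example';

// Only accept responses that actually come from the trusted origin.
function isTrustedResponse(event, trustedOrigin) {
  return event.origin === trustedOrigin;
}

// Send a request into the mediator iframe and resolve with its reply.
// (Browser-only: relies on window and a live iframe element.)
function requestFromHandler(iframe, message) {
  return new Promise((resolve) => {
    function onMessage(event) {
      if (!isTrustedResponse(event, MEDIATOR_ORIGIN)) return;
      window.removeEventListener('message', onMessage);
      resolve(event.data);
    }
    window.addEventListener('message', onMessage);
    iframe.contentWindow.postMessage(message, MEDIATOR_ORIGIN);
  });
}
```

The origin check is the important part of the sketch: it is what keeps an arbitrary page from impersonating the mediator.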

When Web Intents failed, the pendulum swung far too aggressively in the other direction, toward highly specialized and focused APIs for the Web Platform. Overspecialization fundamentally harms innovation in the Web Platform as it creates unnecessarily restrictive environments for Web Developers and causes duplication of effort. For example, due to the design of the Payment Request API, merchants unnecessarily lose a significant amount of control over their checkout process. This is the danger of this new overspecialized API focus at W3C. It’s more work for a less flexible Web Platform.

The right thing to do for the Web Platform is to acknowledge this Web Handler pattern and build an API that fits the pattern, not merely charge ahead with what we have in Payment Request. However, one should be under no illusion that the Web Payments WG will drastically change its course as that would kick off an existential crisis in the group. If we’ve learned anything about W3C Working Groups over the past decade, it is that the larger they are, the less introspective and less likely they are to question their existence.

Whatever path the Web Payments Working Group chooses, the Web will get a neat new set of features around Payments, and that has exciting ramifications for the future of the Web Platform. Let’s just hope that future work can be reconfigured on top of lower-level primitives so that this trend of overspecialized APIs doesn’t continue, as that would have dire consequences for the future of the Web Platform.

Rebalancing How the Web is Built

Summary: The World Wide Web Consortium members standardize technology for the next generation Web. It’s arguable that the way this process works does not provide a clear path for progressing work and favors large organizations. This blog post explains why we ended up here and how we could make the process more fair and predictable.

Since the mid-1990s, the World Wide Web Consortium (W3C) has been the organization where most of the next generation Web has been standardized. It is where many of the important decisions around HTML, CSS, and Browser APIs are made. It is also where many emerging non-browser-related technologies like Linked Data, Blockchain, Automotive Web, and Web of Things are incubated.

The consortium has always tried to balance the needs of large corporations with the needs of the general public, with varying degrees of success. More recently, a number of the smaller organizations at W3C have noted how arduous the process of starting new work at W3C has become, while the behavior of large organizations continues to cause heartburn. As a result, some of us are concerned that the process is going to further tilt the playing field toward large organizations and unnecessarily slow the rate of innovation at the heart of the Web.

In this blog post I’m going to try to explain how creating new technology for the core of the Web works. I’ll also suggest some changes that W3C could make to make the process more predictable and hopefully more fair to all participants.

From There to Here

For most of its existence, the W3C has had a process for developing next generation technology for the Web Platform that goes something like this:

  1. Wait for a group of people to identify a problem in the Web Platform and propose how adding specific new features would address the problem.
  2. Create a Working Group to standardize the features in the Web Platform that would solve the identified problem.
  3. Release an “official” technical specification under a royalty-free patent license that is used by Web developers with varying degrees of success.

It takes anywhere from 3-6 years for a technology to get through the W3C process. The formation and operation of a Working Group tends to be where the costs are fairly high in terms of W3C staff resources, which means the cost of failure is also relatively high for the organization as a whole. This has led to changes to the process that have raised the bar, in a negative way, on what is necessary to start a Working Group. The playing field has been tilted to the point that some of us are concerned that a handful of large organizations now have a tremendous amount of influence on where the Web is going while large coalitions of smaller organizations or the general public struggle to have their concerns addressed in the Web Platform.

Recently, W3C has undergone a reorganization to make it more responsive to the needs of the organizations and people that use the Web. The reorganization splits the organization into multiple functional groups: Strategy, Project Management, Architecture and Technology, Global Participation, Member Satisfaction, Business Development, etc. While the reorganization feels like a move in the right direction, it doesn’t seem to address how difficult it is to start and shepherd work through the W3C. Let’s analyze why that is.

Why Shepherding Work Through the W3C is Onerous

It is claimed that one of the primary bottlenecks at W3C when it comes to starting a new Working Group is allocating W3C staff member time to the project. The argument goes that there are just not enough W3C staffers for the workload of the organization, because W3C exists on a thin gruel of funding from membership fees (as well as other disparate sources), and hiring W3C staff to help support new work is financially burdensome. With limited resources, the organization must only pursue work that has the very highest likelihood of success. This means that parts of the Web Platform ecosystem develop far more slowly than they should.

This dynamic has prompted a number of the larger organizations at W3C to push back on new charters citing this bottleneck. This is a misguided strategy. We shouldn’t be focused on eliminating as much risk as possible, we should be focused on reducing the cost of making mistakes.

As a result of the focus on eliminating as much risk of failure as possible, some of these large organizations have started requesting new requirements for starting work. Some examples include requirements like complete technical specifications, “significant” deployments to customers, and large vendor support. These new requirements are intended to increase the likelihood of success and slow the rate of new work at W3C. While slowing the rate of new work may address the staffing issue, it also creates a situation where the W3C can’t keep up with the required amount of standardization needed for the Web Platform to stay relevant. It also creates a bias in favor of large organizations.

We have seen these new requirements ignored when it suits a large organization to do so (e.g. Web Payments and Web Application Security). If a large organization can see how their company financially benefits from an added feature to the Web Platform they seem to have a much easier time getting work started at the W3C as the “starting new work” requirements are not applied to them in the same way as they are to the smaller organizations at W3C.

While I won’t go into all of them here, there are more double standards, no pun intended, at play when transitioning work to a Working Group at W3C. This dynamic results in significant frustration from the smaller organizations at W3C when the goal posts for starting new work keep changing based on the current desires of the larger organizations.

Regardless of whether or not this is a staffing issue, a technology maturity issue, or one of the other new requirements at play at W3C, one thing is certain. The list of requirements for transitioning work from an experimental technology to a Working Group at W3C is not clear and seems to favor larger organizations. It is a source of extreme frustration for those of us that are trying to help build the next generation Web and are NOT working for a multi-billion dollar multinational corporation.

The W3C Working Group Formation Checklist

One remedy for the aforementioned problems is to employ a simple but detailed checklist. This is one of the tools that is currently missing for W3C members.

Some have argued that this lack of a clear checklist is by design. Some argue that the formation of every new Working Group is unique and requires discussion and debate. Having participated in starting multiple groups at W3C over the past decade, I disagree. The details differ, but the general topics that are debated remain fairly consistent. The data that you need to support the creation of a Working Group tend to be the same.

Here is the checklist that I’ve put together over the years after much trial and error at W3C. The purpose of the checklist is to gather data so that your community can prove that it has done its due diligence to the W3C membership when requesting the creation of a Working Group.

  1. Clearly identify and articulate a problem statement. Do this by sending out a survey to all organizations that you believe may benefit from a standard. You will need at least 35 organizations to become actively involved. I typically end up having to contact close to 100 organizations to get this core group formed. Example: Digital Offers Problem Statement Survey.
  2. Create a W3C Community Group around a broad version of the problem statement and invite organizations to join the group. Organize weekly calls to coordinate activity. Example: Credentials Community Group
  3. Gather use cases that these organizations want to support. Specifically, gather use cases that are currently impossible to achieve. This should be in the form of another survey where you ask each organization about their most important use case. You will eventually process this list down to 3-5 core use cases. Example: Verifiable Claims Use Cases Survey and Verifiable Claims Use Cases.
  4. Perform a gap analysis. Identify capabilities that are missing from the Web Platform and explain why those capabilities can address some of the use cases outlined in the previous step. Example: Verifiable Claims Ecosystem Capabilities Analysis and Gap Analysis.
  5. Produce a proposed architecture and technical specification that addresses part or all of the problem statement. If possible, get multiple implementations of the technical specification and deploy it at a few customer sites. Example: Verifiable Claims Architecture Document
  6. Create a draft charter for a Working Group that will take the technical specification through the W3C standardization process. The charter time frame should be for no more than 24 months: 6 months to spin up, 12 months to finalize the specs, 6 months to complete interoperability testing. Example: Verifiable Claims Working Group Draft Charter
  7. Create an executive summary for W3C member companies. Summarize the work that the group has done. At this point you have far more content than W3C member company representatives have the time to read to determine if they want to join the initiative. Ease their decision making process by providing a summary. Example: Verifiable Claims Executive Summary for W3C Members
  8. Measure buy-in for the proposed Working Group Charter and technical deliverables. The best way to do this is via another survey that should be distributed to any organization that participated in step #1, any organization that has joined the work since then, and any W3C member organization that may be impacted by the work. Example: Demonstration of Support for Verifiable Claims Working Group Charter Survey

I don’t expect much in the list above to be controversial. There are templates that the W3C membership should create for the surveys and reports above. Having this W3C Working Group Formation Checklist available will help organizations navigate the often confusing W3C Process.

Reducing Reliance on W3C Staff

The checklist above makes the process of going from an experimental technology to a W3C Working Group more clear. The checklist, however, does not alleviate the W3C staffing shortage. In fact, if the way W3C staff is utilized does not also change, it might make the current situation worse.

The W3C staff play a very important role in that they help organizations navigate how to get things done via the W3C Process. They help build consensus before and during a Working Group activity at W3C. This can be a double-edged sword. If there is a W3C staff member available to help, they can be a fantastic champion for a group’s work. If there isn’t, and you’ve never progressed work yourself through W3C, you will find yourself in the unfavorable position of not knowing how to proceed. Many of the staff members also have their own way of navigating the W3C process and many of them have to repeat themselves when new work is started. In short, there is a lot of engineering churn and tribal knowledge at W3C; this is true of most standardization bodies.

The unfortunate truth is that building out the Web platform is currently being restricted due to the way that we start and progress new work at W3C. The W3C membership relies on the W3C staff to its detriment. The W3C staff are incredibly helpful, but there aren’t enough of them to support building out all of the new functionality needed for the Web, and this is ultimately bad for the Web.

In order to address the staffing shortage, it is proposed that we shift much of the W3C staff’s work from executing upon the proposed checklist, which is more or less what they do today, to verifying that the checklists are being processed correctly. This is effectively a quality-control check on the work that groups are doing as they work their way through the proposed checklist. This offloads a significant chunk of W3C staff time to the groups that want to see their work gain traction, while giving those groups clear goals to achieve.

Blocking Work at W3C

W3C members currently have the ability to stop work that has ticked all of the necessary boxes in the checklist mentioned above. At present, large organizations tend not to become involved with work until a vote for the Working Group is circulated by W3C staff. Some members then suggest that they may vote against the work if their viewpoint isn’t taken into account, even though they have not participated in the work to date. Some call this a part of the process, but it is needlessly frustrating to those that have spent years building consensus around a Working Group proposal only to have a large W3C member company respond with a “Why wasn’t I consulted?” retort.

So, now for the controversial bit:

A W3C Community Group’s work should automatically transition to a W3C Working Group if a significant coalition of companies, say around 35 of them, have ticked all of the boxes in a predefined W3C Working Group Formation checklist.

This checklist should include the creation of two interoperable implementations and a test suite. In other words, it should meet all of the minimum bars for a Working Group’s success per the W3C Process. Making this change achieves the following:

  1. It clearly defines how experimental technology progresses to being part of the Web platform so that organizations can allocate resources appropriately.
  2. It reduces the workload on W3C staff members, which is a very limited resource.
  3. It prevents large organizations from blocking work at W3C while addressing their concerns around technological maturity.

In the end, the checklist for starting work at W3C could look something like this:

  1. Clearly identify and articulate a problem statement.
  2. Create a W3C Community Group.
  3. Gather and document use cases.
  4. Perform a gap analysis.
  5. Produce a proposed architecture and technical specification.
  6. Produce two implementations and a test suite.
  7. Create a draft charter for a Working Group.
  8. Create an executive summary for W3C member companies.
  9. Survey organizations and demonstrate that at least 35 of them are supportive of the work and technical direction.

Each step above would have document templates that W3C members can use as a starting point. None of the steps above require W3C staff resources and if all steps are completed, the formation of a Working Group is the natural outcome and should not be blocked by the membership.

This checklist approach may be seen as too constraining for some, and that’s why it’s voluntary. Some organizations may feel that they do not need to produce all of the work above to get a Working Group, and those organizations can choose to ignore the checklist. Initiatives not using the checklist should expect pushback if they choose not to answer some of the questions that the checklist covers.

Rebalancing How the Web is Built

The current process for developing next generation standards for the Web is too unpredictable and too constrained by limited W3C resources. There are groups of small organizations that want to help create these next generation standards; we have to empower those groups with a clear path to standardization. We must not allow organizations to block work if the champions of the work have met the requirements in the checklist. Doing this will free up W3C staffing resources to ensure that the Web Platform is advancing at a natural and rapid rate.

A W3C Working Group formation checklist would help make the process more predictable, require less W3C staff time to execute, and provide a smoother path from the inception of an idea, to implementation, to standardization of a technology for the Web platform. This would be great for the Web.

Advancing the Web Payments HTTP API and Core Messages

Summary: This blog post strongly recommends that the Web Payments HTTP API and Core Messages work be allowed to proceed at W3C.

For the past six years, the W3C has had a Web Payments initiative in one form or another. First came the Web Payments Community Group (2010), then the Web Payments Interest Group (2014), and now the Web Payments Working Group (2015). The titles of all of those groups share two very important words: “Web” and “Payments”.

“Payments” is a big and complex landscape, and there have been international standards in this space for a very long time. These standards are used over a variety of channels, protocols, and networks. ISO-8583 (credit cards), ISO-20022 (inter-bank messages), ISO-13616 (international bank account numbers): the list is long, and it has taken decades to get this work to where it is today. We should take these messaging standards into account while doing our work.

The “Web” is a big and complex landscape as well. The Web has its own set of standards: HTML, HTTP, URL; the list is equally long, with many years of effort to get this work to where it is today. Like payments, there are also many sorts of devices on the Web that access the network in many different ways. People are most familiar with the Web browser as a way to access the Web, but tend to be unaware that many other systems such as banks, business-to-business commerce systems, phones, televisions, and now increasingly appliances, cars, and home utility meters also use the Web to provide basic functionality. The protocol that these systems use is often HTTP (outside of the Web browser), and those systems also need to initiate payments.

It seems as if the Web Payments Working Group is poised to delay the Core Messaging and HTTP API. This is important work that the group is chartered to deliver. The remainder of this blog post elaborates on why delaying this work is not in the best interest of the Web.

Why a Web Payments HTTP API is Important

At least 33% of all payments[1][2], like subscriptions and automatic bill payments, are non-interactive. The Web Payments Working Group has chosen to deprioritize those use cases for the past 10 months. The Web Payments Working Group charter expires in 14 months. We’re almost halfway down the road with no First Public Working Draft of an HTTP API or Web Payments Core Messages, and given the current rate of progress and the way we’re operating (working on specifications serially instead of in parallel), there is a real danger that the charter will expire before we get the HTTP API out there.

The Case Against the Web Payments HTTP API and Core Messages

Some in the group have warned against publication of the Web Payments HTTP API and Core Messages specification on the following grounds:

  1. The Web Payments HTTP API and Core Messages are a low priority.
  2. There is a lack of clear support.
  3. There is low interest from implementers in the group.
  4. The use cases are not yet well understood.
  5. It’s too soon to conclude that payment messages will share common parts between the Browser API and the HTTP API.

While some of the arguments seem reasonable on the surface, deconstructing them shows that only one side of the story is being told. Let’s analyze each argument to see where we end up.

The Web Payments HTTP API and Core Messages are a low priority.

The group decided that the Web Payments HTTP API and Core Messages specs were a relatively lower priority than the Browser API and the Payment Apps API until June 2016. There was consensus around this in the group and our company agreed with that consensus. What we did not agree to is that the HTTP API and Core Messages are a low priority in the sense that it’s work that we really don’t need to do. One of the deliverables in the charter of the Working Group is a Web Payments Messages Recommendation. The Charter also specifies that “request messages are passed to a server-side wallet, for example via HTTP, JavaScript, or some other approach”. Our company was involved in the writing of the charter of this group and we certainly intended the language of the charter to include HTTP, which it does.

So, while these specs may be lower priority, it’s work that we are chartered to do. This work was one of the reasons that our company joined the Web Payments Working Group. Delaying this work is making a number of us very concerned about what the end result is going to look like. The fact that there is no guiding architecture or design document for the group makes the situation even worse. The group is waiting for an architecture to emerge and that is troubling because we only have around 8 months left to figure this out and then 6 months to get implementations and testing sorted. One way to combat the uncertainty is to do work in parallel as it will help us uncover issues sooner than at the end, when it will be too late.

There is a lack of clear support.

In the list of concerns, it was noted that:

“Activity on the issue lists and the repositories for these deliverables has been limited to two organizations… This suggests that the Working Group as a whole is not engaged in this work.”

Previous iterations of the HTTP API and Core Messages specifications have been in development for more than 5 years with far more than two organizations collaborating on the documents. It is true that there has been a lack of engagement in the Web Payments Working Group, primarily because the work was deprioritized. That being said, there are only ever so many people that actively work on a given specification. We need to let people who are willing and able to work on these specifications proceed in parallel with the other documents that the group is working on.

There is low interest from implementers in the group.

We were asked to not engage the group, which we didn’t, and still ended up with two implementations and another commitment to implement. Note that this is before First Public Working Draft. Commitments to implement are typically not requested until a specification enters Candidate Recommendation, so a request for implementation commitments before a First Public Working Draft is both strange and not a requirement per the W3C Process.

If the group is going to require implementations as a prerequisite for First Public Working Draft publication, then these new requirements should apply equally to all specifications developed by the group. I personally think this requirement is onerous and sets a bad precedent as it raises the bar for starting work in the group so high that it’ll result in a number of good initiatives being halted before they have a chance to get a foothold in the group. For example, I expect that the SEPA Credit Transfer and crypto-currency payment methods will languish in the group as a result of this requirement.

The use cases are not yet well understood.

It has also been asserted that the basic use cases for the Web Payments HTTP API are not well understood. We have had a use cases document for quite a while now, which makes this assertion shaky. To restate what has been said before in the group, the generalized use case for the HTTP API is simple:

A piece of software (that is not a web browser) operating on behalf of a payer attempts to access a service on a website that requires payment. The payee software provides a payment request to the payer software. The payment request is processed and access is granted to the service.

Any system that may need to process payments in an automated fashion could leverage the HTTP API to do so. Remember, at least 33% of all payments are automated and perhaps many more could be automated if there were an international standard for doing so.
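To make the flow concrete, here is one hypothetical shape such an exchange could take over plain HTTP. The endpoint, the use of the reserved 402 status code, and the JSON fields below are illustrative assumptions, not taken from any W3C draft:

```http
GET /reports/q3-analysis HTTP/1.1
Host: payee.example.com

HTTP/1.1 402 Payment Required
Content-Type: application/json

{"paymentRequest": {"amount": "1.00", "currency": "USD", "acceptedMethods": ["basic-card"]}}

POST /reports/q3-analysis HTTP/1.1
Host: payee.example.com
Content-Type: application/json

{"paymentResponse": {"method": "basic-card", "details": "..."}}

HTTP/1.1 200 OK
```

The key property is that no browser is involved: any HTTP client acting on behalf of the payer can recognize the payment request, settle it, and retry the original request.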

There are more use cases that would benefit from an HTTP API that were identified years ago and placed into the Web Payments Use Cases document: Point of Sale, Mobile, Freemium, Pre-auth, Trialware, In-Vehicle, Subscription, Invoices, Store Credit, Automatic Selection, Payer-initiated, and Electronic Receipts. Additional use cases from the W3C Automotive Working Group related to paying for parking, tolls, and gasoline have been proposed as well.

The use cases have been understood for quite some time.

It’s too soon to conclude that payment messages will share common parts between the Browser API and the HTTP API.

The work has already been done to determine if there are common parts and those that have done the work have discovered around 80% overlap between the Browser API messages and the HTTP API messages. Even if this were not the case, I had suggested that we could deal with the concerns in at least two ways:

  1. The first was to mark this concern as an issue in the specification before publication.
  2. The second was to relabel the “Core Messages” as “HTTP Core Messages” and change the label back if the group was able to reconcile the messages between the Browser API and the HTTP API.

These options were not surfaced to the group in the call for consensus, which is frustrating.

The long term effects of pushing off discussion of core messages, however, are more concerning. If we cannot find common messages, then the road we’re headed down is one where a developer will have to use different Web Payments messages depending on whether payment is initiated via the browser or via non-browser software. In addition, this further confuses our relationship to ISO-20022 and ISO-8583 and will make Web developers’ lives far more complex than necessary. We’re advocating for two ways of doing something when we should be striving for convergence.

The group is chartered to deliver a Web Payments Messages Recommendation; I suggest we do that. We are more than 40% of the way through our chartered timeline and we haven’t even started having this discussion yet. We need to get this document sorted as soon as possible.

The Problem With Our Priorities

The problem with our priorities is that we have placed the Browser API front-and-center in the group. The group did this for two reasons:

  1. A subset of the group wanted to improve the “checkout experience” and iterate quickly instead of focusing on initiating payments, which was more basic and easier to accomplish.
  2. A subset of the group was very concerned that the browser vendors would lose interest if their work was not done first.

I understand and sympathize with both of these realities, but as a result, the majority of other organizations in the group are now non-browser second-class citizens. This is not a new dynamic at W3C; it happens regularly, and as one of the smaller W3C member companies, it is thoroughly frustrating.

It would be more accurate to have named ourselves the Browser Payments Working Group, because that is primarily what we’ve been working on since the group’s inception, and if we don’t correct course, that is all we will have time to do. This focus on the browser checkout experience and pushing things out quickly without much thought to the architecture of what we’re building does result in “product” getting out there faster. It is also short-sighted, results in technical debt, and makes it harder to reconcile things like the HTTP API and Core Messages after the Browser API is “fully baked”. We are supportive of browser specifications, but we are not supportive of only browser specifications.

This approach is causing the design of the Web Payments ecosystem to be influenced by the way things are done in browsers to a degree that is deeply concerning. Those of us in the group that are concerned with this direction have been asked to not “distract” the group by raising concerns related to the HTTP API and Core Messages specifications. “Payments” and the “Web” are much bigger than just browsers; it’s time that the group started acting accordingly.

Parallel and Non-Blocking is a Better Approach

The Web Payments Working Group has been working on specifications in a serial fashion since the inception of the group. The charter expires in 14 months and we typically need around 6 months to get implementations and testing done. That means we really only have 8 months left to wrap up the specs. We’re not going to get there by working on specs in a serial fashion.

We need to start working on these issues in parallel. We should stop blocking people that are motivated to work on specifications that are needed in order for the work to be successful. Other groups work in this fashion. For example, the Web Apps Sec group has over 13 specs that they’re working on, many of them in parallel. The same is true for the Web Apps working group, which has roughly 32 specs that it’s working on, often in parallel. These are extremes, but so is only focusing on one specification at a time in a Working Group.

We should start working in parallel. Let’s publish the HTTP API and HTTP Core Messages specifications as First Public Working Drafts and get on with it.

JSON-LD Best Practice: Context Caching

An important aspect of systems design is understanding the trade-offs you are making in your system. These trade-offs are influenced by a variety of factors: latency, safety, expressiveness, throughput, correctness, redundancy, etc. Systems, even ones that do effectively the same thing, prioritize their trade-offs differently. There is rarely “one perfect solution” to any problem.

It has been asserted that Unconstrained JSON-LD Performance Is Bad for API Specs. In that article, Dr. Chuck Severance asserts that JSON-LD parsing is 2,000 times more costly in real time and 70 times more costly in CPU time than pure JSON processing. Sounds bad, right? So, let’s unpack these claims and see where the journey takes us.

TL;DR: Don’t ever put a system that uses JSON-LD into production without thinking about your JSON-LD context caching strategy. If you want to use JSON-LD, your strategy should probably be one of two things: cache common contexts and do JSON-LD processing, or skip JSON-LD processing entirely but enforce an input format on your clients (via something like JSON Schema) so they’re still submitting valid JSON-LD.

The Performance Test Suite

After reading the article yesterday, Dave Longley (the co-inventor of JSON-LD and the creator of the jsonld.js and php-json-ld libraries) put together a test suite that effectively re-creates the test that Dr. Severance ran. It processes a JSON-LD Person object in a variety of ways. We chose to change the object because the one that Dr. Severance chose is not necessarily a common use of JSON-LD (due to the extensive use of CURIEs) and we wanted a more realistic example (that uses terms). The suite first tests pure JSON processing, then JSON-LD processing with a cached context, and then JSON-LD processing with an uncached context. We ran the tests using the largely unoptimized JSON-LD processors written in PHP and Node against a fully optimized JSON processor written in C. The tests used PHP 5.6, PHP 7.0, and Node.js 4.2.

So, with Dr. Severance’s data in hand and our shiny new test suite, let’s do some science!

The Results

The raw output of the test suite is available, but we’ve transformed the data into a few pretty pictures below.

The first graph shows the wall time performance hit using basic JSON processing as the baseline (shown as 1x). It then compares that against JSON-LD processing using a cached context and JSON-LD processing using an uncached context. Wall time in this case means the time taken if you were to start a stop watch from the start of the test to the end of the test. Take a look at the longest bars in this graph:

As we can see, there is a significant performance hit any way you look at it. Wall time spent on processing JSON-LD with an uncached context in PHP 5 is 7,551 times slower than plain JSON processing! That’s terrible! Why would anyone choose to use JSON-LD with such a massive performance hit!

Even when you take out the time spent just sitting around, the CPU cost (for running the network code) is still pretty terrible. Take a look at the longest bars in this graph:

CPU processing time for JSON vs. JSON-LD with an uncached context in PHP 5 is 260x slower. For PHP 7 it’s 239x slower. For Node 4.2, it’s 140x slower. Sounds pretty dismal, right? JSON-LD is a performance hog… but hold up, let’s examine why this is happening.

JSON-LD adds meaning to your data. The way it does this is by associating your data with something called a JSON-LD context that has to be downloaded by the JSON-LD processor and applied to the JSON data. The context allows a system to formally apply a set of rules to data to determine whether a remote system and your local system are “speaking the same language”. It removes ambiguity from your data so that you know that when the remote system says “homepage” and your system says “homepage”, they mean the same thing. Downloading things across a network is an expensive process, orders of magnitude slower than having something loaded in a CPU’s cache and executed without ever having to leave home sweet silicon home.
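For example, a minimal JSON-LD document that disambiguates “homepage” might look like this; the mapping to schema.org terms is just one illustrative choice:

```json
{
  "@context": {
    "name": "http://schema.org/name",
    "homepage": { "@id": "http://schema.org/url", "@type": "@id" }
  },
  "name": "Jane Doe",
  "homepage": "https://janedoe.example.com/"
}
```

Any two systems sharing this context agree that “homepage” means `http://schema.org/url`, and that its value is an identifier rather than merely a string that happens to look like one.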

So, what happens when you tell the program to go out to the network and fetch a document from the Internet for every iteration of a 1,000-cycle for loop? Your program takes forever to execute because it spends most of its time in network code and waiting for I/O from the remote site. So, this is lesson number 1: JSON-LD is not magic. Things that are slow (because of physics) are still slow in JSON-LD.

Best Practice: JSON-LD Context Caching

Accessing things across a network of any kind is expensive, which is why there are caches. There are primary, secondary, and tertiary caches in our CPUs, there are caches in our memory controllers, there are caches on our storage devices, there are caches on our network cards, there are caches in our routers, and yes, there are even caches in our JSON-LD processors. Use those caches because they provide a huge performance gain. Let’s look at that graph again, and see how much of a performance gain we get by using the caches (look at the second longest bars in both graphs):

CPU processing time for JSON vs. JSON-LD with a cached context in PHP 5 is 67x slower. For PHP 7 it’s 35x slower. For Node 4.2, it’s 18x slower. To pick the worst case, 67x slower (using a cached JSON-LD Context) is way better than 7,551x slower. That said, 67x slower still sounds really scary. So, let’s dig a bit deeper and put some processing time numbers (in milliseconds) behind these figures:

These numbers are less scary. In the common worst case, where we’re using a cached context in the slowest programming language tested, it will take 2ms to do JSON-LD processing per CPU core. If you have 8 cores, you can process 8 JSON-LD API requests in 2ms. It’s true that 2ms is an order of magnitude slower than just pure JSON processing, but the question is: is it worth it to you? Is gaining all of the benefits of using JSON-LD for your industry and your application worth 2ms per request?
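The cached-context strategy is mechanically simple: put a map in front of whatever function your JSON-LD processor uses to retrieve remote contexts. Here is a minimal sketch in plain JavaScript; the `fetchDocument` callback and the single-argument loader shape are simplifying assumptions, not the exact hook exposed by jsonld.js:

```javascript
// Sketch of a caching context loader. fetchDocument stands in for a
// real network fetch; real JSON-LD libraries accept a similar
// "document loader" hook for retrieving remote contexts.
function cachingLoader(fetchDocument) {
  const cache = new Map(); // context URL -> resolved context document

  return async function load(url) {
    if (cache.has(url)) {
      return cache.get(url); // cache hit: no network round trip
    }
    const doc = await fetchDocument(url); // cache miss: fetch once
    cache.set(url, doc);
    return doc;
  };
}
```

With a loader like this installed, the expensive fetch happens once per context per process rather than once per request, which is the difference between the uncached and cached columns in the figures above.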

If the answer is no, and you really need to shave off 2ms from your API response times, but you still want to use JSON-LD – don’t do JSON-LD processing. You can always delay processing until later by just ensuring that your client is delivering valid JSON-LD; all you need to do that is apply a JSON Schema to the incoming data. This effectively pushes JSON-LD processing off to the API client, which has 2ms to spare. If you’re building any sort of serious API, you’re going to be validating incoming data anyway and you can’t get around that JSON Schema processing cost.
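As a sketch of what that client-side constraint can look like, here is a hypothetical JSON Schema for a Person-shaped payload; the field names and the pinned context URL are illustrative, not from any specification:

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "@context": { "enum": ["https://schema.org"] },
    "name": { "type": "string" },
    "homepage": { "type": "string", "format": "uri" }
  },
  "required": ["@context", "name"],
  "additionalProperties": false
}
```

Pinning `@context` to a single known value is the key trick: any document that validates is guaranteed to be JSON-LD your system already understands, so the server can defer or skip expansion entirely.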

I’ve never had a discussion with someone where 2 milliseconds was the deal breaker between JSON-LD processing and not doing it. There are many things in software systems that eat up more than 2 milliseconds, but JSON-LD still gives you the choice of doing the processing at the server, pushing that responsibility off to the client, or a number of other approaches that provide different trade-offs.

But Dr. Severance said…

There are a few parting thoughts in Unconstrained JSON-LD Performance Is Bad for API Specs that I’d be remiss in not addressing.

JSON-LD evangelists will talk about “caching” – this of course is an irrelevant argument because virtually all of the shared hosting PHP servers do not allow caching so at least in PHP the “caching fixes this” is a useless argument. Any normal PHP application in real production environments will be forced to re-retrieve and re-parse the context documents on every request / response cycle.

phpFastCache exists; use it. If for some reason you can’t, and I know of no reason you couldn’t, cache the context by writing it to disk and retrieving it from disk. Most modern operating systems will optimize this down to a very fast read from memory. If you can’t write to disk in your shared PHP hosting environment, switch to a provider that allows it (which most of them do).

even with cached pre-parsed [ed: JSON-LD Context] documents the additional order of magnitude is due to the need to loop through the structures over and over, to detect many levels of *potential* indirection between prefixes, contexts, and possible aliases for prefixes or aliases.

That is not how JSON-LD processing works. Rather than go into the details, here’s a link to the JSON-LD processing algorithms.

json_decode is written in C in PHP and jsonld_compact is written in PHP and if jsonld_compact were written in C and merged into the PHP core and all of the hosting providers around the world upgraded to PHP 12.0 – it means that perhaps the negative performance impact of JSON-LD would be somewhat lessened “when pigs fly”.

You can do JSON-LD processing in 2ms in PHP 5, 0.7ms in PHP 7, and 1ms in Node 4. You don’t need a C implementation unless you need to shave those times off of your API calls.

If the JSON-LD community actually wants its work to be used outside the “Semantic Web” backwaters – or in situations where hipsters make all the decisions and never run their code into production, the JSON-LD community should stand up and publish a best practice to use JSON-LD in a way that maintains compatibility with JSON – so that APIs and be interoperable and performant in all programming languages. This document should be titled “High Performance JSON-LD” and be featured front and center when talking about JSON-LD as a way to define APIs.

I agree that we need to write more about high performance JSON-LD API design, because we have identified a few things that seem best-practice-y. The problem is that we’ve been too busy drinking our “tripel” mocha lattes and riding our fixies to the latest butchershop-by-day-faux-vapor-bar-by-night flashmob experiences to get around to it. I mean, we are hipsters after all. Play-play balance is important to us and writing best practices sounds like a real drag. 🙂

In all seriousness, JSON-LD is now published by several million sites. We know that people want some best practices from the JSON-LD community. Count this blog post as one of them. If you use JSON-LD, and that includes the developers that created those several million sites that publish JSON-LD, you are a part of the JSON-LD community. We created JSON-LD to help our fellow developers. If you think you have settled on a best practice with JSON-LD, don’t wait for someone else to write about it; please pay it forward and blog about it. Your community of fellow JSON-LD developers will thank you.

The Web Browser API Incubation Anti-Pattern

This blog post is about a number of hard lessons learned at the World Wide Web Consortium (W3C) while building new features for the Web with browser vendors and how political and frustrating the process can be.

The W3C Web Payments Community Group (a 220-person pre-standardization effort around payments on the Web) has been in operation now for a bit more than four years. This effort was instrumental in starting conversations around the payments initiative at W3C and has, over the years, incubated many ideas and pre-standards specifications around Web Payments.

When the Web Payments Working Group (a different, official standardization group at W3C) was formed, a decision was made to incubate specifications from two Community Groups and then merge them before the Web Payments Browser API First Public Working Draft was released in April 2016.

The first community group was composed of roughly 220 people from the general public (the Web Payments Community Group). The second community group was composed of a handful of employees from browser vendor companies (fewer than 7 active). When it came time to merge these two specifications, the browser vendors dug in their heels, making no compromises, and the Web Payments Community Group had no choice but to sacrifice their own specifications in the hope of making progress based on the Microsoft/Google specifications.

It is currently unclear how much the Web Payments Community Group or the Web Payments Working Group will be able to sway the browser vendors on the Web Payments Browser API specification. I do believe that there is still room for the Web Payments Community Group to influence the work, but we face a grueling uphill slog.

The rest of this post analyzes a particular W3C standards development anti-pattern that we discovered over the last few years and attempts to advise future groups at W3C so that they may avoid the trap in which we became ensnared.

A Brief History of Web Payments in and Around W3C

Full disclosure: I’m one of the founders and chairman of several W3C Community Groups (JSON-LD, Permanent Identifiers, Credentials, and Web Payments). A great deal of my company’s time, effort, and money (to the tune of hundreds of thousands of dollars) has gone into these groups in an effort to help them create the next generation Web. I am biased, so take this post with a grain of salt. That said, my bias is a well-informed one.

For a while now, the W3C Community Group process seemed to be working as designed. It took a great deal of effort to get a new activity started at W3C, but a small group of very motivated “outsiders” were able to kick-start the most supported initiative in W3C history. Here’s a brief history of payments in and around W3C:

  • 1998-2001 – The W3C launches a Micropayments activity that ultimately fails.
  • 2010 May – The PaySwarm open payments community launches outside of the W3C with a handful of people, a few specifications, and the goal of standardizing payments, offers of sale, digital receipts, and automated digital contracts on the Web.
  • 2012 May – W3C staff contact, Doug Schepers, convinces a small handful of us to create the Web Payments Community Group at W3C and bring the work into W3C. In parallel, a W3C Headlights process identifies payments as an area to keep an eye on.
  • 2012 – 2014 – Web Payments Community Group expands to 100 people with several experimental specifications under active development. Web Payments Community Group members engage the United Nations’ Internet Governance Forum, banking/finance industry, and technology industry to gather large-organization support.
  • 2013 October – W3C TPAC in Shenzhen, China – W3C Management is convinced by the Web Payments Community Group and W3C staff contact Dave Raggett to hold a workshop for Web Payments.
  • 2014 March – W3C Workshop for Web Payments is held in Paris, France. The result is a proposal to start official work at the W3C.
  • 2014 October – First official Web Payments Activity, the Web Payments Interest Group, is launched to create a vision and strategy for Web Payments. It does so, along with the creation of a Web Payments Working Group charter. A great deal of the Web Payment Community Group’s input is taken into account.
  • 2015 September – Second official Web Payments activity, the Web Payments Working Group, is launched to execute on the Web Payments Interest Group’s vision.

So far, so good. While it took twice as long to start a Working Group as we had hoped, we had far greater support going in than we imagined when we embarked on our journey in 2010. The Web Payments Interest Group was receptive to input from the Web Payments Community Group. The Working Group was launched, and that’s when the first set of issues between the Web Payments Community Group, Web Payments Interest Group, and the Web Payments Working Group started.

The Web Browser API Incubation Anti-Pattern

  • 2015 October – A decision is made to incubate the Web Payments Working Group input specifications in separate Community Groups.

When W3C Working Groups are created, they are expected to produce technical specifications that expand the capabilities of the Web. Often input is taken from Community Group documents, existing specifications and technology created by corporate organizations, research organizations, and workshops. This input is then merged into one or more Working Group specifications under the control of the Working Group.

That is how it is supposed to work, anyway. In reality, there are many shenanigans that are employed by some of the largest organizations at W3C that get in the way of how things are supposed to work.

Here is one such story:

The Web Payments Community Group has a body of specification work going back 4+ years. When the Web Payments Working Group started, Microsoft and Google did not have a Web Payments specification. As a result, they put something together in a couple of weeks and incubated it at the Web Incubator Community Group. In an attempt to “play on the same field”, some of the editors of the Web Payments Community Group specifications moved that community’s specifications to the Web Incubator Community Group and incubated them there as well.

The explanation that the Microsoft and Google representatives used was that they were just figuring this stuff out and wanted a bit of time to incubate their ideas before merging them into the Web Payments Working Group. At the time I remember it sounding like a reasonable plan and was curious about the Web Incubator Community Group process; it couldn’t hurt to incubate the Web Payments Community Group specifications alongside the Microsoft/Google specification. I did raise the point that I thought we should merge as quickly as possible and the idea that we’d have “competing specifications” where one winner would be picked at the end was a very undesirable potential outcome.

The W3C staff contacts, chairs, and Microsoft/Google assured the group that we wouldn’t be picking a winner and would merge the best ideas from each set of specifications in time. I think everyone truly believed that would happen at the time.


  • 2016 February – The months-old Microsoft/Google specification is picked as the winner over the years-old work that went into the Web Payments Community Group specification. Zero features from the Web Payments Community Group specification are merged, with a suggestion to submit pull requests if the Web Payments Community Group would like modifications made to the Microsoft/Google specification.

Four Months of Warning Signs

The thing the Web Payments Working Group did not want to happen, the selection of one specification with absolutely zero content being merged in from the other specification, ended up happening. Let’s rewind a bit and analyze how this happened.

  • 2015 October – The Web Payments Working Group has its first face-to-face meeting at W3C TPAC in Sapporo, Japan.

Warning Sign #1: Lack of Good-Faith Review

At the start of the Working Group, two sets of specification editors agreed to put together proposals for the Web Payments Working Group. The first set of specification editors consisted of members of the Web Payments Community Group. The second set of specification editors consisted of employees from Microsoft and Google. A mutual agreement was made for both sets of specification editors to review each other’s specifications and raise issues on them in the hope of eventually hitting a point where a merge was the logical next step.

The Web Payments Community Group’s editors raised 24+ issues on the Microsoft/Google proposal and compared/contrasted the specifications in order to highlight the differences between them. This took an enormous amount of effort and the work continued for weeks. The Microsoft/Google editors raised zero issues on the counter-proposals; to this day, it’s unclear if they actually read through the counter-proposals in depth.

Warning Sign #2: Same Problem, Two Factions

When the Web Payments Working Group agreed to work on their first specification, knowing that there were going to be at least two proposals, the Microsoft/Google representatives raised the idea that they wanted to work on their specification in the Web Incubator Community Group. Even though the Web Payments Community Group editors chose to incubate their community’s specifications in the Web Incubator Community Group as well, the structure of that work forced the editors to split into two factions to make rapid progress on each set of specifications. This led to a breakdown in the lines of communication: Microsoft and Google collaborated closely on their specification, while the Web Payments Community Group editors focused on keeping their specifications in sync with the Microsoft/Google specification without significant engagement from the Microsoft/Google editors.

This inevitably led to closed discussions, misunderstood motives, backchannel conversations, and a variety of other things that are harmful to the standardization process. In hindsight, this was trouble waiting to happen. This sort of arrangement pits the editors and the group against one another, and the most likely outcome is a Web Payments Community Group vs. Microsoft/Google gambit. Systems often force a particular outcome, and this system forced harmful competition and a winner-take-all result.

Warning Sign #3: One-Sided Compromise

A few weeks before the Web Payments Working Group face-to-face meeting, the editors of the Web Payments Community Group specifications discovered (through a backchannel discussion) that “the browser vendors are never going to adopt JSON-LD and the Web Payments Community Group specification will never gain traction as a result”. The Web Payments Community Group specifications used JSON-LD to meet the extensibility needs of Web Payments. This desire to not use JSON-LD by the browser vendors was never communicated through any public channel nor was it directly communicated by the browser vendors privately. This is particularly bad form in an open standardization initiative as it leaves the editors to guess at what sort of proposal would be attractive to the browser vendors.

Once again, the Web Payments Community Group editors (in an attempt to propose a workable specification to the browser vendors) reworked the Web Payments Community Group specifications to compromise and remove JSON-LD as the message extensibility mechanism, ensuring it was purely optional and that no browser would have to implement any part of it. We did this not because we thought it was the best solution for the Web, but in an effort to gain some traction with the browser vendors. This modification was met with no substantive response from the browser vendors.
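To make concrete what “purely optional” JSON-LD extensibility means, here is a minimal sketch. The context URL and field names are illustrative assumptions, not taken from the actual Web Payments Community Group specifications: the point is that a consumer that does not understand JSON-LD can simply ignore the `@context` member and still process the message as plain JSON.

```python
import json

# A hypothetical payment message. The "@context" member maps the short
# field names to globally unambiguous URLs for JSON-LD-aware consumers;
# the context URL and fields below are made up for illustration.
message = {
    "@context": "https://example.org/contexts/payments-v1",
    "amount": "4.35",
    "currency": "USD",
    "destination": "https://merchant.example.com/account/1234",
}

def read_amount(msg):
    """A plain-JSON consumer ignores '@context' entirely and reads
    fields by their short names, so the JSON-LD machinery stays
    optional rather than mandatory for implementers."""
    return (msg["amount"], msg["currency"])

serialized = json.dumps(message)
assert read_amount(json.loads(serialized)) == ("4.35", "USD")
```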

Warning Sign #4: Limited Browser Vendor Cycles

Two weeks before the Web Payments Working Group face-to-face meeting, the Web Payments Community Group editors personally reached out to the Microsoft/Google editors to see why more collaboration wasn’t taking place. Both editors from Microsoft/Google noted that they were just trying to come up to speed, incubate their specifications, and make sure their proposal was solid before the face-to-face meeting. It became clear that they, too, were pressed for time and did not have the bandwidth to review or participate in the counter-proposal discussions.

The Web Payments Community Group specification editors, in a further attempt to compromise, forked the Microsoft/Google specification for the purposes of demonstrating how some aspects of the Web Payments Community Group design could be achieved with the browser vendors’ specification and prepared a presentation to outline four compromises that could be made to get to a merged specification. The proposal received no substantive response or questions from the browser vendors.

Warning Sign #5: Insufficient Notice

Google put forward a registration proposal 12 hours before the Web Payments face-to-face meeting and announced it as a proposal during the face-to-face. This has happened again recently with the Microsoft/Google proposed Web Payments Method Identifiers specification. Instead of adding an issue marker to the specification to add features, as was being done for other pull requests, the editors unilaterally decided to merge one particular pull request that took the specification from three paragraphs to many pages overnight. This happened the day before a discussion about taking that specification to First Public Working Draft.

Clearly, less than 24 hours isn’t enough time to do a proper review before making a decision, and if the Web Payments Community Group editors had attempted something of that nature, it would have been heavily frowned upon.

Things Finally Fall Apart

  • 2016 February – The Web Payments Working Group Face-to-Face Meeting in San Francisco

The Web Payments Community Group specification editors went into the meeting believing that there would be a number of hard compromises made during the meeting, but that a single merged specification would come out as a result of the process. There were presentations by Microsoft/Google and by the Web Payments Community Group before the decision-making process of what to merge began.

For most of the afternoon, discussion went back and forth, with the browser vendors vaguely pushing back on suggestions drawn from the Web Payments Community Group specifications. Most of the arguments stayed high-level instead of engaging directly with specification text. Many of the newer Web Payments Working Group participants were either unprepared for the discussion or did not know how to engage in it.

Further, solid research data seemed to have no effect on the position of the browser vendors. When research data covering multiple large studies on shopping cart abandonment was presented to demonstrate how the Microsoft/Google solution could actually make things like shopping cart abandonment worse, the response was that “the data doesn’t apply to us, because we (the browser vendors) haven’t tried to solve this particular problem”.

After several hours of frustrating discussion, one of the Chairs posed an interesting thought experiment: if the group flipped a coin and picked one specification, were there any Web Payments Working Group members that could not live with the outcome? The lead editor of the Microsoft/Google specification said he could not support the outcome because he could not understand the Web Payments Community Group’s proposals. Keep in mind that one of these proposals was a fork of the Microsoft/Google proposal with roughly four design changes.

At that point, it became clear to me that all of our attempts to engage the browser vendors had been entirely unsuccessful. In the best case, they didn’t have the time to review our specifications in detail and thus didn’t have a position on them other than they wanted to use what was familiar, which was their proposal. Even if it was purely an issue of time, they made no effort to indicate they saw value in reviewing our specifications. In the worst case, they wanted to ensure that they had control over the specification and the Community Group’s technical design didn’t matter as much as them being put in charge of the specifications. The Working Group was scheduled to release a First Public Working Draft of the Web Payments Browser API a few weeks after the meeting, so there was a strong time pressure for the group to make a decision.

In the end, there were only two outcomes that I could see. The first was a weeks-to-months-long protracted fight which had a very high chance of dividing the group further and greatly reducing the momentum we had built. The second was to kill the Web Payments Community Group specifications, give the browser vendors what they wanted (control), and attempt to modify what we saw as the worst parts of the Microsoft/Google proposals into something workable for organizations that aren’t browser vendors.

The Key Things That Went Wrong

After having some time to reflect on what happened during the meeting, here are the most prominent things that went wrong along the way:

  • Prior to the start of the Working Group, we tried, multiple times, to get the browser vendors involved in the Web Payments Community Group. Every attempt failed due to their concerns over Intellectual Property and patents. Only the creation of the Working Group was able to bring about significant browser vendor involvement.
  • The Microsoft/Google specification editors were not engaged in the Web Payments Interest Group and thus had very little context on the problems that we are trying to solve. This group discusses the future of payments on the Web at a much higher level and provides the context for creating Working Groups to address particular problems. Without that context, the overarching goals may be lost.
  • The Microsoft/Google specification editors were not engaged with the Community Group specifications. It was a mistake to never require them to at least review the Community Group specifications and raise issues on them. They were asked to review them and they agreed to, but the Working Group should have put a stop on other work until those reviews were completed. At a minimum, this may have caused some of the ideas to be incorporated into the Microsoft/Google specification once work resumed.
  • The Working Group decided to incubate specifications outside of the Working Group, which meant that they had no control over how those specifications were incubated or the timeline around them. In hindsight, it is clear now that this was not just an experiment, but a dereliction of duty.
  • The Working Group decided to incubate technical specifications between two groups that are grossly unmatched in power: the Web Payments Community Group and the browser vendors. Imagine if we were to throw members of the middle class into the same room as the largest financial companies in the world and ask them to figure out how to extend capital to individuals in the future. Each group would have a different idea on what would be the best approach, but only one of the groups needs to listen to the other.

Two Proposed Solutions to the W3C Browser Incubation Anti-Pattern

We know that the following patterns hold true at W3C:

  • Attempting to propose a browser API with little to no browser vendor involvement from the beginning, regardless of the technical merit of the specification, does not work.
  • Attempting to propose a browser API where the browser vendors do not feel sufficiently in control of the specification does not work.

Giving browser vendors significant control over a browser API specification and providing feedback via pull requests is what currently works best at W3C. The downside here, of course, is that no one but the browser vendors can expect to have a significant voice in the creation of new browser APIs.

There are a few exceptions to the patterns listed above, as some specification editors are more benevolent than others. Some put in more due diligence than others. Some standardization initiatives are less politically charged than others. Overall, there has to be a better way of developing browser API specifications than what we have at W3C now.

The simplest solution to this problem is this:

As a best practice, a Working Group should never have multiple specifications that are still being incubated when it starts. The Working Group should pick either one specification before it starts or start with a blank specification and build from there. All discussion of that specification and decisions made about it should be done out in the open and with the group’s consent.

Barring what is said above, if the group decides to incubate multiple specifications (which is a terrible idea), here is a simple three-step proposal that could have been used to avoid the horror show that was the first couple of months of the Web Payments Working Group:

  1. Any Working Group that decides to incubate multiple specifications that do more or less the same thing, and will attempt to merge later, will always start out with one blank specification under the control of the Working Group.
  2. As editors of competing specifications finish up work on their incubated specifications, they make pull requests on the blank specification with concepts from their competing specification. The pull requests should be:
    • Small in size, integrating one concept at a time
    • Pulled in alternation from both specifications (with priorities on what to pull and when assigned by the originating editors)
    • Commented on by all editors in a timely fashion (or merged automatically after 5 business days; it’s better to have conflicting material in the spec that needs resolution than to ignore the desires of some of the group)
  3. The Working Group Chairs merge pull requests in. If there is a conflict, both options are pulled into the specification and an issue marker is created to ensure that the WG discusses both options and eventually picks one or combines them together.

The proposal is designed to:

  • Ensure that the Working Group is in control of the specification at all times.
  • Ensure that proper due diligence is carried out by all editors during the merging process.
  • Ensure that a mix of ideas are integrated into the specification.
  • Ensure that steady progress can be made on a single specification.

While I doubt the browser vendors will find the proposal better than the current state of things, I don’t think they would find it so unpalatable that they wouldn’t join any group using this procedure. This approach does inject some level of fairness to those participants that are not browser vendors and forces due diligence into the process (from all sides). If the Web is supposed to be for everyone, our specification creation process for browser APIs needs to be available to more than just the largest browser vendors.

Credentials: A Retrospective of Mild Successes and Dramatic Failures

Over the past 20 years, various organizations have tried to “solve” problems related to identity and credentialing on the Web and have been met with varying degrees of mild success, but mostly failure. Understanding what worked and what did not is necessary if we’re going to make progress on identity and credentialing on the Web.

The word identity means different things to different people and is often discussed as a problem waiting to be solved on the Web. In the physical world, we have many identities. We have an identity for work life and home life. We have an identity that we use when we talk with our friends and one that we use when we talk with our families. The concept of identity is as nuanced as it is broad.

There are aspects of our identities that have very little consequence to others, such as whether we have dark brown hair or black hair. There are also aspects of our identities that are vital for proving that we should be able to perform certain tasks, like a driver’s license or nursing license. Then there are aspects of our identities that are important for social reasons, such as the rapport that we build with our friends over multiple decades.

Many aspects of our identity are often expressed via credentials, which can be seen as verifiable statements made by one person or organization about another. There have been multiple attempts at formalizing credentials on the Web; each of them has been met with varying degrees of mild success, but mostly failure. This blog post explores the development goals and system capabilities that would lead to a healthy credentialing ecosystem and why previous attempts to achieve these goals have been met with limited success.

Credentialing Ecosystem Goals

A healthy credentialing ecosystem should have a number of qualities:

  • Credentials should be interoperable and portable. Credentials should be used by as broad of a range of organizations as possible. The recipient of a credential should be able to store, manage, and share credentials throughout their lifetime with relative ease.
  • The ecosystem should scale to the 3 billion people using the Web today and then to the 6 billion people that will be using the Web by the year 2020.
  • The process of exchanging a credential should be privacy enhancing and recipient controlled such that the system protects the privacy of the individual or organization using the credential by placing the recipient in control of who is allowed to access their credential.
  • Implementing systems that issue and consume credentials should be easy for Web developers in order to lower barriers to entry and increase the amount of software solutions in the ecosystem.
  • Creating systems that are accessible should be a fundamental design criterion, as 10% of the world’s population have disabilities and the solution should be usable by as many people as possible.
  • The solution should follow a number of core Web principles such as being patent and royalty-free, adhering to Web architecture fundamentals, supporting network and device independence, and being machine-readable where possible to enable automation and engagement of non-human actors.

Credentialing Capabilities

A solution for a healthy credential ecosystem should have the following capabilities:

  • Extensible Data Model – A data model that supports an entity making an unbounded set of claims about another entity. This enables a very broad applicability of credentials to different use cases and market verticals.
  • Choice of Syntax – A data model that is capable of being expressed in a variety of data syntaxes. This increases interoperability between disparate credentialing systems and increases the long-term viability of the technology.
  • Decentralized Vocabularies – A formal mechanism of expressing new types of claims without centralized coordination. This promotes a high degree of parallel adoption and innovation.
  • Web-based PKI – A digital signature mechanism that does not require out-of-band information to verify the authenticity of claims; instead it should enable public keys to be automatically fetched via the Web during verification. It should not render the signed data opaque because opaque data is harder to learn from, program to, and debug. This makes the digital signature mechanism easier to use for developers and system integrators.
  • Choice of Storage – A protocol for storing a credential at an arbitrary identity provider after it has been issued by an arbitrary issuer. This helps create a level playing field for all actors in the ecosystem.
  • App Integration – A protocol for managing credentials by a recipient using arbitrary 3rd party applications. This promotes a healthy application ecosystem for managing credentials.
  • Privacy-enhanced Sharing – A protocol that enables the recipient to share their credentials without revealing the intended destination to their identity provider. This enhances privacy.
  • Credential Portability – A protocol for migrating from one identity provider to another without the need to reissue each credential. This promotes a healthy identity provider ecosystem.
  • Credential Revocation – A protocol for revocation of a previously issued credential by the credential issuer. This enables issuers to ensure that the credentials they have issued accurately represent their claims.
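The first few capabilities above can be sketched in code. This is an illustrative data model only, not a normative one from any specification; every field name and URL below is an assumption chosen to show the shape of an extensible, issuer-makes-claims-about-subject design:

```python
from dataclasses import dataclass, field

# An illustrative credential: an issuer makes an unbounded set of claims
# about a subject. All names and URLs here are made up for this sketch.
@dataclass
class Credential:
    issuer: str                                  # URL identifying the issuer
    subject: str                                 # URL identifying the entity the claims describe
    claims: dict = field(default_factory=dict)   # open-ended claim name -> value
    revoked: bool = False                        # issuer-controlled revocation state

cred = Credential(
    issuer="https://dmv.example.gov",
    subject="https://alice.example.org/me",
    claims={"licensedDriver": True, "licenseClass": "C"},
)

# Decentralized vocabularies: any party can coin new claim types without
# central coordination by using URLs it controls as claim names.
cred.claims["https://medboard.example.com/vocab#licensedNurse"] = True

assert cred.claims["licensedDriver"] is True
assert cred.revoked is False
```

A real system would add digital signatures over this structure (the Web-based PKI capability) and protocols for storage, sharing, and revocation; the sketch shows only the extensible data model.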

A Foreword on the Analysis

There are a number of existing identity, credentialing, and general authentication solutions that have been deployed in the past and have seen success according to the design goals of the particular solution. The design goals and capabilities listed in the previous sections do not always align with the design goals and capabilities of the systems analyzed below. It is fair to say that some of the analysis below is “unfair,” as some of the systems are being judged against criteria they were not meant to address. The reasons these solutions have been included are that 1) they come up in conversation as potential solutions, 2) they have been successful in meeting some of the criteria listed above, and/or 3) they have failed in important ways that we need to learn from to ensure that a new initiative doesn’t make the same mistakes.

With that said, let’s get on with the analysis.


SAML

The grandparent of these identity initiatives is SAML, an XML-based, open-standard data format for exchanging authentication and authorization data between parties, in particular between an identity provider and a service provider.
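To give a feel for the format, here is a minimal, illustrative SAML 2.0-style assertion built with Python’s standard library. The assertion namespace is the real SAML 2.0 one, but the issuer URL and attribute are made-up examples; real assertions also carry digital signatures, subjects, and validity conditions that are omitted here:

```python
import xml.etree.ElementTree as ET

# The real SAML 2.0 assertion namespace; everything else is illustrative.
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
ET.register_namespace("saml", SAML)

# An identity provider (the Issuer) asserts attributes about a subject.
assertion = ET.Element(f"{{{SAML}}}Assertion")
ET.SubElement(assertion, f"{{{SAML}}}Issuer").text = "https://idp.example.edu"
stmt = ET.SubElement(assertion, f"{{{SAML}}}AttributeStatement")
attr = ET.SubElement(stmt, f"{{{SAML}}}Attribute", Name="eduPersonAffiliation")
ET.SubElement(attr, f"{{{SAML}}}AttributeValue").text = "student"

xml = ET.tostring(assertion, encoding="unicode")
assert "Issuer" in xml and "eduPersonAffiliation" in xml
```

Note how the claim is just a string-named key-value pair inside XML: extensible in principle, but dependent on out-of-band agreement between the identity provider and service provider about what the names mean.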

Capability Rating Summary
Extensible Data Model Problematic XML-based data model. Extensible, but extensibility is rarely used leading to very limited claim types.
Choice of Syntax Poor Only XML is supported.
Decentralized Vocabularies Problematic The use of simple key-value pairs created the possibility of name clashes, limiting decentralized innovation and adoption.
Web-based PKI Problematic Service Providers had to explicitly trust Identity Providers leading to scalability issues.
Choice of Storage Poor Recipients can only have credentials issued and stored via a single identity provider.
App Integration Poor A recipient could only manage their credentials via their identity provider interface.
Privacy-enhanced Sharing Poor Identity providers cannot be prevented from knowing where recipients use their credentials.
Credential Portability Poor Credentials cannot be transferred from one identity provider to another.
Credential Revocation Problematic Credentials can only be issued and revoked by identity providers.

SAML has failed to gain traction outside the education and public service organization sectors and is rarely used as an option to log into non-enterprise websites. SAML isn’t a viable solution for the goals and capabilities listed above because 1) it is hobbled by older technologies such as XML and SOAP, 2) it has scalability issues due to the need to manually set up federations of identity and service providers, 3) it restricts the organizations that can issue valid credentials to identity providers, 4) it enables tracking and violates a number of privacy requirements listed above, and 5) it encourages centralization and super providers as more people use the system because of the administrative overhead of managing the identity and service provider federations as they grow.

Windows CardSpace / Infocard

Microsoft released CardSpace (code named InfoCard) in late 2006. CardSpace stored references to your digital identity, presenting them for selection as visual Information Cards. CardSpace provided a consistent interface designed to help you easily and securely use these identities in applications and websites.

Capability Rating Summary
Extensible Data Model Problematic Extensibility was possible (XML and URLs), but hard coded strings were used for parameter names.
Choice of Syntax Poor Only XML was supported.
Decentralized Vocabularies Problematic URLs were used for parameter values, but the solution was ultimately Windows-only.
Web-based PKI Good XML enveloping signatures with attached X509 certificates were used.
Choice of Storage Good Credentials were stored in an application controlled by the recipient.
App Integration Problematic Credentials were managed by a Windows application.
Privacy-enhanced Sharing Good Credential consumers requested credentials directly from the recipient.
Credential Portability Problematic Credentials could only live on Windows devices with no automatic cross-device synchronization capability (although manual export/import was supported).
Credential Revocation Good Security tokens are not granted for revoked credentials.

Windows CardSpace failed to gain traction and the project was canceled in 2011. The initiative has been replaced with U-Prove, a Microsoft Research project. CardSpace did get a number of things right, but ultimately failed because 1) it was largely a Microsoft-centric solution that required Windows and Active Directory, 2) early adopters that initially backed the project did not feel as if there was a strong near-term demand for a solution and thus didn’t roll out products to support the standard, 3) it didn’t scale to mobile and was largely tied to the desktop, 4) it was a separate product requiring manual installation and not a feature of Internet Explorer, 5) while it supported verified claims through a trusted third party, very few applications surfaced that enabled that optional feature, and 6) it was partly based on the WS-* technology stack, which has been derided as “bloated, opaque, and insanely complex”.


Shibboleth

Shibboleth is a middleware initiative providing an architecture and open-source implementation for identity management and federated identity-based authentication and authorization. It is based on SAML and thus shares many of the same advantages and drawbacks.

Capability Rating Summary
Extensible Data Model Problematic XML-based data model. Extensible, but extensibility is rarely used leading to very limited claim types.
Choice of Syntax Poor Only XML is supported.
Decentralized Vocabularies Problematic The use of simple key-value pairs created the possibility of name clashes, limiting decentralized innovation and adoption.
Web-based PKI Problematic Service Providers had to explicitly trust Identity Providers leading to scalability issues.
Choice of Storage Poor Recipients can only have credentials issued and stored via a single identity provider.
App Integration Poor A recipient can only manage their credentials via their identity provider interface.
Privacy-enhanced Sharing Poor Identity providers cannot be prevented from knowing where recipients use their credentials.
Credential Portability Poor Credentials cannot be transferred from one identity provider to another.
Credential Revocation Problematic Credentials can only be issued and revoked by identity providers.

Shibboleth has failed to gain traction outside the research, education, and public service organization sectors and is rarely used as an option to log into non-enterprise websites. Shibboleth is not a viable solution for the goals and capabilities listed above because 1) it is hobbled by older technologies such as XML and SOAP, 2) it has scalability issues due to the need to manually set up federations of identity and service providers, 3) it restricts the organizations that can issue valid credentials to identity providers, 4) it enables tracking and violates a number of privacy requirements listed above, and 5) it encourages centralization and super providers as more people use the system because of the administrative overhead of managing the identity and service provider federations as they grow.

OAuth 2.0

While not an identity or credentialing solution, OAuth 2.0 often comes up as being in the same class of solution and has been used (by Facebook and OpenID Connect) to achieve an authentication/authorization solution. Introduced in 2006 and finalized as OAuth 2.0 in 2012, the latest version of the framework has wide deployment among Facebook, Google, and Microsoft but suffers from multiple non-interoperable implementations.

Capability Rating Summary
Extensible Data Model Poor Key-value pairs, which require centralized coordination to avoid conflicts.
Choice of Syntax Poor Only string-based key-value pairs are supported.
Decentralized Vocabularies Poor Key-value pairs, which require centralized coordination to avoid conflicts.
Web-based PKI N/A OAuth 2.0 is not designed to perform digital signatures.
Choice of Storage N/A OAuth 2.0 is not designed to express credentials.
App Integration N/A OAuth 2.0 is not designed to manage credentials.
Privacy-enhanced Sharing N/A OAuth 2.0 is not designed to request credentials.
Credential Portability N/A OAuth 2.0 is not designed to port credentials.
Credential Revocation Problematic OAuth 2.0 tokens time out after a while, but operate as bearer tokens (if they are stolen, they can be used by other people for a non-trivial amount of time).

OAuth 2.0 was never designed to achieve the goals and capabilities listed at the beginning of this blog post, and so the comparison above isn’t very fair. That said, there are a number of reasons that OAuth 2.0 is not a good fit for the goals and required capabilities listed above: 1) it is often criticized as being problematic from a security standpoint, overly complex, and favoring large enterprise deployments, 2) it doesn’t have a data model that is flexible enough to model more than the most basic type of credentials, 3) it does not support digital signatures, and 4) it doesn’t solve the credentialing problem described above because it is designed for a completely different set of use cases: providing access to 3rd parties that want to perform certain operations on your account. While OAuth 2.0 is not a solution to the credentialing problem, it can be used as part of a credentialing solution, which is what OpenID Connect does.
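The bearer-token weakness noted in the table can be sketched concretely. The token string below is the example token from RFC 6749; the token store and `authorize` function are illustrative assumptions, not a real OAuth library API:

```python
import time

# Illustrative only: a resource server's view of issued bearer tokens.
# The store layout is an assumption; the token string is the RFC 6749 example.
token_store = {
    "2YotnFZFEjr1zCsicMWpAA": {"scope": "profile", "expires_at": time.time() + 3600},
}

def authorize(presented_token):
    """Bearer semantics: ANY caller presenting a valid, unexpired token
    string is granted access -- the holder's identity is never checked.
    This is why a stolen token remains usable until it expires or is revoked."""
    info = token_store.get(presented_token)
    return info is not None and info["expires_at"] > time.time()

assert authorize("2YotnFZFEjr1zCsicMWpAA") is True   # the legitimate client...
assert authorize("2YotnFZFEjr1zCsicMWpAA") is True   # ...and a thief look identical
assert authorize("stolen-but-unknown-token") is False
```

The expiry check bounds the damage window but does not eliminate it, which is why the table rates revocation as “Problematic” rather than “Poor” or “Good”.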

OpenID Connect

OpenID Connect is a simple identity layer on top of the OAuth 2.0 protocol, which allows clients to verify the identity of an end-user based on the authentication performed by an authorization server, as well as to obtain basic profile information about the end-user in an interoperable and REST-like manner.
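The identity layer is delivered as an ID Token, a JWT whose payload is a base64url-encoded JSON object of claims. The claim names below (`iss`, `sub`, `aud`, `iat`, `exp`) are the standard ones; the values are made up, and this sketch deliberately skips signature verification to focus on the claim structure:

```python
import base64
import json

# Illustrative ID Token claims; claim names are standard, values invented.
claims = {
    "iss": "https://idp.example.com",   # who issued the token
    "sub": "248289761001",              # identifier for the end-user
    "aud": "client-app-123",            # the relying party it was issued to
    "iat": 1500000000,                  # issued-at timestamp
    "exp": 1500003600,                  # expiry timestamp
    "email": "user@example.com",        # basic profile information
}

def b64url(data: bytes) -> str:
    """base64url-encode without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

payload_segment = b64url(json.dumps(claims).encode())

def decode_segment(seg: str) -> dict:
    """Restore stripped '=' padding, then decode a JWT payload segment."""
    padded = seg + "=" * (-len(seg) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

assert decode_segment(payload_segment)["sub"] == "248289761001"
```

Note that the claim names themselves are flat strings, which is why the table below rates the data model as needing either a centralized registry or full URLs for extension claims.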

Capability Rating Summary
Extensible Data Model Problematic JSON is extensible, but needs a single centralized registry for terms or full URLs must be used.
Choice of Syntax Poor Only JSON is supported.
Decentralized Vocabularies Poor Extensions require a single centralized registry for terms or full URLs must be used.
Web-based PKI Poor URLs can be used for JWK key IDs, but the mechanism to do discovery isn’t specified in the specifications.
Choice of Storage Poor Credential information is not portable.
App Integration Problematic Recipients can only manage credentials via their identity provider.
Privacy-enhanced Sharing Poor Identity providers can track all logins.
Credential Portability Poor Credentials are not portable between identity providers.
Credential Revocation Problematic Distributed credentials can be revoked at any time, but are rarely used.

The OpenID initiative started in 2005. Today Google, Microsoft, Deutsche Telekom, and SalesForce use the OpenID Connect protocol to support federated login, but Facebook and Twitter do not. OpenID Connect has been fairly successful in creating a Web-scale single-sign on solution, but it fails to address the goals and required capabilities for the following reasons: 1) it depends heavily on centralized registries to define types of credentials, 2) credentials issued by 3rd parties and digital signatures are rarely used (there is no rich 3rd party credential ecosystem), 3) there is no protocol for transferring credentials from one provider to another, 4) it enables tracking and other privacy violations, and 5) it promotes centralization and super-providers due to a reliance on email-addresses-as-identifiers, ensuring that email super providers become the new credential super providers.


WebID+TLS

WebID+TLS, previously known as FOAF+SSL, is a decentralized and secure authentication protocol built on W3C standards around Linked Data, utilizing client-side X.509 certificates and TLS to bootstrap the identity discovery process.

Capability Rating Summary
Extensible Data Model Good WebID is based on RDF which is a provably extensible data model.
Choice of Syntax Good WebID is based on RDF and thus supports multiple syntaxes.
Decentralized Vocabularies Good All data is expressed using vocabularies, which can be created by anyone. Data merging is designed to happen automatically.
Web-based PKI Problematic All claims in a WebID are self-issued claims. No clear specification for 3rd party claims.
Choice of Storage Good WebIDs can be stored at the recipient’s preferred location.
App Integration Problematic Possible, but no clear specification exists to do so.
Privacy-enhanced Sharing Problematic Identity Providers can track logins by default (unless a mixnet is used).
Credential Portability Problematic Self-issued credentials can be ported easily. If there were 3rd party credentials tied to WebID URL, those credentials would not be portable.
Credential Revocation Problematic Revocation is possible, but there is no clear specification for doing so.

WebID+TLS was first presented for the W3C Workshop on the Future of Social Networking in 2009, but has failed to gain any significant traction in the past six years. WebID+TLS is also problematic because 1) it depends on browser client-side certificate handling, which is a bad UI experience, 2) it depends on the KEYGEN HTML element in a non-trivial way, which is currently under discussion to be removed from certain browsers, and 3) it is not well specified how you achieve many of the credential goals and required capabilities listed in the beginning of this post.

Mozilla Persona

Persona was launched in 2011 and shares some goals with similar authentication systems like OpenID or Facebook Connect, but it differed in several important ways: 1) it used email addresses as identifiers, 2) it was more focused on privacy, and 3) it was intended to be fully integrated into the browser.

Capability Rating Summary
Extensible Data Model N/A Not a design requirement.
Choice of Syntax N/A Not a design requirement.
Decentralized Vocabularies N/A Not a design requirement.
Web-based PKI Problematic Public keys are discoverable but content is largely hidden.
Choice of Storage Poor Identity provider is tied to login domain.
App Integration N/A There are no credentials to manage.
Privacy-enhanced Sharing Good Logins are privacy-enhancing and not easily trackable.
Credential Portability Poor There are no credentials to port.
Credential Revocation N/A There are no credentials to revoke.

Persona is solely an authentication system and was not designed to support arbitrary credentials. Mozilla was hoping for widespread adoption of the protocol, but that did not occur because super providers like Google and Facebook had already developed their own competing login mechanisms. All full-time developers were pulled from the project in 2014.

Omitted Technologies

There are a number of technologies, such as U-Prove, the WebAppSec Login/Credentials API, and Mozilla’s Firefox Account API, that were not included in this analysis because they are still very early in the research and development phases. APIs like Facebook Connect and Login with Google were not included because they are not intended to be open standards.


There are a number of goals and required capabilities that need to be fulfilled in order to create a vibrant credentialing ecosystem. There have been various attempts at addressing subsets of the problems in the ecosystem, and those solutions have been met with small successes and varied failures. There still is no widely deployed and adopted way of issuing, storing, managing, and transmitting credentials on the Web today, but we do have quite a bit of insight into why the prior attempts at solving the general identity/credentialing problem have failed.

These findings lead to a simple question: Is it time to do something about Credentials on the Web?

The Case for Standardizing Credentials via W3C

Credentials solve a variety of real authentication problems on the Web today, and we can implement them in a way that doesn't threaten anonymity or privacy while avoiding the mistakes made in previous attempts. The W3C is the right place to do this important work.

Over the past two years a growing number of organizations in education, finance, and healthcare have been gathering in W3C Community Groups around the concept of a standard format for credentials on the Web.

A credential is a qualification, achievement, quality, or piece of information about an entity’s background, typically used to indicate suitability. A credential could include information such as a name, home address, government ID, professional license, or university degree.

We use credentials when we show our shopper's card to get discounts at the grocery store, when we show a driver's license to order a drink at the bar, and when we show our passport as we enter a foreign country. The use of credentials to demonstrate capability, membership, status, and minimum legal requirements is something that we do on a regular basis as we go about our lives.

A variety of identity and authentication solutions exist today to perform things like single-sign on and limited attribute verification, but they are not widely deployed enough to be ubiquitous on the Web. As a result, these problems exist in the world today because it’s difficult to provide a flexible, digitally verifiable credential:

  • In 2014, 12.7 million U.S. consumers experienced identity fraud losses of over $16 billion.
  • Over the last 5 years, more than $100B in identity-related fraud losses have been detected in the United States alone. That number is in the several hundreds of billions of dollars worldwide.
  • The cost to law enforcement ranges from $15,000 to $25,000 to investigate each identity theft case; many of them are not investigated. Victims spend, on average, 600 hours trying to clear damaged credit or even criminal records caused by the thief.
  • In 2010, 7 million people self-reported illegal use of prescription drugs in the previous month in the US. These drugs are often acquired through the use of faked credentials. The healthcare costs alone of nonmedical use of prescription opioids – the most commonly misused class of prescription drugs – are estimated to total $72.5 billion annually.
  • Educational and professional service testing fraud has led to multiple hundred million dollar losses when test takers are not who they say they are and the testing agencies are held accountable as a result.
  • Forty thousand legitimate Ph.D.s are awarded annually in the U.S., while 50,000 spurious Ph.D.s are purchased; that is, more than half of new Ph.D. degrees are fake, and more than 500,000 working Americans have a fraudulent degree. More than 25% of job applicants inflate their educational achievements on their resumes.

So, the simple question has been raised in the W3C community: Is it time to do something about Credentials and if so, can W3C add value to the standardization process? Or is this already a solved problem?

Myriad Credentials

There are many different types of credentials in the world and many industry verticals that use credentials in a variety of different ways.

There are many categories of credentials that have been identified as important to society; here is a sample of them:

Education
Academic credentials and co-curricular activities are recognized and exchanged among learners, institutions, employers, or consumers.
Workforce
A worker's certified skill or license is a condition for employment, professional development, and promotability.
Civil Society
Access to social benefits and contracts may be based on verifiable conditions such as marital status.
Healthcare
Ensuring that medical professionals are properly licensed to write prescriptions for controlled substances or provide certain services to patients.
Finance and Legal
Regulations require the proper identification of parties in high value transactions. The legal right to purchase a product depends on the verifiable age or location of the buyer.

The use of credentials is a big part of how our society operates. In order to standardize their usage, it’s important that a generalized mechanism is used to express all the data associated with the types of credentials above. To put it another way, the solution shouldn’t be specific to each category above because there are too many different types of credentials for a single group to standardize. Rather, the solution should be a generalized mechanism that enables verifiable claims to be made about an entity where the contents of the credential can be specified by a specific industry or market vertical (such as university entrance testing, or pharmaceutical distribution, or wealth management).

HTML, CSS, XML, and JSON-LD are examples of formats that have been used by many different market verticals to encode data. WebRTC, Geolocation, Drag and Drop, and Web Audio are examples of new ways of encoding and exchanging data over the Web that have found use in a variety of industries. Each of these technologies was standardized at the W3C.

The purpose of this post is to help build a shared understanding of the use cases for credentials and, in doing so, help accelerate work at W3C. What follows is a list of concerns and hesitations that have been raised over the past several months. It is helpful to discuss these concerns as part of a larger conversation around use cases and technical merits related to the work.

Hesitation #1: The End of Anonymity on the Web

The spectrum of identity needs across diverse use cases suggests there will be different kinds of credentials to match strong privacy use cases and strong identity use cases.

One of the hesitations expressed by privacy proponents is that if we make it easy to identify people using credentials, we are going to lose anonymity on the Web. The Web started as a largely anonymous medium of communication. In the early days you could jump from site to site without having to identify yourself or expose any personally identifying information. Things have changed for the worse since then with the advent of IP tracking, evercookies, device fingerprinting, browser fingerprinting, email addresses as identifiers, analytics packages, and ad-driven business models. One could argue that anonymity on the Web was lost long ago.

Whether or not you believe that anonymity on the Web is a real thing today, there is a strong argument to not make things worse. Where strong privacy is required, bearer credentials can be used. Where strong identity assurances are required, we can use tightly-bound credentials.

A tightly-bound credential contains a set of claims that are associated with an identifier, effectively stating things like “John Doe is over the age of 21 — signed by some mutually trusted organization or person”. The “John Doe” part of the credential is the problem, because if you provide that credential to multiple websites, you can be tracked across those websites because they know who you are. A tightly-bound credential, however, can contain a link to retrieve a bearer credential. A bearer credential is a short lived, untrackable (by itself) credential that effectively states things like “The holder of this credential is over the age of 21 — signed by some mutually trusted organization or person”.

Bearer credentials do have downsides. Sophisticated attackers can intercept them and replay them across multiple sites. For this reason, bearer credentials have a very short lifetime and may not be accepted by credential consumer sites that require a stronger type of authentication such as a tightly-bound credential. That said, there are many websites where using bearer credentials to verify things like age or postal code should be acceptable.
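One way a consumer site might enforce that short lifetime is a simple freshness check. This is a sketch only; the ISO 8601 `issued` field name and the five-minute window are assumptions, not part of any specification:

```javascript
// Reject bearer credentials older than a few minutes to limit replay.
// The `issued` field and the default window are assumptions for this sketch.
function isFresh(credential, maxAgeMs = 5 * 60 * 1000) {
  const issuedAt = Date.parse(credential.issued);
  return Number.isFinite(issuedAt) && Date.now() - issuedAt < maxAgeMs;
}

const bearerCredential = {
  claim: { ageOver: 21 },
  issued: new Date().toISOString()
};
console.log(isFresh(bearerCredential)); // true
```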

Do bearer credentials ensure anonymity on the Web? No. All you need to do to defeat them is ask the entity holding the bearer credential for their email address, and you have a universally trackable identifier for them. What bearer credentials do, however, is help avoid making the tracking problem worse.

Hesitation #2: Credentials as a Gateway Drug

If an easy mechanism exists to ask for personally invasive credentials, it does not necessarily follow that every website will ask for those credentials or that people will willingly hand them over.

Another concern that is often raised is that if there is a good way to strongly identify people via credentials, and the mechanism is easy to use, more websites will require credentials in order to use their services. This is certainly a concern, and predicting what will happen here is difficult, particularly because there are many market forces at work.

Websites have varying degrees of utility to people. Depending on each site's utility, people are willing to give up more or less of their personal information to access the site. A website that is effectively a collection of cat pictures will probably not convince many people to give up their email address in exchange for more cat pictures. In contrast, a personal banking website will most likely be able to convince you to provide far more personally identifiable information. This dynamic will most likely not change as long as it's clear what information is being transmitted to each site via a credential.

Ultimately, this choice is up to the person sitting behind the browser and the choice must be presented in such a way as to ensure informed consent before the credential is transmitted.

Hesitation #3: We’ve Tried This Before

While it may sound like this has been tried before, the capabilities required to meet the identified goals are more advanced than what the state of the art today supports.

The third major hesitation for not starting the Credentials work as an official work item at W3C is a misconception that many of the identified goals and capabilities are the same as previous attempts at a solution. SAML, InfoCard, Shibboleth, OAuth, OpenID Connect, WebID+TLS, Mozilla Persona; none have solved the credentialing problem stated at the beginning of this blog post in a significant way.

A detailed analysis of each of these initiatives has been performed in a separate blog post titled Credentials: A Retrospective of Mild Successes and Dramatic Failures. Here’s a summary of the state of the art ranked against desired capabilities:

The comparison covered SAML 2.0, OAuth 2.0, OpenID Connect, and Mozilla Persona, each rated against the following capabilities: Extensible Data Model, Choice of Syntax, Decentralized Vocabularies, Web-based PKI, Choice of Storage, App Integration, Privacy-Enhanced Sharing, Credential Portability, and Credential Revocation.

The primary finding of the analysis above is simple: there still is no widely deployed and adopted way of exchanging credentials on the Web today, but we do have quite a bit of insight into why the prior attempts at solving the general identity/credentialing problem have failed.

Hesitation #4: There Is Nothing to Do

While the problem may seem trivial on the surface, to solve it correctly requires carefully understanding the current gaps in the Open Web Platform.

Another assertion that has been made before is that all of the technology necessary to create a robust open credentialing ecosystem on the Web already exists and it is just a simple matter of stringing HTTP, JSON, JavaScript Object Signing and Encryption (JOSE), OAuth, OpenID Connect, and a few other technologies together to create the solution. Thus, there is nothing for W3C to do as the problem is more or less solved, and developers only need to be educated about the proper solution.

Clearly, we should reuse existing technologies wherever possible to build a single coherent, stable solution. We do not want to reinvent the wheel.

If it is true that all the technologies already exist, we could quickly solve the problem and move on to other pressing problems on the Web. Unfortunately, this view is not held by the majority of the community that has had contact with the credentialing problem. Participants in the community have tried and failed to implement credentialing solutions using all of the technologies listed at the beginning of this section. It is true that pieces of the solution exist across multiple standardization organizations, but a set of standards for a vibrant credentialing ecosystem do not exist yet as detailed in this blog post: Credentials: A Retrospective of Mild Successes and Dramatic Failures.

We use credentials in our daily lives. We often receive them from one party and then share them with another. If this is a solved problem on the Web, then why is there no similarly vibrant ecosystem of disparate but interoperable Web-based credentialing systems? Where is the specification that outlines how to express a digitally signed credential? Or the protocol for issuing a credential to a storage location? Or the protocol for requesting a credential?

Even if all of the core technologies exist, they have yet to be put together into a coherent, secure, easy to integrate solution for the Web and that sounds like something where the W3C could add value.


This work is worth doing at the W3C because the Open Web Platform lacks functionality that society depends on to run its education, healthcare, finance, and government sectors. It's worth doing because fraud is a big problem on the Web for high-stakes exchanges, and the problem is only getting worse. It's worth doing because the W3C has a platform that 3 billion people around the world use, and we could solve this problem at an international scale if successful. It's worth doing because we plan to add another 3 billion people to the Web in the next 5 years, many of them lacking the basic credentials they need to participate in society, and we can improve upon that condition. It's worth doing because the W3C membership has solved problems like this before, and has changed the world for the better in doing so.

Web Payments: The Architect, the Sage, and the Moral Voice

Three billion people use the Web today. That number will double to more than six billion people by the year 2020. Let that sink in for a second; in five years, the Web will reach 90% of all humans over the age of 6. A very large percentage of these people will be using their mobile phones as their only means of accessing the Web.

TL;DR: In 2015, the World Wide Web Consortium (The Architect), the US Federal Reserve (The Sage), and the Bill and Melinda Gates Foundation (The Moral Voice) have each started initiatives to dramatically improve the world's payment systems. The organizations are highly aligned in their thinking. Imagine what we could accomplish if we joined forces.

The Problem

With the power of the Web, we can send a message from New York to New Delhi in the blink of an eye. Millions of people around the globe can read a story published by a citizen journalist, writing about a fragile situation, from a previously dark corner of the world. Yet, when it comes to exchanging economic value, the Web has not fulfilled the promise it delivered for exchanging information.

While it costs fractions of a penny to send a message around the globe, on average, it costs tens of thousands of times that to send money the same distance. Furthermore, two and a half billion adults on this planet do not have access to modern financial infrastructure, which places families in precarious situations when a shock such as a medical emergency hits the household. Worse, it leaves these families in a vicious cycle of living hand-to-mouth, unable to fully engage in the local economy much less the global one.

The Architect

What if we were able to use the most ubiquitous communication network that exists today to move economic value in the same way that we move instant messages? What if we could lower the costs to the point that we could pull those 2.5 billion people out of the precarious situation in which they’re operating now? Doing this would have dramatically positive financial and societal effects for those people as well as the people and businesses operating in industrialized nations.

That’s the premise behind the Web Payments Activity at the World Wide Web Consortium (W3C). The W3C, along with its 400 member organizations, standardized the Web and is one of the main reasons you’re able to view this web page today from wherever you’re sitting on this planet of ours. For the last five years or so, a number of us have been involved in trying to standardize the way money is sent and received by building that mechanism into the core architecture of the Web.

It has been a monumental effort, and we're very far from being done, but the momentum we've gained so far has far exceeded our predictions. For example, here is just a sampling of the organizations involved in the work: Bloomberg, Google, The US Federal Reserve, Alibaba, Tencent, Apple, Opera, Target, Intel, Deutsche Telekom, Ripple Labs, Oracle, Yandex… the list goes on. What was a pipe dream a few years ago at W3C is a very real possibility today. There is a strong upside here for customers, merchants, financial institutions, and developers.

The Sage

At one time, the US had the most advanced payment system in the world. One of the problems with being first is that you quickly start accruing technical debt. Today, the US payment system rates among the worst in the world in many of the categories used to evaluate these sorts of things. For the last several years, the US Fed has been running an initiative to improve the state of the US payment ecosystem.

Two of the US Fed’s strengths are 1) its ability to pore over massive amounts of research on our financial system and produce a cohesive summary of the state of the US financial system, and 2) its ability to convene a large number of financial players around a particular topic of interest. Its call for papers on ideas for improving the US payment system resulted in 190 submitted papers and a coherent strategy for fixing the US payment system. Its most recent Faster Payments Task Force has attracted over 320 organizations that will be attempting to propose systems to fix a number of the US payment system’s rough spots.

If we are going to try to upgrade the payment systems in the world, it’s important to be able to make decisions based on data. The research and convening ability of the US Fed is a powerful force and the W3C and US Fed are already collaborating on the Web Payments work. The plan should be to deepen these relationships over the next couple of years.

The Moral Voice

The Bill and Melinda Gates Foundation just announced the LevelOne Project, which is an initiative to dramatically increase financial inclusion around the world by building a system that will work for the 2.5 billion people that have little to no access to financial infrastructure in their countries. This isn’t just a developing world problem. At least 30% of people in places like the United States and the European Union don’t have access to modern financial infrastructure.

The Gates Foundation has just proposed a research-backed formula for success for launching a new payment system designed to foster financial inclusion, and here’s where it gets interesting.

The Collaboration

Building the next generation payments system for the world requires answering the ‘what’, ‘how’, and ‘why’. The organizations mentioned previously will play a crucial role in elaborating on those answers. The US Fed (the sage) can influence what we are building; they can explain what has been and what should be. The W3C (the architect) can influence how we build what is needed; they can explain how all the pieces fit together into a cohesive whole. Finally, the Gates Foundation (the moral voice) can explain the why behind what we are building in the way that we’re constructing it.

I’ve had the great pleasure of working with the people in these initiatives over the past several years. Aside from everyone I’ve spoken with being deeply dedicated to the task at hand, I can also say from first-hand experience that there is a tremendous amount of alignment between the three organizations. It’ll take time to figure out the logistics of how to most effectively work together, but it is certainly something worth pursuing. At a minimum, each organization should be publicly supportive of each other’s work. My hope is that the organizations start to become deeply involved with each other where it makes sense to do so.

The first opportunity to collaborate in person is going to be the Web Payments face-to-face meeting in New York City happening on June 16th-18th 2015. W3C and US Fed will be there. We need to get someone from the Gates Foundation there.

If this collaboration ends up being successful, the future is looking very bright for the Web and the 6 billion people that will have access to Web Payments in a few years’ time.

Three Important Upcoming Web Payments Events

The next two months will bring three important events related to the Web Payments work at W3C:

  1. The deadline for voting in favor of an official Web Payments Activity at W3C is October 10th, 2014. If you know a W3C Member company, be sure to push them to vote in favor of the initiative via the online form. If you have any questions, contact Stephane Boyera <>.
  2. The first face-to-face meeting for the official Web Payments work will be October 27th and 28th at the W3C Technical Plenary, as long as the Web Payments Activity is approved (see #1). Make sure you have registered for W3C TPAC on those dates if you want to participate and are a W3C member. Early bird registration is ending soon. If you have any questions, contact Stephane Boyera <>.
  3. The Web Payments use cases are being finalized this week in preparation for a vote by the Web Payments Community Group. Make sure you review the use cases and weigh in on those that concern you by October 3rd, 2014. If you have any further questions, contact Manu Sporny <>.

If you want more information about any of these upcoming events, feel free to ask about them in the comments below, or email me directly. Details about each one of these events can be found below.

The Web Payments Activity Vote

The W3C is a member organization composed of roughly 389 technology companies, financial institutions, universities, governments, and NGOs. In order to start work officially, there is a process that is followed. Typically the W3C holds a workshop to get input, proposes a charter for one or more groups, and then has the membership vote on the charter.

The plan to initiate the Web Payments work followed this tried and true process. The world’s first Web Payments Workshop happened in March of this year in Paris. A charter for a Web Payments Interest Group was then proposed by the W3C and later refined by the workshop participants over the summer of 2014. The Web Payments Interest Group’s purpose is to identify gaps in the Web Platform with respect to payments and propose technical Working Groups to address those gaps.

This is all happening under a larger initiative at W3C called the Web Payments Activity. As the last set of pre-vote comments trickled in, the W3C opened the Web Payments Activity to an official vote by the W3C membership which is currently open (vote now!) and will close on October 10th 2014.

The W3C Technical Plenary

The W3C holds a technical plenary every year, usually led by a W3C host organization. Last year it was in Shenzhen, China. This year it will be in Santa Clara, California from October 27th-31st. If you want to be involved in the first Web Payments face-to-face meeting, register now for at least the first two days of the plenary. This meeting will set the tone and work plan for much of the next year. As an added bonus, this year is the 25th anniversary of the Web and the 20th anniversary of W3C. There will be a big gala dinner that you will not want to miss. It’s open to the public, so if you’re in Silicon Valley during that time, you may want to get your tickets now.

Web Payments Use Cases

The Web Payments Community Group has been hard at work organizing and refining the use cases that were raised at the Web Payments Workshop. The Community Group has spent close to 18 hours of teleconference time removing unnecessary cruft, more clearly expressing the core needs of each use case, and debating the merits of addressing each use case in the first iteration or latter iterations of the Web Payments work. The current list is going to go through a final review this week and then will be voted on by the community group to be formalized in a document and sent as input into the Web Payments Interest Group. If you would like to send your comments into the Web Payments Community Group, please do so by October 3rd, 2014.

Github adds JSON-LD support in core product

Github has just announced that they’ve added JSON-LD and support. Now if you get a pull request via email, and your mail client supports it (like Gmail does), you’ll see an action button in the display without having to open the email, like so:

How does Github do this? With Actions, which use JSON-LD markup. When you get an email from Github, it will now include markup that looks like this:

<script type="application/ld+json">
{
  "@context": "",
  "@type": "EmailMessage",
  "description": "View this Pull Request on GitHub",
  "action": {
    "@type": "ViewAction",
    "name": "View Pull Request"
  }
}
</script>
Programs like Gmail then take that JSON-LD and translate it into a simple action that you can take without opening the email. Do you send emails to your customers? If so, Meldium has a great HOW-TO on integrating actions with your customer emails.
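On the receiving side, the client's job is straightforward: find the ld+json script block in the email body and parse it. A minimal sketch, using a reduced stand-in for a real notification email:

```javascript
// Sketch of what a mail client does with action markup: extract the
// ld+json script from the email body and read the declared action.
const emailHtml = `
  <p>You have a new pull request.</p>
  <script type="application/ld+json">
  {
    "@type": "EmailMessage",
    "description": "View this Pull Request on GitHub",
    "action": { "@type": "ViewAction", "name": "View Pull Request" }
  }
  </script>`;

const match = emailHtml.match(
  /<script type="application\/ld\+json">([\s\S]*?)<\/script>/);
const markup = JSON.parse(match[1]);
console.log(markup.action.name); // "View Pull Request"
```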

Hooray for Github,, and JSON-LD!

Identity Credentials and Web Login

In a previous blog post, I outlined the need for a better login solution for the Web and why Mozilla Persona, WebID+TLS, and OpenID Connect currently don’t address important use cases that we’re considering in the Web Payments Community Group. The blog post contained a proposal for a new login mechanism for the Web that was simultaneously more decentralized, more extensible, enabled a level playing field, and was more privacy-aware than the previously mentioned solutions.

In the private conversations we have had with companies large and small, the proposal was met with a healthy dose of skepticism and excitement. There was enough excitement generated to push us to build a proof-of-concept of the technology. We are releasing this proof-of-concept to the Web today so that other technologists can take a look at it. It’s by no means done; there are plenty of bugs and security issues that we plan to fix in the next several weeks, but the core of the idea is there and you can try it out.

TL;DR: There is now an open source demo of credential-based login for the Web. We think it’s better than Persona, WebID+TLS, and OpenID Connect. If we can build enough support for Identity Credentials over the next year, we’d like to standardize it via the W3C.

The Demo

The demonstration that we’re releasing today is a proof-of-concept asserting that we can have a unified, secure identity and login solution for the Web. The technology is capable of storing and transmitting your identity credentials (email address, payment processor, shipping address, driver’s license, passport, etc.) while also protecting your privacy from those that would want to track and sell your online browsing behavior. It is in the same realm of technology as Mozilla Persona, WebID+TLS, and OpenID Connect. Benefits of using this technology include:

  • Solving the NASCAR login problem in a way that greatly increases identity provider competition.
  • Removing the need for usernames and passwords when logging into 99.99% of the websites that you use today.
  • Auto-filling information that you have to repeat over and over again (shipping address, name, email, etc.).
  • Solving the NASCAR payments problem in a way that greatly increases payment processor competition.
  • Storage and transmission of credentials, such as email addresses, driver’s licenses, and digital passports, via the Web that cryptographically prove that you are who you say you are.

The demonstration is based on the Identity Credentials technology being developed by the Web Payments Community Group at the World Wide Web Consortium. It consists of an ecosystem of four example websites. The purpose of each website is explained below:

Identity Provider

The Identity Provider stores your identity document and any information about you including any credentials that other sites may issue to you. This site is used to accomplish several things during the demo:

  • Create an identity.
  • Register your identity with the Login Hub.
  • Generate a verified email credential and store it in your identity.

Login Hub

This site helps other websites discover your identity provider in a way that protects your privacy from both the website you’re logging into and your identity provider. Eventually the functionality of this website will be implemented directly in browsers, but until that happens, it is used to bootstrap identity provider discovery and the login/credential transmission process. This site is used to do the following things during the demo:

  • Register your identity, creating an association between your identity provider and the email address and passphrase you use on the login hub.
  • Login to a website.

Credential Issuer

This site is responsible for verifying information about you like your home address, driver’s license, and passport information. Once the site has verified some information about you, it can issue a credential to you. For the purposes of the demonstration, all verifications are simulated and you will immediately be given a credential when you ask for one. All credentials are digitally signed by the issuer which means their validity can be proven without the need to contact the issuer (or be online). This site is used to do the following things during the demo:

  • Login using an email credential.
  • Issue other credentials to yourself like a business address, proof of age, driver’s license, and digital passport.

Single Sign-On Demo

The single sign-on website, while not implemented yet, will be used to demonstrate the simplicity of credential-based login. The sign-on process requires you to click a login button, enter your email and passphrase on the Login Hub, and then verify that you would like to transmit the requested credential to the single sign-on website. This website will allow you to do the following in a future demo:

  • Present various credentials to log in.

How it Works

The demo is split into four distinct parts. Each part will be explained in detail in the rest of this post. Before you try the demo, it is important that you understand that this is a proof-of-concept. The demo is pretty easy to break because we haven’t spent any time polishing it. It’ll be useful for technologists who understand how the Web works. It has only been tested in Google Chrome, versions 31–35. There are glaring security issues with the demo whose solutions have not been implemented yet due to time constraints. We wanted to publish our work as quickly as possible so others could critique it early rather than sitting on it until it was “done”. With those caveats clearly stated up front, let’s dive into the demo.

Creating an Identity

The first part of the demo requires you to create an identity for yourself. Do so by clicking the link in the previous sentence. Your short name can be something simple like your first name or a handle you use online. The passphrase should be something long and memorable that is specific to you. When you click the Create button, you will be redirected to your new identity page.

Note the text displayed in the middle of the screen. This is your raw identity data in JSON-LD format. It is a machine-readable representation of your credentials. In the beginning, it contains only three pieces of information. The first is the JSON-LD @context value, which tells machines how to interpret the information in the document. The second is the id value, which is the location of this particular identity on the Web. The third is the sysPasswordHash, which is just a bcrypt hash of your login password for the identity website.

Global Web Login Network

Now that you have an identity, you need to register it with the global Web login network. The purpose of this network is to help map your preferred email address to your identity provider. Keep in mind that in the future, the piece of software that will do this mapping will be your web browser. However, until this technology is built into the browser, we will need to bootstrap the email to identity document mapping in another way.

Mozilla Persona and OpenID handle this mapping in fairly similar ways. OpenID assumes that the domain of your login name is your identity provider. Mozilla Persona went a step further by saying that if your email provider wouldn’t vouch for your email address, Mozilla would. So Persona would first check to see if your email provider spoke the Persona protocol, and if it didn’t, the burden of validating the email address would fall back to Mozilla. This approach put Mozilla in the unenviable position of running a lot of infrastructure to keep the entire system up and running.

The Identity Credentials solution goes a step further than Mozilla Persona and states that you are the one who decides which identity provider your email address maps to. Your email address can be hosted by one company while an entirely different site serves as your identity provider. You can probably imagine that this makes the large identity providers nervous, because it means that they now have to compete for your business. You have the choice of who your identity provider will be, regardless of what your email address is.

So, let’s register your new identity on the global web login network. Click the text on the screen that says “Click here to register”. That will take you to the login hub website, which serves two purposes. The first is to map your preferred email address to your identity provider. The second is to protect your privacy as information travels from your identity provider to other websites on the Internet (more on this later).

You should be presented with a screen that asks you for three pieces of information: your preferred email address, a passphrase, and a verification of that passphrase. When you enter this information, it will be used to do a number of things. First, a public/private keypair will be generated for the device that you’re using (your web browser, for instance). This keypair will be used as a second factor of authentication in later steps of this process. Second, your email address and passphrase will be used to generate a query token, which will later be used to query the decentralized Telehash-based identity network. Third, your query-token-to-identity-document mapping will be encrypted and placed onto the Telehash network.

The Decentralized Database (Telehash)

We could spend an entire blog post on Telehash alone, but the important thing to understand is that it provides a mechanism to store data in a decentralized database and query that database for the data at a later time. By storing this query token and query response in the decentralized database, we can find your identity provider mapping regardless of which device you’re using to access the Web and independent of who your email provider is.

Note that I said you use your “preferred email address” above. It doesn’t actually need to be an email address; it could be a simple string like “Bob” and a unique passphrase. Even though there are many “Bob”s in the world, it is unlikely that two of them would pick the same 20+ character passphrase, so a first name plus a complex passphrase would work. That said, we suggest that most non-technical people use a preferred email address, because most people won’t understand the danger of SHA-256 collisions on weak username+passphrase combinations like sha256(“Bob” + “password”). As an aside, the decentralized database solution doesn’t need to be Telehash. It could just as easily be a decentralized ledger like Namecoin or Ripple.

Once you have filled out your preferred email address and passphrase, click the Register button. You will be sent back to your identity provider and will see three new pieces of information. The first is sysIdpMapping, which is the decentralized database query token (query) and passphrase-encrypted mapping (queryResponse). The second is sysDeviceKeys, which is the public key associated with the device through which you registered your identity and which will be used as a second factor of authentication in later versions of the demo. The third is sysRegistered, which is an indicator that the identity has been registered with the decentralized database.

Acquiring an Email Credential

At this point, you can’t really do much with your identity since it doesn’t have any useful credential information associated with it. So, the next step is to put something useful into your identity. When you create an account on most websites, the first thing the website asks you for is an email address. It uses this email address to communicate with you. The website will typically verify that it can send and receive an email to that address before fully activating your account. You typically have to go through this process over and over again, once for each new site that you join. It would be nice if an identity solution designed for the Web would take care of this email registration process for you. For those of you familiar with Mozilla Persona, this approach should sound very familiar to you.

The Identity Credentials technology is a bit different from Mozilla Persona in that it enables a larger number of organizations to verify your email address than just your email provider or Mozilla. In fact, we see a future where there could be tens, if not hundreds, of organizations that could provide email verification. For the purposes of the demo, the Identity Provider will provide a “simulated verification” (aka fake) of your email address. To get this credential, click on the text that says “Click here to get one”.

You will be presented with a single input field for your email address. Any email address will do, but you may want to use the preferred one you entered earlier. Once you have entered your email address, click “Issue Email Credential”. You will be sent back to your identity page and you should see your first credential listed in your JSON-LD identity document beside the credential key. Let’s take a closer look at what constitutes a credential in the system.

The EmailCredential is a statement that a 3rd party has done an email verification on your account. Any credential that conforms to the Identity Credentials specification is composed of a set of claims and a signature value. The claims tie the information that the 3rd party is asserting, such as an email address, to the identity. The signature is composed of a number of fields that can be used to cryptographically prove that only the issuer of the credential was capable of issuing this specific credential. The details of how the signature is constructed can be found in the Secure Messaging specification.

Now that you have an email credential, you can use it to log into a website. The next demonstration will use the email credential to log into a credential issuer website.

Credential-based Login

Most websites will only require an email credential to log in. There are other sites, such as ecommerce sites or high-security websites, that will require a few more credentials to successfully log in or use their services. For example, an ecommerce site might require your payment processor and shipping address to send you the goods you purchased. A website that sells local wines might request a credential proving that you are above the required drinking age in your locality. A travel website might request your digital passport to ease your security clearing process if you are traveling internationally. There are many more types of specialty credentials that one may issue and use via the Identity Credentials technology. The next demo will entail issuing some of these credentials to yourself. However, before we do that, we have to log in to the credential issuer website using our newly acquired email credential.

Go to the credential issuer website and click on the “Login” button. This will immediately send you to the login hub website where you previously registered your identity. The request sent to the login hub by the credential issuer is effectively a request for your email credential. Once you’re on the login hub, enter your preferred email address and passphrase and then click “Login”.

While you were entering your email address and passphrase, the page connected to the Telehash network and readied itself to send a query. When you click “Login”, your email address and passphrase are SHA-256’d and sent as a query to the Telehash network. Your identity provider will receive the request and respond to the query with an encrypted message that will then be decrypted using your passphrase. The contents of that message will tell the login hub where your identity provider is holding your identity. The request for the email credential is then forwarded to your identity provider. Note that at this point your identity provider has no idea where the request for your email credential is coming from because it is masked by the login hub website. This masking process protects your privacy.

Once the request for your email credential is received by your identity provider, a response is packaged up and sent back to the login hub, which then relays that information back to the credential issuer website. Once the credential issuer receives your email credential, it will log you into the website. Note that at this point you didn’t have to enter a single password on the website; all you needed was an email credential to log in. Now that you have logged in, you can start issuing additional credentials to yourself.

Issuing Additional Credentials

The previous section introduced the notion that you can issue many different types of credentials. Once you have logged into the credential issuer website, you may issue a few of these credentials to yourself. Since this is a demonstration, no attempt will be made to have those credentials verified by a 3rd party. The credentials that you can issue to yourself include a business address, proof of age, payment processor, driver’s license, and passport. You may enter any information you like in the input fields to see how the credential would look if it held real data.

Once you have filled out the various fields, click the blue button to issue the credential. The credential will be digitally signed and sent to your identity provider, which will then show you the credential that was issued to you. You have a choice to accept or reject the credential. If you accept the credential, it is written to your identity.

You may repeat this process as many times as you would like. Note how the passport credential has an issued-on date as well as an expiration date, demonstrating that credentials can have a time limit associated with them.

Known Issues

As mentioned throughout this post, this demonstration has a number of shortcomings and areas that need improvement, among them are:

  • Due to a lack of time, we didn’t set up our own HTTPS Telehash seed. Without that seed, we couldn’t serve the demo sites over TLS, due to security settings in most web browsers related to WebSocket connections. Not using TLS leaves a gigantic man-in-the-middle attack possibility. A future version will, of course, use both TLS and HSTS on all demo websites.
  • The Telehash query/response database isn’t decentralized yet. There are a number of complexities associated with creating a decentralized storage/query network, and we haven’t decided on what the proper approach should be. There is no reason why the decentralized database couldn’t be Namecoin or Ripple-based, and it would probably be good if we had multiple backend databases that supported the same query/response protocol.
  • We don’t check digital signatures yet, but will soon. We were focused on the flow of data first and ensuring security parameters were correct second. Clearly, you would never want to run such a system in production, but we will improve it such that all digital signatures are verified.
  • We do not yet use the public/private keypair generated in the browser to limit the domain and validity length of credentials. When the system is productionized, implementing this will be a requirement and will protect you even if your credentials are stolen through a phishing attack on the login hub.
  • We expect there to be many more security vulnerabilities that we haven’t detected yet. That said, we do believe that there are no major design flaws in the system and are releasing the proof-of-concept, along with source code, to the general public for feedback.

Feedback and Future Work

If you have any questions or concerns about this particular demo, please leave them as comments on this blog post or send them to the mailing list.

Just as you logged in to the credential issuer website using your email credential, you may also use other credentials, such as your driver’s license or passport, to log in to websites. Future work on this demo will add functionality to demonstrate the use of other forms of credentials to perform logins while also addressing the security issues outlined in the previous section.

The Marathonic Dawn of Web Payments

A little over six years ago, a group of doe-eyed Web developers, technologists, and economists decided that the way we send and receive money over the Web was fundamentally broken and needed to be fixed. The tiring dance of filling out your personal details on every website you visited seemed archaic. This was especially true when handing over your debit card number, which is basically a password into your bank account, to any fly-by-night operation that had something you wanted to buy. It took days to send money when an email took milliseconds. Even with the advent of Bitcoin, not much has changed since 2007.

At the time, we naively thought that it wouldn’t take long for the technology industry to catch on to this problem and address it like they’ve addressed many of the other issues around publishing and communication over the Web. After all, getting paid and paying for services is something all of us do as a fundamental part of modern day living. Change didn’t come as fast as we had hoped. So we kept our heads down and worked for years gathering momentum to address this issue on the Web. I’m happy to say that we’ve just had a breakthrough.

The first ever W3C Web Payments Workshop happened two weeks ago. It was a success. Through it, we have taken significant steps toward a better future for the Web and those that make a living by using it. This is the story of how we got from there to here, what the near future looks like, and the broad implications this work has for the Web.

TL;DR: The W3C Web Payments Workshop was a success, we’re moving toward standardizing some technologies around the way we send and receive money on the Web; join the Web Payments Community Group if you want to find out more.

Primordial Web Payment Soup

In late 2007, our merry little band of collaborators started piecing together bits of the existing Web platform in an attempt to come up with something that could be standardized. After a while, it became painfully obvious that the Web Platform was missing some fundamental markup and security technologies. For example, there was no standard machine-readable or automatable way of describing an item for sale on the Web. This meant that search engines couldn’t index all the things on the Web that are offered for sale. It also meant that all purchasing decisions had to be made by people. You couldn’t tell your Web browser something like “I trust the New York Times, let them charge me $0.05 per article up to $10 per month for access to their website”. Linked Data seemed like the right solution for machine-readable products, but the Linked Data technologies at the time seemed mired in complex, draconian solutions (SOAP, XML, XHTML, etc.): the bane of most Web Developers.
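To make the missing piece concrete, a machine-readable "item for sale" of the kind described above might look something like the JSON-LD sketch below. The vocabulary, context URL, and property names here are hypothetical, not an actual standard:

```javascript
// Illustrative only: a machine-readable offer that a search engine or user
// agent could index and act on. All terms and values are made up.
const offer = {
  "@context": "",
  "type": "Offer",
  "title": "Access to one New York Times article",
  "amount": "0.05",
  "currency": "USD",
  "validUntil": "2014-12-31T23:59:59Z"
};
```

With structured offers like this, a browser could enforce a rule such as "pay at most $10 per month to this vendor" without human intervention on each purchase.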

We became involved in the Microformats community and in the creation of technologies like RDFa in the hope that we could apply it to the Web Payments work. When it became apparent that RDFa was only going to solve part of the problem (and potentially produce a new set of problems), we created JSON-LD and started to standardize it through the W3C.

As these technologies started to grow out of the need to support payments on the Web, it became apparent that we needed to get more people from the general public, government, policy, traditional finance, and technology sectors involved.

Founding a Payment Incubator for the Web

We needed to build a movement around the Web Payments work and the founding of a community was the first step in that movement. In 2009, we founded the PaySwarm Community and worked on the technologies related to payments on the Web with a handful of individuals. In 2011, we transitioned the PaySwarm Community to the W3C and renamed the group to the Web Payments Community Group. To be clear, Community Groups at W3C are never officially sanctioned by W3C’s membership, but they are where most of the pre-standardization work happens. The purpose of the Web Payments Community Group was to incubate payment technologies and lobby W3C to start official standardization work related to how we exchange monetary value on the Web.

What started out as nine people spread across the world has grown into an active community of more than 150 people today. That community includes interesting organizations like Bloomberg, Mozilla, Stripe, Yandex, Ripple Labs, Citigroup, Opera, Joyent, and Telefónica. We have 14 technologies that are in the pre-standardization phase, ready to be placed into the standardization pipeline at W3C if we can get enough support from Web developers and the W3C member organizations.


In 2013, a number of us thought there was enough momentum to lobby W3C to hold the world’s first Web Payments Workshop. The purpose of the workshop would be to get major payment providers, government organizations, telecommunication providers, Web technologists, and policy makers into the same room to see if they thought that payments on the Web were broken and to see if people in the room thought that there was something that we could do about it.

In November of 2013, plans were hatched to hold the world’s first Web Payments Workshop. Over the next several months, the W3C, the Web Payments Workshop Program Committee, and the Web Payments Community Group worked to bring together as many major players as possible. The result was something better than we could have hoped for.

The Web Payments Workshop

In March 2014, the Web Payments Workshop was held in the beautiful, historic, and apropos Paris stock exchange, the Palais Brongniart. It was packed with an all-star list of financial and technology industry titans like the US Federal Reserve, Google, SWIFT, Yandex, Mozilla, Bloomberg, ISOC, Rabobank, and 103 other people and organizations that shape financial and Web standards. In true W3C form, every single session was minuted and is available to the public. The sessions focused on the following key areas related to payments and the Web. The entire contents of each session, all 14 hours of discussion, are linked to below:

  1. Introductions by W3C and European Commission
  2. Overview of Current and Future Payment Ecosystems
  3. Toward an Ideal Web Payments Experience
  4. Back End: Banks, Regulation, and Future Clearing
  5. Enhancing the Customer and Merchant Experience
  6. Front End: Wallets – Initiating Payment and Digital Receipts
  7. Identity, Security, and Privacy
  8. Wrap-up of Workshop and Next Steps

I’m not going to do any sort of deep dive into what happened during the workshop. W3C has released a workshop report that does justice to summarizing what went on during the event. The rest of this blog post will focus on what will most likely happen if we continue to move down the path we’ve started on with respect to Web Payments at W3C.

The Next Year in Web Payments

The next step of the W3C process is to convene an official group that will take all of the raw input from the Web Payments Workshop, the papers submitted to the event, input from various W3C Community Groups and from the industry at large, and reduce the scope of work down to something that is narrowly focused but will have a very large series of positive impacts on the Web.

This group will most likely operate for 6-12 months to make its initial set of recommendations for work that should start immediately in existing W3C Working Groups. It may also recommend that entirely new groups be formed at W3C to start standardization work. Once standardization work starts, it will be another 3-4 years before we see an official Web standard. While that sounds like a long time, keep in mind that large chunks of the work will happen in parallel, or have already happened. For example, the first iteration of the RDFa and JSON-LD bits of the Web Payments work is already done and standardized. The HTTP Signatures work is quite far along (from a technical standpoint, it still needs a thorough security review and consensus to move forward).

So, what kind of new work can we expect to get started at W3C? While nothing is certain, looking at the 14 pre-standards documents that the Web Payments Community Group is working on helps us understand where the future might take us. The payment problems of highest concern mentioned in the workshop papers also hint at the sorts of issues that need to be addressed for payments on the Web. Below are a few ideas of what may spin out of the work over the next year. Keep in mind that these predictions are mine and mine alone, they are in no way tied to any sort of official consensus either at the W3C or in the Web Payments Community Group.

Identity and Verified Credentials

One of the most fundamental problems that was raised at the workshop was the idea that identity on the Web is broken. That is, being able to prove who you are to a website, such as a bank or merchant, is incredibly difficult. Since it’s hard for us to prove who we are on the Web, fraud levels are much higher than they should be and peer-to-peer payments require a network of trusted intermediaries (which drive up the cost of the simplest transaction).

The Web Payments Community Group is currently working on technology called Identity Credentials that could be applied to this problem. It’s also closely related to the website login problem that Mozilla Persona was attempting to solve. Security and privacy concerns abound in this area, so we have to make sure to carefully design for those concerns. We need a privacy-conscious identity solution for the Web, and it’s possible that a new Working Group may need to be created to push forward initiatives like credential-based login for the Web. I personally think it would be unwise for W3C members to put off the creation of an Identity Working Group for much longer.

Wallets, Payment Initiation, and Digital Receipts

Another agreement that seemed to come out of the workshop was the belief that we need to create a level playing field for payments while also not attempting to standardize one payment solution for the Web. The desire was to standardize on the bare minimum necessary to make it so that websites only needed a few ways to initiate payments and receive confirmation for them. The ideal case was that your browser or wallet software would pick the best payment option for you based on your needs (best protection, fastest payment confirmation, lowest fees, etc.).

Digital wallets that hold different payment mechanisms, loyalty cards, personal data, and receipts were discussed. Unfortunately, the scope of a wallet’s functionality was not clear. Would a wallet consist of a browser-based API? Would it be cloud-based? Both? How would you sync data between wallets on different devices? What sort of functionality would be the bare minimum? These are questions that the upcoming W3C Payments Interest Group should answer. The desired outcome, however, seemed to be fairly concrete: provide a way for people to do a one-click purchase on any website without having to hand over all of their personal information. Make it easy for Web developers to integrate this functionality into websites using a standards-based approach.

Shifting to use some Bitcoin-like protocol seemed to be a non-starter for almost everyone in the room; however, the idea that we could create Bitcoin/USD/Euro wallets that could initiate payment and provide a digital receipt proving that funds were moved seemed to be one possible implementation target. This would allow Visa, Mastercard, PayPal, Bitcoin, and banks to support simple one-click purchases on the Web without having to reinvent their entire payment networks. The Web Payments Community Group does have a Web Commerce API specification and a Web Commerce protocol that cover this area, but they may need to be modified or expanded based on the outcome of the “What is a digital wallet and what does it do?” discussion.

Everything Else

The three major areas where it seemed like work could start at W3C revolved around verified identity, payment initiation, and digital receipts. In order to achieve those broad goals, we’re also going to have to work on some other primitives for the Web.

For example, JSON-LD was mentioned a number of times as the digital receipt format. If JSON-LD is going to be the digital receipt format, we’re going to have to have a way of digitally signing those receipts. JOSE is one approach, Secure Messaging is another, and there is currently a debate over which is best suited for digitally signing JSON-LD data.

If we are going to have digital receipts, then what goes into those receipts? How are we going to express the goods and services that someone bought in an interoperable way? We need something like the product ontology to help us describe the supply and demand for products and services on the Web.

If JSON-LD is going to be utilized, some work needs to be put into Web vocabularies related to commerce, identity, and security. If mobile-based NFC payment is a part of the story, we need to figure out how that’s going to fit into the bigger picture, and so on.

Make a Difference, Join us

As you can see, even if the payments scope is very narrow, there is still a great deal of work that needs to be done. The good news is that the narrow scope above would focus on concrete goals and implementations. We can measure progress for each one of those initiatives, so it seems like what’s listed above is quite achievable over the next few years.

There also seems to be broad support to address many of the most fundamental problems with payments on the Web. That’s why I’m calling this a breakthrough. For the first time, we have some broad agreement that something needs to be done and that W3C can play a major role in this work. That’s not to say that if a W3C Payments Interest Group is formed that they won’t self destruct for one reason or another, but based on the sensible discussion at the Web Payments Workshop, I wouldn’t bet on that outcome.

If the Web Payments work at W3C is successful, it means a more privacy-conscious, secure, and semantically rich Web for everyone. It also means it will be easier for you to make a living through the Web because the proper primitives to do things like one-click payments on the Web will finally be there. That said, it’s going to take a community effort. If you are a Web developer, designer, or technical writer, we need your help to make that happen.

If you want to become involved, or just learn more about the march toward Web Payments, join the Web Payments Community Group.

If you are a part of an organization that would specifically like to provide input to the Web Payments Steering Group Charter at W3C, join here.

A Proposal for Credential-based Login

Mozilla Persona allows you to sign in to web sites using any of your existing email addresses without needing to create a new username and password on each website. It was a really promising solution for the password-based security nightmare that is login on the Web today.

Unfortunately, all the paid engineers for Mozilla Persona have been transitioned off of the project. While Mozilla is going to continue to support Persona into the foreseeable future, it isn’t going to directly put any more resources into improving Persona. Mozilla had very good reasons for doing this. That doesn’t mean that the recent events aren’t frustrating or sad. The Persona developers made a heroic effort. If you find yourself in the presence of Lloyd, Ben, Dan, Jed, Shane, Austin, or Jared (sorry if I missed someone!) be sure to thank them for their part in moving the Web forward.

If Not Persona, Then What?

At the moment, the Web’s future with respect to a better login experience is unclear. The current leader seems to be OpenID Connect which, while implemented across millions of sites, is still not seeing the sort of adoption that you’d need for a general Web-based login solution. It’s also really complex; so complex that the lead editor of the specification OpenID is built on left the work a long time ago in frustration. It also doesn’t proactively protect your privacy, meaning that your identity provider can track where you go on the Web. OpenID also gives an undue amount of power to large email providers like Gmail and Yahoo, since your email provider would typically end up becoming your identity provider as well.

WebID+TLS should also be mentioned, as this proposal embraces and extends a number of good features from that specification. Concepts such as an identity document, which is a place where you store facts about yourself, and the ability to express that information as Linked Data are solid concepts. WebID+TLS, unfortunately, relies on the client-certificate technology built into most browsers, which is confusing to non-technologists. It also puts too much of a burden onto websites adopting the technology, such as requiring an RDF TURTLE processor and the ability to hook into the TLS stream. WebID+TLS also doesn’t do much to protect against pervasive monitoring and tracking of your behavior online by companies that would like to sell that behavior to the highest bidder.

Somewhere else on the Internet, the Web Payments Community Group is working on technology to build payments into the core architecture of the Web. Login and identity are a big part of payments. We need a solution that allows someone to login to a website and transmit their payment preferences at the same time. A single authorized click by you would provide your email address, shipping address, and preferred payment provider. Another authorized click by you would buy an item and have it shipped to your preferred address. There will be no need to fill out credit card information, shipping, or billing addresses and no need to create an email address and password for every site to which you want to send money. Persona was going to be this login solution for us, but that doesn’t seem achievable at this point.

What Persona Got Right

The Persona after-action review that Mozilla put together is useful. If you care about identity and login, you should read it. Persona did four groundbreaking things:

  1. It was intended to be fully decentralized, eventually being integrated directly into the browser.
  2. It focused on privacy, ensuring that your identity provider couldn’t track the sites that you were logging in to.
  3. It used an email address as your login ID, which is a proven approach to login on the Web.
  4. It was simple.

It failed for at least three important reasons that were not specific to Mozilla:

  1. It required email providers to buy into the protocol.
  2. It had a temporary, centralized solution that required a costly engineering team to keep it up and running.
  3. If your identity provider went down, you couldn’t log in to any website.

Finally, the Persona solution did one thing well. It provided a verified email credential, but is that enough for the Web?

The Need for Verifiable Credentials

There is a growing need for digitally verifiable credentials on the Web. Being able to prove that you are who you say you are is important when paying or receiving payment. It’s also important when trying to prove that you are a citizen of a particular country, of a particular age, licensed to perform a specific task (like designing a house), or have achieved a particular goal (like completing a training course). In all of these cases, you need to be able to collect digitally signed credentials from a third party, like a university, and store them somewhere on the Web in an interoperable way.

The Web Payments group is working on just such a technology. It’s called the Identity Credentials specification.

We had somewhat of an epiphany a few weeks ago when it became clear that Persona was in trouble. An email address is just another type of credential. The process for transmitting a verified email address to a website should be the same as transmitting address information or your payment provider preference. Could we apply this concept and solve both the login-on-the-Web problem and the transmission-of-verified-credentials problem? It turns out that the answer is: most likely, yes.

Verified Credential-based Web Login

The process for credential-based login on the Web would more or less work like this:

  1. You get an account with an identity provider, or run one yourself. Not everyone wants to run one themselves, but it’s the Web, you should be able to easily do so if you want to.
  2. You show up at a website, it asks you to login by typing in your email address. No password is requested.
  3. The website then kick-starts a login process that will be driven by a Javascript polyfill in the beginning, but will be integrated into the browser in time.
  4. A dialog is presented to you (that the website has no control over or visibility into) that asks you to login to your identity provider. Your identity provider doesn’t have to be your email provider. This step is skipped if you’ve logged in previously and your session with your identity provider is still active.
  5. A digitally signed assertion that you control your email address is given by your identity provider to the browser, which is then relayed on to the website you’re logging in to.
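A minimal sketch of the digitally signed assertion in step 5, assuming an HMAC with a shared key purely for brevity (the names `sign_email_assertion` and `verify_email_assertion` are hypothetical, and the actual protocol would use public-key signatures so relying parties never hold a secret):

```python
import hashlib
import hmac
import json

# Hypothetical shared key; a real identity provider would use public-key
# signatures so any relying party can verify without holding a secret.
IDP_KEY = b"identity-provider-secret"

def sign_email_assertion(email, audience):
    """Identity provider vouches that `email` is controlled by the user."""
    claim = json.dumps({"email": email, "audience": audience}, sort_keys=True)
    sig = hmac.new(IDP_KEY, claim.encode(), hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": sig}

def verify_email_assertion(assertion):
    """The relying-party website recomputes the signature over the claim."""
    expected = hmac.new(IDP_KEY, assertion["claim"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, assertion["signature"])
```

The `audience` field ties the assertion to one website, so a stolen assertion can’t be replayed elsewhere.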

Details of how this process works can be found in the section titled Credential-based Login in the Identity Credentials specification. The important thing to note about this approach is that it takes all the best parts of Persona while overcoming key things that caused its demise. Namely:

  • Using an innovative new technology called Telehash, it is fully decentralized from day one.
  • It doesn’t require browser buy-in, but is implemented in such a way that allows it to be integrated into the browser eventually.
  • It is focused on privacy, ensuring that your identity provider can’t track the sites that you are logging into.
  • It uses an email address as your login ID, which is a proven approach to login on the Web.
  • It is simple, requiring far fewer web developer gymnastics than OpenID to implement. It’s just one Javascript library and one call.
  • It doesn’t require email providers to buy into the protocol like Persona did. Any party that the relying party website trusts can digitally sign a verified email credential.
  • If your identity provider goes down, there is still hope that you can log in by storing your email credentials in a password-protected decentralized hash table on the Internet.

Why Telehash?

There is a part of this protocol that requires the system to map your email address to an identity provider. The way Persona did it was to query to see if your email provider was a Persona Identity Provider (decentralized), and if not, the system would fall back to Mozilla’s email-based verification system (centralized). Unfortunately, if Persona’s verification system was down, you couldn’t log into a website at all. This rarely happened, but that was more because Mozilla’s team was excellent at keeping the site up and there weren’t any serious attempts to attack the site. It was still a centralized solution.

The Identity Credentials specification takes a different approach to the problem. It allows any identity provider to claim an email address. This means that you no longer need buy-in from email providers; you just need buy-in from identity providers, and there are plenty of them out there that would be happy to claim and verify email addresses. Unfortunately, this approach means that either you need browser support, or you need some sort of mapping mechanism that maps email addresses to identity providers. Enter Telehash.

Telehash is an Internet-wide distributed hash table (DHT) based on the proven Kademlia protocol used by networks like BitTorrent. All communication is fully encrypted. It allows you to store and replicate things like the following JSON document:

  {
    "email": "",
    "identityService": ""
  }

If you want to find out which identity provider is associated with an email address, you just query the Telehash network. The more astute readers among you will see the obvious problems with this solution, though: there are massive trust, privacy, and distributed denial-of-service concerns here.
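In code terms, the naive lookup is just a keyed get against the DHT. Here a plain dict stands in for the Telehash network (all names and URLs below are illustrative):

```python
# A plain dict stands in for the Telehash DHT; in reality each store and
# query travels over the encrypted peer-to-peer network.
dht = {}

def publish_mapping(email, identity_service):
    """Any identity provider can publish a mapping for an address."""
    dht[email] = {"email": email, "identityService": identity_service}

def lookup_identity_service(email):
    """A relying party resolves an address to its identity provider."""
    entry = dht.get(email)
    return entry["identityService"] if entry else None
```

Written this plainly, the attacks described next are easy to see: anyone can read every key, and anyone can publish any mapping.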

Attacks on the Distributed Mapping Protocol

There are four problems with the system described in the previous section.

The first is that you can find out which email addresses are associated with which identity providers; that leaks information. Discovering which identity provider a given address uses is a problem in itself, but discovering that an address is associated with a military identity provider outs its owner as military personnel, which turns a regular privacy problem into a national security issue.

The second is that anyone on the network can claim to be the identity provider for an email address, which means that there is a big phishing risk. A nefarious party need only put an entry for a target address in the DHT pointing to its corrupt identity provider service and watch the personal data start pouring in.

The third is that a website wouldn’t know which digital signature on an email credential to trust. Which verified credential is trustworthy and which one isn’t?

The fourth is that you can easily harvest all of the email addresses on the network and spam them.

Attack Mitigation on the Distributed Mapping Protocol

There are ways to mitigate the problems raised in the previous section. For example, replacing the email field with a hash of the email address and passphrase would prevent attackers from both spamming an email address and figuring out how it maps to an identity provider. It would also lower the desire for attackers to put fake data into the DHT because only the proper email + passphrase would end up returning a useful result to a query. The identity service would also need to be encrypted with the passphrase to ensure that injecting bogus data into the network wouldn’t result in an entry collision.
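A sketch of those two mitigations, assuming SHA-256 for the lookup key and a toy PBKDF2-derived XOR keystream in place of the real passphrase-based cipher (a production system would use an authenticated cipher such as AES-GCM; all function names are illustrative):

```python
import hashlib

def mapping_id(email, passphrase):
    # Only someone who already knows both values can recompute this key,
    # so addresses can't be harvested or enumerated from the network.
    return hashlib.sha256((email + passphrase).encode()).hexdigest()

def _keystream(passphrase, salt, length):
    # PBKDF2 key derivation makes brute-forcing the passphrase costly.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(),
                               salt.encode(), 100_000, dklen=length)

def encrypt_service_url(url, email, passphrase):
    """Toy passphrase-based encryption of the identity service URL."""
    data = url.encode()
    key = _keystream(passphrase, email, len(data))
    return bytes(b ^ k for b, k in zip(data, key)).hex()

def decrypt_service_url(ciphertext, email, passphrase):
    data = bytes.fromhex(ciphertext)
    key = _keystream(passphrase, email, len(data))
    return bytes(b ^ k for b, k in zip(data, key)).decode()
```

Because the stored record is now an opaque hash plus an opaque ciphertext, a crawler of the DHT learns nothing about who is in it.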

In addition to these mitigations, the network would employ a high-CPU/memory proof-of-work requirement for putting a mapping into the DHT so the network couldn’t be flooded with bogus mappings. Keep in mind that the proof-of-work doesn’t stop bad data from getting into the DHT; it just slows its injection into the network.
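A hashcash-style sketch of that proof-of-work, with a deliberately tiny difficulty so it runs instantly (a real network would demand far more leading zero bits; the function names are illustrative):

```python
import hashlib
import itertools

DIFFICULTY_BITS = 12  # illustrative only; a real deployment would use far more

def proof_of_work(mapping_id):
    """Search for a nonce whose combined hash has DIFFICULTY_BITS leading
    zero bits. Finding one is expensive for the publisher..."""
    target = 1 << (256 - DIFFICULTY_BITS)
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{mapping_id}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify_proof(mapping_id, nonce):
    """...but any DHT node can verify it with a single hash before
    accepting the entry, which throttles bulk flooding."""
    digest = hashlib.sha256(f"{mapping_id}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - DIFFICULTY_BITS))
```

The asymmetry is the point: publishing one mapping is cheap, publishing millions is not.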

Finally, figuring out which verified email credential is valid is tricky. One could anoint ten non-profit email verification services that the network would trust, or adopt something like the certificate-authority framework, but either approach could be criticized as over-centralization. In the end, this is more of a policy decision, because you would want to make sure email verification services are legally bound to take the proper steps to verify an email address while ensuring that people aren’t gouged for the service. We don’t have a good solution to this problem yet, but we’re working on it.

With the modifications above, the actual data uploaded to the DHT will probably look more like this:

  {
    "id": "c8e52c34a306fe1d487a0c15bc3f9bbd11776f30d6b60b10d452bcbe268d37b0",  <-- SHA-256 hash of email address + >15-character passphrase
    "proofOfWork": "000000000000f7322e6add42",                                 <-- Proof of work for the email-to-identity-service mapping
    "identityService": "GZtJR2B5uyH79QXCJ...s8N2B5utJR2B54m0Lt"                <-- Passphrase-encrypted identity provider service URL
  }

To query the network, you must provide both an email address and a passphrase, which are hashed together. If the hash doesn’t exist on the network, then nothing is returned by Telehash.

Also note that this entire Telehash-based mapping mechanism goes away once the technology is built into the browser; it is merely a stop-gap measure until the identity credential solution is built into browsers.

The Far Future

In the far future, browsers would communicate with your identity providers to retrieve data that is requested by websites. When you attempt to log in to a website, the website would request a set of credentials. Your browser would either provide the credentials directly, if it has cached them, or it would fetch them from your identity provider. This system has all of the advantages of Persona and provides realistic solutions to a number of the scalability issues that Persona suffered from.

The greatest challenges ahead will entail getting a number of things right. Some of them include:

  • Mitigate the attack vectors for the Telehash + Javascript-based login solution. Even though the Telehash-based solution is temporary, it must be solid until browser implementations become the norm.
  • Ensure that there is buy-in from large companies wanting to provide credentials for people on the Web. We have a few major players in the pipeline at the moment, but we need more to achieve success.
  • Clearly communicate the benefits of this approach over OpenID and Persona.
  • Make sure that setting up your own credential-based identity provider is as simple as dropping a PHP file into your website.
  • Make it clear that this is intended to be a W3C standard by creating a specification that could be taken standards-track within a year.
  • Get buy-in from web developers and websites, which is going to be the hardest part.

JSON-LD and Why I Hate the Semantic Web

Full Disclosure: I am one of the primary creators of JSON-LD, lead editor on the JSON-LD 1.0 specification, and chair of the JSON-LD Community Group. This is an opinionated piece about JSON-LD. A number of people in this space don’t agree with my viewpoints. My statements should not be construed as official statements from the JSON-LD Community Group, W3C, or Digital Bazaar (my company) in any way, shape, or form. I’m pretty harsh about the technologies covered in this article and want to be very clear that I’m attacking the technologies, not the people that created them. I think most of the people that created and promote them are swell and I like them a lot, save for a few misguided souls, who are loveable and consistently wrong.

JSON-LD became an official Web Standard last week. This is after exactly 100 teleconferences typically lasting an hour and a half, fully transparent with text minutes and recorded audio for every call. There were 218+ issues addressed, 2,000+ source code commits, and 3,102+ emails that went through the JSON-LD Community Group. The journey was a fairly smooth one with only a few jarring bumps along the road. The specification is already deployed in production by companies like Google, the BBC, Yandex, Yahoo!, and Microsoft. There is a quickly growing list of other companies that are incorporating JSON-LD. We’re off to a good start.

In the previous blog post, I detailed the key people that brought JSON-LD to where it is today and gave a rough timeline of the creation of JSON-LD. In this post I’m going to outline the key decisions we made that made JSON-LD stand out from the rest of the technologies in this space.

I’ve heard many people say that JSON-LD is primarily about the Semantic Web, but I disagree; it’s not about that at all. JSON-LD was created for web developers who are working with data that is important to other people and must interoperate across the Web. The Semantic Web was near the bottom of my list of “things to care about” when working on JSON-LD, and anyone that tells you otherwise is wrong. 😛

TL;DR: The desire for better Web APIs is what motivated the creation of JSON-LD, not the Semantic Web. If you want to make the Semantic Web a reality, stop making the case for it and spend your time doing something more useful, like actually making machines smarter or helping people publish data in a way that’s useful to them.


If you don’t know what JSON-LD is and you want to find out why it is useful, check out this video on Linked Data and this one on an Introduction to JSON-LD. The rest of this post outlines the things that make JSON-LD different from the traditional Semantic Web / Linked Data stack of technologies and why we decided to design it the way that we did.

Decision 1: Decrypt the Cryptic

Many W3C specifications are so cryptic that they require the sacrifice of your sanity and a secret W3C decoder ring to read. I never understood why these documents were so difficult to read, and after years of study on the matter, I think I found the answer. It turns out that most specification editors are just crap at writing.

It’s not that the things in most W3C specifications are complicated; it’s that the editor is bad at explaining them to non-implementers, who make up most of the web developers that end up reading these specification documents. This approach is often defended with the argument that readability of the specification by non-implementers is secondary to its technical accuracy for implementers: the audience is the implementer, and you are expected to cater to them. To counter that point, though, we all know that technical accuracy is a bad excuse for crap writing. You can write something that is easy to understand and technically accurate; it just takes more effort. Knowing your audience helps.

We tried our best to eliminate complex techno-babble from the JSON-LD specification. I made it a point to not mention RDF at all in the JSON-LD 1.0 specification because you didn’t need to go off and read about it to understand what was going on in JSON-LD. There was tremendous push back on this point, which I’ll go into later, but the point is that we wanted to communicate at a more conversational level than typical Internet and Web specifications because being pedantic too early in the spec sets the wrong tone.

It didn’t always work, but it certainly did set the tone we wanted for the community, which was that this Linked Data stuff didn’t have to seem so damn complicated. The JSON-LD 1.0 specification starts out by primarily using examples to introduce key concepts. It starts at basics, assuming that the audience is a web developer with modest training, and builds its way up slowly into more advanced topics. The first 70% of the specification contains barely any normative/conformance language, but after reading it, you know what JSON-LD can do. You can look at the section on the JSON-LD Context to get an idea of what this looks like in practice.
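To give a flavor of the feature discussed there: a JSON-LD context is just a mapping from short JSON keys to unambiguous IRIs, so a plain-looking JSON document becomes Linked Data. A minimal example, using schema.org terms:

```json
{
  "@context": {
    "name": "http://schema.org/name",
    "homepage": {"@id": "http://schema.org/url", "@type": "@id"}
  },
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/"
}
```

A consumer that knows nothing about this document’s local key names can expand it against the context and work with the full IRIs instead.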

This approach wasn’t a wild success. Reading the sections of the specification that have undergone feedback from more nitpicky readers still makes me cringe, because ease of understanding has been sacrificed at the altar of pedantic technical accuracy. However, I don’t feel embarrassed to point web developers to a section of the specification when they ask for an introduction to a particular feature of JSON-LD. There are not many specifications where you can do that.

Decision 2: Radical Transparency

One of the things that has always bothered me about W3C Working Groups is that you have to either be an expert to participate, or you have to be a member of the W3C, which can cost a non-trivial amount of money. This results in your typical web developer being able to comment on a specification, but not really having the ability to influence a Working Group decision with a vote. It also hobbles the standards-making community because the barrier to entry is perceived as impossibly high. Don’t get me wrong, the W3C staff does as much as they can to drive inclusion and they do a damn good job at it, but that doesn’t stop some of their member companies from being total dicks behind closed door sessions.

The W3C is a consortium of mostly for-profit companies, and they have things they care about like market share, quarterly profits, and drowning goats (kidding!). To be fair, anyone can join as long as they pay the membership dues. My point is that because there is a lack of transparency at times, it makes even the best Working Group less responsive to the general public, and that harms the public good. These closed-door rules exist so that large companies can say certain things without triggering a lawsuit, which is sometimes used for good but typically results in companies being jerks and nobody finding out about it.

So, in 2010 we kicked off the JSON-LD work by making it radically open, and we fought for that openness every step of the way. Anyone can join the group, anyone can vote on decisions, anyone can join the teleconferences, there are no closed-door sessions, and we record the audio of every meeting. We successfully kept the technical work on the specification this open from the beginning to the release of the JSON-LD 1.0 web standard a week ago. People came and went from the group over the years, but anyone could participate at any level, and that is probably the thing I’m most proud of regarding the process used to create JSON-LD. Had we not been this open, Markus Lanthaler may never have gone from being a gifted student in Italy to editor of the JSON-LD API specification and now leader of the Hypermedia-Driven Web APIs community. We also may never have had the community backing to do some of the things we did in JSON-LD, like kicking RDF in the nuts.

Decision 3: Kick RDF in the Nuts

RDF is a shitty data model. It doesn’t have native support for lists. LISTS for fuck’s sake! The key data structure that’s used by almost every programmer on this planet and RDF starts out by giving developers a big fat middle finger in that area. Blank nodes are an abomination that we need, but they are applied inconsistently in the RDF data model (you can use them in some places, but not others). When we started with JSON-LD, RDF didn’t have native graph support either. For all the “RDF data model is elegant” arguments we’ve seen over the past decade, there are just as many reasons to kick it to the curb. This is exactly what we did when we created JSON-LD, and that really pissed off a number of people that had been working on RDF for over a decade.

I personally wanted JSON-LD to be compatible with RDF, but that’s about it. You could convert JSON-LD to and from RDF and get something useful, but JSON-LD had a more sane data model where lists were a first-class construct and you had generalized graphs, and you could work with JSON-LD using a simple library and standard JSON tooling. To put that in perspective, to work with RDF you typically needed a quad store, a SPARQL engine, and some hefty libraries. Your standard web developer has no interest in that toolchain because it adds more complexity to the solution than is necessary.
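To make the list complaint concrete: in JSON-LD an ordered list is plain JSON plus a one-line context entry (the vocabulary IRI below is made up for illustration), whereas the same data in raw RDF becomes a chain of rdf:first/rdf:rest cons cells:

```json
{
  "@context": {
    "ingredients": {"@id": "http://example.org/vocab#ingredients", "@container": "@list"}
  },
  "ingredients": ["flour", "water", "salt"]
}
```

The `"@container": "@list"` declaration is all it takes; the cons-cell plumbing only appears if you convert the document to RDF.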

So screw it, we thought, let’s create a graph data model that looks and feels like JSON, RDF and the Semantic Web be damned. That’s exactly what we did and it was working out pretty well until…

Decision 4: Work with the RDF Working Group. Whut?!

Around mid-2012, the JSON-LD stuff was going pretty well and the newly chartered RDF Working Group was going to start work on RDF 1.1. One of the work items was a serialization of RDF for JSON. The leading solutions for RDF in JSON were things like the aptly named RDF/JSON and JTriples, both of which would look incredibly foreign to web developers and continue the narrative that the Semantic Web community creates esoteric solutions to non-problems. The biggest problem was that many of the participants in the RDF Working Group at the time didn’t understand JSON.

The JSON-LD group decided to weigh in on the topic by pointing the RDF WG to JSON-LD as an example of what was needed to convince people that this whole Linked Data thing could be useful to web developers. I remember the discussions getting very heated over multiple months, and at times, thinking that the worst thing we could do to JSON-LD was to hand it over to the RDF Working Group for standardization.

It was at that point that David Wood, one of the chairs of the RDF Working Group, phoned me up to try to convince me that it would be a good idea to standardize the work through the RDF WG. I was very skeptical because there were people in the RDF Working Group who drove some of the thinking that I had grown to see as toxic to the whole Linked Data / Semantic Web movement. I trusted Dave Wood, though. I had never seen him get religiously zealous about RDF like some of the others in the group, and he seemed convinced that we could get JSON-LD through without ruining it. To Dave’s credit, he was mostly right. 🙂

Decision 5: Hate the Semantic Web

It’s not that the RDF Working Group was populated by people who are incompetent, or that I didn’t personally like. I’ve worked with many of them for years, and most of them are very intelligent, capable, gifted people. The problem with getting a room full of smart people together is that the group’s world view gets skewed. There are many reasons that a working group filled with experts doesn’t consistently produce great results. For example, many of the participants forget how specialized their own knowledge is, so they tend to assume that a good chunk of the people that will be using their technology will be just as enlightened. Bad feature ideas can be argued for months and rationalized, because smart people, lacking any sort of compelling real-world data, are great at debating and rationalizing bad decisions.

I don’t want people to get the impression that there was or is any sort of animosity in the Linked Data / Semantic Web community because, as far as I can tell, there isn’t. Everyone wants to see this stuff succeed and we all have our reasons and approaches.

That said, after 7+ years of being involved with Semantic Web / Linked Data, our company has never had a need for a quad store, RDF/XML, N3, NTriples, TURTLE, or SPARQL. When you chair standards groups that kick out “Semantic Web” standards, but even your company can’t stomach the technologies involved, something is wrong. That’s why my personal approach with JSON-LD just happened to be burning most of the Semantic Web technology stack (TURTLE/SPARQL/Quad Stores) to the ground and starting over. It’s not a strategy that works for everyone, but it’s the only one that worked for us, and the only way we could think of jarring the more traditional Semantic Web community out of its complacency.

I hate the narrative of the Semantic Web because the focus has been on the wrong set of things for a long time. That community, which I have been consciously distancing myself from for a few years now, is schizophrenic in its direction. Precious time is spent in groups discussing how we can query all this Big Data that is sure to be published via RDF instead of figuring out a way of making it easy to publish that data on the Web by leveraging common practices in use today. Too much time is spent assuming a future that’s not going to unfold in the way that we expect it to. That’s not to say that TURTLE, SPARQL, and quad stores don’t have their place, but I always struggle to point to a typical startup that has decided to base their product line on that technology (versus ones that choose MongoDB and JSON on a regular basis).

I like JSON-LD because it’s based on technology that most web developers use today. It helps people solve interesting distributed problems without buying into any grand vision. It helps you get to the “adjacent possible” instead of having to wait for a mirage to solidify.

Decision 6: Believe in Consensus

All this said, you can’t hope to achieve anything by standing on idealism alone and I do admit that some of what I say above is idealistic. At some point you have to deal with reality, and that reality is that there are just as many things that the RDF and Semantic Web initiative got right as it got wrong. The RDF data model is shitty, but because of the gauntlet thrown down by JSON-LD and a number of like-minded proponents in the RDF Working Group, the RDF Data Model was extended in a way that made it compatible with JSON-LD. As a result, the gap between the RDF model and the JSON-LD model narrowed to the point that it became acceptable to more-or-less base JSON-LD off of the RDF model. It took months to do the alignment, but it was consensus at its best. Nobody was happy with the result, but we could all live with it.

To this day I assert that we could rip the data model section out of the JSON-LD specification and it wouldn’t really affect the people using JSON-LD in any significant way. That’s consensus for you. The section is in there because other people wanted it in there and because the people that didn’t want it in there could very well have turned out to be wrong. That’s really the beauty of the W3C and IETF process. It allows people that have seemingly opposite world views to create specifications that are capable of supporting both world views in awkward but acceptable ways.

JSON-LD is a product of consensus. Nobody agrees on everything in there, but it all sticks together pretty well. A consensus on consensus is what makes the W3C, the IETF, and thus the Web and the Internet work. Through all of the fits and starts, permathreads, pedantry, idealism, and deadlock, the way this process brings people together to build the thing we call the Web is a beautiful thing.


I’d like to thank the W3C staff that were involved in getting JSON-LD to official Web standard status (and the staff, in general, for being really fantastic people). Specifically, Ivan Herman for simultaneously pointing out all of the problems that lay in the road ahead while also providing ways to deal with each one as we came upon them. Sandro Hawke for pushing back against JSON-LD, but always offering suggestions about how we could move forward. I actually think he may have ended up liking JSON-LD in the end :). Doug Schepers and Ian Jacobs for fighting for W3C Community Groups, without which JSON-LD would not have been able to plead the case for Web developers. The systems team and publishing team, who are unknown to most of you, but work tirelessly to ensure that everything continues to operate, be published, and improve at W3C.

From the RDF Working Group, I’d like to thank the chairs (David Wood and Guus Schreiber) for giving JSON-LD a chance and ensuring that it got a fair shake. Richard Cyganiak for pushing us to get rid of microsyntaxes and working with us to try to align JSON-LD with RDF. Kingsley Idehen for being the first external implementer of JSON-LD, after we had just finished scribbling the first design down on paper, and for tirelessly dogfooding what he preaches; nobody does it better. And the rest of the RDF Working Group members: without your influence JSON-LD would have escaped unscathed, which would have made my life a hell of a lot easier, but would have left JSON-LD and the people that use it in a worse situation.

The Origins of JSON-LD

Full Disclosure: I am one of the primary creators of JSON-LD, lead editor on the JSON-LD 1.0 specification, and chair of the JSON-LD Community Group. These are my personal opinions and not the opinions of the W3C, JSON-LD Community Group, or my company.

JSON-LD became an official Web Standard last week. This is after exactly 100 teleconferences typically lasting an hour and a half, fully transparent with text minutes and recorded audio for every call. There were 218+ issues addressed, 2,071+ source code commits, and 3,102+ emails that went through the JSON-LD Community Group. The journey was a fairly smooth one with only a few jarring bumps along the road. The specification is already deployed in production by companies like Google, the BBC, Yandex, Yahoo!, and Microsoft. There is a quickly growing list of other companies that are incorporating JSON-LD, but that’s the future. This blog post is more about the past, namely where did JSON-LD come from? Who created it and why?

I love origin stories. When I was in my teens and early twenties, the only origin stories I liked to read about were of the comic and anime variety. Spiderman, great origin story. Superman, less so, but entertaining. Nausicaä, brilliant. Major Motoko Kusanagi, nuanced. Spawn, dark. Those connections with characters fade over time as you understand that this world has more interesting ones. Interesting because they touch the lives of billions of people, and since I’m a technologist, some of my favorite origin stories today consist of finding out the personal stories behind how a particular technology came to be. The Web has a particularly riveting origin story. These stories are hard to find because they’re rarely written about, so this is my attempt at documenting how JSON-LD came to be and the handful of people that got it to where it is today.

The Origins of JSON-LD

When you’re asked to draft the press pieces on the launch of new world standards, you have two lists of people in your head. The first is the “all inclusive list”, which is every person that uttered so much as a word that resulted in a change to the specification. That list is typically very long, so you end up saying something like “We’d like to thank all of the people that provided input to the JSON-LD specification, including the JSON-LD Community, RDF Working Group, and individuals who took the time to send in comments and improve the specification.” With that statement, you are sincere and cover all of your bases, but feel like you’re doing an injustice to the people without which the work would never have survived.

The all-inclusive list is very important: those people helped refine the technology to the point that everyone could achieve consensus on it being something that is world class. However, 90% of the back-breaking work to get the specification to the point that everyone else could comment on it is typically undertaken by 4-5 people. It’s a thankless and largely unpaid job, and this is how the Web is built. It’s those people that I’d like to thank while exploring the origins of JSON-LD.


JSON-LD started around late 2008 as the work on RDFa 1.0 was wrapping up. We were under pressure from Microformats and Microdata, which we were also heavily involved in, to come up with a good way of programming against RDFa data. At around the same time, my company was struggling with the representation of data for the Web Payments work. We had already made the switch to JSON a few years earlier and were storing that data in MySQL, mostly because MongoDB didn’t exist yet. We were having a hard time translating the RDFa we were ingesting (products for sale, pricing information, etc.) into something that worked well in JSON. At around the same time, Mark Birbeck, one of the creators of RDFa, and I were thinking about making something RDFa-like for JSON. Mark had proposed a syntax for something called RDFj, which I thought had legs, but Mark didn’t necessarily have the time to pursue.

The Hard Grind

After exchanging a few emails with Mark about the topic over the course of 2009, and letting the idea stew for a while, I wrote up a quick proposal for a specification and passed it by Dave Longley, Digital Bazaar’s CTO. We kicked the idea around a bit more and, in May of 2010, published the first working draft of JSON-LD. While Mark was instrumental in injecting the first set of foundational ideas into JSON-LD, Dave Longley would become the key technical mind behind how to make JSON-LD work for web programmers.

At that time, JSON-LD had a pretty big problem: you can represent data in JSON-LD in a myriad of different ways, making it hard to tell whether two JSON-LD documents are the same or not. This was an important problem for Digital Bazaar because we were trying to figure out how to create product listings, digital receipts, and contracts using JSON-LD. We had to be able to tell if two product listings were the same, and we had to figure out a way to serialize the data so that products and their associated prices could be listed on the Web in a decentralized way. This meant digital signatures, and you have to be able to create a canonical/normalized form for your data if you want to be able to digitally sign it.
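To make the “same data, many shapes” problem concrete, here is a small sketch; the vocabulary IRI is illustrative, not taken from any real context:

```python
import json

# The same statement -- "this thing's name is Manu" -- expressed two ways:
# once with a compact term defined in a context, once with the full IRI.
doc_a = {
    "@context": {"name": ""},
    "name": "Manu"
}
doc_b = {
    "": "Manu"
}

# Byte-level and structure-level comparisons both say the documents
# differ, even though they encode the identical statement.
print(doc_a == doc_b)                          # False
print(json.dumps(doc_a) == json.dumps(doc_b))  # False
```

JSON-LD expansion followed by graph normalization maps both documents to the same set of quads, which is what makes comparison, and ultimately signing, possible.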

Dave Longley invented the JSON-LD API, JSON-LD Framing, and JSON-LD Graph Normalization to tackle these canonicalization/normalization issues and did the first four implementations of the specification in C++, JavaScript, PHP, and Python. The JSON-LD Graph Normalization problem itself took roughly 3 months of concentrated 70+ hour work weeks and dozens of iterations by Dave Longley to produce an algorithm that would work. To this day, I remain convinced that there are only a handful of people on this planet with a mind that is capable of solving those problems. He was the first and only one that cracked those problems. It requires a sort of raw intelligence, persistence, and ability to constantly re-evaluate the problem solving approach you’re undertaking in a way that is exceedingly rare.
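A minimal sketch of why a canonical form matters for signing, using only a canonical serialization of a single JSON tree. The graph normalization problem Dave Longley solved is far harder than this: it must also assign deterministic names to unlabeled (blank) graph nodes so that any serialization of the same graph hashes identically.

```python
import hashlib
import json

def canonical_hash(doc: dict) -> str:
    """Hash a canonical serialization: sorted keys, fixed separators."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Once the serialization is canonical, key order no longer matters,
# so two authors producing the same data get the same signature input.
a = {"name": "Manu", "homepage": ""}
b = {"homepage": "", "name": "Manu"}
print(canonical_hash(a) == canonical_hash(b))  # True
```

A digital signature is then computed over the canonical bytes rather than over whichever serialization a particular publisher happened to emit.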

Dave and I continued to refine JSON-LD, with him working on the API and me working on the syntax for the better part of 2010 and early 2011. When MongoDB started really taking off in 2010, the final piece just clicked into place. We had the makings of a Linked Data technology stack that would work for web developers.

Toward Stability

Around April 2011, we launched the JSON-LD Community Group and started our public push to try to put the specification on a standards track at the World Wide Web Consortium (W3C). It was at this point that Gregg Kellogg joined us to help refine the rough edges of the specification and provide his input. For those of you who don’t know Gregg, I know of no other person who has done complete implementations of the entire stack of Semantic Web technologies. He has Ruby implementations of quad stores, Turtle, N3, N-Quads, SPARQL engines, RDFa, JSON-LD, etc. If it’s associated with the Semantic Web in any way, he’s probably implemented it. His depth of knowledge of RDF-based technologies is unmatched, and he focused that knowledge on JSON-LD to help us hone it into what it is today. Gregg helped us with key concepts, specification editing, implementations, tests, and a variety of input that left its mark on JSON-LD.

Markus Lanthaler also joined us around the same time (2011) that Gregg did. The story of how Markus got involved with the work is probably my favorite way of explaining how the standards process should work. Markus started giving us input while a master’s student at Technische Universität Graz. He didn’t have a background in standards, he didn’t know anything about the W3C process or specification editing; he was as green as one can be with respect to standards creation. We all start where he did, but I don’t know of many people who became as influential as quickly as Markus did.

Markus started by commenting on the specification on the mailing list, then quickly started joining calls. He’d raise issues and track them; he started on his PHP implementation, then made minor edits to the specifications, then major edits, until earning our trust to become lead specification editor for the JSON-LD API specification and one of the editors for the JSON-LD Syntax specification. There was no deliberate process we used to make him lead editor; it just sort of happened based on all the hard work he was putting in, which is the way it should be. He went through a growth curve that normally takes most people 5 years in about a year and a half, and it happened exactly how it should happen in a meritocracy. He earned it and impressed us all in the process.

The Final Stretch

Of special mention as well is Niklas Lindström, who joined us on almost every JSON-LD teleconference starting in 2012 and provided key input to the specifications. Aside from being incredibly smart and talented, Niklas is particularly gifted in his ability to find the balanced technical solution that would move the group forward whenever we found ourselves deadlocked on a particular decision. Paul Kuykendall joined us toward the very end of the JSON-LD work in early 2013 and provided fresh eyes on what we were working on. Aside from being very level-headed, Paul helped us understand what was important to web developers and what wasn’t as the work drew to a close. It’s hard to keep perspective as work wraps up on a standard, and luckily Paul joined us at exactly the right moment to provide that insight.

There were literally hundreds of people who provided input on the specification throughout the years, and I’m very appreciative of that input. However, without this core of 4-6 people, JSON-LD would never have had a chance. I will never be able to find the words to express how deeply appreciative I am of Dave, Markus, Gregg, Niklas, and Paul, who did the work on a primarily volunteer basis. At this moment in time, the Web is at the core of the way humankind communicates, and the most ardent protectors of this public good create standards to ensure that the Web continues to serve all of us. It makes my blood boil to know that they will go largely unrewarded by society for creating something that will benefit hundreds of millions of people, but that’s another post for another time.

The next post in this series tells the story of how JSON-LD was nearly eliminated on several occasions by its critics and proponents while on its journey toward a web standard.

Web Payments and the World Banking Conference

The standardization group for all of the banks in the world (SWIFT) was kind enough to invite me to speak at the world’s premier banking conference about the Web Payments work at the W3C. The conference, called SIBOS, happened last week and brought together 7,000+ people from banks and financial institutions around the world. The event was held in Dubai this year. They wanted me to present on the new Web Payments work being done at the World Wide Web Consortium (W3C), including the work we’re doing with PaySwarm, Mozilla, the Bitcoin community, and Ripple Labs.

If you’ve never been to Dubai, I highly recommend visiting. It is a city of extremes. It contains the highest density of stunning, award-winning skyscrapers, while vast expanses of desert loom just outside of the city. Man-made islands dot the coastline, willed into shapes like that of a multi-mile-wide palm tree or massive lumps of stone, sand, steel, and glass resembling all of the countries of the world. I saw the largest in-mall aquarium in the world and ice skated in 105-degree weather. Poverty lines the outskirts of Dubai, while ATMs that vend gold can be found throughout the city. Lamborghinis, Ferraris, Maybachs, and Porsches roared down the densely packed highways while plants struggled to survive in the oppressive heat and humidity.

These extravagances nestle closely to the other extremes of Dubai: a history of indentured servitude, women’s rights issues, zero-tolerance drug possession laws, and political self-censorship of the media. In a way, it was the perfect location for the world’s premier banking conference. The capital it took to achieve everything that Dubai has to offer flowed through the banks represented at the conference at some point in time.

The Structure of the Conference

The conference was broken into two distinct areas. The more traditional banking side was on the conference floor and resembled what you’d expect of a well-established trade show. It was large, roughly the size of four football fields. Innotribe, the less traditional and much hipper innovation track, was outside of the conference hall and focused on cutting-edge thinking, design, and new technologies. The banks are late to the technology game, but that’s to be expected in any industry with a history that can be measured in centuries. Innotribe is trying to fix the problem of innovation in banking.


One of the most surprising things I learned during the conference was the different classes of customers a bank has and which class is most profitable to the banks. Many people are under the false impression that the most valuable customer a bank can have is the one that walks into one of its branches and opens an account. In general, the technology industry tends to treat the individual customer as the primary motivator for everything that it does. This impression, with respect to the banking industry, was shattered when I heard the head of an international bank utter the following about banking branches: “80% of our customers add nothing but sand to our bottom line.” The banker was alluding to the perception that the most significant thing that customers bring into the banking branch is the sand on the bottom of their shoes. The implication is that most customers are not very profitable to banks and are thus not a top priority. This summarizes the general tone of the conference with respect to customers among the managers of these financial institutions.

Fundamentally, a bank’s motives are not aligned with most of its customers’ needs because that’s not where it makes the majority of its money. Most of a bank’s revenue comes from activities like short-term lending, utilizing leverage against deposits, float-based leveraging, high-frequency trading, derivatives trading, and other financial exercises that are far removed from what most people in the world think of when they think of the type of activities one does at a bank.

For example, it has been possible to do realtime payments over the current banking network for a while now. The standards and technology exist to do so within the vast majority of the bank systems in use today. In fact, enabling this has been put to a vote for the last five years in a row. Every time it has come up for a vote, the banks have voted against it. The banks make money on the day-to-day float against the transfers, so the longer it takes to complete a transfer, the more money the banks make.
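The float arithmetic is easy to sketch. Every number below is invented purely for illustration; real settlement volumes and rates vary enormously:

```python
# Hypothetical: a bank settles $10 billion per day, and delaying
# settlement by two days lets it earn 2% annualized on the float.
daily_volume = 10_000_000_000   # dollars settled per day (illustrative)
delay_days = 2                  # extra days the transfer is held
annual_rate = 0.02              # short-term rate earned on idle funds

float_balance = daily_volume * delay_days          # money "in flight"
annual_float_revenue = float_balance * annual_rate
print(f"${annual_float_revenue:,.0f} per year")    # $400,000,000 per year
```

Even at these modest assumed rates, instant settlement would erase a nine-figure revenue line, which is why the vote keeps failing.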

I did hear a number of bankers state publicly that they cared about the customer experience and wanted to improve upon it. However, those statements rang pretty hollow when it came to the product focus on the show floor, which revolved around B2B software, high-frequency trading protocols, high net-value transactions, etc. There were a few customer-focused companies, but they were dwarfed by the size of the major banks and financial institutions in attendance at the conference.

The Standards Team

I was invited to the conference by two business units within SWIFT. The first was the innovation group inside of SWIFT, called Innotribe. The second was the core standards group at SWIFT. There are over 6,900 banks that participate in the SWIFT network. Their standards team is very big, many times larger than the W3C’s, and extremely well funded. The primary job of the standards team at SWIFT is to create standards that help their member companies exchange financial information with the minimum amount of friction. Their flagship product is a standard called ISO 20022, a 3,463-page document that outlines every sort of financial message that the SWIFT network supports today.

The SWIFT standards team is a very helpful group of people who are trying their hardest to pull their membership into the future. They fundamentally understand the work that we’re doing in the Web Payments group and are interested in participating more deeply. They know that technology is going to eventually disrupt their membership, and they want to make sure that there is a transition path for that membership, even if the member banks would like to view new technologies like Bitcoin, PaySwarm, and Ripple as interesting corner cases.

In general, the banks don’t view technical excellence as a fundamental part of their success. Most view personal relationships as the fundamental thing that keeps their industry ticking. Most bankers come from an accounting background of some kind and don’t think of technology as something that can replace the sort of work that they do. This means that standards and new technologies almost always take a back seat to other more profitable endeavors such as implementing proprietary high frequency trading and derivatives trading platforms (as opposed to customer-facing systems like PaySwarm).

SWIFT’s primary customers are the banks, not the bank’s customers. Compare this with the primary customer of most Web-based organizations and the W3C, which is the individual. Since SWIFT is primarily designed to serve the banks, and banks make most of their money doing things like derivatives and high-frequency trading, there really is no champion for the customer in the banking organizations. This is why using your bank is a fairly awful experience. Speaking from a purely capitalistic standpoint, individuals that have less than a million dollars in deposits are not a priority.

Hobbled by Complexity

I met with over 30 large banks while I was at SIBOS and had a number of low-level discussions with their technology teams. The banking industry seems to be crippled by the complexity of its current systems. Minor upgrades cost millions of dollars due to the requirement to keep backwards compatibility. For example, at one point during the conference, it was explained that there was a proposal to make the last digit of an IBAN a particular value if the organization was not a bank. The push-back on the proposal was so great that it was never implemented, since it would have cost thousands of banks several million dollars each to implement the feature. Many of the banks are still running systems as part of their core infrastructure that were created in the 1980s, written in COBOL or Fortran, and well past their initially intended lifecycles.

A bank’s legacy systems mean that it has a very hard time innovating on top of its current architecture, and it may be that launching a parallel financial architecture would be preferable to broadly interfacing with the banking systems in use today. Startups launching new core financial services are at a great advantage as long as they limit the number of places where they interface with these old technology infrastructures.

Commitment to Research and Development

The technology utilized in the banking industry is, from a technology industry point of view, archaic. For example, many of the high-frequency trading messages are short ASCII text strings that look like this:
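(A representative sketch, assuming a FIX-style tag=value message; every field value here is invented for illustration, and the SOH delimiter byte is shown as “|” for readability:)

```python
# A FIX-style order message: numeric tag=value fields separated by a
# delimiter.  8=protocol version, 35=message type (D: new order),
# 49/56=sender/target IDs, 55=symbol, 38=quantity, 44=price.
raw = "8=FIX.4.2|35=D|49=SENDER|56=TARGET|55=IBM|54=1|38=100|44=154.25"

# Parsing is deliberately trivial so it can be done in microseconds:
# split on the delimiter, then split each field on "=".
fields = dict(pair.split("=", 1) for pair in raw.split("|"))
print(fields["55"], fields["38"], fields["44"])  # IBM 100 154.25
```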


Imagine anything like that being accepted as a core part of the Web. Messages are kept to very short sequences because they must be processed in less than 5 microseconds. There is no standard binary protocol, even for high-frequency trading. Many of the systems that are core to a bank’s infrastructure pre-date the Web, sometimes by more than a decade or two. At most major banking institutions, there is very little R&D investment into new models of value transfer like PaySwarm, Bitcoin, or Ripple. In a room of roughly 100 bank technology executives, when asked how many of them had an R&D or innovation team, only around 5% of the people in the room raised their hands.

Compare this with the technology industry, which devotes a significant portion of its revenue to R&D activities and continually tries to disrupt itself through the creation of new technologies.

No Shared Infrastructure

The technology utilized in the banking industry is typically created and managed in-house. It is also highly fractured: the banks share the messaging data model, but that’s about it. The SWIFT data model is implemented over and over again by thousands of banks. There is no popular set of open source software that one can use to do banking, which means that almost every major bank writes its own software. The result is a high degree of waste when it comes to technology re-use in the banking industry.

Compare this with how much of the technology industry shares in the development of core infrastructure like operating systems, Web servers, browsers, and open source software libraries. This sort of shared development model does not exist in the banking world and the negative effects of this lack of shared architecture are evident in almost every area of technology associated with the banking world.

Fear of Technology Companies

The banks are terrified of the thought of Google, Apple, or Amazon getting into the banking business. These technology companies have hundreds of millions of customers, deep brand trust, and have shown that they can build systems to handle complexity with relative ease. At one point it was said that if Apple, Google, or Amazon wanted to buy Visa, they could. Then in one fell swoop, one of these technology companies could remove one of the largest networks that banks rely on to move money in the retail space.

While all of the banks seemed to be terrified of being disrupted, there seemed to be very little interest in making any sort of drastic change to their infrastructure. In many cases, the banks are just not equipped to deal with the Web. They tend to want to build everything internally and rarely acquire technology companies to improve their technology departments.

There was also a relative scarcity of bank executives who could carry on a fairly high-level conversation about things like Web technology. This suggests that it is going to be some time before the financial industry can understand the sort of disruption that things like PaySwarm, Bitcoin, and Ripple could trigger. Many know that a large chunk of jobs is going to go away, but those same individuals either do not have the skill set to react to the change or are too busy with paying customers to focus on the coming disruption.

A Passing Interest in Disruptive Technologies

There was a tremendous amount of interest in Bitcoin, PaySwarm, and Ripple and how they could disrupt banking. However, much like the music industry before them, all but a few of the banks seemed unwilling to learn how they could adopt or use the technology. Many of the conversations ended in a general malaise about technological disruption, with no real motivation to dig deeper lest they find something truly frightening. Most executives would express how nervous they were about competition from technology companies, but were not willing to make any deep technological changes that would undermine their current revenue streams. There were parallels between many of the bank executives I spoke with, the innovator’s dilemma, and how many of the music industry executives I had been involved with in the early 2000s reacted to the rise of Napster, peer-to-peer file trading networks, and digital music.

Many higher-level executives were dismissive about the sorts of lasting changes Web technologies could have on their core business, often to the point of being condescending when they spoke about technologies like Bitcoin, PaySwarm, and Ripple. Most arguments boiled down to the customer needing to trust some financial institution to carry out the transaction, demonstrating that they did not fundamentally understand the direction that technologies like Bitcoin and Ripple are headed.

Lessons Learned

We were able to get the message out about the sort of work that we’re doing at W3C when it comes to Web Payments and it was well received. I have already been asked to present at next year’s conference. There is a tremendous opportunity here for the technology sector to either help the banks move into the future, or to disrupt many of the services that have been seen as belonging to the more traditional financial institutions. There is also a big opportunity for the banks to seize the work that is being done in Web Payments, Bitcoin, and Ripple, and apply it to a number of the problems that they have today.

The trip was a big success in that the Web Payments group now has very deep ties into SWIFT, major banks, and other financial institutions. Many of the institutions expressed a strong desire to collaborate with us on future Web Payments work. The financial institutions we spoke with thought that many of these technologies were 10 years away from affecting them, so there was no real sense of urgency to integrate the technology. I’d put the timeline closer to 3-4 years than 10. That said, there was general agreement that these technologies matter. The lines of communication are now more open than they used to be between the traditional financial industry and the Web Payments group at W3C. That’s a big step in the right direction.

Interested in becoming a part of the Web Payments work, or just peeking in from time to time? It’s open to the public. Join here.

The Downward Spiral of Microdata

Full disclosure: I’m the chair of the RDFa Working Group and have been heavily involved during the RDFa and Microdata standardization initiatives. I am biased, but also understand all of the nuanced decisions that were made during the creation of both specifications.

Support for the Microdata API has just been removed from WebKit (Apple Safari). Support for the Microdata API was also removed from Blink (Google Chrome) a few months ago. This means that Apple Safari and Google Chrome will no longer support the Microdata API. The removal of the feature from two major browser engines also points to a likely future for Microdata: less and less support.

In addition, this discussion on the Blink developer list demonstrates that there isn’t anyone to pick up the work of maintaining the Microdata implementation. Microdata has also been ripped out of the main HTML5 specification at the W3C, with the caveat that the Microdata specification will only continue “if editorial resources can be found”. Translation: if an editor doesn’t step up to edit the Microdata specification, Microdata is dead at W3C. It just takes someone to raise their hand to volunteer, so why is it that out of a group of hundreds of people, no one has stepped up to maintain, create a test suite for, and push the Microdata specification forward?

A number of observers have been surprised by these events, but for those that have been involved in the month-to-month conversation around Microdata, it makes complete sense. Microdata doesn’t have an active community supporting it. It never really did. For a Web specification to be successful, it needs an active community around it that is willing to do the hard work of building and maintaining the technology. RDFa has that in spades, Microdata does not.

Microdata was, primarily, a shot across the bow at RDFa. The warning worked: the RDFa community reacted by creating RDFa Lite, which matches Microdata feature-for-feature while also supporting things that Microdata is incapable of doing. The existence of RDFa Lite left the HTML Working Group in an awkward position; publishing two specifications that do the exact same thing in almost the exact same way is a position no standards organization wants to be in. At that point, it became a race to see which community could create the developer tools and support the web developers who were marking up pages.
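The feature-for-feature overlap is easy to see in markup. Here is the same snippet expressed both ways; the vocabulary is used purely as an example:

```html
<!-- Microdata -->
<div itemscope itemtype="">
  <span itemprop="name">Manu Sporny</span>
</div>

<!-- RDFa Lite: itemscope/itemtype/itemprop map onto vocab/typeof/property -->
<div vocab="" typeof="Person">
  <span property="name">Manu Sporny</span>
</div>
```

Both produce the statement “this node is a Person named Manu Sporny”; the difference is that the RDFa Lite attributes also scale up to the full RDFa feature set when needed.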

Microdata, to this day, still doesn’t have a specification editor, an active community, a solid test suite, or any of the other things that are necessary to become a world class technology. To be clear, I’m not saying Microdata is dying (4 million out of 329 million domains use it), just that not having these basic things in place will be very problematic for the future of Microdata.

To put that in perspective, HTML5+RDFa 1.1 will become an official W3C Recommendation (world standard) next Thursday. There was overwhelming support from the W3C member companies to publish it as a world standard. There have been multiple specification editors for RDFa throughout the years, there are hundreds of active people in the community integrating RDFa into pages across the Web, there are 7 implementations of RDFa in a variety of programming languages, there is a mailing list, a website, and an IRC channel dedicated to answering questions for people learning RDFa, and there is a test suite with 800 tests covering RDFa in 6 markup languages (HTML4, HTML5, XML, SVG, XHTML1, and XHTML5). If you want to build a solution on a solid technology, with a solid community and solid implementations, RDFa is that solution.

JSON-LD is the Bee’s Knees

Full disclosure: I’m one of the primary authors and editors of the JSON-LD specification. I am also the chair of the group that created JSON-LD and have been an active participant in a number of Linked Data initiatives: RDFa (chair, author, editor), JSON-LD (chair, co-creator), Microdata (primary opponent), and Microformats (member, haudio and hvideo microformat editor). I’m biased, but also well informed.

JSON-LD has been getting a great deal of good press lately. It was adopted by Google, Yahoo!, Yandex, and Microsoft for use in The PaySwarm universal payment protocol is based on it. It was also integrated with Google’s Gmail service, and the open social networking folks have started integrating it into the Activity Streams 2.0 work.

That all of these positive adoption stories exist was precisely why Shane Becker’s post on why JSON-LD is an Unneeded Spec was so surprising. If you haven’t read it yet, you may want to, as the rest of this post will dissect the arguments he makes (it’s a pretty quick 5-minute read). The post is a broad-brush opinion piece based on a number of factual errors and misinformed opinions. I’d like to clear up these errors in this blog post and underscore some of the reasons JSON-LD exists and how it has been developed.

A theatrical interpretation of the “JSON-LD is Unneeded” blog post

Shane starts with this claim:

Today I learned about a proposed spec called JSON-LD. The “LD” is for linked data (Linked Data™ in the Uppercase “S” Semantic Web sense).

When I started writing the original JSON-LD specification, one of the goals was to try and merge lessons learned in the Microformats community with lessons learned during the development of RDFa and Microdata. This meant figuring out a way to marry the lowercase semantic web with the uppercase Semantic Web in a way that was friendly to developers. For developers that didn’t care about the uppercase Semantic Web, JSON-LD would still provide a very useful data structure to program against. In fact, Microformats, which are the poster-child for the lowercase semantic web, were supported by JSON-LD from day one.

Shane’s article is misinformed with respect to the assertion that JSON-LD is solely for the uppercase Semantic Web. JSON-LD is mostly for the lowercase semantic web, the one that developers can use to make their applications exchange and merge data with other applications more easily. JSON-LD is also for the uppercase Semantic Web, the one that researchers and large enterprises are using to build systems like IBM’s Watson supercomputer, search crawlers, Gmail, and open social networking systems.

Linked data. Web sites. Standards. Machine readable.
Cool. All of those sound good to me. But they all sound familiar, like we’ve already done this before. In fact, we have.

We haven’t done something like JSON-LD before. I wish we had because we wouldn’t have had to spend all that time doing research and development to create the technology. When writing about technology, it is important to understand the basics of a technology stack before claiming that we’ve “done this before”. An astute reader will notice that at no point in Shane’s article is any text from the JSON-LD specification quoted, just the very basic introductory material on the landing page of the website. More on this below.

Linked data
That’s just the web, right? I mean, we’ve had the <a href> tag since literally the beginning of HTML / The Web. It’s for linking documents. Documents are a representation of data.

Speaking as someone who has been very involved in the Microformats and RDFa communities: yes, it’s true that the document-based Web can be used to publish Linked Data. The problem is that the standard way of expressing a followable link to another piece of data did not carry over to the data-based Web. That is, most JSON-based APIs have no standard way of encoding a hyperlink.

The other implied assertion with the statement above is that the document-based Web is all we need. If this were true, sending HTML documents to Web applications would be all we needed. Web developers know that this isn’t the case today for a number of obvious reasons. We send JSON data back and forth on the Web when we need to program against things like Facebook, Google, or Twitter’s services. JSON is a very useful data format for machine-to-machine data exchange. The problem is that JSON data has no standard way of doing a variety of things we do on the document-based Web, like expressing links, expressing the types of data (like times and dates), and a variety of other very useful features for the data-based Web. This is one of the problems that JSON-LD addresses.

Web sites
If it’s not wrapped in HTML and viewable in a browser, is it really a website? JSON isn’t very useful in the browser by itself. It’s not style-able. It’s not very human-readable. And worst of all, it’s not clickable.

Websites are composed of many parts. It’s a weak argument to say that if a site is mainly composed of data that isn’t in HTML and isn’t viewable in a browser, it’s not a real website. The vast majority of websites like Twitter and Facebook are composed of data and API calls with a relatively thin varnish of HTML on top. JSON is the primary way that applications interact with these and other data-driven websites. It’s almost guaranteed these days that any company with a popular API uses JSON in its Web service protocol.

Shane’s argument here is pretty confused. It assumes that the primary use of JSON-LD is to express data in an HTML page. Sure, JSON-LD can do that, but focusing on that brush stroke is missing the big picture. The big picture is that JSON-LD allows applications that use it to share data and interoperate in a way that is not possible with regular JSON, and it’s especially useful when used in conjunction with a Web service or a document-based database like MongoDB or CouchDB.

Standards based
To their credit, JSON-LD did license their website content Creative Commons CC0 Public Domain. But, the spec itself isn’t. It’s using (what seems to be) a W3C boilerplate copyright / license. Copyright © 2010-2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.

Nope. The JSON-LD specification has been released under a Creative Commons Attribution 3.0 license multiple times in the past, and it will be released under a Creative Commons license again, most probably CC0. The JSON-LD specification was developed in a W3C Community Group using a Creative Commons license and then released to be published as a Web standard via W3C using their W3C Community Final Specification Agreement (FSA), which allows the community to fork the specification at any point in time and publish it under a different license.

When you publish a document through the W3C, they have their own copyright, license, and patent policy associated with the document being published. There is a legal process in place at W3C that asserts that companies can implement W3C published standards in a patent and royalty-free way. You don’t get that with CC0, in fact, you don’t get any such vetting of the technology or any level of patent and royalty protection.

What we have with JSON-LD is better than what is proposed in Shane’s blog post. You get all of the benefits of having W3C member companies vet the technology for technical and patent issues while also being able to fork the specification at any point in the future and publish it under a license of your choosing as long as you state where the spec came from.

Machine readable
Ah… “machine readable”. Every couple of years the current trend of what machine readable data should look like changes (XML/JSON, RSS/Atom, xml-rpc/SOAP, rest/WS-*). Every time, there are the same promises. This will solve our problems. It won’t change. It’ll be supported forever. Interoperability. And every time, they break their promises. Today’s empires, tomorrow’s ashes.

At no point has any core designer of JSON-LD claimed 1) that JSON-LD will “solve our problems” (or even your particular problem), 2) that it won’t change, and 3) that it will be supported forever. These are straw-man arguments. The current consensus of the group is that JSON-LD is best suited to a particular class of problems and that some developers will have no need for it. JSON-LD is guaranteed to change in the future to keep pace with what we learn in the field, and we will strive for backward compatibility for features that are widely used. Without modification, standardized technologies have a shelf life of around 10 years, 20-30 if they’re great. The designers of JSON-LD understand that, like the Web, JSON-LD is just another grand experiment. If it’s useful, it’ll stick around for a while, if it isn’t, it’ll fade into history. I know of no great software developer or systems designer that has ever made these three claims and been serious about it.

We do think that JSON-LD will help Web applications interoperate better than they do with plain ‘ol JSON. For an explanation of how, there is a nice video introducing JSON-LD.

With respect to the “Today’s empires, tomorrow’s ashes” cynicism, we’ve already seen a preview of the sort of advances that Web-based machine-readable data can unleash. Google, Yahoo!, Microsoft, Yandex, and Facebook all use a variety of machine-readable data technologies that have only recently been standardized. These technologies allow for faster, more accurate, and richer search results. They are also the driving technology for software systems like Watson. These systems exist because there are people plugging away at the hard problem of machine readable data in spite of cynicism directed at past failures. Those failures aren’t ashes, they’re the bedrock of tomorrow’s breakthroughs.

Instead of reinventing everything (over and over again), let’s use what’s already there and what already works. In the case of linked data on the web, that’s html web pages with clickable links between them.

Microformats, Microdata, and RDFa do not work well for data-based Web services. Using Linked Data with data-based Web services is one of the primary reasons that JSON-LD was created.

For open standards, open license are a deal breaker. No license is more open than Creative Commons CC0 Public Domain + OWFa. (See also the Mozilla wiki about standards/license, for more.) There’s a growing list of standards that are already using CC0+OWFa.

I think there might be a typo here, but if not, I don’t understand why open licenses are a deal breaker for open standards, especially given things like the W3C FSA or the Creative Commons licenses we’ve published the JSON-LD spec under. Additionally, CC0 + OWFa might be neat. Shane’s article was the first time that I had heard of OWFa, and I’d be a proponent for pushing it in the group if it granted more freedom to the people using and developing JSON-LD than the current set of agreements we have in place. After skimming the legal text of the OWFa, I can’t see what CC0 + OWFa buys us over CC0 + W3C patent attribution. If someone would like to make these benefits clear, I could take a proposal to switch to CC0 + OWFa to the JSON-LD Community Group and see if there is interest in using that license in the future.

No process is more open than a publicly editable wiki.

Publicly editable wikis are notorious for edit wars; they are not a panacea. Just because you have a wiki does not mean you have an open community. For example, the Microformats community was notorious for having a class of unelected admins that would meet in San Francisco and make decisions about the operation of the community. This seemingly innocuous practice would creep its way into the culture and technical discussion on a regular basis, leading to community members being banned from time to time. Similarly, Wikipedia has had numerous issues with publicly editable wikis and the behavior of its admins.

Depending on how you define “open”, there are a number of processes that are far more open than a publicly editable wiki. For example, the JSON-LD specification development process is completely open to the public, based on meritocracy, and is consensus-driven. The mailing list is open. The bug tracker is open. We have weekly design teleconferences where all the audio is recorded and minuted. We have these teleconferences to this day and will continue to have them into the future because we make transparency a priority. JSON-LD, as far as I know, is the first such specification in the world developed where all the previously described operating guidelines are standard practice.

(Mailing lists are toxic.)

A community is as toxic as its organizational structure enables it to be. The JSON-LD community is based on meritocracy, consensus, and has operated in a very transparent manner since the beginning (open meetings, all calls are recorded and minuted, anyone can contribute to the spec, etc.). This has, unsurprisingly, resulted in a very pleasant and supportive community. That said, there is no perfect communication medium. They’re all lossy and they all have their benefits and drawbacks. Sometimes, when you combine multiple communication channels as a part of how your community operates, you get better outcomes.

Finally, for machine readable data, nothing has been more widely adopted by publishers and consumers than microformats. As of June 2012, microformats represents about 70% of all of the structured data on the web. And of that ~70%, the vast majority was h-card and xfn. (All RDFa is about 25% and microdata is a distant third.)

Microformats are good if all you need to do is publish your basic contact and social information on the Web. If you want to publish detailed product information, financial data, medical data, or address other more complex scenarios, Microformats won’t help you. No new Microformats have been released in the last five years, and mailing list traffic has been almost non-existent for about as long. From what I can tell, most everyone has moved on to RDFa, Microdata, or JSON-LD.

A few people are working on Microformats 2, but I haven’t seen it provide anything that is not already provided by existing solutions, which have the added benefit of being W3C standards or being backed by major companies like Google, Facebook, Yahoo!, Microsoft, and Yandex.

Maybe it’s because of the ease of publishing microformats. Maybe it’s the open process for developing the standards. Maybe it’s because microformats don’t require any additions to HTML. (Both RDFa and microdata required the use of additional attributes or XML namespaces.) Whatever the reason, microformats has the most uptake. So, why do people keep trying to reinvent what microformats is already doing well?

People aren’t reinventing what Microformats are already doing well, they’re attempting to address problems that Microformats do not solve.

For example, one of the reasons that Google adopted JSON-LD is because markup was much easier in JSON-LD than it was in Microformats, as evidenced by the example below:

Back to JSON-LD. The “Simple Example” listed on the homepage is a person object representing John Lennon. His birthday and wife are also listed on the object.

          {
            "@context": "",
            "@id": "",
            "name": "John Lennon",
            "born": "1940-10-09",
            "spouse": ""
          }

I look at this and see what should have been HTML with microformats (h-card and xfn). This is actually a perfect use case for h-card and xfn: a person and their relationship to another person. Here’s how it could’ve been marked up instead.

        <div class="h-card">
          <a href="" class="u-url u-uid p-name">John Lennon</a>
          <time class="dt-bday" datetime="1940-10-09">October 9<sup>th</sup>, 1940</time>
          <a rel="spouse" href="">Cynthia Lennon</a>.
        </div>

I’m willing to bet that most people familiar with JSON will find the JSON-LD markup far easier to understand and get right than the Microformats-based equivalent. In addition, sending the Microformats markup to a REST-based Web service would be very strange. Alternatively, sending the JSON-LD markup to a REST-based Web service would be far more natural for a modern day Web developer.

This HTML can be easily understood by machine parsers and human parsers. Microformats 2 parsers already exist for: JavaScript (in the browser), Node.js, PHP and Ruby. HTML + microformats2 means that machines can read your linked data from your website and so can humans. It means that you don’t need an “API” that is something other than your website.

You have been able to do the same thing, and much more, using RDFa and Microdata for far longer (since 2006) than you have been able to do it in Microformats 2. Let’s be clear, there is no significant advantage to using Microformats 2 over RDFa or Microdata. In fact, there are a number of disadvantages for using Microformats 2 at this point, like little to no support from the search companies, very little software tooling, and an anemic community (of which I am a member) for starters. Additionally, HTML + Microformats 2 does not address the Web service API issue at all.

Please don’t waste time and energy reinventing all of the wheels. Instead, please use what already works and what works the webby way.

Do not miss the irony of this statement. RDFa has been doing what Microformats 2 does today since 2006, and it’s a Web standard. Even if you don’t like RDFa 1.0, RDFa 1.1, RDFa Lite 1.1, and Microdata all came before Microformats 2. To assert that wheels should not be reinvented, and then promote Microformats 2, which was created well after a number of well-established solutions already existed, is quite a strange position to take.


JSON-LD was created by people that have been directly involved in the Linked Data, lowercase semantic web, uppercase Semantic Web, Microformats, Microdata, and RDFa work. It has proven to be useful to them. There are a number of very large technology companies that have adopted JSON-LD, further underscoring its utility. Expect more big announcements in the next six months. The JSON-LD specifications have been developed in a radically open and transparent way, and the document copyright and licensing provisions are equally open. I hope that this blog post has helped clarify most of the misinformed opinions in Shane Becker’s blog post.

Most importantly, cynicism will not solve the problems that we face on the Web today. Hard work will, and there are very few communities that I know of that work harder and more harmoniously than the excellent volunteers in the JSON-LD community.

If you would like to learn more about Linked Data, a good video introduction exists. If you want to learn more about JSON-LD, there is a good video introduction to that as well.

Linked Data Signatures vs. Javascript Object Signing and Encryption

The Web Payments Community Group at the World Wide Web Consortium (W3C) is currently performing a thorough analysis of the MozPay API. The first part of the analysis examined the contents of the payment messages. This is the second part of the analysis, which focuses on whether the Javascript Object Signing and Encryption (JOSE) group’s solutions for message security are adequate, or if the Web Payments group’s solutions should be used instead.

The Contenders

The IETF JOSE Working Group is actively standardizing the following specifications for the purposes of adding message security to JSON:

JSON Web Algorithms (JWA)
Details the cryptographic algorithms and identifiers that are meant to be used with the JSON Web Signature (JWS), JSON Web Encryption (JWE), JSON Web Token (JWT), and JSON Web Key (JWK) specifications. For example, when specifying a cryptographic algorithm, a JSON key/value pair that has alg as the key may have HS256 as the value, which means HMAC using the SHA-256 hash algorithm.
JSON Web Key (JWK)
Details a data structure that represents one or more cryptographic keys. If you need to express one of the many cryptographic key types in use today, this specification details how you do that in a standard way.
JSON Web Token (JWT)
Defines a way of representing claims such as “Bob was born on November 15th, 1984”. These claims are digitally signed and/or encrypted using either the JSON Web Signature (JWS) or JSON Web Encryption (JWE) specifications.
JSON Web Encryption (JWE)
Defines a way to express encrypted content using JSON-based data structures. Basically, if you want to encrypt JSON data so that only the intended receiver can read the data, this specification tells you how to do it in an interoperable way.
JSON Web Signature (JWS)
Defines a way to digitally sign JSON data structures. If your application needs to be able to verify the creator of a JSON data structure, you can use this specification to do so.
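To make the JWA terminology above concrete, here is a short sketch (Python standard library only) of producing an HS256 value over JOSE-style signing input. The header, claims, and shared secret are illustrative, not from any real deployment:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JOSE specs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

# A JOSE header naming the algorithm (HS256 = HMAC with SHA-256, per JWA).
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
claims = b64url(json.dumps({"iss": "joe"}).encode())
signing_input = (header + "." + claims).encode("ascii")

secret = b"a-shared-secret"  # assumption: any out-of-band shared key
tag = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())

# Compact serialization: header.claims.signature
token = header + "." + claims + "." + tag
```

The `alg` value drives which of the many JWA-registered algorithms the verifier must run, which is why the JWA registry document has to exist at all.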

The W3C Web Payments group is actively standardizing a similar specification for the purpose of adding message security to JSON messages:

Linked Data Signatures (code named: HTTP Keys)
Describes a simple, decentralized security infrastructure for the Web based on JSON, Linked Data, and public key cryptography. This system enables Web applications to establish identities for agents on the Web, associate security credentials with those identities, and then use those security credentials to send and receive messages that are both encrypted and verifiable via digital signatures.

Both groups are relying on technology that has existed and been used for over a decade to achieve secure communications on the Internet (symmetric and asymmetric cryptography, public key infrastructure, X.509 certificates, etc.). The key differences between the two have more to do with flexibility, implementation complexity, and how the data is published on the Web and used between systems.

Basic Differences

In general, the JOSE group is attempting to create a flexible/generalized way of expressing cryptography parameters in JSON. They are then using that information and encrypting or signing specific data (called claims in the specifications).

The Web Payments group’s specification achieves the same thing, but without trying to be as generalized as the JOSE group’s work. Flexibility and generalization tend to 1) make the ecosystem more complex than it needs to be for 95% of the use cases, 2) make implementations harder to security audit, and 3) make it more difficult to achieve interoperability between all implementations. The Linked Data Signatures specification attempts to outline a single best practice that will work for 95% of the applications out there. The 5% of Web applications that need to do more than the Linked Data Signatures spec can use the JOSE specifications. The Linked Data Signatures specification is also more Web-y. That Web-y nature gives us a number of benefits, such as a Web-scale public key infrastructure as a pleasant side-effect, that we will get into below.

JSON-LD Advantages over JSON

Fundamentally, the Linked Data Signatures specification relies on the Web and Linked Data to remove some of the complexity that exists in the JOSE specs while also achieving greater flexibility from a data model perspective. Specifically, the Linked Data Signatures specification utilizes Linked Data via a new standards-track technology called JSON-LD to allow anyone to build on top of the core protocol in a decentralized way. JSON-LD data is fundamentally more Web-y than JSON data. Here are the benefits of using JSON-LD over regular JSON:

  • A universal identifier mechanism for JSON objects via the use of URLs.
  • A way to disambiguate JSON keys shared among different JSON documents by mapping them to URLs via a context.
  • A standard mechanism by which a value in a JSON object may refer to a JSON object in a different document or site on the Web.
  • A way to associate datatypes with values such as dates and times.
  • The ability to annotate strings with their language. For example, the word ‘chat’ means something different in English and French and it helps to know which language was used when expressing the text.
  • A facility to express one or more directed graphs, such as a social network, in a single document. Graphs are the native data structure of the Web.
  • A standard way to map external JSON application data to your application data domain.
  • A deterministic way to generate a hash on JSON data, which is helpful when attempting to figure out if two data sources are expressing the same information.
  • A standard way to digitally sign JSON data.
  • A deterministic way to merge JSON data from multiple data sources.

Plain old JSON, while incredibly useful, does not allow you to do the things mentioned above in a standard way. There is a valid argument that applications may not need this amount of flexibility, and for those applications, JSON-LD does not require any of the features above to be used and does not require the JSON data to be modified in any way. So people that want to remain in the plain ‘ol JSON bucket can do so without the need to jump into the JSON-LD bucket with both feet.
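To illustrate the key-disambiguation benefit in the list above, here is a toy sketch of what a JSON-LD context does. The context mapping and URLs below are illustrative, and this is a simplification of the real JSON-LD expansion algorithm:

```python
# Hypothetical context: maps short, human-friendly keys to unambiguous URLs.
CONTEXT = {
    "name": "http://schema.org/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
}

def expand(doc: dict, context: dict) -> dict:
    """Toy analogue of JSON-LD expansion: swap short keys for their URLs."""
    return {
        context.get(key, key): value
        for key, value in doc.items()
        if not key.startswith("@")  # drop JSON-LD keywords like @context
    }

doc = {"@context": "", "name": "Manu Sporny", "homepage": "http://manu.sporny.org/"}
expanded = expand(doc, CONTEXT)
# expanded == {"http://schema.org/name": "Manu Sporny",
#              "http://xmlns.com/foaf/0.1/homepage": "http://manu.sporny.org/"}
```

After expansion, two applications that chose different short keys for the same concept (or the same short key for different concepts) can still interoperate, because the URLs are what carry the meaning.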

JSON Web Algorithms vs. Linked Data Signatures

The JSON Web Algorithms specification details the cryptographic algorithms and identifiers that are meant to be used with the JSON Web Signature (JWS), JSON Web Encryption (JWE), JSON Web Token (JWT), and JSON Web Key (JWK) specifications. For example, when specifying a cryptographic algorithm, a JSON key/value pair that has alg as the key may have HS256 as the value, which means HMAC using the SHA-256 hash algorithm. The specification is 70 pages long and is effectively just a collection of what values are allowed for each key used in JOSE-based JSON documents. The design approach taken for the JOSE specifications requires that such a document exist.

The Linked Data Signatures specification takes a different approach. Rather than declare all of the popular algorithms and cryptography schemes in use today, it defines just one digital signature scheme (RSA with SHA-256 hashing), one encryption scheme (128-bit AES in cipher-block-chaining mode), and one way of expressing keys (as PEM-formatted data). If placed into a single specification, like the JWA spec, it would be just a few pages long (really, just one page of actual content).

The most common argument against the Linked Data Signatures spec, with respect to the JWA specification, is that it lacks the cryptographic algorithm agility that the JWA specification provides. While this may seem like a valid argument on the surface, keep in mind that the core algorithms used by the Linked Data Signatures specification can be changed at any point to any other set of algorithms. So, the specification achieves algorithm agility while greatly reducing the need for a large 70-page specification detailing the allowable values for the various cryptographic algorithms. The other benefit is that since the cryptography parameters are outlined in a Linked Data vocabulary, instead of a process-heavy specification, they can be added to at any point as long as there is community consensus. Note that while the vocabulary can be added to, thus providing algorithm agility if a particular cryptography scheme is weakened or broken, cryptography schemes already defined in the vocabulary must not be changed once they become widely used, to ensure that production deployments that use the older mechanism aren’t broken.
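One way to picture the vocabulary-driven agility described above: an implementation can keep a small table mapping vocabulary terms to algorithm suites, and new terms can be appended without changing what existing terms mean. The parameters below are illustrative, not taken from the actual vocabulary:

```python
# Hypothetical mapping of Linked Data vocabulary terms to algorithm suites.
# New terms may be appended; existing terms must never change meaning once
# they are widely deployed.
SIGNATURE_SUITES = {
    "GraphSignature2012": {"digest": "SHA-256", "algorithm": "RSA"},
    # A future term could be added here without touching the one above, e.g.:
    # "GraphSignature2020": {"digest": "SHA-3-256", "algorithm": "Ed25519"},
}

def suite_for(signature_type: str) -> dict:
    """Look up the cryptographic parameters implied by a signature @type."""
    return SIGNATURE_SUITES[signature_type]
```

Because the `@type` string names an entire suite, an implementation can hard-code a handful of suites rather than validating dozens of independent `alg` parameter combinations.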

Providing just one way, the best practice at the time, to do digital signatures, encryption, and key publishing reduces implementation complexity. Reducing implementation complexity makes it easier to perform security audits on implementations. Reducing implementation complexity also helps ensure better interoperability and more software library implementations, as the barrier to creating a fully conforming implementation is greatly reduced.

The Web Payments group believes that the primary digital signature and encryption schemes will have to be updated every 5-7 years. It is better to delay the decision to switch to another primary algorithm for as long as possible (and as long as it is safe to do so). Delaying the cryptographic algorithm decision ensures that the group will be able to make a more educated decision than attempting to predict which cryptographic algorithms may be the successors to currently deployed algorithms.

Bottom line: The Linked Data Signatures specification utilizes a much simpler approach than the JWA specification while supporting the same level of algorithm agility.

JSON Web Key vs. Linked Data Signatures

The JSON Web Key (JWK) specification details a data structure that is capable of representing one or more cryptographic keys. If you need to express one of the many cryptographic key types in use today, JWK details how you do that in a standard way. A typical RSA public key looks like the following using the JWK specification:

  {
    "keys": [{
      "n": "0vx7agoe ... DKgw"
    }]
  }

A similar RSA public key looks like the following using the Linked Data Signatures specification:

  {
    "@context": "",
    "@id": "",
    "@type": "Key",
    "owner": "",
    "publicKeyPem": "-----BEGIN PUBLIC KEY-----\nMIIBG0BA...OClDQAB\n-----END PUBLIC KEY-----\n"
  }

There are a number of differences between the two key formats. Specifically:

  1. The JWK format expresses key information by specifying the key parameters directly. The Linked Data Signatures format places all of the key parameters into a PEM-encoded blob. This approach was taken because it is easier for developers to use the PEM data without introducing errors. Since most Web developers do not understand what variables like dq (the second factor Chinese Remainder Theorem exponent parameter) or d (the Elliptic Curve private key parameter) are, the likelihood of transporting and publishing that sort of data without error is lower than when all parameters are placed in an opaque blob of information with a clear beginning and end (-----BEGIN PUBLIC KEY----- and -----END PUBLIC KEY-----).
  2. In the general case, the Linked Data Signatures key format assigns URL identifiers to keys and publishes them on the Web as JSON-LD, and optionally as RDFa. This means that public key information is discoverable and both human- and machine-readable by default, so all of the key parameters can be read from the Web. The JWK mechanism does assign a key ID to keys, but does not require that they be published to the Web in order to be used in message exchanges. The JWK specification could be extended to enable this, but by default it doesn’t provide this functionality.
  3. The Linked Data Signatures format is also capable of specifying an identity that owns the key, which allows a key to be tied to an identity and that identity to be used for things like access control to Web resources and REST APIs. The JWK format has no such mechanism outlined in the specification.
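The “opaque blob” point above is easy to see in code: a PEM envelope is just base64 between unambiguous markers, which is why it survives copy-and-paste better than a pile of individual key parameters. Here is a minimal stdlib-only sketch of the envelope itself (not a substitute for a real key serialization library):

```python
import base64
import textwrap

def to_pem(der: bytes, label: str = "PUBLIC KEY") -> str:
    """Wrap opaque key bytes in the clearly delimited PEM envelope."""
    body = "\n".join(textwrap.wrap(base64.b64encode(der).decode("ascii"), 64))
    return f"-----BEGIN {label}-----\n{body}\n-----END {label}-----\n"

def from_pem(pem: str) -> bytes:
    """Recover the key bytes: everything between the markers is base64."""
    body = "".join(
        line for line in pem.splitlines()
        if line and not line.startswith("-----")
    )
    return base64.b64decode(body)
```

A developer only has to move the blob between the `BEGIN` and `END` markers intact; there are no individual parameters like `dq` or `d` to mislabel or truncate.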

Bottom line: The Linked Data Signatures specification provides four major advantages over the JWK format: 1) the key information is expressed at a higher level, which makes it easier to work with for Web developers, 2) it allows key information to be discovered by dereferencing the key ID, 3) the key information can be published (and extended) in a variety of Linked Data formats, and 4) it provides the ability to assign ownership information to keys.

JSON Web Tokens vs. Linked Data Signatures

The JSON Web Tokens (JWT) specification defines a way of representing claims such as “Bob was born on November 15th, 1984”. These claims are digitally signed and/or encrypted using either the JSON Web Signature (JWS) or JSON Web Encryption (JWE) specifications. Here is an example of a JWT document:

  {
    "iss": "joe",
    "exp": 1300819380,
    "": true
  }

JWT documents contain keys that are public, such as iss and exp above, and keys that are private (which could conflict with keys from the JWT specification). The data format is fairly free-form, meaning that any data can be placed inside a JWT Claims Set like the one above.

Since the Linked Data Signatures specification utilizes JSON-LD for its data expression mechanism, it takes a fundamentally different approach. There are no headers or claims sets in the Linked Data Signatures specification, just data. For example, the data below is effectively a JWT claims set expressed in JSON-LD:

  {
    "@context": "",
    "@type": "Person",
    "name": "Manu Sporny",
    "gender": "male",
    "homepage": ""
  }

Note that there are no keywords specific to the Linked Data Signatures specification, just keys that are mapped to URLs (to prevent collisions) and data. In JSON-LD, these keys and data are machine-interpretable in a standards-compliant manner (unlike JWT data), and can be merged with other data sources without the danger of data being overwritten or colliding with other application data.
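A toy sketch of the collision problem described above; the short key and the URLs are illustrative:

```python
# Two applications using the same short key to mean different things.
doc_a = {"title": "Chief Scientist"}   # here, "title" means a job title
doc_b = {"title": "Moby Dick"}         # here, "title" means a book title
merged = {**doc_a, **doc_b}            # plain JSON merge: one value clobbers the other

# The same data after context expansion to URL-based keys (URLs illustrative).
expanded_a = {"http://schema.org/jobTitle": "Chief Scientist"}
expanded_b = {"http://schema.org/headline": "Moby Dick"}
merged_ld = {**expanded_a, **expanded_b}  # both facts survive the merge
```

With plain JSON, the merged document silently loses one of the two facts; with URL-mapped keys, both survive, which is what makes JSON-LD data safe to combine across data sources.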

Bottom line: The Linked Data Signatures specification’s use of a native Linked Data format removes the requirement for a specification like JWT. As far as the Linked Data Signatures specification is concerned, there is just data, which you can then digitally sign and encrypt. This makes the data easier to work with for Web developers, as they can continue to use their application data as-is instead of attempting to restructure it into a JWT.

JSON Web Encryption vs. Linked Data Signatures

The JSON Web Encryption (JWE) specification defines a way to express encrypted content using JSON-based data structures. Basically, if you want to encrypt JSON data so that only the intended receiver can read the data, this specification tells you how to do it in an interoperable way. A JWE-encrypted message looks like this:

  {
    "protected": "eyJlbmMiOiJBMTI4Q0JDLUhTMjU2In0",
    "unprotected": {"jku": ""},
    "recipients": [{
      "header": {},
      "encrypted_key": "UGhIOgu ... MR4gp_A"
    }],
    "iv": "AxY8DCtDaGlsbGljb3RoZQ",
    "ciphertext": "KDlTtXchhZTGufMYmOYGS4HffxPSUrfmqCHXaI9wOGY",
    "tag": "Mz-VPPyU4RlcuYv1IwIvzw"
  }

To decrypt this information, an application would retrieve the private key identified by recipients[0].header and use it to decrypt the encrypted_key. It would then decode the protected header to determine the content encryption algorithm and, using the decrypted encrypted_key, the iv, that algorithm, and the ciphertext, retrieve the original message.

For comparison purposes, a Linked Data Signatures encrypted message looks like this:

  {
    "@context": "",
    "@type": "EncryptedMessage2012",
    "data": "VTJGc2RH ... Fb009Cg==",
    "encryptionKey": "uATte ... HExjXQE=",
    "iv": "vcDU1eWTy8vVGhNOszREhSblFVqVnGpBUm0zMTRmcWtMrRX==",
    "publicKey": ""
  }

To decrypt this information, an application would use the private key associated with the publicKey to decrypt the encryptionKey and iv. It would then use the decrypted encryptionKey and iv to decrypt the value in data, retrieving the original message as a result.
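Both decryption walk-throughs above follow the same envelope pattern: a per-message data key encrypts the payload, and the recipient’s key unwraps the data key. The sketch below shows that flow using a toy XOR keystream as a stand-in for AES and RSA; it illustrates the message flow only and is emphatically not production cryptography:

```python
import hashlib
import secrets

def stream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stand-in for a real cipher: SHA-256 counter-mode keystream XOR.
    Symmetric, so the same call encrypts and decrypts. NOT real crypto."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out.extend(hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

# Sender side: encrypt the message with a fresh data key, then wrap that
# key for the recipient (real systems wrap it with the recipient's RSA key).
message = b"an encrypted message"
data_key, iv = secrets.token_bytes(16), secrets.token_bytes(16)
ciphertext = stream_xor(data_key, iv, message)
recipient_key = secrets.token_bytes(16)  # stand-in for an RSA key pair
wrapped_key = stream_xor(recipient_key, iv, data_key)

# Recipient side: unwrap the data key, then decrypt the payload.
unwrapped = stream_xor(recipient_key, iv, wrapped_key)
plaintext = stream_xor(unwrapped, iv, ciphertext)
```

The difference between the two specifications is not this flow, which they share, but how many fields and algorithm choices a conforming implementation has to interpret to carry it out.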

The Linked Data Signatures encryption protocol is simpler than the JWE protocol for three major reasons:

  1. The @type of the message, EncryptedMessage2012, encapsulates all of the cryptographic algorithm information in a machine-readable way (that can also be hard-coded in implementations). The JWE specification utilizes the protected field to express the same sort of information, which is allowed to get far more complicated than the Linked Data Signatures equivalent, leading to more complexity.
  2. Key information is expressed in one entry, the publicKey entry, which is a link to a machine-readable document that can express not only the public key information, but who owns the key, the name of the key, creation and revocation dates for the key, as well as a number of other Linked Data values that result in a full-fledged Web-based PKI system. Not only is Linked Data Signatures encryption simpler than JWE, but it also enables many more types of extensibility.
  3. The key data is expressed in a PEM-encoded format, which is a base-64 encoded blob of information. This approach was taken because it is easier for developers to use the data without introducing errors. Since most Web developers do not understand what variables like dq (the second factor Chinese Remainder Theorem exponent parameter) or d (the Elliptic Curve private key parameter) are, the likelihood of transporting and publishing that sort of data without error is lower than when all parameters are placed in an opaque blob of information with a clear beginning and end (-----BEGIN PUBLIC KEY----- and -----END PUBLIC KEY-----).

The rest of the entries in the JSON are typically required for the encryption method selected to secure the message. There is not a great deal of difference between the two specifications when it comes to the parameters that are needed for the encryption algorithm.

Bottom line: The major difference between the Linked Data Signatures and JWE specifications has to do with how the encryption parameters are specified, as well as how many of them there can be. The Linked Data Signatures specification expresses only one encryption mechanism and outlines the algorithms and keys external to the message, which leads to a reduction in complexity. The JWE specification allows many more types of encryption schemes to be used, at the expense of added complexity.

JSON Web Signatures vs. Linked Data Signatures

The JSON Web Signatures (JWS) specification defines a way to digitally sign JSON data structures. If your application needs to be able to verify the creator of a JSON data structure, you can use this specification to do so. A JWS digital signature looks like the following:

  {
    "payload": "eyJpc ... VlfQ",
    "header": {},
    "signature": "cC4hi ... 77Rw"
  }

For the purposes of comparison, a Linked Data Signatures message and signature looks like the following:

  {
    "@context": ["", ""],
    "@type": "Person",
    "name": "Manu Sporny",
    "homepage": "",
    "signature": {
      "@type": "GraphSignature2012",
      "creator": "",
      "created": "2013-08-04T17:39:53Z",
      "signatureValue": "OGQzN ... IyZTk="
    }
  }

There are a number of stark differences between the two specifications when it comes to digital signatures:

  1. The Linked Data Signatures specification does not need to base-64 encode the payload being signed. This makes it easier for a developer to see (and work with) the data that was digitally signed. Debugging signed messages is also simplified as special tools to decode the payload are unnecessary.
  2. The Linked Data Signatures specification does not require any header parameters for the payload, which reduces the number of things that can go wrong when verifying digitally signed messages. One could argue that this also reduces flexibility. The counter-argument is that different signature schemes can always be switched in by just changing the @type of the signature.
  3. The signer’s public key is available via a URL. This means that, in general, any Linked Data Signature can be verified by dereferencing the creator URL and using the published key data to verify the signature.
  4. The Linked Data Signatures specification depends on a normalization algorithm that is applied to the message. This algorithm is non-trivial, typically implemented behind a JSON-LD library .normalize() method call. JWS does not require data normalization. The trade-off is simplicity at the expense of requiring your data to always be encapsulated in the message. For example, the Linked Data Signatures specification is capable of pointing to a digital signature expressed in RDFa on a website using a URL. An application can then dereference that URL, convert the data to JSON-LD, and verify the digital signature. This mechanism is useful, for example, when you want to publish items for sale along with their prices on a Web page in a machine-readable way. This sort of use case is not achievable with the JWS specification. All data is required to be in the message. In other words, Linked Data Signatures performs a signature on information that could exist on the Web where the JWS specification performs a signature on a string of text in a message.
  5. The JWS mechanism enables HMAC-based signatures while the Linked Data Signatures mechanism avoids the use of HMAC altogether, taking the position that shared secrets are typically a bad practice.
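
The base64url encoding described in point 1 can be illustrated with a short sketch (Python here for illustration; the claims are hypothetical, not taken from a real message):

```python
import base64
import json

def b64url_decode(data: str) -> bytes:
    # Decode base64url (RFC 4648, section 5), restoring stripped '=' padding.
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))

# Hypothetical JWS-style claims: they must be base64url-encoded before
# signing, so a developer inspecting the message sees only opaque text.
claims = {"iss": "joe", "exp": 1300819380}
encoded = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()

print(encoded)                              # opaque without a decoding tool
print(json.loads(b64url_decode(encoded)))   # the signed data, recovered
```

A Linked Data Signature signs the JSON document directly, so no decode step like the one above is needed to read what was signed.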

Bottom line: The Linked Data Signatures specification does not need to encode its payloads, but does require a rather complex normalization algorithm. It supports discovery of signature key data so that signatures can be verified using standard Web protocols. The JWS specification is more flexible from an algorithmic standpoint and simpler from a signature verification standpoint. The downside is that the only data input format must be from the message itself and can’t be from an external Linked Data source, like an HTML+RDFa web page listing items for sale.


Conclusion

The Linked Data Signatures and JOSE designs, while attempting to achieve the same basic goals, deviate in the approaches taken to accomplish those goals. The Linked Data Signatures specification leverages more of the Web with its use of a Linked Data format and URLs for identifying and verifying identity and keys. It also attempts to encapsulate a single best practice that will work for the vast majority of Web applications in use today. The JOSE specifications are more flexible in the type of cryptographic algorithms that can be used which results in more low-level primitives used in the protocol, increasing complexity for developers that must create interoperable JOSE-based applications.

From a specification size standpoint, the JOSE specs weigh in at 225 pages, while the Linked Data Signatures specification weighs in at around 20 pages. Page count is rarely a good way to compare specifications and doesn’t always result in an apples-to-apples comparison. It does, however, give a general idea of the amount of text required to explain the details of each approach, and thus a ballpark idea of the complexity associated with each specification. As with all specifications, picking one depends on the use cases that an application is attempting to support. The goal with the Linked Data Signatures specification is that it will be good enough for 95% of Web developers out there; for the remaining 5%, there is the JOSE stack.

[Editor’s Note: The original text of this blog post contained the phrase “Secure Messaging”, which has since been rebranded to “Linked Data Signatures“.]

Technical Analysis of 2012 MozPay API Message Format

The W3C Web Payments group is currently analyzing a new API for performing payments via web browsers and other devices connected to the web. This blog post is a technical analysis of the MozPay API with a specific focus on the payment protocol and its use of JOSE (JSON Object Signing and Encryption). The first part of the analysis takes the approach of examining the data structures used today in the MozPay API and compares them against what is possible via PaySwarm. The second part of the analysis examines the use of JOSE to achieve the use case and security requirements of the MozPay API and compares the solution to JSON-LD, which is the mechanism used to achieve the use case and security requirements of the PaySwarm specification.

Before we start, it’s useful to have an example of what the current MozPay payment initiation message looks like. This message is generated by a MozPay Payment Provider and given to the browser to initiate a native purchase process:

  "aud": "",
  "typ": "mozilla/payments/pay/v1",
  "iat": 1337357297,
  "exp": 1337360897,
  "request": {
    "id": "915c07fc-87df-46e5-9513-45cb6e504e39",
    "pricePoint": 1,
    "name": "Magical Unicorn",
    "description": "Adventure Game item",
    "icons": {
      "64": "",
      "128": ""
    "productData": "user_id=1234&my_session_id=XYZ",
    "postbackURL": "",
    "chargebackURL": ""

The message is effectively a JSON Web Token. I say effectively because it seems like it breaks the JWT spec in subtle ways, but it may be that I’m misreading the JWT spec.

There are a number of issues with the message that we’ve had to deal with when creating the set of PaySwarm specifications. It’s important that we call those issues out first to get an understanding of the basic concerns with the MozPay API as it stands today. The comments below use the JWT code above as a reference point.

Unnecessarily Cryptic JSON Keys

  "aud": "",
  "typ": "mozilla/payments/pay/v1",
  "iat": 1337357297,
  "exp": 1337360897,

This is more of an issue with the JOSE specs than it is the MozPay API. I can’t think of a good line of argumentation to shorten things like ‘issuer’ to ‘iss’ and ‘type’ to ‘typ’ (seriously :), the ‘e’ was too much?). It comes off as 1980s protocol design, trying to save bits on the wire. Making code less readable by trying to save characters in a human-readable message format works against the notion that the format should be readable by a human. I had to look up what iss, aud, iat, and exp meant. The only reason that I could come up with for using such terse entries was that the JOSE designers were attempting to avoid conflicts with existing data in JWT claims objects. If this was the case, they should have used a prefix like “@” or “$”, or placed the data in a container value associated with a key like ‘claims’.

PaySwarm always attempts to use terminology that doesn’t require you to go and look at the specification to figure out basic things. For example, it uses creator for iss (issuer), validFrom for iat (issued at), and validUntil for exp (expire time).


Application Keys

The MozPay API specification does not require the APPLICATION_KEY to be a URL. Since it’s not a URL, it’s not discoverable. The application key is also specific to each Marketplace, which means that one Marketplace could use a UUID, another could use a URL, and so on. If the system is intended to be decentralized and interoperable, the APPLICATION_KEY should either be dereferenceable on the public Web without coordination with any particular entity, or a format for the key should be outlined in the specification.

All identities and keys used in digital signatures in PaySwarm use URLs for the identifiers that must contain key information in some sort of machine-readable format (RDFa and JSON-LD, for now). This means that 1) they’re Web-native, 2) they can be dereferenced, and 3) when they’re dereferenced, a client can extract useful data from the document retrieved.


Audience

  "aud": "",

It’s not clear what the aud parameter is used for in the MozPay API, other than to identify the marketplace.

Issued At and Expiration Time

  "iat": 1337357297,
  "exp": 1337360897,

The iat (issued at) and exp (expiration time) values are encoded in the number of seconds since January 1st, 1970. These are not very human readable and make debugging issues with purchases more difficult than they need to be.

PaySwarm uses the W3C date/time format: human-readable strings that are also easy for machines to process. For example, November 5th, 2013 at 1:15:30 PM (Zulu / Universal Time) is encoded as: 2013-11-05T13:15:30Z.
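
The difference is easy to see in code. A minimal sketch (Python for illustration) converts the iat value from the example message into the W3C date/time form:

```python
from datetime import datetime, timezone

def to_w3c_datetime(unix_seconds: int) -> str:
    # Render a Unix timestamp as a W3C date/time string in UTC.
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ")

# The opaque "iat" value from the MozPay example becomes readable:
print(to_w3c_datetime(1337357297))  # 2012-05-18T16:08:17Z
```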

The Request

  "request": {
    "id": "915c07fc-87df-46e5-9513-45cb6e504e39",
    "pricePoint": 1,
    "name": "Magical Unicorn",

This object in the MozPay API is a description of the thing that is to be sold. Technically, it’s not really a request; the outer object is the request. There is a bit of a conflation of terminology here that should probably be fixed at some point.

In PaySwarm, the contents of the MozPay request value is called an Asset. An asset is a description of the thing that is to be sold.

Request ID

  "request": {
    "id": "915c07fc-87df-46e5-9513-45cb6e504e39",

The MozPay API encodes the request ID as a universally unique identifier (UUID). The major downside to this approach is that other applications can’t find the information on the Web to 1) discover more about the item being sold, 2) discuss the item being sold by referring to it by a universal ID, 3) feed it to a system that can read data published at the identifier address, and 4) index it for the purposes of searching.

The PaySwarm specifications use a URL as the identifier for assets and publish machine-readable data at the asset location so that other systems can discover more information about the item being sold, refer to the item in discussions (like reviews of the item), start a purchase by referencing the URL, and index the item so that it may be utilized in price-comparison and search engines.

Price Point

  "request": {
    "pricePoint": 1,

The pricePoint for the item being sold is currently a whole number. This is problematic because real-world prices are decimal numbers with a fractional part and an associated currency.

PaySwarm publishes its pricing information in a currency-agnostic way that is compatible with all known monetary systems. Some of these systems include USD, EUR, JPY, RMB, Bitcoin, Brixton Pound, Bernal Bucks, Ven, and a variety of other alternative currencies. The amount is specified as a decimal with a fraction and a currency URL. A URL is utilized for the currency because PaySwarm allows arbitrary currencies to be created and managed external to the PaySwarm system.
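
As a sketch of the difference (Python for illustration; the field names and the currency URL below are illustrative, not taken from the PaySwarm spec):

```python
from decimal import Decimal

# A currency-agnostic price: an exact decimal amount plus a currency
# identifier. The URL is hypothetical.
price = {
    "amount": Decimal("0.99"),
    "currency": "https://example.com/currencies/usd",
}

# Decimal arithmetic stays exact, which matters for money; a whole-number
# "pricePoint" cannot express an amount like this at all.
total = price["amount"] * 3
print(total)  # 2.97
```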


Icons

  "request": {
    "icons": {
      "64": "",
      "128": ""

Icon data is currently modeled in a way that is useful to developers by indexing the information as a square pixel size for the icon. This allows developers to access the data like so: icons.64 or icons.128. Values are image URLs, which is the right choice.

PaySwarm uses JSON-LD and can support this sort of data layout through a feature called data indexing. Another approach is to just have an array of objects for icons, which would allow us to include extended information about the icons. For example:

  "request": {
  "icon": [{size: 64, id: "", label: "Magical Unicorn"}, ...]

Product Data

  "request": {
    "productData": "user_id=1234&my_session_id=XYZ",

If the payment technology we’re working on is going to be useful to society at large, we have to allow richer descriptions of products. For example, model numbers, rich markup descriptions, pictures, ratings, colors, and licensing terms are all important parts of a product description. The value needs to be larger than a 256 byte string and needs to support decentralized extensibility. For example, Home Depot should be able to list UPC numbers and internal reference numbers in the asset description and the payment protocol should preserve that extra information, placing it into digital receipts.

PaySwarm uses JSON-LD and thus supports decentralized extensibility for product data. This means that any vendor may express information about the asset in JSON-LD and it will be preserved in all digital contracts and digital receipts. This allows the asset and digital receipt format to be used as a platform that can be built on top of by innovative retailers. It also increases data fidelity by allowing far more detailed markup of asset information than what is currently allowed via the MozPay API.
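
A hedged sketch of what richer, extensible product data could look like in JSON-LD (the vocabulary prefixes, terms, and values below are invented for illustration):

```json
{
  "@context": {
    "schema": "http://schema.org/",
    "store": "https://example.com/vocab#"
  },
  "@type": "schema:Product",
  "schema:name": "Magical Unicorn",
  "schema:model": "MU-2013",
  "store:internalReference": "HD-44521",
  "store:upc": "012345678905"
}
```

Because every term maps to a URL, a retailer can add its own fields without colliding with anyone else’s, and the extra data survives into digital contracts and receipts.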

Postback URL

  "request": {
    "postbackURL": "",

The postback URL is a pretty universal concept among Web-based payment systems. The payment processor needs a URL endpoint that the result of the purchase can be sent to. The postback URL serves this purpose.

PaySwarm has a similar concept, but just lists it in the request URL as ‘callback’.

Chargeback URL

  "request": {
    "chargebackURL": ""

The chargeback URL is a URL endpoint that is called whenever a refund is issued for a purchased item. It’s not clear if the vendor has a say in whether or not this should be allowed for a particular item. For example, what happens when a purchase is performed for a physical good? Should chargebacks be easy to do for those sorts of items?

PaySwarm does not build chargebacks into the core protocol. It lets the merchant request the digital receipt of the sale to figure out if the sale has been invalidated. It seems like a good idea to have a notification mechanism built into the core protocol. We’ll need more discussion on this to figure out how to correctly handle vendor-approved refunds and customer-requested chargebacks.


Conclusion

There are a number of improvements that could be made to the basic MozPay API that would enable more use cases to be supported in the future while keeping the level of complexity close to what it currently is. The second part of this analysis will examine the JSON Object Signing and Encryption (JOSE) technology stack and determine if there is a simpler solution that could be leveraged to satisfy the digital signature requirements set forth by the MozPay API.

[UPDATE: The second part of this analysis is now available]

Verifiable Messaging over HTTP

Problem: Figure out a simple way to enable a Web client or server to authenticate and authorize itself to do a REST API call. Do this in one HTTP round-trip.

There is a new specification that is making the rounds called HTTP Signatures. It enables a Web client or server to authenticate and authorize itself when doing a REST API call and only requires one HTTP round-trip to accomplish the feat. The meat of the spec is 5 pages long, and the technology is simple and awesome.

We’re working on this spec in the Web Payments group at the World Wide Web Consortium because it’s going to be a fundamental part of the payment architecture we’re building into the core of the Web. When you send money to or receive money from someone, you want to make sure that the transaction is secure. HTTP Signatures help to secure that financial transaction.

However, the really great thing about HTTP Signatures is that it can be applied anywhere password or OAuth-based authentication and authorization is used today. Passwords, and shared secrets in general, are increasingly becoming a problem on the Web. OAuth 2 sucks for a number of reasons. It’s time for something simpler and more powerful.

HTTP Signatures:

  1. Work over both HTTP and HTTPS. You don’t need to spend money on expensive SSL/TLS security certificates to use it.
  2. Protect messages sent over HTTP or HTTPS by digitally signing the contents, ensuring that the data cannot be tampered with in transit. In the case that HTTPS security is breached, it provides an additional layer of protection.
  3. Identify the signer and establish a certain level of authorization to perform actions over a REST API. It’s like OAuth, only way simpler.
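
To make the idea concrete, here is a rough sketch of the shape of the scheme (Python; the exact header ordering and parameter names follow the draft only loosely, and a real deployment would favor public-key algorithms over the HMAC used here for brevity):

```python
import base64
import hashlib
import hmac

def signing_string(headers, order):
    # Concatenate the selected headers into the string that gets signed.
    return "\n".join(f"{name}: {headers[name]}" for name in order)

def signature_header(key_id, key, headers, order):
    # Build an HTTP "Signature" header in the general shape the draft describes.
    digest = hmac.new(key, signing_string(headers, order).encode(),
                      hashlib.sha256).digest()
    sig = base64.b64encode(digest).decode()
    return (f'Signature keyId="{key_id}",algorithm="hmac-sha256",'
            f'headers="{" ".join(order)}",signature="{sig}"')

request_headers = {"date": "Thu, 05 Jan 2012 21:31:40 GMT", "host": "example.com"}
print(signature_header("test-key-1", b"secret", request_headers, ["date", "host"]))
```

The server rebuilds the same signing string from the request it received and verifies the signature, so any in-transit tampering with the covered headers is detected.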

When coupled with the Web Keys specification, HTTP Signatures:

  1. Provide a mechanism where the digital signature key does not need to be registered in advance with the server. The server can automatically discover the key from the message and determine what level of access the client should have.
  2. Enable a fully distributed Public Key Infrastructure for the Web. This opens up new ways to more securely communicate over the Web, which is timely considering the recent news concerning the PRISM surveillance program.

If you’re interested in learning more about HTTP Signatures, the meat of the spec is 5 pages long and is a pretty quick read. You can also read (or listen to) the meeting notes where we discussed the HTTP Signatures spec. If you want to keep up with how the spec is progressing, join the Web Payments mailing list.

Google adds JSON-LD support to Search and Google Now

Full disclosure: I’m one of the primary designers of JSON-LD and the Chair of the JSON-LD group at the World Wide Web Consortium.

Last week, Google announced support for JSON-LD markup in Gmail. Putting JSON-LD in front of 425 million people is a big validation of the technology.

Hot on the heels of last week’s announcement, Google has just announced additional JSON-LD support for two more of their core products! The first is their flagship product, Google Search. The second is their new intelligent personal assistant service, Google Now.

The addition of JSON-LD support to Google Search now allows you to do incredibly accurate personalized searches. For example, here’s an example search for “my flights”:

and here’s an example for “my hotel reservation for next week”:

Web developers that mark certain types of information up as JSON-LD in the e-mails they send to you can now enable new functionality in these core Google services. For example, using JSON-LD will make it really easy for you to manage flights, hotel bookings, reservations at restaurants, and events like concerts and movies from within Google’s ecosystem. It also makes it easy for services like Google Now to push a notification to your phone when your flight has been delayed:

Or, show your boarding pass on your mobile phone when you’ve arrived at the airport:

Or, let you know when you need to leave to make your reservation for a restaurant:

Google Search and Google Now can make these recommendations to you because the information that you received about these flights, boarding passes, hotels, reservations, and other events was marked up in JSON-LD format when it hit your Gmail inbox. The most exciting thing about all of this is that it’s just the beginning of what Linked Data can do for all of us. Over the next decade, Linked Data will be at the center of getting computing, and the monotonous details of our everyday grind, out of the way so that we can focus more on enjoying our lives.

If you want to dive deeper into this technology, Google’s page on schemas is a good place to start.

Google adds JSON-LD support to Gmail

Google announced support for JSON-LD markup in Gmail at Google I/O 2013. The design team behind JSON-LD is delighted by this announcement and applauds the Google engineers who integrated JSON-LD with Gmail. This blog post examines what the announcement means for Gmail customers and provides some suggestions to the Google Gmail engineers on how they could improve their JSON-LD markup.

JSON-LD enables the representation of Linked Data in JSON by describing a common JSON representation format for expressing graphs of information (see Google’s Knowledge Graph). It allows you to mix regular JSON data with Linked Data in a single JSON document. The format has already been adopted by large companies such as Google in their Gmail product and is now available to over 425 million people via currently live software products around the world.

The syntax is designed to not disturb already deployed systems running on JSON, but provide a smooth upgrade path from JSON to JSON-LD. It is primarily intended to be a way to use Linked Data in Web-based programming environments, to build inter-operable Linked Data Web services, and to store Linked Data in JSON-based storage engines.

For Google’s Gmail customers, this means that Gmail will now be able to recognize people, places, events and a variety of other Linked Data objects. You can then take actions on the Linked Data objects embedded in an e-mail. For example, if someone sends you an invitation to a party, you can do a single-click response on whether or not you’ll attend a party right from your inbox. Doing so will also create a reminder for the party in your calendar. There are other actions that you can perform on Linked Data objects as well, like approving an expense report, reviewing a restaurant, saving a coupon for a free online movie, making a flight, hotel, or restaurant reservation, and many other really cool things that you couldn’t do before from the inside of your inbox.

What Google Got Right and Wrong

Google followed the JSON-LD standard pretty closely, so the vast majority of the markup looks really great. However, there are four issues that the Google engineers will probably want to fix before pushing the technology out to developers.

Invalid Context URL

The first issue is a fairly major one. Google isn’t using the JSON-LD @context parameter correctly in any of their markup examples. It’s supposed to be a URL, but they’re using a text string instead. This means that their JSON-LD documents are unreadable by all of the conforming JSON-LD processors today. For example, Google does the following when declaring a context in JSON-LD:

  "@context": ""

When they should be doing this:

  "@context": ""

It’s a fairly simple change; just add “http://” to the beginning of the “” value. If Google doesn’t make this change, it’ll mean that JSON-LD processors will have to include a special hack to translate “” to “” just for this use case. I hope that this was just a simple oversight by the Google engineers that implemented these features and not something that was intentional.

Context isn’t Online

The second issue has to do with the JSON-LD Context itself. There doesn’t seem to be a downloadable context document published at the moment. Not having a Web-accessible JSON-LD context is bad because the context is the heart and soul of a JSON-LD document. If you don’t publish a JSON-LD context on the Web somewhere, applications won’t be able to resolve any of the Linked Data objects in the document.

The Google engineers could fix this fairly easily by providing a JSON-LD Context document when a web client requests a document of type “application/ld+json” from the URL. The JSON-LD community would be happy to help the Google engineers create such a document.

Keyword Aliasing, FTW

The third issue is a minor usability issue with the markup. The Google help pages on the JSON-LD functionality use the @type keyword in JSON-LD to express the type of Linked Data object that is being expressed. The Google engineers that wrote this feature may not have been aware of the Keyword Aliasing feature in JSON-LD. That is, they could have just aliased @type to type. Doing so would mean that the Gmail developer documentation wouldn’t have to mention the “specialness” of the @type keyword.
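
For example, a context along these lines (illustrative, not Google’s actual markup) would let documents use plain type:

```json
{
  "@context": {
    "@vocab": "http://schema.org/",
    "type": "@type"
  },
  "type": "EmailMessage",
  "description": "Uses the friendlier \"type\" key via JSON-LD keyword aliasing."
}
```

With the alias in place, developer documentation never needs to explain the @-prefixed keyword at all.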

Use RDFa Lite

The fourth issue concerns the use of Microdata. JSON-LD was designed to work seamlessly with RDFa Lite 1.1; you can easily and losslessly convert data between the two markup formats. JSON-LD is compatible with Microdata, but pairing the two is a sub-optimal design choice. When JSON-LD data is converted to Microdata, information is lost due to data fidelity issues in Microdata. For example, there is no mechanism to specify that a value is a URL in Microdata.

RDFa Lite 1.1 does not suffer from these issues and has been proven to be a drop-in replacement for Microdata without any of the downsides that Microdata has. The designers of JSON-LD are the same designers behind RDFa Lite 1.1 and have extensive experience with Microdata. We specifically did not choose to pair JSON-LD with Microdata because it was a bad design choice for a number of reasons. I hope that the Google engineers will seek out advice from the JSON-LD and RDFa communities before finalizing the decision to use Microdata, as there are numerous downsides associated with that decision.


Conclusion

All in all, the Google engineers did a good job of implementing JSON-LD in Gmail. With a few small fixes to the Gmail documentation and code examples, they will be fully compliant with the JSON-LD specifications. The JSON-LD community is excited about this development and looks forward to working with Google to improve the recent release of JSON-LD for Gmail.

Permanent Identifiers for the Web

Web applications that deal with data on the web often need to specify and use URLs that are very stable. They utilize services such as to ensure that applications using their URLs will always be re-directed to a working website. These “permanent URL” redirection services operate kind of like a switchboard, connecting requests for information with the true location of the information on the Web. These switchboards can be reconfigured to point to a new location if the old location stops working.

How Does it Work?

If the concept sounds a bit vague, perhaps an example will help. A web author could use the following link ( to refer to an important document. That link is hosted on a permanent identifier service. When a Web browser attempts to retrieve that link, it will be re-directed to the true location of the document on the Web. Currently, that location is If the location of the payswarm-v1.jsonld document changes at any point in the future, the only thing that needs to be updated is the re-direction entry on That is, all Web applications that use the URL will be transparently re-directed to the new location of the document and will continue to “Just Work™”.

Launches

Permanent identifiers on the Web are an important thing to support, but until today there was no organization that would back a service for the Web to keep these sorts of permanent identifiers operating over the course of multiple decades. A number of us saw that this is a real problem and so we launched, which is a permanent identifier service for the Web. The purpose of is to provide a secure, permanent URL re-direction service for Web applications. This service will be run and operated by the W3C Permanent Identifier Community Group.

Specifically, the following organizations have pledged responsibility to ensure the operation of this service for the decades to come: Digital Bazaar, 3 Round Stones, OpenLink Software, Applied Testing and Technology, and Openspring. Many more organizations will join in time.

These organizations are responsible for all administrative tasks associated with operating the service. The social contract between these organizations gives each of them full access to all information required to maintain and operate the website. The agreement is setup such that a number of these companies could fail, lose interest, or become unavailable for long periods of time without negatively affecting the operation of the site.

Why not

While many web authors and data publishers currently use, there are a number of issues or concerns that we have about the website:

  1. The site was designed for the library community and was never intended to be used by the general Web.
  2. Requests for information or changes to the service frequently go unanswered.
  3. The site does not support HTTPS connections, which means it cannot be used to serve documents for security-sensitive industries such as medicine and finance. Requests to migrate the site to HTTPS have gone unanswered.
  4. There is no published backup or fail-over plan for the website.
  5. The site is run by a single organization, with a single part-time administrator, on a single machine. It suffers from multiple single points of failure.

Features

The launch of the website mitigates all of the issues outlined above with

  1. The site is specifically designed for web developers, authors, and data publishers on the general Web. It is not tailored for any specific community.
  2. Requests for information can be sent to a public mailing list that contains multiple administrators that are accountable for answering questions publicly. All administrators have been actively involved in world standards for many years and know how to run a service at this scale.
  3. The site supports HTTPS security, which means it can be used to securely serve data for industries such as medicine and finance.
  4. Multiple organizations, with multiple administrators per organization, have full access to administer all aspects of the site and recover it from any potential failure. All important site data is in version control and is mirrored across the world on a regular basis.
  5. The site is run by a consortium of organizations that have each pledged to maintain the site for as long as possible. If a member organization fails, a new one will be found to replace the failing organization while the rest of the members ensure the smooth operation of the site.

All identifiers associated with the website are intended to be around for as long as the Web is around. This means decades, if not centuries. If the final destination for a popular identifier used by this service fails in such a way as to be a major inconvenience or danger to the Web, the community will mirror the information for the popular identifier and set up a working redirect to restore service to the rest of the Web.

Adding a Permanent Identifier

Anyone with a GitHub account and knowledge of simple Apache redirect rules can add a permanent identifier to by performing the following steps:

  1. Fork on GitHub.
  2. Add a new redirect entry and commit your changes.
  3. Submit a pull request for your changes.

If you wish to engage the community in discussion about this service for your Web application, please send an e-mail to the mailing list. If you are interested in helping to maintain this service for the Web, please join the W3C Permanent Identifier Community Group.

Note: The letters ‘w3’ in the domain name stand for “World Wide Web”. Other than hosting the software for the Permanent Identifier Community Group, the “World Wide Web Consortium” (W3C) is not involved in the support or management of in any way.

Browser Payments 1.0

Kumar McMillan (Mozilla/Firefox OS) and I (PaySwarm/Web Payments) have just published the first draft of the Browser Payments 1.0 API. The purpose of the spec is to establish a way to initiate payments from within the browser. It is currently a direct port of the mozPay API framework that is integrated into Firefox OS. It enables Web content to initiate payment or issue a refund for a product or service. Once the API is implemented in a browser, a Web author may call the navigator.payment() function to initiate a payment.

This is work that we intend to pursue in the Web Payments Community Group at W3C. The work will eventually be turned over to a Web Payments Working Group at W3C, which we’re trying to kick-start at some point this year.

The current Browser Payments 1.0 spec can be read here:

The github repository for the spec is here:

Keep in mind that this is a very early draft of the spec. There are lots of prose issues as well as bugs that need to be sorted out. There are also a number of things that we need to discuss about the spec and how it fits into the larger Web ecosystem. Things like how it integrates with Persona and PaySwarm are still details that we need to suss out. There is a bug and issue tracker for the spec here:

The Mozilla guys will be on next week’s Web Payments telecon (Wednesday, 11am EST) for a Q/A session about this specification. Join us if you’re interested in payments in the browser. The call is open to the public, details about joining and listening in can be found here:

Identifiers in JSON-LD and RDF

TL;DR: This blog post argues that the extension of blank node identifiers in JSON-LD and RDF for the purposes of identifying predicates and naming graphs is important. It is important because it simplifies the usage of both technologies for developers. The post also provides a less-optimal solution if the RDF Working Group does not allow blank node identifiers for predicates and graph names in RDF 1.1.

We need identifiers as humans to convey complex messages. Identifiers let us refer to a certain thing by naming it in a particular way. Not only do humans need identifiers, but our computers need identifiers to refer to data in order to perform computations. It is no exaggeration to say that our very civilization depends on identifiers to manage the complexity of our daily lives, so it is no surprise that people spend a great deal of time thinking about how to identify things. This is especially true when we talk about the people that are building the software infrastructure for the Web.

The Web has a very special identifier called the Uniform Resource Locator (URL). It is probably one of the best known identifiers in the world, mostly because everybody that has been on the Web has used one. URLs are great identifiers because they are very specific. When I give you a URL to put into your Web browser, such as the link to this blog post, I can be assured that when you put the URL into your browser you will see what I see. URLs are globally scoped: they’re supposed to always take you to the same place.

There is another class of identifier on the Web that is not globally scoped and is only used within a document on the Web. In English, these identifiers are used when we refer to something as “that thing”, or “this widget”. We can really only use this sort of identifier within a particular context where the people participating in the conversation understand the context. Linguists call this concept deixis. “Thing” doesn’t always refer to the same subject, but based on the proper context, we can usually understand what is being identified. Our consciousness tags the “thing” that is being talked about with a tag of sorts and then refers to that thing using this pseudo-identifier. Most of this happens unconsciously (notice how your mind unconsciously tied the use of ‘this’ in this sentence to the correct concept?).

The take-away is that there are globally-scoped identifiers like URLs, and there are also locally-scoped identifiers that require a context in order to understand what they refer to.


In JSON, developers typically express data like this:

  {
    "name": "Joe"
  }

Note how that JSON object doesn’t have an identifier associated with it. JSON-LD provides a straightforward way of giving that object an identifier:

  {
    "@context": ...,
    "@id": "",
    "name": "Joe"
  }

Both you and I can refer to that object using its @id and be sure that we’re talking about the same thing. There are times, however, when assigning a global identifier to every piece of data that we create is not desirable. For example, it doesn’t make much sense to assign an identifier to a transient message that is a request to get a sensor reading. This is especially true if there are millions of these types of requests and we never want to refer to a request once it has been transmitted. This is why JSON-LD doesn’t force developers to assign an identifier to the objects that they express. The people that created the technology understood that not everything needs a global identifier.

Computers are less forgiving: they need identifiers for almost everything, but a great deal of that complexity can be hidden from developers. When an identifier becomes necessary in order to perform computations upon the data, the computer can usually auto-generate an identifier for the data.
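When such an identifier is needed, a processor can mint one with a simple counter. Here is a minimal Python sketch; the class and labeling scheme are illustrative, not taken from any specification:

```python
# Minimal sketch of how a processor might auto-generate document-scoped
# identifiers on demand (illustrative only, not from any specification).
class BlankNodeIssuer:
    def __init__(self, prefix="_:b"):
        self.prefix = prefix
        self.counter = 0
        self.issued = {}  # maps a node's key to its assigned label

    def label_for(self, key):
        # Reuse the label if this node was already seen in the document,
        # so the identifier stays stable within its local scope.
        if key not in self.issued:
            self.issued[key] = f"{self.prefix}{self.counter}"
            self.counter += 1
        return self.issued[key]

issuer = BlankNodeIssuer()
issuer.label_for("node-a")  # "_:b0"
issuer.label_for("node-b")  # "_:b1"
issuer.label_for("node-a")  # "_:b0" again -- stable within the document
```

The labels mean nothing outside the document that produced them, which is exactly the locally-scoped behavior described above.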

RDF, Graphs, and Blank Node Identifiers

The Resource Description Framework (RDF) primarily uses an identifier called the Internationalized Resource Identifier (IRI). Where URLs can typically only express links in Western languages, an IRI can express links in almost every language in use today including Japanese, Tamil, Russian and Mandarin. RDF also defines a special type of identifier called a blank node identifier. This identifier is auto-generated and is locally scoped to the document. It’s an advanced concept, but is one that is pretty useful when you start dealing with transient data, where creating a global identifier goes beyond the intended usage of the data. An RDF-compatible program will step in and create blank node identifiers on your behalf, but only when necessary.

Both JSON-LD and RDF have the concept of a Statement, Graph, and a Dataset. A Statement consists of a subject, predicate, and an object (for example: “Dave likes cookies”). A Graph is a collection of Statements (for example: Graph A contains all the things that Dave said and Graph B contains all the things that Mary said). A Dataset is a collection of Graphs (for example: Dataset Z contains all of the things Dave and Mary said yesterday).
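The three layers above can be pictured in a few lines of Python; this is an illustrative sketch of the data model, not an API from either specification:

```python
from collections import namedtuple

# Illustrative sketch of the shared data model: a Statement is
# (subject, predicate, object); a Graph is a set of Statements;
# a Dataset is a collection of named Graphs.
Statement = namedtuple("Statement", ["subject", "predicate", "object"])

dave_said = {Statement("Dave", "likes", "cookies")}  # Graph A
mary_said = {Statement("Mary", "likes", "tea")}      # Graph B

# Dataset Z: all of the things Dave and Mary said
dataset = {"graphA": dave_said, "graphB": mary_said}
```

The question in the rest of this post is what kind of identifier ("graphA" above) may be used to name each graph.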

In JSON-LD, at present, you can use a blank node identifier for subjects, predicates, objects, and graphs. In RDF, you can only use blank node identifiers for subjects and objects. There are people, such as myself, in the RDF WG that think this is a mistake. There are people that think it’s fine. There are people that think it’s the best compromise that can be made at the moment. There is a wide field of varying opinions strewn between the various extremes.

The end result is that the current state of affairs has put us into a position where we may have to remove blank node identifier support for predicates and graphs from JSON-LD, which comes across as a fairly arbitrary limitation to those not familiar with the inner guts of RDF. Don’t get me wrong, I feel it’s a fairly arbitrary limitation. There are those in the RDF WG that don’t think it is, and that may prevent JSON-LD from being able to use what I believe is a very useful construct.

Document-local Identifiers for Predicates

Why do we need blank node identifiers for predicates in JSON-LD? Let’s go back to the first example in JSON to see why:

  {
    "name": "Joe"
  }

The JSON above is expressing the following Statement: “There exists a thing whose name is Joe.”

The subject is “thing” (aka: a blank node) which is legal in both JSON-LD and RDF. The predicate is “name”, which doesn’t map to an IRI. This is fine as far as the JSON-LD data model is concerned because “name”, which is local to the document, can be mapped to a blank node. RDF cannot model “name” because it has no way of stating that the predicate is local to the document since it doesn’t support blank nodes for predicates. Since the predicate doesn’t map to an IRI, it can’t be modeled in RDF. Finally, “Joe” is a string used to express the object and that works in both JSON-LD and RDF.

JSON-LD supports the use of blank nodes for predicates because there are some predicates, like every key used in JSON, that are local to the document. RDF does not support the use of blank nodes for predicates and therefore cannot properly model JSON.

Document-local Identifiers for Graphs

Why do we need blank node identifiers for graphs in JSON-LD? Let’s go back again to the first example in JSON:

  {
    "name": "Joe"
  }

The container of this statement is a Graph. Another way of writing this in JSON-LD is this:

  {
    "@context": ...,
    "@graph": {
      "name": "Joe"
    }
  }

However, what happens when you have two graphs in JSON-LD, and neither one of them is the RDF default graph?

  {
    "@context": ...,
    "@graph": [
      {
        "@graph": {
          "name": "Joe"
        }
      },
      {
        "@graph": {
          "name": "Susan"
        }
      }
    ]
  }

In JSON-LD, at present, it is assumed that a blank node identifier may be used to name each graph above. Unfortunately, in RDF, the only thing that can be used to name a graph is an IRI, and a blank node identifier is not an IRI. This puts JSON-LD in an awkward position. Either JSON-LD can:

  1. Require that developers name every graph with an IRI, which seems like a strange demand because developers don’t have to name all subjects and objects with an IRI, or
  2. JSON-LD can auto-generate a regular IRI for each predicate and graph name, which seems strange because blank node identifiers exist for this very purpose (not to mention this solution won’t work in all cases, more below), or
  3. JSON-LD can auto-generate a special IRI for each predicate and graph name, which would basically re-invent blank node identifiers.

The Problem

The problem surfaces when you try to convert a JSON-LD document to RDF. If the RDF Working Group doesn’t allow blank node identifiers for predicates and graphs, then what do you use to identify predicates and graphs that have blank node identifiers associated with them in the JSON-LD data model? This is a feature we do want to support because there are a number of important use cases that it enables. The use cases include:

  1. Blank node predicates allow JSON to be mapped directly to the JSON-LD and RDF data models.
  2. Blank node graph names allow developers to use graphs without explicitly naming them.
  3. Blank node graph names make the RDF Dataset Normalization algorithm simpler.
  4. Blank node graph names prevent the creation of a parallel mechanism to generate and manage blank node-like identifiers.

The problem is easy to see when performing RDF Dataset Normalization, which we need to do in order to digitally sign information expressed in JSON-LD and RDF. The rest of this post will focus on this area, as it exposes the problems with not supporting blank node identifiers for predicates and graph names. In JSON-LD, the two-graph document above could be normalized to this NQuads (subject, predicate, object, graph) representation:

_:bnode0 _:name "Joe" _:graph1 .
_:bnode1 _:name "Susan" _:graph2 .

This is illegal in RDF since you can’t have a blank node identifier in the predicate or graph position. Even if we were to use an IRI in the predicate position, the problem (of not being able to normalize “un-labeled” JSON-LD graphs like the ones in the previous section) remains.

The Solutions

This section will cover the proposed solutions to the problem, in order from least desirable to most desirable.

Don’t allow blank node identifiers for predicates and graph names

Doing this in JSON-LD ignores the point of contention. The same line of argumentation can be applied to RDF. The point is that by forcing developers to name graphs using IRIs, we’re forcing them to do something that they don’t have to do with subjects and objects. There is no technical reason that has been presented where the use of a blank node identifier in the predicate or graph position is unworkable. Telling developers that they must name graphs using IRIs will be surprising to them, because there is no reason that the software couldn’t just handle that case for them. Requiring developers to do things that a computer can handle for them automatically is anti-developer and will harm adoption in the long run.

Generate fragment identifiers for graph names

One solution is to generate fragment identifiers for graph names. This, coupled with a base IRI, would allow the data to be expressed legally in NQuads:

_:bnode0 <http://example.org/#name> "Joe" <http://example.org/#graph1> .
_:bnode1 <http://example.org/#name> "Susan" <http://example.org/#graph2> .

The above is legal RDF. The approach is problematic when you don’t have a base IRI, such as when JSON-LD is used as a messaging protocol between two systems. In that use case, you end up with something like this:

_:bnode0 <#name> "Joe" <#graph1> .
_:bnode1 <#name> "Susan" <#graph2> .

RDF requires absolute IRIs and so the document above is illegal from an RDF perspective. The other down-side is that you have to keep track of all fragment identifiers in the output and make sure that you don’t pick fragment identifiers that are used elsewhere in the document. This is fairly easy to do, but now you’re in the position of tracking and renaming both blank node identifiers and fragment IDs. Even if this approach worked, you’d be re-inventing the blank node identifier. This approach is unworkable for systems like PaySwarm that use transient JSON-LD messages across a REST API; there is no base IRI in this use case.

Skolemize to create identifiers for graph names

Another approach is skolemization, which is just a fancy way of saying: generate a unique IRI for the blank node when expressing it as RDF. The output would look something like this:

_:bnode0 <http://example.org/.well-known/genid/name> "Joe" <http://example.org/.well-known/genid/graph1> .
_:bnode1 <http://example.org/.well-known/genid/name> "Susan" <http://example.org/.well-known/genid/graph2> .

This would be just fine if there was only one application reading and consuming data. However, when we are talking about RDF Dataset Normalization, there are cases where two applications must read and independently verify the representation of a particular IRI. One scenario that illustrates the problem fairly nicely is the blind verification scenario. In this scenario, two applications de-reference an IRI to fetch a JSON-LD document. Each application must perform RDF Dataset Normalization and generate a hash of that normalization to see if they retrieved the same data. Based on a strict reading of the skolemization rules, Application A would generate this:

_:bnode0 <http://a.example.org/.well-known/genid/name> "Joe" <http://a.example.org/.well-known/genid/graph1> .
_:bnode1 <http://a.example.org/.well-known/genid/name> "Susan" <http://a.example.org/.well-known/genid/graph2> .

and Application B would generate this:

_:bnode0 <http://b.example.org/.well-known/genid/name> "Joe" <http://b.example.org/.well-known/genid/graph1> .
_:bnode1 <http://b.example.org/.well-known/genid/name> "Susan" <http://b.example.org/.well-known/genid/graph2> .

Note how the two graphs would never hash to the same value because the Skolem IRIs are completely different. The RDF Dataset Normalization algorithm would have no way of knowing which IRIs are blank node stand-ins and which ones are legitimate IRIs. You could say that publishers are required to assign the skolemized IRIs to the data they publish, but that ignores the point of contention, which is that you don’t want to force developers to create identifiers for things that they don’t care to identify. You could argue that the publishing system could generate these IRIs, but then you’re still creating a global identifier for something that is specifically meant to be a document-scoped identifier.
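A toy Python sketch makes the mismatch concrete. Here a sorted SHA-256 hash stands in for full RDF Dataset Normalization, and the Skolem IRIs are invented for illustration:

```python
import hashlib

def hash_quads(quads):
    # Sorted, newline-joined serialization hashed with SHA-256; this
    # stands in for a full RDF Dataset Normalization algorithm.
    return hashlib.sha256("\n".join(sorted(quads)).encode("utf-8")).hexdigest()

# Each application mints its own globally unique Skolem IRIs
# (the IRIs below are invented for illustration)...
app_a = ['_:bnode0 <http://a.example.org/.well-known/genid/name> "Joe" '
         '<http://a.example.org/.well-known/genid/graph1> .']
app_b = ['_:bnode0 <http://b.example.org/.well-known/genid/name> "Joe" '
         '<http://b.example.org/.well-known/genid/graph1> .']

# ...so two applications normalizing the same underlying data
# never agree on a hash.
assert hash_quads(app_a) != hash_quads(app_b)
```

The normalization algorithm has no principled way to know that the two sets of Skolem IRIs stand for the same local identifiers, which is exactly the point made above.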

A more lax reading of the Skolemization language might allow one to create a special type of Skolem IRI that could be detected by the RDF Dataset Normalization algorithm. For example, let’s say that, since the JSON-LD processor is the one creating these IRIs before they are handed to the RDF Dataset Normalization algorithm, we use the tag IRI scheme. The output would look like this for Application A:

_:bnode0 <tag:example.com,2013:dsid:345> "Joe" <tag:example.com,2013:dsid:254> .
_:bnode1 <tag:example.com,2013:dsid:345> "Susan" <tag:example.com,2013:dsid:363> .

and this for Application B:

_:bnode0 <tag:example.com,2013:dsid:a> "Joe" <tag:example.com,2013:dsid:b> .
_:bnode1 <tag:example.com,2013:dsid:a> "Susan" <tag:example.com,2013:dsid:c> .

The solution still doesn’t work, but we could add another step to the RDF Dataset Normalization algorithm that would allow it to rename any IRI starting with tag:example.com,2013:. Keep in mind that this is exactly the same thing that we do with blank nodes, and it effectively duplicates that functionality. The extra step would allow us to generate something like this for both applications doing a blind verification:

_:bnode0 <tag:example.com,2013:dsid:predicate-1> "Joe" <tag:example.com,2013:dsid:graph-1> .
_:bnode1 <tag:example.com,2013:dsid:predicate-1> "Susan" <tag:example.com,2013:dsid:graph-2> .

This solution does violate one strong suggestion in the Skolemization section:

Systems wishing to do this should mint a new, globally unique IRI (a Skolem IRI) for each blank node so replaced.

The IRI generated is definitely not globally unique, as there will be many tag:example.com,2013:dsid:graph-1 IRIs in the world, each associated with data that is completely different. This approach also goes against something else in the Skolemization section, which states:

This transformation does not appreciably change the meaning of an RDF graph.

It’s true that using tag IRIs doesn’t change the meaning of the graph when you assume that the document will never find its way into a database. However, once you place the document in a database, it certainly creates the possibility of collisions in applications that are not aware of the special-ness of IRIs starting with tag:example.com,2013:dsid:. The data is fine taken by itself, but a disaster when merged with other data. We would have to put a warning in some specification telling systems to rename the incoming tag:example.com,2013:dsid: IRIs to something that is unique to the storage subsystem. Keep in mind that this is exactly what is done when importing blank node identifiers into a storage subsystem. So, we’ve more or less re-invented blank node identifiers at this point.
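The renaming step a store would have to perform can be sketched in a few lines of Python; the tag:example.com prefix and the store URL are illustrative assumptions:

```python
import re

# Sketch of the renaming a store would have to perform on import: rewrite
# incoming "tag:example.com,2013:dsid:" IRIs (the prefix is an illustrative
# assumption) to identifiers unique to the storage subsystem -- exactly
# what stores already do with blank node labels.
DSID = re.compile(r"<tag:example\.com,2013:dsid:([^>]+)>")

def rename_on_import(quads, store_prefix):
    seen = {}
    def replace(match):
        local = match.group(1)
        if local not in seen:
            # Mint a store-unique IRI for each distinct local identifier.
            seen[local] = "<%s%d>" % (store_prefix, len(seen))
        return seen[local]
    return [DSID.sub(replace, q) for q in quads]

quads = ['_:b0 <tag:example.com,2013:dsid:predicate-1> "Joe" '
         '<tag:example.com,2013:dsid:graph-1> .']
renamed = rename_on_import(quads, "https://store.example/id/")
```

Note that this is structurally identical to the relabeling every quad store already performs on incoming blank node labels, which is the point: the mechanism duplicates blank nodes rather than replacing them.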

Allow blank node identifiers for graph names

This leads us to the question: why not just extend RDF to allow blank node identifiers for predicates and graph names? Ideally, that’s what I would like to see happen in the future, as it places the least burden on developers and allows RDF to easily model JSON. The responses from the RDF WG are varied. These are all of the current arguments against it that I have heard:

There are other ways to solve the problem, like fragment identifiers and skolemization, than introducing blank nodes for predicates and graph names.

Fragment identifiers don’t work, as demonstrated above. There is really only one workable solution based on a very lax reading of skolemization, and as demonstrated above, even the best skolemization solution re-invents the concept of a blank node.

There are other use cases that are blocked by the introduction of blank node identifiers into the predicate and graph name position.

While this has been asserted, it is still unclear exactly what those use cases are.

Adding blank node identifiers for predicates and graph names will break legacy applications.

If blank nodes for predicates and graph names were illegal before, wouldn’t legacy applications reject that sort of input? The argument that there are bugs in legacy applications that make them not robust against this type of input is valid, but should that prevent the right solution from being adopted? There has been no technical reason put forward for why blank nodes for predicates or graph names cannot work, other than software bugs prevent it.

The PaySwarm work has chosen to model the data in a very strange way.

The people that have been working on RDFa, JSON-LD, and the Web Payments specifications for the past 5 years have spent a great deal of time attempting to model the data in the simplest way possible, and in a way that is accessible to developers that aren’t familiar with RDF. Whether or not the modeling seems strange is arguable, since this response usually comes from people not familiar with the Web Payments work. This blog post outlines a variety of use cases where the use of a blank node for predicates and graph naming is necessary. Stating that the use cases are invalid ignores the point of contention.

If we allow blank nodes to be used when naming graphs, then those blank nodes should denote the graph.

At present, RDF states that a graph named using an IRI may denote the graph or it may not denote the graph. This is a fancy way of saying that the IRI that is used for the graph name may be an identifier for something completely different (like a person), but de-referencing the IRI over the Web results in a graph about cars. I personally think that is a very dangerous concept to formalize in RDF, but there are others that have strong opinions to the contrary. The chances of this being changed in RDF 1.1 are next to none.

Others have argued that while that may be the case for IRIs, it doesn’t have to be the case for blank nodes that are used to name graphs. In this case, we can just state that the blank node denotes the graph because it couldn’t possibly be used for anything else since the identifier is local to the document. This makes a great deal of sense, but it is different from how an IRI is used to name a graph and that difference is concerning to a number of people in the RDF Working Group.

However, that is not an argument to disallow blank nodes from being used for predicates and graph names. The group could still allow blank nodes to be used for this purpose while stating that they may or may not be used to denote the graph.

The RDF Working Group does not have enough time left in its charter to make a change this big.

While this may be true, not making a decision on this is causing more work for the people working on JSON-LD and RDF Dataset Normalization. Having the tag:example.com,2013:dsid: identifier scheme is also going to make many RDF-based applications more complex in the long run, resulting in a great deal more work than just allowing blank nodes for predicates and graph names.


I have a feeling that the RDF Working Group is not going to do the right thing on this one due to the time pressure of completing the work that they’ve taken on. The group has already requested, and has been granted, a charter extension. Another extension is highly unlikely, so the group wants to get everything wrapped up. This discussion could take several weeks to settle. That said, the solution that will most likely be adopted (a special tag-based skolem IRI) will cause months of work for people living in the JSON-LD and RDF ecosystem. The best solution in the long run would be to solve this problem now.

If blank node identifiers for predicates and graphs are rejected, here is the proposal that I think will move us forward while causing an acceptable amount of damage down the road:

  1. JSON-LD continues to support blank node identifiers for use as predicates and graph names.
  2. When converting JSON-LD to RDF, a special, relabelable IRI prefix of the form tag:example.com,2013:dsid: will be used for blank nodes in the predicate and graph name position.
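Step 2 of the proposal could be sketched as follows; the tag:example.com authority is an illustrative assumption, since the actual prefix would be fixed by specification:

```python
# Sketch of the proposed JSON-LD -> RDF conversion step: blank node
# labels in the predicate or graph-name position are rewritten to a
# relabelable IRI prefix. The tag:example.com authority is an
# illustrative assumption, not a value from any specification.
PREFIX = "tag:example.com,2013:dsid:"

def to_rdf_term(term, position):
    # Blank nodes remain legal RDF in the subject and object positions;
    # only predicates and graph names need the special IRI form.
    if term.startswith("_:") and position in ("predicate", "graph"):
        return "<" + PREFIX + term[2:] + ">"
    return term

quad = ("_:bnode0", "_:name", '"Joe"', "_:graph1")
converted = tuple(
    to_rdf_term(t, p)
    for t, p in zip(quad, ("subject", "predicate", "object", "graph"))
)
```

Because the prefix is well known, a normalization algorithm (or a store on import) can relabel these IRIs just as it relabels blank node identifiers today.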

Thanks to Dave Longley for proofing this blog post and providing various corrections.


A few days ago, a proposal was put forward in the HTML Working Group (HTML WG) by Microsoft, Netflix, and Google to take DRM in HTML5 to the next stage of standardization at W3C. This triggered another uproar about the morality and ethics of building DRM into the Web. There are good arguments about morality and ethics on both sides of the debate, but ultimately the HTML WG will decide whether or not to pursue the specification based on technical merit. I (@manusporny) am a member of the HTML WG. I was also the founder of a start-up that focused on building a legal, peer-to-peer content distribution network for music and movies. It employed DRM much like the current DRM in HTML5 proposal. During the course of 8 years of technical development, we had talks with many of the major record labels. I have first-hand knowledge of the problem, and of building a technical solution to address it.

TL;DR: The Encrypted Media Extensions (DRM in HTML5) specification does not solve the problem the authors are attempting to solve, which is the protection of content from opportunistic or professional piracy. The HTML WG should not publish First Public Working Drafts that do not effectively address the primary goal of a specification.

The Problem

The fundamental problem that the Encrypted Media Extensions (EME) specification seems to be attempting to solve is to find a way to reduce piracy (since eliminating piracy on the Web is an impossible problem to solve). This is a noble goal as there are many content creators and publishers that are directly impacted by piracy. These are not faceless corporations, they are people with families that depend on the income from their creations. It is with this thought in mind that I reviewed the specification on a technical basis to determine if it would lead to a reduction in piracy.

Review Notes for Encrypted Media Extensions (EME)


The EME specification does not specify a DRM scheme; rather, it describes an architecture for a DRM plug-in mechanism. This will lead to plug-in proliferation on the Web. Plug-ins are detrimental to inter-operability because it is inevitable that DRM plug-in vendors will not be able to support all platforms at all times. So, some people will be able to view content and others will not.

A simple example of the problem is Microsoft’s Silverlight. Take a look at the plugin details for Silverlight; specifically, click on the “System Requirements” tab. Silverlight is Microsoft’s creation. Microsoft is a HUGE corporation with very deep pockets. It can and has thrown a great deal of money at solving very hard problems. Even Microsoft does not support its flagship plugin on Internet Explorer 8 running on older versions of its own operating system, or on the latest version of Chrome on certain versions of Windows and Mac. If Microsoft can’t make its flagship Web plugin work across all major operating systems today, what chance does a much smaller DRM plugin company have?

The purpose of a standard is to increase inter-operability across all platforms. It has been demonstrated that plug-ins, on the whole, harm inter-operability in the long run and often create many security vulnerabilities. The one shining exception is Flash, but we should not mistake an exception for the rule. Also note that Flash is backed by Adobe, a gigantic multi-national corporation with very deep pockets.

1.1 Goals

The goals section does not state the actual purpose of the specification. It states meta-purposes like “Support a range of content security models, including software and hardware-based models” and “Support a wide range of use cases”. While those are sub-goals, the primary goal isn’t stated once in the Goals section. The only rational primary goal is to reduce the amount of opportunistic piracy on the Web. Links to piracy data collected over the last decade could help make the case that this is worth doing.

1.2.1. Content Decryption Module (CDM)

When we were working on our DRM system, we took almost exactly the same approach that the EME specification does. We had a plug-in system that allowed different DRM modules to be plugged into the system. We assumed that each DRM scheme had a shelf-life of about 2-3 months before it was defeated, so our system would rotate the DRM modules every 3 months. We had plans to create genetic algorithms that would encrypt and watermark data into the file stream and mutate the encryption mechanism every couple of months to keep the pirates busy. It was a very complicated system to keep working because one slip up in the DRM module meant that people couldn’t view the content they had purchased. We did get the system working in the end, but it was a nightmare to make sure that the DRM modules to decrypt the information were rotated often enough to be effective while ensuring that they worked across all platforms.

Having first-hand knowledge of how such a system works, it’s a pretty terrible idea for the Web because it takes a great deal of competence and coordination to pull something like this off. I would expect the larger Content Protection companies to not have an issue with this. The smaller Content Protection companies, however, will inevitably have issues with ensuring that their DRM modules work across all platforms.

The bulk of the specification

The bulk of the specification is what you would expect from a system like this, so I won’t go into the gory details. There were two major technical concerns I had while reading through the implementation notes.

The first is that key retrieval is handled by JavaScript code, which means that anybody using a browser could copy the key data. This means that if a key is sent in the clear, the likelihood that the DRM system could be compromised goes up considerably because the person that is pirating the content knows the details necessary to store and decrypt the content.

If the goal is to reduce opportunistic piracy, all keys should be encrypted so that snooping by the browser doesn’t result in the system being compromised. Otherwise, all you would need to do is install a plugin that shares all clear-text keys with something like Mega. Pirates could use those keys to then decrypt byte-streams that do not mutate between downloads. To my knowledge, most DRM’ed media delivery does not encrypt content on a per-download basis. So, the spec needs to make it very clear that opaque keys MUST be used when delivering media keys.
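As a toy illustration only, and emphatically not real cryptography or anything from the EME specification, the shape of opaque key delivery looks like this: the media key is wrapped under a secret the page’s JavaScript layer never holds, so snooping on the key message yields nothing usable:

```python
import hashlib
import os

# Toy illustration only (NOT real cryptography, and not part of EME):
# the media key is "wrapped" under a shared secret so the bytes visible
# to page JavaScript are opaque. A real CDM would use an authenticated
# cipher negotiated out of band with the license server.
def xor_wrap(key: bytes, secret: bytes) -> bytes:
    # XOR the key against a pad derived from the secret; applying the
    # same operation twice recovers the original key.
    pad = hashlib.sha256(secret).digest()
    return bytes(k ^ p for k, p in zip(key, pad))

secret = os.urandom(16)       # known only to the license server and the CDM
media_key = os.urandom(16)    # the clear-text content key
wrapped = xor_wrap(media_key, secret)          # what the JavaScript layer sees
assert xor_wrap(wrapped, secret) == media_key  # the CDM recovers the key
```

The point of the sketch is the architecture, not the cipher: as long as the unwrapping secret never transits the JavaScript layer, capturing the key message alone does not compromise the content.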

One of the DRM systems we built, which became the primary way we did things, would actually re-encrypt the byte stream for every download. So even if a key was compromised, you couldn’t use the key to decrypt any other downloads. This was massively computationally expensive, but since we were running a peer-to-peer network, the processing was pushed out to the people downloading stuff in the network and not our servers. Sharing of keys was not possible in our DRM system, so we could send the decryption keys in the clear. I doubt many of the Content Protection Networks will take this approach as it would massively spike the cost of delivering content.

6. Simple Decryption

The “org.w3.clearkey” Key System indicates a plain-text clear (unencrypted) key will be used to decrypt the source. No additional client-side content protection is required.

Wow, what a fantastically bad idea.

  1. This sends the decryption key in the clear. This key can be captured by any Web browser plugin. That plugin can then share the decryption key and the byte stream with the world.
  2. It duplicates the purpose of Transport Layer Security (TLS).
  3. It doesn’t protect anything while adding a very complex way of shipping an encrypted byte stream from a Web server to a Web browser.

So. Bad. Seriously, there is nothing secure about this mechanism. It should be removed from the specification.

9.1. Use Cases: “user is not an adversary”

This is not a technical issue, but I thought it would be important to point it out. This “user is not an adversary” text can be found in the first question about use cases. It insinuates that people that listen to radio and watch movies online are potential adversaries. As a business owner, I think that’s a terrible way to frame your customers.

Thinking of the people that are using the technology that you’re specifying as “adversaries” is also largely wrong. 99.999% of people using DRM-based systems to view content are doing it legally. The folks that are pirating content are not sitting down and viewing the DRM stream, they have acquired a non-DRM stream from somewhere else, like Mega or The Pirate Bay, and are watching that. This language is unnecessary and should be removed from the specification.


There are some fairly large security issues with the text of the current specification. Those can be fixed.

The real goal of this specification is to create a framework that will reduce content piracy. The specification has not put forward any mechanism that demonstrates that it would achieve this goal.

Here’s the problem with EME – it’s easy to defeat. In the very worst case, there exist piracy rigs that allow you to point an HD video camera at an HD television and record the video and audio without any sort of DRM. That’s the DRM-free copy that will end up on Mega or The Pirate Bay. In practice, no DRM system has survived for more than a couple of years.

Content creators, if your content is popular, EME will not protect your content against a content pirate. Content publishers, your popular intellectual property will be no safer wrapped in anything that this specification can provide.

Because the proposal does not achieve the goal of the specification, it is not ready for First Public Working Draft publication via the HTML Working Group.

Aaron Swartz, PaySwarm, and Academic Journals

For those of you that haven’t heard yet, Aaron Swartz took his own life two days ago. Larry Lessig has a follow-up on one of the reasons he thinks led to his suicide (the threat of 50 years in jail over the JSTOR case).

I didn’t know Aaron at all. A large number of people that I deeply respect did, and have written about his life with great admiration. I, like most of you that have read the news, have done so while brewing a cauldron of mixed emotions. Saddened that someone that had achieved so much good in their life is no longer in this world. Angry that Aaron chose this ending. Sickened that this is the second recent suicide, Ilya’s being the first, involving a young technologist trying to make the world a better place for all of us. Afraid that other technologists like Aaron and Ilya will choose this path over persisting in their noble causes. Helpless. Helpless because this moment will pass, just like Ilya’s did, with no great change in the way our society deals with mental illness. With no great change, in what Aaron was fighting for, having been realized.

Nobody likes feeling helpless. I can’t mourn Aaron because I didn’t know him. I can mourn the idea of Aaron, of the things he stood for. While reading about what he stood for, several disconnected ideas kept rattling around in the back of my head:

  1. We’ve hit a point of ridiculousness in our society where people at HSBC who knowingly launder money for drug cartels get away with it, while people like Aaron are labeled felons and face upwards of 50 years in jail for “stealing” academic articles. This, even after the publisher of said academic articles dropped the charges. MIT never dropped its charges.
  2. MIT should make it clear that he was not a felon or a criminal. MIT should posthumously pardon Aaron and commend him for his life’s work.
  3. The way we do peer-review and publish scientific research has to change.
  4. I want to stop reading about all of this, it’s heartbreaking. I want to do something about it – make something positive out of this mess.

Ideas, Floating

I was catching up on news this morning when the following floated past on Twitter:

clifflampe: It seems to me that the best way for we academics to honor Aaron Swartz’s memory is to frigging finally figure out open access publishing.

1Copenut: @clifflampe And finally implement a micropayment system like @manusporny’s #payswarm. I don’t want the paper-but I’ll pay for the stories.

1Copenut: @manusporny These new developments with #payswarm are a great advance. Is it workable with other backends like #Middleman or #Sinatra?

This was interesting because we have been talking about how PaySwarm could be applied to academic publishing for a while now. All the discussions to this point have been internal; we didn’t know if anybody would make the connection between the infrastructure that PaySwarm provides and how it could be applied to academic journals. This is up on our ideas board as a potential area where PaySwarm could be applied:

  • Payswarm for peer-reviewed, academic publishing
    • Use Payswarm identity mechanism to establish trusted reviewer and author identities for peer review
    • Use micropayment mechanism to fund research
    • Enable university-based group-accounts for purchasing articles, or refunding researcher purchases

Journals as Necessary Evils

For those in academia, journals are often viewed as a necessary evil. They cost a fortune to subscribe to, farm out most of their work to academics who do it for free, and keep an iron grip on the scientific publication process. Most academics that I speak with would do away with journal organizations in a heartbeat if there were a viable alternative. Most of the problem is political, which is why we haven’t felt compelled to pursue fixing it. Political problems need a groundswell of support and a number of champions working inside the community. I think the groundswell is almost here. I don’t know who the academic champions are that will push this forward. Additionally, if nobody takes the initiative to build such a system, things won’t change.

Here’s what we (Digital Bazaar) have been thinking. To fix the problem, you need at least the following core features:

  • Web-scale identity mechanisms – so that you can identify reviewers and authors for the peer-review process regardless of which site is publishing or reviewing a paper.
  • Decentralized solution – so that universities and researchers drive the process – not the publishers of journals.
  • Some form of remuneration system – you want to reward researchers with heavily cited papers, but in a way that makes it very hard to game the system.

Scientific Remuneration

PaySwarm could be used to implement each of these core features. At its core, PaySwarm is a decentralized payment mechanism for the Web. It also has a solid, decentralized identity mechanism that does not violate your privacy. There is a demo showing how it can be applied to WordPress blogs: just an abstract is published, and if the reader wants to see more of the article, they can pay a small fee to read it. It doesn’t take a big stretch of the imagination to replace “blog article” with “research paper”. The hope is that researchers would set access prices on articles such that any purchase of access to a research paper would go directly toward funding their current research. This would give universities and researchers an additional revenue stream while reducing the grip that scientific publishers currently have on our higher-education institutions.

A Decentralized Peer-review Process

Remuneration is just one aspect of the problem. Arguably, it is the lesser of the problems in academic publishing. The biggest technical problem is how you do peer review on a global, distributed scale. Quite obviously, you need a solid identity system that can identify scientists over the long term. You need to understand a scientist’s body of work and how respected their research is in their field. You also need a review system capable of pairing scientists with papers in need of review. PaySwarm has a strong identity system in place, using the Web as the identification mechanism. Here is the PaySwarm identity that I use for development: Clearly, paper publishing systems wouldn’t expose that identity URL to people using the system, but I include it to show what a Web-scale identifier looks like.

Web-scale Identity

If you go to that identity URL, you will see two sets of information: my public financial accounts and my digital signature keys. A PaySwarm Authority can annotate this identity with even more information, like whether or not an e-mail address has been verified against the identity. Is there a verified cellphone on record for the identity? Is there a verified driver’s license on record for the identity? What about a Twitter handle? A Google+ handle? All of these pieces of information can be added and verified by the PaySwarm Authority in order to build an identity that others can trust on the Web.

What sorts of pieces of information need to be added to a PaySwarm identity to trust its use for academic publishing? Perhaps a list of articles published by the identity? Review comments for all other papers that have been reviewed by the identity? Areas of research that others have certified the identity is an expert in? This is pretty basic Web-of-trust stuff, but it’s important to understand that PaySwarm has this sort of thing baked into the core of the design.
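To make this concrete, the sort of annotated identity described above might be represented as a simple JSON document. This is a hypothetical sketch; the field names and URLs below are illustrative, not the actual PaySwarm vocabulary:

```python
import json

# Hypothetical sketch of an annotated Web-scale identity. The field names
# and URLs are illustrative only, not the actual PaySwarm vocabulary.
identity = {
    "id": "https://payswarm.example.com/i/researcher",
    "publicKeys": ["https://payswarm.example.com/i/researcher/keys/1"],
    # Claims verified and annotated by a PaySwarm Authority:
    "verifiedClaims": [
        {"claim": "email", "value": "researcher@university.example",
         "verifiedBy": "https://authority.example.com/"},
        {"claim": "twitter", "value": "@researcher",
         "verifiedBy": "https://authority.example.com/"},
    ],
    # Web-of-trust material for academic publishing:
    "publishedPapers": ["https://university.example/papers/42"],
}

print(json.dumps(identity, indent=2))
```

Because the identity is a URL, any site participating in peer review can dereference it and check the Authority-verified claims for itself.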

The Process

Leveraging identity to make decentralized peer-review work is the goal, and here is how it would work from a researcher perspective:

  1. A researcher would get a PaySwarm identity from any PaySwarm Authority; there is no cost associated with getting such an identity. This sub-system is already implemented in PaySwarm.
  2. A researcher would publish an abstract of their paper in a Linked Data format such as RDFa. This abstract would identify the authors of the paper and some other basic information about the paper. It would also have a digital signature on the information using the PaySwarm identity that was acquired in the previous step. The researcher would set the cost to access the full article using any PaySwarm-compatible system. All of this is already implemented in PaySwarm.
  3. A paper publishing system would be used to request a review among academic peers. Those peers would review the paper and publish digital signatures on review comments, possibly with a notice that the paper is ready to be published. This sub-system is fairly trivial to implement and would mirror the current review process with the important distinction that it would not be centralized at journal publications.
  4. Once a pre-set limit on the number of positive reviews has been met, the paper publishing system would place its stamp of approval on the paper. Note that different paper publishing systems may have different metrics just as journals have different metrics today. One benefit to doing it this way is that you don’t need a paper publishing system to put its stamp of approval on a paper at all. If you really wanted to, you could write the software to calculate whether or not the paper has gotten the appropriate amount of review because all of the information is on the Web by default. This part of the system would be fairly trivial to write once the metrics were known. It may take a year or two to get the correct set of metrics in place, but it’s not rocket science and it doesn’t need to be perfect before systems such as this are used to publish papers.
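As a sketch of how trivial the approval check in step 4 could be, here is one possible version in Python. The review structure and the threshold are assumptions for illustration, not part of any PaySwarm specification:

```python
# Illustrative sketch of step 4: a paper is approved once it has enough
# positive reviews whose digital signatures have been verified. The data
# shape and threshold are assumptions, not a PaySwarm specification.
def is_approved(reviews, required_positive=3):
    verified_positive = sum(
        1 for r in reviews
        if r["signature_verified"] and r["recommends_publication"]
    )
    return verified_positive >= required_positive

reviews = [
    {"reviewer": "https://ps.example/i/alice",
     "signature_verified": True, "recommends_publication": True},
    {"reviewer": "https://ps.example/i/bob",
     "signature_verified": True, "recommends_publication": True},
    {"reviewer": "https://ps.example/i/carol",
     "signature_verified": False, "recommends_publication": True},
]

# Only two reviews are both positive and verified, so with the default
# threshold of three the paper is not yet approved.
print(is_approved(reviews))
```

Because all review data is published on the Web, anyone could run a check like this themselves; a paper publishing system’s stamp of approval is a convenience, not a gatekeeper.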

From a reviewer perspective, it would work like so:

  1. You are asked to review papers by your peers once you have an acceptable body of published work. All of your work can be verified because it is tied to your PaySwarm identity. All review comments can be verified as they are tied to other PaySwarm identities. This part is fairly trivial to implement, most of the work is already done for PaySwarm.
  2. Once you review a paper, you digitally sign your comments on the paper. If it is a good paper, you also include a claim that it is ready for broad publication. Again, technically simple to implement.
  3. Your reputation builds as you review more papers. The way that reputation is calculated is outside of the scope of this blog post mainly because it would need a great deal of input from academics around the world. Reputation is something that can be calculated, but many will argue about the algorithm and I would expect this to oscillate throughout the years as the system grows. In the end, there will probably be multiple reputation algorithms, not just one. All that matters is that people trust the reputation algorithms.
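Purely as an illustration of the kind of reputation algorithm that might emerge (the weights here are arbitrary, and as noted above any real algorithm would need extensive input from academics):

```python
# One deliberately naive reputation score: a reviewer earns credit for
# each signed review, with a small bonus when the reviewed paper was
# later approved. The weights are arbitrary and purely illustrative.
def reputation(signed_reviews):
    score = 0.0
    for review in signed_reviews:
        score += 1.0                      # credit for doing the review work
        if review["paper_was_approved"]:
            score += 0.5                  # bonus when the paper held up
    return score

history = [
    {"paper": "https://papers.example/1", "paper_was_approved": True},
    {"paper": "https://papers.example/2", "paper_was_approved": False},
]
print(reputation(history))  # 2.5
```

Since every review is a digitally signed, public document tied to a PaySwarm identity, competing reputation algorithms could all be computed from the same open data, which is exactly why multiple algorithms could coexist.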

Freedom to Research and Publish

The end-goal is to build a system that empowers researchers and research institutions, is far more transparent than the current peer-reviewed publishing system, and remunerates the people doing the work more directly. You will also note that at no point does a traditional journal enter the picture to give you a stamp of approval and charge you a fee for publishing your paper. Researchers are in control of the costs at all stages. As I’ve said above, the hard part isn’t the technical nature of the project, it’s the political nature of it. I don’t know if this is enough of a pain-point among academics to actually start doing something about it today. I know some are, but I don’t know if many would use such a system over the draw of publications like Nature, PLOS, Molecular Genetics and Genomics, and Planta. Quite obviously, what I’ve proposed above isn’t a complete road map. There are issues and details that would need to be hammered out. However, I don’t understand why a system like this doesn’t already exist, so I implore the academic community to explain why what I’ve laid out above hasn’t been done yet.

It’s obvious that a system like this would be good for the world. Building such a system might have reduced the possibility of us losing someone like Aaron in the way that we did. He was certainly fighting for something like it. Talking about it makes me feel a bit less helpless than I did yesterday. Maybe making something good out of this mess will help some of you out there as well. If others offer to help, we can start building it.

So how about it researchers of the world, would you publish all of your research through such a system?

Objection to Microdata Candidate Recommendation

Full disclosure: I’m the current chair of the standards group at the World Wide Web Consortium that created the newest version of RDFa, editor of the HTML5+RDFa 1.1 and RDFa Lite 1.1 specifications, and I’m also a member of the HTML Working Group.

Edit: 2012-12-01 – Updated the article to rephrase some things, and include rationale and counter-arguments at the bottom in preparation for the HTML WG poll on the matter.

The HTML Working Group at the W3C is currently trying to decide if it should transition the Microdata specification to the next stage in the standardization process. There has been a call for consensus to transition the spec to the Candidate Recommendation stage. The problem is that we already have a set of specifications that are official W3C Recommendations that do what Microdata does and more: RDFa 1.1 became an official W3C Recommendation last summer. From a standards perspective, this is a mistake and sends a confusing signal to Web developers. Officially supporting two specifications that do almost exactly the same thing in almost exactly the same way is, ultimately, a failure to standardize.

The fact that RDFa already does what Microdata does has been elaborated upon before:

Mythical Differences: RDFa Lite vs. Microdata
An Uber-comparison of RDFa, Microdata, and Microformats

Here’s the problem in a nutshell: The W3C is thinking of ratifying two completely different specifications that accomplish the same thing in basically the same way. The functionality of RDFa, which is already a W3C Recommendation, overlaps Microdata by a large margin. In fact, RDFa Lite 1.1 was developed as a plug-in replacement for Microdata. The full version of RDFa can also do a number of things that Microdata cannot, such as datatyping, associating more than one type per object, embed-ability in languages other than HTML, ability to easily publish and mix vocabularies, etc.

Microdata would have easily been dead in the water had it not been for two simple facts: 1) the editor of the specification works at Google, and 2) Google pushed Microdata as the markup language for schema.org before also accepting RDFa markup. The first enabled Google and the editor to work on schema.org without signalling to the public that it was creating a competitor to Facebook’s Open Graph Protocol. The second gave Microdata enough of a jump start to establish a foothold for schema.org markup. There have been a number of studies showing that Microdata’s sole use case (99% of Microdata markup) is the markup of schema.org terms. Microdata is not widely used outside of that context; we now have data to back up what we had predicted would happen when schema.org made its initial announcement of Microdata-only support. Note that schema.org now supports both RDFa and Microdata.

It is typically a bad idea to have two formats published by the same organization that do the same thing. It leads to Web developer confusion surrounding which format to use. One of the goals of Web standards is to reduce, or preferably eliminate, the confusion surrounding the correct technology decision to make. The HTML Working Group and the W3C are failing miserably on this front. There is more confusion today about picking Microdata or RDFa because they accomplish the same thing in effectively the same way. The only reason both exist is due to political reasons.

If we step back and look at the technical arguments, there is no compelling reason that Microdata should be a W3C Recommendation. There is no compelling reason to have two specifications that do the same thing in basically the same way. Therefore, as a member of the HTML Working Group (not as a chair or editor of RDFa) I object to the publication of Microdata as a Candidate Recommendation.

Note that this is not a W3C formal objection. This is an informal objection to publishing Microdata along the Recommendation track. This objection will not become an official W3C formal objection if the HTML Working Group holds a poll to gather consensus around whether Microdata should proceed along the Recommendation publication track. I believe the publication of a W3C Note will continue to allow Google to support Microdata in schema.org, but will hopefully correct the confusing message that the W3C has been sending to Web developers regarding RDFa and Microdata. We don’t need two specifications that do almost exactly the same thing.

The message sent by the W3C needs to be very clear: There is one recommendation for doing structured data markup in HTML. That recommendation is RDFa. It addresses all of the use cases that have been put forth by the general Web community, and it’s ready for broad adoption and implementation today.

If you agree with this blog post, make sure to let the HTML Working Group know that you do not think that the W3C should ratify two specifications that do almost exactly the same thing in almost exactly the same way. Now is the time to speak up!

Summary of Facts and Arguments

Below is a summary of arguments presented as a basis for publishing Microdata along the W3C Note track:

  1. RDFa 1.1 is already a ratified Web standard as of June 7th 2012 and absorbed almost every Microdata feature before it became official. If the majority of the differences between RDFa and Microdata boil down to different attribute names (property vs. itemprop), then the two solutions have effectively converged on syntax and W3C should not ratify two solutions that do effectively the same thing in almost exactly the same way.
  2. RDFa is supported by all of the major search crawlers, including Google (and schema.org), Microsoft, Yahoo!, Yandex, and Facebook. Microdata is not supported by Facebook.
  3. RDFa Lite 1.1 is feature-equivalent to Microdata. Over 99% of Microdata markup can be expressed easily in RDFa Lite 1.1. Converting from Microdata to RDFa Lite is as simple as a search and replace of the Microdata attributes with RDFa Lite attributes. Conversely, Microdata does not support a number of the more advanced RDFa features, like being able to tell the difference between feet and meters.
  4. You can mix vocabularies with RDFa Lite 1.1, supporting both schema.org and Facebook’s Open Graph Protocol (OGP) using a single markup language. You don’t have to learn Microdata for schema.org and RDFa for Facebook – just use RDFa for both.
  5. The creator of the Microdata specification doesn’t like Microdata. When people are not passionate about the solutions that they create, the desire to work on those solutions and continue to improve upon them is muted. The RDFa community is passionate about the technology that they have created together and have strived to make it better since the standardization of RDFa 1.0 back in 2008.
  6. RDFa Lite 1.1 is fully upward-compatible with RDFa 1.1, allowing you to seamlessly migrate to a more feature-rich language as your Linked Data needs grow. Microdata does not support any of the more advanced features provided by RDFa 1.1.
  7. RDFa deployment is broader than Microdata. RDFa deployment continues to grow at a rapid pace.
  8. The economic damage generated by publishing both RDFa and Microdata along the Recommendation track should not be underestimated. W3C should try to provide clear direction in an attempt to reduce the economic waste that a “let the market sort it out among two nearly identical solutions” strategy will generate. At some point, the market will figure out that both solutions are nearly identical, but only after publishing and building massive amounts of content and tooling for both.
  9. The W3C Technical Architecture Group (TAG), which is responsible for ensuring that the core architecture of the Web is sound, has raised their concern about the publication of both Microdata and RDFa as recommendations. After the W3C TAG raised their concerns, the RDFa Working Group created RDFa Lite 1.1 to be a near feature-equivalent replacement for Microdata that was also backwards-compatible with RDFa 1.0.
  10. Publishing a standard that does almost exactly the same thing as an existing standard in almost exactly the same way is a failure to standardize.

Counter-arguments and Rebuttals

[This is a] classic case of monopolistic anti-competitive protectionism.

No, this is an objection to publishing two specifications that do almost exactly the same thing in almost exactly the same way along the W3C Recommendation publication track. Protectionism would have asked that all work on Microdata be stopped and the work scuttled. The proposed resolution does not block anybody from using Microdata, nor does it try to stop or block the Microdata work from happening in the HTML WG. The objection asks that the W3C decide what the best path forward for Web developers is based on a fairly complicated set of predicted outcomes. This is not an easy decision. The objection is intended to ensure that the HTML Working Group has this discussion before we proceed to Candidate Recommendation with Microdata.

<manu1> I'd like the W3C to work as well, and I think publishing two specs that accomplish basically 
        the same thing in basically the same way shows breakage.
<annevk> Bit late for that. XDM vs DOM, XPath vs Selectors, XSL-FO vs CSS, XSLT vs XQuery, 
         XQuery vs XQueryX, RDF/XML vs Turtle, XForms vs Web Forms 2.0, 
         XHTML 1.0 vs HTML 4.01, XML 1.0 4th Edition vs XML 1.0 5th Edition, 
         XML 1.0 vs XML 1.1, etc.

[link to full conversation]

While W3C does have a history of publishing competing specifications, there have been features in each competing specification that were compelling enough to warrant the publication of both standards. For example, XHTML 1.0 provided a standard set of rules for validating documents that was aligned with XML, as well as a decentralized extension mechanism, neither of which HTML 4.01 had. Those two major features were viewed as compelling enough to publish both specifications as Recommendations via W3C.

For authors, the differences between RDFa and Microdata are so small that, for 99% of documents in the wild, you can convert a Microdata document to an RDFa Lite 1.1 document with a simple search and replace of attribute names. That demonstrates that the syntaxes for both languages are different only in the names of the HTML attributes, and that does not seem like a very compelling reason to publish both specifications as Recommendations.
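To illustrate how mechanical that conversion is, here is a minimal sketch in Python. It covers only the common case described above and deliberately ignores edge cases such as itemref, relative itemtype values, and unquoted attributes:

```python
# Minimal sketch of Microdata -> RDFa Lite 1.1 conversion: for the common
# case, only the attribute names change. Edge cases (itemref, relative
# itemtype values, unquoted attributes) are deliberately ignored here.
def microdata_to_rdfa_lite(html):
    rules = [
        ("itemscope itemtype=", "typeof="),  # RDFa 1.1 accepts a full IRI in typeof
        ("itemprop=", "property="),
        ("itemid=", "resource="),
    ]
    for old, new in rules:
        html = html.replace(old, new)
    return html

src = ('<div itemscope itemtype="http://schema.org/Person">'
       '<span itemprop="name">Manu</span></div>')
print(microdata_to_rdfa_lite(src))
```

Running this on the sample above yields `<div typeof="http://schema.org/Person"><span property="name">Manu</span></div>`, which is valid RDFa Lite 1.1 markup describing the same data.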

Microdata’s processing algorithm is vastly simpler, which makes the data extracted more reliable and, when something does go wrong, makes it easier for 1) users to debug their own data, and 2) me to debug it if they can’t figure it out on their own.

Microdata’s processing algorithm is simpler largely because the language supports far fewer features than RDFa does.

The complexity of implementing a processor has little bearing on how easy it is for developers to author documents. For example, XHTML 1.0 had a simpler processing model which made the data that was extracted more reliable and when something went wrong, it was easier to debug. However, HTML5 supported more use cases and recovers from errors in cases where it can, which made it more popular with Web developers in the long-run.

Additionally, authors of Microdata and RDFa should be using tools like RDFa Play to debug their markup. This is true for any Web technology. We debug our HTML, JavaScript, and CSS by loading it into a browser and bringing up the debugging tools. This is no different for Microdata and RDFa. If you want to make sure your markup does what you want, make sure to verify it by using a tool and not by trying to memorize the processing rules and running them through your head.

For what it is worth, I personally think RDFa is generally a technically better solution. But as Marcos says, “so what”? Our job at W3C is to make standards for the technology the market decides to use.

If we think one of these technologies is a technically better solution than the other, we should signal that realization at some level. The most basic thing we could do is make one an official Recommendation and the other a Note. I also agree that our job at W3C is to make standards for the technology the market decides to use, but clearly this particular case isn’t that cut-and-dried. Schema.org’s only option in the beginning was to use Microdata, and since authors didn’t want to risk not showing up in the search engines, they used Microdata. This forced the market to go in one direction.

This discussion would be in a different place had Google kept the playing field level. That is not to say that Google didn’t have good reasons for making the decisions that they did at the time, but those reasons influenced the development of RDFa, and RDFa Lite 1.1 was the result. The differences between Microdata and RDFa have been removed and a new question is in front of us: given two almost identical technologies, should the W3C publish two specifications that do almost exactly the same thing in almost exactly the same way?

… the [HTML] Working Group explicitly decided not to pick a winner between HTML Microdata and HTML+RDFa

The question before the HTML WG at the time was whether or not to split Microdata out of the HTML5 specification. The HTML Working Group did not discuss whether the publishing track for the Microdata document should be the W3C Note track or the W3C Recommendation track. At the time the decision was made, RDFa Lite 1.1 did not exist, RDFa Lite 1.1 was not a W3C Recommendation, nor did the functionality of RDFa and Microdata overlap as greatly as it does now. Additionally, the HTML WG decision at that time states the following under the “Revisiting the issue” section:

“If Microdata and RDFa converge in syntax…”

Microdata and RDFa have effectively converged in syntax. Since Microdata can be interpreted as RDFa via a simple search-and-replace of attributes, the two languages now differ only in their attribute names. The proposal is not to stop work on Microdata. Let work on Microdata proceed in this group, but let it proceed on the W3C Note publication track.

Closing Statements

I felt uneasy raising this issue because it’s a touchy and painful subject for everyone involved. Even if the discussion is painful, it is a healthy one for a standardization body to have from time to time. What I wanted was for the HTML Working Group to have this discussion. If the upcoming poll finds that the consensus of the HTML Working Group is to continue with the Microdata specification along the Recommendation track, I will not pursue a W3C Formal Objection. I will respect whatever decision the HTML Working Group makes as I trust the Chairs of that group, the process that they’ve put in place, and the aggregate opinion of the members in that group. After all, that is how the standardization process is supposed to work and I’m thankful to be a part of it.

The Problem with RDF and Nuclear Power

Full disclosure: I am the chair of the RDFa Working Group, the JSON-LD Community Group, a member of the RDF Working Group, as well as other Semantic Web initiatives. I believe in this stuff, but am critical about the path we’ve been taking for a while now.

The Resource Description Framework (a model for publishing data on the Web) has a horrible public perception problem akin to how many people in the USA view nuclear power. The coal industry campaigned quite aggressively to implant the notion that nuclear power was not as safe as coal. Couple this public misinformation campaign with a few nuclear-power-related catastrophes and it is no surprise that the current public perception of nuclear power can be summarized as: “Not in my back yard”. Never mind that, per terawatt, nuclear power generation has killed far fewer people since its inception than coal. Never mind that it is one of the more viable power sources if we gaze hundreds of years into Earth’s future, especially with the recent renewed interest in Liquid Fluoride Thorium Reactors. When we look toward the future, the path is clear, but public perception is preventing us from proceeding down that path at the rate we need to in order to prevent more damage to the Earth.

RDF shares a number of similarities with nuclear power. RDF is one of the best data modeling mechanisms that humanity has created. Looking into the future, there is no equally powerful, viable alternative. So, why has progress been slow on this very exciting technology? There was no public misinformation campaign, so where did this negative view of RDF come from?

In short, RDF/XML was the Semantic Web’s Three Mile Island incident. When it was released, developers confused RDF/XML (bad) with the RDF data model (good). There weren’t enough people and time to counteract the negative press that RDF was receiving as a result of RDF/XML, and thus we are where we are today because of this negative perception of RDF. Even Wikipedia’s page on the matter seems to imply that RDF/XML is RDF. Some purveyors of RDF think that the public perception problem isn’t that bad. I think that when developers hear RDF, they think: “Not in my back yard”.

The solution to this predicament: Stop mentioning RDF and the Semantic Web. Focus on tools for developers. Do more dogfooding.

To explain why we should adopt this strategy, we can look to Tesla for inspiration. Elon Musk, co-founder of PayPal and now the CEO of Tesla Motors, recently announced the Tesla Supercharger project. At a high level, the project accomplishes the following jaw-dropping things:

  1. It creates a network of charging stations for electric cars that are capable of charging a Tesla in less than 30 minutes.
  2. The charging stations are solar powered and generate more electricity than the cars use, feeding the excess power into the local power grid.
  3. The charging stations are free to use for any person that owns a Tesla vehicle.
  4. The charging stations are operational and available today.

This means that, in 4-5 years, any owner of a Tesla vehicle will be able to drive anywhere in the USA, for free, powered by the sun. No person in their right mind (with the money) would pass up that offer. No fossil-fuel-based company will ever be able to provide “free”, clean energy. This is the sort of proposition that we, the RDF/Linked Data/Semantic Web community, need to make; I think we can re-position ourselves to do just that.

Here is what the RDF and Linked Data community can learn from Tesla:

  1. The message shouldn’t be about the technology. It should be about the problems we have today and a concrete solution on how to address those problems.
  2. Demonstrate real value. Stop talking about the beauty of RDF, theoretical value, or design. Deliver production-ready, open-source software tools.
  3. Build a network of believers by spending more of your time working with Web developers and open-source projects to convince them to publish Linked Data. Dogfood our work.

Here is how we’ve applied these lessons to the JSON-LD work:

  1. We don’t mention RDF in the specification unless absolutely necessary, and in many cases it isn’t necessary. RDF is plumbing; it’s in the background, and developers don’t need to know about it to use JSON-LD.
  2. We purposefully built production-ready tools for JSON-LD from day one; a playground, multiple production-ready implementations, and a JavaScript implementation of the browser-based API.
  3. We are working with Wikidata, Wikimedia, Drupal, the Web Payments and Read Write Web groups at W3C, and a number of other private clients to ensure that we’re providing real value and dogfooding our work.
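As a small taste of the first point, here is what JSON-LD looks like to a developer: plain JSON plus an @context that maps terms to URLs, with the RDF plumbing never surfacing. The document below is a generic illustration, parsed here with Python’s standard json module; no JSON-LD processor is needed just to read it:

```python
import json

# To a Web developer, this is just JSON. The @context quietly maps each
# term to a URL; that mapping is where the Linked Data plumbing lives,
# entirely out of sight of everyday code.
doc = """
{
  "@context": {
    "name": "http://schema.org/name",
    "homepage": {"@id": "http://schema.org/url", "@type": "@id"}
  },
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/"
}
"""

data = json.loads(doc)
print(data["name"])  # ordinary JSON access; no RDF knowledge required
```

A JSON-LD-aware consumer can expand the same document into Linked Data, while everyone else treats it as the ordinary JSON it is; that is the dogfooding story in miniature.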

Ultimately, RDF and the Semantic Web are of no interest to Web developers. They also have a really negative public perception problem. We should stop talking about them. Let’s shift the focus to be on Linked Data, explaining the problems that Web developers face today, and concrete, demonstrable solutions to those problems.

Note: This post isn’t meant as a slight against any one person or group. I was just working on the JSON-LD spec, aggressively removing prose discussing RDF, and the analogy popped into my head. This blog post was an exercise in organizing my thoughts on the matter.

HTML5 and RDFa 1.1

Full disclosure: I’m the chair of the newly re-chartered RDFa Working Group at the W3C as well as a member of the HTML WG.

The newly re-chartered RDFa Working Group at the W3C published a First Public Working Draft of HTML5+RDFa 1.1 today. This might be confusing to those of you that have been following the RDFa specifications. Keep in mind that HTML5+RDFa 1.1 is different from XHTML+RDFa 1.1, RDFa Core 1.1, and RDFa Lite 1.1 (which are official specs at this point). This is specifically about HTML5 and RDFa 1.1. The HTML5+RDFa 1.1 spec reached Last Call (aka: almost done) status at W3C via the HTML Working Group last year. So, why are we doing this now and what does it mean for the future of RDFa in HTML5?

Here’s the issue: the document was being unnecessarily held up by the HTML5 specification. In the most favorable scenario, HTML5 is expected to become an official standard in 2014. RDFa Core 1.1 became an official standard in June 2012. Per the W3C process, HTML5+RDFa 1.1 would have had to wait until 2014 to become an official W3C specification, even though it would be ready to go in a few months from now. W3C policy states that all specs that your spec depends on must reach the official spec status before your spec becomes official. Since HTML5+RDFa 1.1 is a language profile for RDFa 1.1 that is layered on top of HTML5, it had no choice but to wait for HTML5 to become official. Boo.

Thankfully the chairs of the HTML WG, RDFa WG, and W3C staff found an alternate path forward for HTML5+RDFa 1.1. Since the specification doesn’t depend on any “at risk” features in HTML5, and since all of the features that RDFa 1.1 uses in HTML5 have been implemented in all of the Web browsers, there is very little chance that those features will be removed in the future. This means that HTML5+RDFa 1.1 could become an official W3C specification before HTML5 reaches that status. So, that’s what we’re going to try to do. Here’s the plan:

  1. Get approval from W3C member companies to re-charter the RDFa WG to take over publishing responsibility of HTML5+RDFa 1.1. [Done]
  2. Publish the HTML5+RDFa 1.1 specification under the newly re-chartered RDFa WG. [Done]
  3. Start the clock on a new patent exclusion period and resolve issues. Wait a minimum of 6 months to go to W3C Candidate Recommendation (feature freeze) status, due to patent policy requirements.
  4. Fast-track to an official W3C specification (test suite is already done, inter-operable implementations are already done).

There are a few minor issues that still need to be ironed out, but the RDFa WG is on the job and those issues will get resolved in the next month or two. If everything goes according to plan, we should be able to publish HTML5+RDFa 1.1 as an official W3C standard in 7-9 months. That’s good for RDFa, good for Web Developers, and good for the Web.

Mythical Differences: RDFa Lite vs. Microdata

Full disclosure: I’m the current chair of the standards group at the World Wide Web Consortium that created the newest version of RDFa.

RDFa 1.1 became an official Web specification last month. Google started supporting RDFa in Google Rich Snippets some time ago and has recently announced that they will support RDFa Lite for schema.org as well. These announcements have led to a weekly increase in the number of times Web developers ask the following question on Twitter and Google+:

“What should I implement on my website? Microdata or RDFa?”

This blog post attempts to answer the question once and for all. It dispels some of the myths around the Microdata vs. RDFa debate and outlines how the two languages evolved to solve the same problem in almost exactly the same way.


Here’s the short answer for those of you that don’t have the time to read this entire blog post: Use RDFa Lite – it does everything important that Microdata does, it’s an official standard, and it has the stronger deployment of the two.

Functionally Equivalent

Microdata was initially designed as a simple subset of RDFa and Microformats, primarily focusing on the core features of RDFa. Unfortunately, when this was done, the choice was made to break compatibility with RDFa and effectively fork the specification. Conversely, RDFa Lite highlights the same subset of RDFa that Microdata adopted, but does so in a way that does not break backwards compatibility with RDFa. This was done on purpose, so that Web developers wouldn’t have a hard decision in front of them.

RDFa Lite contains all of the simplicity of Microdata coupled with the extensibility of and compatibility with RDFa. This is an important point that is often lost in the debate – there is no solid technical reason for choosing Microdata over RDFa Lite anymore. There may have been a year ago, but RDFa Lite made a few tweaks in such a way as to achieve feature-parity with Microdata today while being able to do much more than Microdata if you ever need the flexibility. If you don’t want to code yourself into a corner – use RDFa Lite.

To examine why RDFa Lite is a better choice, let’s take a look at the markup attributes for Microdata and the functionally equivalent ones provided by RDFa Lite:

Microdata 1.0 RDFa Lite 1.1 Purpose
itemid resource Used to identify the exact thing that is being described using a URL, such as a specific person, event, or place.
itemprop property Used to identify a property of the thing being described, such as a name, date, or location.
itemscope not needed Used to signal that a new thing is being described.
itemtype typeof Used to identify the type of thing being described, such as a person, event, or place.
itemref not needed Used to copy-paste a piece of data and associate it with multiple things.
not supported vocab Used to specify a default vocabulary that contains terms that are used by markup.
not supported prefix Used to mix different vocabularies in the same document, like ones provided by Facebook, Google, and open source projects.

As you can see above, both languages have exactly the same number of attributes. There are nuanced differences in what each attribute allows one to do, but Web developers only need to remember one thing from this blog post: Over 99% of all Microdata markup in the wild can be expressed in RDFa Lite just as easily. This is a provable fact – replace all Microdata attributes with the equivalent RDFa Lite attributes, add vocab="http://schema.org/" to the markup block, and you’re done.
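For illustration, here is a minimal sketch of that search-and-replace conversion in Python. The attribute map and helper name are mine, not part of any specification, and a real converter should use an HTML parser rather than string replacement:

```python
import re

# Maps each Microdata attribute to its RDFa Lite 1.1 equivalent
# (per the table above); itemscope is simply dropped.
ATTRIBUTE_MAP = {
    "itemprop=": "property=",
    "itemtype=": "typeof=",
    "itemid=": "resource=",
}

def microdata_to_rdfa_lite(html):
    """Naive Microdata-to-RDFa-Lite conversion via search and replace."""
    for microdata, rdfa in ATTRIBUTE_MAP.items():
        html = html.replace(microdata, rdfa)
    # itemscope has no RDFa Lite equivalent because it is not needed.
    return re.sub(r"\s*\bitemscope\b", "", html)

converted = microdata_to_rdfa_lite(
    '<span itemscope itemtype="http://schema.org/Person">'
    '<b itemprop="name">Jane</b></span>'
)
print(converted)
# → <span typeof="http://schema.org/Person"><b property="name">Jane</b></span>
```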

At this point, you may be asking yourself why the two languages are so similar. There are almost 8 years of history here, but to summarize: RDFa was created around the 2004 time frame; Microdata came much later and used RDFa as a design template. Microdata chose a subset of the original RDFa design to support, but did so in an incompatible way. RDFa Lite then highlighted the same subset of functionality, but in a way that is backwards compatible with RDFa, while keeping the flexibility of the original RDFa intact.

That leaves us where we are today – with two languages, Microdata and RDFa Lite, that accomplish the same things using the same markup patterns. The reason both exist is a very long story involving politics, egos, and a fair amount of dysfunction between various standards groups – none of which has any impact on the actual functionality of either language. The bottom line is that we now have two languages that do almost exactly the same thing. One of them, RDFa Lite 1.1, is currently an official standard. The other one, Microdata, probably won’t become a standard until 2014.

Markup Similarity

The biggest deployment of Microdata on the Web is for implementing the schema.org vocabulary by Google. Recently, with the release of RDFa Lite 1.1, Google has announced their intent to “officially” support RDFa as well. To see what this means for Web developers, let’s take a look at some markup. Here is a side-by-side comparison of two markup examples – one in Microdata and another in RDFa Lite 1.1:

Microdata 1.0 RDFa Lite 1.1
<div itemscope itemtype="http://schema.org/Product">
  <img itemprop="image" src="dell-30in-lcd.jpg" />
  <span itemprop="name">Dell UltraSharp 30" LCD Monitor</span>
<div vocab="http://schema.org/" typeof="Product">
  <img property="image" src="dell-30in-lcd.jpg" />
  <span property="name">Dell UltraSharp 30" LCD Monitor</span>

If the markup above looks similar to you, that was no accident. RDFa Lite 1.1 is designed to function as a drop-in replacement for Microdata.

The Bits that Don’t Matter

Only two features of Microdata aren’t supported by RDFa Lite: itemref and itemscope. Regarding itemref, the RDFa Working Group discussed the addition of that attribute and, upon reviewing Microdata markup in the wild, saw almost no use of itemref in production code. The schema.org examples steer clear of using itemref as well, so it was fairly clear that itemref is, and will continue to be, an unused feature of Microdata. The itemscope attribute is redundant in RDFa Lite and is thus unnecessary.

5 Reasons

For those of you that still are not convinced, here are the top five reasons that you should pick RDFa Lite 1.1 over Microdata:

  1. RDFa is supported by all of the major search crawlers, including Google (and schema.org), Microsoft, Yahoo!, Yandex, and Facebook. Microdata is not supported by Facebook.
  2. RDFa Lite 1.1 is feature-equivalent to Microdata. Over 99% of Microdata markup can be expressed easily in RDFa Lite 1.1. Converting from Microdata to RDFa Lite is as simple as a search and replace of the Microdata attributes with RDFa Lite attributes. Conversely, Microdata does not support a number of the more advanced RDFa features, like being able to tell the difference between feet and meters.
  3. You can mix vocabularies with RDFa Lite 1.1, supporting both schema.org and Facebook’s Open Graph Protocol (OGP) using a single markup language. You don’t have to learn Microdata for schema.org and RDFa for Facebook – just use RDFa for both.
  4. RDFa Lite 1.1 is fully upward-compatible with RDFa 1.1, allowing you to seamlessly migrate to a more feature-rich language as your Linked Data needs grow. Microdata does not support any of the more advanced features provided by RDFa 1.1.
  5. RDFa deployment is greater than Microdata. RDFa deployment continues to grow at a rapid pace.

Hopefully the reasons above are enough to convince most Web developers that RDFa Lite is the best bet for expressing Linked Data in web pages, boosting your Search Engine Page rank, and ensuring that you’re future-proofing your website as your data markup needs grow over the next several years. If it’s not, please leave a comment below explaining why you’re still not convinced.

If you’d like to learn more about RDFa, try the website. If you’d like to see more RDFa Lite examples and play around with the live RDFa editor, check out RDFa Play.

Thanks to Tattoo Tabatha for the artwork in this blog piece.

Blindingly Fast RDFa 1.1 Processing

The fastest RDFa processor in the world just got a big update – librdfa 1.1 has just been released! librdfa is a SAX-based RDFa processor, written in pure C – which makes it very portable to a variety of different software and hardware architectures. It’s also tiny and fast – the binary is smaller than this web page (around 47KB), and it’s capable of extracting roughly 5,000 triples per second per CPU core from an HTML or XML document. If you use Raptor or the Redland libraries, you use librdfa.

The timing for this release coincides with the push for a full standard at W3C for RDFa 1.1. The RDFa 1.1 specification has been in feature-freeze for over a month and is proceeding to W3C vote to finalize it as an officially recognized standard. There are now 5 fully conforming implementations for RDFa in a variety of languages – librdfa in C, PyRDFa in Python, RDF::RDFa in Ruby, Green Turtle in JavaScript, and clj-rdfa in Clojure.

It took about a month of spare-time hacking on librdfa to update it to support RDFa 1.1. It has also been given a new back-end document processor. A migration from libexpat to libxml2 was performed in order to better support processing of badly authored HTML documents as well as well-formed XML documents. Support for all of the new features in RDFa 1.1 has been added, including the @vocab attribute, @prefix, and @inlist. Full support for RDFa Lite 1.1 has also been included. A great deal of time was also put into making sure that there were absolutely no memory leaks or pointer issues across all 700+ tests in the RDFa 1.1 Test Suite. There is still some work that needs to be done to add HTML5 @datetime attribute support and fix xml:base processing in SVG files, but that’s fairly small stuff that will be implemented over the next month or two.

Many thanks to Daniel Richard G., who updated the build system to be more cross-platform and pure C compliant on a variety of different architectures. Also thanks to Dave Longley who fixed the very last memory leak, which turned out to be a massive pain to find and resolve. This version of librdfa is ready for production use for processing all XML+RDFa and XHTML+RDFa documents. This version also supports both RDFa 1.0 and RDFa 1.1, as well as RDFa Lite 1.1. While support for HTML5+RDFa is 95% of the way there, I expect that it will be 100% in the next month or two.

Google Indexing RDFa 1.0 + schema.org Markup

Full disclosure: I am the chair of the RDF Web Applications Working Group at the World Wide Web Consortium – RDFa is one of the technologies that we’re working on.

Google is building a gigantic Knowledge Graph that will change search forever. The purpose of the graph is to understand the conceptual “things” on a web page and produce better search results for the world. Clearly, the people and companies that end up in this Knowledge Graph first will have a huge competitive advantage over those that do not. So, what can you do today to increase your organization’s chances of ending up in this Knowledge Graph, and thus ending up higher in the Search Engine Result Pages (SERPs)?

One possible approach is to mark your pages up with RDFa and schema.org. “But wait”, you might ask, “schema.org doesn’t support RDFa, does it?”. While schema.org launched with only Microdata support, Google has said that they will support RDFa 1.1 Lite, which is slated to become an official specification in the next couple of months.

However, that doesn’t mean that the Google engineers are sitting still while the RDFa 1.1 spec moves toward official standard status. RDFa 1.0 became an official specification in October 2008. Many people have been wondering if Google would start indexing RDFa 1.0 + schema.org markup while we wait for RDFa 1.1 to become official. We have just discovered that Google is not only indexing schema.org data expressed as RDFa 1.0, but they’re enhancing search result listings based on data gleaned from schema.org markup expressed as RDFa 1.0!

Here’s what it looks like in the live Google search results:

Enhanced Google search result showing event information
The image above shows a live, enhanced Google search result with event information extracted from the RDFa 1.0 + schema.org data, including date and location of the event.

Enhanced Google search result showing recipe preparation time information
The image above shows a live, enhanced Google search result with recipe preparation time information, also extracted from the RDFa 1.0 + schema.org data that was on the page.

Enhanced Google search result showing detailed event information with click-able links
The image above shows a live, enhanced Google search result with very detailed event information gleaned from the RDFa 1.0 + schema.org data, including date, location and links to the individual event pages.

Looking at the source code for the pages above, a few things become apparent:

  1. All of the pages contain a mixture of RDFa 1.0 + schema.org markup. There is no Microformats or Microdata markup used to express the data shown in the live search listings. The RDFa 1.0 + schema.org data is definitely being used in live search listing displays.
  2. The schema.org Drupal module seems to be used for all of the pages, so if you use Drupal, you will probably want to install that module if you want the benefit of enhanced Google search listings.
  3. The search and social companies are serious about indexing RDFa content, which means that you may want to get serious about adding it to your pages before your competitors do.

Google isn’t the only company that is building a giant global graph of knowledge. Last year, Facebook launched a similar initiative called the Open Graph, which is also built on RDFa. The end result of all of this work is better search listings, more relevant social interactions, and a more unified way of expressing “things” in Web pages using RDFa.

Does your website talk about any of the following things: Applications, Authors, Events, Movies, Music, People, Products, Recipes, Reviews, and/or TV Episodes? If so, you should probably be expressing that structured data as RDFa so that both Facebook and Google can give you better visibility over those that don’t in the coming years. You can get started by viewing the RDFa examples or reading more about Facebook’s Open Graph markup. If you don’t know anything about RDFa, you may want to start with the RDFa Lite document, or the RDFa Primer.

Many thanks to Stéphane Corlosquet for spotting this and creating the Drupal 7 schema.org module. Also thanks to Chris Olafson for spotting that RDFa 1.0 + schema.org markup is now consistently being displayed in live Google search results.

Searching for Microformats, RDFa, and Microdata Usage in the Wild

A few weeks ago, we announced the launch of the Data Driven Standards Community Group at the World Wide Web Consortium (W3C). The focus is on researching, analyzing and publicly documenting current usage patterns on the Internet. Inspired by the Microformats Process, the goal of this group is to enlighten standards development with real-world data. This group will collect and report data from large Web crawls, produce detailed reports on protocol usage across the Internet, document yearly changes in usage patterns and promote findings that demonstrate that the current direction of a particular specification should be changed based on publicly available data. All data, research, and analysis will be made publicly available to ensure the scientific rigor of the findings. The group will be a collection of search engine companies, academic researchers, hobbyists, protocol designers and specification editors in search of data that will guide the Internet toward a brighter future.

We had launched the group with the intent of regularly analyzing the Common Crawl data set. The goal of Common Crawl is to build and maintain an open crawl of the web that can be used by researchers, educators and innovators. The crawl currently contains roughly 40TB of compressed data, around 5 billion web pages, and is hosted on Amazon’s S3 service. To analyze the data, you have to write a small piece of analysis software that is then applied to all of the data using Amazon’s Elastic Map Reduce service.

I spent a few hours a couple of nights ago and wrote the analysis software, which is available as open source on github. This blog post won’t go into how the software was written, but rather the methodology and data that resulted from the analysis. There were three goals that I had in mind when performing this trial run:

  • Quickly hack something together to see if Microformats, RDFa and Microdata analysis was feasible.
  • Estimate the cost of performing a full analysis.
  • See if the data correlates with the Yahoo! study or the Web Data Commons project.


The analysis software was executed against a very small subset of the Common Crawl data set. The directory that was analyzed (/commoncrawl-crawl-002/2010/01/07/18/) contained 1,273 ARC files, each weighing in at roughly 100MB, for around 124GB of data processed. It took 8 EC2 machines a total of 14 hours and 23 minutes to process the data, for a grand total of 120 CPU hours utilized.

The analysis software streams each file from disk, decompresses it and breaks it into the data that was retrieved from each URL. Each document is checked to ensure that it is an HTML or XHTML file; if it isn’t, it is skipped. If it is, an HTML4 DOM is constructed from the file using a very forgiving tag soup parser. At that point, CSS selectors are executed on the resulting DOM to search for HTML elements that contain attributes for each language. For example, the CSS selector “[property]” is executed to retrieve a count of all RDFa property attributes on the page. The same check was performed for Microdata and Microformats. You can see the exact CSS queries used in the source file for the markup language detector.
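As a rough sketch of that detection step, here is the same idea using Python’s standard-library parser instead of CSS selectors. The class name and the exact attribute lists are illustrative, not the ones used by the actual tool:

```python
from html.parser import HTMLParser

# Walk an HTML document with a forgiving parser and count elements that
# carry attributes signalling each structured-data syntax (analogous to
# running CSS selectors like "[property]" against the DOM).
class MarkupCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = {"rdfa": 0, "microdata": 0}

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if names & {"property", "typeof", "vocab", "prefix", "about"}:
            self.counts["rdfa"] += 1
        if names & {"itemprop", "itemscope", "itemtype"}:
            self.counts["microdata"] += 1

counter = MarkupCounter()
counter.feed('<div vocab="http://schema.org/" typeof="Product">'
             '<span property="name">Monitor</span></div>')
print(counter.counts)
# → {'rdfa': 2, 'microdata': 0}
```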


Here are the types of documents that we found in the sample set:

Document Type Count Percentage
HTML or XHTML 10,598,873 100%
Microformats 14,881 0.14%
RDFa 4,726 0.045%
Microdata* 0 0%
* The sample size was clearly too small since there were reports of Microdata out in the wild before this point in time in 2010.

The numbers above clearly deviate from both the Yahoo! study and the Web Data Commons project. The problem with our data set was that it was probably too small to really tell us anything useful, so please don’t use the numbers in this blog post for anything of importance.

The analysis software also counted the RDFa 1.1 attributes:

RDFa Attribute Count
property 3746
about 3671
resource 833
typeof 302
datatype 44
prefix 31
vocab 1

The property, about, resource, typeof, and datatype attributes have a usage pattern that is not very surprising. I didn’t check for combinations of attributes like property and content on the same element due to time constraints. I only had one night to figure out how to write the software, write it and run it. This sort of co-attribute detection should be included in future analysis of the data. What was surprising was that the prefix and vocab attributes were used somewhere out there before the features were introduced into RDFa 1.1, but not to the degree that it would be of concern to the people designing the RDFa 1.1 language.

The Good and the Bad

The good news is that it does not take a great deal of effort to write a data analysis tool and run it against the Common Crawl data set. I’ve published both our methodology and findings such that anybody could re-create them if they so desired. So, this is good for open Web Science initiatives everywhere.

However, there is bad news. It cost around $12.46 USD to run the test using Amazon’s Elastic Map Reduce system. The Common Crawl site states that they believe that it would cost roughly $150 to process the entire data set, but my calculations show a very different picture when you start doing non-trivial analysis. Keep in mind that 124GBs was processed of a total 40TB of data. That is, only about 0.31% of the data set was processed for $12.46. To process the entire Common Crawl corpus, it would cost around $4,020 USD. Clearly far more than what any individual would want to spend, but still very much within the reach of small companies and research groups.
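The extrapolation above is a simple linear scale-up from the trial run’s numbers; a back-of-the-envelope check:

```python
# Trial run: 124GB of a 40TB (40,000GB) corpus cost $12.46 to process.
sample_gb = 124
corpus_gb = 40_000
trial_cost_usd = 12.46

fraction = sample_gb / corpus_gb           # share of the corpus processed
full_cost_usd = trial_cost_usd / fraction  # linear extrapolation to 40TB

print(f"{fraction:.2%} of the corpus for ${trial_cost_usd}")
print(f"estimated full-corpus cost: ${full_cost_usd:,.0f}")
# → 0.31% of the corpus for $12.46
# → estimated full-corpus cost: $4,019
```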

Funding a full analysis of the entire Common Crawl dataset seemed within reach, but after discovering what the price would be, I’m having second thoughts about performing the full analysis without a few other companies or individuals pitching in to cover the costs.

Potential Ways Forward

We may have run the analysis in a way that caused the price to far exceed what was predicted by the Common Crawl folks. I will be following up with them to see if there is a trick to reducing the cost of the EC2 instances.

One option would be to bid a very low price for Amazon EC2 Spot Instances. The down-side is that processing would only happen when nobody else was willing to bid above our price, and therefore the processing job could take weeks. Another approach would be to use regular expressions to process each document instead of building an in-memory HTML DOM for it. Regular expressions would be able to detect RDFa, Microdata, and Microformats using far less CPU than the DOM-based approach. Yet another approach would be for an individual or company with $4K to spend on this research project to fund the analysis of the full data set.
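A sketch of the regex-based detection idea follows. The patterns are my own guesses at what such a scan might look for; this is coarser than the DOM approach and prone to false positives inside comments or scripts:

```python
import re

# Attribute patterns that signal each syntax in the raw, unparsed HTML.
RDFA_RE = re.compile(rb'\b(?:property|typeof|vocab|prefix|about)\s*=', re.I)
MICRODATA_RE = re.compile(rb'\bitem(?:prop|scope|type|id|ref)\b', re.I)

def classify(raw_html: bytes) -> dict:
    """Flag which structured-data syntaxes appear, without building a DOM."""
    return {
        "rdfa": bool(RDFA_RE.search(raw_html)),
        "microdata": bool(MICRODATA_RE.search(raw_html)),
    }

print(classify(b'<div itemscope itemtype="http://schema.org/Event">...</div>'))
# → {'rdfa': False, 'microdata': True}
```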

Overall, I’m excited that doing this sort of analysis is becoming available to those of us without access to Google or Facebook-like resources. It is only a matter of time before we will be able to do a full analysis on the Common Crawl data set. If you are reading this and think you can help fund this work, please leave a comment on this blog, or e-mail me directly at:

Web Data Commons Launches

Some interesting numbers were just published regarding Microformats, RDFa and Microdata adoption as of October 2010 (fifteen months ago). The source of the data is the new CommonCrawl dataset, which is being analyzed by the Web Data Commons project. They sampled 1% of the 40 Terabyte data set (1.3 million pages) and came up with the following number of total statements (triples) made by pages in the sample set:

Markup Format Statements
Microformats 30,706,071
RDFa 1,047,250
Microdata 17,890
Total 31,771,211

Based on this preliminary data, of the structured data on the Web: 96.6% of it was Microformats, 3.2% of it was RDFa, and 0.05% of it was Microdata. Microformats is the clear winner in October 2010, with the vast majority of the data consisting of markup of people (hCard) and their relationships with one another (xfn). I also did a quick calculation on percentage of the 1.3 million URLs that contain Microformats, RDFa and Microdata markup:

Format Percentage of Pages
Microformats 88.9%
RDFa 12.1%
Microdata 0.09%
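The statement shares quoted above (96.6% / 3.2% / 0.05%) fall directly out of the triple counts in the first table; a quick check, small rounding differences aside:

```python
# Statement counts from the Web Data Commons 1% sample (October 2010 crawl).
statements = {
    "Microformats": 30_706_071,
    "RDFa": 1_047_250,
    "Microdata": 17_890,
}
total = sum(statements.values())  # 31,771,211

for syntax, count in statements.items():
    print(f"{syntax}: {count / total:.2%}")
# → Microformats: 96.65%
# → RDFa: 3.30%
# → Microdata: 0.06%
```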

These findings deviate wildly from the findings by Yahoo around the same time. Additionally, the claim that 88.9% of all pages on the Web contain Microformats markup, even though I’d love to see that happen, is wishful thinking.

There are a few things that could have caused these numbers to be off. The first is that the Web Data Commons’ parsers are generating false positives or negatives, resulting in bad statement counts. A quick check of the data, which they released in full, will reveal if this is true. The other cause could be that the Yahoo study was flawed in the same way, but we may never know if that is true because they will probably never release their data set or parsers for public viewing. By looking at the RDFa usage numbers (3.2% for the Yahoo study vs. 12.1% for Web Data Commons) and the Microformats usage numbers (roughly 5% for the Yahoo study vs. 88.9% for Web Data Commons), the Web Data Commons numbers seem far more suspect. Data publishing in HTML is taking off, but it’s not that popular yet.

I would be wary of doing anything with these preliminary findings until the Web Data Commons folks release something more final. Nevertheless, it is interesting as a data point and I’m looking forward toward the full analysis that these researchers do in the coming months.

Web Payments: PaySwarm vs. OpenTransact Shootout (Part 3)

This is a continuing series of blog posts analyzing the differences between PaySwarm and OpenTransact. The originating blog post and subsequent discussion is shown below:

  1. Web Payments: PaySwarm vs. OpenTransact Shootout by Manu Sporny
  2. OpenTransact the payment standard where everything is out of scope by Pelle Braendgaard
  3. Web Payments: PaySwarm vs. OpenTransact Shootout (Part 2) by Manu Sporny
  4. OpenTransact vs PaySwarm part 2 – yes it’s still mostly out of scope by Pelle Braendgaard

This blog post addresses the last of Pelle’s posts. All of the general points made in the previous analysis still hold, so familiarizing yourself with them before continuing will give you some context. In summary:

TL;DR – The OpenTransact standard does not specify the minimum necessary algorithms and processes required to implement an interoperable, open payment network. It, accidentally, does the opposite – further enforcing silo-ed payment networks, which is exactly what PaySwarm is attempting to prevent.

You may jump to each section of this blog post:

  1. Why OpenTransact Fails To Standardize Web Payments
  2. General Misconceptions (continued)
  3. Detailed Rebuttal (continued)
  4. Conclusion

Why OpenTransact Fails to Standardize Web Payments

After analyzing OpenTransact over the past few weeks, the major issue with the technology has become very clear. The Web Payments work is about writing a world standard. The purpose of a standard is to formalize, in explicit detail, the data formats and protocols that allow any two pieces of software to interoperate. If any two pieces of software implement the standard, it is known that they will be able to communicate and carry out any of the actions defined in the standard. The OpenTransact specification does not achieve this most fundamental goal of a standard. It does not specify how any two payment processors may interoperate; instead, it is a document that suggests one possible way for a single payment processor to implement its Web API.

Here is why this is a problem: When a vendor lists an OpenTransact link on their website, and a customer clicks on that link, the customer is taken to the vendor’s OpenTransact payment processor. If the customer does not have an account on that payment processor, they must get an account, verify their information, put money in the account, and go through all of the hoops required to get an account on that payment provider. In other words, OpenTransact changes absolutely nothing about how payment is performed online today.

For example, if you go to a vendor and they have a PayPal button on their site, you have to go to PayPal and get an account there in order to pay the vendor. If they have an Amazon Payments button instead, you have to go to Amazon and get an account there in order to pay the vendor. Even worse, OpenTransact doesn’t specify how individuals are identified on the network. One OpenTransact provider could use e-mail addresses for identification, while another one might use Facebook accounts or Twitter handles. There is no interoperability because these problems are considered out of scope for the OpenTransact standard.

PaySwarm, on the other hand, defines exactly how payment processors interoperate and identities are used. A customer may choose their payment processor independently of the vendor, and the vendor may choose their payment processor independently of the customer. The PaySwarm specification also details how a vendor can list items for sale in an interoperable way such that any transaction processor may process a sale of the item. PaySwarm enables choice in a payment processor, OpenTransact does not.

OpenTransact continues to lock in customers and merchants into a particular payment processor. It requires that they both choose the same one if they are to exchange payment. While Pelle has asserted that this is antithetical to OpenTransact, the specification fails to detail how a customer and a merchant could use two different payment processors to perform a purchase. Leaving something as crucial as sending payment from one payment processor to the next as unspecified will only mean that many payment processors will implement mechanisms that are non-interoperable across all payment processors. Given this scenario, it doesn’t really matter what the API is for the payment processor as everyone has to be using the same system anyway.

Therefore, the argument that OpenTransact can be used as a basic building block for online commerce is fatally flawed. The only thing that you can build on top of OpenTransact is a proprietary walled garden of payments, an ivory tower of finance. This is exactly what payment processors do today, and will do with OpenTransact. It is in their best interest to create closed financial networks as it strips market power away from the vendor and the customer and places it into their ivory tower.

Keep this non-interoperability point in mind when you see an “out of scope” argument on behalf of OpenTransact – there are some things that can be out of scope, but not at the expense of choice and interoperability.

General Misconceptions (continued)

There are a number of misconceptions that Pelle’s latest post continues to hold regarding PaySwarm that demonstrate a misunderstanding of the purpose of the specification. These general misconceptions are addressed below, followed by a detailed analysis of the rest of Pelle’s points.

“PaySwarm is a fully featured idealistic multi layered approach where you must buy into a whole different way running your business.”

The statement is hyperbolic – no payment technology requires you to “buy into a whole different way of running your business”. Vendors on the Web list items for sale and accept payment for those items in a number of different ways. This is usually accomplished by using shopping cart software that supports a variety of different payment mechanisms – eCheck, credit card, PayPal, Google Checkout, etc. PaySwarm would be one more option that a vendor could employ to receive payment.

PaySwarm standardizes an open, interoperable way that items are listed for sale on the Web, the protocol that is used to perform a transaction on the Web, and how transaction processors may interoperate with one another.

PaySwarm is a pragmatic approach that provides individuals and businesses with a set of tools to make Web-based commerce easier for their customers and thus provides a competitive advantage for those businesses that choose to adopt it. Businesses don’t need to turn off their banking and credit card processing services to use PaySwarm – it would be foolish for any standard to take that route.

PaySwarm doesn’t force any sort of outright replacement of what businesses and vendors do today; it is something that can be phased in gradually. Additionally, it provides built-in functionality that you cannot accomplish via traditional banking and credit card services: micro-payments, crowd-funding, a simple path to browser integration, digital receipts, and a variety of innovative new business models for those willing to adopt them. That is, individuals and businesses will adopt PaySwarm because: 1) it provides a competitive advantage, 2) it allows new forms of economic value exchange to happen on the Web, 3) it is designed to fight vendor lock-in, and 4) it thoroughly details how to achieve interoperability as an open standard.

It is useless to call a technology idealistic, as every important technology starts from idealism and then gets whittled down into a practical form – PaySwarm is no different. The proposal for the Web was idealistic at the time, it was a multi-layered approach, and it does require a whole different way of running a Web-based business (only because Web-based businesses did not exist before the Web). It’s clear today that all of those adjectives (“idealistic”, “multi-layered”, and “different”) were some of the reasons that the Web succeeded, even if none of those words apply to PaySwarm in the negative way that is asserted in Pelle’s blog post.

However the basic PaySwarm philosophy of wanting to design a whole world view is very similar to central planning or large standards bodies like ANSI, IEEE etc. OpenTransact follows the market based approach that the internet was based on of small standards that do one thing well.

As stated previously, PaySwarm is limited in scope by the use cases that have been identified as being important to solve. It is important to understand the scope of the problem before attempting a solution. OpenTransact fails to grasp the scope of the problem and thus falls short of providing a specification that defines how interoperability is achieved.

Furthermore, it is erroneous to assert that the Internet was built using a market-based approach and small standards. The IEEE, ANSI, and even government had a very big part to play in the development of the Internet and the Web as we know it today.

Here are just a few of the technologies that we enjoy today because of the IEEE: Ethernet, WiFi, Mobile phones, Mobile Broadband, POSIX, and VHDL. Here are the technologies that we enjoy today because of ANSI: The standard for encoding most of the letters on your screen right now (ASCII and UTF-8), the C programming language standard, and countless safety specifications covering everything from making sure that commercial airliners are inspected properly, to hazardous waste disposal guidelines that take human health into account, to a uniform set of colors for warning and danger signs in the workplace.

The Internet wouldn’t exist in the form that we enjoy today without these IEEE and ANSI standards: Ethernet, ASCII, the C programming language, and many of the Link Layer technologies developed by the IEEE on which the foundation of the Internet was built. It is incorrect to assume that the Internet followed purely market-based forces and small standards. Let’s not forget that the Internet was a centrally planned, government funded (DARPA) project.

The point is that technologies are developed and come into existence through a variety of channels. There is not one overriding philosophy that is correct in every instance. The development of some technologies require one to move fast in the market, some require thoughtful planning and oversight, and some require a mixture of both. What is important in the end is that the technology works, is well thought out, and achieves the use cases it set out to achieve.

There are many paths to a standard. What is truly important in the end is that the technology works in an interoperable fashion, and in that vein, the assertion that OpenTransact does not meet the basic interoperability requirements of an open Web standard has still not been addressed.

Detailed Rebuttal (continued)

In the responses below, Pelle’s comment on his latest blog post is quoted and the rebuttal follows below each section of quoted text. Pay particular attention to how most of the responses are effectively that the “feature is out of scope”, but no solution is forthcoming to the problem that the feature is designed to address. That is, the problem is just kicked down the road for OpenTransact, where the PaySwarm specification makes a concerted effort to address each problem via the feature under discussion.

Extensible Machine Readable Metadata

Again this falls completely out of the scope. An extension could easily be done using JSON-LD as JSON-LD is simply an extension to JSON. I don’t think it would help the standard by specifying how extensions should be done at this point. I think JSON-LD is a great initiative and it may well be that which becomes an extension format. But there are also other simpler extensions that might better be called conventions that probably do not need the complication of JSON-LD. Such as Lat/Lng which has become a standard geo location convention in many different applications.

The need for extensible machine-readable metadata was explained previously. Addressing this problem is a requirement for PaySwarm because without it you have a largely inflexible messaging format. Pelle mentions that the extensibility issue could be addressed using JSON-LD, which is what PaySwarm does, but does not provide any concrete plans to do this for OpenTransact. That is, the question is left unaddressed in OpenTransact and thus the extensibility and interoperability issue remains.

When writing standards, one cannot assert that a solution “could easily be done”. Payment standards are never easy and hand waving technical issues away is not the same thing as addressing those technical issues. If the solution is easy, then surely something could be written on the topic on the OpenTransact website.
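To make the extensibility point concrete, here is a minimal sketch of how a JSON-LD context lets a payment message carry extension terms without changing the core message format. The context URLs and term names below are hypothetical, not drawn from any published PaySwarm or OpenTransact vocabulary:

```python
import json

# A listing using hypothetical vocabulary terms. The "@context" maps
# short prefixes to globally unique identifiers, so a processor that
# does not understand an extension can still safely ignore it.
listing = {
    "@context": {
        "ps": "https://example.com/vocab/payswarm#",   # hypothetical core vocab
        "geo": "https://example.com/vocab/geo#"        # hypothetical extension
    },
    "ps:asset": "https://vendor.example/songs/42",
    "ps:amount": "0.99",
    "ps:currency": "USD",
    # Extension terms added without touching the core specification,
    # e.g. the lat/lng convention mentioned in the quote above:
    "geo:lat": 37.23,
    "geo:lng": -80.42
}

print(json.dumps(listing, indent=2))
```

The point is not the specific terms but the mechanism: because every property is grounded in a context, independent parties can extend the message format without colliding or coordinating in advance.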

Transactions (part 1)

I don’t like the term transaction as Manu is using it here. I believe it is being used here using computer science terminology. But leaving that aside. OpenTransact does not support multi step transactions in itself right now. I think most of these can be easily implemented in the Application Layer and thus is out of scope of OpenTransact.

The term transaction is being used in the traditional English sense. The Merriam-Webster Dictionary defines a transaction as: something transacted; especially: an exchange or transfer of goods, services, or funds (electronic transactions). Wikipedia defines a transaction as: an agreement, communication, or movement carried out between separate entities or objects, often involving the exchange of items of value, such as information, goods, services, and money. Further, a financial transaction is defined as: an event or condition under the contract between a buyer and a seller to exchange an asset for payment. It involves a change in the status of the finances of two or more businesses or individuals. This demonstrates that the use of “transaction” in PaySwarm is in line with its accepted English meaning.

The argument that multi-step transactions can be easily implemented is put forward again. This is technical hand-waving. If the solution is so simple, then it should take no more than a single blog post to outline how a multi-step transaction happens between a decentralized set of transaction processors. The truth of the matter is that multi-step transactions are a technically challenging problem to solve in a decentralized manner. Pushing the problem up to the application layer just pushes it off to someone else rather than solving it in the standard, forcing application developers to create their own home-brew multi-step transaction mechanisms.

Transactions (part 2)

I could see a bulk payment extension supporting something similar in the future. If the need comes up lets deal with [it].

Here are a few reasons why PaySwarm supports multiple financial transfers to multiple financial accounts as a part of a single transaction: 1) it makes the application layer simpler, and thus the developer’s life easier, 2) ensuring that all financial transfers made it to their destination prevents race conditions where some people get paid and some people do not (read: you could be sued for non-payment), and 3) for a transfer where money is disbursed to 100 people, doing it in one HTTP request is faster and more efficient than doing it in 100 separate requests. The need for multiple financial transfers in a single transaction is already there. For example, paying taxes on items sold is a common practice; in this case, the transaction is split between at least two entities: the vendor and the taxing authority.

OpenTransact does not address the problem of performing multiple financial transfers in a single transaction and thus pushes the problem on to the application developer, who must then know quite a bit about financial systems in order to create a valid solution. If the application developer makes a design mistake, which is fairly easy to do when dealing with decentralized financial systems, they could place their entire company at great financial risk.
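As an illustration of the vendor-plus-taxing-authority case, here is a minimal sketch of a single transaction carrying multiple transfers that a processor could settle atomically, all or none. The message shape and account URLs are hypothetical, not taken from the PaySwarm specification:

```python
from decimal import Decimal

def build_transaction(total, tax_rate, vendor_account, tax_account):
    """Split one payment into multiple transfers inside one transaction.
    A processor settling this atomically avoids the race condition where
    the vendor is paid but the taxing authority is not."""
    total = Decimal(total)
    tax = (total * Decimal(tax_rate)).quantize(Decimal("0.01"))
    return {
        "currency": "USD",
        "transfers": [
            {"destination": vendor_account, "amount": str(total - tax)},
            {"destination": tax_account, "amount": str(tax)},
        ],
    }

# Hypothetical account identifiers; one HTTP request carries both transfers.
txn = build_transaction("10.00", "0.05",
                        "https://bank.example/accounts/vendor",
                        "https://tax.example/accounts/state")
print(txn)
```

Without this in the standard, every application developer must reinvent the split (and its failure handling) on top of single-transfer primitives, which is exactly the risk described above.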

Currency Exchange

…most of us have come to the conclusion that we may be able to get away with just using plain open transact for this.

While the people working on OpenTransact may have come to this conclusion, there is absolutely no specification text outlining how to accomplish the task of performing a currency exchange. The analysis was on features that are supported by each specification and the OpenTransact specification still does not intend to provide any specification text on how a currency exchange could be implemented. Saying that a solution exists, but then not elaborating upon the solution in the specification in an interoperable way is not good standards-making. It does not address the problem.

Decentralized Publishing of X (part 1)

These features listed are necessary if you subscribe to the world view that the entire worlds commerce needs to be squeezed into a web startup.

I don’t quite understand what Pelle is saying here, so I’m assuming this interpretation: “The features listed are necessary if you subscribe to the world view that all of the world’s commerce needs have to be squeezed into a payment standard.”

This is not the world-view that PaySwarm assumes. As stated previously, PaySwarm assumes a limited set of use cases that were identified by the Web Payments community as being important. Decentralization is important to PaySwarm because it ensures: 1) that the system is resistant to failure, 2) that the customer is treated fairly due to very low transaction processor switching costs, and 3) that market forces act quickly on the businesses providing PaySwarm services.

OpenTransact avoids the question of how to address these issues and instead, accidentally, further enforces silo-ed payment networks and walled gardens of finance.

Decentralized Publishing of X (part 2)

I think [decentralized publishing] would make a great standard in it’s own right that could be published separately from the payment standard. Maybe call it CommerceSwarm or something like that.

There is nothing preventing PaySwarm from splitting the publishing of assets and listings out of the main specification once we have addressed the limited set of use cases put forth by the Web Payments community. As stated previously, the PaySwarm specification can always be broken down into simpler, modularized specifications. This is an editorial issue, not a design issue.

The concern about the OpenTransact specification is not an editorial issue, it is a design issue. OpenTransact does not specify how multiple transaction processors interoperate nor does it describe how one publishes assets, listings and other information associated with the payment network on the Web. Thus, OpenTransact, accidentally, supports silo-ed payment networks and walled gardens of finance.

Decentralized Publishing of X (part 3)

If supporting these are a requirement for an open payment standard, I think it will be very hard for any existing payment providers or e-commerce suppliers to support it as it requires a complete change in their business, where OpenTransact provides a fairly simple easy implementable payment as it’s only requirement.

This argument is spurious for at least two reasons.

The first is that OpenTransact only has one requirement, and thus all a business would have to implement is that one requirement. Alternatively, if businesses only want to implement simple financial transfers in PaySwarm (roughly equivalent to transactions in OpenTransact), they need only do that. Therefore, PaySwarm can be as simple as OpenTransact for the vast majority of businesses that only require simple financial transfers. However, if more advanced features are required, PaySwarm can support those as well.

The second reason is that it is effectively the buggy whip argument – if you were to ask businesses that depended on horses to transport their goods before the invention of the cargo truck, most would recoil at the thought of having to replace their investment in horses with a new investment in trucks. However, new businesses would choose the truck because of its many advantages. Some would use a mixture of horses and trucks until the migration to the better technology was complete. The same applies to both PaySwarm and OpenTransact – the only thing that is going to cause individuals and businesses to switch is that the technology provides a competitive advantage to them. The switching costs for new businesses are going to be less than the switching costs for old businesses with a pre-existing payment infrastructure.

Verifiable Receipts (part 1)

However I don’t want us to stall the development and implementation of OpenTransact by inventing a new form of PKI or battling out which of the existing PKI methods we should use. See my section on Digital Signatures in the last post.

A new form of PKI has not been invented for PaySwarm. It uses the industry standards for encryption and digital signatures – AES and RSA. The PKI methods are clearly laid out in the specification and have been settled for quite a while; not a single person has mentioned that they want to use a different set of PKI methods or implementations, nor has anyone raised any technical issues related to the PKI portion of the specification.

Pelle might be referring to how PaySwarm specifies how to register public keys on the Web, but if he is, there is very little difference between that and having to manage OAuth 2 tokens, which is a requirement imposed on developers by the OpenTransact specification.
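For readers unfamiliar with how the signature side works in practice, here is a minimal sketch of signing and verifying a digital receipt with RSA, using the third-party Python `cryptography` package. The receipt contents are made up, and the Web-based public key registration that PaySwarm describes is simulated away by keeping both keys in one process:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Generate a keypair. In PaySwarm the public key would be published at
# a URL on the Web so anyone can fetch it to verify signatures.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# A made-up receipt; in practice this would be the canonicalized message.
receipt = b'{"asset": "https://vendor.example/songs/42", "amount": "0.99"}'

# Sign with the private key, then verify with the public key. verify()
# raises InvalidSignature if the receipt bytes were tampered with.
signature = private_key.sign(receipt, padding.PKCS1v15(), hashes.SHA256())
private_key.public_key().verify(signature, receipt,
                                padding.PKCS1v15(), hashes.SHA256())
print("receipt signature verified")
```

Nothing here is exotic: it is the same RSA signing primitive used across the industry, which is the point being made against the “new form of PKI” characterization.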

Verifiable Receipts (part 2)

Thus we have taken the pragmatic approach of letting businesses do what they are already doing now. Sending an email and providing a transaction record via their web site.

PaySwarm does not prevent businesses from doing what they do now. Sending an e-mail and providing a transaction record via their website are still possible using PaySwarm. However, these features become increasingly unnecessary since PaySwarm has a digital receipt mechanism built into the standard. That is, businesses no longer need to send an e-mail or have a transaction record via their website because PaySwarm transaction processors are responsible for holding on to this information on behalf of the customer. This means far less development and financial management headaches for website operators.

Additionally, neither e-mail nor proprietary receipts support Data Portability or system interoperability. That is, these are not standard, machine-readable mechanisms for information exchange. More to the point, OpenTransact is kicking the problem down the road instead of attempting to address the problem of machine-verifiable receipts.
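To illustrate what “machine-readable” buys you, here is a hypothetical sketch of a structured digital receipt. The field names are illustrative only; the actual PaySwarm receipt vocabulary differs:

```python
import json

# A hypothetical digital receipt. Because it is structured data rather
# than prose in an e-mail, any conforming software can verify it, store
# it, or export it to another transaction processor (Data Portability).
receipt = {
    "@context": "https://example.com/contexts/payswarm",  # hypothetical
    "type": "Receipt",
    "asset": "https://vendor.example/articles/intro",
    "license": "https://vendor.example/licenses/personal-use",
    "amount": "0.05",
    "currency": "USD",
    "payee": "https://processor.example/accounts/vendor",
    "date": "2012-01-15T10:30:00Z",
}

print(json.dumps(receipt, indent=2))
```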

Secure X-routed Purchases

These are neat applications that could be performed in some way through an application. You know I’m going to say it’s out of scope of OpenTransact. OpenTransact was designed as a simple way of performing payments over the web. Off line standards are thus out of scope.

The phrase “performed in some way through an application” is technical hand-waving. OpenTransact does not propose any sort of technical solution to a use case that has been identified by the Web Payments community as being important. Purchasing an item using an NFC-enabled mobile phone at a Web-enabled kiosk is not a far-fetched use case – many of these kiosks exist today and more will become Web-enabled over time. That is, if only one device has Web connectivity, is the standard extensible enough to allow a transaction to occur?

With PaySwarm, the answer is “yes” and we will detail exactly how to accomplish this in a PaySwarm specification. Note that it will probably not be in the main PaySwarm specification, but an application developer specification that thoroughly documents how to perform a purchase through a PaySwarm proxy.

Currency Mints

Besides BitCoin all modern alternative currencies have the mint and the transaction processor as the same entity.

These are but a few of the modern alternative currencies where the mint and the transaction processor are not the same entity (the year that the currency was launched is listed beside the currency): BerkShares (2006), Calgary Dollar (1996), Ithaca Hours (1998), Liberty Dollar (1998-2009), and a variety of LETS and InterLETS systems (as recently as 2011).

OpenTransact assumes that the mint and the transaction processor are the same entity, but as demonstrated above, this is not the case in already successful alternative currencies. The alternative currencies above, where the mint and the transaction processor are different, should be supported by a payment system that purports to support alternative currencies. Making the assumption that the mint and the transaction processor are one and the same ignores a large part of the existing alternative currency market. It also does not protect against monopolistic behavior on behalf of the mint. That is, if a mint handles all minting and transaction processing, processing fees are at the whim of the mint, not the market. Conflating a currency mint with a transaction processor results in negative market effects – a separation of concerns is a necessity in this case.

Crowd Funding

Saying that you can not do crowd funding with OpenTransact is like saying you can’t do Crowd Funding with http. Obviously KickStarter and many others are doing so and yes you can do so with OpenTransact as a lower level building block.

The coverage of the Crowd Funding feature was never about whether OpenTransact could be used to perform Crowd Funding, but rather how one could perform Crowd Funding with OpenTransact and whether that would be standardized. The answer to each question is still “Out of Scope” and “No”.

Quite obviously there are thousands of ways technology can be combined with value exchange mechanisms to support crowd funding. The assertion was that OpenTransact does not provide any insight into how it would be accomplished and furthermore, contains a number of design issues that would make it very inefficient and difficult to implement Crowd Funding, as described in the initial analysis, on top of the OpenTransact platform.

Data Portability

We are very aware of concerns of vendor lock in, but as OpenTransact is a much simpler lower level standard only concerned with payments, data portability is again outside the scope. We do want to encourage work in this area.

PaySwarm adopts the philosophy that data portability and vendor lock-in are important concerns and must be addressed by a payment standard. Personal financial data belongs to those transacting, not to the payment processors. Ultimately, solutions that empower people become widely adopted.

OpenTransact, while encouraging work in the area, adopts no such philosophy for Data Portability as evidenced in the specification.

Continuing the Discussion

In doing this analysis between PaySwarm and OpenTransact, a few things have come to light that we did not know before:

  1. There are some basic philosophies that are shared between PaySwarm and OpenTransact, but there are many others that are not. Most fundamentally, PaySwarm attempts to think about the problem broadly, whereas OpenTransact only attempts to think about one aspect of the Web payments problem.
  2. There are a number of security concerns that were raised when performing the review of the OpenTransact specification, more of which will be detailed in a follow-up blog post.
  3. There were a number of design concerns that we found in OpenTransact. One of the most glaring issues is something that was an issue with PaySwarm in its early days, until the design error was fixed. In the case that OpenTransact adopts digital receipts, excessive HTTP traffic and the duplication of functionality between digital signatures and OAuth 2 will become a problem.
  4. While we assumed that Data Portability was important to the OpenTransact specification, it was a surprise that there were no plans to address the issue at all.
  5. There was an assumption that the OpenTransact specification would eventually detail how transaction processors may interoperate with one another, but Pelle has made it clear that there are no current plans to detail interoperability requirements.

In order for the OpenTransact specification to continue along the standards track, it should be demonstrated that the design concerns, security concerns, and interoperability concerns have been addressed. Additionally, the case should be made for why the Web Payments community should accept that the list of features not supported by OpenTransact is acceptable from the standpoint of a world standards setting organization. These are all open questions and concerns that OpenTransact will eventually have to answer as a part of the standardization process.

* Many thanks to Dave Longley, who reviewed this post and suggested a number of very helpful changes.

Web Payments: PaySwarm vs. OpenTransact Shootout (Part 2)

The Web Payments Community group is currently evaluating two designs for an open payment platform for the Web. A thorough analysis of PaySwarm and OpenTransact was performed a few weeks ago, followed by a partial response by one of the leads behind the OpenTransact work. This blog post will analyze the response by the OpenTransact folks, offer corrections to many of the claims made in the response, and further elaborate on why PaySwarm actually solves the hard problem of creating a standard for an interoperable, open payment platform for the Web.

TL;DR – The OpenTransact standard does not specify the minimum necessary algorithms and processes required to implement an interoperable, open payment network. It, accidentally, does the opposite – further enforcing silo-ed payment networks, which is exactly what PaySwarm is attempting to prevent.

You can jump to each section below:

  1. The Purpose of a Standard
  2. Web Payments – The Hard Problem
  3. The Problem Space
  4. General Misconceptions
  5. Detailed Rebuttal
  6. Continuing the Discussion

The Purpose of a Standard

Ultimately, the purpose of a standard is to propose a solution to a problem that ensures interoperability among implementations of that standard. Furthermore, standards that establish a network of systems, like the Web, must detail how interoperability functions among the various systems in the network. This is the golden rule of standards – if you don’t detail how interoperability is accomplished, you don’t have a standard.

In Pelle’s blog post, he states:

OpenTransact [is] the payment standard where everything is out of scope

This is the major issue with OpenTransact. By declaring that just about everything is out of scope for OpenTransact, it fails to detail how systems on the payment network communicate with one another and thus does not support the golden rule of standards – interoperability. This is the point that I will be hammering home in this blog post, so keep it in mind while reading the rest of this article.

What OpenTransact does is outline “library interoperability”. The specification enables developers to write one software library that can be used to initiate monetary transfers by building OpenTransact URLs, but it does not specify what happens when you go to the URL. It does not specify how money gets from one system to the next, nor does it specify how those messages are created and passed from system to system. OpenTransact oversimplifies the problem and proposes a solution that is insufficient for use as a global payment standard.
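To see what “library interoperability” looks like in practice, here is a minimal sketch of building a payment-initiation URL. The parameter names are assumed to match the OpenTransact draft, and the asset endpoint is made up; note that nothing in the URL says how funds actually move once the request arrives:

```python
from urllib.parse import urlencode

# A client library can construct a payment URL against any asset
# endpoint. The endpoint below is hypothetical.
asset_url = "https://provider.example/usd"
params = {"to": "alice@example.com", "amount": "10.00", "note": "Invoice 42"}
payment_url = asset_url + "?" + urlencode(params)

print(payment_url)
# What the receiving provider does with this request, and how it would
# settle with a different provider, is left unspecified by the draft.
```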

In short, it does not solve the hard problem of creating an open payment platform.

Web Payments – The Hard Problem

Overall, the general argument put forward by Pelle on behalf of the OpenTransact specification is that it focuses on merely initiating a monetary transfer and nothing else because that is the fundamental building block for every other economic activity. His argument is that we should standardize the most basic aspect of transferring value and leave the other stuff out until the basic OpenTransact specification gains traction.

The problem with this line of reasoning is this: When you don’t plan ahead, you run the very high risk of creating a solution that works for the simple use cases, but is not capable of addressing the real problems of creating an interoperable, open payment platform. PaySwarm acknowledges that we need to plan ahead if we are to create a standard that can be applied to a variety of use cases. This does not mean that every use case must be addressed. Rather, the assertion is made that designing solutions that solve more than just the initiation of a simple monetary transfer is important because the world of commerce consists of much more than the initiation of simple monetary transfers.

Clearly, we should not implement solutions to every use case, but rather figure out the maximum number of use cases that can be solved by a minimal design. “Don’t bloat the specification” is often repeated as guidance throughout the standardization process. Where to draw the line on spec bloat is one of the primary topics of conversation in standards groups. It should be an ongoing discussion within the community, not a hard-line philosophical stance.

The hard problem has always been interoperability, and the OpenTransact specification postpones addressing that issue to a later point in time. The point of a standard is to establish interoperability such that anyone can read the standard, implement it, and be guaranteed interoperability with others that have implemented the standard. From Pelle’s response:

We don’t specify how one payment provider transacts with another payment provider, but it is generally understood that they do so with OpenTransact.


An exchange system between multiple financial institutions can be achieved by many different means as they are today. But all of these methods are implementation details and the developer or end user does not need to understand what is going on inside the black box.

A specification elaborates on the implementation details so that you can guarantee interoperability among those that implement the standard. Implementation details are important because without those, you do not have interoperability and without interoperability, you do not have an open payment platform. Without interoperability, you have the state of online payment providers today – payment vendor lock-in.

General Misconceptions

There are a number of general misconceptions that are expressed in Pelle’s response that need to be corrected before addressing the rest of his feedback:

PaySwarm attempts to solve every single problem up front and thus creates a standard that is very smart in many ways but also very complex.

What PaySwarm attempts to do is identify real-world use cases that exist today for online payments and propose a way to address those use cases. There are a number of use cases that the community postponed because we didn’t feel that addressing them now was reasonable. There were also use cases that we dropped entirely because we didn’t see a need to support those use cases now or in the future. To say that “PaySwarm attempts to solve every single problem up front” is hyperbolic. It is true that PaySwarm is more complex than OpenTransact today, but that’s because it attempts to address a much larger set of real-world use cases.

It’s background is I understand in a P2P media market place called Bitmunk where licenses, distribution contacts and other media DRM issues are considered important.

PaySwarm did start out as a platform to enable peer-to-peer media marketplace transactions. That was in 2004. The technology and specification have evolved considerably since that time. For example, mandatory DRM was never implemented, but watermarking was – both technologies have been dropped from the specification due to the needless complexity introduced by supporting those features. There was never the concept of a “distribution contract”, but digital contracts – outlining exactly what was purchased, the licenses associated with that purchase, and the costs associated with the transaction – seem like reasonable things to support in an open payment platform.

Manu Sporny of Digital Bazaar has also been a chair of the RDFa working group so PaySwarm comes with a lot of linked data luggage as well.

I’m also the Chair of the RDF Web Applications Working Group and the JSON-LD Community Group, am a member of the HTML Working Group, founded the Data-Driven Standards Community Group, and am a member of the Semantic Web Coordination Group. Based on those qualifications, I would like to think that I know my way around Web standards and Linked Data – others may disagree :). While I don’t know if Pelle meant “luggage” in a negative sense, if he did, one must ask what the alternative is? If we are going to create an open payment platform that is interoperable and decentralized like the Web, then what alternative is there to Linked Data?

Many people do not know that we started working with RDFa and JSON-LD because we needed a viable solution to the machine-readable decentralized listing of things for sale problem in PaySwarm. That is, we didn’t get involved with Linked Data first and then carried that work into PaySwarm. We started out with PaySwarm and needed Linked Data to solve the machine-readable decentralized listing of things for sale problem.

OpenTransact comes from the philosophy that we don’t solve a problem until the problem exists and several people have real experiences solving it.

This is a perfectly reasonable philosophy to employ. In fact, PaySwarm adheres to the same philosophy. PaySwarm’s implementation of the philosophy diverges from OpenTransact because it takes more real-world problems into account. Online e-commerce has existed for over a decade now, with a fairly rich history of lessons-learned with regard to how the Web has been used for commerce. This history includes many more types of transactions than just a simple monetary transfer and therefore, PaySwarm attempts to take these other types of transactions into account during the design process.

The Problem Space

In his response, Pelle outlines a number of lessons learned from OpenID and OAuth development. These are all good lessons and we should make sure that we do not fall into the same trap that OpenID did in the beginning – attempting to solve too many problems, too soon in the process.

Pelle implies that PaySwarm falls into this trap and that OpenTransact avoids the trap by being very focused on just initiating payment transfers. The reasoning is spurious as the world is composed of many more types of value exchange than just a simple payment initiation. The main design failure of OpenTransact is to not attempt to detail how the standard applies to the real-world online payment use cases established over the past decade.

It is not that PaySwarm attempts to address too many use cases too soon, but rather that OpenTransact attempts to do too little, and by being hyper-focused, does not solve the problem of creating an open payment platform that is applicable to the state of online commerce today.

Detailed Rebuttal

The following section provides detailed responses to a number of points that are made in Pelle’s blog post:

IRIs for Identifiers

I’m sorry calling URI’s IRI just smells of political correctness. Everyone calls them URI’s and knows what it means. No one knows what a IRI is. Even though W3C pushes it I’m going to use the term URI to avoid confusion.

Wikipedia defines the Internationalized Resource Identifier (IRI) as: a generalization of the Uniform Resource Identifier (URI). While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set (Unicode/ISO 10646), including Chinese or Japanese kanji, Korean, Cyrillic characters, and so forth. It is defined by RFC 3987.

PaySwarm is on a world-standards track and thus takes the position that being able to express identifiers in one’s native language is important. When writing standards, it is important to be technically specific and use terminology that has been previously defined by standards groups. Usage of the term IRI is not only technically correct, it acknowledges the notion that we must support non-English identifiers in a payment standard meant for the world to use.

IRIs for Identifiers (cont.)

We don’t want to specify what lives at the end of an account URI. There are many other proposals for standardizing it, we don’t need to deal with that. Until the day that a universal machine readable account URI standard exist, implementers of OpenTransact can either do some sensing of the URI as they already do today (Twitter, Facebook, Github) or use proposals like WebFinger or even enter the world of linked data and use that.

The problem with the argument is expressed in this phrase – Until the day that a universal machine readable account URI standard exist[s]. PaySwarm defines a universal, machine-readable account URI standard. This mechanism is important for interoperability – without it, it becomes difficult to publish information in a decentralized, machine-readable fashion. Without describing what lives at the end of an account IRI, you can’t figure out who owns a financial account, you can’t understand what the currency of the account is, nor can you extend the information associated with the account in an interoperable way. PaySwarm asserts that we cannot just gloss over this part of the problem space as it is important for interoperability.
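To make this concrete, here is a sketch of the kind of machine-readable document that could live at the end of an account IRI. The field names and URLs are hypothetical illustrations, not taken from the PaySwarm specification:

```python
import json

# Hypothetical machine-readable document that might be returned when
# dereferencing an account IRI such as
# https://payments.example.com/accounts/alice.
# Field names are illustrative, not taken verbatim from the spec.
account_doc = json.dumps({
    "id": "https://payments.example.com/accounts/alice",
    "owner": "https://example.com/people/alice",
    "currency": "USD",
    "label": "Alice's primary account",
})

account = json.loads(account_doc)

# Because the document is machine-readable, software can answer the
# questions raised above without human intervention:
print(account["owner"])     # who owns the account
print(account["currency"])  # what currency the account uses
```

With a document like this at the end of every account IRI, software on any payment processor can discover an account's owner and currency without any out-of-band agreement.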

Basic Financial Transfer

OpenTransact does not specify how a transfer physically happens as that is an implementation detail. It could be creating a record in a ledger, uploading a text file to a mainframe via ftp, calling multiple back end systems, generating a bitcoin, shipping a gold coin by fedex, etc.

At no point does the PaySwarm specification state what must happen physically. What happens physically is outside of the scope of the specification. What matters is how the digital exchange happens. This is primarily because any open payment platform for the Web is digitally native. That is, when you pay someone, the transfer is executed and recorded digitally, at times, between two payment processors. This is the same sort of procedure that happens at banks today. Rarely does physical cash move when you use your credit or debit card.

The point of supporting a Basic Financial Transfer between systems boils down to interoperability. OpenTransact doesn’t mention how you transfer $1 from PaymentServiceA to PaymentServiceB. That is, if you hold an account on PaymentServiceA and you want to send $1 to an account on PaymentServiceB, how do you initiate the transfer from one service to the other? What is the protocol? OpenTransact is silent on how this cross-payment-processor transfer happens. PaySwarm asserts that specifying this payment processor monetary exchange protocol in great detail is vital to ensure that the standard enables a fair and efficient transaction processor marketplace. That is, specifying how this works is vital for ensuring that new payment processor competitors can enter the marketplace with as little friction as possible. If a payment standard does not specify how this works, it enables vendor lock-in and payment network silos.
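As a sketch of the gap being described, this is the kind of cross-processor transfer request that a standard would have to pin down. The payload and endpoint below are entirely hypothetical; neither specification defines them:

```python
import json

# A minimal sketch of an inter-processor transfer request. The field
# names are hypothetical; neither specification defines this payload,
# which is exactly the gap being pointed out above.
transfer_request = {
    "type": "Transfer",
    "source": "https://payment-service-a.example/accounts/alice",
    "destination": "https://payment-service-b.example/accounts/bob",
    "amount": "1.00",
    "currency": "USD",
}

# PaymentServiceA would POST this document to an agreed-upon endpoint
# on PaymentServiceB. Because both sides implement the same protocol,
# PaymentServiceB can parse the request and settle the transfer.
body = json.dumps(transfer_request)
parsed = json.loads(body)
```

Without a standardized payload and endpoint of this kind, each pair of processors must negotiate a private integration, which is how payment network silos form.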

When it comes to standards, implementation details like this matter because without explicitly stating how two systems may exchange money with one another, interoperability suffers.

Transacted Item Identifier

It would be great to have a standard way of specifying every single item in this world and that is pretty much what RDF is about. However until such day that every single object in the world is specified by RDF, we believe it is best to just identify the purchased item with a url.

This argument seems to be saying two contradictory things; 1) It would be great to have a standard way of describing machine-readable items on the Web and 2) until that happens, we should just use URLs.

PaySwarm defines exactly how to express machine-readable items on the Web. Since the first part of the statement is true today, the last part of the statement becomes unnecessary. Furthermore, both OpenTransact and PaySwarm use IRIs for transacted item identifiers – that was never in question. OpenTransact uses an IRI to identify the transacted item. PaySwarm uses an IRI to identify the transacted item, but also ensures that the item is machine-readable and digitally signed by the seller for security purposes.

There are at least two reasons that you cannot just depend on URLs for describing items on the Web without also specifying how those items can be machine-readable and verifiable.

The first reason is that the seller can change what is at the end of a URL over time, and that is a tremendous liability to those purchasing the item if the item’s description is not stored at the time of sale. For example, assume someone sells you an item described by a URL. When you look at that URL just before you buy the item, it states that you are purchasing a ticket to a concert. However, after you make the purchase, the person that sold you the item changes the content at the end of the URL to make it seem as if you purchased an article about their experience at the concert, not a ticket to attend the concert. PaySwarm protects against this attack by ensuring that the description of what is being transacted is machine-readable, is shown to the buyer before the sale, and is then embedded in the receipt of sale.
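One way to implement the “freeze the description at the time of sale” protection is to embed the machine-readable description in the receipt along with a cryptographic hash of it. This is a simplified sketch: PaySwarm itself relies on digital signatures, so the bare SHA-256 hash here is a stand-in for that machinery, and the field names are hypothetical:

```python
import hashlib
import json

# Hypothetical machine-readable description of the item at sale time.
asset = {
    "id": "https://tickets.example.com/concert/42",
    "title": "Concert ticket, Row F, Seat 12",
    "type": "Ticket",
}

# Canonicalize (sorted keys, no extra whitespace) so the hash is stable
# across serializations, then record the digest in the receipt.
canonical = json.dumps(asset, sort_keys=True, separators=(",", ":"))
digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()

receipt = {"asset": asset, "assetHash": digest}

def asset_unchanged(receipt):
    """Re-hash the embedded description and compare it against the
    digest recorded at purchase time; any later edit fails the check."""
    again = json.dumps(receipt["asset"], sort_keys=True,
                       separators=(",", ":"))
    return hashlib.sha256(again.encode("utf-8")).hexdigest() == \
        receipt["assetHash"]
```

If the seller later rewrites the description (say, to “an article about the concert”), the recomputed digest no longer matches the one frozen in the receipt, so the tampering is detectable.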

The second reason is that the URL, if served over HTTP, can be intercepted and changed, such that the buyer ends up purchasing something that the seller did not approve for sale. PaySwarm addresses this security issue by ensuring that all offers for sale must be digitally signed by the seller.

Alternative Currencies

In most cases the currency mint is equal to the transaction processor.

The currency mint is not equivalent to the transaction processor. Making that assertion conflates two important concepts; 1) the issuer of a currency, and 2) the transaction processors that are capable of transacting in that currency. To put it in different terms, that’s as if one were to say that the US Treasury (the issuer of the USD currency) is the same thing as a local bank in San Francisco (an entity transacting in USD).

Access Control Delegation

But Digital Signatures only solve the actual access process so you have to create your home built authorization and revocation scheme to match what OAuth 2 gives us for free.

In software development, nothing is free. There are always design trade-offs and the design trade-off that OpenTransact has made is to adopt OAuth 2 and punt on the problem of machine-readable and verifiable assets, licenses, listings, and digital contracts. While Pelle makes the argument that OpenTransact may add digital signature support in the future, the final solution would require that both OAuth 2 and digital signatures be implemented.

PaySwarm does not reject OAuth 2 because it is a bad specification; it rejects it because it overly complicates the implementation details of the open payment platform. PaySwarm relies on digital signatures instead of OAuth 2 for the same reason that it relies on JSON instead of XML. XML is a perfectly good technology, but JSON is simpler and solves the problem in a more elegant way. That is, adding XML to the specification would needlessly over-complicate the solution, which is why it was rejected.

Furthermore, PaySwarm had previously been implemented using OAuth and we found it to be overly complicated because of this very reason. OAuth and digital signatures largely duplicate functionality and since PaySwarm requires digital signatures to offer a secure, distributed, open payment platform, the most logical thing was to remove OAuth. By removing OAuth, no functionality was sacrificed and the overall system was simplified as a result.

Machine Readable Metadata

Every aspect of PaySwarm falls apart if everything isn’t created using machine readable metadata. This would be great in a perfect greenfield world. However while meta data is improving as people are adding open graph and other stuff to their pages for better SEO and Facebook integration, there are many ways of doing it and a payment standard should not be specifying how every product is listed, sold or otherwise.

This argument is a bit strange – on one hand, it is asserted that it would be great if a product could be listed in a way that is machine-readable while simultaneously stating that a payment standard shouldn’t do it. More simply, the argument is – it would be great if we did X, but we shouldn’t do X.

Why shouldn’t a payment platform standard specify how items should be listed for sale? If there are many ways of specifying how a product should be listed for sale, isn’t that a good candidate for standardization? After all, when product listings are machine-readable, we can automate a great deal of what previously required human intervention.

The reason that Open Graph and Facebook integration happened so quickly across a variety of websites is because it provided good value for the website owners as well as Facebook. It allowed websites to be more accurately listed in feeds. It also allowed Facebook to leverage the people in its enormous social network to categorize and label content, something that had been impossible on a large scale before. The same is true for Google Rich Snippets and the recent work launched by Google, Microsoft, Yahoo! and Yandex. Website owners can now mark up people, events, products, reviews, and recipes in a way that is machine-readable and that shows up directly, in an enhanced form from regular search listings, in the search engine results pages.

Making things in a web page machine-readable, like products for sale, automates a very large portion of what used to require human oversight. When we automate processes like these, we are able to gain efficiencies and innovate on top of that automation. Specifying how a product should be marked up on the Web in order to be transacted via an open Web payment platform is exactly what should be standardized in a specification, and this is exactly what PaySwarm does.

Recurring payments

With OpenTransact we are still discussing how to specify recurring payments. Before we add it to the standard we would like a couple of real world implementations experiment with it.

This is a chicken and egg problem – at some point, someone has to propose a way to perform recurring payments for an open payment platform. When the OpenTransact specification states that it won’t implement recurring payments until somebody else implements recurring payments, then the problem is just shifted to another community that must do the hard work of figuring out how to implement recurring payments.

PaySwarm has gone to the trouble of specifying exactly how recurring payments are performed. There are many other implementations of recurring payments implemented by the many credit card transaction processors, PayPal, Google Checkout, and Amazon Payments, to name a few. There are many real-world implementations of recurring payments today, so it is difficult to understand exactly what the designers of OpenTransact are waiting on.

Financial Institution Interoperability

OpenTransact is most certainly capable of interoperability between financial institutions. We don’t specify how one payment provider transacts with another payment provider, but it is generally understood that they do so with OpenTransact.

The statement above seems to contradict itself. On one hand, it states that OpenTransact is capable of interoperability between financial institutions. On the other hand, it states that OpenTransact does not specify how one payment provider transacts with another payment provider.

By definition, you do not have interoperability if you do not specify how one system interoperates with another. Furthermore, claiming that two systems interoperate but then not specifying how they interoperate is an invitation for collusion between financial institutions and is a step backwards when looking at how financial institutions operate today. That is, at least there is an inter-bank monetary transfer protocol that you can utilize if you are a bank. This functionality, of detailing how two payment processors interact, is out of scope for OpenTransact.

Digital Signatures

Digital signatures are beautiful engineering constructs that most engineers who have worked with them tend to hold up in near religious reverence. You often hear that a digital signature makes a contract valid and it supports non-repudiation.

PaySwarm does not hold up digital signatures in religious reverence, nor does it assert that by using a digital signature, a digital contract is automatically a legally enforceable agreement. What PaySwarm does do is utilize digital signatures as a tool to provide system security. It also utilizes digital signatures so that simple forgeries on digital contracts cannot be performed.

By not supporting digital signatures in its core protocol, OpenTransact greatly limits itself regarding the use cases that can be addressed with the standard. These use cases that are not addressed by OpenTransact are regarded as very important to the PaySwarm work and thus, cannot be ignored.

Secure Communication over HTTP

We are not trying to reinvent TLS because certs are expensive, which is what PaySwarm proposes.

PaySwarm does not try to re-invent TLS. PaySwarm utilizes TLS to provide security against man-in-the-middle attacks. It also utilizes TLS to create a secure channel from the customer to their PaySwarm Authority and from the merchant to their PaySwarm Authority. What PaySwarm also does is allow sellers on the system to run an online storefront from their website over regular HTTP, thus greatly reducing the cost of setting up and operating an online store-front. The goal is to make the barrier to entry for a vendor on PaySwarm cost absolutely nothing, thus enabling a large group of people that were previously unable to participate in electronic commerce via their website to do so in a way that does not require an up-front monetary investment.

Continuing the Discussion

The fundamental point made in this blog post is that by being hyper-focused on initiating payment transfers, OpenTransact misses the bigger picture of ensuring interoperability in an open payment platform. Until this issue is addressed and it is demonstrated that OpenTransact is capable of addressing more than just a few of the simplest use cases supported by PaySwarm, I fear that it will not pass the rigors of the standardization process.

In his blog post, Pelle responded to only around half of the analysis of OpenTransact; further analysis will be performed on the remainder of his responses when he is able to find time to post them.

If you are interested in listening in on or participating in the discussion, please consider joining the Web Payments Community Group mailing list at the World Wide Web Consortium (W3C) (it’s free, and anyone can join!).

Follow-up to this blog post

[Update 2012-01-02: Second part of response by Pelle to this blog post: OpenTransact vs PaySwarm part 2 – yes it’s still mostly out of scope]

[Update 2012-01-08: Rebuttal to second part of response by Pelle to this blog post: Web Payments: PaySwarm vs. OpenTransact Shootout (Part 3)]

* Many thanks to Dave Longley, who reviewed this post and suggested a number of very useful changes.

Web Payments: PaySwarm vs. OpenTransact Shootout

The W3C Web Payments Community Group was officially launched in August 2011 for the purpose of standardizing technologies for performing Web-based payments. The group launched at that time because Digital Bazaar had made a commitment to publish the PaySwarm technology as an open standard and eventually place it under the standardization direction of the W3C. During last week’s Web Payments telecon, a discussion ensued about using the OpenTransact specification as the basis for the Web Payments work at W3C. Inevitably, the group will have to thoroughly vet both technologies to see if it should standardize PaySwarm, OpenTransact, or both.

This blog post is a comparison of the list of features that both technologies have outlined as being standardization candidates for version 1.0 of each specification. The comparison uses the latest published specifications as of the time of this blog post, OpenTransact (October 19th, 2011) and PaySwarm (December 14th, 2011). Here is a brief summary table on the list of features supported by each proposed standard:

Feature | PaySwarm 1.0 | OpenTransact 1.0
IRIs for Identifiers | Yes | Yes
Basic Financial Transfer | Yes | Yes
Payment Links | Yes | Yes
Item For Sale Identifier | Yes | Yes
Micropayments | Yes | Yes
Access Control Delegation | Digital Signatures | OAuth 2.0
Alternative Currencies | Yes | Centralized
Machine Readable Metadata | Yes | Only for Items for sale
Recurring Payments | Yes | Use case exists, but no spec text
Transaction Processor Interoperability | Yes | No
Extensible Machine Readable Metadata | Yes | No
Transactions | Yes | No
Currency Exchange | Yes | No
Digital Signatures | Yes | No
Secure Communication over HTTP | Yes | No
Decentralized Publishing of Items for Sale | Yes | No
Decentralized Publishing of Licenses | Yes | No
Decentralized Publishing of Listings | Yes | No
Digital Contracts | Yes | No
Verifiable Receipts | Yes | No
Affiliate Sales | Yes | No
Secure Vendor-routed Purchases | Yes | No
Secure Customer-routed Purchases | Yes | No
Currency Mints | Yes | No
Crowd-funding | Yes | No
Data Portability | Yes | No
IRIs for Identifiers

If a payment technology is going to integrate cleanly with the Web, it should identify the things that it operates on in a Web-friendly way. Identifiers are at the heart of most Internet-based systems, and it is therefore important that an identifier work across different systems operating in different locations around the world. The Internationalized Resource Identifier (IRI), of which Uniform Resource Locators (URLs) are a subset, provides a globally scalable mechanism for creating distributed, dereferenceable identifiers.

PaySwarm uses IRIs to identify things like identities, financial accounts, assets, licenses, listings, transactions, contracts, and a variety of the other things that must be expressed when building an open protocol for a financial system.

OpenTransact uses IRIs to identify Asset Services, identities, transfer receipts, callbacks, and providers of items for sale. While OpenTransact does not detail many of the “things” that PaySwarm does, the implicit assumption is that those “things” would also have IRIs as their identifiers.

Basic Financial Transfer

An open payment protocol for the Web must be able to perform a simple financial transfer from one financial account to another. This simple exchange is different from the more complex transaction, as outlined below, which allows multiple financial transfers to occur across multiple accounts during a single transaction.

PaySwarm supports transfers from one financial account to another both within systems and between systems.

OpenTransact outlines transfers from one account to another within a system. It is unclear whether it supports financial transfers between systems since the implementation details of how the transfer occurs are not explained in the specification.

Payment Links

A Payment Link is a clickable link in a Web browser that enables one person or organization on the Web to request payment from another person or organization on the Web. Clicking on the link initiates the transfer process. The URL query parameters are standardized such that all payment processors on the Web would implement a single set of known query parameters, thus easing the implementation burden when integrating with multiple payment systems.

A PaySwarm Authority will depend on the Payment Links specification to specify the proper query parameters for a Payment Link. These query parameters are intended to overlap almost entirely with the OpenTransact query parameters.

The OpenTransact specification outlines a set of query parameters for payment links.
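For illustration, a payment link built from a standardized parameter set might be constructed and parsed like this. The parameter names below are hypothetical placeholders, not the actual names from either specification:

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical standardized query parameters for a payment link.
params = {
    "to": "https://payments.example.com/accounts/bob",
    "amount": "5.00",
    "currency": "USD",
    "note": "Lunch",
}
link = "https://payments.example.com/pay?" + urlencode(params)

# Any processor that implements the same parameter set can parse the
# link and pre-fill the transfer form for the payer.
parsed = parse_qs(urlparse(link).query)
print(parsed["amount"])  # ['5.00']
```

The benefit of standardization is on the receiving side: one parsing routine works against every compliant payment processor, instead of one integration per processor.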

Transacted Item Identifier

Being able to identify the item that is the cause of a financial transfer on the Web is important because it enables the payment system to understand why a particular transfer occurred. Furthermore, ensuring that the item identifier is dereferenceable allows humans and, in some cases, machines to view details about the item for sale.

PaySwarm calls an item that can be transacted an asset and ensures that the description of every asset on the network meets five important criteria: the asset is identified via an IRI; the asset description can be retrieved by dereferencing the IRI; the asset description is human-readable; the asset description is machine-readable; and the asset description can be frozen in time (to ensure the description of the asset does not change from one purchase to the next).

OpenTransact leaves the identification of the item being transacted a bit more open-ended and does not assign a conceptual name to the item. It ensures that the item is identified by an IRI and that de-referencing the IRI results in an item description. The specification is silent on whether or not the item description is required to be human-readable or machine-readable and does not require that the item description can be frozen in time. Since the item IRI is saved in the receipt but a machine-readable description is not, it allows the vendor to change the description of the purchased item at any point. For example, if the buyer purchased an ebook on day one, the vendor can change the purchase to seem as if it were a rental of the ebook on day two.


Micropayments

Micropayments allow the transmission of very small amounts of money from sender to receiver. For example, paying $0.05 for access to an article would be considered a micropayment. One of the primary benefits of micropayments is that they enable easy-to-use pay-as-you-go services. Micropayments also allow one to transfer small amounts of funds at a time without incurring high per-transaction fees.

The smallest amount for a PaySwarm transaction is not limited by the specification. However, most transaction processors will limit payments to no less than 1/10,000th of the smallest whole denomination of a currency. For example, the smallest transaction possible using the PaySwarm software that Digital Bazaar is developing in US Dollars is $0.0000001 (one ten-millionth of a US Dollar).

OpenTransact is not limited in the smallest amount that is transferable.
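One practical detail for implementers: amounts as small as one ten-millionth of a dollar cannot be represented exactly as binary floats, so micropayment systems typically use exact decimal arithmetic (or scaled integers). A short sketch:

```python
from decimal import Decimal

# Represent the price exactly; a binary float would accumulate
# rounding error across millions of micro-transactions.
price = Decimal("0.0000001")  # $0.0000001 per article view

# Ten million views should total exactly one dollar.
total = price * 10_000_000
print(total)  # 1.0000000
```

The same reasoning is why the examples elsewhere in this post carry monetary amounts as strings or integer cents rather than as floats.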

Alternative Currencies

Alternative currencies like Ven, Bitcoin, time banking, Bernal Bucks and various gaming currencies like XBox Points are increasingly being used to solve niche economic problems that traditional fiat currencies have not been able to address. In order to ensure that experimentation with alternative currencies is supported, a Web-based payment protocol should ensure that those currencies can be easily created and exchanged.

PaySwarm allows currencies to be specified by either an internationalized currency code, like “USD”, or by an IRI that identifies a currency. This means that anyone capable of minting an IRI, which is just about anybody on the Web, has the ability to create a new alternative currency. The one drawback for alternative currencies is that there must also be a location, or network, on the Web that acts as the currency mint. The concept of a currency mint will be covered later in this blog post.

OpenTransact supports alternative currencies by creating what it calls Asset Service endpoints. These endpoints can be for the transmission of fixed currencies, stocks, bonds, or other financial instruments.

Access Control Delegation

When implementing features like recurring payments, often some form of access control is required to ensure that a vendor honors the agreement that they created with the buyer. For example, giving permission to a vendor to bill you for up to $10 USD per month requires that some sort of access control privileges are assigned to the vendor. This access control mechanism allows the vendor to make withdrawals without needing to repeatedly bother the buyer. It also gives power to the buyer if they ever want to halt payment to the vendor. There are two primary approaches to access control on the Web; OAuth and digital signatures.

PaySwarm relies on digital signatures to ensure that a request for financial transfer is coming from the proper vendor. Setting up access control is a fairly simple process in PaySwarm, consisting of two steps. The first step requires the vendor to request access from the buyer via a Web browser link. The second step requires that the buyer select which privileges, and limits on those privileges, they are granting to the vendor. These privileges and limits may effectively make statements like: “I authorize this vendor to bill me up to $10 per month”. The vendor may then make financial transfer requests against the buyer’s financial account for up to $10 per month.

OpenTransact enables access control via the OAuth 2 protocol, which is capable of supporting similar privileges and limitations as PaySwarm. However, the OAuth 2 protocol does not allow for digital signatures external to the protocol and thus would also require a separate digital signature stack in order to support things like machine-verifiable receipts and decentralized assets, licenses and listings.
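The “bill me up to $10 per month” privilege described above can be sketched as a simple running-total check. The data structures are illustrative, not taken from either specification, and a real system would also verify the vendor's signed request before consulting the limit:

```python
from datetime import date

# Hypothetical record of a buyer-granted authorization. Amounts are in
# integer cents so the arithmetic stays exact.
authorization = {
    "vendor": "https://vendor.example.com/",
    "limit_cents_per_month": 1000,  # $10.00 per month
}

# Withdrawals the vendor has already made this month: (date, cents).
withdrawals = [(date(2012, 1, 3), 400), (date(2012, 1, 10), 500)]

def may_withdraw(auth, withdrawals, when, cents):
    """Allow the transfer only if the month's running total stays
    within the limit the buyer granted to this vendor."""
    spent = sum(c for d, c in withdrawals
                if (d.year, d.month) == (when.year, when.month))
    return spent + cents <= auth["limit_cents_per_month"]

print(may_withdraw(authorization, withdrawals, date(2012, 1, 20), 100))  # True
print(may_withdraw(authorization, withdrawals, date(2012, 1, 21), 200))  # False
```

Revocation is then just deleting the authorization record: the buyer halts future payments without any interaction with the vendor.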

Machine Readable Metadata

One of the benefits of creating an open payment protocol is that the protocol can be further automated by computers if certain information is exposed in a machine-readable way. For example, if a digital receipt were exposed such that it was machine-readable, access to a website could be granted merely by transmitting the digital receipt to the website.

PaySwarm requires that all data that participates in a transaction is machine readable in a deterministic way. This ensures that assets, licenses, listings, contracts and digital receipts (to name a few) can be transacted and verified without a human in the loop. This increased automation leads to far better customer experiences on websites, where a great deal of the annoying practice of filling out forms can be skipped entirely because machines can automatically process things like digital receipts.

OpenTransact describes exactly what receipt metadata looks like – it’s a JSON object that contains a number of fields. It also outlines that implementations could ask for descriptions of OpenTransact Assets and the associated list of transactions that are associated with those Assets. However, there is no mechanism that allows this metadata to be extended in a deterministic fashion. This limitation will be further detailed below.
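As an illustration of receipt-driven automation, a website could grant access to purchased content by inspecting a machine-readable receipt. In a real deployment the receipt's digital signature would be verified first; the field names here are hypothetical:

```python
import json

# Hypothetical machine-readable digital receipt presented by a buyer.
receipt_json = json.dumps({
    "type": "Receipt",
    "asset": "https://news.example.com/articles/1234",
    "license": "https://licenses.example.com/personal-use",
    "payee": "https://payments.example.com/accounts/news-site",
})

def grants_access(receipt_json, asset_iri):
    """Grant access when the receipt covers the requested asset.
    (A real system would verify the receipt's signature first.)"""
    receipt = json.loads(receipt_json)
    return receipt.get("type") == "Receipt" and \
        receipt.get("asset") == asset_iri
```

This is the "no human in the loop" point from above: the buyer never fills out a form; the site simply checks the receipt against the article being requested.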

Recurring Payments

A recurring payment enables a buyer to specify that a particular vendor can bill them at a periodic interval and removes the burden of buyers having to remember to pay bills every month. Recurring payments require a certain level of Access Control Delegation.

Recurring payments are supported in PaySwarm by pre-authorizing a vendor to spend a certain limit at a pre-specified time interval. Many other rules can be put into place as well. For example, the buyer could limit the time period that the vendor can operate or the transaction processor could send an e-mail every time the vendor withdraws money from the buyer.

While a recurring payments use case exists for OpenTransact, no specification text has been written on how one can accomplish this feat from a technical standpoint.

Transaction Processor Interoperability

For an open payment protocol to be truly open, it must provide interoperability between systems. At the most basic level, this means that a transaction processor must be able to take funds from an account on one system and transfer those funds to a different account on a different system. Interoperability often goes deeper than that, however, as transferring transaction history, accounts, and preference information from one system to the next is important as well.

Since PaySwarm lists financial accounts in a decentralized way (as IRIs), there is no reason that two financial accounts must reside on the same system. In fact, PaySwarm is built with the assumption that payments will freely flow between various payment processors during a single transaction. This means that a single transaction could contain multiple payees, each residing on a different system, and as long as each system adheres to the PaySwarm protocol, the money will be transmitted to each account. While the specification text has not yet been written, the PaySwarm system will not be fully standardized until the protocol for performing the back-haul communication between each PaySwarm Authority is detailed in the specification. That is, system-to-system interoperability is a requirement of the protocol.

The OpenTransact specification identifies senders and receivers in a decentralized way (as IRIs). There are no plans to specify system-to-system transactions in the OpenTransact 1.0 specification. The type of interoperability that OpenTransact provides is library API-level compatibility. That is, those writing software libraries for financial services need only implement one set of query parameters to work with an OpenTransact system. However, the specification does not specify how money flows from one system to the next. There is no system-to-system interoperability in OpenTransact and thus each implementation does nothing to prevent transaction processor lock-in.

Extensible Machine Readable Metadata

Having machine-readable data allows computers to automate certain processes, such as receipt verification or granting access based on details in a digital contract. However, machine-readable data comes at the cost of having to use rigid data structures. These rigid data structures can be extended in a variety of ways that provide the best of both worlds – machine readability and extensibility. Allowing extensibility in the data structures enables innovation. For example, the addition of new terms in a contract or license would enable new business models not considered by the designers of the core protocol.

PaySwarm utilizes JSON-LD to express data in such a way as to be easily usable by Web programmers, but extensible in a way that is guaranteed to not conflict with future versions of the protocol. This means that assets, licenses, listings, digital contracts, and receipts may be extended by transaction processors in order to enable new business models without needing to coordinate with all transaction processors as a first step.
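
As a rough sketch of this idea (the vocabulary URLs and term names below are invented for illustration, not actual PaySwarm vocabulary), a JSON-LD document can pull extension terms from a second context without conflicting with the core protocol:

```python
import json

# A hypothetical listing. The "ps" terms stand in for core protocol vocabulary
# and "ex" pulls in an independently published extension vocabulary; both URLs
# are invented for illustration, not real PaySwarm identifiers.
listing = {
    "@context": {
        "ps": "https://example.org/payswarm/vocab#",
        "ex": "https://example.com/subscription-vocab#",
    },
    "ps:asset": "https://example.org/music/song-42",
    "ps:payee": "https://example.org/accounts/artist",
    # Extension terms: processors that understand the "ex" vocabulary can act
    # on these; everyone else safely ignores them, so no global coordination
    # is required before deploying a new business model.
    "ex:renewalPeriod": "P1M",
    "ex:trialDays": 14,
}

print(json.dumps(listing, indent=2))
```

Because each term is scoped to a vocabulary IRI, a future version of the core protocol cannot accidentally collide with the extension terms.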

OpenTransact utilizes JSON to provide machine-readable data for receipts, assets, and transactions. However, it does not specify any mechanism that allows one to extend the data structures without running the risk of conflicting with future advancements to the language.

Transfers and Transactions

A transaction is defined as a collection of one or more transfers. An example of a transaction is paying a bill at a restaurant. Typically, that transaction results in a number of transfers – there is one from the buyer to the restaurant, another from the buyer to a tax authority, and yet another transfer from the buyer to tip the waiter (in certain countries). While the restaurant gives you a single receipt, the transfer of money is often more complex than just a simple transmission from one sender to one receiver.

PaySwarm models transactions in the way that we model them in the real world – a transaction is a collection of monetary transfers from a sender to multiple receivers. An example of a transaction can be found in the PaySwarm Commerce vocabulary.
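
The restaurant bill above can be sketched as a single transaction containing several transfers (the field names are invented for illustration, not PaySwarm vocabulary):

```python
from decimal import Decimal

# One logical purchase, several transfers of money -- mirroring the restaurant
# bill described above. Field names are illustrative only.
transaction = {
    "total": Decimal("23.00"),
    "transfers": [
        {"from": "buyer", "to": "restaurant",    "amount": Decimal("18.00")},
        {"from": "buyer", "to": "tax-authority", "amount": Decimal("2.00")},
        {"from": "buyer", "to": "waiter",        "amount": Decimal("3.00")},
    ],
}

def transfers_balance(tx) -> bool:
    """The individual transfers must add up to the single receipt total."""
    return sum(t["amount"] for t in tx["transfers"]) == tx["total"]

assert transfers_balance(transaction)
```

The buyer sees one receipt for the total, while the payment processor settles each transfer to its own destination account.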

OpenTransact models only the most low-level concept of a financial transfer and leaves implementation of transactions to a higher-level application. That is, transactions are considered out-of-scope for the OpenTransact specification and are expected to be implemented at the application layer.

Currency Exchange

A currency exchange allows one to exchange a set amount of one currency for a set amount of another currency. For example, exchanging US Dollars for Japanese Yen, or exchanging goats for chickens. A currency exchange functions as a mechanism for transforming something of value into something else of value.

PaySwarm supports currency exchanges via digital contracts that outline the terms of the exchange.

OpenTransact asserts that currency exchanges can be implemented on top of the specification, but states that the details of that implementation are strictly outside of the specification. One of the primary features required for a functioning currency exchange is the concept of a currency mint, which is also outside of the scope of OpenTransact.

Digital Signatures

A digital signature is a mechanism that is used to verify the authenticity of digital messages and documents. Much like a hand-written signature, it can be used to prove the identity of the person sending the message or signing a document. Digital signatures have a variety of uses in financial systems, including: sending and receiving verifiable messages, access control delegation, sending and receiving secure/encrypted messages, counter-signing financial agreements, and ensuring the authenticity of digital goods.

PaySwarm has direct support for digital signatures and utilizes them to provide the following capabilities: sending and receiving verifiable messages, access control delegation, sending and receiving secure/encrypted messages, counter-signing financial agreements, and ensuring the authenticity of assets, licenses, listings, digital contracts, intents to purchase, and verifiable receipts.

OpenTransact does not support digital signatures in the specification.

Secure Communication over HTTP

While HTTP has served as the workhorse protocol for the Web, its major weakness is that it is not secure unless wrapped in a Transport Layer Security (TLS, aka SSL) connection. Unfortunately, TLS/SSL certificates are costly, and one of the design goals for an open protocol for Web Payments should be reducing the financial burden placed on the network participants. Another mechanism for securing HTTP traffic is to encrypt parts of the HTTP message (for example, with AES) or to digitally sign them. This approach results in a zero-financial-cost solution for implementing a secure message channel inside of an HTTP message.

Since PaySwarm supports AES and digital signatures by default, it is also capable of securing communication over HTTP.
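
A minimal sketch of message-level security is shown below: instead of relying on TLS, an authentication tag is attached to the HTTP body itself. This sketch uses an HMAC (a shared secret) for brevity; PaySwarm's own mechanism is based on public-key digital signatures.

```python
import hashlib
import hmac
import json

# Simplified illustration only: a shared secret established out of band stands
# in for a real key-agreement or public-key scheme.
SECRET = b"shared-secret-established-out-of-band"

def protect(body: dict) -> dict:
    """Wrap an HTTP body with an authentication tag covering its contents."""
    payload = json.dumps(body, sort_keys=True).encode()
    return {"body": body,
            "tag": hmac.new(SECRET, payload, hashlib.sha256).hexdigest()}

def accept(message: dict) -> bool:
    """Recompute the tag; any change to the body in transit is detected."""
    payload = json.dumps(message["body"], sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])

msg = protect({"account": "https://example.org/accounts/alice",
               "amount": "5.00"})
assert accept(msg)                 # unmodified message verifies
msg["body"]["amount"] = "500.00"
assert not accept(msg)             # in-flight tampering is detected
```

The message itself carries its integrity protection, so it can travel over a plain HTTP connection at no certificate cost.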

OpenTransact relies on OAuth 2 and thus requires a financial commitment from participants who want to secure their network traffic via TLS. There is no way to use OpenTransact over HTTP in a secure manner without TLS. While this is not a problem for the subset of use cases that OpenTransact aims to solve, it does not address the cases where a digital signature or encryption is required to communicate over an un-encrypted HTTP message channel.

Decentralized Publishing of Items for Sale

A vendor would like to have their products listed for sale as widely as possible while retaining control over each item’s machine-readable description, regardless of where it is listed on the Web. It is important to be able to list items for sale in a secure manner, but allow the flexibility for that item to be expressed on sites that are not under your control. Decoupling the machine-readable description of an item for sale from the payment processor allows both mechanisms to be innovated upon on different time-frames by different people. Centralized solutions are often easier to implement, but far less flexible than decentralized solutions. This holds true for how items for sale are listed. Allowing vendors to have full control over how their items for sale should appear to those browsing their wares is something that a centralized solution cannot easily offer.

PaySwarm establishes the concept of an asset and describes how an asset can be expressed in a secure, digitally signed, decentralized manner.

OpenTransact does not support machine-readable descriptions of items, nor does it support digital signatures to ensure that items cannot be tampered with by third-parties, or even the originating party.

Decentralized Publishing of Licenses

When a sale occurs, there is typically a license that governs the terms of sale. Often, this license is implied based on the laws of commerce governing the transaction in the region in which the transaction occurs. What would be better is if the license could be encapsulated into the receipt of sale and specified in a way that is decentralized from both the financial transaction processor and the item being purchased. This would ensure that people and organizations that specialize in law could innovate and standardize a set of licenses independently of the rest of the financial system.

PaySwarm establishes the concept of a license and describes how it can be expressed in a secure, digitally signed, decentralized manner. Licenses typically contain boilerplate text which is sprinkled with configurable parameters such as “warranty period in days from purchase” and “maximum number of physical copies” (for things like manufacturing).

OpenTransact does not support machine-readable licenses or the embedding of licenses, nor does it support digital signatures to ensure that a license cannot be tampered with by third-parties. License tampering isn’t just a problem when transmitting the license over insecure channels; it is also an issue if the originator of the license changes the contents of the license.

Decentralized Publishing of Listings

A listing specifies the payment details and license under which an asset can be transacted. Giving a vendor full control over when, where and how a listing is published is vital to ensuring that new business models that depend on when and how items are listed for sale can be innovated upon independently of the financial network. So, it becomes important that listings can not only be expressed in a decentralized manner, but are also tamper-proof and redistributable across the Web, while ensuring that the vendor stays in control of how long a particular offer lasts.

PaySwarm establishes the concept of a listing and describes how it can be expressed in a secure, digitally signed, decentralized manner. Decentralized listings allow assets described in the listings to be sold on a separate site, under terms set forth by the original asset owner. That is, a pop-star could release their song as a listing on their website, and the fans could sell it on behalf of the pop-star while making a small profit from the sale. In this scenario, the pop-star gets the royalties they want, the fan gets a cut of the sale, and mass-distribution of the song is made possible through a strongly motivated grass-roots effort by the fans.

OpenTransact does not support machine-readable listings, nor does it support digital signatures to ensure that a listing cannot be tampered with by third-parties.

Digital Contracts

A contract is the result of a commercial transaction and contains information such as the item purchased, the pricing information, the parties involved in the transaction, the license outlining the rights to the item, and payment information associated with the transaction. A digital contract is machine-readable, is self-contained, and is digitally signed to ensure authenticity.

PaySwarm supports digital contracts as the primary mechanism for performing complex exchanges of value. Digital contracts support business practices like intent-to-purchase, being able to purchase an asset under different licenses (such as personal use and broadcast use), and digital receipts.

OpenTransact does not support digital contracts nor does it support digital signatures.

Verifiable Receipts

A verifiable receipt is a receipt that contains a digital signature such that you can verify the accuracy of the receipt contents. Verifiable receipts are helpful when you need to show the receipt to a third party to assert ownership over a physical or virtual good. For example, a music fan could show a verifiable receipt confirming that they purchased a certain song from an artist to get a discount on tickets to an upcoming show. There would not need to be any coordination between the original vendor of the songs and the vendor of the tickets if a verifiable receipt was used as a proof-of-purchase.

PaySwarm supports verifiable receipts, even when the signatory of the receipt is offline. The only piece of information necessary is the public key of the PaySwarm Authority. This means that receipts can be verified even if the transaction processor is offline or goes away entirely.
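
The offline-verification property can be illustrated with textbook RSA using deliberately tiny parameters (purely pedagogical; real systems use 2048+ bit keys and proper padding, and this is not the actual PaySwarm signature format):

```python
import hashlib
import json

# Toy RSA key pair built from small illustrative primes.
p, q = 104729, 1299709
n = p * q                            # public modulus
e = 65537                            # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent, held by the authority

def digest(doc: dict) -> int:
    """Hash a canonicalized JSON document down to an integer below n."""
    raw = json.dumps(doc, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(raw).digest(), "big") % n

def authority_sign(receipt: dict) -> int:
    return pow(digest(receipt), d, n)         # only the authority can do this

def anyone_verify(receipt: dict, sig: int) -> bool:
    return pow(sig, e, n) == digest(receipt)  # needs only the public (e, n)

receipt = {"asset": "https://example.org/music/song-42", "amountPaid": "0.99"}
sig = authority_sign(receipt)
# A ticket vendor can verify the receipt without contacting the authority:
assert anyone_verify(receipt, sig)
assert not anyone_verify({**receipt, "amountPaid": "0.00"}, sig)
```

Because verification needs only the public values, the receipt remains checkable even if the PaySwarm Authority that issued it is offline or gone entirely.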

OpenTransact does support receipts delivered via the Asset Service, using OAuth 2 as the access control mechanism. This means that retrieving a receipt for validation requires having a human in the loop. A verifying website would need to request an OAuth 2 token, the receipt-holder would need to grant access to the verifying website, and then the verifying website would use the token to retrieve the receipt. Currently, OpenTransact does not support digitally signed receipts, and thus it does not support receipt verification if the Asset Service is offline.

Affiliate Sales

When creating content for the Web, getting massively wide distribution typically leads to larger profits. Therefore, it is important for people that create items for sale to be able to grant others the ability to redistribute and profit off of the redistribution, as long as the original creator is compensated under their terms for their creation. Typically, this is called the affiliate model – where a single creator allows wide-distribution of their content through a network of affiliate sellers.

PaySwarm supports affiliate re-sale through digitally signed listings. A listing associates an asset for sale, the license under which asset use is governed, and the payment amount and rules associated with the asset. These listings can be used on their originating site or on a third party site. The security of the terms of sale specified in the listings is ensured through the use of digital signatures.

OpenTransact does not support affiliate sales.

Secure Vendor-routed Purchases

At times, network-connectivity can be a barrier for performing a financial transaction. In these cases, the likelihood that at least one of the transaction participants has an available network connection is high. For example, if a vendor has a physical place of business with a network connection, the customers that frequent that location can depend on the vendor’s network connection instead of requiring their own when processing payments. This is useful when implementing a payment device in hardware, like a smart payment card, without also requiring that hardware device to have wide-area network communication capability, like a mobile phone has. A typical payment flow is outlined below:

  1. The vendor presents a bill to the buyer.
  2. The buyer views the bill and digitally signs the bill, stating that they agree to the charges.
  3. The vendor takes the digitally signed bill and forwards it to the buyer’s payment processor for payment.

PaySwarm supports vendor-routed purchases through the use of digital signatures on digital contracts. When the vendor provides a digital contract to a buyer, the buyer may accept the terms of sale by counter-signing the digital contract from the vendor. The counter-signed digital contract can then be uploaded to the buyer’s PaySwarm Authority to transfer the funds from the buyer to the vendor. The digital contract is returned to the vendor with the PaySwarm Authority’s signature on the contract to assert that all financial transfers listed in the contract have been processed.
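
The three-step flow above can be sketched as a growing chain of signatures, where each party signs everything signed before it. Stdlib HMACs stand in for each party's signing key here; real PaySwarm contracts use public-key digital signatures.

```python
import hashlib
import hmac
import json

def sign_over(doc: dict, signer: str, key: bytes) -> dict:
    """Append a signature covering the document, including earlier signatures."""
    payload = json.dumps(doc, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    sigs = doc.get("signatures", []) + [{"by": signer, "tag": tag}]
    return {**doc, "signatures": sigs}

bill = {"item": "dinner", "amount": "23.00"}
signed_bill = sign_over(bill, "vendor", b"vendor-key")      # 1. vendor presents bill
agreed = sign_over(signed_bill, "buyer", b"buyer-key")      # 2. buyer counter-signs
settled = sign_over(agreed, "authority", b"authority-key")  # 3. processor settles

# The signature chain records who agreed to what, and in what order:
assert [s["by"] for s in settled["signatures"]] == ["vendor", "buyer", "authority"]
```

Because each later signature covers the earlier ones, no party can repudiate or rewrite an earlier step without invalidating the chain.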

OpenTransact does not support vendor-routed purchases, requiring instead that both buyer and vendor have network connectivity when performing a purchase.

Secure Customer-routed Purchases

At times, network-connectivity can be a barrier for performing a financial transaction. In these cases, the likelihood that at least one of the transaction participants has an available network connection is high. For example, a vendor could set up a sales kiosk without any network connection (such as a vending machine) and route purchase processing via a customer’s mobile phone. A typical payment flow is outlined below:

  1. The buyer selects the product that they would like to purchase from the kiosk (like a soda).
  2. The kiosk generates a bill and transmits it to the mobile device via a NFC connection.
  3. The buyer’s mobile phone digitally signs the bill and sends it to their payment processor for processing.
  4. The digitally signed receipt is delivered back to the buyer’s mobile device, which then transmits it via NFC to the kiosk.
  5. The kiosk checks the digital signature and, upon successful verification, delivers the product to the buyer (a cold, refreshing soda).

PaySwarm supports customer-routed purchases through the use of digital signatures on digital contracts. When the buyer receives the digital contract for the purchase from the kiosk, it is already signed by the vendor which implies that the vendor is amenable to the terms in the contract. The buyer then counter-signs the contract and sends it up to the PaySwarm Authority for processing. The PaySwarm Authority then counter-signs the contract, which is delivered back to the buyer, which then routes the finalized contract back to the kiosk. The kiosk checks the digital signature of the PaySwarm Authority on the contract and delivers the product to the buyer.

OpenTransact does not support customer-routed purchases, requiring instead that both buyer and vendor have network connectivity when performing a purchase.

Currency Mints

A currency mint is capable of creating new units of a particular currency. A currency mint is closely related to the topic of alternative currencies, as previously mentioned in this blog post. In order for an alternative currency to enter the financial network, there must be a governing authority or algorithm that ensures that the generation of the currency is limited. If the generation of a currency is not limited in any way, hyperinflation of the currency becomes a risk. The most vital function of a currency mint is to carefully generate and feed currency into payment processors.

PaySwarm supports currency mints by allowing the currency mint to specify an alternative currency via an IRI on the network. The currency IRI is then used as the origin of currency across all PaySwarm systems. The currency mint can then deposit amounts of the alternative currency into accounts on any PaySwarm Authority through an externally defined process.

OpenTransact does not support currency mints. It does support alternative currencies, but does not specify how the alternative currency can be used across multiple payment processors and therefore only supports alternative currencies in a non-inter-operable way.

Crowd-funding

Crowd-funding is the act of pooling a set of funds together with the goal of effecting a change of some kind. Kickstarter is a great example of crowd-funding in action. There are a number of requirements when crowd-funding: ensure that money is not exchanged until a funding goal has been reached (an intent to fund); if a funding goal is reached, allow the mass collection of money (bulk transfer); and if a funding goal is not reached, invalidate the intent to fund (cancellation).

PaySwarm supports crowd-funding. The digital contracts that PaySwarm uses can express an intent to fund. Mass-collection of a list of digital contracts expressing an intent to fund can occur in an atomic operation, which is important to make sure that the entire amount is available at once. Finally, the digital contracts containing an intent to fund also contain an expiration time for the offer to fund.
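
The three requirements — intent to fund, all-or-nothing collection, and cancellation on expiry — can be sketched as follows (a hypothetical campaign model, not actual PaySwarm data structures):

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# Hypothetical campaign illustrating intents to fund with an expiration time.
now = datetime.now(timezone.utc)
campaign = {
    "goal": Decimal("100.00"),
    "deadline": now + timedelta(days=30),
    "pledges": [
        {"backer": "alice", "amount": Decimal("60.00")},
        {"backer": "bob",   "amount": Decimal("45.00")},
    ],
}

def settle(campaign, at):
    """All-or-nothing: collect every pledge only if the goal was met in time."""
    total = sum(p["amount"] for p in campaign["pledges"])
    if at > campaign["deadline"] or total < campaign["goal"]:
        return "cancelled", []               # the intents to fund expire
    return "funded", campaign["pledges"]     # collect atomically, as one batch

status, transfers = settle(campaign, at=now)
assert status == "funded" and len(transfers) == 2
```

Collecting the pledges as one atomic batch mirrors the bulk-transfer requirement: either the entire amount is moved, or nothing is.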

OpenTransact does not support crowd-funding as described above.

Data Portability

Data portability provides the mechanism that allows people and organizations to easily transfer their data across inter-operable systems. This includes identity, financial transaction history, public keys, and other financial data in a way that ensures that payment processors cannot take advantage of platform lock-in. Making data portability a key factor of the protocol ensures that the customers always have the power to walk away if they become unhappy with their payment processor, thus ensuring strong market competition among payment processors.

PaySwarm ensures data portability by expressing all of its Web Service data as JSON-LD. There will also be a protocol defined that ensures that data portability is a requirement for all payment processors implementing the PaySwarm protocol. That is, data portability is a fundamental design goal for PaySwarm.

OpenTransact does not specify any data portability requirements, nor does it provide any system inter-operability requirements. While certainly not done on purpose, this creates a dangerous formula for vendor lock-in and non-interoperability between payment processors.

Follow-up to this blog post

[Update 2011-12-21: Partial response by Pelle to this blog post: OpenTransact the payment standard where everything is out of scope]

[Update 2012-01-01: Rebuttal to Pelle’s partial response to this blog post: Web Payments: PaySwarm vs. OpenTransact Shootout (Part 2)]

[Update 2012-01-02: Second part of response by Pelle to this blog post: OpenTransact vs PaySwarm part 2 – yes it’s still mostly out of scope]

[Update 2012-01-08: Rebuttal to second part of response by Pelle to this blog post: Web Payments: PaySwarm vs. OpenTransact Shootout (Part 3)]

W3Conf – Day Two

W3Conf 2011: HTML5 and the Open Web Platform

The W3C, the folks that create many of the Web technologies you use today, is holding its first conference. If you want to know more about the future of HTML5 and the open Web platform – you’ve come to the right place.

The second day of events is being live-blogged on this page.

If you have access to online video, the event is being live streamed right now. If you don’t have access to video, we’ll be covering what’s going on on this page. If the page doesn’t auto-refresh for you every minute, just hit the refresh button to see the latest.


W3Conf LiveBlog – Day One

W3Conf 2011: HTML5 and the Open Web Platform

The W3C, the folks that create many of the Web technologies you use today, is holding its first conference. If you want to know more about the future of HTML5 and the open Web platform – you’ve come to the right place.

The first and second days of events are being live-blogged on this page.

If you have access to online video, the event is being live streamed right now. If you don’t have access to video, we’ll be covering what’s going on on this page. If the page doesn’t auto-refresh for you every minute, just hit the refresh button to see the latest.


The Need for Data-Driven Standards

Summary: The way we create standards used by Web designers and authors (e.g. HTML, CSS, RDFa) needs to employ more publicly-available usage data on how each standard is being used in the field. The Data-Driven Standards Community Group at the W3C is being created to accomplish this goal – please get an account and join the group.

Over the past month, there have been two significant events demonstrating that the way that we are designing languages for the Web could be improved upon. The latest one was the swift removal of the <time> element from HTML5 and then the even swifter re-introduction of the same. The other was a claim by Google that Web authors were getting a very specific type of RDFa markup wrong 30% of the time, which went counter to the RDFa Community’s experiences. Neither side’s point was backed up with publicly-available usage data. Nevertheless, the RDFa Community decided to introduce RDFa Lite with the assumption that Google’s private analysis drew the correct conclusions and understanding that the group would verify the claims, somehow, before RDFa Lite became an official standard.

Here is what is wrong with the current state of affairs: No public data or analysis methodologies were presented by people on either side of the debate, and that’s just bad science.

Does it do What you Want it to do?

How do you use science to design a Web standard such as HTML or RDFa? Let’s look, first, at the kinds of technologies that we employ on the Web.

A dizzying array of technologies were just leveraged to show this Web page to you. To take bits off of a hard drive and blast them toward you over the Web at, quite literally, the speed of light is an amazing demonstration of human progress over the last century. The more you know about how the Web fits together, the more amazing it is that it works with such dependability – the same way for billions of people around the world, each and every day.

There are really two sides to the question of how well the Web “works”. The first side questions how well it works from a technical standpoint. That is, does the page even get to you? What is the failure rate of the hardware and software relaying the information to you? The other side asks the question of how easy it was for someone to author the page in the first place. The first side has to do more with back-end technologies, the second with front-end technologies. It is the design of these front-end technologies that this blog post will be discussing today.

Let’s take a look at the technologies that went into delivering this page to you and try to put them into two broad categories; back-end and front-end.

Here are the two (incomplete) lists of technologies that are typically used to get a Web page to you:

Back-end Technologies: Ethernet, 802.11, TCP/IP, HTTP, TrueType, JavaScript*, PHP*
Front-end Technologies: HTML, CSS, RDFa, Microdata, Microformats

* It is debatable that JavaScript and PHP should also go in the front-end category, but since you can write a web page without them, let’s keep things simple and ignore them for now.

Back-end technologies, such as TCP/IP, tend to be more prescriptive and thus easier to test. The technology either works reliably, or it doesn’t. There is very little wiggle room in most back-end technology specifications. They are fairly strict in what they expect as input and output.

Front-end technologies, such as HTML and RDFa, tend to be more expressive and thus much more difficult to test. That is, the designers of a language know what the intent of particular elements and attributes is, but the intent can be mis-interpreted by the people that use the technology. Much like the English language can be butchered by people that don’t write good, the same principle applies to front-end technologies. An example of this is the rev attribute in HTML – experts know its purpose, but it has traditionally not been used correctly (or at all) by Web authors.

So, how do we make sure that people are using the front-end technologies in the way that they were intended to be used?

Data Leads the Way

Many front-end technology standards, like HTML and RDFa, are frustrating to standardize because the language designers rarely have a full picture of how the technology will be used in the future. There is always an intent to how the technology should be used, but how it is used in the field can deviate wildly from the intent.

During standards development, it is common to have a gallery of people yelling “You’re doing it wrong!” from the sidelines. More frustratingly, for everyone involved, some of them may be right, but there is no way to tell which ones are and which ones are not. This is one of the places that the scientific process can help us. Data-driven science has a successful track record of answering questions that are difficult for language designers, as individuals with biases, to answer. Data can help shed light on a situation when your community of authors cannot.

While the first draft of a front-end language won’t be able to fully employ data-driven design, most Web standards go through numerous revisions. It is during the design process of the latter revisions that one can utilize good usage data on the Web to influence a better direction for the language.

Unfortunately, good data is exactly what is missing from most of the front-end technology standardization work that all of us do. The whole <time> element fiasco could have been avoided if the editor of the HTML5 specification had just pointed to a set of public data that showed, conclusively, that very few people were using the element. The same assertion holds true for the changes to the property attribute in RDFa. If Google could have just pointed us to some solid, publicly-available data, it would have been easy for us to make the decision to extend the property attribute. Neither happened because we just don’t have the infrastructure necessary to do good data-driven design, and that’s what we intend to change.

Doing a Web-scale Crawl

The problem with getting good usage data on Web technologies is that none of us have the crawling infrastructure that Google or Microsoft have built over the years. A simple solution would be to leverage that large corporate infrastructure to continuously monitor the Web for important changes that impact Web authors. We have tried to get data from the large search companies over the years. Getting data from large corporations is problematic for at least three reasons. The first is that there are legal hurdles that both people outside the organization and people inside the organization must overcome to publish any data publicly. These hurdles often take multiple months to overcome. The second is that some see the data as a competitive advantage and are unwilling to publish the data publicly. The third is that the raw data and the methodology are not always available, resulting in just the publication of the findings, which puts the public in the awkward position of having to trust that a corporation has their best interests in mind.

Thankfully, new custom search services have recently launched that allow us to do Web-Scale crawls. We have an opportunity now to create excellent, timely crawl data that can be publicly published. One of these new services is called 80legs, which does full, customized crawls of the Web. The other is called Common Crawl, which indexes roughly 5 billion web pages and is provided as a non-profit service to researchers. These two places are where we are going to start asking the questions that we should have been asking all along.

What Are We Looking For?

To kick-start the work, there is interest in answering the following questions:

  1. How many Web pages are using the <time> element?
  2. How many Web pages are using ARIA accessibility attributes?
  3. How many Web pages are using the <article> and <aside> elements?
  4. How many sites are using OGP vs. markup?
  5. How many Web pages are using the RDFa property attribute incorrectly?
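
As a sketch of the kind of analysis such a crawl enables, the following counts how many pages in a crawl sample use the <time>, <article>, and <aside> elements or ARIA attributes, using only the standard library (the sample pages are invented for illustration):

```python
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Tally occurrences of the tags and attributes we care about in a page."""
    WATCHED = {"time", "article", "aside"}

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in self.WATCHED:
            self.counts[tag] += 1
        # ARIA usage shows up as role/aria-* attributes, not as tags.
        if any(name == "role" or name.startswith("aria-") for name, _ in attrs):
            self.counts["aria"] += 1

def survey(pages):
    """pages: iterable of (url, html) pairs from a crawl dump."""
    per_feature_pages = Counter()
    for url, markup in pages:
        parser = TagCounter()
        parser.feed(markup)
        for feature in parser.counts:
            per_feature_pages[feature] += 1   # count pages, not occurrences
    return per_feature_pages

sample = [
    ("a.example", "<article><time datetime='2011-11-12'>today</time></article>"),
    ("b.example", "<div role='navigation'>menu</div>"),
]
print(survey(sample))
```

Run over a corpus the size of Common Crawl’s, the same per-page tally would yield exactly the usage percentages the questions above ask for.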

Getting answers to these questions will allow the front-end technology developers to make more educated decisions about the standards that all of us will end up using. More importantly, having somewhere that we can all go and ask these questions is vitally important to the standards that will drive the future of the Web.

The Data-Driven Standards Community Group

I propose that we start a Data-Driven Standards Community Group at the W3C. The Data-Driven Standards Community Group will focus on researching, analyzing and publicly documenting current usage patterns on the Internet. Inspired by the Microformats Process, the goal of this group is to enlighten standards development with real-world data. This group will collect and report data from large Web crawls, produce detailed reports on protocol usage across the Internet, document yearly changes in usage patterns and promote findings that demonstrate that the current direction of a particular specification should be changed based on publicly available data. All data, research, and analysis will be made publicly available to ensure the scientific rigor of the findings. The group will be a collection of search engine companies, academic researchers, hobbyists, web authors, protocol designers and specification editors in search of data that will guide the Internet toward a brighter future.

If you support this initiative, please go to the W3C Community Groups page and join to show your support for the group.

A New Way Forward for HTML5

By halting the XHTML2 work and announcing more resources for the HTML5 project, the World Wide Web Consortium has sent a clear signal on the future markup language for the Web: it will be HTML5. Unfortunately, the decision comes at a time when many working with Web standards have taken issue with the way the HTML5 specification is being developed.

The shutdown of the XHTML2 Working Group has brought to a head a long-standing set of concerns related to how the new specification is being developed. This page outlines the current state of development and suggests that there is a more harmonious way to move forward. By adopting some or all of the proposals outlined below, the standards community will ensure that the greatest features for the Web are integrated into HTML5.

What’s wrong with HTML5?

There are likely as many reasons for why HTML5 is problematic as there are for why HTML5 will succeed where XHTML2 didn’t. Some of these reasons are technical in nature, some are based on process, and others may lack sufficient evidential support.

Many, including the author of this document, have praised the WHAT WG for making steady progress on the next version of HTML. Using implementation data to back up additions, removals or re-writes to the core HTML specification has helped improve the standard. In general, browser implementors have been very supportive of the current direction, so there is much to be celebrated when it comes to HTML5.

The HTML5 editorial process, however, has also drawn several complaints from long-time members of the web standards community who find themselves marginalized as the specification proceeds. The problem has more to do with politics than it does science, but as we find in the real world — the politics are shaping the science.

The biggest complaint with the current process is that the power to change the originating specification lies with one individual. This gives that individual an advantage that has created an acrimonious environment.

In a Consortium like the W3C, a process that shows favor to certain members by giving them privileges that other members can never attain is fundamentally unfair.

In this particular case, it tilts the table toward the current HTML5 specification editor and toward the browser manufacturers. Search companies, tool developers, web designers and developers, usability experts and many others have suddenly found themselves without a voice. Some say that this approach is a good thing — it focuses on those that must adhere to the standard and on those that are producing results. Unfortunately, the approach also creates conflict. The secondary communities feel a sense of unfairness because they feel their needs for the Web are not being met. It is not a simple problem with an easy solution.

HTML5 is now the way forward. To ensure that valid dissenting arguments can have an impact on the specification, we must subtly change the editorial process. The changes should not affect the speed at which HTML5 is proceeding, so a certain finesse must be applied to any action we take to make the HTML5 community better.

This set of proposals addresses our current situation and what other similar communities have done to improve their own development processes.

The Goal

The goal of the actions listed in this document is to allow all of the communities interested in HTML5 to collaborate on the specification in an efficient and agreeable manner. The tools that we elect to use have an effect on the perceived editorial process in place. Currently, the process does not allow for wide-scale collaboration and the sharing and discussion of proposals that support consortium-based specification authoring.

The Strategy

The strategy for moving HTML5 forward should focus on being inclusive without increasing disruption, red tape, or maintenance headaches for those who are contributing the most to the current HTML5 specification. Any strategy employed should ensure that we create a more open, egalitarian environment where everyone who would like to contribute to the HTML5 specification has the ability to do so without the barriers to entry that exist today.

About the Author

Manu Sporny is a Founder of Digital Bazaar and the Commons Design Initiative, an Invited Expert to the W3C, the editor for the hAudio Microformat specification, the RDF Audio and Video vocabularies, a member of the RDFa Task Force and the editor of the HTML5+RDFa specification.

Over the next six months, he will be raising funding from private enterprise and public institutions to address many of the issues outlined below. If your company depends on the Internet and can afford to fund even a small fraction of the work below (minimum $8K grant), please contact him. If you know of an institution that is able to fund the work described on this page, please have them contact Manu.

The Issues

The majority of this document lists some of the current HTML5 issues and attempts to provide actions that the standards community could take to address them.

Problem: A Kitchen Sink Specification

The HTML5 specification currently weighs in at 4.1 megabytes of HTML. If one were to print it out, single-spaced and using 12-point font, the document would span 844 pages. It is not even complete yet and it is already roughly the same length as the new edition of “War and Peace” – a full 3 inches thick.

Reading an 844 page technical specification is daunting for even the most masochistic of standards junkies. The HTML 4.01 specification, problematic in its own right, is 389 pages. Not only are large specifications overwhelming, but it can be almost impossible to find all of the information you need in them. Clearly the specification needs to be long enough to be specific, but not any larger. The issue has more to do with focus and accessibility to web authors and developers than length.

The current HTML5 specification contains information that is of interest to web authors and designers, parser writers, browser manufacturers, HTML rendering engine developers, and CSS developers. Unfortunately, the spec attempts to address all of these audiences at once, quickly losing its focus. A document that is not focused on its intended audience will reduce its utility for all audiences. Therefore, some effort should be placed into making the current document more coherent for each intended audience.

Action: Splitting HTML5 into Logically Targeted Documents

“Know your audience” is a lesson that many creative writers learn far before their first comic, book, or novel is published. Instead of creating a single uber-specification, we should instead split the document into logically targeted sections that are more focused on their intended audience. For example, the following is one such break-out:

  • HTML5: The Language – Syntax and Semantics
    • This document lists all of the language elements and their intended use
    • Useful to authors and content creators
  • HTML5: Parsing and Processing Rules
    • This document lists all of the text->DOM parsing and conversion rules
    • Useful for validator and parser writers
  • HTML5: Rendering and Display Rules
    • This document lists all of the document rendering rules
    • Useful for browser manufacturers and developers of other applications that perform visual and auditory display
  • HTML5: An Implementer's Guide
    • This document lists implementation guidelines, common algorithms, and other implementation details that don’t fit cleanly into the other 3 documents.
    • Useful for application writers who consume HTML5

Problem: Commit Then Review

The HTML5 specification, to date, has been edited in a way that has enjoyed large success in other open source projects. It uses a development philosophy called Commit-Then-Review (CTR). This process is used from time to time at the W3C for small changes, with larger, possibly contentious changes using a process called Consensus-Then-Commit (CTC).

In a CTR process, developers make changes to an open source project and commit their changes for review by other developers. If the new changes work better, they are kept. If they don’t work, they are removed. Source control systems such as CVS, Subversion, and Git are heavily relied upon to provide the ability to rewind history and recover lost changes. This process of making additions to a specification can cause an unintended psychological effect; ideas that are already in a specification are granted more weight than those that are not. In the worst case, one may use the fact that text exists to solve a certain problem to squelch arguments for better solutions.

In a CTC process, as used at the W3C, consensus should be reached before an editor changes a document. This approach assumes that it is much harder to remove language than it is to add it. The approach is often painfully slow and it can take weeks to reach consensus on particularly touchy items. Many have asserted that the HTML5 specification could not have been developed via CTC, an assertion that is also held by the author of this document.

There is nothing wrong with either approach as long as certain presumptions hold. One of the presumptions that CTR makes is that there are many people who may edit a given project. This ensures that good ideas are improved upon, bad ideas are quickly replaced by better ideas, and that no one has the ability to impose his or her views on an entire community without being challenged. However, in HTML5, there is only one person who has editing privileges for the source document. This shifts the power to that individual and requires everyone else in the project to react to changes implemented by that committer.

CTR also requires a community where there is mutual trust among the project leaders and contributors. The HTML5 community is, unfortunately, not in that position yet.

Action: More Committers + Distributed Source Control

Commit Then Review is a valuable philosophy, but it is dangerous when you only have one committer. Having only one committer creates a barrier to dissenting opinion. It does not allow anyone else to make a lasting impact on the only product of the HTML WG and WHAT WG – the HTML5 specification.

It is imperative that more editors are empowered in order to level the playing field for the HTML5 specification. It is also important that edit-wars are prevented by adopting a system that allows for distributed editing and source management.

The Git source control system is one such distributed editing and management solution. It doesn’t require linear development and it is used to develop the Linux kernel – a project dealing with many more changes per day than the HTML5 specification. It allows for separate change sets to be used and, most importantly, there is no one in “control” of the repository at any given point. There is no central authority, no political barrier to entry, and no central gatekeeper preventing someone from working on any part of the specification.

Distributed source control systems are like peer-to-peer networks. They make the playing field flat and are very difficult to censor. The more people that have the power to clone and edit the source repository, the larger the network effects. We would go from the one editor we have now, to ten editors in a very short time, and perhaps a hundred contributors over the next decade.
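The distributed workflow described above can be sketched in a few Git commands. This is a minimal, self-contained illustration; the repository names ("upstream", "contributor"), file names, and branch name are hypothetical placeholders, not the actual HTML5 spec layout:

```shell
#!/bin/sh
set -e

# Stand-in for the canonical spec repository.
git init -q upstream
cd upstream
echo "<h1>HTML5 Draft</h1>" > spec.html
git add spec.html
git -c user.name=editor -c user.email=editor@example.org \
    commit -qm "Initial draft"
cd ..

# Any contributor clones the full history; there is no central gatekeeper.
git clone -q upstream contributor
cd contributor

# Proposals live on topic branches in the contributor's own clone.
git checkout -qb extensibility-proposal
echo "<h2>Distributed Extensibility</h2>" >> spec.html
git -c user.name=contrib -c user.email=contrib@example.org \
    commit -qam "Propose extensibility section"

# The change set travels as a patch; any other clone may review and merge it.
git format-patch -1 -o ../patches
cd ..
```

The key property for the argument above is the last step: a change set is a portable artifact that any clone can generate, review, or merge, so no single repository holder controls whose edits exist.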

Problem: No Way for Experts to Contribute in a Meaningful Way

There have been three major instances spanning 12-24 months in which expert opinion was not respected during the development of the current HTML5 specification. These instances concerned Scalable Vector Graphics (SVG), Resource Description Framework in Attributes (RDFa), and the Web Accessibility Initiative (WAI).

The situation has received enough attention so that there are now web comics and blogs devoted to the conflict between various web experts and the author of the HTML5 specification. These disagreements have become a spectacle with many experts now refusing to take part in the development of the HTML5 specification citing others’ unsuccessful attempts to convey decades of research to the current editor of the HTML5 specification.

Similarly, the editor of the current specification has many valid reasons not to integrate suggestions into the massive document that is HTML5. However, an imperfection in a particular suggestion does not eliminate the need for a discussion of alternative proposals. The way the problem is being approached, by both sides, is fundamentally flawed.

Action: Alternate, Swappable Specification Sections

The HTML5 document should be broken up into smaller sections. Let’s call them microsections. The current specification source is 3.4 megabytes of editable text and is very difficult to author. It is especially daunting to someone who only wants to edit a small section related to their area of expertise. It is even more challenging if one wishes to re-arrange how the sections fit together or to re-use a section in two documents without having to keep them in sync with one another.

Ideally, certain experts, W3C Task Forces, Working Groups, and technology providers could edit the HTML5 specification microsections without having to worry about larger formatting, editing, merging, or layout issues. These microsections could then be processed by a documentation build system into different end-products. For example, HTML5+RDFa or HTML5+RDFa+ARIA, or HTML5+ARIA-RDFa-Microdata. Specifications containing different technologies could be produced very quickly without the overhead of having to author and maintain an entirely new set of documents. You wouldn’t need an editor per proposal to keep all of them in sync with one another, thus reducing the cost of making new proposals. Some microsections could even be used across 2-3 HTML5-related documents.

This approach, coupled with the move to a more decentralized source control mechanism, would provide a path for anyone who can clone a Git repository and edit an HTML file to contribute to the HTML5 specification. Merging changes into the “official” repository could be done via W3C staff contacts to ensure fair treatment to all proposals.

Problem: Mixing Experimental Features with Stable Ones

There is language in the HTML5 specification that indicates that different parts of the specification are at different levels of maturity. However, it is difficult to tell which parts of the specification are at which level of maturity without deep knowledge of the history of the document. This needs to change if we are going to start giving the impression of a stable HTML5 specification.

When someone who is not knowledgeable about the arcane history of the HTML5 Editors Draft sees the <datagrid> element in the same document as the <canvas> element, it is difficult for them to discern the level of maturity of each. In other words, canvas is implemented in many browsers while datagrid is not, yet they are outlined in the same document. The HTML5 specification does have a pop-up noting which features are implemented, have test cases, and are marked for removal; however, it is difficult to read the document knowing exactly which paragraphs, sentences, and features are experimental and which are not.

This is not to say that either <datagrid> or <canvas> shouldn’t be in the HTML5 specification. Rather, there should be different HTML5 specification maturity levels and only features that have reached maturity should be placed into a document entering Last Call at the W3C. We should clearly mark what is and isn’t experimental in the HTML5 specification. We should not standardize on any features that don’t have working implementations.

Action: Shifting to an Unstable/Testing/Stable Release Model

There is much that the standards community can learn from the release processes of larger communities like Debian, Ubuntu, RedHat, the Linux kernel, FreeBSD and others. Each of those communities clearly differentiates software that is very experimental, software that is entering a pre-release testing phase, and software that is tested and intended for public consumption.

If the HTML5 specification generation process adopts Microsections and a distributed source control mechanism, it should be easy to add, remove, and migrate features from an experimental document (Editors Draft), to a testing specification (Working Draft), to a stable specification (Recommendation).

While it may seem as if this is how W3C already operates, note that there is usually only a single stage that is being worked on at a time. This doesn’t fit well with the way HTML5 is being developed in that there are many people working on stable features, testing features, and experimental features simultaneously. A new release process is needed to ensure a smooth, non-disruptive transition from one phase to the next.

Problem: Two Communities, One Specification

When the WHAT WG started what was to become HTML5, the group was asked to do the work outside of the World Wide Web Consortium process. As a benefit of working outside the W3C process, the HTML5 specification was authored and gained support and features very quickly. Letting anybody join the group, having a single editor, dealing directly with the browser manufacturers, and focusing on backwards compatibility all resulted in the HTML5 specification as we know it today.

When the W3C and WHAT WG decided to collaborate on HTML5 as the future language of the web, it was decided that work would continue both in the HTML WG and the WHAT WG. People from the WHAT WG joined the HTML WG and vice-versa to show good faith and move towards openly collaborating with one another. At first, it seemed as if things were fine between the two communities. That is, until the emergence of an “us vs. them” undercurrent – both at the W3C and in WHAT WG.

Keeping both communities active was and will continue to be a mistake. Instead of combining mailing lists, source control systems, bug trackers and the variety of other resources controlled by each group, we now operate with duplicates of many of the systems. It is not only confusing, but inefficient to have duplicate resources for a community that is supposed to be working on the same specification. It sends the wrong signal to the public. Why would two communities that are working on the same thing continue to separate themselves from one another, unless there was a more fundamental issue that existed?

Action: Merging the Communities

The communities should be merged slowly. Data should be migrated from each duplicate system and a single system should be selected. The mailing lists should be one of the first things to be merged. If either community feels that the other community isn’t the proper place for a list, then a completely new community should be created that merges everyone into a single, cohesive group.

The two communities should settle on a single domain and consolidate resources. This would not only be more efficient, but would also eventually remove the "us vs. them" undercurrent.

Problem: Specification Ambiguity

A common problem for specification writers is that, over the years, their familiarity with the specification makes them unable to see ambiguities and errors. This is one of the reasons why all W3C specifications must go through a very rigorous internal review process as well as a public review process. Public review and feedback are necessary in order to clarify specification details, gather implementation feedback, and ultimately produce a better specification.

In order to contribute bug reports, features, or comments on the HTML5 specification, one must send an e-mail to either the HTML WG or the WHAT WG. The combined HTML5 and WHAT WG mailing list traffic can range from 600 to 1,200 very technical e-mails a month. Asking those who would like to comment on the HTML5 specification to devote a large amount of their time to write and respond to mailing list traffic is a formidable request. So formidable that many choose not to participate in the development of HTML5.

Action: In-line Specification Commenting and Bug Tracking

There are many websites that allow people to interactively comment on articles or reply to comments on web pages. We need to ensure that there are as few barriers as possible for commenting on the HTML5 specification. Sending an e-mail to the HTML WG or WHAT WG mailing lists should not be a requirement for providing specification feedback. It should be fairly easy to create a system that allows specification readers to comment directly on text ambiguities, propose alternate solutions, or suggest text deletions when viewing the HTML5 specification.

The down-side to this approach is that there may be a large amount of noise and a small amount of signal but that would be better than no signal at all. We must understand that many web developers, authors, and those who have an interest in the future of the Web cannot put as much time into the HTML5 specification as those who are paid to work on it.

Problem: No Way to Propose Lasting Alternate Proposals

If one were to go to the trouble of adding several new sections to HTML5 or modifying parts of the document, they would then need to keep their changes in sync with the latest version of the document. This is because there is currently only one person who is allowed to make changes to the "official" HTML5 specification. Keeping documents in sync is time consuming and should not be required of participants in order to effect change.

Since the HTML5 specification document is changed on an almost daily basis, the current approach forces editors to play a perpetual game of catch-up. This results in them spending more time merging changes than contributing to the HTML5 document. It may also cause the feeling that their changes are less important than those further upstream.

Action: At-will Specification Generation from Interchangeable Modules

As previously mentioned, moving to a microsectioned approach can help in this case as well. There can be a number of alternative microsections that editors may author so that each section can be a drop-in replacement. The document build system could be instructed, via a configuration file, on which microsections to use for a particular output product. Therefore if someone wanted to construct a version of HTML5 with a certain feature X, all that would be required is the authoring of the microsection and an instruction for the documentation build system to generate an alternative specification with the microsection included.
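The microsection build described above can be sketched very simply: a "configuration file" is just an ordered list of microsections, and the build step assembles the listed sections into one output specification. The file names, section contents, and profile format below are hypothetical placeholders, not an actual proposed tool:

```shell
#!/bin/sh
set -e

# Hypothetical microsection sources, one file per section.
mkdir -p microsections
echo '<section id="syntax">Syntax and semantics...</section>' \
    > microsections/syntax.html
echo '<section id="rdfa">RDFa attribute rules...</section>' \
    > microsections/rdfa.html
echo '<section id="microdata">Microdata rules...</section>' \
    > microsections/microdata.html

# The configuration file for one output product: an ordered list of
# microsection names. Swapping a line swaps the corresponding proposal.
printf 'syntax\nrdfa\n' > html5-plus-rdfa.conf

# The "build": concatenate the listed microsections into one specification.
while read -r name; do
  cat "microsections/$name.html"
done < html5-plus-rdfa.conf > html5-plus-rdfa.html
```

A second configuration file listing `syntax` and `microdata` would produce an alternative specification from the same sources, which is the point: competing proposals differ only in a one-line configuration change, not in a separately maintained document.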

Problem: Partial Distributed Extensibility

One of the things that will vanish when the XHTML2 Working Group is shut down at the end of this year is the concept that there would be a unified, distributed platform extensibility mechanism for the web. In short, distributed platform extensibility allows for the HTML language to be extended to contain any XML document. Examples of XHTML-based extensibility include embedding SVG, MathML, and a variety of other specialized XML-based markup languages into XHTML documents.

HTML5 is specifically designed not to be extended in a decentralized manner for the non-XML version of the language. It special-cases SVG and MathML into the HTML platform. It also disallows platform extensibility and language extensibility in HTML5 (not XHTML5) using the same restricted rubric when they are clearly different types of extensibility mechanisms.

Many proponents of distributed extensibility are very concerned by this rubric and resulting design decision. At the heart of distributed extensibility is the assertion that anyone should be able to extend HTML in the future to address their markup needs. It is a forward-looking statement that asserts that the current generation cannot know how the world might want to extend HTML. The power should be placed into the hands of web developers so that they may have more tools available to them to solve their particular set of problems.

Action: A Set of Proposals for Distributed Extensibility

Whether or not distributed extensibility will ever be used on a large scale is not the issue. The issue is that there are currently no proposals for distributed extensibility in HTML5 (again, not XHTML5). Without a proposal, there is no counter-point for the “no distributed extensibility” assertion that HTML5 makes. Thus, if the W3C were to form consensus at this point, there would be only one option.

Consensus around a single option is not consensus. At the very least, HTML5 needs draft language for distributed extensibility. It doesn't need to be the same solution that XHTML5 provides; it doesn't even need to be close, but alternatives should exist. XHTML spent many years solving this problem, and because of that, SVG and MathML found a home in XHTML documents. Enabling specialist communities, such as the visual arts and mathematics, to extend HTML to do things that were not conceivable during the early days of the Web is a fundamental pillar of the way the Web operates today.

Similarly, a set of tools to provide data extensibility does not yet exist in a form that is acceptable to the standards community. These tools will also fundamentally shape the way we embed and transfer information in web pages. If we are realistic about the expanse of problems that the Web is being called upon to solve, we should ensure that data extensibility capabilities are provided as we move forward.

We cannot be everything to everyone, but we should provide some combination of features such as JavaScript, embedded XML documents, and RDFa, in both HTML5 and XHTML5, to help web developers solve their own problems without needing to effect change in the HTML5 specification.

Problem: Disregarding Input from the Accessibility Community

Accessibility is rarely seen as important until one finds oneself in a position where one's own or a loved one's vision, hearing, or motor skills do not function at a level that makes it easy to navigate the web. Accessible websites are important not only to those with disabilities, but also to those who cannot interact with a website in a typical fashion. For example, web accessibility is also important when one is on a small form factor device, using a text-only interface, or a sound-based interface.

Members of the Web Accessibility Initiative (WAI) and the creators of the Accessible Rich Internet Applications (ARIA) technical specification have noted on a number of occasions that they feel as if they are being ignored by the HTML5 community.

Action: Integrate the Accessibility Community’s Input

Empowering WAI to edit the HTML5 specification in a way that does not conflict with others, but produces an accessibility-enhanced HTML5 specification, is important to the future of the Web. Microsections and distributed source control would allow this type of collaboration without affecting the speed at which HTML5 is being developed. It may be that WAI needs a specification writer who is capable of producing unambiguous language that will enable browser manufacturers to easily create interoperable implementations.

The Plan of Action

In order to provide a greater impact on the near-term health of HTML5, the proposals listed above should be performed in the following order (which is not the order in which they are presented above):

  1. Implementation of git for distributed source control.
  2. Microsection splitter and documentation build system.
  3. Recruit more committers into the HTML5 community.
  4. Split features based on their experimental nature into unstable and testing during Last Call.
  5. Implement in-line feedback mechanism for HTML5 spec.
  6. Distributed extensibility proposals for HTML5.
  7. Better, more precise accessibility language for HTML5.
  8. Merge the HTML WG and WHAT WG communities.


The author would like to thank the following people for reviewing this document and providing feedback and guidance (in alphabetical order): Ben Adida, John Allsopp, Tab Atkins Jr., L. David Baron, Dan Connolly, John Drinkwater, Micah Dubinko, Michael Hausenblas, Ian Hickson, Mike Johnson, David I. Lehn, Dave Longley, Samantha Longley, Shelley Powers, Sam Ruby, Doug Schepers, and Kyle Weems.