Credentials solve a variety of real authentication problems on the Web today. We can implement them in a way that doesn’t threaten anonymity or privacy and that avoids the mistakes made in previous attempts. The W3C is the right place to do this important work.
Over the past two years a growing number of organizations in education, finance, and healthcare have been gathering in W3C Community Groups around the concept of a standard format for credentials on the Web.
A credential is a qualification, achievement, quality, or piece of information about an entity’s background, typically used to indicate suitability. A credential could include information such as a name, home address, government ID, professional license, or university degree.
We use credentials when we show our shopper’s card to get discounts at the grocery store, when we show a driver’s license to order a drink at the bar, and when we show our passport as we enter a foreign country. The use of credentials to demonstrate capability, membership, status, and minimum legal requirements is something that we do on a regular basis as we go about our lives.
A variety of identity and authentication solutions exist today to perform things like single sign-on and limited attribute verification, but they are not widely deployed enough to be ubiquitous on the Web. As a result, the following problems persist because it remains difficult to provide a flexible, digitally verifiable credential:
- In 2014, 12.7 million U.S. consumers experienced identity fraud losses of over $16B.
- Over the last 5 years, more than $100B in identity-related fraud losses have been detected in the United States alone. That number is in the several hundreds of billions of dollars worldwide.
- The cost to law enforcement ranges from $15,000 to $25,000 to investigate each identity theft case; many of them are not investigated. Victims spend, on average, 600 hours trying to clear damaged credit or even criminal records caused by the thief.
- In 2010, 7 million people self-reported illegal use of prescription drugs in the previous month in the US. These drugs are often acquired through the use of faked credentials. The healthcare costs alone of nonmedical use of prescription opioids – the most commonly misused class of prescription drugs – are estimated to total $72.5 billion annually.
- Educational and professional service testing fraud has led to multiple hundred million dollar losses when test takers are not who they say they are and the testing agencies are held accountable as a result.
- Forty thousand legitimate Ph.D.s are awarded annually in the U.S., while 50,000 spurious Ph.D.s are purchased; that is, more than 50% of new Ph.D. degrees are fake, and more than 500,000 working Americans have a fraudulent degree. More than 25% of job applicants inflate their educational achievements on their resumes.
So, the simple question has been raised in the W3C community: Is it time to do something about Credentials and if so, can W3C add value to the standardization process? Or is this already a solved problem?
There are many different types of credentials in the world and many industry verticals that use credentials in a variety of different ways.
Many categories of credentials have been identified as important to society. Here is a sample of them:
- Academic credentials and co-curricular activities are recognized and exchanged among learners, institutions, employers, or consumers.
- A worker’s certified skill or license is a condition for employment, professional development, and promotability.
- Civil Society
- Access to social benefits and contracts may be based on verifiable conditions such as marital status.
- Ensuring that medical professionals are properly licensed to write prescriptions for controlled substances or provide certain services to patients.
- Regulations require the proper identification of parties in high value transactions. The legal right to purchase a product depends on the verifiable age or location of the buyer.
The use of credentials is a big part of how our society operates. In order to standardize their usage, it’s important that a generalized mechanism is used to express all the data associated with the types of credentials above. To put it another way, the solution shouldn’t be specific to each category above because there are too many different types of credentials for a single group to standardize. Rather, the solution should be a generalized mechanism that enables verifiable claims to be made about an entity where the contents of the credential can be specified by a specific industry or market vertical (such as university entrance testing, or pharmaceutical distribution, or wealth management).
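One way to picture the generalized mechanism described above is a single envelope whose claims are filled in by each vertical. The following is only a minimal sketch under stated assumptions: the `Credential` class, its field names, the `summarize` helper, and the example URLs and credential types are all hypothetical illustrations, not part of any W3C specification.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class Credential:
    """Generic envelope; every field name here is a hypothetical illustration."""
    issuer: str            # the party vouching for the claims
    subject: str           # the entity the claims are about
    credential_type: str   # vocabulary term chosen by an industry vertical
    claims: Dict[str, Any] = field(default_factory=dict)  # vertical-specific data


def summarize(credential: Credential) -> str:
    """Describe any credential without knowing its vertical-specific schema."""
    claim_names = ", ".join(sorted(credential.claims))
    return f"{credential.credential_type} from {credential.issuer}: {claim_names}"


# Two verticals reuse the same envelope with different claim contents.
degree = Credential(
    issuer="https://university.example",
    subject="https://id.example/jane",
    credential_type="UniversityDegree",
    claims={"degree": "BSc", "field": "Chemistry"},
)
pharmacist_license = Credential(
    issuer="https://pharmacy-board.example",
    subject="https://id.example/jane",
    credential_type="PharmacistLicense",
    claims={"jurisdiction": "Example State", "status": "active"},
)

print(summarize(degree))
print(summarize(pharmacist_license))
```

The point of the sketch is that generic tooling such as `summarize` works on both credentials, while the meaning of each claim is left to the vertical that defines the credential type.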
HTML, CSS, XML, and JSON-LD are examples of formats that have been used by many different market verticals to encode data. WebRTC, Geolocation, Drag and Drop, and Web Audio are examples of new ways of encoding and exchanging data over the Web that have found use in a variety of industries. Each of these technologies was standardized at W3C.
The purpose of this post is to help build a shared understanding of the use cases for credentials and, in doing so, help accelerate work at W3C. What follows is a list of concerns and hesitations that have been raised over the past several months. It is helpful to discuss these concerns as part of a larger conversation around use cases and technical merits related to the work.
Hesitation #1: The End of Anonymity on the Web
The spectrum of identity needs across diverse use cases suggests there will be different kinds of credentials to match strong privacy use cases and strong identity use cases.
One of the hesitations expressed by privacy proponents is that if we make it easy to identify people using credentials, we are going to lose anonymity on the Web. The Web started as a largely anonymous medium of communication. In the early days you could jump from site to site without having to identify yourself or expose any personally identifying information. Things have changed for the worse since then with the advent of IP tracking, evercookies, device fingerprinting, browser fingerprinting, email addresses as identifiers, analytics packages, and ad-driven business models. One could argue that anonymity on the Web was lost long ago.
Whether or not you believe that anonymity on the Web is a real thing today, there is a strong argument to not make things worse. Where strong privacy is required, bearer credentials can be used. Where strong identity assurances are required, we can use tightly-bound credentials.
A tightly-bound credential contains a set of claims that are associated with an identifier, effectively stating things like “John Doe is over the age of 21, signed by some mutually trusted organization or person”. The “John Doe” part of the credential is the problem: if you provide that credential to multiple websites, you can be tracked across those websites because they know who you are. A tightly-bound credential, however, can contain a link to retrieve a bearer credential. A bearer credential is a short-lived, untrackable (by itself) credential that effectively states things like “The holder of this credential is over the age of 21, signed by some mutually trusted organization or person”.
Bearer credentials do have downsides. Sophisticated attackers can intercept them and replay them across multiple sites. For this reason, bearer credentials have a very short lifetime and may not be accepted by credential consumer sites that require a stronger type of authentication such as a tightly-bound credential. That said, there are many websites where using bearer credentials to verify things like age or postal code should be acceptable.
Do bearer credentials ensure anonymity on the Web? No. The only thing you need to do to defeat them is to ask the entity with the bearer credential what their email address is, and you have a universally trackable identifier for them. What bearer credentials do, however, is avoid making the tracking problem worse.
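The difference between the two credential types can be sketched in a few lines. This is an illustration under loud assumptions, not a proposed format: the dictionary layout, the function names, and the five-minute lifetime are hypothetical, and an HMAC over a shared demo key stands in for the public-key signature a real issuer would use.

```python
import hashlib
import hmac
import json
import time

ISSUER_KEY = b"demo-shared-secret"  # assumption: stand-in for an issuer's signing key
BEARER_LIFETIME_SECONDS = 300       # assumption: illustrative "very short" lifetime


def _sign(payload: dict) -> str:
    """Sign a canonical JSON form of the payload (HMAC used only for the demo)."""
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(ISSUER_KEY, message, hashlib.sha256).hexdigest()


def issue_tightly_bound(subject_id: str, claims: dict) -> dict:
    """Claims tied to an identifier, so every site that sees it can correlate."""
    payload = {"subject": subject_id, "claims": claims}
    return {**payload, "signature": _sign(payload)}


def issue_bearer(tightly_bound: dict, now=None) -> dict:
    """Same claims, no subject identifier, and an expiry shortly after issuance."""
    now = time.time() if now is None else now
    payload = {
        "claims": tightly_bound["claims"],
        "expires": now + BEARER_LIFETIME_SECONDS,
    }
    return {**payload, "signature": _sign(payload)}


def verify(credential: dict, now=None) -> bool:
    """Check the signature, then reject the credential if it has expired."""
    now = time.time() if now is None else now
    payload = {k: v for k, v in credential.items() if k != "signature"}
    if not hmac.compare_digest(_sign(payload), credential["signature"]):
        return False
    expires = payload.get("expires")
    return expires is None or now <= expires


bound = issue_tightly_bound("https://id.example/john-doe", {"over_21": True})
bearer = issue_bearer(bound, now=1000.0)

print("subject" in bearer)         # the bearer form carries no identifier
print(verify(bearer, now=1100.0))  # inside the lifetime window
print(verify(bearer, now=2000.0))  # after expiry
```

Note that the expiry only narrows the window in which an intercepted bearer credential can be replayed; it does not prevent replay, which is why some sites may still insist on a tightly-bound credential.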
Hesitation #2: Credentials as a Gateway Drug
If an easy mechanism exists to ask for personally invasive credentials, it does not necessarily follow that every website will ask for those credentials or that people will willingly hand them over.
Another concern that is often raised is that if there is a good way to strongly identify people via credentials, and the mechanism is easy to use, more websites will require credentials in order to use their services. This is certainly a concern, and predicting what will happen here is difficult, particularly because there are many market forces at work.
Websites have varying degrees of utility to people. Depending on each site’s utility, people are willing to give up more or less of their personal information to access the site. A website that is effectively a collection of cat pictures will probably not convince many to give up their email address in exchange for more cat pictures. By contrast, a personal banking website will most likely be able to convince you to provide far more personally identifiable information. This dynamic will most likely not change as long as it’s clear what information is being transmitted to each site via a credential.
Ultimately, this choice is up to the person sitting behind the browser and the choice must be presented in such a way as to ensure informed consent before the credential is transmitted.
Hesitation #3: We’ve Tried This Before
While it may sound like this has been tried before, the capabilities required to meet the identified goals are more advanced than what the state of the art today supports.
The third major hesitation about starting the Credentials work as an official work item at W3C stems from a misconception that the identified goals and capabilities are the same as those of previous attempts at a solution. SAML, InfoCard, Shibboleth, OAuth, OpenID Connect, WebID+TLS, and Mozilla Persona: none has solved the credentialing problem stated at the beginning of this blog post in a significant way.
A detailed analysis of each of these initiatives has been performed in a separate blog post titled Credentials: A Retrospective of Mild Successes and Dramatic Failures. Here’s a summary of the state of the art ranked against desired capabilities:
|Extensible Data Model||-||-||-||✗||-||✓||✗|
|Choice of Syntax||✗||✗||✗||✗||✗||✓||✗|
|Choice of Storage||✗||✓||✗||✗||✗||✓||✗|
The primary finding of the analysis above is simple: there still is no widely deployed and adopted way of exchanging credentials on the Web today, but we do have quite a bit of insight into why the prior attempts at solving the general identity/credentialing problem have failed.
Hesitation #4: There Is Nothing to Do
While the problem may seem trivial on the surface, to solve it correctly requires carefully understanding the current gaps in the Open Web Platform.
Clearly, we should reuse existing technologies wherever possible to build a single coherent, stable solution. We do not want to reinvent the wheel.
If it is true that all the technologies already exist, we could quickly solve the problem and move on to other pressing problems on the Web. Unfortunately, this view is not held by the majority of the community that has had contact with the credentialing problem. Participants in the community have tried and failed to implement credentialing solutions using all of the technologies listed at the beginning of this section. It is true that pieces of the solution exist across multiple standardization organizations, but a set of standards for a vibrant credentialing ecosystem does not exist yet, as detailed in this blog post: Credentials: A Retrospective of Mild Successes and Dramatic Failures.
We use credentials in our daily lives. We often receive them from one party and then share them with another. If this is a solved problem on the Web, then why is there no similarly vibrant ecosystem of disparate but interoperable Web-based credentialing systems? Where is the specification that outlines how to express a digitally signed credential? Or the protocol for issuing a credential to a storage location? Or the protocol for requesting a credential?
Even if all of the core technologies exist, they have yet to be put together into a coherent, secure, easy-to-integrate solution for the Web. That sounds like something where the W3C could add value.
This work is worth doing at the W3C because the Open Web Platform lacks functionality that society depends on to run its education, healthcare, finance, and government sectors. It’s worth doing because fraud is a big problem on the Web for high-stakes exchanges, and the problem is only getting worse. It’s worth doing because the W3C has a platform that 3 billion people around the world use, and we could solve this problem at an international scale if successful. It’s worth doing because we plan to add another 3 billion people to the Web in the next 5 years, many of them lacking the basic credentials they need to participate in society, and we can improve upon that condition. It’s worth doing because the W3C membership has solved problems like this before, and has changed the world for the better in doing so.