Today, I published all of my known genetic data as open source and released all my rights to the data. I’m proud to be the first person in the world to commit my genetic data into a decentralized source control system under a public domain license. The initial reactions that I received when I told some of my friends that I was going to do this were a combination of shock and skepticism.
“Why would you do something like that?”
“Aren’t you afraid that somebody is going to use that against you?”
“What if your healthcare provider got a hold of that? They’d love to look through it in order to deny you for some pre-existing condition!”
“Ugh, I’d never want to know that sort of stuff about myself!”
“What if somebody clones you!?”
I’ve thought long and hard about each of those questions and the many more that you ask yourself before publishing this sort of personal data. There are large privacy implications in doing this. However, speaking solely for myself, I think the benefits outweigh the drawbacks. I’ll explain my thought process behind each of those questions in a separate blog post.
However, the result of that thought process is that I’m releasing my genetic data today – that’s what I’d like to focus on in this blog post. So, let’s explore exactly what this data is and how I hope people that write software will use it.
Your Genetic Code
There is a website called 23andme.com that is in the business of analyzing your DNA. To become a member of the service, you pay a fee, they send you a test tube, you spit in the test tube and send it back to them. They then take your spit and place it onto something called a genotyping beadchip. In this particular case, my spit was placed onto the Illumina OmniExpress Plus Genotyping Beadchip. This particular chip is capable of detecting around one million genetic markers. These markers are called single-nucleotide polymorphisms or SNPs (pronounced ‘snip’) for short.
In combination, these SNPs can tell you quite a bit about your genetic makeup: things such as your eye color, hair color, hair curl, whether you are at an increased risk for diabetes, where your ancestors came from, or even whether you’re resistant to HIV or have the type of muscles that would make you a good sprinter.
There are around 10 million SNPs in the human genome; the Illumina chip can currently analyze around 1 million of them (966,977 – to be exact). Of those roughly 1 million pieces of data, all of science only knows what around 14,515 of them do. Even for the SNPs that we know about, we’re still shaky about all of the things that many of them affect – we’re not so sure about what the data is telling us. On the 23andme site, they only list around 160 SNPs and their effect on you. This means that of the raw data I’m publishing today, science still doesn’t know what 952,462 of these markers do. Talk about a treasure trove of information, just waiting to be unlocked! As science marches steadily onward, we’ll learn more about each one of those 952,462 markers and how they affect how we are born, grow, live and die.
One of the best features of 23andme is that they allow you to download your entire genetic profile from the Illumina chip in a raw, non-proprietary format. This is very big news for people that are capable programmers. It means that for the first time in history, there is an inexpensive service that can extract, decode and export your genetic information to a non-proprietary file format.
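To give a feel for how approachable that raw export is, here is a minimal Python sketch of a parser. It assumes the file is plain text with `#` comment lines at the top followed by one tab-separated row per SNP (rsid, chromosome, position, genotype); that layout is an assumption for illustration, as are the function name `parse_raw_genome` and the made-up sample rows.

```python
# Made-up rows in the ASSUMED layout (tab-separated: rsid, chromosome,
# position, genotype). These are placeholders, not real genotype data.
SAMPLE_LINES = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAA",
    "rs0000002\t1\t2000\tAG",
    "rs0000003\tX\t3000\tCC",
]


def parse_raw_genome(lines):
    """Return {rsid: (chromosome, position, genotype)} from raw-export lines."""
    snps = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        rsid, chromosome, position, genotype = line.split("\t")
        snps[rsid] = (chromosome, int(position), genotype)
    return snps


snps = parse_raw_genome(SAMPLE_LINES)
print(len(snps), snps["rs0000002"])
```

For a real file you would pass an open file handle instead of `SAMPLE_LINES`, since iterating over a file yields its lines in the same way.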
As an open source software developer, there are certain commits that you make to a public source code repository that leave you feeling better about the state of the world. This was certainly one of them for me:
msporny@tao:~/work/dna$ git add ManuSporny-genome.txt
msporny@tao:~/work/dna$ git commit -a
[master a08b027] Added my genome into source control.
 1 files changed, 966992 insertions(+), 0 deletions(-)
 create mode 100644 ManuSporny-genome.txt
Doing that made me realize how quickly we’re narrowing in on some of the most debilitating human diseases. It gave me hope that our children may enjoy a far better quality of healthcare than we do today. Most of all, it gave me hope that we, as a society, will be able to better help the nurses, doctors and medical researchers – not just with money, but with our time, expertise and energy. That commit sent chills up my spine – to me, it symbolized a brighter future for all of us.
So, now that all of us can get a hold of that data, what can we do with it?
Analyzing your Genetic Data
23andme does a great job of giving you reports on research that they’re confident of. For example, my risk of Age-related Macular Degeneration is 13.4%. The average is 7% – which means that I’m about 1.91 times as likely as the average person to start losing my eyesight as a result of old age. This makes sense, as one of my grandparents has a bad case of age-related macular degeneration. There are around 160 of these types of reports that you get with your 23andme data, but what if you want to dive deeper into your genetic code?
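The 1.91 figure is simply the ratio of the two percentages quoted above (13.4% against an average of 7%). A tiny Python sketch of that arithmetic, with `relative_risk` being a helper name made up for this example:

```python
def relative_risk(personal_risk_pct, average_risk_pct):
    """Ratio of one person's risk to the average risk (both in percent)."""
    return personal_risk_pct / average_risk_pct


ratio = relative_risk(13.4, 7.0)
print(f"{ratio:.2f}")  # prints 1.91
```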
Code is code, whether it is 1s and 0s or A, G, C, and Ts. Analyzing code and data is something that many Computer Scientists do quite often and quite well. Think of the amount of data that Facebook, Google and Twitter deal with on a daily basis. Think about how quickly you can search over a trillion documents on Google (less than a second in most cases).
Personally, I was expecting the same sort of instant searching and analysis functionality on 23andme. It’s just not there. Don’t get me wrong, 23andme is a great service and if this kind of stuff interests you, you should definitely get a kit right now. The kits go on sale twice a year. I got my spit analyzed for $150 total – it’s a deal, any way that you look at it. That and you get instant access to your raw data – that’s the best part.
However, searching through your raw data on 23andme sucks. Remember, there are only about 160 reports on the 23andme site, but there are around 14,515 SNPs that are known. If you want to find out more than just the 160 reports that 23andme has, there is this great website out there called SNPedia.com. SNPedia is basically the Wikipedia of genetic information.
Keep in mind there are usually many SNPs that come into play for traits like eye color, hair color or certain types of cancer, or where your ancestors came from. 23andme does the heavy lifting for most of their reports, but there are many SNPs that they don’t show you in their reports. So, if you want to find out about anything that is not on the 23andme site, you have to manually search for the SNPs you’re looking for on SNPedia. To make this even more difficult, SNPs have fairly opaque names like rs1815739.
If you are looking for more than 1 SNP, it can take a long time. You have to first look up the original SNP that interests you on SNPedia. Once you have the original marker on the screen, it might link to upwards of 10 additional SNPs that affect the trait you’re researching. You have to manually type in each SNP one-by-one into the 23andme site, click “Search”, write down the sequence for that SNP, such as “GG” or “AA” and repeat this process for as many SNPs as you’re looking for.
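Once the raw data is loaded into memory, that one-by-one lookup could collapse into a single call. The Python sketch below is only an illustration: `lookup_genotypes` is a hypothetical helper, the `genome` dictionary stands in for a parsed raw file, and every genotype in it is made up (including the one shown for rs1815739).

```python
def lookup_genotypes(genome, rsids):
    """Map each requested rsid to its genotype, or None if it is not in the data."""
    return {rsid: genome.get(rsid) for rsid in rsids}


# Stand-in for a parsed raw file: rsid -> genotype. All values are made up.
genome = {"rs1815739": "CT", "rs0000001": "GG", "rs0000002": "AA"}

# The list of SNPs you would otherwise type in by hand, one at a time.
wanted = ["rs1815739", "rs0000002", "rs9999999"]

results = lookup_genotypes(genome, wanted)
for rsid, genotype in results.items():
    print(rsid, genotype or "not in the raw data")
```

A dictionary lookup takes roughly the same time whether you ask for one SNP or a few hundred, which is the whole point of letting the computer do it.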
What can Web Programmers do for Genetics?
Manually searching for these markers is unnecessarily time consuming. Doing stuff like this is why we have computers – they’re good at computing! Your genetic data fits in 25 Megabytes of memory – a tiny, tiny fraction of the tiniest USB thumb-drive. This genetic data is the equivalent of 5 MP3 songs, a small website, or 5-7 high resolution digital photos. You can type a Google search for “eye color” and get back a result in less than a second after searching the entire Internet. Why can’t you do that for your genetic data?
I think programmers, especially Web programmers, can do better. That’s the driving reason that I’m releasing this data into the public domain. I’d like to see an open source website that can search SNPedia in the blink of an eye – just like Google Instant does. If I type in “blood type” it should tell me all of the things it can find out about my blood type. If I type in “eyes”, it should be able to tell me everything that it knows about me concerning macular degeneration, eye color, etc. There is a lot of data out there on SNPedia; we just need a nice, personalized interface to work with it.
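As a rough sketch of what the core of such a search might look like, the Python below builds a small keyword index over SNP annotations and matches query words by prefix, so results can narrow as you type. Everything in it is hypothetical: the rsids are placeholders, the annotation strings are invented stand-ins for what a real tool would pull from a source such as SNPedia, and the genotypes are made up.

```python
from collections import defaultdict

# Hypothetical annotations and genotypes, for illustration only.
ANNOTATIONS = {
    "rs0000001": "eye color",
    "rs0000002": "age-related macular degeneration (eyes)",
    "rs0000003": "blood type",
}
GENOME = {"rs0000001": "AG", "rs0000002": "GG", "rs0000003": "CT"}


def build_index(annotations):
    """Map each lowercase annotation word to the set of rsids that mention it."""
    index = defaultdict(set)
    for rsid, text in annotations.items():
        cleaned = text.lower().replace("(", " ").replace(")", " ")
        for word in cleaned.split():
            index[word].add(rsid)
    return index


def search(index, genome, query):
    """Return sorted (rsid, genotype) pairs matching every query word by prefix."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = None
    for term in terms:
        matched = set()
        for word, rsids in index.items():
            if word.startswith(term):
                matched |= rsids
        # A SNP must match all query words, so intersect as we go.
        hits = matched if hits is None else hits & matched
    return sorted((rsid, genome.get(rsid)) for rsid in hits)


index = build_index(ANNOTATIONS)
print(search(index, GENOME, "eye"))
print(search(index, GENOME, "blood type"))
```

Scanning every indexed word on each query is fine at this scale; a real site with many thousands of annotations would probably want a sorted word list or a trie for the prefix step.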
That’s just one idea, though. There are thousands of other ideas hidden away out there on the Web. One of them may be hiding in that beautiful brain of yours. I hope that you will share this story with other people that may be interested in helping us to reduce suffering in the world. I hold great hope for this new technology – we are primed for some amazing health-related advances in our lifetime. If you know how to program, design or write – you can help. You can start by blogging or tweeting about this post, or you can: