The Implications of Genetic Data in the Public Domain

Exactly one week ago, I published my genetic data on GitHub and placed it into the public domain. The response was overwhelmingly positive and the coverage was far greater than anticipated. Here’s a short list of surprising outcomes from that single blog post:

  • It took less than a day for someone to fork and edit the genetic data – and the pull request was hilarious.
  • This started a very popular discussion thread on YCombinator’s Hacker News Channel. Another one was started for the original post.
  • The sheer number of bioengineering/computer science nerd-jokes was staggering. I loved how much fun everyone was having with the post and the genetic data.
  • Engineers from 23andme (the service that sequenced bits of my genome) and from Illumina (the people that make the beadchip that analyzed my genome) left comments and sent me very nice and supportive e-mails. They’re really passionate about what they do – which is always great to see.
  • My tweet hit the front page of Twitter.
  • The blog post hit the front page of Slashdot shortly thereafter.
  • A large magazine in Russia did a story on me, and I found out that it’s illegal to send any of your genetic material outside of Russia to have it analyzed.
  • The founder of contacted me and performed a free analysis on my genome.

Most importantly, there were tens of thousands of people that read the blog post and learned a little more about this exciting new area of research and discovery. There are already folks that have started projects based on the availability of my genetic data. I’m positively thrilled and humbled.

In the previous post, I had mentioned that there were many privacy implications to releasing your genetic data to the world, whether it is in the public domain or not. This post will cover what some of those implications are and how I reasoned my way through them in order to come to the conclusion that it would be safe to release my genetic data into the public domain. The rest of this post will be broken into two sections – short-term and long-term concerns. Each section will contain questions and brief reasoning on the implications.

Short-Term Implications

The short term privacy concerns revolve around things that may happen in the next week, next year, or during the next 20 years. I am 32 years old now, so these are the things I was concerned about happening before the year 2031.

As you read the rest of this post, keep in mind that I tend to be fairly optimistic about how people treat each other based on prior knowledge about health and ancestry. Yes, there have been horrible exceptions (ethnic cleansing, racism, etc.) throughout history, but the people of the world eventually work these atrocities out. Humankind is self-serving and self-preserving, no doubt – but these inherent traits are not incompatible with treating each other with dignity. I think global society, for a large part, works because people are inherently good – perhaps one day we’ll find the genetic markers that elaborate on why.

In many ways I chose to publish my data because I believe that it will help others. I perceive the dangers below as small enough, and the potential benefits as large enough, that releasing my genetic data into the public domain will be a net-positive act.

What if someone would use this data against you?

My genetic data is fairly average. It shows no terrible health risks. I don’t think it would hurt my chances of getting a job or health insurance any more than it would for any other average American with the same genetic profile. One of the questions I came across is whether or not your genetics could be used against you when applying for a job or health insurance. Due to the Genetic Information Nondiscrimination Act of 2008 (GINA), it is highly unlikely that this would happen (in a public or documented manner), as it is now a punishable federal offense to discriminate based on genetic information.

That does not mean that an employer won’t Google your genetic information and find something that they don’t like and not hire you because of it. There are many illegal reasons an employer may not hire you, and if they’re smart, they’ll never tell you the real reasons. For example, they may not hire you due to your race, age, or accent. Speaking as the owner of a company, in almost every case, these reasons would be bad decisions on the part of the company.

Typically when you try to hire someone, you want the person that is best qualified. Rarely do their genetics and ancestry come into play. If they do, more likely than not, you’re looking for a reason to not hire them. Their behavior and how they get along with others plays far more heavily into the decision. At the moment, we don’t have clear genetic markers that give us strong clues as to the nature of human behavior. If someone is looking for an excuse not to hire you, any irrational reason will do – including your genetic information. The only words that you will ever hear are “We’re sorry, but you didn’t seem to be a good fit for the position.”

What may come into play more often are genetic differences in height and weight. For example, I’m a fairly small person. If I were to apply for a job as a stone mason against someone else that is a large framed, muscular person with the exact same qualifications, my genetic information could be used to determine that I was not the best candidate for the position. However, even in this case – a quick face-to-face meeting could determine that information. Even though my genetics are being used against me in the interview, it would never surface in the conversation. The equally-skilled, large framed, muscular person could probably get more done in a day than I could. They have that genetic advantage and since a good business owner wants to make the most reasonable decision, picking the larger person over me will give their company an advantage. This sort of “genetic discrimination” may happen, but it’s not something that I’m necessarily interested in protecting myself from because the decision is being made for completely rational reasons.

Now, let’s assume that I have markers for some sort of debilitating disease that is not known now, but the genetic markers are found for it in 10 years. Since my genetic information is already out there – people will eventually know that I have this condition. It may cause me to not be able to perform certain types of jobs, but even in that case – knowing this information could help both me and an employer deal with the condition if it were to arise. Keep in mind that many of these markers just signal an “increased chance”, they’re not a certainty. Genetic markers cannot predict exactly when you’re going to have a stroke or die of cancer because a large part of those diseases are environmental – some more so than others.

So, while the information could be used against me by an employer or a healthcare provider – it would have to be done in such a way where I would never know that it was used against me. There are many other things, other than genetic information, that may fall into the same category. Couple this with the fact that using this information to discriminate is illegal and we find ourselves in a world where, like race and ethnicity, this sort of information will increasingly not be used to discriminate.

What about the privacy of your relatives?

By divulging my genetic data, I am inevitably divulging short sequences of my family’s genome. Using my data, one could find out who my family members are (if they had access to their genetic information) and potentially, whether or not they’re susceptible to the same diseases and health risks that I am. However, keep in mind that if someone wanted to find out my genetic data or one of my family members’ genetic data, all that they would have to do would be to follow us around. We shed genetic information in droves every day. We do this on the cups we drink from, by blowing our nose, using a public restroom, brushing our hair and eating. We are a fountain of DNA – gushing our genetic information onto every surface that we touch.

I view how this information could be used against my family members in the same way that I view how this information could be used against me. The previous question delved into how the information could realistically be used against them. I don’t think the risk is high for the reasons stated previously.

As for finding out if I’m related to someone else – public records are a better place to look for that sort of information. If someone were to want to find a terrible genetic secret in a family member’s DNA, there are plenty of other places that they could get a sample from to analyze.

Often the shortest path to this information is the most efficient, and computationally analyzing genetic information is very far from the shortest path to the information.

What if someone tried to kill you using your genetic information?

As ridiculous as it sounds, I was somewhat irrationally worried about this. I do have a few allergies and one fairly bad one that could cause me to go into anaphylactic shock in the worst case. I was told this by a doctor 15 years ago. I don’t even know if I would still react in the same way – so don’t go trying to kill me using the allergy, it may not work! The marker for this is not in my genetic information yet, but I assume that it is only a matter of time before it is found. However, assume that it is found – if someone wants to get rid of me that badly, I would assume that they’d take a more direct route than trying to use allergies as my downfall.

The people that know me well know that I avoid this particular allergy. If I died from it, there would be a very strong suspicion that something was amiss.

In reality, if someone wants you dead that badly – they’ll find a way or get caught trying. If they’re smart, you’ll never see it coming. A staged suicide may be the way to go. There are also many medications that can cause heart attacks and other life-threatening events that may be untraceable. There are many ways that someone that wanted you gone could get rid of you. Worrying about how your genetic data will play into this will not prevent them from finding an easier way. Assuming that there is nothing in your genetics to help them kill you, unless they’re Jigsaw, they will almost always go for the easiest path.

In the end, not releasing your genetic data to the world because you’re worrying about how someone will kill you with that genetic data is wandering into tin-foil hat territory. It is my opinion that there is far more good that can be done by releasing your genetic data than by not releasing it.

Are you worried about a major company suing you because of some sort of patented gene that you may have?

That’s not really how the genetic patents work. Typically, the RNA process is the thing that is patented, not the SNPs. The patented process usually reprograms DNA in some way, whereas the data I’m publishing are just SNP markers – they don’t do anything and are thus not patentable. The likelihood of a major pharmaceutical company suing me because of the way I was born is abysmally low. The ramifications of the courts allowing such a case to proceed are far reaching – it would place a restriction on freedom to reproduce that the general populace would not allow to exist.

It would be a public relations nightmare for the company that was suing – imagine the headlines: “Pharmacom sues people for being born.” Being afraid of what the most slithery of lawyers and companies will do to you is no way to live. I believe that the likelihood of coming under legal pressure for effectively publishing facts about your genetic makeup is abysmally low, especially since the benefits of publishing such data are already clear, as outlined in the beginning of this post.

Aren’t you worried about friends or strangers finding out this information about you and potentially judging you because of your genetics?

In short, no. I believe that we are far more than just the genetics that make up our body. There is a great TED presentation by Sebastian Seung about the human Connectome that suggests that the way we wire and re-wire our brains really captures who we are. Our genetics just boot up the hardware – our bodies. It’s the software – our minds – that we use to interact with our friends and strangers. It is largely our mind that determines how strangers and our friends interact with us.

My Connectome is something that I would never, ever share with anyone as it could be used to predict my behavior and that is far more dangerous than being able to know my genetic data. A model that is predictive of what I may do from day to day is an incredibly dangerous thing.

If you think about it, this is what websites and search companies are after when they track our movements online. They want to know more about our Connectome than our genetics. They want to know what we’re thinking, what we want, our fears, what motivates us. I don’t think that one’s genetic data is predictive enough of the thing that truly matters – your Connectome.

I’m not divulging information about my mind, which is very private and not really useful to many people. I’m publishing information about my body, which I don’t view as private information and could be of use to people building tools and software to help other people learn more about themselves.

Long-Term Implications

I define long-term as things that will happen more than 20 years from now to well past when I am dead and buried. I don’t spend much time thinking about the long-term implications not because they’re not important or interesting, but more because it’s impossible for me to predict what may happen with this data more than 5 years out. It’s all just wild speculation. Our global society is making technological breakthroughs at such an increasing pace that it’s difficult for me to grasp what the next 10 years will bring in this area.

Would you ever publish your entire genome?

The 23andme genetic data only contains about 1 million SNPs – about 25MB of data. My entire genome is around 350,000MB of data. Sequencing an entire human genome costs around $20,000 USD today. A few people have done it, but that price is currently far outside of my reach.

However, it’s not always going to be, and I think that in the next 20 years, it will be possible to get my entire genome sequenced for less than $250. I haven’t decided whether I would want to publish that data, but I expect that I probably will do so in the future. The reasons will be the same ones that led me to publish my limited genome last week – to advance science and help people write better tools and software to work with the data.

What if people would use your information to clone you?

The data I released isn’t even close to the required amount necessary to make a clone of yours truly. My answer to this question is heavily influenced by the “Connectome” response above. Your body is not your mind, and I believe that it is your mind that is the thing that is unique to each of us, not necessarily our bodies.

The question really comes down to the intended purpose of the clone. Would I support a clone of me that specifically did not have a brain, but was used to produce biologically compatible organs for life extension purposes? If it was ethically and morally sound, yes. I would prefer it if only the organ necessary for transplant were grown – but the question is more interesting if we don’t have that choice.

If I were to start dying before I wanted to, and an ethically-conscious clone could save my life with multiple organ transplants, I would choose to use the clone to extend my life. If my brain-dead clone could help someone else survive, I would support its use for that purpose. That is, assuming that I have any say over what my clone could be used for. I would expect that I wouldn’t, even in the case where it is brain dead. Would I support the creation of a brain-dead clone of myself and then have my brain transplanted into the clone’s body? Absolutely. I’d love to choose when I died.

To put it another way, if someone wanted to clone my Connectome, I would have strong concerns with publishing that information because the mind is the most private thing you own. You have expended great effort in building it. However, if someone wants to create a clone of me, go right ahead – I had nothing to do with the creation of my DNA sequence. To assert ownership over it is asserting rights that I do not have.

Are you worried about how publishing your genetics will affect your children and descendants?

I am concerned, but I am not worried. There is always the unknown. What happens if there is another Hitler, and it just so happens to be in the country where my children and descendants live? If Hitler had a way of testing genetic information, he would have used it to select those that would be forced into the gas chambers. However, if that happens – my genetic data being public domain would not save any of my descendants from that fate. 23andme already exists and that technology cannot be undone.

I think the day is approaching where we will have a great deal of control over what characteristics our children will have. We may even get the choice to use the best of genomes from around the world – where our children may have some of our DNA, but will also have thousands of other people’s DNA. They will be the best of the best. If you could afford to ensure that your children would have a good body to use throughout their life, wouldn’t you make the decision to do so? Keep in mind that when this technology becomes available, and it is cheap enough, it becomes a viable choice for almost everyone.

When that day comes, wouldn’t you choose the best for your children? Wouldn’t doing anything else be considered negligence on the part of the parent?

Many thanks to Dave Longley for his insight, suggestions and numerous corrections to this article.

Open Sourcing My Genetic Data

Today, I published all of my known genetic data as open source and released all my rights to the data. I’m proud to be the first person in the world to commit my genetic data into a decentralized source control system under a public domain license. The initial reactions that I received when I told some of my friends that I was going to do this were a combination of shock and skepticism.

“Why would you do something like that?”
“Aren’t you afraid that somebody is going to use that against you?”
“What if your healthcare provider got a hold of that? They’d love to look through it in order to deny you for some pre-existing condition!”
“Ugh, I’d never want to know that sort of stuff about myself!”
“What if somebody clones you!?”

I’ve thought long and hard about each of those questions and the many more that you ask yourself before publishing this sort of personal data. There are large privacy implications in doing this. However, speaking solely for myself, I think the benefits outweigh the drawbacks. I’ll explain my thought process behind each of those questions in a separate blog post.

However, the result of that thought process is that I’m releasing my genetic data today – that’s what I’d like to focus on in this blog post. So, let’s explore exactly what this data is and how I hope people that write software will use it.

Your Genetic Code

There is a website called 23andme that is in the business of analyzing your DNA. To become a member of the service, you pay a fee, they send you a test tube, you spit in the test tube and send it back to them. They then take your spit and place it onto something called a genotyping beadchip. In this particular case, my spit was placed onto the Illumina OmniExpress Plus Genotyping Beadchip. This particular chip is capable of detecting around one million genetic markers. These markers are called single-nucleotide polymorphisms or SNPs (pronounced ‘snip’) for short.

In combination, these SNPs can tell you quite a bit about your genetic makeup. This includes things such as your eye color, hair color, hair curl, whether you are at an increased risk for diabetes, where your ancestors came from, or even whether you’re resistant to HIV or have the type of muscles that would make you a good sprinter.

There are around 10 million SNPs in the human genome; the Illumina chip can currently analyze around 1 million of them (966,977 – to be exact). Of those roughly 1 million pieces of data, all of science only knows what around 14,515 of them do. Of the SNPs that we know about, we’re still shaky about all of the things that many of them affect – we’re not so sure about what the data is telling us. On the 23andme site, they only list around 160 SNPs and their effect on you. This means that of the raw data I’m publishing today, science still doesn’t know what 952,462 of these markers do. Talk about a treasure trove of information, just waiting to be unlocked! As science marches steadily onward, we’ll learn more about each one of those 952,462 markers and how they affect how we are born, grow, live and die.
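The arithmetic behind those counts is easy to check. The snippet below is only a sketch that reuses the figures quoted above; the variable names are illustrative and do not come from 23andme or Illumina.

```python
# Figures quoted in the paragraph above; variable names are illustrative.
snps_on_chip = 966_977     # markers reported by the Illumina chip
snps_understood = 14_515   # markers whose effect is known

# Markers that science cannot yet interpret.
snps_unknown = snps_on_chip - snps_understood

# Fraction of the chip's markers that are understood today.
known_share = snps_understood / snps_on_chip

print(f"unknown markers: {snps_unknown:,}")    # unknown markers: 952,462
print(f"share understood: {known_share:.1%}")  # share understood: 1.5%
```

In other words, only about one and a half percent of the markers in the file can be interpreted at the moment.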

One of the best features of 23andme is that they allow you to download your entire genetic profile from the Illumina chip in a raw, non-proprietary format. This is very big news for people that are capable programmers. It means that for the first time in history, there is an inexpensive service that can extract, decode and export your genetic information to a non-proprietary file format.
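For programmers who want to poke at such an export, here is a minimal sketch of a reader in Python. It assumes the layout that 23andme raw downloads commonly use: comment lines starting with `#`, followed by tab-separated rsid, chromosome, position and genotype columns. That layout is an assumption on my part rather than something described in this post, and the sample rows are made-up placeholders, so check the header of your own file before relying on it.

```python
import io
from typing import Dict, Iterable, NamedTuple


class Snp(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_raw_genome(lines: Iterable[str]) -> Dict[str, Snp]:
    """Parse an assumed tab-separated raw export into a dict keyed by rsid."""
    snps: Dict[str, Snp] = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment/header lines (assumed convention).
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        snps[rsid] = Snp(rsid, chromosome, int(position), genotype)
    return snps


# Made-up placeholder rows, not real genotype data.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t12345\tAA\n"
    "rs0000002\tX\t67890\tCT\n"
)

snps = parse_raw_genome(io.StringIO(SAMPLE))
print(len(snps), snps["rs0000002"].genotype)  # 2 CT
```

With a real download you would pass an open file instead of the sample string, for example `with open("ManuSporny-genome.txt") as f: snps = parse_raw_genome(f)`.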


As an open source software developer, there are certain commits that you make to a public source code repository that leave you feeling better about the state of the world. This was certainly one of them for me:

msporny@tao:~/work/dna$ git add ManuSporny-genome.txt 
msporny@tao:~/work/dna$ git commit -a
[master a08b027] Added my genome into source control.
 1 files changed, 966992 insertions(+), 0 deletions(-)
 create mode 100644 ManuSporny-genome.txt

Doing that made me realize how quickly we’re narrowing in on some of the most debilitating human diseases. It gave me hope that our children may enjoy a far better quality of healthcare than we do today. Most of all, it gave me hope that we will be able to better help the nurses, doctors and medical researchers as a society – more than with just money, but with our time, expertise and energy. That commit sent chills up my spine – to me, it symbolized a brighter future for all of us.

So, now that all of us can get a hold of that data, what can we do with it?

Analyzing your Genetic Data

23andme does a great job giving you reports on research that they’re confident of. For example, I have a 13.4% risk of Age-related Macular Degeneration. The average is 7% – which means that I’m about 1.91 times more likely than the average person to start losing my eyesight as a result of old age. This makes sense as one of my grandparents has a bad case of age-related macular degeneration. There are around 160 of these types of reports that you get with your 23andme data, but what if you want to dive deeper into your genetic code?
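The 1.91 figure is simply the ratio of the two percentages quoted above. A small sketch of that calculation follows; the function name is my own and is not part of any 23andme tool.

```python
def relative_risk(personal_risk: float, average_risk: float) -> float:
    """Ratio of an individual's risk to the population average."""
    if average_risk <= 0:
        raise ValueError("average_risk must be positive")
    return personal_risk / average_risk


# 13.4% personal risk against a 7% average, as quoted above.
ratio = relative_risk(13.4, 7.0)
print(f"{ratio:.2f}x the average risk")  # 1.91x the average risk
```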

Code is code, whether it is 1s and 0s or A, G, C, and Ts. Analyzing code and data is something that many Computer Scientists do quite often and quite well. Think of the amount of data that Facebook, Google and Twitter deal with on a daily basis. Think about how quickly you can search over a trillion documents on Google (less than a second in most cases).

Personally, I was expecting the same sort of instant searching and analysis functionality on 23andme. It’s just not there. Don’t get me wrong, 23andme is a great service and if this kind of stuff interests you, you should definitely get a kit right now. The kits go on sale twice a year. I got my spit analyzed for $150 total – it’s a deal, any way that you look at it. That and you get instant access to your raw data – that’s the best part.

However, searching through your raw data on 23andme sucks. Remember, there are only about 160 reports on the 23andme site, but there are over 14,515 SNPs that are known. If you want to find out more than just the 160 reports that 23andme has, there is this great website out there called SNPedia. SNPedia is basically the Wikipedia of genetic information.

Keep in mind there are usually many SNPs that come into play for traits like eye color, hair color or certain types of cancer, or where your ancestors came from. 23andme does the heavy lifting for most of their reports, but there are many SNPs that they don’t show you in their reports. So, if you want to find out about anything that is not on the 23andme site, you have to manually search for the SNPs you’re looking for on SNPedia. To make this even more difficult, SNPs have fairly opaque names like rs1815739.

If you are looking for more than 1 SNP, it can take a long time. You have to first look up the original SNP that interests you on SNPedia. Once you have the original marker on the screen, it might link to upwards of 10 additional SNPs that affect the trait you’re researching. You have to manually type in each SNP one-by-one into the 23andme site, click “Search”, write down the sequence for that SNP, such as “GG” or “AA” and repeat this process for as many SNPs as you’re looking for.
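Once the raw data is loaded into memory, the manual loop described above collapses into a single batch lookup. The following is a rough sketch, assuming the genotypes have already been read into a plain dictionary keyed by rsid; the function name, the rsids and the genotype values are all placeholders for illustration, not real data.

```python
from typing import Dict, Iterable, Optional


def lookup_genotypes(
    genotypes: Dict[str, str], rsids: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Return the genotype for each requested rsid, or None if it is absent."""
    return {rsid: genotypes.get(rsid) for rsid in rsids}


# Placeholder genotypes for illustration only; not taken from any real genome.
my_genotypes = {"rs0000001": "GG", "rs0000002": "AA", "rs0000003": "CT"}

# The list of related markers you would otherwise type in one by one.
wanted = ["rs0000001", "rs0000003", "rs9999999"]

for rsid, genotype in lookup_genotypes(my_genotypes, wanted).items():
    print(rsid, genotype or "not in file")
```

Markers that are missing from the file come back as `None` rather than raising an error, since a chip that covers roughly a tenth of all SNPs will often lack the one you ask for.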

What can Web Programmers do for Genetics?

Manually searching for these markers is unnecessarily time consuming. Doing stuff like this is why we have computers – they’re good at computing! Your genetic data fits in 25 Megabytes of memory – a tiny, tiny fraction of the tiniest USB thumb-drive. This genetic data is the equivalent of 5 MP3 songs, a small website, or 5-7 high resolution digital photos. You can type a Google search for “eye color” and get back a result in less than a second after searching the entire Internet. Why can’t you do that for your genetic data?

I think programmers, especially Web programmers, can do better. That’s the driving reason that I’m releasing this data into the public domain. I’d like to see an open source website that can search SNPedia in the blink of an eye – just like Google Instant does. If I type in “blood type” it should tell me all of the things it can find out about my blood type. If I type in “eyes”, it should be able to tell me everything that it knows about me concerning macular degeneration, eye color, etc. There is a lot of data out there on SNPedia, we just need a nice, personalized interface to work with it.
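To make the idea a bit more concrete, here is a very small sketch of the search step in Python. Everything in it is hypothetical: the annotation table stands in for whatever trait descriptions a site like SNPedia provides, the rsids and genotypes are made-up placeholders, and a real tool would need a far better matching strategy than a substring test.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation table (rsid -> trait description); placeholder data.
ANNOTATIONS: Dict[str, str] = {
    "rs0000001": "eye color",
    "rs0000002": "age-related macular degeneration (eyes)",
    "rs0000003": "blood type",
}

# Placeholder genotypes; not taken from any real genome.
GENOTYPES: Dict[str, str] = {
    "rs0000001": "GG",
    "rs0000002": "CT",
    "rs0000003": "AA",
}


def search_traits(
    query: str, annotations: Dict[str, str], genotypes: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (rsid, trait, genotype) for traits containing the query text."""
    needle = query.strip().lower()
    if not needle:
        return []
    hits = []
    for rsid, trait in sorted(annotations.items()):
        if needle in trait.lower():
            hits.append((rsid, trait, genotypes.get(rsid, "unknown")))
    return hits


for hit in search_traits("eye", ANNOTATIONS, GENOTYPES):
    print(hit)
```

Because the whole data set fits comfortably in memory, even this naive scan over every annotation would respond instantly for a single person's genome.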

That’s just one idea, though. There are thousands of other ideas hidden away out there on the Web. One of them may be hiding in that beautiful brain of yours. I hope that you will share this story with other people that may be interested in helping us to reduce suffering in the world. I hold great hope for this new technology – we are primed for some amazing health-related advances in our lifetime. If you know how to program, design or write – you can help. You can start by blogging or tweeting about this post, or you can:

Download Manu Sporny’s genetic data.