Open Sourcing My Genetic Data

Today, I published all of my known genetic data as open source and released all my rights to the data. I’m proud to be the first person in the world to commit my genetic data into a decentralized source control system under a public domain license. The initial reactions that I received when I told some of my friends that I was going to do this were a combination of shock and skepticism.

“Why would you do something like that?”
“Aren’t you afraid that somebody is going to use that against you?”
“What if your healthcare provider got a hold of that? They’d love to look through it in order to deny you for some pre-existing condition!”
“Ugh, I’d never want to know that sort of stuff about myself!”
“What if somebody clones you!?”

I’ve thought long and hard about each of those questions and the many more that you ask yourself before publishing this sort of personal data. There are large privacy implications in doing this. However, speaking solely for myself, I think the benefits outweigh the drawbacks. I’ll explain my thought process behind each of those questions in a separate blog post.

However, the result of that thought process is that I’m releasing my genetic data today – that’s what I’d like to focus on in this blog post. So, let’s explore exactly what this data is and how I hope people that write software will use it.

Your Genetic Code

There is a website called 23andme that is in the business of analyzing your DNA. To become a member of the service, you pay a fee, they send you a test tube, you spit in the test tube and send it back to them. They then take your spit and place it onto something called a genotyping beadchip. In this particular case, my spit was placed onto the Illumina OmniExpress Plus Genotyping Beadchip. This particular chip is capable of detecting around one million genetic markers. These markers are called single-nucleotide polymorphisms, or SNPs (pronounced ‘snips’) for short.

In combination, these SNPs can tell you quite a bit about your genetic makeup: things such as your eye color, hair color, hair curl, whether you are at an increased risk for diabetes, where your ancestors came from, or even whether you’re resistant to HIV or have the type of muscles that would make you a good sprinter.

There are around 10 million SNPs in the human genome; the Illumina chip can currently analyze around 1 million of them (966,977, to be exact). Of those roughly 1 million pieces of data, science only knows what around 14,515 of them do. Even for the SNPs that we know about, we’re still shaky on everything that many of them affect – we’re not so sure about what the data is telling us. On the 23andme site, they only list around 160 SNPs and their effect on you. This means that of the raw data I’m publishing today, science still doesn’t know what 952,462 of these markers do. Talk about a treasure trove of information, just waiting to be unlocked! As science marches steadily onward, we’ll learn more about each one of those 952,462 markers and how they affect how we are born, grow, live and die.
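
For anyone who wants to check the numbers, here is a quick sketch in Python. It uses only the figures quoted in this post, not anything read from the raw data file, and the variable names are my own.

```python
# Figures quoted in the post; nothing here is read from the raw data file.
SNPS_ON_CHIP = 966_977     # markers the chip reports
SNPS_UNDERSTOOD = 14_515   # markers whose effect is known, per the post

# Markers on the chip that nobody can interpret yet.
unknown = SNPS_ON_CHIP - SNPS_UNDERSTOOD

# Share of the chip's markers that are understood today.
share_understood = SNPS_UNDERSTOOD / SNPS_ON_CHIP

print(f"{unknown:,} markers still unexplained")
print(f"{share_understood:.1%} of the chip's markers are understood")
```

In other words, only about 1.5% of what the chip reports can be interpreted today.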

One of the best features of 23andme is that they allow you to download your entire genetic profile from the Illumina chip in a raw, non-proprietary format. This is very big news for people that are capable programmers. It means that for the first time in history, there is an inexpensive service that can extract, decode and export your genetic information to a non-proprietary file format.


As an open source software developer, there are certain commits that you make to a public source code repository that leave you feeling better about the state of the world. This was certainly one of them for me:

msporny@tao:~/work/dna$ git add ManuSporny-genome.txt 
msporny@tao:~/work/dna$ git commit -a
[master a08b027] Added my genome into source control.
 1 files changed, 966992 insertions(+), 0 deletions(-)
 create mode 100644 ManuSporny-genome.txt

Doing that made me realize how quickly we’re narrowing in on some of the most debilitating human diseases. It gave me hope that our children may enjoy a far better quality of healthcare than we do today. Most of all, it gave me hope that we will be able to better help the nurses, doctors and medical researchers as a society – more than with just money, but with our time, expertise and energy. That commit sent chills up my spine – to me, it symbolized a brighter future for all of us.

So, now that all of us can get a hold of that data, what can we do with it?

Analyzing your Genetic Data

23andme does a great job giving you reports on research that they’re confident of. For example, my risk of Age-related Macular Degeneration is 13.4%. The average is 7%, which means that I’m about 1.91 times as likely as the average person to start losing my eyesight as a result of old age. This makes sense as one of my grandparents has a bad case of age-related macular degeneration. There are around 160 of these types of reports that you get with your 23andme data, but what if you want to dive deeper into your genetic code?
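
The 1.91 figure is just my reported risk divided by the average risk. A tiny Python sketch, using only the two percentages quoted above (the variable names are mine):

```python
my_risk = 13.4       # percent, from my 23andme report
average_risk = 7.0   # percent, the average quoted in the same report

# Relative risk: how many times the average risk mine is.
relative_risk = my_risk / average_risk

print(f"{relative_risk:.2f}x the average risk")
```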

Code is code, whether it is 1s and 0s or A, G, C, and Ts. Analyzing code and data is something that many Computer Scientists do quite often and quite well. Think of the amount of data that Facebook, Google and Twitter deal with on a daily basis. Think about how quickly you can search over a trillion documents on Google (less than a second in most cases).

Personally, I was expecting the same sort of instant searching and analysis functionality on 23andme. It’s just not there. Don’t get me wrong, 23andme is a great service and if this kind of stuff interests you, you should definitely get a kit right now. The kits go on sale twice a year. I got my spit analyzed for $150 total – it’s a deal, any way that you look at it. That and you get instant access to your raw data – that’s the best part.

However, searching through your raw data on 23andme sucks. Remember, there are only about 160 reports on the 23andme site, but there are around 14,515 SNPs whose effects are known. If you want to find out more than just the 160 reports that 23andme has, there is a great website called SNPedia, which is basically the Wikipedia of genetic information.

Keep in mind there are usually many SNPs that come into play for traits like eye color, hair color or certain types of cancer, or where your ancestors came from. 23andme does the heavy lifting for most of their reports, but there are many SNPs that they don’t show you in their reports. So, if you want to find out about anything that is not on the 23andme site, you have to manually search for the SNPs you’re looking for on SNPedia. To make this even more difficult, SNPs have fairly opaque names like rs1815739.

If you are looking for more than 1 SNP, it can take a long time. You have to first look up the original SNP that interests you on SNPedia. Once you have the original marker on the screen, it might link to upwards of 10 additional SNPs that affect the trait you’re researching. You have to manually type in each SNP one-by-one into the 23andme site, click “Search”, write down the sequence for that SNP, such as “GG” or “AA” and repeat this process for as many SNPs as you’re looking for.
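
Here is a minimal sketch of what automating that lookup might look like in Python. It assumes the raw export is a tab-separated text file with one marker per line (rsid, chromosome, position, genotype) and header comment lines starting with `#`; check that against your own file. The function names and the sample rows are made up for illustration.

```python
import io

def load_genotypes(lines):
    """Map rsid -> genotype from raw export lines (assumed tab-separated)."""
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip rows that do not have all four columns
        rsid, _chromosome, _position, genotype = fields[:4]
        genotypes[rsid] = genotype
    return genotypes

def lookup(genotypes, rsids):
    """Return {rsid: genotype or None}; None means the file has no such marker."""
    return {rsid: genotypes.get(rsid) for rsid in rsids}

# Made-up sample rows standing in for a real export file.
SAMPLE = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t2\t2000\tCT\n"
)

genotypes = load_genotypes(SAMPLE)
results = lookup(genotypes, ["rs0000002", "rs0000003"])
print(results)
```

With a real file you would pass something like `open("ManuSporny-genome.txt")` instead of the sample, and the list of rsids would come from whatever SNPedia page you are reading.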

What can Web Programmers do for Genetics?

Manually searching for these markers is unnecessarily time consuming. Doing stuff like this is why we have computers – they’re good at computing! Your genetic data fits in 25 Megabytes of memory – a tiny, tiny fraction of the tiniest USB thumb-drive. This genetic data is the equivalent of 5 MP3 songs, a small website, or 5-7 high resolution digital photos. You can type a Google search for “eye color” and get back a result in less than a second after searching the entire Internet. Why can’t you do that for your genetic data?

I think programmers, especially Web programmers, can do better. That’s the driving reason that I’m releasing this data into the public domain. I’d like to see an open source website that can search SNPedia in the blink of an eye – just like Google Instant does. If I type in “blood type” it should tell me all of the things it can find out about my blood type. If I type in “eyes”, it should be able to tell me everything that it knows about me concerning macular degeneration, eye color, etc. There is a lot of data out there on SNPedia, we just need a nice, personalized interface to work with it.
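
To make the idea a little more concrete, here is a rough Python sketch of that kind of keyword search. Everything in it is hypothetical: the annotations stand in for text that a real tool would pull from SNPedia, the rsids and genotypes are placeholders, and prefix matching is just one simple way to approximate search-as-you-type.

```python
from collections import defaultdict

# Hypothetical annotations: rsid -> short description (placeholders, not real data).
ANNOTATIONS = {
    "rs0000001": "eye color",
    "rs0000002": "macular degeneration risk (eyes)",
    "rs0000003": "blood type",
}

# Hypothetical genotypes, as they might be loaded from a raw data file.
GENOTYPES = {"rs0000001": "AG", "rs0000002": "CC"}

def build_index(annotations):
    """Build an inverted index: lowercase word -> set of rsids."""
    index = defaultdict(set)
    for rsid, text in annotations.items():
        cleaned = text.lower().replace("(", " ").replace(")", " ")
        for word in cleaned.split():
            index[word].add(rsid)
    return index

def search(index, genotypes, query):
    """Return sorted (rsid, genotype) pairs matching every query term by prefix."""
    hits = None
    for term in query.lower().split():
        matched = set()
        for word, rsids in index.items():
            if word.startswith(term):
                matched |= rsids
        hits = matched if hits is None else hits & matched
    return sorted((rsid, genotypes.get(rsid)) for rsid in (hits or set()))

INDEX = build_index(ANNOTATIONS)
eye_hits = search(INDEX, GENOTYPES, "eye")
blood_hits = search(INDEX, GENOTYPES, "blood type")
print(eye_hits)
print(blood_hits)
```

A genotype of `None` in the results means the marker is annotated but was not in my data, which is worth showing to the user rather than hiding.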

That’s just one idea, though. There are thousands of other ideas hidden away out there on the Web. One of them may be hiding in that beautiful brain of yours. I hope that you will share this story with other people that may be interested in helping us to reduce suffering in the world. I hold great hope for this new technology – we are primed for some amazing health-related advances in our lifetime. If you know how to program, design or write – you can help. You can start by blogging or tweeting about this post, or you can:

Download Manu Sporny’s genetic data.



  1. Dan Brickley says:

    Interesting! (I’ve only published blood type and a photo of a dental x-ray so far…).

    If you don’t mind me asking… – do you have children? or plan to? One tricky thing with this kind of ‘personal’ data is that it releases information about your closest relatives too. Not clear how one gets informed consent when nobody really knows what the future holds…

    • ManuSporny says:

      Hi Dan,

      Aside: I do plan to mark up and publish this data using RDFa via my FOAF profile.

      I don’t have children yet, but plan to in the near future.

      Yes, there is this sticky “informed consent” problem. This is some of the discussion that I was hoping to trigger with publishing this data. Not to say that people haven’t postulated about it before, but the act of publishing one’s genetic information to the public does make you think more deeply about these questions out of necessity.

      I didn’t have anything in my genetic profile that was too damning – average all around. I don’t think that my family will mind having average statistical data about myself that may or may not apply to them published on the Internet. Granted, there is still lots of information that we don’t know about and given two or three family members, you can predict a parent or child’s makeup with a certain amount of statistical significance. This will only become more accurate over time.

      However, let’s assume that one of my blood-relative family members did have an issue with it, and I didn’t have a very good relationship with them and decided to publish this information anyway. Now let’s assume that they drag me into a US Court to sue me for that act. Due to the Genetic Information Nondiscrimination Act of 2008, they probably couldn’t get any damages based on potential loss of job or healthcare issues. Perhaps they could sue me for violation of privacy – but even in that case, it’s not their privacy that I violated – directly, anyway. The information that I published was my genetic information. I don’t know how a court case like that would play out, but (and I’m recklessly conjecturing at this point) I would expect the outcome of that case to not be that serious. It seems like a violation of my personal freedoms to not allow me to publish data about myself – perhaps even a free speech issue.

      After all, I “publish” this genetic information every day by touching door handles, scratching my head, eating in public, combing my hair, clipping my nails, blowing my nose, using public restrooms, etc. One could argue that this is not a willful violation of a family member’s privacy, but the information is out there nonetheless. In other words, it’s easy to get at this information. With the vast reduction in cost for these genetic analysis services, I can see morally-gray people anonymously submitting samples for their boyfriends, girlfriends, parents, friends, and suspected cheating spouses. I do see that as a direct violation of privacy, but publishing my own data? That’s a gray area.

      Also, keep in mind how all of our financial data and web usage data is tracked, bundled and sold today. It has been proven that it is fairly easy to figure out who you are on the Web, even if your information is blinded. I see that as an even greater violation of our privacy because it has a direct bearing on our near-term future behavior – what we’re doing, what we’re thinking, what we’re worried about, significant events in our lives, etc. So, is the sharing of public genetic information worse than that? I honestly don’t know – and I don’t know how many people will care enough to file a suit in a court of law. I think it will eventually happen, and there will be rules to govern such behavior, but I would expect the burden of proof as to the monetary damages caused by releasing that sort of information to be high. I can’t imagine sending a family member to jail just because they published their genetic information. I could see fines in the tens of thousands of dollars – but only in the most extreme cases.

      As for what the future holds, who knows – that’s what makes all of this so exciting! 🙂

      The question that I kept coming back to was “What is the worst that somebody could do with this data?” – and I couldn’t find something that would cause more harm than the information that is already out there about us.

    • ManuSporny says:

      Hi Matthew – Thanks for the link. 🙂 – I had seen that site, and will probably post my genetic data on there as well. The reason I chose github is because I wanted to start a source repository that will eventually contain the tools I’ll be developing with some of the folks that I know. One of these is the tool I outline in the last section of the post.

  2. Chris says:

    FYI, there’s a lot of 23andme data available under CC0 as part of the Personal Genome Project — if you sign up there, it will actually be used by researchers.

    The title of the post’s a bit misleading; I was hoping you were publishing a full genome, rather than just 23andme data. Since 23andme mainly looks at common variant alleles, anything they find can’t be particularly strong — rare variants just won’t be covered by their chip. To track rare variants we really need whole genome data, unfortunately!

    • ManuSporny says:

      Thanks, I had seen the Personal Genome Project – but am attempting to do something slightly different.

      I think those that are interested in advances in genetic analysis need to use self-interest as a driver. We need to get as many computer scientists signed up to 23andme and analyzing their DNA as possible. Hopefully this will lead to a virtuous cycle.

      Developers tend to address problems that they’re having – if they’re having trouble visualizing and analyzing their genetic data, they will build tools to do it. That’s one of the reasons that the open source movement has been so successful. Donating your genetic data to research is great, but that’s not the goal for me. To re-iterate: the goal is to get more Web developers and computer scientists involved by showing them that this is a problem that they can attack with their current tool-chains – source control, websites, data analysis tools.

      Sorry that the title of the post was misleading, I’ve since changed it and removed all mentions of the word “genome” where it made sense to do so. I’m new to this genetics stuff and didn’t think the nuance between “genome” and “genetic data” was large enough to matter to folks. I was wrong. I’ve since read the dictionary definition and understand the nuance now.

  3. Nikhil Gopal says:

    Fantastic! I can’t tell you how excited I am to see people publishing their data. I’ve been in bioinformatics for a few years now and there is a general initiative in the field to get people to publish this kind of data. By mining these data we can uncover all sorts of great discoveries!

    Here are three links you may be interested in:

    I am slowly developing and publishing data mining tools for 23andMe data. I am also posting a walkthrough of my analyses if folks would like to re-create them. It’ll be fun. Take a look if you are interested.

    Once again, great job!

  4. Jack Won says:

    Why not just publish the sequences anonymously?

    • ManuSporny says:

      Because I want people to know where the data came from in order to have a discussion about what we can do with the data now that it’s out there. As new capabilities are added to these chips, my data will also change over time, so it’ll be interesting to track my data today, next year, the year after and perhaps even in 20 years. Removing anonymity allows me to have interesting conversations with people about what we can do with this data now that it’s public.

  5. TeMPOraL says:

    The Open Source Community is working hard on improving your code. Some significant bugfixes and optimizations were already performed and are available for you to pull from forked repositories:

    Congratulations for becoming the most patched human in the whole universe!

    • ManuSporny says:

      Haha, that’s hilarious! 🙂

      I know we’re joking about this today, but there will come a day where this joke becomes possible. I’m hoping that it’s in our lifetimes.

      I’m probably still not going to accept the pull request from you when that day comes – people that feel comfortable with Lisp scare me – your brain works in a way that is not normal for a human being. 😛

      • TeMPOraL says:

        You really should try Lisp. It’s better than alcohol, drugs and direct vacuum exposure. It really opens your mind for a different view. ;).

        • ManuSporny says:

          The last time I used Lisp heavily, I couldn’t walk straight for a week. Also, I’m pretty sure part of my mind was violated – those years are hazy. To this day, I still have an active restraining order out against the language. Lisp is beautiful, in the same way that Vera Renczi was beautiful. She will eat you alive, my friend.

          • TeMPOraL says:

            You should try again to embrace Lisp. The Path is still open.
            For great holy armies shall be gathered and trained to fight all who embrace C. In the name of the Lisps, ships shall be built to carry the warriors out among the web and we will spread Lisp to all the unbelievers. The power of the Lambda will be felt far and wide and the wicked shall be vanquished.

          • ManuSporny says:

            See, this is exactly the sort of stuff that I’m talking about. 😛

      • ; msporny created dna and everyone else forked it. This is the family tree.

        Oh god, that is for once so very appropriate.

  6. Danny says:

    It sounds like you’re a web dev guy yourself – if you start the project of connecting snpedia and 23andme and want a hand, let me know, I would be happy to help you build the application.

    • ManuSporny says:

      Awesome, thanks Danny. I think we’re going to do a first cut at some point in the next year and put it out there and then let folks hack on it. We’re doing this in our spare time (while also doing a start-up), so it’ll take a while. I’ll ping you when we get something out there.

  7. Sam Snyder says:

    It’s really cool that you did this! I put my 23andMe data online as well, albeit not in a version control system:

  8. Paulo says:

    Your genome is NOT public domain, your SNPs are. And that’s mildly relevant, or even not relevant at all for anyone else. Only if you have a rare syndrome, but I can’t comment about that.

    • ManuSporny says:

      Yes, I’ve since removed all references to “genome” and replaced it with “genetic data”. Sorry that the title of the post was misleading, I’ve since changed it and removed all mentions of the word “genome” where it made sense to do so. I’m new to this genetics stuff and didn’t think the nuance between “genome” and “genetic data” was large enough to matter to folks. I was wrong. I’ve since read the dictionary definition and understand the nuance now.

      Regarding the comment on relevance of this data – you’re correct, it’s not very relevant to other people doing research. However, it is very relevant to software tool builders, especially web software tool builders. I hope to get more software developers involved by providing data and source code in a form that is familiar to software developers.

      Developers tend to address problems that they’re having – if they’re having trouble visualizing and analyzing their genetic data, they will build tools to do it. That’s one of the reasons that the open source movement has been so successful. Donating your genetic data to research is great, but that’s not the goal for me. To re-iterate: the goal is to get more Web developers and computer scientists involved by showing them that this is a problem that they can attack with their current tool-chains – source control, websites, data analysis tools.

  9. igniman says:

    First, this is not your genome, it’s a bunch of SNPs that 23andme deems important. Second, outside of 23andme there is little value for your data, because 23andme has a large pool of data to make inferences. Third, many of the studies about SNPs are incomplete or inconclusive. Fourth, are you looking for someone to create that SNP browser you are too lazy to build yourself?

    • ManuSporny says:

      Regarding your point #1): Yes, you’re correct. I’ve since removed all references to “genome” and replaced it with “genetic data”. Sorry that the title of the post was misleading, I’ve since changed it and removed all mentions of the word “genome” where it made sense to do so. I’m new to this genetics stuff and didn’t think the nuance between “genome” and “genetic data” was large enough to matter to folks. I was wrong. I’ve since read the dictionary definition and understand the nuance now.

      Regarding your point #2): You’re correct, outside of 23andme’s research team, the data isn’t very relevant to other people. However, it is very relevant to software tool builders, especially web software tool builders. I hope to get more software developers involved by providing data and source code in a form that is familiar to software developers. Developers tend to address problems that they’re having – if they’re having trouble visualizing and analyzing their genetic data, they will build tools to do it. That’s one of the reasons that the open source movement has been so successful. Donating your genetic data to research is great, but that’s not the goal for me. To re-iterate: the goal is to get more Web developers and computer scientists involved by showing them that this is a problem that they can attack with their current tool-chains – source control, websites, and data analysis and visualization tools.

      Regarding your point #3): Yes, many of them are at the moment. That doesn’t mean they will continue to be. The idea is to build a software framework that grows with new discoveries in an automatic way. Just because we don’t have all the data on all the SNPs doesn’t mean we can’t build tools that will one day operate on millions of SNPs vs. just the handful that we know about today.

      Regarding your point #4): Accusing people that you don’t know, in a public forum, of being lazy is a dick move. That said, a couple of friends and I are building that SNP browser and plan to release it as open source software at some point in the next year.

  10. TedC says:

    I wonder when O’Reilly will publish the “The Guide to Public Genome URIs” book? This really is Data’s break out year!

  11. Dan Brickley says:

    I like the text in Github,

    “msporny created dna and everyone else forked it. This is the family tree.” 🙂

    Thanks for the long reply, Manu. I’m not so worried about people suing you, as about the difficulty of evaluating the consequences of actions in such a new field. But I’m not particularly worried at all, really. Maybe you’ll be affecting your great-grandchildren’s insurance premiums? Worse things have happened.

    BTW a long time ago, I put a foaf:dnaChecksum property into FOAF as a kind of joke, a warning that open data-sharing technology could be combined with biometrics in unsettling ways. The joke is fast getting old, and maybe the property could actually be defined now.

    Is the 23andme data enough to indicate someone’s race / ethnic origin? is fascinating…

    • cariaso says:

      Manu, you want the killer web 2.0 ui and so do I. Here’s what I’ve got so far. It’s got a long way to go, but perhaps it’s more than you’ve seen.

      > Is the 23andme data enough to indicate someone’s race / ethnic origin?
      It is indeed. Manu here is clearly not Caucasian as indicated by his rs1426654. A more dramatic example is the blue eyes and European ancestry of Orta
      whose genome is also available via github

      • ManuSporny says:

        Hi Mike!

        I’m a very big fan of the work that you do! I think SNPedia is fantastic – very forward looking stuff. I also really like Promethease. Great work on all of that stuff so far! We want to help make this ecosystem even more awesome. Thanks for doing the Promethease run on my genome, we’ve been looking at Promethease for inspiration, as well as some other sites like Google Instant, etc.

        We’ve spent the past several years building a Web Services platform in pure C++ called Monarch. We’re thinking of applying it to this problem. Our CTO already has the 23andme data loading into that system and querying for particular SNP sequences. We think that we can get basic “Google Instant”-like searching on the SNPedia data in a few months’ time. The lag in time is because we only work on it in our spare time. Really, our company doesn’t do anything w/ genetics – it’s just a really interesting problem that we like to talk about when we’re not talking about the stuff our company works on.

        Thanks for introducing yourself here – we’ll definitely be in touch when we have something to share with the global community of genome hackers. I’ll announce the release through this blog, my twitter stream, and the code will be checked into github under an open source license.

  12. rhem1224 says:

    Absolutely brilliant! I’ve got a background in Computational Systems Biology and I am now working in Marketing (for one of the above mentioned companies). I have so many visions for the field of personal genomics and this is a great, bold step in the right direction. Data interpretation and data visualization are going to be one of the next vital steps in bringing these advancements to the public and I’m really excited to see what the github community can make of it. Overall it’s a very complex socio-economic and scientific problem, but that’s also what makes it so exciting.
    I commend you! Thanks for taking this step.

    • ManuSporny says:

      Thanks Rebecca – glad to know that you’re in this field as well and really excited about what’s to come. 🙂

      We plan to release some software as open source that allows you to do real-time searching on your genome in the next couple of months to a year. We hope that this will be a step toward the “data interpretations” and “data visualization” of the future that you’re talking about.

  13. w00t says:

    FML: Fork My Life.

  14. Zack says:

    Very interesting, Manu.

    I am working on a project to analyze biogeographical ancestry for South Asians. I would appreciate it if you would let me use your data for the project. It would be great if you would send me an email with some of your info to make the analysis more useful. Thanks!

    • ManuSporny says:

      Hey Zack,

      Please do use my data for the project – after all, that’s why I placed it into the public domain. I’m from Sri Lanka originally. My mother is full Sri Lankan, and my father was American (German/Polish). What sort of info do you need from me? I would be happy to help.

      • Zack says:

        Great, that’s the info I needed to place your ancestry analysis results in context with your actual ancestry. Thanks!

        I should have some admixture results from your genetic data up on my site this week.

  15. forum flood says:

    That’s wonderful!
    Congrats for this decision 😀

    • Lidia says:

      Thanks for the coverage! I am definitely a ‘liberal’ when it comes to sequencing, and the venues in which it should be applied. I think that the diagnostic benefits that can be obtained will revolutionize medicine. I would also agree that current US medicine is not ready for routine clinical sequencing. However, I think that the rate of development of genomics and genetics provides the US healthcare industry an opportunity to evolve over the next decade. Open more Genetic Counseling programs in order to train more counselors. Improve on genetic training of doctors. And take the opportunity to educate the general public. I see these challenges as opportunities, not insurmountable obstacles. Therein lies hope for success. Keep up the great blog.

  16. Juuso Alasuutari says:

    I’m waiting for the completion of my own genome’s analysis at You’re not helping at all in trying to keep my excitement within reasonable limits! 😉

    About 4-6 weeks to go…

    • ManuSporny says:

      I was excited before I got the results, but even more after I got them and understood the amount of information that they give you. Pretty great stuff – you will be very happy with the amount of new information you will have on yourself.

  17. Greg says:

    Found an exploit for your genetic code!

  18. ukash says:

    Hi, ManuSporny,
    I can’t find in your genetic data genotype with rsid rs333 (for example)
    I mean:
    I try to find out structure of your genome file and connect this with SNPedia, at first to understand that and to write something useful in next step.

    All good for you from Poland!

    • cariaso says:

      23andMe does not test everything. rs333 is one of the places they do not test.

    • ManuSporny says:

      I guess that rs333 is one of the markers that is not tested in the beadchip that 23andme uses. Some of my other SNP data came back as not detected when some of my friends had theirs correctly detected – for blood type, I think. There was an SNP marker that helped identify my ABO blood type that wasn’t detected on the gene chip. Not all SNPs are checked using 23andme’s current beadchip.

  19. il.anso says:

    “The question that I kept coming back to was “What is the worst that somebody could do with this data?” – and I couldn’t find something that would cause more harm than the information that is already out there about us.”
    Here are three that come to mind:
    1. What if you end up being found a direct descendant of Hitler or Stalin? Or Howard Hughes?
    2. What if some advanced malevolent aliens were to get a hold of our common blueprint (which in its electronic form, as opposed to your doorknob fingerprints would be accessible across parsecs of space)? Lèse majesté would pale in gravity by comparison.
    3. What if a super-ability can in the near future be inferred from your data that would make you subject to immediate police quarantine or perpetual paparazzi attention?

    • ManuSporny says:

      #1) I don’t think it would really affect me that much. Genetics play such a small part of who we are as people – just because your grand-parents were good/evil doesn’t necessarily mean that you are going to be the same. I don’t think it would bother me much to find out I was a direct descendant of anyone good or evil.

      #2) I think my privacy and clones of me would be the least of my worries if there were a group of extra-terrestrials that were capable of cloning and travelling through space in order to get to Earth.

      #3) I would be thankful for the super-ability. As for quarantine – I would expect human rights issues to enter the picture. If it were really that harmful or dangerous to the world, I would expect that the quarantine would be a healthy precaution. Hard to tell until it happens, if it ever happens. 🙂

      The first question is fairly easy to answer; the second and third questions assume a fundamental shift in the way that we think about our place in the universe and what humankind is capable of achieving. The latter two are risks I am willing to take, based on the very low probability that they will come true in my lifetime.

  20. gioby says:

    Why don’t you put your repo on bitbucket, where you won’t have a disk space limit? If you are going to put all your data and the results of analysis there, it will almost certainly take one or two hundred megabytes, and you will be forced to pay for an advanced github user account.

    • ManuSporny says:

      I use github for all of my other projects – it just works. I’ve only used bitbucket a few times, and it was fine, but I really like github’s interfaces and tie-ins (IRC channel checkin announcements, for instance). If I did hit the 300MB limit for github, I’d happily pay for an account at $7/month. They do a great job and I think it’s just a matter of time before I sign up for their for-pay service.

  21. Interesting post, Manu. I work at Infochimps, which is a marketplace for data. You can make your 23andme data free and open to the public on our website at no cost. Although we are a marketplace, the majority of our data sets are available to the public at no cost as a service to the community. Many of our users are capable of creating algorithms that could improve our knowledge of what particular genetic patterns mean.

  22. Latj says:

    The CEO of Illumina has also given out 350GiB of his whole genome. They are raw FASTQ files, with no alignment or anything.

    • ManuSporny says:

      Cool, didn’t know that was out there – although, I wouldn’t know how to read the data. As you’ve said – quite a bit of post-processing is required for the genome, right? If I could afford the $19,500 price tag to get my entire genome sequenced, I’d do it. For now, 23andme will have to suffice.

  23. Happyfeet says:

    Here is our digital Adam 🙂

  24. palisade says:

    I’ve written a piece of software that can analyze your DNA and help pick out the interesting tidbits. Worth noting, one of your pairs exists only in people of Asian descent. I left the notes in my office, so I can grab them tomorrow and let you know which one it was.

  25. zornarf says:

    This is a serious account of sloppy programming and bad leadership of an opensource project!
    I have submitted 26 patches in the last couple of days and none of them have been merged with the main tree.
    There are several bad termination issues and a lot of excess data.
    Please run this project properly or we will have to Fork.


    A concerned developer

  26. palisade says:


  27. lysdexia says:

    I hear and obey:

    This’ll get your (or whoever cares to add their own) genome into couchdb for sorts. I’ll work on some of the rest of it this weekend.

    Have fun!

    • ManuSporny says:

      Very cool – what sort of project are you thinking of doing once you have the CouchDB instance up and running? I hadn’t even thought about dumping the data into CouchDB – we’ve been using SQLite ourselves. You had said that this takes up 6GB? Why is that? Indexes in CouchDB?

      In any case – really cool that you’re working on this and would like to hear more as you progress. 🙂

  29. Andrew Evans says:

    Hi Manu –

    We recently released an open-source Firefox browser extension for 23andMe users (soon to include others) that enhances the content of web pages, like blog posts and journal articles, with your actual raw 23andMe calls – so you don’t have to go look them up. It’s called SNPTips – I think you may find this useful. We were featured this week on OpenHelix as the Tip of the Week.


  30. J Prusik says:

    Wow, I never thought anyone would actually do it.

    Color me impressed.

  31. Pierre-Luc Germain says:

    Hi Manu,
    I know you mentioned that (biomedical) research was not your aim, but I believe it’s quite connected with the webtools you are advocating. Seeing the composition of the audience here I could not resist asking your thoughts on the topic…
    Considering the potential research outcomes of the 23andMe database (especially in pharmacogenetics), I find it quite odd that they are giving their users surveys about smelling asparagus or sneezing in the sun. Fear of concerns about privacy/informed consent?
    On the other hand, Google Health is offering the possibility to have your online health record under your control. Many months ago, when a user asked them whether they would allow importation of 23andMe data, the question received no reply (to please conspiracy theorists, yes, I suppose M. Google and his wife thought about the answer already). If you do put the two together and fish out correlations (let’s pretend it’s an easy task), you can get an amazing source of information (not so much in a straightforward way like 23andMe present it, but the effect on our understanding of drugs’ adverse effects, for instance, might be quite big).
    From what I understood of the Google health privacy policy, they won’t use information in this way without asking for explicit consent. Whatever they do in the future is still open, but if they do something, the statistical information will be in their hands. My point is: it’s great that you open-sourced your SNPs, but it would be even greater to open-source their correlates.
    If we had an open “Google Health”-like platform, but with the idea that you “publish” (albeit partly anonymously) this information rather than having the service keep it for you (we’ve seen with Facebook that most people don’t have such a problem with privacy), and that anyone can access and query the data, we could help fill up SNPedia without having to ask a private company for the info.
    Briefly said, putting 23andMe, Google Health, and PatientsLikeMe together, open, to get a web 2.0 version of the famous “Framingham study”.
    Of course, for the sake of brevity, I’m skipping all the tricky points.
    I wouldn’t hesitate to put my medical cv and SNPs there. I’d tend to think Manu wouldn’t either. My question is whether we’re crazy, or whether a critical mass would be just as crazy. (If I hadn’t learned to be suspicious of my enthusiasm, I’d already be coding).
    BTW nice initiative, and sorry for the length!

    • ManuSporny says:

      “My question is whether we’re crazy, or whether a critical mass would be just as crazy.”

      I don’t think it’s crazy – I think it’s inevitable. It may take us 30-40 years to get there, but the drawbacks of placing your genetic information into the public domain (blinded or not blinded) are not that terrible at the moment. Assuming that nothing changes drastically with privacy, genetics, or bio-data-mining – the benefits will outweigh the drawbacks by a large degree.

      One of the things that I think the Web excels at is bringing people together with similar viewpoints and desires into a loosely-knit online community. Statistics is such that we don’t really need to get that much data to start finding out some pretty amazing things. I do think that scientific discovery and annotation of the human genome should happen in the public, not in private platforms. That is not to say that those that come up with novel ways of analyzing the information shouldn’t be rewarded financially, but rather that the data should be out there to be built upon. In other words, Google exists because they are able to search and find meaning in all of the Web pages that are out there. Google is a very rich company for the services that they provide. A Google for genetics could just as easily exist as long as the basic data is out there — and you really wouldn’t need a great deal of data to start extracting benefits for society at large.

      There is a fine line between crazy and early adopter. I think you are an early adopter and much of your thinking will be validated over the decades to come.

      • Pierre-Luc Germain says:

        Thanks for your answer!
        “A Google for genetics could just as easily exist” – Of course, I’m not doubting it! But as much as I like google, I’m just wondering whether I really want *them* to do it – rather than some open source community…

  32. मनु says:

    -> this sounds kindofweird in context :

    “In various Hindu traditions, Manu is a title accorded to the progenitor of mankind […]”

  33. Zvi says:

    Anyway, there is already data about us on the web, isn’t there?
    So it is better that I publish my personal data myself than have it published by others, isn’t it?

  34. Bastian says:

    I’m a customer of 23andMe myself and I already published my genotyping results. Being a biologist with a little background in bioinformatics myself, some friends (also biologists) and I decided to start a project that is dedicated to the collection of genotyping raw data from DTC customers, along with their phenotypic information. We recently released our project, dubbed openSNP, into the wild.

    We built the site to create a central repository for genotyping results, such as those from companies like 23andme or deCODEme, annotated with phenotypic information the customers have provided us. But the site also parses PLoS, Mendeley and SNPedia for the newest literature results, so customers get to know more than the aforementioned companies tell them.

  35. Sever says:

    On the macro rather than the micro (Genom…) level, the best approach is to publish open data; for example, see the open personal page (only for those older than 18!).

  36. Marijn says:

    Thanks for putting this interesting data online, free for download! I made a visualisation of this data in Processing. You can watch a movie of the visualisation on my website.

  37. Jason Bobe says:

    Have you considered joining the Harvard Personal Genome Project? This is a cohort of individuals who are open sourcing their biologies (genomes + microbiomes + environments). Public data here:

    A subset of these folks meet-up each year at the GET Conference:

    Best wishes,
    Jason Bobe
    Executive Director

  38. Logan says:

    I recently did both the AncestryDNA and 23&me and one thing I noticed when comparing the raw data files from both is there are a lot of rsid’s missing from both that the other has, but the one’s they are are in the same order.

    I’m wondering if there is a “master” rsid list that would allow me to shuffle the rsid’s together in proper order. I.e. insert the missing ones from the one into the other and vice versa? So that I’d have a more complete sequence to share.

    • Logan says:

      “but the one’s they are are in the same order.”

      should read:

      “but the one’s they do have in common are in the same order.” :ob

Trackbacks for this post

  1. Man places his genome in public domain, on Github « Interesting Tech
  2. “My Genome is Public Domain” http://manu… « The United Persons
  3. Bruce Lawson’s personal site  : Reading List
  4. Fork me – Dados Genéticos no Github | Agulha no Palheiro
  5. 开源开发者开源其基因数据 | IT News
  6. eamcet*[SEO対策調査自動更新ブログ] | GitHubでDNA情報が公開される
  7. Samat K Jain (samatjain)'s status on Monday, 14-Feb-11 04:37:25 UTC -
  8. Links 14/2/2011: GNU/Linux Education in Valencia, London Stock Exchange Goes Live With GNU/Linux | Techrights
  9. Monday Links from the Subsidised Canteen Vol. LIX
  10. Перша людина з відкритим кодом « Блог одного кібера
  11. Genotyp Open Source? « : blog
  12. Revision 14: Hashbangs, PhantomJS und Github-Gene | Working Draft
  13. Working Draft Revision 14: Hashbangs, PhantomJS und Github-Gene • Peter Kröner, Webdesigner & Frontendentwickler
  14. Энтузиаст создал в GitHub открытый проект c расшифровкой своего ДНК | – Всероссийский портал о UNIX-системах
  15. Энтузиаст создал в GitHub открытый проект c расшифровкой своего ДНК
  16. Links for 2011-02-10 through 2011-02-15 » MC Development
  17. Впервые геном выложен под лицензией CC |
  18. Энтузиаст создал в GitHub открытый проект c данными о своей ДНК | – Всероссийский портал о UNIX-системах
  19. Энтузиаст создал в GitHub открытый проект c данными о своей ДНК |
  20. Man Open Sources His Genetic Data | JetLib News
  21. Open Source DNA
  22. Open Sourcing Genetic Data » Essential Liberty
  23. Project Update | Harappa Ancestry Project
  24. The Implications of Genetic Data in the Public Domain | The Beautiful, Tormented Machine
  25. 开源开发者开源其基因数据 | Article2 Web
  26. » El primer genoma argentino, y con licencia CC-By-SA 3.0
  27. What License Controls Your Genes? | Turn On The Dark
  28. Генетични данни в открит достъп @ Петър Иванов
  29. Bruce Lawson’s personal site  : Notes on Contents Strategy Forum 2012, Cape Town
  30. Noli Irritare Leones » My Old Kentucky Home
  31. Do we need a Human Data Project(HDP)? | Kumar Thangudu
  32. SimoleonSense | Part 1. Genomics, Bioinformatics, & Bio-Hacking
  33. GitHub Saved My Marriage | Software Engineer Information
