Genome, Big Data and Google

Posted on November 7th, 2014 in Blog Posts, Privacy & Security

Google is offering cloud storage and genomics-specific services for genome databases. It is unclear (to this blogger) what level of anonymity can be assured for such data. Presumably a full sequence (perhaps 100 GB of data) is unique to a given person (or to a set of identical twins, since this does not yet include epigenetic data), which makes it a specific personal identifier even if it lacks a name or Social Security number. Researchers can share data sets with team members, colleagues or the public. The National Cancer Institute has moved thousands of patient datasets to both Google and Amazon cloud storage.
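To put a rough number on the "unique identifier" point, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simplified model that is not from any real forensic or genomic pipeline: independent two-allele markers (SNPs), Hardy-Weinberg genotype proportions, and an assumed allele frequency of 0.5. The function names and the world-population figure are illustrative only.

```python
import math


def match_probability(allele_freq: float) -> float:
    """Chance that two unrelated people share a genotype at one marker.

    Assumes a biallelic marker in Hardy-Weinberg proportions (a simplification).
    """
    p, q = allele_freq, 1.0 - allele_freq
    genotype_probs = (p * p, 2 * p * q, q * q)  # AA, Aa, aa
    # Two people match if both land on the same genotype.
    return sum(g * g for g in genotype_probs)


def snps_needed(population: float, allele_freq: float = 0.5) -> int:
    """Smallest number of independent markers whose combined random-match
    probability drops below 1 / population (a rough uniqueness criterion)."""
    per_snp = match_probability(allele_freq)
    return math.ceil(math.log(population) / -math.log(per_snp))


if __name__ == "__main__":
    world = 8e9  # assumed population size, for illustration only
    print(f"Per-marker match probability: {match_probability(0.5):.3f}")
    print(f"Markers needed for ~{world:.0e} people: {snps_needed(world)}")
```

Under these assumptions only about two dozen markers are needed, while a full sequence contains millions of variable positions. Real markers are neither independent nor evenly distributed, so treat the figure as an order-of-magnitude illustration of why stripping the name from a genome does not make it anonymous.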

So here are some difficult questions:

If the police have a DNA sample from a “perp”, search the public genome records, and find a match, or a parent, or …, how does this relate to legal rights in the U.S. (or another jurisdiction)?  Can Google (or the researcher) be forced to identify the related individual?

Who “owns” your DNA dataset? The lab that analyzes it, the researcher, you? And what can these various interests do with that data? In the U.S., federal law (the Genetic Information Nondiscrimination Act of 2008) prohibits discrimination based on this data in health insurance and in most employment decisions, but not in long-term care insurance or life insurance.

Presumably, for $1,000 or so, I can have any DNA sample sequenced: off a glass from a restaurant, or from some other source that was “left behind”. What rights and limits apply to that collection and the resulting dataset? Did you leave a coffee cup at that last staff meeting?

The technology is running well ahead of our understanding of the implications here — it will be interesting.

Image: National Human Genome Research Institute [Public domain], via Wikimedia Commons