DNA, Privacy, and the Golden State Killer

A while ago Tom asked me a question about DNA and privacy. I don't think I had much of an answer then, but today I have a new perspective.

Yesterday news broke that police had caught the Golden State Killer using publicly available DNA databases that are intended for genealogy or DNA research. And they did it without a search warrant. They didn't need one for the way they used the database(s). So far the only DNA database that has been publicly identified is GedMatch.com, which generally shows users the least personally identifiable information of any of these sites.

When your DNA is on Ancestry or MyHeritage, it is normally attached to a family tree. After all, that is the point of using DNA on one of those sites. When I look at my DNA Matches on Ancestry, near the top of the list is my cousin John. I see his full name, so I know it is John. He hasn't put in much of a tree yet, but if he added our grandmother to his tree, Ancestry would tell me not only that John is a DNA match but also that we both have Mary Elizabeth Parr in our family trees, so we are first cousins.

A few years ago, the only way to get your DNA into most of the DNA databases was to spit in a tube (or swab the inside of your cheek), mail the tube off to the company you paid, and let them add your results to their database. But one database was different: GedMatch.com. They did not sell any tests. They let anyone upload DNA results to their database for free. This allowed genealogists and DNA researchers to reach across all the testing companies and do comparisons in one place. It is also a place where researchers can test their theories about genetics and the tools they develop. Now many of the testing sites let you upload DNA results you already have instead of spitting in their tube.

GedMatch lets you do fun and serious comparisons between people. If you upload both of your parents' DNA, you can run a test to see if they are related. Or you can view a chromosome map for you and a potential relative that shows every place on each chromosome pair where you match. But on GedMatch.com, all the DNA kits on your account have code names that you created when you uploaded the files. You can connect kits to people on a tree, but most people keep their trees pretty sparse here, with very little identifiable data. Bernd and I manage a lot of DNA kits for family members, and other than their code names, the only identifiable piece of information on them is my email address.

Let's get back to the Golden State Killer. If the police did not get a search warrant, then they must have simply uploaded the DNA results from the crime scene and given them a code name. Then they could compare that DNA to everyone else in the database. Once they found a reasonable match, the only thing they could do without a warrant was email the person who manages that DNA kit and ask for help. If they found enough reasonable matches, they could have collected several email addresses and found more people to help them figure out who the serial killer was. It's not so very different from talking to everyone in a neighborhood of interest.

What is the concern about privacy? It boils down to the fact that your DNA is not as unique to you as you might think: it is shared with many others in your extended family. How many in your family have the same hair color, or the same shape of eyes or nose? Does your daughter look like your aunt? Do you have a cousin who looks more like you than your brother does?

You do not need to submit your own DNA to a database to have most of your DNA in one of these databases, because every bit of your DNA is shared with someone else in your extended family. And through your family, someone could find you.