Easy DNA Identifications With Genealogy Databases Raise Privacy Concerns

Oct 11, 2018
Originally published on October 11, 2018 6:53 pm

Police in California made headlines this spring when they charged a former police officer with being the Golden State Killer, a man who allegedly committed a series of notorious rapes and murders in the 1970s and '80s.

Authorities revealed they used DNA from a publicly available genealogy website to crack the case.

Since then, police around the country have started doing the same sort of thing to solve other cold cases.

That prompted Yaniv Erlich, the chief science officer at the Israeli company MyHeritage, to investigate just how easy it is to use public genealogy databases to track down people.

"We wanted to quantify how powerful this technique is to identify individuals," Erlich says. So he and his colleagues analyzed the genomes of 1.28 million people in the company's database.

In a paper published Thursday in the journal Science, the researchers projected that they could identify third cousins and more closely related relatives in more than 60 percent of people of European descent. (They chose this group because most people in their database have that ancestry.)

"It's kind of like each person in this database is a beacon that illuminates hundreds of distant relatives," Erlich says. "So it's enough to have your third cousin or your second cousin once-removed in these databases to actually identify you."

And when the researchers combined their strategy with other information, such as a specific geographic area or the approximate age of a person, they could quickly reduce a list of possibilities to just a few people.

"Of course, you need the genealogical records. You need to do the work. But you have enough power to get very close," Erlich says.

And that's not all. Erlich estimates that as his and other databases grow, investigators will, within a few years, essentially be able to identify anyone in the United States of that ethnic background.

"It seems that very quickly we can get virtually to nearly everyone," Erlich says.

In another part of the study, the researchers went even further to see if they could do the same thing with other DNA databases. They were able to use their techniques to identify a supposedly anonymous woman whose DNA was stored in the 1000 Genomes Project, a National Institutes of Health research database.

"This technique doesn't only get you criminals," Erlich says. "You can also use this technique for other purposes — maybe purposes that could be illegitimate."

And that, he says, raises serious questions about privacy.

"The police currently [are] using these techniques to find ... [murderers] and bad people," Erlich says. "But are we OK with using this technique to identify people in a political demonstration who left their DNA behind? There are many scenarios that you can think about misuse."

But some people involved in genealogical forensics defend the use of the techniques to help solve serious crimes.

"I was excited to see this demonstration that genetic genealogy is so powerful," says Ellen Greytak, director of bioinformatics at Parabon Nanolabs, Inc., which helps police solve crimes this way.

"We're working on these cases that haven't been able to be solved for decades. They are all either homicide or sexual assault. And some of these are horrific," she says.

But Greytak and her colleagues caution that the study makes the process seem easier than it is.

"There are a number of problematic assumptions made in the study that do not reflect the reality of the work I am doing," writes CeCe Moore, who works with Parabon, in an e-mail. "The study demonstrates the power of genetic genealogy in a theoretical way, but does not fully capture the challenges of the work in practice."

But others argue that the findings underscore the need to make sure people know what they're getting into when they provide their genetic information to genealogy services and other databases.

"When you make those decisions to put the genome out in the world it's really hard to dial it back," says Erin Murphy, a professor at the New York University School of Law.

"And more importantly," she says, "you've made a decision not just for yourself but for your siblings, for your distant cousins, people you don't even know you're related to, for your children, for your children's children."

A second paper published Thursday in the journal Cell found that it could be possible to link ancestry databases to older law enforcement DNA databases, giving police yet another potential tool.

"We were trying to pose the question of whether a newer, more modern system of genetic markers could be tested against the old system and still get matches and find relatives," says Noah Rosenberg, a biology professor at Stanford University.

Taking these studies together, some bioethicists and legal experts say they show that it's important to take steps to protect genetic information and make sure people providing DNA samples are aware of the risks.

"We can tell people that we can de-identify their data," says Benjamin Berkman, a bioethicist at the National Institutes of Health, who was speaking for himself, not NIH. "We can tell them about all the procedural and technical safeguards that we've put in place to protect the confidentiality of their data. But I don't think we can promise people anonymity."

As a result, Berkman says, "it's incumbent on anyone collecting and aggregating and sharing genomic data to be clear exactly how the data will be treated and whether there are any risks to genomic privacy."

For his part, Erlich proposes that all genetic information be encrypted to protect the information and enable people to explicitly provide consent for using their data.

"It sounds geeky and complicated, but it's very simple in practice," Erlich says.

Copyright 2018 NPR. To see more, visit http://www.npr.org/.

MARY LOUISE KELLY, HOST:

Millions of Americans post our ancestry information online. New research is showing just how easy it is for law enforcement to use this data to zero in on relatives who may have committed a crime. NPR health correspondent Rob Stein has the details.

ROB STEIN, BYLINE: Police made headlines last spring when they finally nabbed a suspect for a series of brutal rapes and murders in California from the 1970s and '80s.

(SOUNDBITE OF MEDIA MONTAGE)

UNIDENTIFIED PERSON #1: Has the Golden State Killer finally been captured?

UNIDENTIFIED PERSON #2: California investigators used gedmatch.com to name Joseph James DeAngelo as a suspect in a...

UNIDENTIFIED PERSON #3: Used genealogy websites to try to identify the notorious Golden State Killer.

STEIN: And that was just the beginning. Police around the country have started doing this same sort of thing to solve other cold cases. So Yaniv Erlich at the company MyHeritage wondered, just how easy is it to use databases like his to find people?

YANIV ERLICH: We wanted to quantify how powerful is this technique to identify individuals.

STEIN: So Erlich and his colleagues analyzed the DNA from more than 1.2 million people in this company's database and discovered something startling. For more than 60 percent of people of European descent, they could identify a relative as distant as a third cousin. Most of the people in this database are white.

ERLICH: Each person in this database is a beacon that illuminates hundreds of distant relatives. So it's enough to have your third cousin or your second cousin once removed in these databases to actually identify you the same way that a GPS system uses multiple satellites to find a location.

STEIN: And when the researchers combined their data with other information like where a person probably lives and how old they are, they could quickly zoom in on a suspect.

ERLICH: Of course you need the genealogical records. You need to do the work. But you have enough power to get very close.

STEIN: And Erlich used the same technique to identify a supposedly anonymous woman whose DNA was stored on a National Institutes of Health research database, raising questions about how anonymous these supposedly anonymous databases really are.

ERLICH: This technique doesn't only - can get you criminals or you can catch criminals with this technique. But you can also use this technique for other purposes, maybe purposes that could be illegitimate.

STEIN: So Erlich says the findings raise questions about how genetic information could be misused.

ERLICH: The police currently is using these techniques to find these really like, you know, murderers and bad people. But are we OK with using this technique to identify people that - I don't know - in a political demonstration that left their DNA behind? Are we OK with it if foreign governments are going to exploit this technique to identify U.S. citizens for their own purposes? So there are many scenarios that you can think about misuse.

STEIN: Now, many people defend the use of these techniques to help solve serious crimes.

ELLEN GREYTAK: I was excited to see this demonstration that genetic genealogy is so powerful.

STEIN: Ellen Greytak works at Parabon Nanolabs, which helps police solve crimes this way.

GREYTAK: We're working on these cases that, you know, haven't been able to be solved for decades. They are all either homicide or sexual assault. And some of these are just - I mean, they're horrific, you know, murders of children, things like that. It's just things that need to be solved.

STEIN: And Greytak says it's not nearly as easy as this new research may make it sound. But others argue that the findings underscore the need to make sure people know what they're getting into when they agree to give up their genetic information. Erin Murphy is a New York University law professor.

ERIN MURPHY: If it comes out tomorrow that they can use genetic information for something that feels a little unsavory, it's going to be virtually impossible to claw back the information that you've put out into the world. And more importantly, you've made a decision not just for yourself but for your siblings, for your distant cousins, for people you don't even know you're related to, for your children, for your children's children.

STEIN: So Murphy and others think better ways are needed to protect people's DNA to make sure it isn't misused. Rob Stein, NPR News.

(SOUNDBITE OF ASH BLACK BUFFALO'S "BUHO")

Transcript provided by NPR, Copyright NPR.