Showing posts with label haplogroup. Show all posts

Friday, December 5, 2014

DNA Convergence and Chicken Little

   For me, the topic of convergence in yDNA first came up early in 2014.  I had just posted a paper and one of the comments was – “What about convergence?”  I said to myself, “What convergence?”  I admit I had to look up the topic.

Convergence: A term used in genetic genealogy to describe the process whereby two different haplotypes mutate over time to become identical or near identical resulting in an accidental or coincidental match. - Turner A & Smolenyak M 2004.

My response back to the comment was - “All of the haplotypes in my paper are unique.”  My data did not exhibit convergence. 
Convergence casts a shadow on genetic genealogy
   I started to poke around on the topic of convergence within yDNA STR haplotypes and the immediate impression that I got was that folks were ready to give up on STRs in favor of SNPs and the sky was falling.  Chicken Little was running around in the genetic genealogy circles.  Here is a small sample:

“Y-STRs are effectively dead” - Dienekes Pontikos, 2011

“Convergence of Y chromosome STR haplotypes from different SNP haplogroups compromises accuracy of haplogroup prediction” – Wang, et al, 2013

   Okay, convergence happens, but it’s an illusion.

   Let’s take a big step backwards in this story.  Did you know that most scientific papers relating to genetic genealogy use 17 STR markers or fewer?  Some use as few as 9 or 10.  For any of you who ever took one of the original 12 STR marker tests, you know that the results were essentially useless for anything except deep haplogroup association and history.

   Many researchers in the last couple of years are using the AmpFLSTR® Yfiler® to get their 17 marker results.  This equipment is approved for forensic cases.  Research papers are not forensic cases and researchers don’t need to limit themselves to 17 markers.  Thirty-seven marker yDNA tests have been available since 2004.

   Why does the number of STR markers matter?  I’m going to release my inner math geek to help explain.  If we look at marker DYS19, usually listed first in science papers and third in Family Tree DNA results, it can have a value within the range of 7 to 22 across all haplogroups.  Looking at R1b specifically, DYS19 ranges from 10 to 17, and statistically at two standard deviations (2 sigma) the range of values narrows to 13, 14 and 15.  From a probability point of view, that leaves three likely values, so any one of them has roughly a 1 in 3 chance.  Making the odds even better in our favor, 95% of the time DYS19 for R1b will already be 13, 14 or 15.  Since the marker already sits on one of those three values, only two others remain for it to move to, which means there is a 1 in 2 chance that DYS19 changes to the particular value it needs on its way to converging with another haplotype.
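
   As a rough sketch of that narrowing step, the Python below keeps only the repeat counts that fall within two standard deviations of the mean and then counts how many alternatives are left.  The sample counts are invented for illustration and are not real R1b data, and the helper name `likely_values` is hypothetical; the exact procedure behind the figures in this post may differ.

```python
from statistics import mean, pstdev


def likely_values(observed, n_sigma=2.0):
    """Return the distinct repeat counts within n_sigma standard deviations of the mean."""
    mu = mean(observed)
    sigma = pstdev(observed)
    low, high = mu - n_sigma * sigma, mu + n_sigma * sigma
    return sorted({v for v in observed if low <= v <= high})


# Invented DYS19 repeat counts, for illustration only (NOT real survey data).
sample_dys19 = [14] * 80 + [13] * 8 + [15] * 8 + [10, 12, 16, 17]

values = likely_values(sample_dys19)
# The marker already holds one of the likely values, so it can move to the others.
alternatives = len(values) - 1

print(values)        # [13, 14, 15]
print(alternatives)  # 2
```

   With this made-up sample the outliers 10, 12, 16 and 17 fall outside the 2 sigma band, leaving three likely values and two alternatives for the marker to move to.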

   Taking standard deviation into account to determine the possible number of values for the STR markers and then multiplying each probability gives the odds that a haplotype could converge.

STR         # of possible marker values
DYS393      2
DYS390      4
DYS19       2
DYS391      2
DYS385a     2
DYS385b     4
DYS426      1
DYS388      1
DYS439      2
DYS389i     2
DYS392      2
DYS389ii    2

Total       4096

   There is a 1 in 4096 chance that two R1b 12-marker haplotypes could converge.  This is not the probability that one marker will change.  This is the probability that all 12 markers will change enough to match another haplotype.  Those odds make convergence quite likely, which is why a 12-marker test is practically useless.
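
   The multiplication behind that figure can be checked in a few lines of Python.  This is only a sketch of the arithmetic: the per-marker counts are copied from the table above, while the dictionary name `possible_values` is made up for the example.

```python
from math import prod

# Number of possible values per marker for R1b, as listed in the table above.
possible_values = {
    "DYS393": 2, "DYS390": 4, "DYS19": 2, "DYS391": 2,
    "DYS385a": 2, "DYS385b": 4, "DYS426": 1, "DYS388": 1,
    "DYS439": 2, "DYS389i": 2, "DYS392": 2, "DYS389ii": 2,
}

# Multiply the counts to get the number of distinct 12-marker combinations.
combinations = prod(possible_values.values())

print(f"1 in {combinations:,}")  # 1 in 4,096
```

   Adding more markers simply extends the dictionary, which is why the number of combinations grows so quickly from 12 to 17 to 37 markers.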

   With a high probability that 12 STR markers will converge, haplotypes start to blend together.  Two different haplogroups or family lines will appear to be the same.  Converging also means that when we calculate the time to the most recent common ancestor (TMRCA), it will look like less time has passed.  Convergence makes a 12-marker test result unusable for genealogical matching, haplogroup prediction and TMRCA calculations.  The Chicken Littles are correct: we have a problem with 12-marker STR results.

   What about 17 markers, a quasi-industry standard for science papers?  Taking the same approach with statistics and probability, a 17-marker yDNA R1b result has a 1 in 2 million chance of converging with another haplotype.  Each haplogroup has slightly different odds.  There is a 1 in 500,000 chance of an R1a 17-marker haplotype converging.  Those odds are better than any lottery.  Convergence is still a problem at 17 markers.

   When Dienekes Pontikos proclaimed the death of yDNA STRs, he was commenting on the attempt to get good TMRCA dates from 10-marker results.  I agree, you can’t get valid TMRCA dates from 10 markers.  When Wang, et al, determined that convergence compromises haplogroup prediction, they were correct: 17-marker haplotypes can converge to make one haplogroup look like another.

   In a quick analysis of 4,300 unique 37-marker R1b haplotypes, the average genetic distance is 17 steps for 37 markers.  That means about 17 mutations are required for convergence in a 37-marker haplotype.  Nearly half of the markers in the haplotypes would need to change.  When we look at the probability of 25-marker haplotype convergence, the chances are 1 in 84 million.  Considering there are about 3.6 billion men on the planet, 1 in 84 million is still in the realm of possibility.  By the time we get to 37 markers, the odds are 1 in 49 trillion.
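
   For readers who want to see how an average genetic distance can be computed, here is a minimal sketch.  It assumes a simple stepwise model in which the distance is the sum of the absolute differences in repeat counts, marker by marker; the three 6-marker haplotypes are invented, the function names are hypothetical, and the analysis described above may have counted steps differently.

```python
from itertools import combinations


def genetic_distance(a, b):
    """Stepwise distance: sum of absolute repeat-count differences per marker."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(abs(x - y) for x, y in zip(a, b))


def average_distance(haplotypes):
    """Mean stepwise distance over every unordered pair of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    return sum(genetic_distance(a, b) for a, b in pairs) / len(pairs)


# Three invented 6-marker haplotypes, for illustration only.
toy_haplotypes = [
    (13, 24, 14, 11, 11, 14),
    (13, 23, 14, 10, 11, 15),
    (12, 24, 15, 11, 12, 14),
]

print(genetic_distance(toy_haplotypes[0], toy_haplotypes[1]))  # 3
print(average_distance(toy_haplotypes))                        # 4.0
```

   The same two functions work unchanged on 37-marker tuples; only the run time grows, since every pair of haplotypes is compared.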

   There is a 1 in 49 trillion chance that all the necessary mutations will occur in order for two 37-marker haplotypes to converge.  The real odds against convergence are likely much longer.  I’ve only looked at the probable values for each marker and I haven’t taken into account the STR mutation rates, that is, the probability that a given marker will change at all over time.
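
   To put those odds next to the 3.6 billion figure, here is a back-of-the-envelope sketch.  It assumes, unrealistically, that every man is an independent draw at the quoted odds, so the results are only an order-of-magnitude guide and not a population-genetics model.  The variable names are made up for the example.

```python
MEN_ON_PLANET = 3.6e9  # approximate figure quoted in the post

# Quoted convergence odds ("1 in N") by number of markers tested.
one_in = {25: 84e6, 37: 49e12}

# Expected number of men who would match by convergence, under the
# simplifying assumption of independent draws at the quoted odds.
expected_matches = {markers: MEN_ON_PLANET / n for markers, n in one_in.items()}

for markers, count in expected_matches.items():
    print(f"{markers} markers: roughly {count:.2g} expected coincidental matches")
```

   Under that assumption the 25-marker case works out to a few dozen men worldwide, while the 37-marker case is far below one, which is the gap the paragraphs above describe.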

   There is essentially no such thing as convergence when 37 or more markers are tested and researched.  If you eliminate the possibility of convergence by using 37 STR markers, then immediately TMRCA calculations become more accurate and haplotypes from different haplogroups no longer resemble each other.  The reports of the death of yDNA STR results have been greatly exaggerated.


   I can’t tell you why researchers are currently stuck on 17 markers.  I can tell you that any research using fewer than 37 markers runs the risk of convergence in its data, which in turn could lead to the wrong conclusions.  I still consider genetic genealogy to be in its infancy.  Every month new research papers are published and the new concepts introduced are latched onto immediately.  It is understandable that papers from over a decade ago used a dozen STRs and a handful of SNPs; that was the height of technology.  If the latest technology and best data are not being used in today’s research papers, is that equivalent to scientific negligence?  Or am I missing something, and this is a case of scientific ignorance on my part?

Thursday, March 1, 2012

My Cousin Otzi: A Story Written in DNA



   There has been a lot in the news lately about Cousin Otzi.  The reports talk about the fact that he had brown eyes, was lactose intolerant, was suffering from Lyme disease and that he was murdered.  What they don’t talk about is that he liked long walks along the glacier, a nice goat steak every once in a while and that he would give the pelt off his back for a friend.

   As soon as the world learned that they were going to test Otzi’s DNA the conjecture began.  Most folk assumed that Otzi would be part of haplogroup I (one of the earliest groups in Europe) or R1b (the largest genetic group in Western Europe).

   Europe is dominated by haplogroups I1, I2, R1a and R1b.  The rest of the landscape has a scattering of E, G, J and N.

   Otzi’s Y-DNA haplogroup was leaked late last year and confirmed two days ago as G2a2b (formerly G2a4).  My haplogroup is G2a3b.  This means that Otzi and I share a common G2a ancestor.

   G2a2b, G2a3b and G2a are subgroups of G.  Every time a new mutation within a haplogroup is identified a subgroup gets created or expanded.  Here is an example of a long R1b subgroup - R1b1a2a1a1b.

   While Otzi’s haplotype hasn’t been published yet, I did review a number of G2a2b records with the same L91+ mutation.  I ran an MRCA (most recent common ancestor) calculation between my data and this group of Otzi-like folk, and a conservative estimate puts our connection about 7,200 years ago.  I can picture our ancestor, and at least two of his sons, sitting around a fire somewhere along the Danube River.

   I look forward to getting to know Cousin Otzi better.

Monday, December 5, 2011

Migration Mapping: Eldred the Terrible

   Genetic genealogy has been very good at identifying distant origins and for making connections along paternal and maternal lines going back a half dozen centuries.  What seems to be missing is how we got from point A to point B.

'Eldridge' clan mapping

   At some distant place in time in every genealogy the surname becomes irrelevant.  The only way to go further back is to use DNA testing.  We have to rely on Clans and Tribes, genetically related groups of individuals, to get an understanding of our history.

   Pride in your historic nationality is wonderful and can tell you much about your family, but we are all descendants of nomads.  As nomads we belong to ancient cultures just as much as we belong to any one nationality.  To know what culture you belong to, you need to know where your tribe was and when.

   When I had my DNA tested I learned that I was part of haplogroup G with origins in the Caucasus Mountains going back about 22,000 years.  I also learned that I had no close matches in the last few centuries.  That left me with very little to work with. So, I put on my analyst hat and developed a technique for plotting the migration path of my tribe at different periods in history.  I needed to answer how my people got from the Caucasus to a little village outside of Naples, Italy.

   I knew I had hit on something after my first mapping exercise.

'Maglio' clan mapping

   The individuals that I plotted lined up along the Rhine River and down the Apennines (with a few stragglers in Wales).  Successive maps, each going back further in time, showed a pattern along the Danube and around the Black Sea back to the Caucasus Mountains.  I now have my migration answers and a plausible correlation to the Etruscan metalworking culture.

   I have been using my technique to help my clients get a deeper understanding of their history and their culture.  For all of you with the surname Eldridge, Eldredge, Aldrich and variations, I have posted a sample report on my website - "The Genetic Genealogy of Eldridge".

   I'd love to hear about other successes mapping genetic data across time.