Wednesday, January 29, 2014

Getting More from Your Genetic Testing Results: Y-DNA Mapping

   Y-DNA testing has only been around for about 15 years and can tell us about our direct paternal ancestry.  It is sometimes criticized for only being able to tell us about a small portion of our genetic history.  I think the critics forget that our mothers have fathers and there are a number of ways to obtain male test subjects to expand Y results across the entire family tree.  In the last few years, there has been a brighter spotlight on autosomal testing and its ability to test a larger portion of our genes.  This may have diverted research attention.  Relatively speaking, y-DNA is still in its infancy and there is much more to be learned.  In addition to our deep ancestral origins dating back over 10,000 years, we should be able to identify our old world homeland and our nomadic ancestor’s migration routes.  It is meaningful to know not only where they lived, but also when they lived to give us clues to their part in history.

   My closest genetic cousins and I share a common ancestor over 1,100 years ago.  With that many years between us, I didn’t spend a lot of time looking for a common surname.  For the last four years, I’ve been working with y-DNA to determine what can be learned beyond cousin matches and distant origins.  I originally tested with the Genographic Project and then transferred my results to Family Tree DNA for an upgrade.  In the time since getting my upgrade to 67 markers, I have received only one match.  This is due to my haplotype being somewhat rare and not a reflection of Family Tree DNA’s ability to provide matches.  I learned that I was part of haplogroup G, from the Caucasus Mountains.   I didn’t learn anything about how the ancestors of my Italian family traveled from Western Asia to Italy.
   I needed to find more cousin matches than the client base that had tested with FTDNA.  Ancestry.com allowed me to enter my y-DNA markers into their DNA section.  I compared my haplotype against the database at Sorenson Molecular (SMGF.org).  I also tried Genebase.com and any other place I could find.  I didn’t locate any cousins.  I did get a good education on what is out there for DNA companies and services.  FTDNA allowed me to transfer my results to Ysearch.org, a free public DNA matching service that they provide.  Ysearch provided much more flexibility.  Where FTDNA would only show me matches with a genetic distance of seven or less, Ysearch allowed me to select the genetic distance.  Without the genetic distance restrictions, Ysearch showed me hundreds of very distantly related cousins.  I’d rather have distant than none.  FTDNA is not trying to be difficult.  They are trying to be realistic and keep matches within genealogical timeframes.  I just needed more.
  While not used consistently, one of the best features of Ysearch is its ability to present the most distant known paternal ancestor (MDKPA) and their origin.  So now, I had cousins and their ancestral origins.  The greater the genetic distance, the closer I got to the Caucasus Mountains.  I mapped my closest cousins and their pushpins created a line from the Alps to the North Sea, directly along the Rhine River.  This was a pattern.  I like patterns.  There is usually a scientific reason for a pattern.

Figure 1 – Mapping Ancestral Origins

   I’ve run a few hundred genetic mapping exercises on the majority of Y haplogroups.  The patterns are consistent.  Our ancestors traveled along rivers and coastlines.  They skirted mountain ranges and crossed bodies of water.  There is always a flow to the patterns, but the science and the why were still missing.  I’ve always been interested in early Eurasian history and have tried to associate it to the patterns in the maps that I generated.  Some maps have places and dates that connect well to notable events.  Other maps raise more questions than answers.  Genetic tests and history alone were not making a complete picture.  I needed to add population genetics, migration models, anthropology and evolutionary biology to my knowledge base.
   Take any  two people on the planet, compare their DNA and you can calculate, approximately, how far back in time their common ancestor lived – time to most recent common ancestor (TMRCA).  Our ancestors were nomadic and traveled about 25 to 30 km per generation or roughly 1 km/year on average.  If a common ancestor lived 300 years ago, then that person’s descendants may have migrated 300 km from the geographic origin of that ancestor.  In the figure below, d1 and d2 represent the origins of two known y-DNA genetic records.  The circles show the distance their ancestors may have migrated.  The intersections are the potential locations of their common ancestor.  In this case, there are two intersections, a1 and a2.  More y-DNA records are required to figure out which intersection is correct.

Figure 2 – Distance to Most Recent Common Ancestor (Bilateration)

   The genetic analysis that I have developed is similar to a navigation technique.  Radio navigation uses two or more beacons with known locations and a measurement of the time it takes to receive a signal from each.  The time is converted to a distance.  A current location can then be identified.  In my method, the “beacons” are the ancestral geographic origins. The “signal” is the TMRCA, measured in years, converted to a distance by multiplying the average migration rate – creating a distance to most recent common ancestor (DMRCA).  A location for the common ancestor can then be figured out by looking at the intersections.  Multiple y-DNA records are needed to determine migration direction and geographic origins of a related set of genetic cousins.  The science is starting to take shape.
  I am G-Z726, which is a subgroup of G-Z725.  At Ysearch or on an FTDNA project you may still see an older naming convention - G2a3b.  As an example, let’s look at 18 of my closest genetic cousins – haplogroup G-Z725, (DYS388=13).  TMRCA data is generated using Dean McGee’s Y-Utility.  The output is turned into a phylogenetic tree and the DMRCA is calculated. 

Figure 3 - Phylogenetic Tree - G-Z725 Sample

Each distance is used to draw a circle that represents the migration range of that ancestral line.  My range (MAG) and the range of my next closest cousin on the tree (BAB) create two intersections on the map.  One point is in Africa and the other is in Eastern Europe.  

Figure 4 - Intersection of MAG and BAB

By adding the range of cousin EBE, the correct intersection representing common ancestor 7 (CA7) is identified.  Pairs of records are mapped to continue the analysis.

Figure 5 - Intersection of BIR and RUF

BIR and RUF are another example and they identify common ancestor CA3.  Each record is added until all common ancestors are identified. 

Figure 6 - Common Ancestors Mapped

Connections are drawn based on the phylogenetic tree.

Figure 7 - Phylogenetic Tree Superimposed

Then for simplification, the migration flow is generalized.  Based on the phylogenetic analysis, common ancestor 7 (CA7) is the most distant common ancestor (MDCA) of samples in this example.  The western migration flow has a slight correlation to the Danube River and the North/South migrations have a strong correlation to the Rhine River.  Early attempts at direct mapping of genetic data (Fig. 1) gave similar results based on TMRCA alone.  There was no corroborating evidence of directionality or definitive proof of MDCA.  

Figure 8 - Generalized Migration Flow - G-Z725

   The new methodology that I have developed shows ancestral origins and migration direction.  I call this method biogeographical multilateration (BGM).  It is the geographical distribution of y-DNA data based on multiple distance measurements from reference positions. 
   This method also has the potential to identify the location of a DNA sample when the origin is unknown. For one of my clients, I identified that his paternal ancestor that had immigrated to the United States may have anglicized their name.  The previous name was German and that surname was found predominantly in central Germany.  Applying biogeographical multilateration to my client’s y-DNA results, while leaving the origin as an unknown, indicated a continental European origin within 200km of the city with the highest surname density.
   In the referenced paper, the example haplogroup, I-L22, has an approximate correlation to the Norse/Viking invasions of Britain and identifies a Frisian coast staging area.

Figure 9 - Generalized Migration Flow - I-L22

  We are still in the infancy of what we can learn from y-DNA.  My BGM analysis has room for refinement.  There is enough science out there to help us.  New research is being done at the aggregate population level.  We can apply elements of that research to the individual level.  Biogeographical multilateration has the potential to fill the gap in our knowledge between genealogical records and deep haplogroup roots.  This new tool can give us old world ancestral origins, migration flow and historical timeframes.
   I needed to learn more about my DNA and my origins.  The big testing companies haven’t filled the knowledge gaps that exist.  They are in the business of providing quality test results.  The analysis and tool creation is falling on the shoulders of “citizen scientists”.  Sometimes, you just have to do it yourself.

Reference:

Maglio, MR (2014)  Y-Chromosome Haplotype Origins via Biogeographical Multilateration (Link)

2 comments:

  1. Hi Mike,

    Great post.

    I would be interested in trying out this method with my own results. I'm have a very good idea where my paternal line was 500-800 years ago, but I would like to be able to see if I could go further back,

    I was wondering what sort of settings you used to come up with your list on Ysearch? Using 25 or more markers with a genetic distance of less than 6, I come up with about 150 matches. A lot of those don't have an origin location. Some of them also include a US location, so I assume that you would ignore those if we are talking about European migration. I also have some matches that only use a very general location, like "Scotland" or "England". Would you still use those and place the individual's marker in the middle of the country?


    -Mike Durkin

    ReplyDelete
    Replies
    1. Hi Mike,
      This study was done with 37 markers or more. I get better results. 25 markers could be used, but no less. I do start with the max distance of 6. If I don't get enough samples, I expand to maximum genetic distance of 1 per marker compared above 8.

      I do focus on the samples with old world origins. I have not attempted to see if the metrics apply to the Americas.

      I try to avoid general locations especially if the genetic distance (in km) is small.

      Let me know about your success.

      Thanks,
      Mike

      Delete