Origin Hunters - Genetic Genealogist
At Origin Hunters we are passionate about DNA and helping adoptees. The answers start with your DNA.
Saturday, December 31, 2022
Saturday, February 15, 2020
Relaunching Origin Hunters
For the past four years I haven't been very active writing or attending genealogical conferences. My focus has been quietly helping adoptees find their biological parents. I've also been able to solve some non-paternal events in my own family.
I am now relaunching my OriginHunters blog and website with a new focus on adoptees, non-paternal events and brick walls that can all be solved with DNA.
I know that finding a biological parent, grandparent or even great-grandparent using DNA can take as little as a few days of research or multiple years. Sometimes it’s just a case of waiting for the right person to test.
With this in mind, I have set rates accordingly – when we solve your case, you pay just $250. If we don’t solve your case, then there is no cost at all.
Wednesday, January 28, 2015
Ghosts of DNA Past: Irish Kings
In 2006, Laoise T.
Moore and the folks at Trinity College in Dublin published a paper famous for
identifying the modal haplotype of Irish High King Niall of the Nine Hostages. In their work, they used seventeen Y-DNA STR
markers. While time to most recent
common ancestor (TMRCA) calculations have accuracy issues, having only 17
markers gives a common ancestor over 2,000 years ago. What the Trinity folks really accomplished was
the identification of Niall’s paternal ancestor from over 400 years
earlier. The media in 2006 had a field
day in their interpretation that most of Ireland is descended from Niall. “Niall may be the most prolific male in Irish
history.” Also at 17 markers, there is a
very high probability of convergence.
Through normal mutations, haplotypes can change over time to appear
similar or identical to other haplotypes.
The lower the number of markers, the higher the chance of
convergence. At that time only high
level SNPs were tested to determine haplogroup.
Without terminal SNPs it would have been impossible to recognize
convergence, if it existed in the samples.
In my research on
the Kings of Ireland, I have used 67 markers to reduce the chance of convergence
and to calculate the age of common ancestors on the descendant side of the
target rather than the ancestor side. I
will demonstrate traditional median-joining networks and novel “tribal” markers
for the identification of four historic Kings of Ireland. Did Trinity get Niall’s haplotype correct
with the limited data they had at the time?
Ghost: a manifestation of a dead person
Modal haplotype: a derived haplotype based on the DNA tests of
a group of people
A modal haplotype
is a ghost of a person. When we look at
multiple DNA test results and calculate the mode, by definition we are just
taking the values that appear most often.
There is no way to determine if the modal haplotype is the actual
haplotype of the historic individual we are researching (short of historic
samples). While the modal is not
perfect, it will be close enough at 67 markers for us to determine the genetic
“ghost”.
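The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming each test result is a list of repeat counts in the same marker order; the function name `modal_haplotype` and the four-marker sample values are hypothetical and not taken from the study data.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return the most common repeat count at each marker position."""
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    width = len(haplotypes[0])
    if any(len(h) != width for h in haplotypes):
        raise ValueError("all haplotypes must cover the same markers")
    # zip(*haplotypes) walks the data one marker (column) at a time.
    # On a tie, Counter keeps the value that was seen first.
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]


# Hypothetical four-marker results for four testers.
samples = [
    [13, 25, 14, 10],
    [13, 24, 14, 11],
    [12, 25, 14, 11],
    [13, 25, 15, 11],
]

modal = modal_haplotype(samples)
print(modal)
```

In this toy data the modal comes out as 13, 25, 14, 11, a haplotype that none of the four testers actually carries, which is the "ghost" idea in miniature.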
The septs of
Ireland provide us an opportunity to develop genetic genealogy techniques and
processes. Irish surnames are typically
patronymic. The surnames generally take
the form of Mac Cárthaigh (McCarthy),
meaning son of Cárthaigh or Ui Néill
(O’Neill), meaning grandson / descendant of Néill. Irish septs serve as a collective of related
families with shared ancestry and patronymic surnames. Multiple septs then belong to larger
dynasties such as the Eóganachta and the Dál gCais.
If septs are
patrilineal, then Y-DNA haplotypes should be consistent across sept
surnames. Research on the Uí Néill haplotype
started with a geographical selection and then a subsequent reduction by sept
surnames (Moore et al 2006). For each
target sept, affiliated surnames were identified. In the case of Uí Néill, the following
surnames and associated Y-DNA STR records were accessed from Family Tree DNA
projects: O’Neill, Gallagher, Doherty and O’Donnell. The selection includes 600 records and 5
common European haplogroups.
Median-joining
networks have been in use for over a decade for the visualization of genetic
relationships. The use of them at 67 STR
markers has been rare, but it should be the norm. This first image has the central cluster of a
median joining network based on 25 STR markers from the Uí Néill group. It is just a single cluster with no
differentiation.
Figure 1
- Using only 25 STR markers, the Uí Néill network collapses to a single
cluster.
When we look at the same group using 67 markers, we get four
distinct clusters, each with their own SNP.
The cluster at the far right is predominantly R-L159 and the cluster at
the lower right has R-P311/R-L151 nodes.
The cluster at the left contains all of the Uí Néill dynastic surnames,
has the majority of nodes and is SNP R-M222, which is consistent with earlier
studies.
Figure 2
- View of the Uí Néill network torso showing four distinct clusters. Three groups on the right are O’Neill only.
As a double check to make sure that I wasn’t seeing some
other phenomenon, I analyzed three random Irish surnames: Duffy, Kelly and
McCormick. The random sample produced
over ten unique clusters with no surname overlap. This comparison shows that septs are
patrilineal and that Y-DNA haplotypes are consistent across sept surnames.
Figure 3 - Median-joining network of yDNA sampled from three random Irish surnames: Duffy, Kelly and McCormick.
Re-evaluating the Uí Néill data also shows that Trinity was
correct in their identification of a 17-marker Uí Néill haplotype. New data and new techniques allow us to
produce a 67-marker haplotype.
Figure 4 - Sixty-seven STR Uí Néill Modal Haplotype
(Niall of the Nine Hostages).
A
different technique that I’d like to illustrate involves the fact that not all
STR markers are created equal. This method
takes advantage of “slow” mutating STR markers.
Each marker has its own mutation rate.
By selecting the 15 “slowest” markers with an average mutation rate of
0.00024, a virtual tribal haplotype is created that would be stable within the
last 2,000 years (90% probability of 80 generations). This is an order of magnitude lower than the
average rate of 0.0029 used as a constant in typical TMRCA calculations. The “tribal” markers isolated are DYS426, DYS388, DYS392, DYS455, DYS454,
DYS578, DYS590, DYS641, DYS472, DYS594, DYS436, DYS490, DYS450 and DYS640.
To make the “tribal” haplotype of 15
microsatellites faster to manipulate, the resulting values are concatenated into a string,
e.g. 12121411119168108101212811. The
“tribal” haplotypes are summarized per surname and plotted to illustrate
majority and affinity.
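The concatenate-and-summarize step can be sketched as follows. This is only an illustration under stated assumptions: it uses the first four markers named above rather than the full set, the record layout (a surname paired with a dictionary of marker values) is hypothetical, and the sample values are made up for the example.

```python
from collections import Counter, defaultdict

# First four of the slow markers named in the text, kept short for brevity.
TRIBAL_MARKERS = ("DYS426", "DYS388", "DYS392", "DYS455")


def tribal_haplotype(record, markers=TRIBAL_MARKERS):
    """Concatenate the slow-marker values of one record into a string."""
    return "".join(str(record[m]) for m in markers)


def summarize_by_surname(records):
    """Count how often each tribal haplotype appears under each surname."""
    summary = defaultdict(Counter)
    for surname, record in records:
        summary[surname][tribal_haplotype(record)] += 1
    return summary


# Hypothetical records: (surname, marker values). Extra markers are ignored.
records = [
    ("O'Neill", {"DYS426": 12, "DYS388": 12, "DYS392": 14, "DYS455": 11, "DYS19": 14}),
    ("O'Neill", {"DYS426": 12, "DYS388": 12, "DYS392": 14, "DYS455": 11, "DYS19": 15}),
    ("Doherty", {"DYS426": 12, "DYS388": 12, "DYS392": 14, "DYS455": 11, "DYS19": 14}),
    ("Gallagher", {"DYS426": 12, "DYS388": 12, "DYS392": 13, "DYS455": 11, "DYS19": 14}),
]

summary = summarize_by_surname(records)
for surname, counts in summary.items():
    print(surname, dict(counts))
```

One design note: plain concatenation mixes one-digit and two-digit values, so a delimiter would be the safer choice if the strings ever need to be split back into markers.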
Figure 5 - Uí Néill dynastic haplotypes converted into 15
marker “tribal” haplotypes and summarized.
The Uí Néill dataset resolved into 37 unique
“tribal” haplotypes. Figure 5 shows that
haplotype 12121411119168108101212811 is the most dominant across the Uí Néill
surnames. As with the median-joining
network analysis, this “tribal” haplotype is consistent with SNP R-M222.
I repeated these two techniques for the Uí
Briúin sept using the following surnames and associated Y-DNA records: O’Brien,
Hogan, Kennedy and McMahon. The
selection includes 615 records. The Mac
Cárthaigh dataset has the following surnames: McCarthy, Callaghan, Donovan and
Sullivan. The selection includes 319
records. The Ua Conchobhair data has the
following surnames: O’Connor, McManus, Reilly and Rourke. The selection includes 352 records.
For
more details, see my paper at Academia.edu.
Figure 6 - Sixty-seven STR Uí Briúin Modal Haplotype
(Brian Boru).
Figure 7 - Sixty-seven STR Mac Cárthaigh Modal Haplotype
(McCarthy Eoganachta Kings).
Figure 8 - Sixty-seven STR Ua Conchobhair Modal Haplotype
(Last High King Roderick O'Connor).
Here are a couple
of interesting insights from my research.
Niall Noígíallach was High King of Ireland around 378 CE and founder of
the Uí Néill dynasty. Historically, his
half-brother Brión was one of the founders of the Connachta dynasty and an
ancestor of the last High King of Ireland, Ruaidrí Ua Conchobair. If their genealogies are correct, the
evidence is in their descendants’ DNA.
The data shows that Uí Néill and Ua Conchobair share the same SNP,
R-M222. The Uí Néill and Ua Conchobair
modals are a 6-step match at 67 markers.
There is a 99% probability of a relationship not further than 1,260
years ago. The results make a strong
case for the validity of this historic genealogy.
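For readers who want to see what a "step" count looks like in practice, here is a minimal sketch. It assumes a simple stepwise model in which every single-repeat difference counts as one step; the text does not say which counting model was used, testing companies apply special rules to multi-copy markers, and the short haplotypes below are hypothetical. It does not attempt the probability calculation quoted above.

```python
def genetic_distance(a, b):
    """Sum of absolute repeat-count differences across aligned markers."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical six-marker modals; real comparisons would use all 67 markers.
modal_a = [13, 25, 14, 11, 11, 13]
modal_b = [13, 24, 14, 11, 11, 15]

print(genetic_distance(modal_a, modal_b))
```

Here one marker differs by one repeat and another by two, so the two toy modals are a 3-step match.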
Brian Boru, High
King of Ireland in 1002 CE, belonged to the Dál gCais dynasty and Tadhg Mac
Cárthaigh, the first King of Desmond, belonged to the Eóganachta dynasty. Ancient genealogies have the Eóganachta and
Dál gCais dynasties descended from Ailill Aulom, the son-in-law of legendary
king Conn of the Hundred Battles. The
Mac Cárthaighs and Uí Briúins do not share the same SNP (R-L226 vs. R-CTS4466),
but by descent they would share a common R-DF13 ancestor. The Mac Cárthaigh and Uí Briúin modals are an
11-step match at 67 markers. There is a
99% probability of a relationship not further than 1,920 years ago. This puts a Mac Cárthaigh-Uí Briúin common
ancestor as a contemporary of the legendary Conn.
New and improved genetic
genealogy techniques are invaluable for the identification of historic
individuals and the reconstruction of distant family trees at the macro level.
Reference:
Maglio, MR (2015) Identifying Y-Chromosome Dynastic
Haplotypes: The High Kings of Ireland Revisited
Monday, December 8, 2014
Atrocities and Assimilation: Crusader DNA in the Near East
This paper got its
start back in February of this year while I was researching R1b-DF100 for my
posting, The Third Brother. Among the data, primarily Western European
haplotypes, was a single Armenian record.
The R1b-L11>DF100 group that I was working with had as one of their
theories that L11 was a fairly recent, 3,000 to 4,000 years, arrival from the
Near East and that the Armenian record was part of that evidence. I looked at the Armenian record, ran a
phylogenetic test on it, the L11 group and some similar Near East records. The Armenian record fell squarely within a
Baltic cluster on the tree with a rough TMRCA of about 1,200 years. This Armenian was clearly more European than
Armenian, at least on the paternal line.
My comment back to the L11 group was that their Armenian was probably
the descendant of a Crusader based on the timing and directionality.
In September, I ran
across Pierre Zalloua’s paper - Y-Chromosomal Diversity in Lebanon Is Structured by Recent Historical Events (2008). He and the other authors had put together a
good correlation between Crusader DNA and haplogroup R1b in Lebanon. The paper also correlated haplogroup J and
the Muslim expansion. The paper received
quite a bit of feedback about haplogroup J and little or no mention about
haplogroup R1b. Considering the extent
of the Crusaders’ presence in the Near East from 1096 to 1343, if they left DNA
behind it would have been spread farther than Lebanon.
The real question
is not whether they left DNA behind. There
is significant literature that details the atrocities; raping and pillaging was
standard operating procedure for the Crusaders.
There are also numerous accounts of assimilation. During the Crusaders’ 247-year occupation, spanning
roughly eight generations, they married local women and raised families. The real question is whether Crusader DNA survived
to modern times.
Crusader DNA Distribution
If Crusader DNA survived,
it would be spread from Istanbul to Jerusalem and beyond. The graphic above shows the potential for DNA
distribution during the Crusader occupation (red) and the distribution over the
past 918 years (gray). My research
focused on the following Near East countries - Armenia, Georgia, Iran, Iraq,
Israel, Jordan, Lebanon, Palestine, Saudi Arabia, Syria and Turkey.
Here is something I
found bizarre. Zalloua and team
published their paper in 2008. Every
researcher looking at Near East R1b should be taking a lesson and validating
that their data is not of Crusader origin.
Obviously, Crusader DNA wasn’t restricted to Lebanon. In 2010, Balaresque et al. and again in 2011,
Myres et al. published papers using Near East R1b data (Turkish). Forty-two percent of the Turkish R1b haplotypes
from Balaresque and Myres were identical to Zalloua’s Lebanese R1b data. This didn’t seem to raise any flags as
Balaresque and Myres used the Turkish data to suggest a Near East origin and
Neolithic expansion for R1b. These folks
must not talk to each other. Two of
Zalloua’s team members went on to work with Balaresque and Myres on their
papers. The first thing I would have
said was – “Considering what Zalloua found, we need to validate the origins of
the Turkish data further back than one or two generations”.
When presenting an
analysis it is always good to show comparison data. I collected R1b data and haplogroup G and J
data from multiple Family Tree DNA projects.
I have a higher comfort factor that G and J are associated with the
Neolithic expansion, so they were used as a basis for comparison. For each 37-marker Near East record obtained,
I used the haplotype to query a larger set of related records from ySearch (I
call this haplotype aggregation). A Near
East set and a Western European set of data were developed for each haplogroup. I then compared each individual Near East
haplotype against the entire Near East set and the entire Western Europe
set. You would expect that the Near East
haplotypes would be more closely related to their peers in the Near East set.
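The comparison described above can be sketched roughly as follows. The text does not specify the exact metric, so this example assumes mean stepwise genetic distance as the measure of relatedness; the function names and the three-marker haplotypes are hypothetical.

```python
from statistics import mean


def steps(a, b):
    """Stepwise genetic distance between two aligned haplotypes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def affinity(index, near_east, western_europe):
    """Mean distance from one Near East record to its peers and to the West."""
    target = near_east[index]
    peers = [h for i, h in enumerate(near_east) if i != index]
    to_peers = mean(steps(target, h) for h in peers)
    to_west = mean(steps(target, h) for h in western_europe)
    return to_peers, to_west


# Hypothetical three-marker haplotypes.
near_east = [[12, 23, 14], [12, 23, 15], [13, 24, 14]]
western_europe = [[13, 24, 14], [13, 24, 15]]

for i in range(len(near_east)):
    to_peers, to_west = affinity(i, near_east, western_europe)
    lean = "Near East" if to_peers < to_west else "Western Europe"
    print(near_east[i], to_peers, to_west, lean)
```

In this toy data the first record sits closer to its Near East peers, while the third sits closer to the Western European set, which is the kind of lean the charts in this post are plotting.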
The haplogroup J
data tells the best story. The results
cluster down J1-M267 and J2-M172 lines.
The neutral line (diagonal triangles) represents zero affinity towards
the Near East or Western Europe. Points
falling to the right of neutral show an affinity toward the Near East and to
the left of neutral, an affinity towards Western Europe.
J1 haplotypes (diamonds), which are rare in Europe, are
closely related to their peers in the Near East. The J1 data only shows an affinity toward the
Near East. The trend line for J1
indicates a fairly stationary population pattern with no suggestion of
migration to Western Europe. A trend
line that doesn’t cross the neutral represents a strong peer affinity and
little or no migration between the Near East and Western Europe. J2 data (squares) shows a tipping point at
which the more distantly related records lean toward the Near East and the
closely related records lean toward Western Europe. That transition shows a TMRCA of about 3,900
± 800 years. The tipping point indicates
a point in time where the Near East J2 haplotypes became more common in Western
Europe, illustrating a migration.
Haplogroup G shows
very similar results as J2. Haplogroups J2 and G have been associated with the
Neolithic spread of agriculture from the Near East to Western Europe. Both J2 and G present a consistent
distribution from distant relationship (high variance) to closer relationship
(low variance). The trend lines for J2
and G represent migration events from the Near East to Western Europe. The trend line for J1 represents no migration
event. These results are consistent with
other published information.
Haplogroup R1b does
not exhibit either a migration or a non-migration pattern. The haplotypes cluster in a fairly homogeneous
group. There is a slight lean toward
Western Europe and essentially no continuum from high variance to low variance. The more distantly related haplotypes don’t
exist in the Near East. The Near East
individuals are just as related to the Western European individuals as they are
to their own peers. The approximate
TMRCA for the R1b Near East – Western European group is 1,800 ± 500 years.
Through atrocities
and assimilation, Western European DNA from Crusaders was permanently
introduced into the Near East less than 1,000 years ago. Western European and Near East R1b haplotypes
are highly and recently related. The
data indicates that within the last 2,000 years there was a migration from one
geography to the other. There is no
documented migration in the past 2,000 years that would account for Western
European R1b populations coming from the Near East and replacing indigenous
European populations. The introduction
of Western European DNA into the Near East by Crusaders accounts for the west
to east genetic flow.
The sampling
practices of research studies are questionable.
The origin of participants is typically only validated for one or two
previous generations. This is equivalent
to not knowing the origin for study participants. Sampling needs to be undertaken with a genetic
genealogy approach and 37 markers or greater.
The population genetics approach of fewer than 17 markers, poor origin
validation and haplogroup generalization needs to change.
Previous papers
(Balaresque & Myres) that have used Near East R1b data as the basis of
their research are suspect. In light of
the introduction of Crusader DNA into the Near East within the past 1,000
years, any theory on a Neolithic origin for haplogroup R1b will have to be
re-evaluated.
Reference:
Maglio, MR (2014) Y-Chromosomal Haplogroup R1b Diversity in
Near East is Structured by Recent Historical Events
© Michael R. Maglio
Friday, December 5, 2014
DNA Convergence and Chicken Little
For me, the topic of convergence in yDNA first came up early
in 2014. I had just posted a paper and
one of the comments was – “What about convergence?” I said to myself, “What convergence?” I admit I had to look up the topic.
Convergence: A
term used in genetic genealogy to describe the process whereby two different
haplotypes mutate over time to become identical or near identical resulting in
an accidental or coincidental match. - Turner A & Smolenyak M 2004.
My response back to the comment was - “All of the haplotypes
in my paper are unique.” My data did not
exhibit convergence.
Convergence casts a shadow on genetic genealogy
I started to poke around on the topic of convergence within
yDNA STR haplotypes and the immediate impression that I got was that folks were
ready to give up on STRs in favor of SNPs and the sky was falling. Chicken Little was running around in the
genetic genealogy circles. Here is a
small sample:
“Y-STRs are
effectively dead” - Dienekes Pontikos, 2011
“Convergence of Y
chromosome STR haplotypes from different SNP haplogroups compromises accuracy
of haplogroup prediction” – Wang, et al, 2013
Okay, convergence happens, but it’s an illusion.
Let’s take a big step backwards in this story. Did you know that most scientific papers
relating to genetic genealogy use 17 STR markers or fewer? Some use as few as 9 or 10. For any of you who ever took one of the
original 12 STR marker tests, you know that the results were essentially
useless for anything except deep haplogroup association and history.
Many researchers in the last couple of years are using the AmpFLSTR®
Yfiler® to get their 17 marker results.
This equipment is approved for forensic cases. Research papers are not forensic cases and
researchers don’t need to limit themselves to 17 markers. Thirty-seven marker yDNA tests have been
available since 2004.
Why does the number of STR markers matter? I’m going to release my inner math geek to
help explain. If we look at marker DYS19,
usually listed first in science papers and third in Family Tree DNA results, it
can have a value within the range of 7 to 22 across all haplogroups. Looking at R1b specifically, DYS19 ranges
from 10 to 17 and statistically at two standard deviations (2 sigma) the range
of values narrows to 13, 14 and 15. From
a probability point of view, there is a 1 in 3 chance that DYS19 will take any one of the values 13, 14
or 15. Making the odds even better in
our favor, 95% of the time DYS19 for R1b will already be 13, 14 or 15. A marker that already holds one of these three values has two others it could change to, so there is a 1 in 2 chance that
DYS19 changes to the value it needs on its way to converging with another
haplotype.
Taking standard deviation into account to determine the
possible number of values for the STR markers and then multiplying each
probability gives the odds that a haplotype could converge.
STR | DYS393 | DYS390 | DYS19 | DYS391 | DYS385a | DYS385b | DYS426 | DYS388 | DYS439 | DYS389i | DYS392 | DYS389ii | Total
# of possible marker values | 2 | 4 | 2 | 2 | 2 | 4 | 1 | 1 | 2 | 2 | 2 | 2 | 4096
There is a 1 in 4096 chance that two R1b 12 marker
haplotypes could converge. This is not
the probability that one marker will change. This is the probability that all 12 markers
will change enough to match another haplotype.
These are very good odds and the reason why a 12-marker test is
practically useless.
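The 1 in 4096 figure is simply the product of the per-marker counts in the table. A short sketch of that arithmetic, assuming (as the multiplication implies) that the markers are treated as independent; the dictionary and function names are only for illustration.

```python
from math import prod

# Number of possible values per marker, copied from the table above.
POSSIBLE_VALUES = {
    "DYS393": 2, "DYS390": 4, "DYS19": 2, "DYS391": 2,
    "DYS385a": 2, "DYS385b": 4, "DYS426": 1, "DYS388": 1,
    "DYS439": 2, "DYS389i": 2, "DYS392": 2, "DYS389ii": 2,
}


def convergence_odds(possible_values):
    """Return N for 'a 1 in N chance', treating markers as independent."""
    return prod(possible_values.values())


print(f"1 in {convergence_odds(POSSIBLE_VALUES)}")
```

Adding more markers to the dictionary multiplies the result, which is why the odds lengthen so quickly as the marker count grows.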
With a high probability that 12 STR markers will converge, haplotypes
start to blend together. Two different
haplogroups or family lines will appear to be the same. Converging also means that when we calculate
the time to the most recent common ancestor (TMRCA), it will look like less
time has passed. Convergence makes a 12-marker
test result unusable for genealogical matching, haplogroup prediction and TMRCA
calculations. The Chicken Littles are
correct, we have a problem with 12 marker STR results.
What about 17 markers, a quasi-industry standard for science
papers? Taking the same approach with statistics
and probability, a 17-marker yDNA R1b result has a 1 in 2 million chance of
converging with another haplotype. Each
haplogroup has slightly different odds.
There is a 1 in 500,000 chance of an R1a 17 marker haplotype converging. Those odds are better than any lottery. Convergence is still a problem at 17 markers.
When Dienekes Pontikos proclaimed the death of yDNA STRs, he
was commenting on the attempt to get good TMRCA dates from 10-marker
results. I agree, you can’t get valid
TMRCA dates from 10-markers. When Wang,
et al, determined that convergence compromises haplogroup prediction, they were
correct, 17 marker haplotypes can converge to make one haplogroup look like
another.
In a quick analysis of 4,300 unique 37-marker R1b haplotypes,
the average genetic distance is 17 steps for 37 markers. That means there are 17 mutations required
for convergence in a 37-marker haplotype.
Nearly half of the markers in the haplotypes would need to change. When we look at the probability of 25-marker
haplotype convergence, the chances are 1 in 84 million. Considering there are about 3.6 billion men
on the planet, one in 84 million is still in the realm of possibility. By the time we get to 37-markers, the odds
are 1 in 49 trillion.
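To put those odds against the population figure, here is a back-of-the-envelope sketch. It assumes the quoted odds can be read as the chance that any one man's haplotype coincides with a given target haplotype, which is a simplification made for this example only.

```python
MEN_ON_PLANET = 3.6e9  # population figure quoted in the text

# Odds against convergence quoted in the text, keyed by marker count.
ODDS_AGAINST = {25: 84e6, 37: 49e12}


def expected_carriers(population, one_in_n):
    """Expected number of coincidental matches in a population."""
    return population / one_in_n


for markers, odds in ODDS_AGAINST.items():
    expected = expected_carriers(MEN_ON_PLANET, odds)
    print(f"{markers} markers: about {expected:.5f} expected coincidental matches")
```

Under that reading, 25 markers leaves roughly 43 expected coincidental matches worldwide, while 37 markers leaves far fewer than one.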
There is a 1 in 49 trillion
chance that all the necessary mutations will occur in order for two 37-marker
haplotypes to converge. The odds against convergence are likely
even longer. I’ve only looked at the probable
values for each marker and I haven’t taken into account the STR mutation rates,
that is, how likely each marker is to change at all over time.
There is essentially no such thing as convergence when 37 or
more markers are tested and researched.
If you eliminate the possibility of convergence by using 37 STR markers,
then TMRCA calculations immediately become more accurate and haplotypes from
different haplogroups no longer resemble each other. The reports of the death of yDNA STR results have
been greatly exaggerated.
I can’t tell you why researchers are currently stuck on 17
markers. I can tell you that any research
using fewer than 37 markers runs the risk of convergence in its data, which in
turn could lead to the wrong conclusions.
I still consider genetic genealogy to be in its infancy. Every month new research papers are published
and the new concepts introduced are latched onto immediately. It is understandable that papers from over a
decade ago used a dozen STRs and a handful of SNPs; that was the height of
technology. If the latest technology and
best data are not being used in today’s research papers, is that equivalent to
scientific negligence? Or, am I missing
something and this is a case of scientific ignorance on my part?
Tuesday, December 2, 2014
DNA, SNP, STR, OMG!
(Originally published May 2014 in Going In-Depth)
Oh my gosh, there are many acronyms in
genetic genealogy. You have to agree
that using the acronym DNA is better than writing deoxyribonucleic acid
repeatedly. Although, when we talk about
using DNA for genealogy and we only use acronyms, they start to lose their meaning
and become just another ‘thing’. “Hey,
I’ve got a SNP. Do you have a SNP?” “I dunno, let me check.” Maybe I’m weird. I like to understand what all the acronyms
mean and how they play a part in the larger picture.
When we talk about DNA, we often also talk about mitochondrial DNA. Mitochondria exist outside of the nucleus as an energy source for the cell and have their own independent DNA. Mitochondrial DNA has just over 16,000 base pairs in comparison to the 3 billion base pairs in our nuclear DNA. We inherit our mitochondrial DNA only from our mothers.
DNA is divided into coding regions (genes that define proteins for such things as eye color) and non-coding regions (sometimes called junk DNA). The coding region that defines us is less than 2% of our overall DNA and within that, there are less than 25,000 genes. A gene is a sequence of nucleotides averaging about 23,000 base pairs. One of the largest genes, which encodes for the Caspr2 protein, has over 2.3 million base pairs.
Within the 3 billion base pairs of our DNA there are variations (normally occurring mutations), where one base pair has been replaced with another base pair. As an example, it was adenine (A) and now it’s guanine (G). This is a single nucleotide polymorphism or SNP (pronounced snip). There are over 15 million SNPs in our DNA. Once a SNP occurs, it is usually permanent in the population. The farther back in time that the SNP occurred, the more people will have that particular mutation. To be considered a SNP, it has to exist in greater than 1% of the population. They are found in both the coding and non-coding regions of our DNA. In the coding regions, SNPs are often markers for genes.
Let’s divide our DNA into four groups. Group one, the autosomes, are the first 22 pairs of chromosomes. The next two groups, the sex chromosomes, are one X and one Y if you are male and two Xs if you are female. That gives us yDNA and xDNA. The last DNA group is mitochondrial. All types of DNA have SNPs. Autosomal SNPs are used for health and ethnicity. Mitochondrial and Y-DNA SNPs are used to determine world haplogroups. While there are 1,000s of X SNPs, there doesn’t seem to be much research around them.
SNPs have no effect on health, but their presence may predict a health risk. If you had an autosomal test from 23andMe (prior to the FDA ruling), they would have delivered health information with your results. They were able to report SNPs in the coding region associated with gene combinations responsible for health risks, like cancer or Alzheimer’s or basic information, like eye and hair color. Even though you cannot get health information from 23andMe currently, you can still use your autosomal results with Promethease from SNPedia.com to research your health risks.
Combinations of SNPs are analyzed to determine ancestry-informative markers (AIM – another new acronym for you). AIMs are used to estimate the ethnicity or at least the geographic origins of your ancestors. When you receive ethnicity results from an autosomal test, it will be based on the AIMs that the test company are using. They don’t all use the same markers, so results will vary. There are even 42 SNPs associated with having Neandertal ancestry.
SNPs are used to organize us into larger branches of the human family tree (haplogroups). Our maternal family tree is organized into 26 branches (A through Z) using mitochondrial DNA. Our paternal tree is similarly organized into 20 branches (A through T) using yDNA SNPs. As an example, take four men (I use men because the scenario works for both mitochondrial DNA and yDNA), Abe, Bob, Chaz and Dave. Test each of them for three SNPs, X, Y and Z. You find that they all test positive for SNP Z, Abe and Chaz test positive for X and Bob and Dave test positive for Y. You can start to see the branches and the beginning of a tree.
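The Abe, Bob, Chaz and Dave example can be worked through in code. This is a small sketch of the grouping idea only, not how haplogroup trees are actually built; the function names are hypothetical.

```python
def carriers_by_snp(results):
    """Map each SNP to the set of people who tested positive for it."""
    carriers = {}
    for person, snps in results.items():
        for snp in snps:
            carriers.setdefault(snp, set()).add(person)
    return carriers


def trunk_and_branches(results):
    """Split SNPs into those shared by everyone and those defining branches."""
    carriers = carriers_by_snp(results)
    everyone = set(results)
    trunk = sorted(s for s, who in carriers.items() if who == everyone)
    branches = {s: sorted(who) for s, who in carriers.items() if who != everyone}
    return trunk, branches


# Positive SNP results from the example in the text.
results = {
    "Abe": {"Z", "X"},
    "Bob": {"Z", "Y"},
    "Chaz": {"Z", "X"},
    "Dave": {"Z", "Y"},
}

trunk, branches = trunk_and_branches(results)
print("trunk:", trunk)
print("branches:", branches)
```

SNP Z ends up as the shared trunk, with X (Abe and Chaz) and Y (Bob and Dave) as the two branches growing from it.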
The first yDNA and mtDNA trees were built using only a few dozen SNPs. Today, the paternal and maternal haplogroup trees are much more detailed, based on thousands of SNPs. Complete SNP testing has been available for mitochondrial DNA for a number of years. Starting last year, complete SNP testing is available for yDNA from companies like FamilyTreeDNA with their Big Y test. Previously yDNA SNP tests were designed to look for specific SNPs. With advances in technology, they can now look for all the SNPs across over 12 million yDNA base pairs.
Just to add another acronym to the pile, there are also STRs or short tandem repeats (aka microsatellites). STRs are short sequences of base pairs that repeat. These repeats are found in autosomal, y and x DNA. You may have heard the term CODIS if you watch Crime/Drama shows on television. CODIS is the FBI’s Combined DNA Index System (more acronyms). When DNA is collected for CODIS, they typically test for 13 STR markers across the autosomes. When you have a yDNA STR test done, genetic genealogy companies test for up to 111 markers only on the Y chromosome. They will also perform a basic SNP test to identify your paternal haplogroup. SNPs and STRs are different in that SNPs appear to be permanent changes in our DNA and STRs are variable. STRs are identified by location on the chromosome and by the number of times that the repeat occurs. The number of repeats per STR can change over time, sometimes increasing, sometimes decreasing in number or increasing then decreasing again (known as a back mutation). The combined set of STR markers is your haplotype and may be unique to your surname or span multiple surnames. With the advances in yDNA SNP testing, SNPs will be found that are unique to your surname, which could make STR testing obsolete.
We all have DNA: 23 pairs of chromosomes in our cell nuclei, half from mom and half from dad. We also have mitochondrial DNA from our moms. Less than 2% of our DNA is in the form of genes, which define who we are. SNPs can be used to identify our “good” and “bad” genes. SNPs can also help identify our ethnicity and build our paternal and maternal family trees. STRs can organize us down to the paternal surname level. When folks start talking DNA, don’t be afraid to question them about, “What kind of DNA?”, “What does that SNP indicate?” or “What type of STR is being tested?”. We’ll never get away from using acronyms to simplify how we communicate genetic genealogy. That doesn’t mean we need to let the acronyms simplify the meanings to a point where the science is lost. Every little bit of knowledge adds to our understanding of ourselves.
DNA is divided into coding regions (genes that define proteins for such things as eye color) and non-coding regions (sometimes called junk DNA). The coding region that defines us is less than 2% of our overall DNA and within that, there are less than 25,000 genes. A gene is a sequence of nucleotides averaging about 23,000 base pairs. One of the largest genes, which encodes for the Caspr2 protein, has over 2.3 million base pairs.
Within the 3 billion base pairs of our DNA there are variations (normally occurring mutations), where one base pair has been replaced with another base pair. As an example, it was adenine (A) and now its guanine (G). This is a single nucleotide polymorphism or SNP (pronounced snip). There are over 15 million SNPs in our DNA. Once a SNP occurs, it is usually permanent in the population. The farther back in time that the SNP occurred, the more people will have that particular mutation. To be considered a SNP, it has to exist in greater than 1% of the population. They are found in both the coding and non-coding regions of our DNA. In the coding regions, SNPs are often markers for genes.
Let’s divide our DNA into four groups. Group one, the autosomes, are the first 22 pairs of chromosomes. The next two groups, the sex chromosomes, are one X and one Y if you are male and two Xs if you are female. That gives us yDNA and xDNA. The last DNA group is mitochondrial. All types of DNA have SNPs. Autosomal SNPs are used for health and ethnicity. Mitochondrial and Y-DNA SNPs are used to determine world haplogroups. While there are thousands of X SNPs, there doesn’t seem to be much research around them.
Most SNPs have no direct effect on health, but their presence may predict a health risk. If you had an autosomal test from 23andMe (prior to the FDA ruling), they would have delivered health information with your results. They were able to report SNPs in the coding region associated with gene combinations responsible for health risks, such as cancer or Alzheimer’s, as well as basic traits like eye and hair color. Even though you cannot get health information from 23andMe currently, you can still use your autosomal results with Promethease from SNPedia.com to research your health risks.
Combinations of SNPs are analyzed to determine ancestry-informative markers (AIM – another new acronym for you). AIMs are used to estimate the ethnicity or at least the geographic origins of your ancestors. When you receive ethnicity results from an autosomal test, they will be based on the AIMs that the test company is using. The companies don’t all use the same markers, so results will vary. There are even 42 SNPs associated with having Neandertal ancestry.
SNPs are used to organize us into larger branches of the human family tree (haplogroups). Our maternal family tree is organized into 26 branches (A through Z) using mitochondrial DNA. Our paternal tree is similarly organized into 20 branches (A through T) using yDNA SNPs. As an example, take four men (I use men because the scenario works for both mitochondrial DNA and yDNA), Abe, Bob, Chaz and Dave. Test each of them for three SNPs, X, Y and Z. You find that they all test positive for SNP Z, Abe and Chaz test positive for X and Bob and Dave test positive for Y. You can start to see the branches and the beginning of a tree.
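The four-man example can be worked through in a few lines of Python. This is only a sketch of the grouping logic described above, not how any testing company builds its tree. The names `results`, `shared_by_all` and `branches` are invented for the illustration.

```python
from collections import defaultdict

# Positive SNP results taken from the example in the text.
results = {
    "Abe": {"Z", "X"},
    "Bob": {"Z", "Y"},
    "Chaz": {"Z", "X"},
    "Dave": {"Z", "Y"},
}


def shared_by_all(results):
    """SNPs every man tests positive for: the trunk of the tree."""
    return set.intersection(*results.values())


def branches(results):
    """Group men by the SNPs they carry beyond the shared trunk."""
    trunk = shared_by_all(results)
    groups = defaultdict(list)
    for name in sorted(results):
        key = tuple(sorted(results[name] - trunk))
        groups[key].append(name)
    return dict(groups)


print(sorted(shared_by_all(results)))  # ['Z']
print(branches(results))  # {('X',): ['Abe', 'Chaz'], ('Y',): ['Bob', 'Dave']}
```

SNP Z is the trunk, and X and Y are the two branches that split from it.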
The first yDNA and mtDNA trees were built using only a few dozen SNPs. Today, the paternal and maternal haplogroup trees are much more detailed, based on thousands of SNPs. Complete SNP testing has been available for mitochondrial DNA for a number of years. Starting last year, complete SNP testing is available for yDNA from companies like FamilyTreeDNA with their Big Y test. Previously yDNA SNP tests were designed to look for specific SNPs. With advances in technology, they can now look for all the SNPs across over 12 million yDNA base pairs.
© Michael Maglio
Wednesday, September 24, 2014
DNA Mysteries: Iberian R1b-V88 in Africa
When I first heard about R1b in Africa, my immediate assumption was that the predominantly Celtic haplogroup must have been a recent transplant. I ran some of the V88 haplotypes against the big databases (FTDNA & ySearch) expecting to see matches to European men within the African colonial timeframe. It wasn’t that easy. Common ancestor analysis put the R1b Africans (V88) thousands of years removed from the rest of their European R1b cousins. Where did they come from? How did they get there?
I started with the given that the R1b defining mutations (SNPs) occurred in the Iberian Peninsula. The jury is still out on this hypothesis. There have been scientific papers for and against Iberian origins of R1b. My own work (Iberian Origins of R1b) supports an origin prior to the Neolithic expansion. Could V88 have made a straight-line migration from Iberia to the Lake Chad region of Africa? Could V88 have crossed the Strait of Gibraltar, travelled across the Sahara, which 7,000 years ago was a savannah well populated with animals for hunting, and arrived at Lake Mega-Chad? That was my early premise. I was wrong.
The distribution of V88 is much larger than any of the scientific papers would indicate. While I agree with the work that’s been done correlating the spread of V88 with the spread of Chadic languages (Cruciani et al 2010), the Chadic population is only a subset. Nobody takes into consideration the V88 populations in Europe and the Middle East. If they do, it is a sideways glance to say, “We’re ignoring them because they don’t fit into what we are trying to prove.” If you don’t look at the entire picture, your conclusions will be skewed.
I wanted the largest selection of V88 Y-DNA records with at least 37 markers tested. I started with Family Tree DNA projects that had the records SNP tested. Those haplotypes were run against the ySearch database to identify highly related records with no SNP testing. The initial gathering of records picked up individuals with SNP M73. These were removed. The key differentiator between V88 and M73 was DYS464a&b. V88 was typically 12,12 and M73 was 15,15. Thirty-seven or more STR markers are helpful in identifying additional related haplotypes and even more necessary in determining the relationship between records. Most studies only look at SNPs or a small handful of STR markers. This is shortsighted. Imagine a reference population of 100 records all with the same SNP. Without enough STR markers you can’t tell whether you are looking at one haplotype with minor 1 or 2 step variations or 100 unique haplotypes. That’s the difference between a founder event starting with as few as one individual and a group with greater diversity and age.
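The DYS464 screen can be sketched in Python. Because the values are described as typical rather than definitive, the sketch returns only a tentative label and leaves everything else undetermined. The function name, record IDs and record values are hypothetical.

```python
def screen_by_dys464(record):
    """Tentative label from the DYS464a and DYS464b repeat counts."""
    pair = (record.get("DYS464a"), record.get("DYS464b"))
    if pair == (12, 12):
        return "likely V88"
    if pair == (15, 15):
        return "likely M73"
    return "undetermined"


# Hypothetical records; the IDs and values are made up for illustration.
records = [
    {"id": "rec-1", "DYS464a": 12, "DYS464b": 12},
    {"id": "rec-2", "DYS464a": 15, "DYS464b": 15},
    {"id": "rec-3", "DYS464a": 14, "DYS464b": 15},
]

# Drop the records that look like M73 and keep the rest for closer review.
kept = [r["id"] for r in records if screen_by_dys464(r) != "likely M73"]
print(kept)  # ['rec-1', 'rec-3']
```

A record labeled undetermined still needs the full 37-marker comparison or SNP testing before it is kept or discarded.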
Each of the 119 records in my final set has at least 37 STR markers, is either V88 SNP tested or highly related via STR, and has the geographic location of the most distant known ancestor. The records are processed through PHYLIP to generate a phylogenetic tree. The phylogenetic tree gives a visual depiction of the relationships in the dataset and an approximate number of years back to common ancestors, represented as the nodes between the records.
All of this is very standard genetic genealogy. I add a twist (Biogeographical Multilateration) by converting the years back to a common ancestor to a distance using Cavalli-Sforza’s migration rate of 1 to 1.2 km per year. This is enough for me to solve a series of cascading equations giving me the locations of the common ancestors. Looking back at the phylogenetic tree shows us how all the nodes and locations are connected, essentially the flow of migration.
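The time-to-distance step is plain arithmetic and can be sketched as below. The sketch covers only the unit conversion at the quoted rate of 1 to 1.2 km per year. The cascading equations themselves are not spelled out in this post, so they are not attempted here. The function name and the 1,000-year input are illustrative assumptions.

```python
def years_to_distance_km(years, rate_low=1.0, rate_high=1.2):
    """Return the (shortest, longest) distance in km covered in `years`."""
    return years * rate_low, years * rate_high


# Hypothetical input: a common ancestor 1,000 years back.
low, high = years_to_distance_km(1000)
print(f"{low:.0f} to {high:.0f} km")  # 1000 to 1200 km
```

Returning a range instead of a single number keeps the uncertainty in the migration rate visible in every later step.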
The out of Iberia event took place about 7,700 ± 1,600 years ago. TMRCA calculations have been shown to be very inconsistent. Some folks use a constant mutation rate and some use rates per marker. I include a TMRCA to give a relative chronology. While the majority of R1b is known for its Western Atlantic migrations, V88 took a path along the Mediterranean coast and down the Adriatic. While none of the V88 records indicated Crete as an ancestral location, it appears multiple times as a common ancestor location. The data shows Crete as a stepping-stone in the Mediterranean as V88 migrated to the Nile River Valley. The back to Africa event(s) occurred roughly 5,500 ± 1,000 years ago.
The majority of the Chadic records (Cameroon, Chad and Nigeria) have relatively close genetic connections to individuals in the Middle East (mainly Saudi Arabia). The Chadic and Middle Eastern records tie back to common ancestors along the upper Nile. There is a significant lack of information to understand what impact R1b-V88 had on the Nile Valley cultures. Considering that there was only 1 out of 119 records with an exact Nile River location, I would venture a guess that V88 didn’t integrate well.
While the V88 back to Africa migration has captured much attention, the data shows a more fascinating event. There was a V88 re-migration back to Europe from Africa. The back to Europe event took place about 3,200 ± 1,000 years ago. Again, Crete played a role as a stepping-stone as V88 entered the Eastern Adriatic region and spread into Central and Eastern Europe. Someone will probably notice that many of the V88 in Eastern Europe are Jewish and that the date for leaving the Nile region is close to the time of Exodus. There is nothing in any of the data to indicate that this was the Jewish Exodus from Egypt. The V88 group in Eastern Europe is closely related and there is phylogenetic evidence to support that this may have been a founder event with a single male or small group of closely related males. There is no evidence to support that those founders were Jewish when they left Africa.
By looking at the big picture, including all the data and letting the data illustrate the patterns, we can unravel what appears to be the mysterious appearance of R1b in Central Africa. Along the way, we can uncover a previously unknown re-migration from Africa to Europe. Too often haplogroup data is treated as discrete buckets of information living in a vacuum with no interaction to other haplogroups and no internal relationships. Every DNA record is connected to every other record in a network. Each haplotype is a vector with location and direction. The sooner we treat genetic records as a network analysis, the sooner we will solve more DNA mysteries.
Out of Iberia and back to Africa, followed by a return to Europe.
Reference:
Maglio, MR (2014) Y Chromosome Haplogroup R1b-V88: Biogeographical Evidence for an Iberian Origin (Link)
Tuesday, August 12, 2014
Iberian R1b Y-DNA: First Movers in Europe
The origins of haplogroup R1b, most commonly thought of as Celtic, remain disputed, split between Iberia prior to the end of the last ice age and various West Asian locations after the ice age. A new view on the R1b homeland comes out every year. With all we know about DNA, shouldn’t we be coming to a consensus? Typically, I refer to R1b as Celtic to help an audience make the connection between lettered haplogroups and culture or ethnicity. I also add the caveat that Celtic is a misleading label. R1b is a supergroup of cultures including Iberian, Gallic, Celtic, Germanic and Scandinavian. To attribute empires or nationalities to R1b would be foolish, as R1b is tens of thousands of years older than any known empire.
Perhaps I’m naïve. I like simple, logical answers. The earliest publications on R1b described its ancestor, R1, entering Europe from central Asia during a warm period about 30,000 – 40,000 years ago. The last ice age forced R1 to split and take refuge south in Iberia and the Balkans. Time and separation gave us the mutations R1b in Iberia and R1a in the Balkans. That split is roughly what we see today in those regions. That’s clean and simple. The real world is much more complex. R1b and R1a were not alone in Europe. Their interactions with the other major European haplogroups (E, G, I, J and N) have to be taken into consideration. We can’t analyze R1b as if it were in a vacuum.
Let’s take y-DNA haplogroups out of the picture for a moment. We know that modern humans survived and flourished in the Iberian refuge during the end of the last ice age, based on mitochondrial DNA studies. [Could someone please run some y-DNA tests on those samples?] The tribes in western Europe, whoever they were, had a 1,000 to 2,500 year head start over the tribes in central and eastern Europe on repopulating the continent. The ice sheets melted and retreated earlier on the west coast than in the rest of Europe. This gave the inhabitants of the Iberian refuge an advantage – a “first-mover” advantage gained by being the first to move north. These first-movers gained a land-monopoly. A tribe with a first-mover advantage and a head start of over 1,000 years should have been hard to displace from western Europe. In other anthropological situations, original inhabitants are forced into niche locations by invading populations, but are very rarely displaced completely. What we see on the west coast of Europe is a very strong R1b presence and no niche haplogroups of a significant age. From this point of view, either R1b is the original Iberian inhabitant or R1b completely decimated another earlier haplogroup that had a 1,000 year geographical head start. I like simple. R1b was in Iberia first.
Let’s throw some data at the problem. The R1b haplogroup population is enormous. The majority fall into SNPs R-P312 (Celto-Iberian) and R-U106 (Celto-Germanic). There is so much information there that it tends to be noise. If you want to get to the root of R1b (R-M343), you need to work with the branches that are closest to the root - R-L278*, R-V88, R-M73*, R-YSC0000072/PF6426 and R-L23.
• • R1b M343
• • • R1b1 L278
• • • • R1b1a P297
• • • • • R1b1a1 M73
• • • • • R1b1a2 M269
• • • • • • R1b1a2a L23
• • • • R1b1c V88
[• • • • • • • • • R1b1a2a1a1 U106 - too far downstream]
[• • • • • • • • • R1b1a2a1a2 P312 - too far downstream]
I collected 250 records that matched these SNPs or were genetically close by STR haplotype. These records were mapped based on user-reported most distant ancestor location.
This is not a connect-the-dots exercise. Just because two or more records appear geographically close doesn’t mean that they are genetically close. These 250 records have to be treated like a network. If this were Facebook, these folks would be randomly associated through family, business, school or neighbor connections. These are y-DNA records. There is a relationship between every pair. Each pair has a different common ancestor, with a different number of generations to get back to that ancestor. Here is an example of what that relationship looks like across multiple pairs. The number represents years back to a common ancestor (TMRCA).
When all of the interrelations are taken into consideration, the group of records can be displayed as a relationship tree of who is older or younger and who is more closely related to whom (phylogenetic tree).
Now we have who, where, when and how the records are connected. At this point it does become a connect-the-dots exercise. I’ve used a biogeographical analysis to connect very specific sets of dots based on the calculated interrelation of the entire group.
The R1b genetic family tree has a trunk and many branches. The trunk of the R1b data is firmly rooted in Iberia. The main core of the tree stretches along the western Atlantic coast of Europe and branches across Europe and even back into Asia. The results that I found support the work of the earliest pioneers in the field and conflict with the latest publications.
Every analysis has its limitations. The work that I’ve done looks back at the R1b family about 8,000 years. The scarcity of data only allowed me to predict the origin of R-L278, which is currently one branch below the main root of R-M343. I can’t yet tell where R1b was in the time between the split of R1 into R1b and R1a and that point.
In my analysis, I have included R-V88. They are a curious group of R1b found in Africa and the Middle East. I will be treating R-V88 in a separate write-up to do justice to a very interesting back migration story. The R-V88 article can be found here.
Reference:
Maglio, MR (2014) Biogeographical Evidence for the Iberian Origins of R1b-L278 via Haplotype Aggregation (Link)
Thursday, June 26, 2014
Your Autosomal DNA Tapestry
Deep Into DNA*
Every year new tools become available to help us understand our genetic patterns and learn about the stories written in our genes. There are stories of health issues, both good and bad. There are stories of our cousin connections. There is diverse color in our ethnic background. My autosomal tapestry hangs proudly on the wall.
What does a tapestry have in common with your autosomal DNA? A tapestry is a colorful and complex weaving that tells a story. Your autosomal DNA is a complex weaving of 3 billion base pairs inherited from your ancestors. Autosomal DNA can tell multiple stories about ethnicity, health and relationships. As you will see, your DNA can be quite colorful.
Bayeux Tapestry (Source: Wikimedia Commons)
...continued at The In-Depth Genealogist with a free subscription.
*The Deep Into DNA article series is published each month in the new Going In-Depth
digital genealogy magazine presented by The In-Depth Genealogist.
Tuesday, April 29, 2014
Exploring Rollo's Roots: DNA Leads the Way
It’s been nearly a year since I wrote about William the Conqueror’s DNA. Based on a study of men with surnames historically associated with William and their corresponding Y-DNA, I concluded that I had identified the genetic signature of the first Norman King of England. Now it’s time to get back to William and more specifically his 3rd great grandfather, Rollo. To be honest, the 37 marker Y-DNA haplotype that I published is really connected to Richard the Fearless, William’s great grandfather. Genealogically, the surnames in the study trace back to Richard. As long as there was no hanky-panky, William the Conqueror has the same Y-DNA as Richard. What that also means is that Richard has the same Y-DNA as his grandfather, Rollo.
Based on the work done in my previous paper, the following haplotype is that of William the Conqueror (and Richard the Fearless):
DYS393 | DYS390 | DYS19 | DYS391 | DYS385a | DYS385b | DYS426 | DYS388 | DYS439 | DYS389i | DYS392 | DYS389ii
13 | 24 | 14 | 11 | 11 | 14 | 12 | 12 | 12 | 13 | 13 | 29

DYS458 | DYS459a | DYS459b | DYS455 | DYS454 | DYS447 | DYS437 | DYS448 | DYS449 | DYS464a | DYS464b | DYS464c | DYS464d
17 | 9 | 10 | 11 | 11 | 25 | 15 | 19 | 29 | 15 | 15 | 17 | 17

DYS460 | Y-GATA-H4 | YCAIIa | YCAIIb | DYS456 | DYS607 | DYS576 | DYS570 | CDYa | CDYb | DYS442 | DYS438
11 | 11 | 19 | 23 | 15 | 15 | 17 | 17 | 36 | 37 | 12 | 12
There is an assumption, inherent in genetic genealogy, that there weren’t any non-paternal events between the generations that separate Rollo and William and that this haplotype is that of Rollo as well. One of the goals for this Rollo study is to get more accurate with his haplotype by narrowing the dataset to only those records with 67 markers. The second goal is to determine Rollo’s haplogroup R SNP. The best I was able to determine for William was R-P312, which is a fairly high level SNP. My third goal is to determine Rollo’s origin using my TribeMapper analysis. Whether Rollo is Danish or Norwegian has been disputed for hundreds of years.
I picked up where I left off with William. There were 152 Y-DNA records that made it into the William the Conqueror Modal Haplotype (WCMH). For each of these records a 67 marker test result and SNP testing result were added to the analysis, where the data was available. I threw out any record that didn’t have enough data and retained the ones that grouped into a single SNP of R-DF13 (just downstream of R-L21). Based on these final 25 records, I have identified the 67 marker Rollo Norman Modal Haplotype (RNMH) as follows:
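A modal haplotype is the most common value at each marker across a set of records. The sketch below shows that calculation in Python. It assumes each record is a mapping from marker name to repeat count, and the three records are invented for illustration (they are not the 25 records used in the study). How ties are broken is also an assumption: here the value seen first wins.

```python
from collections import Counter


def modal_haplotype(records):
    """Most common repeat count per marker across all records."""
    markers = sorted({marker for record in records for marker in record})
    modal = {}
    for marker in markers:
        counts = Counter(r[marker] for r in records if marker in r)
        # most_common(1) returns the top value; ties go to the value seen first.
        modal[marker] = counts.most_common(1)[0][0]
    return modal


# Hypothetical three-marker records, for illustration only.
records = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    {"DYS393": 12, "DYS390": 24, "DYS19": 14},
]

print(modal_haplotype(records))  # {'DYS19': 14, 'DYS390': 24, 'DYS393': 13}
```

With only a handful of records a single off-modal value can swing a marker, which is one reason to hold out for records with the full 67 markers.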
DYS393 | DYS390 | DYS19 | DYS391 | DYS385a | DYS385b | DYS426 | DYS388 | DYS439 | DYS389i | DYS392 | DYS389ii
13 | 24 | 14 | 11 | 11 | 14 | 12 | 12 | 12 | 13 | 13 | 29
DYS458 | DYS459a | DYS459b | DYS455 | DYS454 | DYS447 | DYS437 | DYS448 | DYS449 | DYS464a | DYS464b | DYS464c | DYS464d
17 | 9 | 10 | 11 | 11 | 25 | 15 | 19 | 29 | 15 | 15 | 17 | 17
DYS460 | Y-GATA-H4 | YCAIIa | YCAIIb | DYS456 | DYS607 | DYS576 | DYS570 | CDYa | CDYb | DYS442 | DYS438
11 | 11 | 19 | 23 | 15 | 15 | 17 | 17 | 36 | 37 | 12 | 12
DYS531 | DYS578 | DYF395S1a | DYF395S1b | DYS590 | DYS537 | DYS641 | DYS472 | DYF406S1 | DYS511 | DYS425 | DYS413a | DYS413b
11 | 9 | 15 | 16 | 8 | 10 | 10 | 8 | 10 | 10 | 12 | 23 | 23
DYS557 | DYS594 | DYS436 | DYS490 | DYS534 | DYS450 | DYS444 | DYS481 | DYS520 | DYS446 | DYS617 | DYS568
16 | 10 | 12 | 12 | 16 | 8 | 12 | 22 | 20 | 13 | 12 | 11
DYS487 | DYS572 | DYS640 | DYS492 | DYS565
13 | 11 | 11 | 12 | 12
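For readers who want to see how a modal haplotype like the one above can be derived, here is a minimal Python sketch. It assumes the usual definition, where the modal value at each marker is simply the most common repeat count across the records; the post does not spell out the exact procedure used. The function name `modal_haplotype` and the three toy records are hypothetical and are not taken from the 25 records in the study.

```python
from collections import Counter

def modal_haplotype(records):
    """Return the most common repeat count for each marker.

    `records` is a list of dicts mapping marker name -> repeat count.
    A record that lacks a marker is skipped for that marker only.
    """
    # Collect marker names in the order they are first seen.
    markers = []
    for rec in records:
        for marker in rec:
            if marker not in markers:
                markers.append(marker)

    modal = {}
    for marker in markers:
        counts = Counter(rec[marker] for rec in records if marker in rec)
        # most_common(1) gives the value with the highest count;
        # on a tie, the value encountered first wins.
        modal[marker] = counts.most_common(1)[0][0]
    return modal

# Hypothetical toy records, for illustration only.
records = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 23, "DYS19": 14},
    {"DYS393": 12, "DYS390": 24, "DYS19": 14},
]
print(modal_haplotype(records))
```

With real data, ties and missing markers deserve more care than this sketch gives them, which is one reason narrowing the dataset to full 67-marker records helps.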
Based on this modal haplotype and the associated SNP, a broader collection of genetic cousin records was identified for use with my new TribeMapper analysis (Biogeographical Multilateration).
This map shows the geographic distribution of Rollo’s cousins. The large number of points along the coast of Normandy is a good sign. If the majority of points were in Eastern Europe, I would have to revisit my whole hypothesis about William the Conqueror. It is best not to try to interpret any relationships until we look at them through the lens of a phylogenetic tree.
The TribeMapper analysis takes into consideration the mapped location, the tree node connections and the time between common ancestors. The time is converted to distance based on the demic diffusion migration rate. The distance is plotted to ‘triangulate’ the geographic location of each common ancestor. This is a process called multilateration.
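The multilateration step can be illustrated with a short Python sketch. This is not the TribeMapper implementation: it ignores the tree node connections, works on a flat grid in kilometres rather than on real map coordinates, and uses a placeholder migration rate, since the post does not state the value it uses. The names `years_to_km`, `multilaterate` and `MIGRATION_KM_PER_YEAR`, and all the sample numbers, are hypothetical.

```python
import math

# Assumed placeholder rate in km per year; not a value from the post.
MIGRATION_KM_PER_YEAR = 1.0

def years_to_km(years, rate=MIGRATION_KM_PER_YEAR):
    """Convert time since a common ancestor into a distance."""
    return years * rate

def multilaterate(anchors, distances):
    """Least-squares position of an unknown point on a flat plane.

    anchors:   list of (x, y) positions in km, at least three, not collinear
    distances: list of distances in km from the unknown point to each anchor
    """
    (x0, y0), d0 = anchors[0], distances[0]
    # Subtracting the first circle equation from each of the others
    # turns the problem into a linear system a1*x + a2*y = b.
    rows = []
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        a1 = 2.0 * (xi - x0)
        a2 = 2.0 * (yi - y0)
        b = d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2
        rows.append((a1, a2, b))

    # Solve the 2x2 normal equations for the least-squares fit.
    s11 = sum(a1 * a1 for a1, _, _ in rows)
    s12 = sum(a1 * a2 for a1, a2, _ in rows)
    s22 = sum(a2 * a2 for _, a2, _ in rows)
    t1 = sum(a1 * b for a1, _, b in rows)
    t2 = sum(a2 * b for _, a2, b in rows)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("need at least three non-collinear anchors")
    x = (s22 * t1 - s12 * t2) / det
    y = (s11 * t2 - s12 * t1) / det
    return x, y

# Hypothetical cousins at known positions, with hypothetical times
# (in years) back to the shared ancestor.
cousins = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
years = [50.0, 80.6, 67.1]
ranges_km = [years_to_km(t) for t in years]
print(multilaterate(cousins, ranges_km))
```

A least-squares fit is used here because real time estimates are noisy and the circles rarely meet at a single point. With more than three cousins, the extra records simply add rows to the system and tighten the estimate.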
The earliest documented origins for Rollo come from Dudo of Saint-Quentin in 1015 and William of Jumièges in 1060. Both ‘histories’ were commissioned by the House of Normandy and attribute a Danish origin to Rollo. Commissioned biographies can border on mythology. The Norwegian Orkneyinga Saga, from the 13th century, gives Rollo a Norwegian origin.
I’ve run the analysis with Rollo’s record as an unknown location. TribeMapper allows us to back into the location for any unknown point. What we get is a highly constrained location for Rollo’s ancestor, in the middle of Denmark. The data then shows that Rollo may have lived within 226 km of that paternal ancestor. The red circle illustrates the range for Rollo, which covers the majority of Denmark. The data also shows that Rollo’s ancestors, going back at least 12 generations, were also in Denmark.
We can give the Norwegians some credit also. The ancestors of Rollo’s ancestors were Norwegian, with an origin on the west coast of Norway. Rollo’s ancestors were responsible for multiple branches of migration into Europe. This includes a back migration into Norway that then went on to invade Scotland.
This was accomplished with a small sample of 65 records for simplification. Much larger data sets could trace the genetic flow over a wider geographic and chronological range. Additional records within the same SNP grouping could result in a more accurate origin for Rollo. Records that are genetically upstream from the SNP and STR group used may identify the nomadic migrations prior to the Western Norway settlement.
I’ve run this simulation multiple times, getting the same results. I’m comfortable calling Rollo – “The Dane”.
Reference:
Maglio, MR (2014) Biogeographical Origins and Y-chromosome Signature for the House of Normandy (Link)