Autosomal DNA and Probability


The general wisdom is that autosomal DNA matches are reliable only up to four or five generations (that is, to second cousins). Beyond this limit, any matches that occur are probably due to chance rather than inheritance, because any match of 5% or less may be attributable to random chance and not to inheritance.
My purpose here is to suggest that, by referring to our traditional written family history research and by carefully planning our DNA tests, we may be able to identify matches well beyond our great grandparents and our second cousins.

My Ancestors

I have two parents. It is expected that I received half, or 50%, of my autosomal DNA from my father and half from my mother. This seems to be an acceptable proposition.
I have four grandparents. It is expected that I received one quarter, or 25%, of my autosomal DNA from each of my grandparents. That is, it is expected that I received 25% from my grandfather Bert Baulch, 25% from my grandmother Annie Abbey, 25% from my grandfather Noel Learmonth and 25% from my grandmother Edith Salter.
I have eight great grandparents. It is expected that I received one eighth or 12.5% of my autosomal DNA from each of my eight great grandparents.
At the fifth generation it is expected that I received one sixteenth, or 6.25%, of my autosomal DNA from each of my sixteen great-great grandparents. Can these expected values definitely be attributed to inheritance? After all, the upper mark of 5% used to indicate matches that may be wholly attributed to chance is not far removed from the 6.25% that may be attributable to inheritance from any one of my great-great grandparents.
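The halving can be written as an expected share of 100% / 2^(g-1) from each ancestor in generation g. Here I count myself as generation 1, so parents are generation 2 and great-great grandparents are generation 5.

The short Python sketch below tabulates this rule and flags the first generation that falls at or below the 5% mark. The function name and labels are only illustrative.

```python
# Expected share of autosomal DNA received from EACH ancestor in a generation,
# counting myself as generation 1 (parents = 2, grandparents = 3, ...).
def expected_share_percent(generation):
    ancestors = 2 ** (generation - 1)   # number of ancestors in that generation
    return 100.0 / ancestors            # each is expected to contribute equally

CHANCE_THRESHOLD = 5.0  # the 5% upper mark quoted for matches attributable to chance

labels = ["parents", "grandparents", "great grandparents",
          "great-great grandparents", "3x great grandparents"]
for generation, label in enumerate(labels, start=2):
    share = expected_share_percent(generation)
    flag = "at or below 5%" if share <= CHANCE_THRESHOLD else ""
    print(f"{label:<26}{2 ** (generation - 1):>4} ancestors  {share:6.3f}% each  {flag}")
```

Only the 3x great grandparents (3.125% each) fall below the mark. The great-great grandparents, at 6.25%, sit just above it, which is the closeness the paragraph worries about.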
Now none of my direct ancestors are alive and so aren’t available for DNA testing. I have to rely upon my siblings and upon my cousins. The expected values of a match on autosomal DNA tests for my ancestors, siblings and cousins can be summarised in tabular form as follows:
Relationship Chart
The expected value of sharing autosomal DNA with one of my siblings is 50%. I actually share 38% autosomal DNA with one of my brothers. The expected value for shared autosomal DNA with any one of my first cousins once removed is 6.25%. I actually share 7.3% autosomal DNA with one first cousin once removed and 5.4% with another.
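The expected figures for cousins follow a simple rule. Full nth cousins r times removed descend from a common couple. The path from me up to each of those two ancestors and back down to the cousin is 2n + 2 + r parent-to-child steps, and each step halves the expected share. The expected shared fraction is therefore 2 × (1/2)^(2n+2+r).

The sketch below encodes that idealised rule; it is not any testing company's prediction method. It then compares the rule with the two first cousin once removed figures quoted above. Full siblings are not covered by this rule, since they share both parents (expected 50%).

```python
# Expected % of autosomal DNA shared with a full nth cousin, r times removed.
# Two common ancestors; every parent-to-child step on the path up from me and
# down to the cousin (2n + 2 + r steps) halves the expected share.
def expected_cousin_share_percent(n, removed=0):
    return 100.0 * 2 * 0.5 ** (2 * n + 2 + removed)

print(expected_cousin_share_percent(1))      # first cousin: 12.5
print(expected_cousin_share_percent(1, 1))   # first cousin once removed: 6.25
print(expected_cousin_share_percent(2))      # second cousin: 3.125

# Compare with the observed values for two first cousins once removed.
expected = expected_cousin_share_percent(1, 1)
for observed in (7.3, 5.4):
    print(f"observed {observed}% vs expected {expected}% "
          f"(difference {observed - expected:+.2f} percentage points)")
```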
Should actual values that differ from expected values be cause for concern? Absolutely not!
However, rather than accepting the relationship a testing company assigns to an autosomal DNA match as set in stone, I look to my written genealogy to confirm the autosomal DNA match result. Equally, the autosomal DNA match is a further independent source that may substantiate my written genealogy. The two are not separate; each supports the other.
The methodology for calculating the likelihood of what autosomal DNA we are expected to have should be familiar to us all.
Consider tossing a coin. The first toss may be heads. The probability of the second toss being heads is still 50%. Even if the second toss is heads, the probability of the third toss being heads is still 50%. Thus, in a small sample of three tosses, a result of three heads doesn’t indicate that a double-headed coin is being used. However, if the result still remains heads after hundreds or thousands of tosses, I might be inclined to check whether the coin is biased in some way. According to Bernoulli’s theorem (the law of large numbers), the more a coin is tossed, the more likely it is that the proportion of tosses coming up heads approaches the expected value of 50%.
Now consider throwing a die. The first throw may be a 4. The probability of throwing a 4 is one sixth. Indeed, for an unbiased die the probability of throwing any one of the six numbers is always one sixth, irrespective of the previous throws. Over a small number of throws there may be a run on a particular number, but this in no way alters the probability for the next throw of the die. For each number that probability is one sixth. As with the coin, over hundreds and thousands of throws the proportion of throws showing each number will approach the expected probability of one sixth.
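The law of large numbers can be seen directly by simulating a fair coin. This is a minimal sketch using Python’s standard `random` module with a fixed seed so the run is repeatable; the seed and toss counts are arbitrary choices. It prints the proportion of heads for increasing numbers of tosses.

```python
import random

# Law of large numbers: the PROPORTION of heads settles towards 0.5 as the
# number of tosses grows, even though short runs of heads are common.
rng = random.Random(1)  # fixed seed so the run is repeatable

def proportion_heads(tosses, rng):
    heads = sum(rng.random() < 0.5 for _ in range(tosses))  # True counts as 1
    return heads / tosses

for tosses in (3, 30, 300, 3_000, 30_000, 300_000):
    print(f"{tosses:>7} tosses: {proportion_heads(tosses, rng):.4f} heads")
```

With three tosses the proportion can easily be 0 or 1. With hundreds of thousands it stays very close to 0.5, although the raw difference between heads and tails can still be large.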
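The claim that a run does not change the next throw can also be checked by simulation. This sketch, using an arbitrary fixed seed, throws a simulated fair die many times. It then looks only at throws that immediately follow three 4s in a row and asks how often those are also a 4.

```python
import random

# Independence: after a run of the same number, the next throw of a fair die
# is still a 4 about one time in six.
rng = random.Random(2)  # fixed seed so the run is repeatable
throws = [rng.randint(1, 6) for _ in range(600_000)]

# Collect every throw that immediately follows three 4s in a row.
after_run = [throws[i] for i in range(3, len(throws))
             if throws[i - 3] == throws[i - 2] == throws[i - 1] == 4]

share_of_fours = after_run.count(4) / len(after_run)
print(f"{len(after_run)} throws followed a run of three 4s")
print(f"proportion that were also a 4: {share_of_fours:.3f} (1/6 = {1/6:.3f})")
```

The proportion after a run should be close to 1/6 (about 0.167), just as it is for any other throw.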
This method of calculating expected values for the toss of a coin and the throw of a die can be applied to the passing of autosomal DNA from two parents to a child. The options for a toss of a coin are heads or tails. The options for the throw of a die are 1, 2, 3, 4, 5 or 6. The options for a child are that its autosomal DNA comes from its father or from its mother, with an expected value of half from each. As with the coin and the die, the actual amount of autosomal DNA received in the short term may differ from the expected value. Again as with the coin and the die, over millions and indeed billions of generations the actual amount of autosomal DNA a child receives from its parents will approach the expected value of 50% from its father and 50% from its mother.
But is this so? What is it that Family Tree DNA and AncestryDNA are testing when they test autosomal DNA? Is there an equal chance of this autosomal DNA information coming from one parent as from the other? Let’s start by looking at DNA in the whole cell before focusing on autosomal DNA.
Each cell in our body contains DNA. Outside the nucleus, DNA is found in the mitochondria; this is known as mitochondrial DNA. DNA is also found in the cell nucleus, which contains 23 pairs of chromosomes, each containing DNA. The 23rd pair is the pair of sex chromosomes: men have one X chromosome and one Y chromosome, while women have two X chromosomes. The first 22 pairs of chromosomes are known as autosomes, and they contain autosomal DNA.

Cell with nucleus and mitochondria

In their search for genealogical matches, the testing companies test more than 700,000 markers on the “junk” DNA portion of our autosomal chromosomes. These markers are the sites of single nucleotide polymorphisms, or SNPs (pronounced snips). A person’s autosomal SNPs can be identified and compared with another person’s autosomal SNPs.
Apart from identical twins, each of us is unique. We see this as we walk down the street or glance around a football crowd at the MCG. It is easy, therefore, to apply the law of large numbers as discussed above to the more than 700,000 SNPs. To me 700,000 seems to be a large number. Surely, for each marker or SNP there is a 50% chance that I inherited that SNP from my father and a 50% chance that I inherited it from my mother. Surely, as with the coin and the die, I had an equal chance of receiving each marker independent of the previous marker and the marker following.
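At each tested position a person has two letters, one on each chromosome of the pair, and the test does not say which came from which parent. Comparison tools therefore commonly ask whether two people share at least one letter at a position; this is often called being "half-identical".

The sketch below shows that one comparison step only. The genotypes are made up and the function name is hypothetical. Real tools such as Gedmatch use their own, more involved, methods.

```python
# Each genotype is the pair of letters read at one SNP position, e.g. "AG".
# Two people are "half-identical" at a SNP if they have at least one letter in common.
def half_identical(genotype_a, genotype_b):
    return bool(set(genotype_a) & set(genotype_b))

# Made-up genotypes at eight SNP positions for two people.
me      = ["AG", "CC", "TT", "AG", "GG", "CT", "AA", "CT"]
brother = ["AA", "CT", "TT", "GG", "AA", "CC", "AG", "TT"]

matches = [half_identical(a, b) for a, b in zip(me, brother)]
print(matches)
print(f"{sum(matches)} of {len(matches)} positions half-identical")  # 7 of 8
```

A position where the two people have no letter in common, such as GG against AA, is the kind that rules out a shared stretch of DNA at that point.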
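The assumption in this paragraph can be made concrete. Suppose each of about 700,000 SNPs really were an independent 50:50 "father or mother" event, like a separate coin toss. This sketch (arbitrary seed, 20 repeats) draws one random bit per SNP with Python’s `random.getrandbits` and reports the fraction attributed to the father.

```python
import random

# ASSUMPTION being tested: every SNP is an independent 50:50 event,
# exactly like a separate coin toss.
SNPS = 700_000
rng = random.Random(3)  # fixed seed so the run is repeatable

def fraction_from_father(rng, snps=SNPS):
    # One random bit per SNP; a 1 stands for "inherited from father".
    return bin(rng.getrandbits(snps)).count("1") / snps

fractions = [fraction_from_father(rng) for _ in range(20)]
print(f"smallest {min(fractions):.4f}, largest {max(fractions):.4f}")
```

Under this assumption every repeat lands within a fraction of a percentage point of 50%. By the same arithmetic, a share between siblings as far from 50% as the 38% reported above would be practically impossible if markers were really inherited one by one.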
There are two difficulties with this assumption.
Firstly, autosomal DNA tests are not able to distinguish which markers I inherited from my father and which I inherited from my mother.
Secondly, if the first wasn’t a knockout blow, the markers are set out along a strand of DNA. Unlike each toss of a coin or each throw of a die, whether I inherit a marker from my father or from my mother is not independent of which parent I inherited the previous marker from, or the next marker from. That is, the 700,000 SNPs are linked along the DNA strand. For example, the autosomal DNA I share with my brother on chromosome 3, which we must have inherited from our father, our mother or a combination of both, runs along most of the chromosome.
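The effect of linkage on sibling sharing can be illustrated with a deliberately simplified model. It rests on three assumptions:

- There are 22 autosomes, each 1.6 Morgans long (about 35 Morgans in total, roughly the size of the human autosomal map).
- Crossovers occur at random at an average of one per Morgan, with no interference between them.
- Each sibling’s copy of a parent’s chromosome switches between the grandfather’s and the grandmother’s version at each crossover, so markers are inherited in blocks rather than one by one.

The function names are illustrative, and the measure is the fraction of each parent’s contribution that the two siblings received from the same grandparent.

```python
import random
import statistics

# Simplifying assumptions (not real chromosome data): 22 autosomes of equal
# length, 1.6 Morgans each, with crossovers at an average of 1 per Morgan.
CHROMOSOME_LENGTHS = [1.6] * 22

def crossovers(length, rng):
    """Random crossover positions on one chromosome (a Poisson process)."""
    points, x = [], rng.expovariate(1.0)
    while x < length:
        points.append(x)
        x += rng.expovariate(1.0)
    return points

def shared_length(length, rng):
    """Length over which two siblings got the same grandparental copy from one parent."""
    same = rng.random() < 0.5  # do they start on the same grandparent's copy?
    # A crossover in either sibling flips whether they match from that point on.
    breaks = sorted(crossovers(length, rng) + crossovers(length, rng))
    shared, start = 0.0, 0.0
    for end in breaks + [length]:
        if same:
            shared += end - start
        same, start = not same, end
    return shared

def sibling_share(rng):
    """Fraction of both parents' contributions that two siblings share."""
    total = 2 * sum(CHROMOSOME_LENGTHS)  # one copy from each parent
    shared = sum(shared_length(length, rng)
                 for length in CHROMOSOME_LENGTHS for _parent in range(2))
    return shared / total

rng = random.Random(4)  # fixed seed so the run is repeatable
shares = [sibling_share(rng) for _ in range(2000)]
print(f"mean {statistics.mean(shares):.3f}, "
      f"standard deviation {statistics.stdev(shares):.3f}, "
      f"range {min(shares):.3f} to {max(shares):.3f}")
```

The average still comes out near 50%. However, the spread from one pair of siblings to the next is now measured in whole percentage points rather than hundredths. Linked markers behave like a modest number of large coin tosses, not 700,000 small ones.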

Chromosome 3

We don’t match along the whole of chromosome 3, but where we do match it is mostly in one long strand. Indeed, the longer the strand two people share, the closer their predicted relationship.
Consider this for a moment. This phenomenon of linked markers has helped me detect relationships beyond the limit usually put down to chance: beyond our great grandparents and our second cousins. For example, I have confirmed a relationship with a third cousin twice removed as well as (wait for this) a sixth cousin twice removed! These results go well beyond my great grandparents and second cousins (that is, second cousins with no removes).
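The idea of a long shared strand can be sketched as a search for runs of consecutive half-identical SNPs. Only runs that span at least a chosen length in centiMorgans are kept.

This is a simplified illustration, not Gedmatch’s or any testing company’s algorithm. Real tools typically also tolerate occasional read errors and require a minimum number of SNPs. The positions and match flags below are made up, and the function name is hypothetical.

```python
# Find shared segments: runs of consecutive half-identical SNPs whose span
# (in centiMorgans) is at least min_cm. Each SNP is (position_cM, half_identical).
def shared_segments(snps, min_cm):
    segments, run_start, run_end = [], None, None
    for position, matches in snps:
        if matches:
            if run_start is None:
                run_start = position   # a new run begins here
            run_end = position         # extend the current run
        else:
            if run_start is not None and run_end - run_start >= min_cm:
                segments.append((run_start, run_end))
            run_start = None           # a mismatch ends the run
    if run_start is not None and run_end - run_start >= min_cm:
        segments.append((run_start, run_end))  # run reaching the last SNP
    return segments

# Made-up data for one chromosome: positions in cM with match flags.
chromosome = [(0.5, True), (2.0, True), (4.5, False), (5.0, True), (5.4, True),
              (6.0, False), (8.0, True), (20.0, True), (45.0, True), (60.0, False)]

print(shared_segments(chromosome, min_cm=7))   # [(8.0, 45.0)]
print(shared_segments(chromosome, min_cm=1))   # [(0.5, 2.0), (8.0, 45.0)]
```

Lowering `min_cm` reveals shorter runs as well, which is the kind of detailed checking that examining results down to a low centiMorgan level involves.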
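Why a long shared strand points to a closer relationship can be put as a rough rule. Assume crossovers happen at random, about one per Morgan (100 cM) in each parent-to-child transmission (meiosis). Then a segment passed down through m meioses on both lines combined has an average length of about 100/m cM, ignoring the effect of chromosome ends. For full nth cousins r times removed, m = 2n + 2 + r.

The sketch applies this simplified rule to the relationships mentioned above. Treat the figures as idealised averages, not predictions for any particular match.

```python
# Rough average length (cM) of a segment shared with a full nth cousin r times
# removed, ASSUMING random crossovers at about 1 per 100 cM per meiosis.
def meioses(n, removed=0):
    # n + 1 generations up to the common ancestors, n + 1 + removed back down
    return 2 * n + 2 + removed

def mean_segment_cm(n, removed=0):
    return 100.0 / meioses(n, removed)

for label, n, removed in [("first cousin", 1, 0),
                          ("second cousin", 2, 0),
                          ("third cousin twice removed", 3, 2),
                          ("sixth cousin twice removed", 6, 2)]:
    print(f"{label:<28} {meioses(n, removed):>2} meioses, "
          f"average segment about {mean_segment_cm(n, removed):.1f} cM")
```

The average segment shortens only gradually as relationships become more distant. A segment that does survive from a distant common ancestor can therefore still be several centiMorgans long. Whether any segment survives at all is a separate matter of chance.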
DNA testing for family historians is still in its infancy. The databases of results are still very small. Nevertheless I think I can apply traditional genealogical research techniques to my DNA research:

  • DNA is no substitute for quality traditional genealogical research. Sad to say but true.
  • I have started my analysis with an autosomal DNA test and started with myself. Then I moved from my closer relations to my more distant relations.
  • I have tried to optimise my chances of detecting matches by including a family tree of my ancestors and the names of my ancestors where possible.
  • I have uploaded my information to Gedmatch, as some family members have tested with Family Tree DNA and some with AncestryDNA. My challenge now is to encourage our family to share their results by uploading them to Gedmatch (www.gedmatch.com), especially those who have tested with AncestryDNA, as AncestryDNA has no facility to examine results. (Those who tested with AncestryDNA should go to Settings and download the raw DNA data, then create a Gedmatch account and follow the instructions for uploading to Gedmatch. BE WARNED! These raw files are very, very large and take quite some time to download and upload.)
  • I suspect it may be worthwhile in the long term to examine my results down to the 1 centiMorgan level, chromosome by chromosome. It will involve some of that boring work that doesn’t seem to yield any exciting results, but I see it as akin to searching through parish registers or census records.
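Chromosome-by-chromosome checking starts from the downloaded raw data. The sketch below counts the tested SNPs on each chromosome in a raw data file.

It ASSUMES a tab-separated text layout: comment lines starting with "#", then a header row with the columns rsid, chromosome, position, allele1 and allele2. That layout is an assumption. Other companies (Family Tree DNA, for example) use different layouts, so check your own file before relying on this. The sample rows are made up.

```python
import csv
import io
from collections import Counter

# ASSUMED layout: tab-separated, '#' comment lines, then a header row with
# rsid, chromosome, position, allele1, allele2. Check your own file first.
def snps_per_chromosome(lines):
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    return Counter(row["chromosome"] for row in reader)

# Made-up sample standing in for a real (very large) raw data file.
sample = io.StringIO(
    "# made-up example\n"
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs0001\t1\t100\tA\tG\n"
    "rs0002\t1\t200\tC\tC\n"
    "rs0003\t3\t150\tT\tT\n"
)
print(snps_per_chromosome(sample))
```

For a real download, the same function could be given an open file, for example `with open("raw_data.txt", newline="") as f: print(snps_per_chromosome(f))`. The file name there is hypothetical.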