Autosomal DNA and Probability

The general wisdom is that matches on autosomal DNA are only reliable for up to four or five generations (or to second cousins). Beyond this limit any matches that occur probably occur by chance, not by inheritance. This is because any match of 5% or less may be attributable to random chance rather than to inheritance.
My purpose here is to suggest that, by referring to our traditional written family history research and by carefully planning our DNA tests, we may be able to identify matches well beyond our great grandparents and our second cousins.

My Ancestors

I have two parents. It is expected that I receive half or 50% of my autosomal DNA from my father and half from my mother. This seems to be an acceptable proposition.
I have four grandparents. It is expected that I receive one quarter or 25% of my autosomal DNA from each of my grandparents. That is, it is expected that I received 25% from my grandfather Bert Baulch, 25% from my grandmother Annie Abbey, 25% from my grandfather Noel Learmonth and 25% from my grandmother Edith Salter.
I have eight great grandparents. It is expected that I received one eighth or 12.5% of my autosomal DNA from each of my eight great grandparents.
At the fifth generation it is expected that I received one sixteenth or 6.25% of my autosomal DNA from each of my sixteen two greats grandparents. Can the autosomal DNA I share with a two greats grandparent definitely be attributed to inheritance? After all, the upper mark of 5%, which is used to indicate matches that may be wholly attributed to chance, is not all that far removed from the 6.25% that may be attributable to inheritance from one of my two greats grandparents.
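The halving from one generation to the next can be written out as a short calculation. This is only a sketch: the function names are my own, chosen for illustration, and the figures are expected values, not what any one person actually inherits.

```python
def ancestors_in_generation(generations_back):
    """Number of direct ancestors that many generations back (parents = 1)."""
    return 2 ** generations_back


def expected_share(generations_back):
    """Expected fraction of autosomal DNA received from each of those ancestors."""
    return 0.5 ** generations_back


labels = ["parents", "grandparents", "great grandparents", "two greats grandparents"]
for generations_back, label in enumerate(labels, start=1):
    count = ancestors_in_generation(generations_back)
    share = expected_share(generations_back)
    print(f"{count:2d} {label}: {share:.2%} expected from each")
```

Counting myself as the first generation, my two greats grandparents are the fifth generation, which is four generations back in the code.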
Now none of my direct ancestors are alive and so aren’t available for DNA testing. I have to rely upon my siblings and upon my cousins. The expected values of a match on autosomal DNA tests for my ancestors, siblings and cousins can be summarised in tabular form as follows:
Relationship-Chart
The expected value of sharing autosomal DNA with one of my siblings is 50%. I actually share 38% autosomal DNA with one of my brothers. The expected value for shared autosomal DNA with any one of my first cousins once removed is 6.25%. I actually share 7.3% autosomal DNA with one first cousin once removed and 5.4% with another.
Should actual values that differ from expected values be cause for concern? Absolutely not!
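The expected values for siblings and cousins follow the same halving rule. Here is a minimal sketch, assuming a full relationship (the two relatives descend from the same ancestral couple) and no other shared ancestry. The function name and its parameters are hypothetical, chosen only for this illustration.

```python
def expected_shared(cousin_degree, removes=0):
    """Expected fraction of autosomal DNA shared by two relatives.

    cousin_degree: 0 for full siblings, 1 for first cousins, 2 for second cousins.
    removes: number of generations by which the two relatives are "removed".
    Assumption: both relatives descend from the same ancestral couple.
    """
    # Count the parent-to-child steps on the path linking the two relatives
    # through one shared ancestor; each step halves the expected share.
    steps = 2 * (cousin_degree + 1) + removes
    # There are two shared ancestors (the couple), so the paths are added.
    return 2 * 0.5 ** steps


print(f"siblings: {expected_shared(0):.2%}")
print(f"first cousins once removed: {expected_shared(1, 1):.2%}")
print(f"second cousins: {expected_shared(2):.2%}")
```

These are averages only. My own results of 38%, 7.3% and 5.4% show how far actual values can sit from the expected ones.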
However, rather than accepting the relationship a testing company predicts for any autosomal DNA match as being set in stone, I look to my written genealogy to confirm the autosomal DNA match result. Equally, the autosomal DNA match is a further source that may substantiate my written genealogy. The two are not separate but dependent one upon the other.
The methodology for calculating the likelihood of what autosomal DNA we are expected to have should be familiar to us all.
Consider tossing a coin. The first toss may be heads. The probability of the second toss being heads is still 50%. Even if the second toss is heads the probability of the third toss being heads is still 50%. Thus in a small sample of 3 tosses the result of three heads doesn’t indicate that a double-headed coin is being used. However, if the result still remains heads after hundreds or thousands of tosses I might be inclined to check whether the coin is biased in some way. According to Bernoulli’s theorem (the law of large numbers), the more often a coin is tossed the more likely it is that the actual proportion of heads will be close to the expected value of 50%.
Now consider throwing a die. The first throw may be a 4. The probability of throwing a 4 is one sixth. Indeed for an unbiased die the probability of throwing any one of the six numbers is always one sixth irrespective of the previous throws. Over a short run of throws there may be a run on a particular number but this in no way alters the probability for the next throw of the die. For each number that probability is one sixth. As for the coin toss, over hundreds and thousands of throws of the die the actual proportion of throws showing each number will approach the expected probability of one sixth.
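The coin and the die can both be tried out with a small simulation. This is a sketch only: the function names and the seed are arbitrary choices of mine, and the exact figures printed will depend on the seed used.

```python
import random


def proportion_of_heads(tosses, rng):
    """Toss a fair coin the given number of times; return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses


def die_frequencies(throws, rng):
    """Throw a fair die the given number of times; return the share of each face."""
    counts = [0] * 6
    for _ in range(throws):
        counts[rng.randint(1, 6) - 1] += 1
    return [count / throws for count in counts]


rng = random.Random(2015)  # arbitrary seed so that a run can be repeated
for tosses in (3, 100, 100_000):
    print(tosses, "tosses, proportion of heads:", round(proportion_of_heads(tosses, rng), 4))
print("die faces after 100,000 throws:", [round(f, 4) for f in die_frequencies(100_000, rng)])
```

With only 3 tosses the proportion can be anything from 0 to 1. With 100,000 tosses it should sit very close to 0.5, and each face of the die close to one sixth.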
This method of calculating expected values for the toss of a coin and the throw of a die can be applied to the passing of autosomal DNA from two parents to a child. The options for a toss of a coin are either heads or tails. The options for the throw of a die are 1, 2, 3, 4, 5 or 6. The options for a child are that the child receives its autosomal DNA half from its father and half from its mother. As for the coin and as for the die the actual value of autosomal DNA received in the short term may differ from the expected value. As for the coin and as for the die over millions and indeed billions of generations the actual value of autosomal DNA a child receives from its parents will approach the expected value of 50% from its father and 50% from its mother.
But is this so? What is it that Family Tree DNA and AncestryDNA are testing with respect to autosomal DNA? Is there an equal chance of this autosomal DNA information coming from one parent as from the other parent? Let’s start by looking at DNA in the whole cell before focusing on autosomal DNA.
Each cell in our body contains DNA. Outside the nucleus, DNA can be found in the mitochondria. This DNA is known as mitochondrial DNA. DNA is also found in the cell nucleus which contains 23 pairs of chromosomes each containing DNA. The 23rd pair is known as the pair of sex chromosomes. The 23rd pair for men is made up of one X chromosome and one Y chromosome. Women have 2 X chromosomes. The first 22 pairs of chromosomes are known as autosomes. Autosomes contain autosomal DNA.

Cell with nucleus and mitochondria

In a search for genealogical DNA the testing companies test in excess of 700,000 markers on the “junk” DNA portion of our autosomal chromosomes. These markers are the sites of single nucleotide polymorphisms or SNPs (pronounced snips). A person’s autosomal SNPs can be identified and compared with another person’s autosomal SNPs.
Apart from identical twins, each of us is unique. We see this as we walk down the street or glance around a football crowd at the MCG. It is easy, therefore, to apply the law of large numbers as discussed above to the more than 700,000 SNPs. To me 700,000 seems to be a large number. Surely, for each marker or SNP there is a 50% chance that I inherited that SNP from my father and a 50% chance that I inherited that SNP from my mother. Surely, as with the coin and the die, I had an equal chance of receiving each marker independent of the previous marker and the marker following.
There are two difficulties with this assumption.
Firstly, autosomal DNA tests are not able to distinguish which markers I inherited from my father and which I inherited from my mother.
Secondly, if the first wasn’t a knockout blow, the markers are set out on a strand of DNA. Unlike each toss of a coin or each throw of a die, whether or not I inherit a marker from my father or from my mother is not independent of who I inherited the previous marker from or who I inherited the next marker from. That is, the 700,000 SNPs are linked along the DNA strand. For example, the autosomal DNA I share with my brother on chromosome 3 and which we must have inherited from our father or our mother or a combination of both occurs along most of the chromosome.
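The difference between independent markers and linked markers can be seen in a toy simulation. This is a sketch under loud assumptions: the labels "A" and "B" simply stand for the two possible sources of a marker, and the switch probability is a made-up figure for illustration, not a measured recombination rate.

```python
import random


def inherit_independent(n_markers, rng):
    """Each marker is a separate coin toss between source A and source B."""
    return [rng.choice("AB") for _ in range(n_markers)]


def inherit_linked(n_markers, switch_prob, rng):
    """Markers are copied along a strand; the source only changes occasionally.

    switch_prob is a hypothetical chance of changing source between one
    marker and the next (an assumption, not a real recombination rate).
    """
    current = rng.choice("AB")
    strand = []
    for _ in range(n_markers):
        if rng.random() < switch_prob:
            current = "B" if current == "A" else "A"
        strand.append(current)
    return strand


def longest_run(strand):
    """Length of the longest unbroken stretch from a single source."""
    best = run = 1
    for previous, marker in zip(strand, strand[1:]):
        run = run + 1 if marker == previous else 1
        best = max(best, run)
    return best


rng = random.Random(3)  # arbitrary seed
print("independent markers, longest run:", longest_run(inherit_independent(10_000, rng)))
print("linked markers, longest run:", longest_run(inherit_linked(10_000, 0.0005, rng)))
```

With independent tosses the longest unbroken stretch stays short. With linked markers it runs to hundreds or thousands of markers, which is the kind of long matching strand that shows up on a chromosome.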

Chromosome 3

Now we don’t match along the whole of chromosome 3 but where we do match it is mostly in one long strand. Indeed, the longer the strand we share the closer our predicted relationship.
Consider a little. This phenomenon of linked markers has helped me detect relationships beyond those predicted by chance – beyond our great grandparents and our second cousins. For example, I have confirmed a relationship with a third cousin twice removed as well as – wait for this – a sixth cousin twice removed! These results are quite beyond my great grandparents and second cousins (that is second cousins without any removes).
DNA testing for family historians is still in its infancy. The databases of results are still very small. Nevertheless I think I can apply traditional genealogical research techniques to my DNA research:

  • DNA is no substitute for quality traditional genealogical research. Sad to say but true.
  • I have started my analysis with an autosomal DNA test and started with myself. Then I moved from my closer relations to my more distant relations.
  • I have tried to optimise my chances of detecting matches by including a family tree of my ancestors and the names of my ancestors where possible.
  • I have uploaded my information to Gedmatch as some family have tested on Family Tree DNA and some on AncestryDNA. My challenge now is to encourage our family to also share their results by uploading to Gedmatch (www.gedmatch.com), especially those who have tested with AncestryDNA, for AncestryDNA has no facility to examine results (for those who tested with AncestryDNA go to Settings and download the raw DNA data. Create a Gedmatch account and follow the instructions for uploading to Gedmatch. BE WARNED! These raw files are very, very large and take quite some time to download and upload).
  • It will involve some of that boring work that doesn’t seem to yield any exciting results but I suspect that it may be worthwhile in the long term to examine my results down to the 1 centiMorgan level and by each chromosome. I see this as akin to searching through parish registers or census results.

Y-DNA Baulch

Cell showing nucleus and mitochondria

There are so many genealogical collections readily available these days it is tempting to try them all. Without thought or regard as to a collection’s relevance to the particular information sought. Those collections that are at hand are accessed first. Never mind the other 95% of collections which have yet to be digitised or indexed. It is easy to tap a key and search for the information online when I really do know in my head that my searching would be more productive if only I travelled to archives on the other side of the world or just spent time searching painstakingly through films and microfiche nearer to home.
But where to start searching further for my three greats grandmother Mary, wife of George Watts? I have found her in two English census returns indicating that she may have been born a British subject in foreign parts. Foreign parts? Where to begin?
I asked my cousin Val whether she would indulge my curiosity and undergo a DNA test. She kindly obliged. It was not until Val’s results arrived that I realised how little I know about DNA and today’s genetics. I was lost to Mendelian genetics when dominant brown eyes and recessive blue eyes were discussed. Where did that leave my hazel eyes? So the current genealogical literature about DNA seemed to me to be riddled with scientific terms that still leave me confused. I guess there is just so much to absorb that my little brain has been in overload for quite some time now.
Should I have done the more traditional or paper genealogical research that I had been avoiding before I set out on my DNA journey? Definitely. In a way my avoidance of a little hard work has voided the DNA results received – at least for the time being.
Val’s results have sent me back to reassess my research strategy and use of DNA as a research tool. But my brother John’s results are more promising if not equally confusing. So I am using John’s results as a medium for gaining an understanding of DNA analysis for genealogists.
John and I can trace our ancestry back on our paternal side to a Charles Baulch who married Ann Biddlecombe on 1 April 1799 at Muchelney, Somerset, England. On reviewing the information I agree with my sister. She says that because she couldn’t find the death of Charles Baulch in the civil indexes she concluded that he must have died before civil registration began in 1837. That doesn’t mean Charles Baulch died in 1836 and indeed our best guess is that Charles died between the time the Muchelney churchwardens wondered what to do with Baulch’s children and the time shortly later when their concern focused on what to do with Ann Baulch’s children.
We also have a dilemma about when our ancestor Charles Baulch was born. Certainly a Charles Baulch was born in Muchelney on 25 January 1767 to Roger Balch and Betty Gaylard. However, a Charles Baulch was buried just over a month later on 8 March 1767 in Muchelney and the infant son of Roger Balch seems to be the only candidate for this burial. So who married Ann Biddlecombe on 1 April 1799?
The obvious course of action is to search neighbouring parishes for a suitable Charles Baulch, fanning out to further parishes if necessary. Fortunately there is a copy of Dr Campbell’s index to baptisms and marriages for Somerset held on microfilm at the Genealogical Society of Victoria, and indexes for many Somerset parishes are now available on FreeREG. So I have a deal of work to do searching through these two sources available to me without having to travel the world.
Meanwhile, until I am able to motivate myself to do this paper genealogy is there anything in the analysis of John’s DNA that catches my attention? Maybe.
There are three parts to the analysis of John’s DNA. The first part involves analysis of his Y chromosome. The human cell contains a nucleus which includes 46 chromosomes. The first 44 form 22 pairs of autosomes and the last two are the sex chromosomes. A male has one Y chromosome and one X chromosome. A male receives his Y chromosome from his father, who received his Y chromosome from his father, and so on. That is, the surname and the Y chromosome follow the paternal line.
In particular my brother received his Y chromosome from our father who received it from his father (our grandfather) who received it from his father, Samuel Baulch who received it from his father Francis Baulch who received it from Charles Baulch, our three greats grandfather. And there our paper genealogy trail finishes for the moment. But who did Charles Baulch receive his Y chromosome from?
Two tests are performed on the Y chromosome. In the first test short segments of DNA (markers) are examined and the number of times a short sequence repeats at each marker is recorded. These repeats are known as short tandem repeats (STRs). These results form an individual’s haplotype.
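To make the idea of counting repeats concrete, here is a minimal sketch. The DNA sequence and the repeated motif below are invented purely for illustration and are not taken from any real marker or test result. The function name is likewise my own.

```python
def count_tandem_repeats(sequence, motif):
    """Return the longest run of back-to-back copies of motif in sequence."""
    best = 0
    for start in range(len(sequence)):
        run = 0
        position = start
        # Keep stepping forward one motif at a time while the copies continue.
        while sequence.startswith(motif, position):
            run += 1
            position += len(motif)
        best = max(best, run)
    return best


# Invented example: the motif GATA appears three times in a row.
made_up_sequence = "CCGATAGATAGATATT"
print(count_tandem_repeats(made_up_sequence, "GATA"))
```

A testing company records a count like this for each marker it tests, and the set of counts across all the markers makes up the haplotype.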

DNA strand

The second test examines particular points on the Y chromosome looking for mutations or single nucleotide polymorphisms (SNPs). That is the particular point is examined to see whether an instance of adenine, thymine, cytosine or guanine has mutated to one of the other three. Paternal lineages may be constructed for the Y chromosome using these mutations as nodes in the paternal lineages.
The results from both tests for Y-DNA analysis predict which haplogroup an individual belongs to. John, for example, belongs to haplogroup I-M253 based on analysis of his Y-DNA. And while the database is still small there are also several Baulchs that belong to this haplogroup including many who can trace their ancestry back to Somerset. But many generations earlier than I have been able to establish our genealogy.
There is still a great deal of research to be done.

Britain’s genes

Traditional family history and genealogy take a back seat. Genes and DNA have taken over. In just three years. I was disappointed when we first received DNA results for a cousin.  Not anymore. I have listened to the presentations about genes and DNA here at the Canberra Congress. We haven’t even scraped the surface yet. This is before my daughter Alica mentioned the article about detecting county boundaries by genetic data alone just published in Nature. It’s like starting traditional genealogy all over again.

Genealogy Do Over – DNA (1)

The results of my first foray into DNA testing arrived in time for consideration as part of GDO Week 10 DNA considerations. My first request was not for myself nor for my brother but for a cousin of my father’s as she is a direct maternal descendant of my two greats grandmother Lydia Watts.
I have over the past three weeks paused to reflect again on my Genealogy Do Over so far. I have come to the conclusion that until the Do Over I have been beguiled by the ease of access electronically to many sources. This has caused me to churn my research. To do the same searches over and over again. With the same results. I may not have brick walls at these places at all. I have been trapped into looking at the sources that are easy to access rather than those that are most likely to give me some results.
Way back in Week 1 of the Genealogy Do Over we were advised to set aside our genealogical research so far, to abandon our bad habits and start over. What good advice! My perceived brick walls may not be brick walls at all. I have been just too lazy to put together a research plan that, while it may involve some actual work by me, is more likely to yield my hoped for results.
I was particularly struck by this when I asked for a review of where I was at with my Ralston ancestors at the recent Glasgow and Strathclyde region library research day at the Genealogical Society of Victoria. All that is lacking is a little actual work on my part. Something that I would have done years ago before the advent of personal computers and online databases. I should be searching a little further afield than just at Ralston, Renfrewshire. Not churning through the Paisley registers again and again. The information contained therein is exactly the same as what was there last time I looked.
Sure, there is a lot of planning and there is some actual research to do. Sure, most of the information may only be available in various repositories and not online. Yet isn’t this how I went about my family research before the 1990s?
Similarly, I have doubts about the Charles, son of Roger Baulch and Elizabeth Gaylard, who was baptised on 25 Jan 1767 in Muchelney, Somerset, being my ancestor for a Charles Baulch was buried just over a month later on 8 Mar 1767 at Muchelney (see http://www.freereg.org.uk/). But have I searched those surrounding parishes not yet indexed on either FreeREG, FamilySearch or Somerset Online Parish Clerks (http://wsom-opc.org.uk/)? No. I just took fright at the number of parishes yet to be searched.
On the other hand at least I have started gathering information about John Bourke Ryan. So easy to search for as he always used his full name. I have found some rich archival material which I have transcribed. Nevertheless before I start churning my online research here I do need to stop and think about the information so far gathered. And how that all fits in with the economic and political climate at the time.
Which brings me to Mary McCade or McCord, the mother of Lydia and Lazarus Watts.
The 1841 and 1851 England Censuses indicate that Mary was born in Foreign Parts (that is, she wasn’t born in the British Isles) although, as I have found, that information isn’t necessarily correct.
The question now is – was Mary of British ethnicity or was she of the ethnic background of wherever she was born? Or someplace else for that matter.
Mitochondrial DNA is passed from a mother to all of her children, but only her daughters pass it on. The test results I have just received yielded an mtDNA haplogroup of J1c9 – a classification that is confined to the United Kingdom. This haplogroup had been passed to my father’s grandmother, Eliza Ann Porter, by her mother Lydia Watts. Lydia Watts would have received this haplogroup from her mother Mary McCade or McCord.
While Mary may have been born in foreign parts it is possible that she and perhaps her family returned to the United Kingdom and, as their children didn’t arrive until after George Watts was pensioned out of the British Army, it is also possible that George Watts and Mary McCade married, not in foreign parts, but in England.
Another brick wall for which I must stop churning and start creating a research plan that may actually yield some results.