Sunday, January 13, 2008

In Search of Noah's Daughters-in-law

I am neither a geneticist nor a student of genetics. I am simply a retired pharmacist who wanted to investigate whether or not there was any scientific evidence that mankind was populated from the family of the biblical Noah. I began this project without knowing where it would lead or how long it would take. I knew the project could not prove the idea of a single family populating the world; it could only hint that it was possible. I did not review any pre-existing conclusions or data. I was told that I was trying to re-invent the wheel, and to a large measure that was true. One thing is for sure, though: I am now familiar with that wheel, at least on the ‘big picture’ level.

I found that humanity is indeed populated by just three distinct lineages, and these lineages could be the families of the three sons of Noah. Using the DNA found in the mitochondria, I also found the human race to be very close genetically. The average deviation of this DNA between people from all races and continents is around 12 variations in 16,000 DNA bases. By comparison, the deviation between the chimpanzee and the human is 1,300 variations per 16,000.

While I was comparing DNA, the idea of race vanished. One sublineage has a higher variation rate (around 50 or so), but that is only a sublineage.



Before I tell you my story, I first feel I should define certain words.

Definitions:

1) Mitochondria are tiny jelly-bean-shaped structures found in the cells of eukaryotes (organisms whose cells contain a nucleus). Mitochondria are the structures of the cell that produce most of the energy extracted from the chemicals of ingested food. They are very small; it is estimated that approximately one thousand reside in certain cells.

2) Proteins are large molecules composed of strings of amino acids. Proteins can be used to make structures or enzymes for the body. Enzymes are chemicals that facilitate chemical reactions such as extracting energy from food. DNA is used as a code/template in making proteins.

3) DNA molecules are long chains built from four kinds of bases (chemicals). Some DNA molecules contain millions of these bases. The four bases are represented by the letters A, C, T, and G. The order of these bases forms the code/template which cells use in building proteins. The bases are read in threes: as the DNA is decoded by the body, each set of three bases (a triplet) translates into a certain amino acid, which is incorporated into the newly forming protein. There are 64 possible triplets (4 x 4 x 4). Since there are only 20 different amino acids, more than one triplet will translate into the same amino acid. I mention this to show that a mutation, or change of a base in the DNA code, does not necessarily alter the meaning of the code. There are other codes in the DNA that tell the cell where to stop reading, regulate reactions, and so on.
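To make the 64-versus-20 arithmetic concrete, here is a small Python sketch (an editorial illustration, not part of the original study) that builds the standard genetic code table used by nuclear genes and counts how many triplets map to each amino acid. Mitochondria actually use a slightly different table (for example, TGA codes for tryptophan there instead of "stop"), so treat this only as a demonstration of the principle.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), listed in TCAG order:
# the first base varies slowest, the third base fastest. "*" marks a stop triplet.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4 x 4 x 4 = 64 triplets to its one-letter amino acid code.
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many triplets code for each amino acid (or for "stop").
usage = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))                         # 64 triplets
print(len([aa for aa in usage if aa != "*"]))   # 20 amino acids
print(usage["L"])                               # 6 triplets all mean leucine

# A "silent" change: CTT -> CTC alters the third base but still codes for leucine.
print(CODON_TABLE["CTT"], CODON_TABLE["CTC"])   # L L
```

Leucine, serine, and arginine each have six triplets, which is why many single-base changes leave the protein unchanged.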

The cell's DNA is found in the nucleus and the mitochondria.
Nuclear DNA contains units called genes, which are carried on structures called chromosomes. Humans have a total of 46 chromosomes, or 23 chromosome pairs. Each chromosome has a ‘twin’ chromosome (similar, though not identical). We inherit one chromosome of each pair from each parent. In the male, the single Y chromosome is paired with an X chromosome. In the female, the X chromosome is paired with another X chromosome.


More about DNA:

Replication of DNA is a complex process and can be divided into two different processes.

A) Regular DNA replication occurs when a cell prepares to divide and create another cell. The cell must duplicate its DNA so each daughter cell will receive an exact (hopefully) copy of the parent cell's DNA. This occurs when any non-reproductive cell divides. The DNA in the mitochondria and nucleus replicates this way.
B) Reproductive DNA replication occurs in the forming of the gametes (egg and sperm) of the parents. These gametes contain only one chromosome from each chromosome pair. This chromosome has most likely undergone a phenomenon called crossing-over, in which segments of DNA are exchanged between the two chromosomes of a pair. This means that the DNA of the gametes is not identical to the parent cell’s DNA.
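Crossing-over can be pictured with a toy model: take the same stretch of the two twin chromosomes and swap everything after one break point. The sketch below is a simplification (real crossing-over can occur at several points, chosen by the cell's own machinery), and the sequences and break point are made up. Lowercase and uppercase letters are used only to show which parent each base came from.

```python
def single_crossover(chrom_a, chrom_b, point):
    """Toy model of one crossing-over event between a chromosome pair.

    Both strings stand for the same stretch of the two 'twin' chromosomes.
    Everything after `point` (0-based) is swapped between them.
    """
    if len(chrom_a) != len(chrom_b):
        raise ValueError("twin chromosomes must be the same length in this toy model")
    if not 0 <= point <= len(chrom_a):
        raise ValueError("crossover point out of range")
    return chrom_a[:point] + chrom_b[point:], chrom_b[:point] + chrom_a[point:]


# Hypothetical stretches inherited from the mother (lowercase) and father (uppercase).
from_mother = "acgtacgtac"
from_father = "ACGTTCGTAC"
recombinant_1, recombinant_2 = single_crossover(from_mother, from_father, 4)
print(recombinant_1)  # acgtTCGTAC
print(recombinant_2)  # ACGTacgtac
```

Neither result matches either original chromosome, which is the point made above. Mitochondrial DNA never goes through this step.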

Mitochondria have their own DNA. Mitochondrial DNA (mtDNA) is small, roughly 16,500 bases long, and is divided into two parts.
A) The D-loop region, which apparently does not code for proteins, contains about 1,000 bases.
B) The rest of the mtDNA, about 15,500 bases, codes for the building of chemicals needed in the cell.

Mitochondrial DNA is unusual in two ways:
A) It never experiences base-mixing due to crossing-over.
B) It is almost completely inherited through the mother.
Without crossing-over and with only maternal inheritance, one can easily see that the mitochondrial DNA of a child should be the same as that of the mother, grandmother, great-grandmother, great-great-grandmother, and so forth. This would be true if mistakes never happened during the replication of mtDNA. These mistakes are referred to as mutations. Mutations are rare. For instance, my results show that since the flood (or the evolutionary spawning of man) there has been an average deviation of less than 0.2% in the mtDNA between individuals. This implies an extremely impressive system of replication.
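The percentages in this article come from simple counting: line two aligned sequences up base by base, count the positions that differ, and divide by the length. Here is a minimal Python sketch of that arithmetic, assuming the sequences are already aligned to the same length and using the article's round figure of 16,000 bases. The function names are hypothetical.

```python
def count_differences(seq_1, seq_2):
    """Count positions where two already-aligned sequences carry different bases."""
    if len(seq_1) != len(seq_2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_1, seq_2) if a != b)


def percent_deviation(differences, length):
    """Express a difference count as a percentage of the sequence length."""
    return 100 * differences / length


# Figures quoted in this article, using 16,000 bases as the comparison length.
human_vs_human = percent_deviation(12, 16_000)
human_vs_chimp = percent_deviation(1_300, 16_000)
print(human_vs_human, human_vs_chimp)                 # 0.075 8.125
print(count_differences("ccttatttct", "cctcatttct"))  # 1
```

For scale, 0.2% of 16,000 bases would be 32 differences, so an average of 12 (about 0.075%) sits well under that bound, while the chimpanzee comparison is roughly 8%.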

Mitochondrial DNA sequences are recorded in lines, each starting with the location number of its first base and followed by six groups of ten bases.

How a base location is identified:

A specific spot in the mtDNA chain is identified by a number marking its place in the mtDNA, followed by its base abbreviation in parentheses, such as 10233(g).

An example line, taken from an individual’s sequence in the database at the NCBI (National Center for Biotechnology Information) (*1), is:

4321 ccttatttct aggactatga gaaTcgaacc catccctgag aatccaaaat tctccgtgcc

The location of the ‘mutation’, shown by the uppercase ‘T’ in the example above, is 4344(t). Count from the left, the first letter being location 4321. This designates the base and location relative to this particular individual. When comparing different individuals to each other, a ‘standard’ must be selected. This standard is used to assign every base of an individual’s mtDNA to a specific location number. This must be done because there are occasionally additions or deletions of bases, especially in the D-loop area, which shift the numbers to the left or to the right.
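Counting along a line by hand is easy to get wrong, so here is a minimal Python sketch of the same counting. It assumes the line format shown above (a start position followed by groups of ten bases separated by spaces) and that the base of interest is written in uppercase, as in this article's example. The function names are hypothetical.

```python
def parse_sequence_line(line):
    """Split a sequence line into (location of first base, bases).

    Assumed format: a start location followed by groups of ten bases,
    separated by whitespace, as in the example line above.
    """
    start, *groups = line.split()
    return int(start), "".join(groups)


def base_at(line, location):
    """Return the base found at `location` in this individual's own numbering."""
    start, bases = parse_sequence_line(line)
    offset = location - start
    if not 0 <= offset < len(bases):
        raise ValueError("location is not on this line")
    return bases[offset]


def marked_locations(line):
    """Locations of any bases written in uppercase (the highlighted 'mutation')."""
    start, bases = parse_sequence_line(line)
    return [start + i for i, base in enumerate(bases) if base.isupper()]


example = "4321 ccttatttct aggactatga gaaTcgaacc catccctgag aatccaaaat tctccgtgcc"
print(marked_locations(example))  # [4344]
print(base_at(example, 4344))     # T
```

The first letter of the line sits at offset 0, so the uppercase T, which is the 24th letter, lands at 4321 + 23 = 4344.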
Geneticists have picked an individual in Europe and given him/her the honor of being the ‘standard’. His/her sequence has been checked and rechecked again and again by a group of geneticists. The NCBI code for this individual is J01415.2 and is called the rCRS or revised Cambridge Reference Sequence. The same location of the mutation above with the standard rCRS is 4345(t).

Line 4321 of the rCRS is
4321 cccttatttc taggactatg agaaTcgaac ccatccctga gaatccaaaa ttctccgtgc

Notice that the line has been skewed to the right and the location is one number higher.
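One way to picture how the standard fixes the numbering is as a list of length differences (additions and deletions) between an individual and the rCRS, used to convert the individual’s own location numbers into rCRS numbers. The sketch below assumes such a list is already known; in practice it would come from aligning the two sequences. The deletion position used (300) is made up, since the article does not say where this individual’s one-base difference lies, and the function name `to_rcrs` is hypothetical.

```python
def to_rcrs(location, indels):
    """Convert a location in an individual's own numbering to rCRS numbering.

    `indels` is a list of (after, change) pairs, each describing a length
    difference sitting just after the individual's base number `after`:
      change = -n : the individual lacks n bases that the rCRS has there
      change = +n : the individual carries n extra bases there
    Returns None for a base that lies inside an insertion, since the rCRS
    has no number of its own for it.
    """
    shift = 0
    for after, change in indels:
        if change > 0 and after < location <= after + change:
            return None
        if location > after + max(change, 0):
            shift -= change
    return location + shift


# Hypothetical: one rCRS base missing before this individual's base 4321.
indels = [(300, -1)]
print(to_rcrs(4344, indels))  # 4345, matching the rCRS numbering above
print(to_rcrs(250, indels))   # 250, nothing shifts upstream of the deletion
```

A single missing base anywhere upstream is enough to move every later location up by one, which is exactly the one-place shift between the two lines above.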

Note: All genetic locations in this article are referenced to the rCRS standard unless otherwise noted.

Note: The ancient mtDNA sequence of early man may not be best represented by the individual J01415.2. This individual was simply designated as ‘the standard’.

Except for mutations, a mitochondrial DNA sequence remains the same generation after generation. If a mother has a certain mutation at location 10622(a), her children and children’s children (through the female lineage) will have the same mutation generation after generation. It is important to remember that the mother is the only parent that passes mtDNA to the next generation.(*2) It can be reasoned that if one were to compare the mtDNA of a group of people, one could determine how the members of the group were related.


Picture a mutation that occurred in ‘Helen’ ten generations ago. This mutation changed base C at location 10622(c) to a T.
The great-great-granddaughter (Edith) had another mutation at location 9988(g) that changed the base G to an A.
Edith had a grandson (George) with a mutation at location 4234(a) that changed the base A to base G.
Edith also had a great granddaughter (Kay) with a mutation at location 743(c) that changed the base C to the base T.

Today, if you could look at the mtDNA of a group of people from the location of the original Helen family, you could deduce how they were related.

First off, any offspring from George would not be included in the study because nothing done to George’s mtDNA means a hoot except to him. Remember mtDNA is passed only by the mothers.

A person male or female with 10622(t) in their mtDNA is directly related to Helen.
A person male or female with 10622(t) and 9988(a) is directly related to Helen and Edith.
A person male or female with 10622(t) and 9988(a) and 743(t) is directly related to Helen, Edith, and Kay.
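The reasoning above can be written as a short check: walk through the markers from oldest to newest and stop at the first one a person lacks. This Python sketch uses the hypothetical Helen, Edith, and Kay markers from the illustration. It deliberately ignores complications such as a base mutating back or the same change arising twice independently, and the function name is hypothetical.

```python
# Markers from the Helen/Edith/Kay illustration, oldest first: (ancestor, location, base).
# George's 4234(g) is left out because a son does not pass his mtDNA on.
MARKERS = [
    ("Helen", 10622, "t"),
    ("Edith", 9988, "a"),
    ("Kay", 743, "t"),
]


def maternal_ancestors(mtdna):
    """List the marker ancestors a person descends from.

    `mtdna` maps location -> base for the locations that were checked.
    A marker only counts if every older marker is also present,
    because a branch can hang from only one trunk.
    """
    ancestors = []
    for name, location, base in MARKERS:
        if mtdna.get(location) != base:
            break
        ancestors.append(name)
    return ancestors


print(maternal_ancestors({10622: "t"}))                       # ['Helen']
print(maternal_ancestors({10622: "t", 9988: "a"}))            # ['Helen', 'Edith']
print(maternal_ancestors({10622: "t", 9988: "a", 743: "t"}))  # ['Helen', 'Edith', 'Kay']
print(maternal_ancestors({10622: "c"}))                       # []
```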

A world populated only by Noah’s three daughters-in-law would produce, at most, only three most ancient mutational variations or Lineages (lines). These ancient mutational variations would have occurred before the flood. Think of these variations as trunks of all of earth’s family trees. ‘At most’ was written because the three daughters-in-law could have been closely related. In that case their mtDNA would/could be the same. In such a scenario the three lines would be undetectable.



The Procedure of the Study:

Eight individuals were chosen from the NCBI. They were not chosen at random but were chosen from every race and as widely separated geographically as possible. The mtDNA sequences of these eight people were compared. If any of them were related, they should share common ‘mutations’ that the other individuals do not share. I use the word ‘mutations’ here loosely because the original base at any given location cannot be ascertained with complete certainty.
This first sampling of individuals was compared by hand, not with a computer program. Comparing one mtDNA sequence of 16,000 bases by hand takes two to three hours. This time demand was the reason for the small sample at the beginning of the study.

The results were surprising. There were only eight ‘mutations’ shared by any of the individuals. These ‘mutations’ were at these locations: 8701 – 9540 – 10398 – 10400 – 10873 – 14783 – 15043 - 15301.
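The hand comparison described above boils down to this: for every location where a person differs from the standard, note which base they carry, then keep only the changes that two or more people share. Below is a minimal Python sketch of that bookkeeping, assuming the sequences are already aligned to the same length. It is an editorial illustration, not the program mentioned later in this article, and the short sequences are made up.

```python
from collections import defaultdict


def shared_differences(standard, samples):
    """Find (location, base) pairs where two or more samples differ from the
    standard in the same way.

    `standard` and every sequence in `samples` must already be aligned to the
    same length; location 1 is the first base.
    """
    carriers = defaultdict(set)
    for name, sequence in samples.items():
        if len(sequence) != len(standard):
            raise ValueError(f"{name} is not aligned to the standard")
        for location, (ref_base, base) in enumerate(zip(standard, sequence), start=1):
            if base != ref_base:
                carriers[(location, base)].add(name)
    return {key: names for key, names in carriers.items() if len(names) >= 2}


# Made-up ten-base stretches, only to show the bookkeeping.
standard = "acgtacgtac"
samples = {
    "person_1": "acgtgcgtac",
    "person_2": "acgtgcgtat",
    "person_3": "acgtacgtag",
}
result = shared_differences(standard, samples)
print(result)
```

Here only location 5 (a changed to g) is shared, by person_1 and person_2. Location 10 differs in two people but with different bases, so it is not counted as a shared change.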

These eight locations divided the chosen individuals into two separate lineages. (I did not detect the third line until a much larger sample was taken.) I designated these lineages to be Line A and Line B.



Below are the locations with their specific bases of these different lines.

Line A: 8701(a) - 9540(t) -10398(a) - 10400(c) - 10873(t) - 14783(t) - 15043(g) - 15301(g)
Line B: 8701(g) - 9540(c) - 10398(g) - 10400(t) - 10873(c) - 14783(c) - 15043(a) - 15301(a)

From this first undertaking I formulated a new ‘standard’ mtDNA sequence (instead of the rCRS standard). This new standard uses the most common base at each given location. It can be found at www.butterfliesetc.com/linea.php. This standard was used for the remainder of this study.
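A "most common base at each location" standard is usually called a consensus sequence. The following Python sketch shows the general rule on made-up aligned sequences. It is not necessarily how the file mentioned above was produced; it only illustrates the rule as described, assuming all sequences are already aligned to the same length.

```python
from collections import Counter


def consensus(sequences):
    """Build a 'most common base' standard from aligned, equal-length sequences.

    Ties go to the base seen first in that column (Counter.most_common keeps
    first-encountered order among equal counts).
    """
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must all be aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


print(consensus(["acgtac", "acgtgc", "tcgtgc", "acatac"]))  # acgtac
```

The fifth column is a two-to-two tie between a and g, so the tie rule matters; with real data a tie like this would deserve a closer look rather than an automatic choice.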

I decided to see if these two groupings or lineages held true in larger samples taken from all over the world.

For the second test, I took 67 individuals and focused only on these eight locations. I was still doing the sequence comparisons by eye. Out of the 536 (67 x 8) locations examined, I found only 8 deviations.
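The second test can be pictured as scoring each person against the two eight-location patterns, assigning them to whichever line fits better, and adding up the leftover mismatches. This Python sketch assumes that is what "deviation" means here; the three people in the example are hypothetical and the function names are my own.

```python
# The two eight-location patterns from the first test (bases as written above).
LINE_A = {8701: "a", 9540: "t", 10398: "a", 10400: "c",
          10873: "t", 14783: "t", 15043: "g", 15301: "g"}
LINE_B = {8701: "g", 9540: "c", 10398: "g", 10400: "t",
          10873: "c", 14783: "c", 15043: "a", 15301: "a"}


def closest_line(person):
    """Return (line name, number of the eight locations that do not match).

    `person` maps location -> base; a missing location counts as a mismatch.
    """
    scores = []
    for name, pattern in (("A", LINE_A), ("B", LINE_B)):
        mismatches = sum(1 for loc, base in pattern.items() if person.get(loc) != base)
        scores.append((mismatches, name))
    mismatches, name = min(scores)
    return name, mismatches


def total_deviations(people):
    """Add up the leftover mismatches over a whole sample."""
    return sum(closest_line(p)[1] for p in people)


# Hypothetical sample: one exact Line A, one exact Line B, one Line A with 10398(g).
people = [dict(LINE_A), dict(LINE_B), {**LINE_A, 10398: "g"}]
print([closest_line(p) for p in people])                # [('A', 0), ('B', 0), ('A', 1)]
print(total_deviations(people), "of", len(people) * 8)  # 1 of 24
```

In the article's actual sample, 8 deviations out of 536 locations works out to about 1.5%.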

My third and last test tried to find newer family lineages and, from them, verify that Lines A and B were in fact the most ancient human lines.

Picture a tree with many branches. A trunk can support or have many branches attached to it. Each branch, however, can be attached to only one larger branch or trunk. If the above lineages were truly the most ancient and not merely random flukes, I would need to study further and measure the complete mtDNA sequences of a much larger sampling of people.

This would have been impossible were it not for a program my son made (in just three hours, I might add). This program could compare two mtDNA sequences of 16,000 locations each in seconds. I placed this program on the web (*3). Please feel free to use it.

I also received help from Head Speaker/Scientist Dr. Robert W. Carter at CMI. He directed me to an article of his that contained 827 individuals whose mtDNA sequences had been reviewed and were free of genetic disease factors. Remember, many of the sequences contained in the NCBI contain errors which would skew any results. Make sure your data has been reviewed and accepted by knowledgeable people. I almost stopped my research once because of faulty data (*4).

Of the 827 individuals in Dr. Carter’s study, I could find only about 750 at the NCBI site. These formed the database for my final study.

In undertaking so large a study I decided to use only the functional part of the mtDNA sequence and dropped the D-loop area. I also did not incorporate deletions or additions into my data. By doing this, I believed I was comparing apples to apples not oranges to apples.
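Dropping the D-loop and the additions or deletions amounts to a filter on the list of differences. The Python sketch below assumes the D-loop (control region) runs from rCRS location 16024 to the end at 16569 and wraps around to location 576, which are the boundaries commonly listed by MITOMAP; the article itself only says "about 1000 bases". The sample differences are made up, and the function names are hypothetical.

```python
# Assumption: the D-loop (control region) spans rCRS locations 16024-16569 and 1-576,
# the boundaries commonly listed by MITOMAP. The article only says "about 1000 bases".
CONTROL_REGION_START = 16024
CONTROL_REGION_END = 576
BASES = {"a", "c", "g", "t"}


def in_d_loop(location):
    """True if an rCRS location falls in the D-loop, which wraps past 16569 back to 1."""
    return location >= CONTROL_REGION_START or location <= CONTROL_REGION_END


def coding_region_only(differences):
    """Keep single-base substitutions outside the D-loop; drop D-loop changes and indels.

    `differences` maps rCRS location -> observed base, with "-" (deletion) or a
    multi-letter string (insertion) standing for length changes.
    """
    return {
        loc: base
        for loc, base in differences.items()
        if not in_d_loop(loc) and base in BASES
    }


sample = {73: "g", 263: "g", 10398: "g", 8281: "-", 16519: "c"}
print(coding_region_only(sample))  # {10398: 'g'}
```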

At first I also decided to record only mutations that were not in the original eight locations. I was looking for other branches or lineages.

Seventeen lineages were found among these 750 individuals. About a quarter of the 750 individuals could not be said to belong to any of these 17 lines. I purposely omitted lineages that were obviously from close families or locations.

I then compared these 17 lines with the original eight mutational locations. I found that each of these 17 newer lines was contained in only one of the most ancient lines I had found at first.
At this time a third variation of the original eight mutational locations was found. I called it Ancient Line D.

From the outcome of this study it could be said that mankind is indeed divided into three separate ancient lineages.

My reasoning was this: if any of these 17 lines were found in more than one of the lines A, B, or D, then they could be said to be older than A, B, or D.
Remember a branch can be attached to only one stem. From the data none of the newer lines were found in more than one of the A, B, or D lines.
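The branch-and-trunk test can be stated as a simple check: group the individuals by their newer line and see whether any newer line turns up under more than one ancient line. Here is a minimal Python sketch; the individuals in the example are hypothetical, although the line letters follow the list in this article, and the function name is my own.

```python
from collections import defaultdict


def branches_on_several_trunks(assignments):
    """Report any newer line that turns up under more than one ancient line.

    `assignments` is a list of (newer_line, ancient_line) pairs, one per
    individual. An empty result means every branch hangs from one trunk.
    """
    trunks = defaultdict(set)
    for newer, ancient in assignments:
        trunks[newer].add(ancient)
    return {newer: sorted(found) for newer, found in trunks.items() if len(found) > 1}


# Hypothetical individuals.
consistent = [("E", "A"), ("E", "A"), ("H", "B"), ("Q", "D")]
print(branches_on_several_trunks(consistent))   # {}

conflicting = consistent + [("E", "B")]
print(branches_on_several_trunks(conflicting))  # {'E': ['A', 'B']}
```

A non-empty result would be the situation described above, where a newer line would have to be older than the ancient lines it appears in.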
The newer (younger) lines detected in the 750 individual samples are below.

Line E—2706a-3010a-4336c-6776c-7028c-11719g-12705c-14766c (only ANCIENT LINE A)
Line F—1811g-3348g-12308g-12372a-12705c (only ANCIENT LINE A)
Line G—709a-1888a-4216c-8697a-10463c-11812g-12705c-13368a-15452a-15607g (only ANCIENT LINE A)
Line H—4117c-5843g-8790a-12940a (only ANCIENT LINE B)
Line I—5465c-6719c-10238c-12705c (only ANCIENT LINE A)
Line J—5460a-10238c-8251a-12501a (lines I and J could have had a common ancestor with one mutation at 10238) (only ANCIENT LINE A)
Line K—3970t-6392c-12705c-13759a (only ANCIENT LINE A)
Line L—709a-4048a-4071a-4164t-4454g-4454c-5351g-5460a-6455t-6680c-9824c-12405t (only ANCIENT LINE B)
Line N—3010a-3394c-4216c-12612g-12705c-13708a-14798c-15452a (only ANCIENT LINE A) (*6)
Line O—3552a-4715g-7196a-8584a-11914a-15486t (only ANCIENT LINE B)
Line P—663g-1736g-4248c-4824g-8563g (only ANCIENT LINE A)
Line Q—758a-1018a-1442a-3594t-6026a (only ANCIENT LINE D)
Line R—656c-1719a-10245c (only ANCIENT LINE B)
Line S—12007a (several individuals in India had just this one deviation) (only ANCIENT LINE B)
Line T—3010a-4882t-5177a (only ANCIENT LINE B)
Line U—709a-4833g-5108c (only ANCIENT LINE B)

This new variation at the original eight locations is:
Line D—8701(g)-9540(c)-10398(g)-10400(c)-10873(c)-14783(t)-15043(g)-15301(a)

Conclusion

Mankind’s families can be traced to one of three lineages. These three lineages are separated by the bases they have at the study’s original eight locations.
These three lineages are as follows:
Line A—8701(a)-9540(t)-10398(a)-10400(c)-10873(t)-14783(t)-15043(g)-15301(g)
Line B—8701(g)-9540(c)-10398(g)-10400(t)-10873(c)-14783(c)-15043(a)-15301(a)
Line D—8701(g)-9540(c)-10398(g)-10400(c)-10873(c)-14783(t)-15043(g)-15301(a)
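Comparing the three patterns location by location shows how far apart they are. This short Python sketch (an editorial illustration using only the eight locations listed above) counts the locations at which each pair of lines differs.

```python
from itertools import combinations

# The three eight-location patterns listed above.
LINES = {
    "A": {8701: "a", 9540: "t", 10398: "a", 10400: "c",
          10873: "t", 14783: "t", 15043: "g", 15301: "g"},
    "B": {8701: "g", 9540: "c", 10398: "g", 10400: "t",
          10873: "c", 14783: "c", 15043: "a", 15301: "a"},
    "D": {8701: "g", 9540: "c", 10398: "g", 10400: "c",
          10873: "c", 14783: "t", 15043: "g", 15301: "a"},
}


def differing_locations(line_1, line_2):
    """Sorted list of the eight locations where two lines carry different bases."""
    return sorted(loc for loc in LINES[line_1] if LINES[line_1][loc] != LINES[line_2][loc])


for first, second in combinations("ABD", 2):
    locs = differing_locations(first, second)
    print(first, second, len(locs), locs)
# A B 8 [8701, 9540, 10398, 10400, 10873, 14783, 15043, 15301]
# A D 5 [8701, 9540, 10398, 10873, 15301]
# B D 3 [10400, 14783, 15043]
```

At these eight locations Line D differs from Line A at five and from Line B at three, and those two sets together make up exactly the eight locations that separate A from B. On this small set of markers, then, Line D sits between Lines A and B.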
Remember that I was told I was trying to reinvent the wheel? These lines can be found in the mito-map and are labeled as follows:

Line A is Line N
Line B is Line M
Line D is the L Lines.

It sure took a lot of effort to reinvent this wheel! You can find the mtDNA tree at http://www.mitomap.org/mitomap-phylogeny.pdf (be warned, the diagram is pretty hard to grasp). Remember that all mutations on it are based on the rCRS and START at the rCRS individual (look at the bottom right for that individual).

When looking at this mito-map, visualize the M and N lines not linked and extending up to single families. Then picture the line joining the L lines to the M lines as broken and extending up to a third family with L3, L2, L1, and L0 lines dropping down as further variations of the third line.
This paper could stop here and could be seen as a powerful tool ‘proving’ the ‘possibility’ of a world populated by only three women. I feel there is a need to go a little farther and address a problem with the number of mutations observed in some of the L lines.

If the cell’s replication rate remains exactly the same for all three of these lines, and if the mutation rate is related to the replication rate, then all three lines should have roughly the same number of mutations. This is because the biblical flood happened at a specific time and all three lines emerged at that time. Two of the lines in fact do: the A (N) Line and the B (M) Line.

Some of the D (L) lines, however, have on average over twice as many mutations as Lines B (M) and A (N). This would appear to make Lines L0 and L1 much older than the M or N lines, and it casts doubt on the theory that the D (L) Line can be dated as starting at the same moment in time.

Unless …
The astronomer Ptolemy made a system that predicted the motions of the planets. He started from the belief that the Earth was the center of the universe. When he plotted the positions of the planets, he had to invent a theory: as the planets revolved around the Earth, they also moved in little circles called epicycles. This theory was necessary to force the observable data to fit his first premise that the Earth was the center of the universe. In science today, epicycles are still being made to tweak data to fit a preconceived ‘fact’. Sometimes the epicycles give rise to sound theories; sometimes they just prolong the life of a false theory.

I will also use a sort of epicycle, a scenario that can explain the problem of the higher mutation count of the L lines and also account for the three lines of the daughters-in-law of Noah. I do not say this is provable but that it is just possible. A lot of theories are not provable either but that does not mean they are incorrect.

So here goes ...
In the beginning there was the first woman; we will call her Eve. This woman had a certain mtDNA sequence. She had sons and daughters. The sons married their sisters and the original mtDNA sequence was passed on to their offspring. I believe this mtDNA sequence was close to the ‘standard’ found in this study. There is no way of knowing if this is true or not.
In the generations following, the mtDNA of Eve began to randomly mutate at several locations. During this time distinct family line sequences emerged. At some point the mtDNA Line D (L) was formed, and as time passed, Line A and Line B branched from Line D. There were many such family lines throughout this time. At one point daughters of family lines A, B, and D married Noah’s sons. Then the flood came, and all but these three child-bearing females died.

After the flood:
The migration of the three Lines:
Europe: Line A
India: Predominantly Line B, a few Line A
China and Siberia: Line A and Line B
Mid East: Line A
Australia and Pacific Islands: Line A and Line B
North Africa: Line A
South Africa: Line D, Line A and B
The Americas (same as Siberia): Both Line A and Line B. This gives further credence to the populating of the Americas through Siberia.

Now for the epicycle:
As Line D settled into Sub Saharan Africa, gradually their mtDNA underwent further change. Around the L2 line on the mito-map a speeding up of the mutation rate happened. That accounted for the large variations found in the L0, L1, and some L2 lines.
What could have caused this? Ask someone more knowledgeable than I!

But I do know this:
It could have been caused by several large jumps in a few individuals or by a gradual speeding up of the overall mutation rate. Either of these (or both) could have happened. Mutation rate is not well understood. I did a study on great white sharks around Australia and New Zealand. The study showed that the D-loop mtDNA of individual sharks varied less from one shark to another than human D-loop mtDNA varies from one person to another.

There are checks and balances within cells that try to maintain the integrity of the DNA. After all, life depends upon it. The replication of DNA is very complicated. Just a little compromise in its efficacy would be enough to affect the mutation rate. How DNA replication is accomplished and regulated is thought to be programmed somewhere in the DNA itself. If a mutation occurred in the DNA that regulates the replication process, the quality of the process could change and a much higher mutation rate could result. This is just an educated guess.

The way most geneticists view the mito-map is this:
The first humans came from Africa, and those primordial lines are still there, represented by the L0, L1, and some of the L2 lines. Man then spread out of Africa into Europe, Asia, and America after forming the M and N lines. It does seem odd, though, that the average mutation variation of the M and N lines in this study was 12 for both, signifying that both of these lines arose at the same time in history.
Stephen F. Smith





Footnotes:
*1) The NCBI is a government data resource center where thousands and thousands of animal and human DNA sequences are stored.
A good page to visit to download mtDNA sequences is http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Retrieve&dopt=full_report&list_uids=4512

*2) There has been a rare documentation of an offspring that has inherited the father’s mtDNA in certain of its cells. New England Journal of Medicine, vol 347, pp 576-580.


*3) http://dna.laeelin.com/


*4) While looking on the internet, however, I ran across an article stating that the data I was looking at was in error; later there was an apology from the doctor who presented the data.
*6) Most, though not all, of line N had a variation at location 10398(g).

2 Comments:

 
At January 14, 2008 at 5:11 PM , Blogger David said...

WOW! Very educational, inspirational and informative from every angle. I am a seeker and enjoy the fruits of other seekers as well.

All too often we tend to put God into a box. We think that God and science conflict with one another, yet, "science", as "time" are both God's creation. Science reveals God - just as time reveals truth. God's unique and specific will for each of our lives is written in our members as gifts, talents and skills - so also is His revelation written in scientific discoveries - sought by seekers of truth.

However, when we are bound by closed thoughts and narrow attitudes of God, we fail to see Him in all His glory and as creator of all that is. From the smallest thing in existence to the whole of the universe, God is revealing Himself.

Can we ever know the whole truth in this life? Who knows. But nevertheless, the truth is waiting for discovery.

Great job, Steve!

 
