Genealogy

Historical Genealogy of Clan Subpopulations
The genealogical information used in this study is courtesy of the respective clan
associations. As these data are based on records generally closed to the public and
esoteric enough to be incomprehensible even if they were to be made public, there is no
effective method of verifying their accuracy. As such, there may well be errors in the
information provided that the author remains unaware of. However, as the clan
associations were enthusiastic and generous enough to put in the time and effort to
provide the required assistance in exchange for so little, it should be taken as a point of
honour that they have done the utmost to provide accurate data, and that any possible
mistakes in the information lie outside their sphere of collective knowledge. To assume
outright that the records are false would be to insult parties who do not deserve it.
Furthermore, it is not inconceivable that samples could have been confused in the
laboratory while they were under the author's care. Hence, this dissertation will treat the
genealogical data as accurate, but with the understanding that some errors may yet be
present.


Clan Origins
Returning to Figures 3.1-1 & 3.1-2, the flow charts of the Zhang & Wu
subpopulations depict the origins of the subjects participating in this study. The Wu
surname is attributed to Tai Bo Kung, who took it on during his lifetime over 3000 years
ago. His descendant, Wu Xian Shou, relocated his entire nuclear family (which in those
days was much larger) to Hainan Island around 805 AD. The members of the Hainan
Goh Clan Association trace their lineage back to this ancestor-founder. The recorded
generations indicate which generation (from Wu Xian Shou) the respective subjects
belong to. With the exception of subjects 101 and 102, none of the genotyped subjects are
known to have more recent common ancestors.

For the Zhang subpopulation, the clan association maintains that their lineage can
be traced back almost 5'000 years to Zhang Hui, the 5th son of Huang Ti. Two of the
descendant groups, the Jianhu and Qujiang Shishing, account for most of the Zhang
population sampled in this study. Before proceeding further, note that the tangs take their
names from the province or lineage where they were founded. The Jianhu group
eventually produced the HuiChen and HuiAnn tangs, while the Joo Lim and JinJian 
groups were formed from the Qujiang Shishing tang. The Joo Lim group gave birth to the
Ankui, BanQiao, and Sawei tangs. The JinJian group produced the Teochew,
HockChiew, and Hakka lineages, and some of the descendants of the HockChiew group
went on to form the PuTian Heng tang and the YongChoon Yu Qing tang. It is unclear
when a Zhang-surnamed individual established the surname line on Hainan Island, and
thus Subjects 037 & 067 remain 'unrooted'. With these details, it then becomes easy to
see that the lineages will form clusters where those of common origin come together. It
was also supposed that members of these lineage clusters would be more genetically
related to each other than to members of tangs within other clusters.

Genetic Variation
The results of this study indicate that there is little correlation between the
genealogical records and actual genetic data, for both Zhang and Wu subpopulations.
This is effectively visualised in the phylogenetic trees and also in the diversity of Y
haplogroups and haplotypes in the individual tangs. The groupings are not reflected by
the genetic information, which scatters the members of the various tangs among the tree
branches. The diversity in the Zhang subpopulation appears to begin on the finer tang
level, with all the groups having a mix of haplogroups. The variety of Y chromosomes in
the tangs does not conform to any discernible pattern, suggesting that the effects which
produced the observed admixed distributions are entirely random and uncoordinated.
In lineages with shorter histories (less than 1'000 years old), such as Ankui,
BanQiao, and JinJian Hakka, an identifiable dominant Y Chromosome can be
found. The dominant Y in these cases strongly indicates that the particular
individuals have a common ancestor who hailed from the specific lineage. What it does
not confirm is whether the ancestor was the founder of the tang or whether he was an
addition made either recently or early on in the tang's history. There are also observed
instances (Subjects 057 & 088 for example) where individuals from different lineages
share the same Y Chromosome. 057 is from the Ankui tang and 088 from the JinJian
Hakka tang. Assuming that the records are accurate and there have been no sample
misidentifications, this suggests that the history of the Zhang clan is peppered with many
instances of repeated movements of lineages or individuals between tangs and provinces,
much like how small streams break off from one river to join another for a distance
before returning to the parent or joining yet another river. This would account for the
presence of identical Y Chromosomes in such distant arms of the Zhang subpopulation,
for it is highly unlikely that the Y would have been passed down over 4'000 years
unchanged.
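
To give a rough sense of why a haplotype is unlikely to survive 4'000 years unchanged, the following sketch computes the probability that no marker mutates along a paternal line. The per-marker mutation rate (0.002 per generation) and the generation length (25 years) are illustrative assumptions, not values taken from this study, and the model ignores back-mutation.

```python
# Illustrative assumptions only; neither value comes from the study.
YEARS_PER_GENERATION = 25      # assumed generation length
MUTATION_RATE = 0.002          # assumed mutations per marker per generation
N_MARKERS = 11                 # 11 STR markers, as used for the Y genotypes


def prob_line_unchanged(n_markers, generations, mutation_rate):
    """Probability that none of the markers mutates along one paternal line."""
    return (1.0 - mutation_rate) ** (n_markers * generations)


def prob_pair_identical(n_markers, generations, mutation_rate):
    """Probability that two men descending from the same ancestor through
    separate lines both still carry the ancestral haplotype."""
    return prob_line_unchanged(n_markers, generations, mutation_rate) ** 2


for years in (700, 1000, 4000):
    generations = years // YEARS_PER_GENERATION
    single = prob_line_unchanged(N_MARKERS, generations, MUTATION_RATE)
    pair = prob_pair_identical(N_MARKERS, generations, MUTATION_RATE)
    print(f"{years} years (~{generations} generations): "
          f"one line {single:.3f}, two lines {pair:.4f}")
```

Under these assumed values, a single line has only about a 3% chance of remaining unchanged over 4'000 years, and two separate lines both remaining unchanged is rarer still, which is consistent with the argument that identical Y Chromosomes in distant tangs point to more recent shared ancestry.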

Of greater interest is the practical use of the Y haplotypes and
haplogroups in determining the origins of individuals whom the genealogical records are
unable to place. Subjects 061 & 065 share the same Y Chromosome with 012 & 056, who
belong to the Ankui lineage. It may be of interest to the subjects to determine whether
they share their ancestry. The high degree of admixture in tangs and the possibility of
identical Y Chromosomes being discovered in more than one lineage make it difficult to
assign historical origins with any guarantee of accuracy. However, out of the 8 instances
of identical Y Chromosome groupings in the Zhang subpopulations, only 1 is distributed
across more than 1 lineage. Based on current data, the author believes that
Y genotypes comprising 11 STR markers & the Y haplogroup are sufficient for use in
predicting the probabilities of genealogical origins in the Singaporean Han population, so
long as there is other evidence in support (surname, genotype exclusive to or
overwhelmingly dominating a specific lineage / population group).
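
The kind of check described above can be sketched as follows: group subjects by identical Y genotype (haplogroup plus STR haplotype), then see whether each shared genotype is confined to one lineage. All subject labels, lineage names, and allele values below are invented for illustration, and only 3 STR markers are shown for brevity; this is not the author's actual procedure or data.

```python
from collections import defaultdict

# Hypothetical records: (subject, lineage or None if unplaced, haplogroup, STR alleles)
records = [
    ("A1", "tang_X", "O3", (14, 12, 23)),
    ("A2", "tang_X", "O3", (14, 12, 23)),
    ("A3", None,     "O3", (14, 12, 23)),   # unplaced by the genealogical records
    ("B1", "tang_Y", "O1", (15, 13, 24)),
    ("B2", "tang_Z", "O1", (15, 13, 24)),
    ("C1", "tang_Y", "O2", (13, 12, 22)),
]


def group_by_genotype(rows):
    """Map each (haplogroup, haplotype) to its subjects, known lineages, and unplaced subjects."""
    groups = defaultdict(lambda: {"subjects": [], "lineages": set(), "unplaced": []})
    for subject, lineage, haplogroup, strs in rows:
        group = groups[(haplogroup, tuple(strs))]
        group["subjects"].append(subject)
        if lineage is None:
            group["unplaced"].append(subject)
        else:
            group["lineages"].add(lineage)
    return groups


groups = group_by_genotype(records)

# Genotypes carried by two or more subjects
shared = {k: g for k, g in groups.items() if len(g["subjects"]) >= 2}

# Shared genotypes found in more than one lineage (ambiguous for placement)
multi_lineage = {k: g for k, g in shared.items() if len(g["lineages"]) > 1}

# Unplaced subjects whose genotype is confined to exactly one known lineage
candidates = {
    subject: next(iter(g["lineages"]))
    for g in shared.values() if len(g["lineages"]) == 1
    for subject in g["unplaced"]
}

print(len(shared), "shared genotypes;", len(multi_lineage), "span more than one lineage")
print("candidate placements:", candidates)
```

A match of this kind is only a candidate placement; as noted above, it would still need supporting evidence such as the surname.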
If assumptions about the accuracy of historical records and genetic information
hold, the question arises of how and why the surname groups, supposedly a bastion of kinship
and camaraderie that extends through time and space to new circumstances and alien
lands, would be so overwhelmingly diverse genetically as to approach the level found in
a random population of individuals of mixed surnames. The most common and logical
explanations would be adoptions, surname changes (to escape persecution or for other
reasons), marriage of a man into the wife's family, and non-paternity events. The
question that remains unanswerable is whether all the variations observed can be
attributed to and accounted for by these events. It is entirely possible that the surname
line initially contained several genetically dissimilar lineages, which the genealogical
records glossed over for dramatic effect. It is not unheard of for lineages in dynastic
China to merge to gain safety in numbers [Pan 1998]. In these cases, a putative
ancestor-founder (almost certainly from one of the lineages) was often used. This would
result in a clan of several lineages, though it is not likely that more than 1 or 2 lineages
would be invited to join, in order to maintain the exclusivity of the surname.
In addition, one must not neglect to consider sampling effects and the unique
expansion histories of the respective subpopulations. In the example in Figure 4.1-1
below, the two halves of the tree are contemporaries and equivalent in structure. Due to
the sampling, however, researchers may erroneously decide that Subpopulation 1 is less
genetically diverse than Subpopulation 2. Another natural effect that may adversely affect
the outcomes of analyses is depicted in Figure 4.1-2. The 2 halves of the population are
also contemporaries and have equivalent genetic distance. They have equal numbers of
subjects in the present, sampled generation. Yet, their genetic compositions can be vastly
different due to their different histories. Subpopulation 1 expanded to its present size early
on in its history, and has had a much longer time to accumulate mutations compared to
Subpopulation 2, which has a more recent expansion. This would result in Subpopulation
1 being more diverse than Subpopulation 2. These 2 factors are but some examples of the
unknown properties of populations that complicate population genetics analyses.
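
The sampling effect can be illustrated with a small sketch. It uses Nei's unbiased haplotype diversity, h = n/(n-1) * (1 - sum of squared haplotype frequencies), which is one common measure; the text does not state which diversity measure the study used, and the haplotype labels and samples below are invented.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Two hypothetical subpopulations with the same underlying structure:
# four haplotypes (A-D), each carried by the same number of men.
# Sample 1 happens to draw mostly from one branch; sample 2 draws from all four.
sample_1 = ["A", "A", "A", "B"]
sample_2 = ["A", "B", "C", "D"]

h1 = haplotype_diversity(sample_1)
h2 = haplotype_diversity(sample_2)
print(f"sample 1 diversity: {h1:.2f}")
print(f"sample 2 diversity: {h2:.2f}")
```

Although both toy subpopulations are equally diverse, the first sample yields a diversity of 0.50 and the second 1.00, purely because of which branches were sampled.
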
4.1.3 Dialect Group Affiliations

The results of the analyses indicate that there is no Y Chromosomal basis for dialect group
stratification. The distribution of Y Chromosomes appears equally diverse in the
various groups (though, admittedly, the sample sizes are low). This does not
imply that there are no heritable genetic factors on the other chromosomes with
phenotypic effects along the same division lines. However, with recombination and
shuffling of genes every generation, as well as increasing inter-marriage rates among
individuals of different dialect group affiliation, it is highly unlikely that these autosomal
genetic factors would still adhere to dialect groupings.

4.1.4 Limitations of the Study

The primary weakness of this study is the small number of subjects in many of
the Zhang tangs. This is a problem in the dialect group analyses as well. Additionally,
a more comprehensive study of the Han population should involve more surname groups
and a more balanced representation of dialect groups, as this study has an overall
overrepresentation of Fujianese samples and underrepresentation of Cantonese samples.
The original scope of the project required much more emphasis to be placed on dialect
group differences, but as the sampling failed to effectively collect sufficient numbers of
individuals of each group, this aspect was deemphasized. While the author agrees that
there may be valid criticism that the dialect group comparisons were only performed
within a single surname population, the author also wishes to point out that doing so
eliminates much of the variability a mixed population would possess, and allows the data
to be based entirely on the Y Chromosome with no interference from the autosomes.

Any population genetics study on population history and genealogy will encounter
difficulties in inferences and postulations extended to populations past. This is a natural
difficulty stemming from the inability to sample (in most cases) populations long since
dead and disposed of. The veracity of genealogical records may also come into question,
as historical records are subject to historian & witness bias, human errors, and the
inability to record all possible details which may prove, in hindsight, to be vital.
4.2 Population Comparisons
4.2.1 Taiwan & Beijing
The Han populations of both Taiwan and Beijing were compared with the local
Han population using 7 STR markers. It was discovered that the Beijing Han population
has significant distance from the Taiwanese and Singaporean Han populations. This may
be due to the fact that Beijing Hans are mostly Northern Hans, whereas Taiwanese Hans,
like Singaporean Hans, are descendants of Southern Hans. Lin et al 2001 concluded that
Taiwanese Hans and Singaporean Hans are drawn from the same gene pool based on
HLA allele frequencies. Other studies [Ying et al 1985][Shaw et al 1999][Jin & Su 2000]
also noted the division between Northern and Southern Hans, while Shi et al 2004 argue
further that even finer scales subdividing both Northern and Southern populations are
necessary for accuracy.
4.2.2 British Population: Sykes
Aside from confirming that Caucasian Y Chromosomes are significantly
different from Han Y Chromosomes [Underhill et al 2000], the comparisons with the
Sykes data also served to illustrate the effect of time on the diversity of subpopulations.
As the age of an established lineage increases, so does the Y Chromosome
diversity of the lineage. The 700-year-old Sykes lineage has a haplotype that is both
exclusive to the surname and numerically dominant within the surname. Han surname
subpopulations do not display the same structure until the subpopulation is broken
down to an even finer scale and the vintage taken into consideration. The Sykes and Han
populations likely have very different historical backgrounds, and the different structures
of their Y Chromosome pool are a reflection of that fact.

The Sykes study made use of only 4 STR markers, which may not have sufficient
discriminatory power. But even with this low power, the Zhang subpopulation proved
itself to be of a very admixed disposition. The Wu subpopulation more closely resembles
the pattern of the Sykes subpopulation. It is, however, only a third of the size of the Sykes
subpopulation. It is impossible to predict which way the Wu subpopulation will swing
(increasing diversity or decreasing diversity) should more subjects be sampled. While
Sykes concluded that it is possible to associate specific haplotypes with surnames in the
English population, it is doubtful that the method would be applicable in modern Han
populations without the aid of other supportive information.
4.3 Y Markers
4.3.1 Y-STRs
The primary concern in the use of STR markers in identification work is that they
may have population-specific variabilities. An informative marker in one population may
prove to be entirely uninformative in another. As the markers used in this study were
developed mostly on predominantly Caucasian populations, there is always the chance that
they are insufficiently optimised for Han populations, and that a different mix of markers
would do a finer job of teasing out population structures in the Hans. This
possibility can never be entirely removed. However, the use of the Minimal and
Extended Haplotypes, as defined by the International Society for Forensic Genetics, is
relatively established in the forensics arena, and repeated usage by the international
community has revealed few inadequacies overall. The database of the YHRD {1} is
built by contributions from the worldwide community using exactly these markers.

Several questions remain unanswered with regard to the novel dinucleotide
markers. There are some minor sizing discrepancies in the allele products, such as
inconsistent gap sizes between alleles, which could presumably be resolved by
sequencing the amplified region. Though not reported in the earlier parts of this
dissertation, attempts were made at sequencing the affected markers, but were not
successful in producing reliable results. This did not change the evaluation that the novel
Y STR markers evaluated in this study likely had low polymorphism information content
in Han subpopulations.
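
For reference, polymorphism information content (PIC) is commonly computed with the formula of Botstein et al. (1980): PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2, where p_i are the allele frequencies at the marker. The sketch below applies that formula to invented allele calls; whether the dissertation used exactly this formula is an assumption.

```python
from collections import Counter
from itertools import combinations


def polymorphism_information_content(alleles):
    """PIC (Botstein et al. 1980 formula) from a list of observed allele calls."""
    n = len(alleles)
    if n == 0:
        raise ValueError("no alleles observed")
    freqs = [count / n for count in Counter(alleles).values()]
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical repeat counts at two markers (invented values):
# one dominated by a single allele, one with four equally common alleles.
skewed_marker = [10] * 18 + [11] * 2
balanced_marker = [10] * 5 + [11] * 5 + [12] * 5 + [13] * 5

print(f"skewed marker PIC:   {polymorphism_information_content(skewed_marker):.3f}")
print(f"balanced marker PIC: {polymorphism_information_content(balanced_marker):.3f}")
```

A marker dominated by one allele scores low (about 0.16 here), while a marker with several equally common alleles scores much higher (about 0.70), which is the sense in which a low-PIC marker contributes little discriminatory power.
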
As the author had no access to the samples themselves, the Control subpopulation
was not included in the analyses of the Mq11.2.3 marker. However, judging by the
similarity of the results of the Zhang and Wu 11 marker comparisons to the 10 marker
results, including the Mq11.2.3 locus for the Control subpopulation would not likely
have a significant effect on the outcomes of the analyses.

4.3.2 Y-SNPs
The Y Chromosome binary SNPs used in this study sorted the sampled
chromosomes into the haplogroups defined by the Y Chromosome Consortium
[The Y Chromosome Consortium 2002]. Identifying SNPs and
experimentally predicting their vintage are ongoing processes. As with the STR markers,
this work was pioneered in and is based mostly on Caucasian populations. Much still remains
to be discovered and mapped in most ethnic groups. More generic haplogroups (marked
with an *) still require refinements, as they often contain various related chromosomes
which are in need of further separation with as yet undiscovered markers. This may
explain why such substantial differences exist between haplotypes in some of the
haplogroups (such as O3*).
As expected, the majority of the Singaporean Hans fall into the O haplogroups.
89.16% of the Control subpopulation, 91.24% of the Zhang subpopulation, and all of the
Wu subpopulation belong to the O haplogroups. Underhill et al 2000 reported similar
figures (90.48% of Hans) in China, while Taiwanese Hans had a 98.70% proportion.
Deng et al 2004 estimated that as many as 65% of all males in China (of various ethnic
groups) belong to the O haplogroups. There would be more data for comparisons among
global Han populations once the genotyping kit used in this study becomes widely
available.



