Research: Signatures of peopling processes revealed through diagrams of surnames frequencies distribution
Edwin-Francisco Herrera-Paz. Universidad Católica de Honduras, Campus San Pedro y San Pablo,
Faculty of Medicine, San Pedro Sula, Honduras.
ABSTRACT
Since Crow and Mange first described isonymy theory,
surname analysis has been a common tool for assessing the structure and dynamics of
human populations. The most common surnames in a territory usually represent mainly
the descendants of founders, while rare ones may correspond mostly to recent
immigrants. Additionally, there are few frequent surnames and many rare
ones, with the distribution of frequencies following a
power law. Here, a graphical representation of the distribution of frequencies
that provides, at a glance, abundant information about the peopling of a territory
is presented. Surnames frequencies distribution graphs for several ethnic
communities in the Republic of Honduras were constructed. Signatures of
peopling processes in these graphs are in accordance with known migration
patterns.
KEY WORDS
INTRODUCTION
Since Crow and Mange (1965) described
isonymy theory and its utility in determining the genetic structure of a
population, the theory has undergone many refinements. Surnames are inherited patrilineally,
in a way similar to Y chromosome markers; therefore, surname analysis can be used to
explore various features of human populations, becoming a useful complementary
tool for genetic studies (Sykes and Irven, 2000; King and Jobling, 2009; Baek
et al., 2011). Databases of surnames can be interrogated in order to assess,
although in a rough manner, the genetic structure of a population by
estimating the FST statistic, providing information on historical genetic
drift (and thus on the degree of isolation), which is of interest to medical
genetics (Dipierri et al., 2014; Mathias et al., 2000) and anthropology (Barrai
et al., 2001; Rodriguez-Larralde et al., 2003; Herrera-Paz et al., 2014). Briefly,
isonymy calculated as the sum of the squared frequencies of surnames is roughly
four times FST. Additionally, frequencies of surnames can be used to determine
genetic distances between populations. In countries in which two surnames are
used (the first inherited from the father and the second from the mother), more
accurate estimates of the proportions of consanguineous marriages can be
obtained (Pinto-Cisternas et al., 1985), and it is possible to elucidate
endogamous customs and ethnic segregation in admixed populations (through the
FIS statistic), as well as rough proportions of illegitimate children
(Herrera-Paz, 2013a).
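The relation between isonymy and FST stated above can be illustrated with a minimal sketch. The function names and the sample list are hypothetical and introduced only for illustration; the code assumes one surname per individual and applies the rough "isonymy is about four times FST" rule exactly as quoted in the text.

```python
from collections import Counter


def random_isonymy(surnames):
    """Sum of the squared relative frequencies of the surnames in a list."""
    n = len(surnames)
    counts = Counter(surnames)  # surname -> number of bearers
    return sum((c / n) ** 2 for c in counts.values())


def fst_from_isonymy(isonymy):
    """Rough FST estimate, assuming isonymy is about four times FST."""
    return isonymy / 4


# Hypothetical sample: ten individuals sharing five surnames.
sample = ["A"] * 4 + ["B"] * 2 + ["C"] * 2 + ["D", "E"]
isonymy = random_isonymy(sample)
print(isonymy, fst_from_isonymy(isonymy))
```

A population dominated by a few surnames yields a high isonymy value, which under this approximation corresponds to a higher FST.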
In any given population, the most common surnames tend to
represent mainly the descendants of founders (Bedoya et al., 2006), while those
that are at low frequencies, such as doubletons and singletons, probably
correspond mostly to individuals who immigrated recently, assuming a low mutation
rate (i.e., few errors in the registry). Recent founder effects may wipe out a fraction
of rare surnames in the same way they reduce the frequency of rare alleles in
polymorphic DNA loci (Luikart et al., 1998). On the other hand, studies show
that there are few frequent surnames and many rare ones,
following a Pareto (power-law) distribution, which is a common
feature of many aspects of nature (Bak et al., 1987; Perry, 1995; Sole et al., 1997;
Brunk, 2002) and society (Maruvka et al., 2010; Manrubia and Zanette, 2002;
Adamic and Huberman, 2002), namely the Zipf-Mandelbrot law (Zipf, 1949;
Mandelbrot, 1983). This property of the frequencies distribution of surnames (or
of any other type of informative elements) is usually represented by scaled
graphs, with the X axis as the logarithm of the frequency of surnames, and the
Y axis as the logarithm of the fraction of the population (log-log graphs).
Models for the surnames frequencies distribution under various conditions have
been amply studied (see, for instance, Manrubia and Zanette, 2002).
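As a rough illustration of the log-log representation described above, the sketch below builds the log-transformed points and fits a straight line to them. It is only a sketch under stated assumptions: the function names are hypothetical, the "fraction of the population" is taken to be the share of individuals whose surname occurs exactly k times, and an ordinary least-squares slope stands in for a proper power-law fit.

```python
import math
from collections import Counter


def loglog_points(surnames):
    """Return (log10 frequency, log10 population fraction) pairs."""
    n = len(surnames)
    # k -> number of distinct surnames borne by exactly k individuals
    surnames_per_count = Counter(Counter(surnames).values())
    return [
        (math.log10(k / n), math.log10(m * k / n))
        for k, m in sorted(surnames_per_count.items())
    ]


def least_squares_slope(points):
    """Slope of the ordinary least-squares line through the points."""
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

On a log-log graph a power law appears as an approximately straight line, so the fitted slope gives a crude estimate of the exponent.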
Although log-log graphs offer an opportunity to
explore a universal feature of nature (the power law) through isonymy, from the
anthropological and population genetics perspectives it is of more interest to
assess certain characteristics of the populations under study, for example, peopling
processes such as immigration. I present here a modification of the surnames
frequencies distribution graph that offers, at a glance, plenty of information
about the historical peopling of a populated place.
Materials and Methods
Lists of surnames were obtained from the electoral
authority of the Republic of Honduras, updated for the 2009 presidential
elections. Several rural and urban populated places in the country were used
for this study. Isonymy values and associated parameters for some of these places
can be found in Herrera-Paz (2013a; 2013b), and for the whole country up to
three administrative levels in Herrera-Paz et al. (2014).
The proportion of each surname in each place was
calculated and then plotted on graphs of frequencies distribution of surnames,
with the surname frequencies as the points on the X axis. It is important to note
that the set of points on the X axis must not be an arbitrarily fixed set of
numbers; instead, each point represents the actual (nonzero) frequency
of one or more surnames, which prevents the curve from dropping to zero. The axis is
arranged in descending order, with the highest frequencies to the left and the
lowest to the right; therefore, it represents a timeline from the foundation of
the population (left end) to the present (right end), under the assumption that
the most frequent surnames correspond to the founders and the less frequent to
newcomers. The Y axis simply represents the proportion of the population.
Finally, the resulting diagrams were interpreted in light of known
facts about the peopling history of the communities.
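The construction described above might be sketched as follows. This is an illustration under assumptions, not the procedure actually used for the figures: the function name and the sample data are hypothetical, and the Y value is assumed to be the share of the population bearing surnames that occur at a given frequency.

```python
from collections import Counter


def surname_diagram(surnames):
    """Return the (x, y) points of a surnames frequencies distribution diagram.

    x: an observed (nonzero) surname frequency, highest first.
    y: assumed share of the population whose surname has that frequency.
    """
    n = len(surnames)
    counts = Counter(surnames)  # surname -> number of bearers
    # k -> number of distinct surnames borne by exactly k individuals
    surnames_per_count = Counter(counts.values())
    points = []
    # Only frequencies that actually occur become X points, so the curve
    # never drops to zero; descending order puts presumed founders on the left.
    for k in sorted(surnames_per_count, reverse=True):
        points.append((k / n, surnames_per_count[k] * k / n))
    return points


# Hypothetical sample: ten individuals sharing five surnames.
sample = ["A"] * 4 + ["B"] * 2 + ["C"] * 2 + ["D", "E"]
for x, y in surname_diagram(sample):
    print(x, y)
```

Because every individual falls into exactly one frequency class, the Y values of a diagram built this way sum to 1.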
Results and discussion
I argue here that the curves joining the points in the
frequencies distribution graphs so constructed constitute a diagram reminiscent
of the peopling of each place, in which the X axis represents different historical
moments and peaks (Y axis) represent immigration waves. The beginning (left
side) of the graph may correspond to the initial peopling. An initial negative
slope would be indicative of drift, while a positive one with a high
termination (right side) would show very high recent immigration, overshadowing
the initial peopling process.
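The reading rules above could be applied to a list of diagram points roughly as follows. This is only a sketch: the function names and the sample points are hypothetical, a peak is taken to be any interior point higher than both of its neighbours, and the two ends of the curve are compared without any threshold, all of which are assumptions rather than an established procedure.

```python
def local_peaks(points):
    """Interior points whose Y value exceeds that of both neighbours."""
    return [
        points[i]
        for i in range(1, len(points) - 1)
        if points[i][1] > points[i - 1][1] and points[i][1] > points[i + 1][1]
    ]


def end_balance(points):
    """Y of the left (oldest) end minus Y of the right (most recent) end.

    A positive value is read here as drift dominating over immigration,
    a negative value as recent immigration dominating.
    """
    return points[0][1] - points[-1][1]


# Hypothetical diagram points, ordered from highest to lowest frequency.
diagram = [(0.40, 0.30), (0.20, 0.10), (0.10, 0.25), (0.05, 0.05), (0.02, 0.40)]
print(local_peaks(diagram))
print(end_balance(diagram))
```

Real diagrams are noisy, so some smoothing or a minimum peak height would probably be needed before counting peaks as immigration waves.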
The Garífuna are an Afro-descendant ethnic group that
arrived in the Republic of Honduras, Central America, just over 200 years ago,
thereafter experiencing a huge population expansion, founding and peopling
communities throughout the Caribbean coast in a well-studied pattern (Herrera
Paz et al., 2008; 2010), spreading linearly eastward and westward from
Trujillo, the city of arrival. From Cristales and Río Negro (the first Garífuna
communities in continental Honduras, located in Trujillo), the foundation of Garífuna
villages extended to the east until it reached the territory dominated by the
Miskito ethnic group (Herrera-Paz and Mejía, 2010), while the development of
important Honduran urban centers in the 20th century boosted Garífuna
male migrations westward. Figures 1 and 2 show the frequencies
distributions of surnames for 26 Garífuna and two Miskito communities.
Figure 1. Garífuna communities that originated in the
westward migrations from Cristales and Río Negro. Arrows show the direction of
migration.
Similar patterns with multiple immigration waves (peaks)
can be seen in all four communities. In general (with some exceptions), Garífuna
communities show a typical pattern characterized by a high left end with a
steep slope, relatively low occurrence of surnames of intermediate frequencies,
and an ascending slope terminating in a high point. These may correspond to
initial isolation with high genetic drift, together with high recent
immigration in most communities. Communities to the west tend to present more peaks
(immigration waves) compared to the ones to the east. Additionally, in many
places to the east, the beginning point is in a higher position with respect to
the ending point, showing drift dominating over immigration. These
characteristics are in accordance with actual migrations.
Some exceptions to the “Garífuna pattern” are notable.
In particular, Plaplaya, the last community toward the east, exhibits a pattern
similar to that of the Miskito communities of Ibans and Palacios, with a very low beginning
and a high ending. Peaceful relations and admixture between both groups in the
overlapping territory have been documented (Herrera-Paz, 2013a); thus, the
observed pattern might be due to high admixture. Moreover, Masca (the last
community to the west) and Saraguayna are known for their high level of
admixture with Mestizo populations, lending support to the “admixed pattern”.
Interestingly, Iriona Puerto and Iriona Viejo (two
twin communities) show similar general patterns, with a
single migration wave affecting both shortly after the initial peopling. However,
whereas Iriona Viejo shows high drift and isolation, Iriona Puerto’s pattern is
in accordance with high recent immigration and a growing population.
The municipality of Trinidad, in the western region of
Honduras, is composed of a main populated area (Trinidad) and various minor
villages, a distribution typical of rural Honduras. Trinidad and its villages, founded
by Sephardic Jews, are known for their high isolation and the inbreeding
customs of their people (Herrera-Paz, 2013a). Figure 3 shows four populated
places from Trinidad, including the main population (Trinidad), all displaying a
pattern that appears to be typical of isolation, i.e., a high beginning and a low
ending of the curve. In fact, all nine places from this municipality showed the
same pattern, with minor variations (data not shown).
Finally, the “urban pattern” is shown in Figure 4.
Frequencies distribution graphs of four urban neighborhoods located in the two
main Honduran cities were constructed. Despite their variable ethnic
backgrounds and different socioeconomic strata, all four share some
characteristics, probably reflecting the intense urbanization process
in Honduras. The urban pattern presents multiple immigration waves, with immigration
clearly dominating the landscape.
Figure 4. Surnames frequencies distribution diagrams of four Honduran
urban neighborhoods
In summary, peopling processes affect the
distribution of surnames frequencies of human populations. Signatures of settlement
and subsequent immigration waves can be detected in the form of peaks on graphs
where the data are conveniently displayed. However, the reliability of such
graphs rests on the assumption that the proportion of a surname is a function of
the time elapsed since the arrival of its carriers in the territory, as well as on
all other assumptions that apply to isonymy studies (Relethford, 1988).
The work presented here is based on empirical
observations in a small number of populated places. In order to validate the power
of the graphs in assessing peopling processes, studies that include a wide
range of populations, as well as simulation models, are needed.
Literature cited
Adamic LA, Huberman BA. 2002. Zipf law and the Internet. Glottometrics 3:143–150
Baek SK, Bernhardsson S, Minnhagen P. 2011. Zipf's law unzipped. New J Phys 13(4):043004
Bak P, Tang C, Wiesenfeld K. 1987. Self-organized criticality: An explanation of the 1/f noise. Phys Rev Lett 59(4):381-384
Barrai I, Rodriguez-Larralde A, Mamolini E, Manni F, Scapoli C. 2001. Isonymy structure of USA population. Am J Phys Anthropol 114(2):109-123
Bedoya G, Montoya P, García J, Soto I, Bourgeois S, Carvajal L, Labuda D, Alvarez V, Ospina J, Hedrick PW, Ruiz-Linares A. 2006. Admixture dynamics in Hispanics: a shift in the nuclear genetic ancestry of a South American population isolate. P Natl Acad Sci USA 103(19):7234-7239
Brunk GG. 2002. Why do societies collapse? A theory based on self-organized criticality. J Theor Polit 14(2):195-230
Crow JF, Mange AP. 1965. Measurement of inbreeding from the frequency of marriages between persons of the same surname. Biodemography Soc Biol 12(4):199-203
Dipierri J, Rodríguez-Larralde A, Barrai I, Camelo JL, Redomero EG, Rodríguez CA, Alfaro E. 2014. Random inbreeding, isonymy, and population isolates in Argentina. J Community Genet. In press.
Herrera Paz EF, García LF, Aragon-Nieto I, Paredes M. 2008. Allele frequencies distributions for 13 autosomal STR loci in 3 black Carib (Garifuna) populations of the Honduran Caribbean coasts. Forensic Sci Int Genet 3:e5–e10
Herrera Paz EF, Matamoros M, Carracedo A. 2010. The Garífuna (Black Carib) people of the Atlantic coasts of Honduras: Population dynamics, structure, and phylogenetic relations inferred from genetic data, migration matrices, and isonymy. Am J Hum Biol 22(1):36–44
Herrera Paz EF, Mejía DA. 2010. Surnames in Gracias a Dios: Population structure and residence patterns in the Honduran Miskito Territory assessed through Isonymy. Document freely available on the internet at: http://lahondurasvaliente.blogspot.com/2010/09/original-research-surnames-in-gracias.html
Herrera-Paz EF. 2013a. Estimación del aislamiento e ilegitimidad en 60 comunidades hondureñas mediante el análisis de apellidos. Rev Med Hondur 81(1):18-28
Herrera-Paz EF. 2013b. Apellidos e isonimia en las comunidades garífunas de la costa atlántica de Honduras. Rev Med Inst Mex Seguro Soc 51(2):150-7
Herrera Paz EF, Scapoli C, Mamolini E, Sandri M, Carrieri A, Rodriguez-Larralde A, Barrai I. 2014. Surnames in Honduras: A Study of the Population of Honduras through Isonymy. Ann Hum Genet 78(3):165-177
King TE, Jobling MA. 2009. What's in a name? Y chromosomes, surnames and the genetic genealogy revolution. Trends Genet 25(8):351-360
Luikart G, Allendorf FW, Cornuet JM, Sherwin WB. 1998. Distortion of allele frequency distributions provides a test for recent population bottlenecks. J Hered 89(3):238-247
Mandelbrot B. 1983. The Fractal Geometry of Nature. New York: Freeman
Manrubia SC, Zanette DH. 2002. At the boundary between biological and cultural evolution: The origin of surname distributions. J Theor Biol 216(4):461-477
Maruvka YE, Shnerb NM, Kessler DA. 2010. Universal features of surname distribution in a subsample of a growing population. J Theor Biol 262(2):245-256
Mathias RA, Bickel CA, Beaty TH, Petersen GM, Hetmanski JB, Liang KY, Barnes KC. 2000. A study of contemporary levels and temporal trends in inbreeding in the Tangier Island, Virginia, population using pedigree data and isonymy. Am J Phys Anthropol 112(1):29-38
Perry DA. 1995. Self-organizing systems across scales. Trends Ecol Evol 10(6):241-244
Pinto-Cisternas J, Pineda L, Barrai I. 1985. Estimation of inbreeding by isonymy in Iberoamerican populations: an extension of the method of Crow and Mange. Am J Hum Genet 37(2):373
Relethford JH. 1988. Estimate of kinship and genetic distance from surnames. Hum Biol 60(3):475-492
Rodriguez-Larralde A, Gonzales-Martin A, Scapoli C, Barrai I. 2003. The names of Spain: a study of the isonymy structure of Spain. Am J Phys Anthropol 121(3):280-292
Sykes B, Irven C. 2000. Surnames and the Y chromosome. Am J Hum Genet 66(4):1417-1419
Sole RV, Manrubia SC, Benton M, Bak P. 1997. Self-similarity of extinction statistics in the fossil record. Nature 388(6644):764-767
Zipf GK. 1949. Human behavior and the principle of least effort. Oxford: Addison–Wesley