Sunday, May 4, 2014


Research: Signatures of peopling processes revealed through diagrams of surnames frequencies distribution

Edwin Francisco Herrera-Paz. Universidad Católica de Honduras, Campus San Pedro y San Pablo, Faculty of Medicine, San Pedro Sula, Honduras.

Since Crow and Mange first described isonymy theory, surname analysis has been a common tool for assessing the structure and dynamics of human populations. The most common surnames in a territory usually represent mainly the descendants of founders, while rare ones may correspond mostly to recent immigrants. Additionally, frequent surnames are few while rare ones are much more numerous, with the distribution of frequencies following a power law. Here, a graphical representation of the distribution of frequencies is presented that provides, at a glance, abundant information on the peopling of a territory. Surname frequency distribution graphs were constructed for several ethnic communities in the Republic of Honduras. Signatures of peopling processes in these graphs are in accordance with known migration patterns.


Since Crow and Mange (1965) described isonymy theory and its utility in determining the genetic structure of a population, the theory has undergone many refinements. Surnames are inherited patrilineally, in a way similar to Y chromosome markers; therefore, surname analysis can be used to explore various features of human populations, making it a useful complementary tool for genetic studies (Sykes and Irven, 2000; King and Jobling, 2009; Baek et al., 2011). Databases of surnames can be interrogated in order to assess, although in a rough manner, the genetic structure of a population by estimating the FST statistic, which provides information on historical genetic drift (and thus on the degree of isolation) that is of interest to medical genetics (Dipierri et al., 2014; Mathias et al., 2000) and anthropology (Barrai et al., 2001; Rodriguez-Larralde et al., 2003; Herrera-Paz et al., 2014). Briefly, isonymy, calculated as the sum of the squared frequencies of surnames, is roughly four times FST. Additionally, frequencies of surnames can be used to determine genetic distances between populations. In countries in which two surnames are used (the first inherited from the father and the second from the mother), more accurate estimates of the proportion of consanguineous marriages can be obtained (Pinto-Cisternas et al., 1985), and it is possible to elucidate endogamous customs and ethnic segregation in admixed populations through the FIS statistic, as well as rough proportions of illegitimate children (Herrera-Paz, 2013a).
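The relation between isonymy and FST stated above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names and the toy surname list are hypothetical, the input is taken to be one surname per registered person, and no sampling correction is applied.

```python
from collections import Counter


def random_isonymy(surnames):
    """Sum of squared relative surname frequencies (one surname per person)."""
    counts = Counter(surnames)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


def fst_from_isonymy(isonymy):
    """Rough FST estimate, using the text's relation: isonymy is about 4 * FST."""
    return isonymy / 4.0


# Toy list (hypothetical data): frequencies are 2/4, 1/4 and 1/4.
sample = ["Lopez", "Lopez", "Martinez", "Garcia"]
isonymy = random_isonymy(sample)   # 0.25 + 0.0625 + 0.0625 = 0.375
fst = fst_from_isonymy(isonymy)    # 0.375 / 4 = 0.09375
```

A population dominated by a few surnames gives a high isonymy and therefore a high FST estimate, which is read as stronger drift and isolation.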

In any given population, the most common surnames tend to represent mainly the descendants of founders (Bedoya et al., 2006), while those at low frequencies, such as doubletons and singletons, probably correspond mostly to individuals who immigrated recently, assuming a low mutation rate (that is, few errors in the registry). Recent founder effects may wipe out a fraction of rare surnames in the same way they reduce the frequency of rare alleles at polymorphic DNA loci (Luikart et al., 1998). On the other hand, studies show that frequent surnames are few while rare ones are much more numerous, following a Pareto (power-law) distribution, namely the Zipf-Mandelbrot law (Zipf, 1949; Mandelbrot, 1983), which is a common feature of many aspects of nature (Bak et al., 1987; Perry, 1995; Sole et al., 1997; Brunk, 2002) and society (Maruvka et al., 2010; Manrubia and Zanette, 2002; Adamic and Huberman, 2002). This property of the frequency distribution of surnames (or of any other type of informative elements) is usually represented by scaled graphs, with the X axis as the logarithm of the frequency of surnames and the Y axis as the logarithm of the fraction of the population (log-log graphs). Models for the surname frequency distribution under various conditions have been amply studied (see, for instance, Manrubia and Zanette, 2002).
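On a log-log graph a power law appears as a straight line, and its exponent is the slope of that line. The sketch below fits the slope by ordinary least squares on the logarithms. It is a crude illustration, not the procedure used in the cited studies; the function name and the sample points are hypothetical.

```python
import math


def loglog_slope(points):
    """Least-squares slope of log10(y) against log10(x).

    `points` is a list of (x, y) pairs with x > 0 and y > 0, for example
    (surname frequency, fraction of the population at that frequency).
    """
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical points lying exactly on y = 0.5 * x**-2; the fitted slope is close to -2.
slope = loglog_slope([(1, 0.5), (2, 0.125), (4, 0.03125)])
```

Least squares on logarithms is sensitive to the sparse, noisy tail of real surname data, so the fitted exponent should be read as a rough descriptive value only.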

Although log-log graphs offer an opportunity to explore a universal feature of nature (the power law) through isonymy, from the anthropological and population genetics perspectives it is of more interest to assess certain characteristics of the populations under study, for example, peopling processes such as immigration. I present here a modification of the surname frequency distribution graph that offers, at a glance, plenty of information about the historical peopling of a populated place.

Materials and Methods

Lists of surnames were obtained from the electoral authority of the Republic of Honduras, updated for the 2009 presidential elections. Several rural and urban populated places in the country were used for this study. Isonymy values and associated parameters for some of these places can be found in Herrera-Paz (2013a; 2013b), and for the whole country at up to three administrative levels in Herrera-Paz et al. (2014).

The proportions of surnames from each place were calculated and then plotted on graphs of the frequency distribution of surnames, with the frequencies of surnames as the points on the X axis. It is important to note that the set of points on the X axis must not be an arbitrarily fixed set of numbers; instead, each point represents the actual (nonzero) frequency of one or more surnames, which prevents the curve from dropping to zero. The axis is arranged in descending order, with the highest frequencies to the left and the lowest to the right; therefore, it represents a timeline from the foundation of the population (left end) to the present (right end), under the assumption that the most frequent surnames correspond to the founders and the less frequent ones to newcomers. The Y axis simply represents the proportion of the population.
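The construction described above can be sketched as follows. This is one possible reading, not the author's actual code: it assumes the input is one surname per registered person, and it takes the Y value at each observed frequency to be the fraction of all individuals whose surname occurs at exactly that frequency. The function name and the toy data are hypothetical.

```python
from collections import Counter


def peopling_diagram(surnames):
    """Return (frequency, population fraction) points, highest frequency first.

    Only frequencies actually observed in the data are used as X values,
    so no point has a Y value of zero.
    """
    per_surname = Counter(surnames)            # bearers of each surname
    n = sum(per_surname.values())              # total individuals
    bearers_by_count = Counter()
    for count in per_surname.values():
        bearers_by_count[count] += count       # people whose surname has this count
    return [(count / n, bearers / n)
            for count, bearers in sorted(bearers_by_count.items(), reverse=True)]


# Toy community of 10 people (hypothetical data).
community = ["A"] * 4 + ["B"] * 2 + ["C"] * 2 + ["D", "E"]
points = peopling_diagram(community)
# points == [(0.4, 0.4), (0.2, 0.4), (0.1, 0.2)]
```

Plotting the points in list order, left to right, and joining them with a line gives the descending X axis described in the text. Under this reading the Y values sum to 1, because every individual is counted at exactly one frequency.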

Finally, the resulting diagrams were interpreted in light of known facts about the peopling history of the communities. 

Results and discussion

I argue here that the curves joining the points in the frequency distribution graphs so constructed constitute a diagram of the peopling of each place, in which the X axis represents different historical moments and peaks (on the Y axis) represent immigration waves. The beginning (left side) of the graph may correspond to the initial peopling. An initial negative slope would be indicative of drift, while a positive slope ending at a high point (right side) would show very high recent immigration overshadowing the initial peopling process.
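The reading of the diagram proposed above (height of the left end, height of the right end, and peaks in between) can be turned into a small numeric summary. The sketch below is only a crude heuristic for illustration, not a validated method: it counts strict interior local maxima as candidate peaks and assumes the points are already ordered from the highest frequency to the lowest. The function name and the sample curve are hypothetical.

```python
def describe_diagram(points):
    """Summarize a diagram given as (frequency, population fraction) pairs.

    `points` must be ordered from the highest frequency (left end) to the
    lowest (right end) and contain at least two pairs.
    """
    ys = [y for _, y in points]
    # A candidate peak is an interior point strictly higher than both neighbours.
    interior_peaks = sum(
        1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1]
    )
    return {
        "start": ys[0],                      # left end: initial peopling
        "end": ys[-1],                       # right end: most recent arrivals
        "start_above_end": ys[0] > ys[-1],   # read in the text as drift dominating
        "interior_peaks": interior_peaks,    # read in the text as immigration waves
    }


# Hypothetical curve: high start, one interior peak, lower end.
summary = describe_diagram(
    [(0.4, 0.3), (0.2, 0.1), (0.1, 0.25), (0.05, 0.05), (0.02, 0.2)]
)
```

With real data, small fluctuations between neighbouring points would also be counted as peaks, so some smoothing or a minimum peak height would be needed before interpreting the count.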

The Garífuna are an Afro-descendant ethnic group that arrived in the Republic of Honduras, Central America, just over 200 years ago and thereafter experienced a huge population expansion, founding and peopling communities throughout the Caribbean coast in a well-studied pattern (Herrera Paz et al., 2008; 2010), spreading linearly eastward and westward from Trujillo, the city of arrival. From Cristales and Río Negro (the first Garífuna communities in continental Honduras, located in Trujillo), the foundation of Garífuna villages extended to the east until it reached the territory dominated by the Miskito ethnic group (Herrera-Paz and Mejía, 2010), while the development of important Honduran urban centers in the 20th century boosted Garífuna male migration toward the west. Figures 1 and 2 show the frequency distributions of surnames for 26 Garífuna and two Miskito communities.

Westward Garífuna migrations

Figure 1. Garífuna communities originated in the westward migrations from Cristales and Río Negro. Arrows show the direction of migration.

Eastward Garífuna migrations

Figure 2. Garífuna communities originated in the eastward migrations from Cristales and Río Negro. The two communities at the bottom, labeled as Ibans and Palacios, are of Miskito affiliation (not Garífuna). Arrows show the direction of Garífuna migration.

Similar patterns with multiple immigration waves (peaks) can be seen in all four communities. In general (with some exceptions), Garífuna communities show a typical pattern characterized by a high left end with a steep slope, a relatively low occurrence of surnames of intermediate frequencies, and an ascending slope terminating in a high point. These features may correspond to initial isolation with high genetic drift, together with high recent immigration in most communities. Communities to the west tend to present more peaks (immigration waves) than the ones to the east. Additionally, in many places to the east, the beginning point is higher than the ending point, showing drift dominating over immigration. These characteristics are in accordance with the known migrations.

Some exceptions to the “Garífuna pattern” are notable. In particular, Plaplaya, the last community toward the east, exhibits a pattern similar to that of the Miskito communities of Ibans and Palacios, with a very low beginning and a high ending. Peaceful relations and admixture between both groups in the overlapping territory have been documented (Herrera-Paz, 2013a); thus, the observed pattern might be due to high admixture. Moreover, Masca (the last community to the west) and Saraguayna are known for their high level of admixture with Mestizo populations, confirming the “admixed pattern”.

Interestingly, Iriona Puerto and Iriona Viejo (two twin communities) show similar general patterns, with a single migration wave affecting both shortly after the initial peopling. However, whereas Iriona Viejo shows high drift and isolation, Iriona Puerto’s pattern is in accordance with high recent immigration and a growing population.

The municipality of Trinidad, in the western region of Honduras, is composed of a main populated area (Trinidad) and various minor villages, a distribution typical of rural Honduras. Trinidad and its villages, founded by Sephardic Jews, are known for their high isolation and the inbreeding customs of their people (Herrera-Paz, 2013a). Figure 3 shows four populated places from Trinidad, including the main population (Trinidad), all displaying a pattern that appears to be typical of isolation, i.e., a high beginning and a low ending of the curve. In fact, all nine places from this municipality showed the same pattern, with minor variations (data not shown).

Trinidad genetic isolates

Figure 3. Surname frequency distribution diagrams of four Honduran genetic isolates.

Finally, the “urban pattern” is shown in Figure 4. Frequency distribution graphs were constructed for four urban neighborhoods located in the two main Honduran cities. Although they have variable ethnic backgrounds and come from different socioeconomic strata, all four share some characteristics, probably reflecting the intense urbanization process in Honduras. The urban pattern presents multiple immigration waves, with immigration clearly dominating the landscape.

Urban populations

Figure 4. Surname frequency distribution diagrams of four Honduran urban neighborhoods.

In summary, peopling processes affect the distribution of surname frequencies in human populations. Signatures of settlement and subsequent immigration waves can be detected in the form of peaks on graphs where the data are conveniently displayed. However, the reliability of such graphs rests on the assumption that the proportion of a surname is a function of the time elapsed since the arrival of its carriers in the territory, as well as on all other assumptions that apply to isonymy studies (Relethford, 1988).

The work presented here is based on empirical observations in a small number of populated places. In order to validate the power of the graphs in assessing peopling processes, studies that include a wide range of populations, as well as simulation models, are needed.

Literature cited

Adamic LA, Huberman BA. 2002. Zipf law and the Internet. Glottometrics 3:143–150

Baek SK, Bernhardsson S, Minnhagen P. 2011. Zipf's law unzipped. New J Phys 13(4):043004

Bak P, Tang C, Wiesenfeld K. 1987. Self-organized criticality: An explanation of the 1/f noise. Phys Rev Lett 59(4):381-384

Barrai I, Rodriguez-Larralde A, Mamolini E, Manni F, Scapoli C. 2001. Isonymy structure of USA population. Am J Phys Anthropol 114(2):109-123

Bedoya G, Montoya P, García J, Soto I, Bourgeois S, Carvajal L, Labuda D, Alvarez V, Ospina J, Hedrick PW, Ruiz-Linares A. 2006. Admixture dynamics in Hispanics: a shift in the nuclear genetic ancestry of a South American population isolate. P Natl Acad Sci USA 103(19):7234-7239

Brunk GG. 2002. Why do societies collapse? A theory based on self-organized criticality. J Theor Polit 14(2):195-230

Crow JF, Mange AP. 1965. Measurement of inbreeding from the frequency of marriages between persons of the same surname. Biodemography Soc Biol 12(4):199-203

Dipierri J, Rodríguez-Larralde A, Barrai I, Camelo JL, Redomero EG, Rodríguez CA, Alfaro E. 2014. Random inbreeding, isonymy, and population isolates in Argentina. J Community Genet. In press.

Herrera Paz EF, García LF, Aragon-Nieto I, Paredes M. 2008. Allele frequencies distributions for 13 autosomal STR loci in 3 black Carib (Garifuna) populations of the Honduran Caribbean coasts. Forensic Sci Int Genet 3:e5–e10.

Herrera Paz EF, Matamoros M, Carracedo A. 2010. The Garífuna (Black Carib) people of the Atlantic coasts of Honduras: Population dynamics, structure, and phylogenetic relations inferred from genetic data, migration matrices, and isonymy. Am J Hum Biol 22(1):36–44

Herrera Paz EF, Mejía DA. 2010. Surnames in Gracias a Dios: Population structure and residence patterns in the Honduran Miskito Territory assessed through Isonymy. Document freely available on the internet at:

Herrera-Paz EF. 2013a. Estimación del aislamiento e ilegitimidad en 60 comunidades hondureñas mediante el análisis de apellidos. Rev Med Hondur 81(1):18-28

Herrera-Paz EF. 2013b. Apellidos e isonimia en las comunidades garífunas de la costa atlántica de Honduras. Rev Med Inst Mex Seguro Soc 51(2):150-7

Herrera Paz EF, Scapoli C, Mamolini E, Sandri M, Carrieri A, Rodriguez-Larralde A, Barrai I. 2014. Surnames in Honduras: A Study of the Population of Honduras through Isonymy. Ann Hum Genet 78(3):165-177

King TE, Jobling MA. 2009. What's in a name? Y chromosomes, surnames and the genetic genealogy revolution. Trends Genet 25(8):351-360

Luikart G, Allendorf FW, Cornuet JM, Sherwin WB. 1998. Distortion of allele frequency distributions provides a test for recent population bottlenecks. J Hered 89(3):238-247.

Mandelbrot B. 1983. The Fractal Geometry of Nature. New York: Freeman

Manrubia SC, Zanette DH. 2002. At the boundary between biological and cultural evolution: The origin of surname distributions. J Theor Biol 216(4):461-477

Maruvka YE, Shnerb NM, Kessler DA. 2010. Universal features of surname distribution in a subsample of a growing population. J Theor Biol 262(2):245-256

Mathias RA, Bickel CA, Beaty TH, Petersen GM, Hetmanski JB, Liang KY, Barnes KC. 2000. A study of contemporary levels and temporal trends in inbreeding in the Tangier Island, Virginia, population using pedigree data and isonymy. Am J Phys Anthropol 112(1):29-38

Perry DA. 1995. Self-organizing systems across scales. Trends Ecol Evol 10(6):241-244

Pinto-Cisternas J, Pineda L, Barrai I. 1985. Estimation of inbreeding by isonymy in Iberoamerican populations: an extension of the method of Crow and Mange. Am J Hum Genet 37(2):373

Relethford JH. 1988. Estimate of kinship and genetic distance from surnames. Hum Biol 60(3):475-492

Rodriguez-Larralde A, Gonzalez-Martin A, Scapoli C, Barrai I. 2003. The names of Spain: a study of the isonymy structure of Spain. Am J Phys Anthropol 121(3):280-292

Sykes B, Irven C. 2000. Surnames and the Y chromosome. Am J Hum Genet 66(4):1417-1419

Sole RV, Manrubia SC, Benton M, Bak P. 1997. Self-similarity of extinction statistics in the fossil record. Nature 388(6644):764-767

Zipf GK. 1949. Human behavior and the principle of least effort. Oxford: Addison–Wesley