Categorization of humans in biomedical research: genes, race and disease
Neil Risch, Esteban Burchard, Elad Ziv, and Hua Tang
Genome Biology, 2002
With this as background, it is not surprising that numerous human population genetic studies have come to the same conclusion - that genetic differentiation is greatest when defined on a continental basis. The results are the same irrespective of the type of genetic markers employed, be they classical systems, restriction fragment length polymorphisms (RFLPs), microsatellites [7,8,9,10,11], or single nucleotide polymorphisms (SNPs). For example, studying 14 indigenous populations from 5 continents with 30 microsatellite loci, Bowcock et al. observed that the 14 populations clustered into the five continental groups, as depicted in Figure 1. The African branch included three sub-Saharan populations: CAR pygmies, Zaire pygmies, and the Lisongo; the Caucasian branch included Northern Europeans and Northern Italians; the Pacific Islander branch included Melanesians, New Guineans and Australians; the East Asian branch included Chinese, Japanese and Cambodians; and the Native American branch included Mayans from Mexico and the Surui and Karitiana from the Amazon basin. The identical diagram has since been derived by others, using a similar or greater number of microsatellite markers and individuals [8,9]. More recently, a survey of 3,899 SNPs in 313 genes based on US populations (Caucasians, African-Americans, Asians and Hispanics) once again provided distinct and non-overlapping clustering of the Caucasian, African-American and Asian samples: "The results confirmed the integrity of the self-described ancestry of these individuals". Hispanics, who represent a recently admixed group of Native American, Caucasian and African ancestry, did not form a distinct subgroup, but clustered variously with the other groups.
A previous cluster analysis based on a much smaller number of SNPs led to a similar conclusion: "A tree relating 144 individuals from 12 human groups of Africa, Asia, Europe and Oceania, inferred from an average of 75 DNA polymorphisms/individual, is remarkable in that most individuals cluster with other members of their regional group". Effectively, these population genetic studies have recapitulated the classical definition of races based on continental ancestry - namely African, Caucasian (Europe and Middle East), Asian, Pacific Islander (for example, Australian, New Guinean and Melanesian), and Native American.
Populations that exist at the boundaries of these continental divisions are sometimes the most difficult to categorize simply. For example, east African groups, such as Ethiopians and Somalis, have great genetic resemblance to Caucasians and are clearly intermediate between sub-Saharan Africans and Caucasians. The existence of such intermediate groups should not, however, overshadow the fact that the greatest genetic structure that exists in the human population occurs at the racial level.
Most recently, Wilson et al. studied 354 individuals from 8 populations deriving from Africa (Bantus, Afro-Caribbeans and Ethiopians), Europe/Mideast (Norwegians, Ashkenazi Jews and Armenians), Asia (Chinese) and Pacific Islands (Papua New Guineans). Their study was based on cluster analysis using 39 microsatellite loci. Consistent with previous studies, they obtained evidence of four clusters representing the major continental (racial) divisions described above as African, Caucasian, Asian, and Pacific Islander. The one population in their analysis that was seemingly not clearly classified on continental grounds was the Ethiopians, who clustered more into the Caucasian group. But it is known that African populations with close contact with Middle East populations, including Ethiopians and North Africans, have had significant admixture from Middle Eastern (Caucasian) groups, and are thus more closely related to Caucasians. Furthermore, the analysis by Wilson et al. did not detect subgroups within the four major racial clusters (for example, it did not separate the Norwegians, Ashkenazi Jews and Armenians within the Caucasian cluster), despite known genetic differences among them. The reason is clearly that these differences are not as great as those between races and are insufficient, with the amount of data provided, to distinguish these subgroups.
Are racial differences merely cosmetic?
Two arguments against racial categorization as defined above are firstly that race has no biological basis [1,3], and secondly that there are racial differences but they are merely cosmetic, reflecting superficial characteristics such as skin color and facial features that involve a very small number of genetic loci that were selected historically; these superficial differences do not reflect any additional genetic distinctiveness. A response to the first of these points depends on the definition of 'biological'. If biological is defined as genetic then, as detailed above, a decade or more of population genetics research has documented genetic, and therefore biological, differentiation among the races. This conclusion was most recently reinforced by the analysis of Wilson et al. If biological is defined by susceptibility to, and natural history of, a chronic disease, then again numerous studies over the past decades have documented biological differences among the races. In this context, it is difficult to imagine that such differences are not meaningful. Indeed, it is difficult to conceive of a definition of 'biological' that does not lead to racial differentiation, except perhaps one as extreme as speciation.
A forceful presentation of the second point - that racial differences are merely cosmetic - was given recently in an editorial in the New England Journal of Medicine: "Such research mistakenly assumes an inherent biological difference between black-skinned and white-skinned people. It falls into error by attributing a complex physiological or clinical phenomenon to arbitrary aspects of external appearance. It is implausible that the few genes that account for such outward characteristics could be meaningfully linked to multigenic diseases such as diabetes mellitus or to the intricacies of the therapeutic effect of a drug." The logical flaw in this argument is the assumption that the blacks and whites in the referenced study differ only in skin pigment. Racial categorizations have never been based on skin pigment, but on indigenous continent of origin. For example, none of the population genetic studies cited above, including the study of Wilson et al., used skin pigment of the study subjects, or genetic loci related to skin pigment, as predictive variables. Yet the various racial groups were easily distinguishable on the basis of even a modest number of random genetic markers; furthermore, the categorization is extremely robust to the type of markers used (for example, RFLPs, microsatellites or SNPs).
Genetic differentiation among the races has also led to some variation in pigmentation across races, but considerable variation within races remains, and there is substantial overlap for this feature. For example, it would be difficult to distinguish most Caucasians and Asians on the basis of skin pigment alone, yet they are easily distinguished by genetic markers. The author of the above statement is in error to assume that the only genetic differences between races, which may differ on average in pigmentation, are for the genes that determine pigmentation.