Although these estimates of national IQ are claimed to be "highly valid" (Rushton, 2003, p. 368) or "credible" (McDaniel, 2008, p. 732) by some authors, the work by Lynn (and Vanhanen) has also drawn criticism (Barnett & Williams, 2004; Ervik, 2003; Hunt & Carlson, 2007; Hunt & Sternberg, 2006; Lane, 1994). One point of critique is that Lynn (and Vanhanen)'s estimate of average IQ among Africans is primarily based on convenience samples, and not on samples carefully selected to be representative of a given, targeted, population (Barnett & Williams, 2004; Hunt & Sternberg, 2006). Unfortunately, in many developing countries, such representative samples are lacking (McDaniel, 2008).
A literature review is necessarily selective. Despite Lynn's objective of providing a "fully comprehensive review of the evidence" (Lynn, 2006, p. 2), a sizeable portion of the relevant literature was not considered in either his own review or his reviews with Vanhanen. Nowhere in their reviews did Lynn (and Vanhanen) specify the details of their literature search. Our own searches in library databases resulted in additional relevant studies that may be used to estimate national IQ. For instance, Lynn and Vanhanen (2006) accorded a national IQ of 69 to Nigeria on the basis of three samples (Fahrmeier, 1975; Ferron, 1965; Wober, 1969), but they did not consider other relevant published studies that indicated that average IQ in Nigeria is considerably higher than 70 (Maqsud, 1980a,b; Nenty & Dinero, 1981; Okunrotifa, 1976). As Lynn rightly remarked during the 2006 conference of the International Society for Intelligence Research (ISIR), performing a literature review involves making a lot of choices. Nonetheless, an important drawback of Lynn (and Vanhanen)'s reviews of the literature is that they are unsystematic. Unsystematic literature reviews do not adhere to systematic methodology to control for potential biases in the many choices made by the reviewer (Cooper, 1998; Light & Pillemer, 1984). Lynn (and Vanhanen) failed to explicate the inclusion and exclusion criteria they employed in their choice of studies. Such criteria act as a filter, and may thus affect the estimate of national IQ. Lynn (and Vanhanen) excluded data from several sources without providing a rationale. For instance, they used IQ data from Ferron (1965), who provided averages in seven samples of children from Sierra Leone and Nigeria on a little-known IQ test called the Leone. For reasons not given, Lynn (2006) and Lynn and Vanhanen (2006) only used data from the two lowest scoring samples from Nigeria.
Most of the remaining samples show higher scores, but those samples were not included in the estimation of the national IQ of Nigeria and Sierra Leone. Likewise, Lynn (and Vanhanen) did not consider several relatively high-scoring African samples from South Africa (Crawford-Nutt, 1976; Pons, 1974). It is unfortunate that Lynn (and Vanhanen) did not discuss their exclusion criteria. In some cases (Crawford-Nutt, 1976; Pons, 1974), the Raven's Progressive Matrices was administered with additional instruction. Although this instruction is quite similar to an instruction as described in the test manual (Raven, Court, & Raven, 1996), some have argued that this instruction artificially enhances test performance (cf. Rushton & Skuy, 2000). Given the likely differences in opinion on which samples to include or exclude in a review, inclusion and exclusion criteria should be explicated clearly and employed consistently. It is well known that unsystematic literature reviews may lead to biased results (Cooper, 1998; Light & Pillemer, 1984). Another problem is that the computation of statistics in literature reviews is quite error-prone. Indeed, Lynn's work contains several errors (Loehlin, 2007).
Lynn responded (with Meisenberg), attempting to defend his work, and Wicherts et al. fired back immediately with an even stronger rejoinder, repeating their earlier criticism of his methodology and flatly accusing him of cherry-picking data that support his position while ignoring the rest:
In this rejoinder, we criticize Lynn and Meisenberg's (this issue) methods to estimate the average IQ (in terms of British norms after correction of the Flynn Effect) of the Black population of sub-Saharan Africa. We argue that their review of the literature is unsystematic, as it involves the inconsistent use of rules to determine the representativeness and hence selection of samples. Employing independent raters, we determined of each sample whether it was (1) considered representative by the original authors, (2) drawn randomly, (3) based on an explicated stratification scheme, (4) composed of healthy test-takers, and (5) considered by the original authors as normal in terms of Socio-Economic Status (SES). We show that the use of these alternative inclusion criteria would not have affected our results. We found that Lynn and Meisenberg's assessment of the samples' representativeness is not associated with any of the objective sampling characteristics, but rather with the average IQ in the sample. This suggests that Lynn and Meisenberg excluded samples of Africans who average IQs above 75 because they deemed these samples unrepresentative on the basis of the samples' relatively high IQs. We conclude that Lynn and Meisenberg's unsystematic methods are questionable and their results untrustworthy.
Then, in a later paper, Wicherts et al. dug even deeper, finding that, in addition to picking and choosing samples, Lynn relied on data that were neither reliable nor representative:
The samples, considered by Lynn (and Vanhanen), but discarded here, are given in the Appendix. Besides the two samples described above (Klingelhofer, 1967; Zindi, 1994), these are Wober's (1969) sample of factory workers, and Verhaegen's (1956) sample of uneducated adults from a primitive tribe in the then Belgian Congo in the 1950s. Verhaegen indicated that the SPM test format was rather confusing to the test-takers, and that the test did not meet the standards of valid measurement. In Wober's study, the reliability and validity were too low (Wober, 1975). In three of the samples in Table 1, the average IQ is below 70. These are Owen's large sample of Black South African school children tested in the 1980s, the 17 Black South Africans carefully selected for their illiteracy by Sonke (2001), and a group of uneducated Ethiopian Jewish children, who lived isolated from the western world in Ethiopia and immigrated to Israel in the 1980s (Kaniel & Fisherman, 1991). The last two samples cannot be considered to be representative.
[...]
Our review of the literature on the performance of Africans on the Raven's tests showed that the average IQ of Africans on the Raven's tests is lower than the average IQ in western countries. However, the average IQ of Africans is not as low as Lynn (and Vanhanen) and Malloy (2008) maintained. The majority of studies on IQ test performance of Africans not taken into account by Lynn (and Vanhanen) and Malloy showed considerably higher average IQs than the studies that they did review. We judge the reviews of Lynn (and Vanhanen) and Malloy to be unsystematic. These authors missed a large part of the literature on IQ testing in Africa, failed to explicate their inclusion and exclusion criteria, and made downward errors in the conversion of raw scores to IQs (Wicherts, 2007). Lynn (and Vanhanen)'s estimate of average IQ of Africans of around 67 is untenable. Our review indicates that it is about 78 (UK norms) or 80 (US norms). These means are somewhat lower than the means of Africans on other IQ tests, which lie around 82 (Wicherts et al., 2010). These results undermine evolutionary theories of race differences in intelligence of Lynn (2006), Rushton (2000), and Kanazawa (2004) (Wicherts, Borsboom, & Dolan, 2010a; Wicherts et al., 2010b).
Lynn responded to that as well, accusing Wicherts et al. of deriving their higher estimate of average African IQ from elite samples, but Wicherts et al. once again showed that his lower estimate results from the unsystematic use of samples that are neither random nor representative.