Out of Africa: what skin color tells us about human evolution

girl and dna

Editor's Introduction

Loci associated with skin pigmentation identified in African populations

Annotated by
Allison Mayle, Beth Ruedi

Examining the genetic diversity underlying differences in skin pigmentation can reveal a great deal about human evolution and population migration patterns. Africa is rich in genetic diversity for many traits; and, in fact, human skin pigmentation is far more diverse across African populations than anywhere else in the world. Despite this, researchers had not explored the genetic architecture underlying skin pigmentation differences in African populations until now. What will this rich diversity tell us about the biology of skin pigmentation, the patterns of human migration, and our evolutionary history?

Paper Details

Supporting Media Partner

Original title
Loci associated with skin pigmentation identified in African populations
Nick Crawford Sarah Tishkoff
Original publication date
Vol. 358, Issue 6365, eaan8433
Issue name


Despite the wide range of skin pigmentation in humans, little is known about its genetic basis in global populations. Examining ethnically diverse African genomes, we identify variants in or near SLC24A5MFSD12DDB1TMEM138OCA2 and HERC2 that are significantly associated with skin pigmentation. Genetic evidence indicates that the light pigmentation variant at SLC24A5 was introduced into East Africa by gene flow from non-Africans. At all other loci, variants associated with dark pigmentation in Africans are identical by descent in southern Asian and Australo-Melanesian populations. Functional analyses indicate that MFSD12 encodes a lysosomal protein that affects melanogenesis in zebrafish and mice, and that mutations in melanocyte-specific regulatory regions near DDB1/TMEM138 correlate with expression of UV response genes under selection in Eurasians.


Variation in epidermal pigmentation is a striking feature of modern humans. Human pigmentation is correlated with geographic and environmental variation (Fig. 1). Populations at lower latitudes have darker pigmentation than populations at higher latitudes, suggesting that skin pigmentation is an adaptation to differing levels of ultraviolet radiation (UVR) (1). Because equatorial regions receive more UVR than temperate regions, populations from these regions (including sub-Saharan Africans, South Asians, and Australo-Melanesians) have darker pigmentation (Fig. 1), which likely mitigates the negative impact of high UVR exposure such as skin cancer and folate degradation (1). In contrast, the synthesis of vitamin D3 in response to UVR, needed to prevent rickets, may drive selection for light pigmentation at high latitudes (1).

Fig 1

Fig. 1. Correlations between allele frequencies at genes associated with pigmentation and UV exposure in global populations. (A) Global variation in skin pigmentation indicated by Melanin index (MI). This data was integrated with MI data for global populations from (1102). (B) Mean erythemal dose rate. (C) Manhattan plot of -log10 transformed p-values from GWAS of skin pigmentation with the Illumina Omni5 SNP array. (D) Quantile-Quantile plot of observed vs expected p-values from the GWAS. In both (C) and (D) significant SNPs at p < 5 × 10−8 are highlighted in purple. (E to L) Allele frequencies of genetic variants associated with skin pigmentation in global populations. African populations included are 1. Ethiopia Nilosaharan, 2. Ethiopia Omotic, 3. Ethiopia and Tanzania Cushitic, 4. Ethiopia Semitic, 5. Tanzania Nilosaharan, 6. Tanzania Hadza, 7. Tanzania Sandawe, 8. Botswana Bantu 9. Botswana San/Bantu admixed, and 10. Botswana San. The Melanesian (MEL) samples are from (12) and the Australian Aboriginal and Papua New Guinean samples (merged) are from the SGDP (PNG) (13). All other populations are from the TGP (10). Non-Aboriginal populations in the Americans are indicated: CEU (European ancestry), ASW (African-American Southwest US) and ACB (African Caribbean in Barbados).

Panels A and B

Panel A shows a map with the melanin index (level of skin pigmentation) indicated for populations in different parts of the world, based on data from previous publications. Panel B is a map of the erythemal dose rate in different parts of the world. Erythemal dose is a measure of how quickly the sun will cause fair skin to temporarily turn red. Darker colors on the map indicate a higher erythemal dose rate.

Genome-wide Association Study (GWAS)

Genome-wide association studies, like this one, compare variation in a phenotype (in this case, skin pigmentation) with variation in DNA across the whole set of DNA for that organism (genome) in order to determine significant associations between the two. This helps pinpoint which regions across the genome most likely impact the trait of interest.

The authors here examined differences in DNA across the genome using a genotyping array; in this case, Illumina Infinium arrays, which are explained here:

These arrays reveal differences in DNA bases referred to as single nucleotide polymorphisms (SNPs).

For each person included in this study, the authors had a melanin index (see panel A), and the whole set of SNPs across their genome. The authors analyzed the data set looking for correlations between melanin level and SNP variant. These data are plotted in a Manhattan plot in panel C. First, each correlation between melanin level and SNP variant is assigned a P-value—in other words, the probability that the correlation being observed happened by chance. A lower P-value for a SNP indicates a greater chance that the variant is truly associated with skin pigmentation.

It’s hard to easily visualize “low value” as “high significance.” So, when producing a Manhattan plot, researchers take the -log10 of the P-values for each SNP that has been characterized across the genome and plot the -log10(P-value) instead. A significant P-value will have a higher -log10(P-value), so the spots corresponding to that SNP will fall above the baseline. When there are variants with high significance across the genome, the plot sometimes looks like a set of skyscrapers, hence the name Manhattan plot.

Quantile-Quantile plot (QQ plot)

Panel D shows a Quantile-Quantile (QQ) plot of observed vs. expected (by random chance) P-values from the GWAS analysis. A QQ plot is used to compare the shapes of distributions. If the two distributions are identical, the plot will show a 45° line of Y=X. Because this plot compares observed vs. expected P-values, if there are any points that deviate from that 45° line that indicates that those SNPs are not distributed as expected and are likely associated with the phenotype.

Panels E-L

Each panel is a map showing distribution of variants for a locus where specific genetic variants were associated with differences in skin pigmentation. Within a panel, the frequency of the SNP variants, or alleles, for that locus are displayed across several human populations. Each population is represented by a circle. There were 10 African populations included on these maps, with frequencies calculated from the current study. The other populations’ frequencies were calculated using previously published data, and included: CEU = Americans of European ancestry, ASW = African American Southwest U.S., ACB = African Caribbean in Barbados, MEL = Melanesian, and PNG = merged Australian Aboriginal and Papua New Guinean.

The basal layer of human skin contains melanocytes, specialized pigment cells that harbor subcellular organelles called melanosomes in which melanin pigments are synthesized and stored and then transferred to keratinocytes (2). Melanosome morphology and content differs between melanocytes that synthesize mainly eumelanins (black-brown pigments) or pheomelanins (pigments which range from yellow to reddish-brown) (3). Variation in skin pigmentation is due to the type and quantity of melanins generated, melanosome size, and the manner in which keratinocytes sequester and degrade melanins (4).

While over 350 pigmentation genes have been identified in animal models, only a subset of these genes have been linked to normal variation in humans (5). Of these, there is limited knowledge about loci that affect pigmentation in populations with African ancestry (67).

Skin pigmentation is highly variable within Africa

To identify genes affecting skin pigmentation in Africa, we used a DSM II ColorMeter to quantify light reflectance from the inner arm as a proxy for melanin levels in 2,092 ethnically and genetically diverse Africans living in Ethiopia, Tanzania, and Botswana (table S1 and figs. S1 and S2) (8). Skin pigmentation levels vary extensively among Africans, with darkest pigmentation observed in Nilo-Saharan speaking pastoralist populations in Eastern Africa and lightest pigmentation observed in San hunter-gatherer populations from southern Africa (Fig. 2 and table S1).


Fig. 2. Melanin distributions. Histograms of melanin index computed from under-arm measurements with a DSM II ColorMeter (68) for all individuals in each population as described in (67). Skin tones were visualized by displaying the mean red, green, and blue values from the ColorMeter for individuals binned by melanin index.

Major question

The authors wanted to identify genetic variants that affect skin pigmentation in African populations, so they first needed a large number of individuals across populations with different skin pigmentation to compare.


The authors traveled across African countries and used a handheld device to measure the pigmentation of skin on the underside of the upper arm. Pigmentation is measured by calculating the absorption of an array of different light wavelengths, then combining those absorption measurements into a single melanin index.

Variation in skin pigmentation

African populations have an incredibly diverse range of skin pigmentations, with a range of melanin indices displayed in this graph between 50 and >125.

Here’s a map of global distribution of human skin color. Though it uses a different method for indexing, it is obvious that Africa has the richest diversity in skin pigmentation.

Barsh Fig

Figure from: Barsh GS. What Controls Variation in Human Skin Color? PLoS Biology. 2003;1(1):e27. doi:10.1371/journal.pbio.0000027.

A locus associated with light skin color in Europeans is common in East Africa

We genotyped 1,570 African individuals with quantified pigmentation levels using the Illumina Infinium Omni5 Genotyping array. After quality control, we retained ~4.2 million biallelic single nucleotide polymorphisms (SNPs) for analysis. A GWAS analysis with linear mixed models, controlling for age, sex, and genetic relatedness (9), identified four regions with multiple significant associations (p-value < 5 × 10−8) (Fig. 1,fig. S3, and tables S2 and S3).

We then performed fine-mapping using local imputation of high coverage sequencing data from a subset of 135 individuals and data from the Thousand Genomes Project (TGP) (10) (Fig. 3 and table S3). We ranked potential causal variants within each locus using CAVIAR, a fine-mapping method that accounts for linkage disequilibrium (LD) and effect sizes (11) (Table 1). We characterized global patterns of variation at these loci using whole genome sequences from West African, Eurasian and Australo-Melanesian populations (101213).


Fig. 3. Genomic context of GWAS loci. Plot of -log10(p-value) versus genomic position for variants near the four regions with most strongly associated SNPs from GWAS, including annotations for genes, MITF ChIP-seq data for melanocytes (45), a CTCF ChIP-seq track for NHEK keratinocytes, and H3K27ac, DNase hypersensitive sites (DHS), and chromHMM tracks for melanocytes and keratinocytes from the Roadmap Epigenomics dataset (29). Genome-wide significant variants are highlighted in red. Circles, squares, and triangles denote noncoding, synonymous, and non-synonymous variants, respectively. (A) SLC24A5 locus, (B) MFSD12 locus, (C) DDB1/TMEM138 locus, and (D) OCA2/HERC2 locus.

GWAS variants

The top of each panel is a plot of -log10(P-value) (see Figure 1 tabs for explanation of this statistic) for each variant previously identified to be potentially causal. Red variants are those that are significant in the context of the whole genome. Circles are noncoding variants (variants that occur in regions of the genome that do not encode a protein. These can be in regulatory regions). Squares are synonymous variants (variants that do not change the amino acid sequence of the protein encoded by the gene). Triangles are nonsynonymous variants (variants that do change the amino acid of the protein encoded by the gene).

ChIP-seq tracks

The authors wanted to explore if any of the associated variants may have a regulatory function. One method for exploring this is ChIP-seq, which analyzes the interaction between proteins and chromatin. It combines chromatin immunoprecipitation (ChIP) with DNA sequencing (seq) to identify protein binding sites, which are probable regulatory regions.

MITF is melanogenesis associated transcription factor, green peaks indicate where MITF is bound to DNA in melanocytes. CTCF is a protein that binds together DNA strands to form loops in chromatin and bring regulatory elements closer to genes. Maroon peaks indicate where CTCF is bound in NHEKs (normal human epidermal keratinocytes).

Data from Roadmap Epigenomics

The authors also used previously published data from the Roadmap Epigenomics Project at the National Institutes of Health to explore the epigenetic context for their variants. Epigenetics refers to changes in gene regulation or expression that are not related to changes in the DNA sequence. Epigenomics refers to epigenetic profiles across the whole genome. Many times, epigenetic differences can be inherited, although environment also plays a role in gene regulation and epigenetic profiles.

H3K27ac: Acetylation at the 27th lysine residue of the histone H3 protein, a marker associated with active transcription. Orange peaks indicate this modification occurs at this location.

DNaseI Hypersensitivity Sites (DHS): Regions of DNA that are accessible to the DNaseI enzyme, which indicates that the region would also be available for binding by transcription factors.

ChromHMM: A combination of different chromatin markers that provides an overview of the state that chromatin is in at each locus. Red and pink indicate promoters, yellow indicates enhancers, green indicates transcription, and gray indicates silenced chromatin.

Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET): A method to determine previously undiscovered long-range chromatin interactions genome-wide. When regulatory regions are connected to genes physically, as indicated by the boxes and lines in the ChIA-pet plot, they may also functionally control expression of the gene.

Table 1

Table 1. Annotations of candidate causal SNPs from GWAS. Top candidate causal variants for the four regions identified based on analysis with CAVIAR (11). For each variant, the genomic position (Location), RSID, and Ancestral>Derived alleles are shown, with the allele associated with dark pigmentation in bold. Beta and standard error (Beta(SE)) and the p-values from the GWAS (F test, linear mixed model) are given. For functional genomic data, nearest genes are given and variants overlapping DHS sites for melanocytes (E059) (DHS melanocytes) and/or other cell types (DHS other) available from Roadmap Epigenomics are indicated with X (2945). Variants intersecting regions with enhancer activity confirmed by luciferase expression assays are indicated with X (Luciferase Activity) as are chromatin interactions with nearby genes measured in MCF7 or K562 cell lines as identified by ChIA-PET (Chromatin Interactions) (4647). Those variants where luciferase activity was tested were labeled with Y (significant enhancer activity) or N (no enhancer activity) (fig. S7). SNPs that are in strong LD (r2 > 0.7 in East Africans) are numerically labeled in the column titled LD Block.

The SNPs with strongest association with skin color in Africans were on chromosome 15 at or near the Solute Carrier Family 24 Member 5 (SLC24A5) gene (Figs. 1 and 3 and tables S2 and S3). A functional non-synonymous mutation within SLC24A5 (rs1426654) (14) was significantly associated with skin color (F-test, p-value = 5.5 × 10−62) and was identified as potentially causal by CAVIAR (Table 1). The rs1426654 (A) allele is at high frequency in European, Pakistani, and Indian populations (Fig. 1) and is a target of selection in Europeans, Central Asians and North Indians (1517). In Africa this variant is common (28-50% frequency) in populations from Ethiopia and Tanzania with high Afroasiatic ancestry (1819) and is at moderate frequency (5-11%) in San and Bantu populations from Botswana with low levels of East African ancestry and recent European admixture (2021) (Fig. 1 and figs. S2 and S4). We observe a signature consistent with positive selection at SLC24A5 in Europeans on the basis of extreme values of Tajima’s D statistic (fig. S5).

Video. Author Sarah Tishkoff explains the FST statistic that is used to examine genetic differentiation (differences) between populations. FST is used frequently in this paper.

Based on coalescent analysis with sequence data from the Simons Genomic Diversity Project (SGDP) (13), the time to most recent common ancestor (TMRCA) of most Eurasian lineages containing the rs1426654 (A) allele is 29 kya (95% credible interval (CI) 28-31 kya), consistent with prior studies (616) (Fig. 4). Haplotype analysis indicates that the rs1426654 (A) variant in Africans is on the same extended haplotype background as Europeans (Fig. 5 and fig. S6), likely reflecting gene flow from western Eurasia over at least the past 3–9 ky (22). The rs1426654 (A) variant is at high frequency (28%) in Tanzanian populations, suggesting a lower bound (~5 kya) for introduction of this allele into East Africa, the time of earliest migration from Ethiopia into Tanzania (23). Further, the frequency of the rs1426654 (A) variant in eastern and southern Africans exceeds the inferred proportion of non-African ancestry (figs. S2 and S4). Estimates of genetic differentiation (FST) at the rs1426654 SNP between the West African Yoruba (YRI) and Ethiopian Amhara populations is 0.76, among the top 0.01% of values on chromosome 15 (table S4). These results are consistent with selection for the rs1426654 (A) allele in African populations following introduction, although complex models of demographic history cannot be ruled out.


Fig. 4. Coalescent Trees and TMRCA dating. Inferred genealogies for regions flanking candidate causal loci. Each leaf corresponds to a single sampled chromosome from one of 278 individuals in the Simons Genome Diversity Project (13). Leaf nodes are colored by the population of origin of the individual and sequences carrying the light allele are indicated with an open dot. Node heights and 95% credible intervals are presented for a subset of internal nodes. Gene genealogies are shown for regions flanking (ASLC24A5, rs1426654 (15:48426484), (BMFSD12, rs10424065 (19:3545022), (CMFSD12, rs6510760 (19:3565253), (DTMEM138, rs7948623 (11:61137147), (E)DDB1, rs11230664 (11:61076372), (FOCA2, rs1800404 (15:28235773), (GHERC2, rs4932620 (15:28514281), and (HHERC2, rs6497271 (15:28365431).

Major question

What are the genealogies (lines of descent) of the variants that have been identified as potentially causing differences in skin pigmentation, and what are their times to most recent common ancestor (TMRCAs)?

Coalescent trees

The middle of each tree has a node with the TMRCA labeled (CI 95% range in parentheses). The “leaf” on the outside indicates one data point from an individual who was part of the Simons Genome Diversity Project (a project that sequenced over 100 individuals from diverse backgrounds), and the color of the leaf indicates the origin of that individual. Open circles next to the leaf indicate that this allele is associated with lighter skin pigmentation.

The point where two branches meet indicates a coalescence event, or the point where two alleles merged. This is most likely when the derived variant first occurred. Under each tree is the name of the gene containing the variant of interest, the position in the gene that the variant is located, and in parenthesis the chromosome the gene is located on followed by the chromosomal location of the variant.

Evolution of skin pigmentation

Light and dark alleles at loci identified in this study appear to have been segregating in the hominin lineage for many thousands of years. Analysis of the SNPs identified in this paper indicate that for half of these loci, light pigmentation was the ancestral state. In fact, the sequenced Neandertal and Denisovan genomes, some of humans’ closest extinct relatives, have the ancestral variant for those SNPs. This supports the previous hypothesis that darker pigmentation is a derived trait that was introduced in the Homo genus after human ancestors lost most of their thick body hair.

Many of these skin pigmentation alleles arose before the origin of modern humans. What might that mean about the skin pigmentation of other Homo species?


Fig. 5. Haplotype networks at SLC24A5MFSD12DDB1/TMEM138, and OCA2/HERC2. Median joining haplotype networks of regions containing candidate causal variants. Connections between circles indicate genetic relatedness while size is relative to the frequency of haplotypes. Ancestry proportions are displayed as pie charts. Yellow/red subfigures indicate which haplotypes contain the allele associated with dark pigmentation (red) or light pigmentation (yellow). (A) 75kb region flanking the causal variant at SLC24A5. (B and C) 3kb regions flanking rs10424065 in MFSD12 and rs6510760 upstream of MFSD12. (D) 195 kb region flanking DDB1 extending from PGA5 through SDHAF2. (EF, and G) 50 kb flanking regions 1, 3, and 2 at OCA2 and HERC2 (ordered based on highest to lowest probability of being causal from CAVIAR analysis).


DNA variants can be inherited together more frequently than we would expect by chance; this is known as linkage disequilibrium. This phenomenon is useful for helping us understand population mixing, ancestry, and evolutionary history.

Haplotype analysis uses sets of variants in linkage disequilibrium across the genome to investigate a range of factors that have led to evolution (ex: natural selection, genetic drift, population bottlenecks, inbreeding) and to help put newly discovered genetic variants (like the SNPs from this paper) in context. Humans have distinct sets of "haplotype blocks," meaning groups of nonoverlapping loci in strong linkage disequilibrium. That has been extremely helpful in human genetic analyses. Specific haplotypes are also referred to as haploid genotypes.

Haplotype networks

Haplotype networks represent the relationships between the haploid genotypes within a data set. The size of each circle represents the number of people who have that haplotype. The colors in the pie charts represent the population of origin of each person with that haplotype.

The subset yellow and red haplotype network shows the allele associated with light pigmentation in yellow and the allele associated with dark pigmentation in red. The top of each panel lists the gene name and the distance that the haplotype being analyzed extends.

Remaining questions

The authors have now established that there are some genetic variants that are significantly associated with skin pigmentation in African populations. They have identified possible genealogies for these alleles (Figure 4), and determined their haplotypes (this figure).

What would you want to know next? If a genetic variant is associated with differences in a trait, does that mean it’s causal?

The next question is whether these variants can be functionally verified.

A lysosomal transporter protein associated with skin pigmentation

The region with the second strongest genetic association with skin pigmentation contains the Major Facilitator Superfamily Domain Containing 12 (MFSD12) gene on chromosome 19 (Figs. 1and 3 and tables S2 and S3). MFSD12 is homologous to other genes containing MFS domains, conserved throughout vertebrates, which function as transmembrane solute transporters (24). MFSD12 mRNA levels are low in de-pigmented skin of vitiligo patients (25), likely due to autoimmune related destruction of melanocytes.

The MFSD12 locus is in a region with extensive recombination, enabling us to fine-map eight potentially causal SNPs (Table 1 and table S3) that cluster in two regions: one within MFSD12and the other ~7,600–9,000 bp upstream of MFSD12 (Fig. 3). Many SNPs are in predicted regulatory regions active in melanocytes and/or keratinocytes (Table 1 and Fig. 3) and show enhancer activity in luciferase expression assays in a WM88 melanoma cell line (Table 1, table S5, and fig. S7). Within MFSD12, the two SNPs that CAVIAR identifies as having highest probability of being causal are rs56203814 (F-test, p-value = 3.6 × 10−18), a synonymous variant within exon 9, and rs10424065 (F-test, p-value = 5.1 × 10−20), located within intron 8. They are 130 bp apart, in strong LD, and impact gene expression in luciferase expression assays (1.5 – 2.7 X higher expression than the minimal promoter; fig. S7). The SNPs upstream of MFSD12 with highest probability of being causal are rs112332856 (F-test, p-value = 3.8 × 10−16) and rs6510760 (F-test, p-value = 6.5 × 10−15). They are 346 bp apart, in strong LD, and impact gene expression in luciferase expression assays (4.0 – 19.7 X higher expression than the minimal promoter; fig. S7).

The derived rs56203814 and rs10424065 (T) alleles associated with dark pigmentation are present only in African populations (or those of recent African descent) and are most common in East African populations with Nilo-Saharan ancestry (Fig. 1 and fig. S4). Coalescent analysis of the SGDP dataset indicates that the rs10424065 (T) allele predates the 300 kya origin of modern humans (26) (estimated TMRCA of 612 kya, 95% CI 515-736 kya) (Fig. 4).

At rs6510760 and rs112332856, the ancestral (G) and (T) alleles, respectively, associated with light pigmentation, are nearly fixed in Europeans and East Asians and are common in San as well as Ethiopian and Tanzanian populations with Afroasiatic ancestry (Fig. 1 and fig. S4). The derived rs6510760 (A) and rs112332856 (C) alleles (associated with dark pigmentation) are common in all sub-Saharan Africans except the San, as well as in South Asian and Australo-Melanesian populations (Fig. 1 and fig. S4). Haplotype analysis places the rs6510760 (A) allele (and linked rs112332856 (C) allele) in Australo-Melanesians on similar haplotype backgrounds relative to central and eastern Africans (Fig. 5 and fig. S6), suggesting they are identical by descent from an ancestral African population. Coalescent analysis of the SGDP dataset indicates that the TMRCA for the derived rs6510760 (A) allele is 996 kya (95% CI 0.82–1.2 mya; Fig. 4).

We do not detect evidence for positive selection at MFSD12 using Tajima’s D and iHS statistics [figs. S5 and S8, as expected if selection were ancient (27)]. However, levels of genetic differentiation are elevated when comparing East African Nilo-Saharan and western European (CEU) populations (e.g., FST=0.85 for rs112332856, top 0.05% on chromosome 19), consistent with differential selection at this locus (28) (table S4).

MFSD12 is within a cluster of 10 genes with high expression levels in primary human melanocytes relative to primary human keratinocytes (29), with MFSD12 the most differentially expressed (90X; table S6). The genomic region (chr19:3541782-3581062) encompassing MFSD12 and neighboring gene HMG20B (a transcription factor common in melanocytes) has numerous DNase I hypersensitive sites and is enriched for H3K27ac enhancer marks in melanocytes (top 0.1% genome-wide; Fig. 3), suggesting this region may regulate expression of genes critical to melanocyte function (30).

Analyses of gene expression using RNA-sequencing data from 106 primary melanocyte cultures (table S7), indicates that African ancestry is significantly correlated with decreased MFSD12gene expression (Pearson Correlation Coefficient (PCC) p-value = 5.0 × 10−2; fig. S9). We observed significant associations between genotypes at rs6510760 and rs112332856 with expression of HMG20B (Bonferroni adjusted p-value (Padj) < 4.9 × 10−3) and MFSD12 (Padj < 3.4 × 10−2) (fig. S9). In each case, the alleles associated with dark pigmentation correlate with decreased gene expression. Allele-specific expression (ASE) analysis indicates that individuals heterozygous for either rs6510760 or rs112332856 show increased allelic imbalance, relative to homozygotes, for MFSD12 (Mann-Whitney-Wilcoxon (MWW) test, p-value = 4.9 × 10−3 and 1.3 × 10−2, respectively), consistent with regulation of gene expression in cis. A haplotype containing the rs6510760(A)/rs112332856(C) variants associated with dark pigmentation showed 4.9 times lower expression in luciferase assays than the haplotype containing rs6510760(G)/rs112332856(T) variants associated with light pigmentation [Kruskal-Wallis Rank Sum (KWRS) Test, p-value=7.7 × 10−7; fig. S7 and table S5]. We did not have power to detect an association between expression of MFSD12 and rs56203814 or rs10424065 due to low frequency (~2%) of the alleles associated with dark pigmentation in the primary melanocyte cultures.

MFSD12 suppresses eumelanin biogenesis in melanocytes from lysosomes

We silenced expression of the mouse ortholog of MFSD12 (Mfsd12) using small hairpin RNAs (shRNAs) in immortalized melan-Ink4a mouse melanocytes derived from C57BL/6J-Ink4a−/−mice (31) which almost exclusively make eumelanin (Fig. 6). Reduction of Mfsd12 mRNA by ~80% with two distinct lentivirally encoded shRNAs (Fig. 6A) caused a 30-50% increase in melanin content compared to control cells (Fig. 6B), and a higher percentage of melanosomes per total cell area in most cells compared to cells transduced with non-target shRNA (Fig. 6, C and D). A fraction of MFSD12-depleted cells harbored large clumps of melanin in autophagosome-like structures (fig. S10). These data suggest that MFSD12 suppresses eumelanin content in melanocytes and may offset autophagy.


Fig. 6. MFSD12 suppresses eumelanin production but localizes to lysosomes. Immortalized melan-Ink4a melanocytes expressing non-target (sh NT) shRNA or either of two shRNA plasmid clones (#1 and 2) targeting Mfsd12 were analyzed for Mfsd12 mRNA content by (A) qRT-PCR, (B) melanin content by spectrophotometry, or (C) percent of cell area containing melanin by bright field microscopy; (D) shows quantification. Data in (A) to (C) represent mean ± s.e.m., normalized to sh NT samples, from three separate experiments; in (C), n(sh NT) = 97 cells, n (shMfsd12 #1) = 68 cells, n (shMfsd12 #2) = 71 cells. Bar in (C), 10 μm. (E to G) Melan-ink4a melanocytes transiently expressing MFSD12-HA (E) or not transfected (F) and (G) were fixed, immunolabeled for HA (E) and for LAMP2 to mark lysosomes E) and (G) or for TYRP1 to mark melanosomes (G), and analyzed by immunofluorescence and bright field microscopy. Bright field (melanin) images show pigmented melanosomes (pseudocolored red in the merged images). Insets, 4x magnification of boxed regions. Arrows, MFSD12-containing structures that overlap LAMP2 (E) or TYRP1-containing structures that overlap melanosomes; arrowheads, structures that do not overlap. Bars, 10 μm. (H) Quantification of overlap for structures labeled by MFSD12, TYRP1, LAMP2 and pigment. Data represent mean ± s.e.m. from three independent experiments, n = 17 cells (MFSD12 overlap with LAMP2 and melanin), 33 cells (TYRP1 overlap with melanin), or 23 cells (LAMP2 and melanin).

Major question

What is the functional characterization of Mfsd12 in human melanocytes (cells that produce melanin)?

Knockdown of Mfsd12

The authors used short hairpin RNA (shRNA)—an artificial RNA molecule that can silence expression of targeted genes using RNA interference—to reduce expression of Mfsd12 in human melanocyte cell lines. This is also known as “knockdown.”

There are three different cell lines here: sh NT is the control, where shRNA was introduced but did not target a specific gene; shMfsd12 #1 and #2 are two different lines with two different shRNAs that both targeted Mfsd12.

Gene expression

Panel A shows the results of qRT-PCR (quantiative reverse transcription polymerase chain reaction), a method to quantify RNA extracted the sh NT, shMfsd12 #1, and shMfsd12 #2 cell lines.

Both shMfsd12 samples have significantly less Mfsd12 RNA than the control NT sample. This means that the two experimental cell lines are expressing less of the Mfsd12 gene, indicating that the knockdown was successful.

Quantification of melanin

The authors knew that their knockdown lines were expressing less Mfsd12 RNA. However, did that have an impact on pigmentation?

Panel B shows the quantification of melanin via spectrophotometry. This is a way to quantify how much of something—in this case, melanin—there is in a sample by detecting the amount of light that gets absorbed by the sample.

Panel C shows another method for quantification: bright-field microscopy. This is a simple form of microscopy in which light is shone on the sample from below, and the sample is observed from above, allowing the observer to quantify transmitted light.

Panel D has a graph displaying the quantities of melanin in the three cell lines. Knockdown of Mfsd12 increased the amount of melanin in cells.

Localization of melanin

Do these cells express Mfsd12 throughout, or is the expression contained in certain areas?

The authors tagged the MFSD12 protein with human influenza hemagglutinin (HA), which can then be detected using immunofluorescent techniques. Panel E show cells with HA-tagged MFSD12; control cells without the tagged protein are in panels F and G.

Cells are fixed so that their proteins don’t degrade, then labeled with antibodies against the HA tag. The authors used two antibodies: LAMP2, a marker of lysosomes; and TYRP1, a marker of melanosomes. These antibodies have fluorescent proteins attached for visualization by immunofluorescence, a type of microscopy where a specific wavelength of light is used to excite the fluorescent proteins so they can be easily visualized.

Each panel has a bright field picture (see quantification panel), as well as immunofluorescence images. Panel E has arrows showing MFSD12-containing structures that overlap lysosomes (LAMP2). Panel F has arrows showing MFSD12-containing structures that overlap melanosomes (TYRP1). In panel G, small arrowheads show structures that don’t overlap, namely LAMP2 and melanin.

All overlaps were quantified and displayed in panel H, and the authors were surprised to find that MFSD12 did not overlap much with melanin, which is a pattern more similar to LAMP2 (lysosomes) than TYRP1 (melanosomes).


Since MFSD12 localizes to lysosomes and not to eumelanosomes, this may reflect an indirect effect through modified lysosomal function.

We assessed the localization of human MFSD12 isoform c (RefSeq NM_174983.4) tagged at the C terminus with the HA epitope (MFSD12-HA). By immunofluorescence microscopy, MFSD12-HA localized to punctate structures throughout the cell. Surprisingly, these puncta, like those labeled by the endogenous lysosomal membrane protein LAMP2, but not the melanosomal enzyme TYRP1, overlapped only weakly with pigmented melanosomes (Fig. 6, E to G; quantified in Fig. 6H). Instead, MFSD12-HA co-localized with LAMP2 (Fig. 6E, quantified in Fig. 6H), indicating that MFSD12 protein localizes to late endosomes and/or lysosomes in melanocytes and not to eumelanosomes.

MFSD12 influences pigmentation in zebrafish xanthophore pigment cells

We targeted transmembrane domain 2 (TMD2) in the highly conserved zebrafish ortholog of mfsd12a with CRISPR/Cas9 (Fig. 7). We focused on mfsd12a as its paralog mfsd12b is predicted to be a pseudogene (32). Although pigmentation was not qualitatively altered in melanophores, the cells that make eumelanin, compound heterozygotes of mfsd12a alleles exhibited reduced staining of xanthophores, the cells responsible for pteridine-based yellow pigmentation in wild type zebrafish (Fig. 7, A and B) (3334). This was not due to a failure of the xanthophores to develop in mfsd12a mutants, since GFP labeled xanthophores were robust along the lateral line in both wild-type and mfsd12a mutant zebrafish (Fig. 7, C and D). Taken together, these results suggest that MFSD12 influences xanthophore pigment production in pterinosomes.


Fig. 7. In vivo zebrafish and mouse models of MFSD12 deficiency. (A and B) Representative images of methylene blue staining in wild-type TAB5. (A) and compound heterozygous mfsd12a zebrafish (6dpf). Note the absence of stained xanthophores in the mfsd12a mutant (B). (C and D) No difference was observed in the number or distribution of xanthophores detected by mosaic Tg(aox5:PALM-GFP) expression in injected wild-type TAB5 (C) or compound heterozygous mutant mfsd12a (D) zebrafish (5dpf). (E) A wild-type agouti mouse (left) is shown with a gray Mfsd12 targeted littermate (right). (F) Hair from the Mfsd12 targeted mouse has grossly normal eumelanin (lower black region of the hair shaft), however, the upper subapical yellow band in WT (E, left) appears white in the Mfsd12 mutant (E, right) due to a reduction in pheomelanin.

Major question

Is MFSD12 function conserved across different vertebrate animals?

Mfsd12 mutation in zebrafish

Mfsd12 was altered in zebrafish using CRISPR/Cas9, a method that allows targeted gene editing. They created two different kinds of mutations with this method, and then bred fish with each mutation to produce offspring with compound heterozygotes.

The authors looked at developing zebrafish 6 days after fertilization of the eggs. The compound heterozygote fish had less pigmentation in their xanthophores (panel B), compared to the wildtype fish (panel A). Melanophores in the mutant zebrafish are pigmented similarly to the wildtype.

Using green fluorescent protein, a molecule that fluoresces naturally under a specific wavelength, the authors could see where Mfsd12 was localized in the zebrafish.

They examined zebrafish xanthophores, which are yellow pigment cells, and saw that the xanthophores were still present in the mutant fish (panel D), but didn’t have as much pigmentation as the wildtype fish (panel C).

Mfsd12 knockout in mice

The authors also used CRISPR/Cas9 to knock out—completely stop expression of a gene—Mfsd12 in mouse embryos, and compared those to wildtype mice that had agouti coat color pigmentation.

The knockout mice (on right side of panel E) had gray coats, unlike their wildtype littermates which were the expected agouti. Looking more closely at individual hairs of the wildtype and mutant mice, the authors observed that while the eumelanin in the bottom part of the fur is similar in both, the pheomelanin in the top part of the hair was white in mutants instead of yellow as seen in the wildtype mice.


The function of the Mfsd12 gene and corresponding protein is involved in pigmentation in two model vertebrate systems. This verifies that the gene is causal for this trait, and indicates conservation of function across evolutionary history.

Functional characterization of MFSD12 in mice

CRISPR/Cas9 was used to generate a Mfsd12 null allele in a wild-type mouse background (Fig. 7E and fig. S11). Four founders were observed with a uniformly gray coat color, rather than the expected agouti coat color (fig. S11, A and B). These four gray founders harbored deletions at the targeted site (fig. S11C). Microscopic observation revealed a lack of pheomelanin, resulting in white, rather than yellow, banding of hairs in Mfsd12 mutants (Fig. 7F).

The Mfsd12 knockout coat color appeared phenotypically similar to that of grizzled (gr) mice, an allele previously mapped to a syntenic ~2 Mb region overlapping Mfsd12 (35). Like our CRISPR/Cas9 Mfsd12 knockout, homozygous gr/gr mice are characterized by a gray coat resulting from dilution of yellow pheomelanin pigment from the sub-terminal agouti band of the hair shaft. Exome sequencing of an archived gr/gr DNA sample, subsequently confirmed by Sanger sequencing in an independent colony, identified a 9 bp in-frame deletion within exon2 of Mfsd12 (fig. S12) as the sole mutation affecting a coding sequence in this mapped candidate region. The deleted amino acids for the gr/gr allele, Mfsd12 p.Leu163_Ala165del, are in the cytoplasmic loop between the transmembrane domains TM4 and TM5 within a highly conserved MFS domain (fig. S13). These results indicate that mutation of Mfsd12 is responsible for the gray coat color of gr/gr mutant mice, and that loss of Mfsd12 reduces pheomelanin within the hairs of agouti mice.

Taken together, these results indicate that MFSD12 plays a conserved role in vertebrate pigmentation. Depletion of MFSD12 increases eumelanin content in a cell-autonomous manner in skin melanocytes, consistent with the lower levels of MFSD12 expression observed in melanocytes from individuals with African ancestry. Since MFSD12 localizes to lysosomes and not to eumelanosomes, this may reflect an indirect effect through modified lysosomal function. By contrast, loss of MFSD12 has the opposite effect on pheomelanin production, reflecting a more direct effect on function of pheomelanosomes, which have a distinct morphology (3), gene expression profile (36), and, like zebrafish pterinosomes, a potentially different intracellular origin from eumelanosomes (37). While disruption of MFSD12 alone accounts for changes in pigmentation, the role of neighboring loci such as HMG20B on pigmentation remains to be explored.

Skin pigmentation associated loci that play a role in UV response are targets of selection

Another genomic region associated with pigmentation encompasses a ~195 kb cluster of genes on chromosome 11 that play a role in UV response and melanoma risk including the Damage Specific DNA Binding Protein 1 (DDB1) gene (Figs. 1 and 3 and table S3). DDB1 (complexed with DDB2 and XPC) functions in DNA repair (38); levels of DDB1 are regulated by UV exposure and MC1R signaling, a regulatory pathway of pigmentation (39). DDB1 is a component of CUL4-RING E3 ubiquitin ligases that regulate several cellular and developmental processes (40); it is critical for follicle maintenance and female fertility in mammals (41) and for plastid size and fruit pigmentation in tomatoes (42). Knockouts of DDB1 orthologs are lethal in both mouse and fruit fly development (43), and DDB1 only exhibits rare (<1% frequency) non-synonymous mutations in the TGP dataset. Genetic variants near DDB1 were associated with human pigmentation in an African population with high levels of European admixture (7).

Due to extensive LD in this region, CAVIAR identified 33 SNPs predicted to be causal (Table 1). The most strongly associated SNPs are located in a region conserved across vertebrates flanked by TMEM138 and TMEM216 (44) ~36-44 kb upstream of DDB1, and are in high LD within this cluster (r2>0.7 in East Africans) (Fig. 3Table 1, and table S3). Among these, the most significantly associated SNP is rs7948623 (F-test, p-value = 2.2 × 10−11), located 172 bp downstream of TMEM138, which shows enhancer activity in WM88 melanoma cells (91.9 – 140.8 X higher than the minimal promoter; fig. S7 and table S5).

A second group of tightly linked SNPs (LD r2 > 0.7 in East Africans) with predicted high probability of containing causal variants spans a ~195 kb region encompassing DDB1 and TMEM138 (Table 1 and Fig. 3). Two SNPs that tag this LD block are rs1377457 (F-test, p-value = 1.5× 10−9), located ~7600 bp downstream of TMEM138, and rs148172827 (F-test, p-value = 1.8 × 10−9), an insertion/deletion polymorphism at TKFC (Triokinase And FMN Cyclase) located in an enhancer active in WM88 melanoma cells (67.6 – 76.2 X higher than the minimal promoter; fig. S7 and table S5) which overlaps a MITF binding site in melanocytes (2945); both SNPs interact with the promoters of DDB1 and neighboring genes in MCF-7 cells (4647) (Table 1 and Fig. 3). SNPs within introns of DDB1 (rs12289370, rs7934735, rs11230664, rs12275843, and rs7120594) also tag this LD block (Table 1 and Fig. 3).

RNA-Seq data from 106 primary melanocyte cultures indicate that African ancestry is significantly correlated with increased DDB1 gene expression (PCC, p-value = 2.6 × 10−5; fig. S9). Association tests using a permutation approach indicated that, of the 35 protein-coding genes with a transcription start site within 1Mb of rs7948623, expression of DDB1 is most strongly associated with a SNP in an intron of DDB1, rs7120594, at marginal statistical significance after correction for ancestry and multiple testing (Padj = 0.06; fig. S9). The allele associated with dark pigmentation at rs7120594 correlates with increased DDB1 expression. We did not have power to detect an association between expression of DDB1 and SNPs in LD with rs7948623 due to low minor allele frequencies (~2%). The role of DDB1 and neighboring loci on human pigmentation remains to be further explored.

The derived rs7948623 (T) allele near TMEM138 (associated with dark pigmentation) is most common in East African Nilo-Saharan populations and is at moderate to high frequency in South Asian and Australo-Melanesian populations (Fig. 1 and fig. S4). At SNP rs11230664, within DDB1, the ancestral (C) allele (associated with dark pigmentation) is common in all sub-Saharan African populations, having the highest frequency in East African Nilo-Saharan, Hadza, and San populations (88-96%), and is at moderate to high frequency in South Asian and Australo-Melanesian populations (12 – 66%) (Fig. 1 and fig. S4). The derived (T) allele (associated with light pigmentation) is nearly fixed in European, East Asian, and Native American populations.

In South Asians and Australo-Melanesians, the alleles associated with darker pigmentation reside on closely related, or identical, haplotypes to those observed in Africa (Fig. 5 and fig. S6), suggesting that they are identical by descent. The TMRCAs for the derived dark allele at rs7948623 and the derived light allele at rs11230664 are estimated to be older than 600 kya and 250 kya, respectively (Fig. 4).

Consistent with a selective sweep, we see an excess of rare alleles (and extreme negative Tajima’s D values) and high levels of homozygosity extending ~350-550 kb in Europeans and Asians, respectively (figs. S5 and S14). We observe extreme negative Tajima’s D values in East African Nilo-Saharans and San over a shorter distance (115 kb and 100 kb, respectively) (fig. S5). A haplotype extending greater than 195 kb is common in Eurasians and rare in Africans (Fig. 5) and tags the alleles associated with light skin pigmentation. The TMRCA of a large number of haplotypes carrying the rs7948623 (A) allele in non-Africans, associated with light pigmentation, is 60 kya (95% CI: 58–62 kya), close to the inferred time of the migration of modern humans out of Africa (4849) (Fig. 4). These results, combined with large FST values between Africans and Europeans at SNPs tagging the extended haplotype near DDB1 (e.g., FST = 0.98 between Nilo-Saharans and CEU at rs7948623, within the top 0.01% of values on chromosome 11, table S4) are consistent with differential selection of alleles associated with light and dark pigmentation in Africans and non-Africans at this locus.

Identification of variation at OCA2 and HERC2 impacting skin pigmentation

Another region of significantly associated SNPs encompasses the OCA2 and HERC2 loci on chromosome 15 (Fig. 3 and table S3). HERC2 was identified in GWAS for eye, hair, and skin pigmentation traits (675052). The oculocutaneous albinism II gene (OCA2, formerly called the P gene) encodes a 12-transmembrane domain-containing chloride transporter protein and affects pigmentation by modulating melanosomal pH (53). The most common types of albinism in Africans are caused by mutations in OCA2 (54).

Due to extensive LD in the OCA2 and HERC2 region, CAVIAR predicted 10 potentially causal SNPs (Table 1) that cluster within three regions. We order these clusters on the basis of physical distance; region 1 is located within OCA2, and regions 2 and 3 are located within introns of HERC2 (Fig. 3).

The SNP with highest probability of being causal from CAVIAR analysis is rs1800404 (F-test, p-value = 1.0), a synonymous variant located in region 1 within exon 10 of OCA2 (Fig. 3Table 1, and table S3). The ancestral rs1800404 (C) allele, associated with dark pigmentation, is common in most Africans as well as southern/eastern Asians and Australo-Melanesians, whereas the derived (T) allele, associated with light pigmentation, is most common (frequency >70%) in Europeans and San (Fig. 1 and fig. S4). Haplotype (Fig. 5) and coalescent analyses (Fig. 4 and fig. S6) show two divergent clades, one enriched for the rs1800404 (C) allele and the other for the rs1800404 (T) allele. Coalescent analysis indicates that the TMRCA of all lineages is 1.7 mya (95% CI: 1.5–2.0 mya) and the TMRCA of lineages containing the derived (T) allele is 629 kya (95% CI 426–848 kya) (Fig. 4). The deep coalescence of lineages, and the positive Tajima’s Dvalues in this region in both African and non-African populations (fig. S5), is consistent with balancing selection acting at this locus.

The SNP with highest probability of being causal in region 3 is rs4932620 (F-test, p-value = 3.2 × 10−9) located within intron 11 of HERC2 (Fig. 3Table 1, and table S3). This SNP is 917 bp from rs916977, a SNP associated with blue-eye color in Europeans (5556), and is in strong LD (r2 = 1.0 in most East African populations) with SNPs extending into region 2 of HERC2 (Table 1). The derived rs4932620 (T) allele associated with dark skin pigmentation is most common in Ethiopian populations with high levels of Nilo-Saharan ancestry and is at moderate frequency in other Ethiopian, Hadza, and Tanzania Nilo-Saharan populations (Fig. 1 and fig. S4). Haplotype analysis indicates that the rs4932620 (T) allele in South Asians and Australo-Melanesians is on the same or similar haplotype background as in Africans (Fig. 5 and fig. S6), suggesting it is identical by descent. The TMRCA of haplotypes containing the rs4932620 (T) allele is 247 kya (95% CI: 158–345 kya) (Fig. 4).

We also observe an LD block of SNPs within HERC2 that are associated with skin pigmentation independently of the SNPs described above, though they do not reach genome-wide significance (table S3). These are in a region with enhancer activity in Europeans (50). For example, SNP rs6497271 (F-test, p-value = 1.8 × 10−6), which is located 437 bp from SNP rs12913832, has been associated with skin color in Europeans (50) and is in a consensus SOX2 motif (a transcription factor which modulates levels of MITF in melanocytes) (57) (Fig. 3). The ancestral rs6497271 (A) allele associated with dark pigmentation is on haplotypes in South Asians and Australo-Melanesians similar or identical to those in Africans (Fig. 5 and fig. S6), suggesting they are identical by descent. The derived (G) allele associated with light skin pigmentation is most common in Europeans and San and dates to 921 kya (CI: 700 kya-1.3 mya) (Figs. 1 and 4 and figs. S4 and S6). SNPs associated with pigmentation at all three regions show high allelic differentiation when comparing East African Nilo-Saharans and CEU (FST = 0.72 - 0.85, top 0.5% on chromosome 15) (table S4).

Analyses of RNA-Seq data from 106 primary melanocyte cultures indicates that African ancestry is significantly correlated with increased OCA2 gene expression (PCC, Padj = 6.1 × 10−7) (fig. S9). A permutation approach identified significant associations between OCA2 expression and SNPs within an LD block tagged by rs4932620 extending across regions 2 and 3 (Padj = 2.2 × 10−2). Alleles in this LD block associated with dark pigmentation correlate with increased OCA2expression. We did not observe associations between the candidate causal variants in region 1 and OCA2 expression despite a high minor allele frequency (34%). However, we observe a significant association between a haplotype tagged by rs1800404 and alternative splicing resulting in inclusion/exclusion of exon 10 (linear regression t test, p-value = 9.1 × 10−40). Exon 10 encodes the amino acids encompassing the third transmembrane domain of OCA2 and is the location of several albinism-associated OCA2 mutations (5859), raising the possibility that the shorter transcript encodes a non-functional channel. Comparing splice junction usage across individuals, we estimate that each additional copy of the light rs1800404 (T) allele reduces inclusion of exon 10 by ~20% (95% CI 17.9-21.5%; fig. S9). Homozygotes for the light rs1800404 (T) allele are, therefore, expected to produce ~60% functional OCA2 protein (compared to individuals with albinism who produce no functional OCA2 protein).

Skin pigmentation is a complex trait

To estimate the proportion of pigmentation variance explained by the top eight candidate SNPs at SLC24A5MFSD12DDB1/TMEM138 and OCA2/HERC2, we used a linear mixed model with two genetic random effect terms, one based on the genome-wide kinship matrix, and the other based on the kinship matrix derived from the set of significant variants. Approximately 28.9% (S.E. 10.6%) of the pigmentation variance is attributable to these SNPs. Considering each locus in turn and all significantly associated variants (p-value < 5 × 10−8), the trait variation attributable to each locus is: SLC24A5 (12.8%, S.E. 3.5%), MFSD12 (4.5%, S.E. 2.1%), DDB1/TMEM138 (2.2%, S.E. 1.5%), and OCA2/HERC2 (3.9%, S.E. 2.9%). Thus ~29% of the additive heritability of skin pigmentation in Africans is due to variation at these four regions. This observation indicates that the genetic architecture of skin pigmentation is simpler (i.e., fewer genes of stronger effect) than other complex traits, such as height (60). Additionally, most candidate causal variants are in non-coding regions, indicating the importance of regulatory variants influencing skin pigmentation phenotypes.

Evolution of skin pigmentation in modern humans

Skin pigmentation is highly variable within Africa. Populations such as the San from southern Africa are almost as lightly pigmented as Asians (1), while the East African Nilo-Saharan populations are the most darkly pigmented in the world (Fig. 1). Most alleles associated with light and dark pigmentation in our dataset are estimated to have originated prior to the origin of modern humans ~300 ky ago (26). In contrast to the lack of variation at MC1R, which is under purifying selection in Africa (61), our results indicate that both light and dark alleles at MFSD12DDB1OCA2, and HERC2 have been segregating in the hominin lineage for hundreds of thousands of years (Fig. 4). Further, the ancestral allele is associated with light pigmentation in approximately half of the predicted causal SNPs; Neanderthal and Denisovan genome sequences, which diverged from modern human sequences 804 kya (62), contain the ancestral allele at all loci. These observations are consistent with the hypothesis that darker pigmentation is a derived trait that originated in the genus Homo within the past ~2 million years after human ancestors lost most of their protective body hair, though these ancestral hominins may have been moderately, rather than darkly, pigmented (6364). Moreover, it appears that both light and dark pigmentation has continued to evolve over hominid history.

Individuals from South Asia and Australo-Melanesia share variants associated with dark pigmentation at MFSD12DDB1/TMEM138OCA2 and HERC2 that are identical by descent from Africans. This raises the possibility that other phenotypes shared between Africans and some South Asian and Australo-Melanesian populations may also be due to genetic variants identical by descent from African populations rather than convergent evolution (65). This observation is consistent with a proposed southern migration route out of Africa ~80 kya (66). Alternatively, it is possible that light and dark pigmentation alleles segregated in a single African source population (1348) and that alleles associated with dark pigmentation were maintained outside of Africa only in the South Asian and Australo-Melanesian populations due to selection.

By studying ethnically, genetically, and phenotypically diverse Africans, we identify novel pigmentation loci that are not highly polymorphic in other populations. Interestingly, the loci identified in this study appear to affect multiple phenotypes. For example, DDB1influences pigmentation (42), cellular response to the mutagenic effect of UVR (39) and female fertility (41). Thus, some of the pigmentation-associated variants identified here may be maintained due to pleiotropic effects on other aspects of human physiology.

It is important to note that genetic variants that do not reach genome-wide significance in our study might also impact the pigmentation phenotype. Indeed, the 1000 most strongly associated SNPs exhibit enrichment for genes involved in pigmentation and melanocyte physiology in the mouse phenotype database and in ion transport and pyrimidine metabolism in humans (table S8). Future research in larger numbers of ethnically diverse Africans may reveal additional loci associated with skin pigmentation and will further shed light on the evolutionary history, and adaptive significance, of skin pigmentation in humans.

Materials and Methods

Individuals in the study were sampled from Ethiopia, Tanzania and Botswana. Written informed consent was obtained from all participants, and research/ethics approval and permits were obtained from all relevant institutions (67). To measure skin pigmentation we used a DSM II ColorMeter to quantify reflectance from the inner under arm. Red reflectance values were converted to a standard melanin index score (68). DNA was extracted from whole blood using a salting out procedure (PureGen).

A total of 1,570 samples were genotyped on the Illumina Omni5M SNP array (5M dataset) that includes ~4.5 million SNPs. Genotypes were clustered and called in Genome Studio software. Variant positions are reported in hg19/37 coordinates. The overall completion rate was 98.8%. Each individual's sex was verified based on X chromosome inbreeding coefficients. We used Beagle 4.0 (69) to phase the Illumina 5M SNP array data merged with SNPs from the TGP dataset that were filtered to exclude related individuals.

High coverage (>30 X) Illumina Sequencing was performed on a subset of the genotyped individuals (N=135). Variants were called following the approach described in (13). Adapter sequences were trimmed with trimadap. Reads were aligned using bwa mem to the human reference sequence build 37 (hg19). After alignment we marked duplicate reads prior to calling variants with GATK HaplotypeCaller (70). To select high quality variants we employed a two-set filtering strategy. First, we used the GATK variant quality score recalibration to score variant sites. We used TGP, OMIM, and our curated genotypes from the Illumina Omni 5M SNP array as training data. After recalibration we discarded sites with the lowest scores. In addition, we discarded sites in low-complexity regions listed in (71) and duplicate regions identified with Delly(72).

We performed local imputation around each of the regions showing significant associations with skin pigmentation from GWAS using the Illumina Omni 5M SNP dataset. We extracted array genotypes within 1Mb (500 kb upstream and 500 kb downstream) of top GWAS variants from each region and phased them using SHAPEIT2 (7374). The reference panel came from two datasets: filtered variants from the 135 African genomes and TGP (10). After phasing, imputation was performed using Minimac3 (75). Imputation performed very well at most loci (R2 > 0.91 with MAF ≥ 0.05) (table S3).

To identify SNPs associated with pigmentation, GWAS was performed first on the Illumina Omni 5M SNP dataset, and independently with imputed variants at candidate regions, using linear mixed models implemented in EMMAX software (9). Age and sex were included as covariates, and we corrected for genetic relatedness with an IBS kinship matrix. We used CAVIAR to identify variants in the imputed dataset most likely to be causal (11). Ontology enrichment for genes near the top 1000 most strongly associated variants from the 5M dataset was obtained using the annotation tool GREAT (76).

We estimated the contribution to the variation in melanin index from the top candidate causal variants with a restricted maximum likelihood (REML) analysis implemented in the Genome-wide Complex Trait Analysis (GCTA) software (77). The variance parameters for two genetic relationship matrices (GRMs) are estimated: one GRM is constructed (78) from genome-wide background variants with MAF>0.01, and one GRM is constructed from the set of 8 top pigmentation-associated variants. The contribution of each locus to the melanin index variation is estimated similarly, using all genome-wide significant (p-value <5 × 10−8) variants within each locus to construct the pigmentation-associated GRM (table S3). REML iterations are based on maximizing the Average Information matrix.

To test for neutrality in the regions flanking our top GWAS variants we calculated Tajima’s D, FST, and extended haplotype homozygosity using iHS (7981). Tajima’s D was measured along chromosomes 11 and 15 using 50 kb sliding windows. Due to a high recombination rate observed near the MFSD12 locus, we used 10 kb windows in that region (chromosome 19). Vcftools was used to calculate both Tajima’s D and FST (82). To calculate extended haplotype homozygosity (iHS) we used Selscan (83). Unstandardized iHS scores were normalized within 100kb bins according to the frequency of the derived allele. We then identified the signals of positive selection by calculating the proportion of SNPs with |iHS| >2 in non-overlapping windows (80). To identify outlier windows we calculated 5th and 95th percentiles. Population differentiation was assessed with Weir and Cockerham's fixation index FST (79) between each pair of populations. Outliers were identified using empirical p-values.

Median joining haplotype networks (84) for the Illumina Omni 5M SNP dataset were constructed and visualized at genomic regions of interest using NETWORK (85). In addition, we constructed genealogies of regions flanking candidate causal SNPs using a hierarchical clustering approach with sequence data from the Simons Genome Diversity Project (13). Briefly, we considered a single copy of each chromosome from each of the 279 individuals from (13). We inferred recombination breakpoints within a symmetric window surrounding each locus using the program kwarg (86) and identified the longest shared haplotype between each pair of sequences in which no recombination events occurred. We then computed the expected coalescence time between each pair of sequences, conditional on the observed number of mutations in the non-recombining region. Genealogies were constructed by applying the WPGMA hierarchical clustering algorithm to the estimated pairwise coalescence times. Our estimator accounts for recombination events and the population size history. However, simulation studies indicate that accounting for time-varying population size has relatively little effect on our estimates when the size changes according to previously inferred histories for human populations (6787). Because the true population sizes and relationships among the populations we considered are complex and imprecisely known, we assumed a constant population size of N = 104 in our analyses. The robustness analysis presented in (67) describes how our time estimates would change under different demographic histories and selective pressures.

To identify candidate causal GWAS variants altering gene expression we visualized and intersected variants with chromHMM tracks (88), DNase-1 hypersensitivity peaks, H3K27ac signal tracks for keratinocytes and melanocytes (29), CTCF signal tracks from keratinocytes (89), and ChIP-seq signal tracks from MITF (45). Variants were intersected with chromatin annotations using bedtools (90). Functional consequences of variants were also assessed using deepSEA (91) and deltaSVM (92). The effect of genetic variants on transcription factor binding was predicted using the MEME suite (93) for all transcription factors in the JASPAR 2016 CORE Vertebrate motif set (94).

To test for associations between gene expression, genetic variation, and ancestry we used eQTL and allele-specific expression (ASE) analyses on transcriptomes and genotype data from primary cultures of human melanocytes, isolated from foreskin of 106 individuals of assorted ancestries. All 106 individuals were genotyped on Illumina OmniExpress arrays and genotypes were subsequently imputed using the Michigan Imputation Server (75) based on the TGP reference panel and using SHAPEIT for phasing (73). RNA sequencing was performed to a mean depth of 87 million reads per sample. STAR (95) was used for aligning reads, and RSEM (96) was used to quantify the gene expression. Quantile normalization was applied in all samples to get the final RSEM value. To account for hidden factors driving expression variabilities, a set of covariates were further identified using the PEER method (97) and applied to calculate the normalized expression matrices. Principal components analysis was performed using genotypic data to capture population structure and ancestry using the struct.pca module of GLU. Using the normalized expression values and principal components, Pearson correlation between gene expression levels and ancestry was calculated, and associations between GWAS variant genotypes and gene expression levels were evaluated using ordinary least squares regression.

To identify associations between our GWAS candidate causal variants and expression of nearby genes (using the 106 melanocyte transcriptomes), we first found all protein-coding genes with transcription start site (TSS) within 1 Mb of the top GWAS variant for each locus and RSEM values greater than 0.5 in the primary melanocyte cultures. Pearson correlation was used to measure the association between ancestry and gene expression.

For each locus, we tested whether any genes with a transcription start site within 1Mb of the top SNP had an eQTL amongst the set of pigmentation QTLs using an additive linear model with the first two principal components of ancestry as covariates. To identify significant variant-gene associations we used a permutation approach for each locus independently (67). This was repeated for each gene and focal variants across all genes were adjusted for multiple testing using the Bonferroni correction (98).

We also carried out allele-specific expression (ASE) analyses for each significant eQTL SNP. Sites with at least 30 mapped reads, <5% mapping bias, and ENCODE 125 bp mappability score >=1 were retained for further analysis. For genes with a heterozygous coding variant amongst the melanocyte transcriptomes, allelic expression (AE) was computed as AE=|0.5 - NA/(NANR)|, where NR is the number of reads carrying the reference allele and NA is the number of reads containing the alternative allele. For each GWAS variant, differences in gene AE between GWAS heterozygotes and homozygotes was evaluated by Wilcoxon rank-sum test. This was repeated for all possible genes and GWAS variants and Bonferroni-corrected p-values less than 0.05 were considered significant. For several genes, including HMG20B and DDB1, ASE could not be measured for some or all variants of interest because no individuals were heterozygous for both the test-variant and a coding variant.

For OCA2 we tested for an association between inclusion rates of exon 10, which contains our top candidate causal variant in the region, rs1800404, and individual genotypes at rs1800404. For each melanocyte transcriptome, reads spanning the exon9-exon10 and exon9-exon11 junctions were extracted from the splice-junction files output by STAR. For each individual, a percent spliced in (PSI) value was calculated. To estimate the effect of variation at rs1800404 on exon 10 inclusion, ordinary least squares regression was carried out between PSI and dosage of the alternative allele for rs1800404 across individuals. A two-sided t test was used to calculate a p-value.

To test the functional impact of a subset of GWAS variants on gene expression, predicted regulatory sequences containing variants were cloned into a pGL4.23 firefly luciferase reporter vector. Vectors were transfected into a WM88 melanocytic melanoma cell line and co-transfected with renilla luciferase control vector (pRL-CMV) in a dual luciferase assay. Relative luciferase activity (firefly/renilla luminescence ratio) is presented as fold change compared to cells transfected with the empty pGL4.23 vector. Data were analyzed with a modified Kruskal-Wallis Rank Sum test and pairwise comparisons between groups were performed using the Conover method. P values were corrected for multiple comparisons with the Benjamini-Hochberg method using the R package PMCMR, and p-values less than 0.05 were considered significant.

We characterized the function of MFSD12 in vitro in immortalized melanocytes and in vivo in both zebrafish and mice. Immortalized melan-Ink4a melanocytes from C57BL/6 Ink4a-Arf1−/−mice were cultured as described (31). To deplete MFSD12, cells were infected with recombinant lentiviruses – generated by transient transfection in HEK293T cells – to express Mfsd12-targeted shRNAs or non-target controls. Cells resistant to puromycin (also encoded by the lentiviruses) were analyzed 5-7 days after infection. Knockdown efficiency of Mfsd12 mRNA in cells expressing Mfsd12-specific shRNAs relative to non-target shRNA was quantified by reverse transcription/ quantitative PCR (detected with SYBR Green; Applied Biosystems) relative to Tubb4b (encoding β-tubulin) as a reference gene. Melanin content in cell lysates was determined by a spectrophotometric assay as described (99), and melanin coverage in intact cells was determined by bright field microscopy and analysis using the “Analyze Particles” plug-in in ImageJ (National Institutes of Health). To analyze MFSD12-HA localization, melan-Ink4a melanocytes were transiently transfected with MFSD12-HA expression plasmids and analyzed 48 hours later by bright field and immunofluorescence microscopy as described (100) using the TA99 monoclonal antibody to TYRP1 (American Type Culture Collection) to detect melanosomes, rabbit anti-LAMP2A (Abcam) to detect lysosomes, and rat anti-HA (Roche) to detect the transgene. Percent signal overlap in the cell periphery was determined from background-subtracted, thresholded, binary images using the “Analyze Particles” plug-in in ImageJ. Statistical significance was determined using unpaired, two-tailed student’s t tests: p < 0.05: *, p < 0.01: **, p < 0.001: ***, p < 0.0001: ****. Details are provided in (67).

Zebrafish mutagenesis using CRISPR/Cas9 was performed to target mfsd12a. Compound heterozygous mutant fish for analysis were generated from F1 incrosses of mutant founder fish. For methylene blue staining, embryos were collected following fertilization and placed in zebrafish system water containing 0.0002% methylene blue and analyzed at 6 dpf. For GFP analysis, embryos were injected with 25 pg Tg(aox5:PALM-GFP) and 80 pg tol2 mRNA and GFP expression was evaluated in mosaic injected fish at 5 dpf. Larvae were anesthetized in sub-lethal 1x tricane solution and placed in 100ul of a low melt agarose solution (0.8%).

In mice two targets for CRISPR/Cas9 cleavage were selected within Exon 2 of Mfsd12 to generate a 134bp deletion resulting in a null allele of Mfsd12. A mixture of Cas9 mRNA (TriLink BioTechnologies) and each of the two synthesized gRNAs was used for pronuclear injection into C57BL/6J x FVB/N F1 hybrid zygotes. Mutation carrying mice were viable and presented with grey coat color distinct from littermates. Hairs were plucked from postnatal day 18 mice and individual awl hairs were mounted in permount and imaged with a stereomicroscope (Zeiss SteREO Discovery.V12) at the base of the sub apical yellow band where the switch from eumelanin to pheomelanin is visible.

To characterize Mfsd12 in grizzled mice, Illumina generated whole genome sequences of grizzled, JIGR/DN (gr/gr) reads were mapped using bwa mem to GRCm38/mm10 (available at SRA Accession SRR5571237). Sequence variants between JIGR/DN gr/gr and C57BL/6J reference genome within the gr/gr candidate region were identified using SnpEff (101). Validation of a 12 bp deletion within Mfsd12 was performed using samples from an independently maintained gr/gr colony provided by the laboratory of Dr. Margit Burmeister.

Supplementary Materials


Materials and Methods

Figs. S1 to S14

Tables S1 to S8

NISC Comparative Sequencing Program Collaborator List

References (103130)

References and Notes

  1. N. G. Jablonski, G. Chaplin, The evolution of human skin coloration. J. Hum. Evol. 39, 57–106 (2000). doi:10.1006/jhev.2000.0403pmid:10896812

  2. M. S. Marks, M. C. Seabra, The melanosome: Membrane dynamics in black and white. Nat. Rev. Mol. Cell Biol. 2, 738–748 (2001). doi:10.1038/35096009pmid:11584301

  3. F. H. Mover, Genetic variations in the fine structure and ontogeny of mouse melanin granules. Am. Zool. 6, 43–66 (1966). doi:10.1093/icb/6.1.43pmid:5902512

  4. J. P. Ebanks, A. Koshoffer, R. R. Wickett, S. Schwemberger, G. Babcock, T. Hakozaki, R. E. Boissy, Epidermal keratinocytes from light vs. dark skin exhibit differential degradation of melanosomes. J. Invest. Dermatol. 131, 1226–1233 (2011). doi:10.1038/jid.2011.22pmid:21326292

  5. F. Liu, M. Visser, D. L. Duffy, P. G. Hysi, L. C. Jacobs, O. Lao, K. Zhong, S. Walsh, L. Chaitanya, A.Wollstein, G. Zhu, G. W. Montgomery, A. K. Henders, M. Mangino, D. Glass, V. Bataille, R. A. Sturm, F. Rivadeneira, A. Hofman, W. F. J. van IJcken, A. G. Uitterlinden, R.-J. T. S. Palstra, T. D. Spector, N. G.Martin, T. E. C. Nijsten, M. Kayser, Genetics of skin color variation in Europeans: Genome-wide association studies with functional follow-up. Hum. Genet. 134, 823–835 (2015). doi:10.1007/s00439-015-1559-0pmid:25963972

  6. S. Beleza, N. A. Johnson, S. I. Candille, D. M. Absher, M. A. Coram, J. Lopes, J. Campos, I. I. Araújo, T. M. Anderson, B. J. Vilhjálmsson, M. Nordborg, A. Correia E Silva, M. D. Shriver, J. Rocha, G. S. Barsh, H.Tang, Genetic architecture of skin and eye color in an African-European admixed population. PLOS Genet. 9, e1003372 (2013). doi:10.1371/journal.pgen.1003372pmid:23555287.

  7. R. Lloyd-Jones, M. R. Robinson, G. Moser, J. Zeng, S. Beleza, G. S. Barsh, H. Tang, P. M. Visscher, Inference on the genetic basis of eye and skin color in an admixed population via Bayesian linear mixed models. Genetics 206, 1113–1126 (2017). doi:10.1534/genetics.116.193383pmid:28381588.

  8. A. Tishkoff, F. A. Reed, F. R. Friedlaender, C. Ehret, A. Ranciaro, A. Froment, J. B. Hirbo, A. A.Awomoyi, J.-M. Bodo, O. Doumbo, M. Ibrahim, A. T. Juma, M. J. Kotze, G. Lema, J. H. Moore, H.Mortensen, T. B. Nyambo, S. A. Omar, K. Powell, G. S. Pretorius, M. W. Smith, M. A. Thera, C. Wambebe, J. L. Weber, S. M. Williams, The genetic structure and history of Africans and African Americans. Science 324, 1035–1044 (2009). doi:10.1126/science.1172257pmid:19407144

  9. H. M. Kang, J. H. Sul, S. K. Service, N. A. Zaitlen, S. Y. Kong, N. B. Freimer, C. Sabatti, E. Eskin, Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010). doi:10.1038/ng.548pmid:20208533

  10. 1000 Genomes Project Consortium, A global reference for human genetic variation. Nature 526, 68–74 (2015). doi:10.1038/nature15393pmid:26432245

  11. F. Hormozdiari, E. Kostem, E. Y. Kang, B. Pasaniuc, E. Eskin, Identifying causal variants at loci with multiple signals of association. Genetics 198, 497–508 (2014).doi:10.1534/genetics.114.167908pmid:25104515

  12. B. Vernot, S. Tucci, J. Kelso, J. G. Schraiber, A. B. Wolf, R. M. Gittelman, M. Dannemann, S. Grote, R. C.McCoy, H. Norton, L. B. Scheinfeldt, D. A. Merriwether, G. Koki, J. S. Friedlaender, J. Wakefield, S. Pääbo, J. M. Akey, Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals. Science 352, 235–239 (2016). doi:10.1126/science.aad9416pmid:26989198

  13. S. Mallick, H. Li, M. Lipson, I. Mathieson, M. Gymrek, F. Racimo, M. Zhao, N. Chennagiri, S. Nordenfelt, A. Tandon, P. Skoglund, I. Lazaridis, S. Sankararaman, Q. Fu, N. Rohland, G. Renaud, Y. Erlich, T. Willems, C. Gallo, J. P. Spence, Y. S. Song, G. Poletti, F. Balloux, G. van Driem, P. de Knijff, I. G. Romero, A. R. Jha, D. M. Behar, C. M. Bravi, C. Capelli, T. Hervig, A. Moreno-Estrada, O. L. Posukh, E. Balanovska, O. Balanovsky, S. Karachanak-Yankova, H. Sahakyan, D. Toncheva, L. Yepiskoposyan, C. Tyler-Smith, Y. Xue, M. S.Abdullah, A. Ruiz-Linares, C. M. Beall, A. Di Rienzo, C. Jeong, E. B. Starikovskaya, E. Metspalu, J. Parik, R.Villems, B. M. Henn, U. Hodoglugil, R. Mahley, A. Sajantila, G. Stamatoyannopoulos, J. T. S. Wee, R.Khusainova, E. Khusnutdinova, S. Litvinov, G. Ayodo, D. Comas, M. F. Hammer, T. Kivisild, W. Klitz, C. A.Winkler, D. Labuda, M. Bamshad, L. B. Jorde, S. A. Tishkoff, W. S. Watkins, M. Metspalu, S. Dryomov, R. Sukernik, L. Singh, K. Thangaraj, S. Pääbo, J. Kelso, N. Patterson, D. Reich, The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016). doi:10.1038/nature18964pmid:27654912

  14. R. L. Lamason, M. A. Mohideen, J. R. Mest, A. C. Wong, H. L. Norton, M. C. Aros, M. J. Jurynec, X. Mao, V. R. Humphreville, J. E. Humbert, S. Sinha, J. L. Moore, P. Jagadeeswaran, W. Zhao, G. Ning, I.Makalowska, P. M. McKeigue, D. O’donnell, R. Kittles, E. J. Parra, N. J. Mangini, D. J. Grunwald, M. D.Shriver, V. A. Canfield, K. C. Cheng, SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science 310, 1782–1786 (2005). doi:10.1126/science.1116238pmid:16357253

  15. S. Beleza, A. M. Santos, I. Alves, E. Cameron, M. D. Shriver, E. J. Parra, J. Rocha, The timing of pigmentation lightening in Europeans. Mol. Biol. Evol. 30 24–35 (2013).doi:10.1093/molbev/mss207pmid:22923467

  16. M. Jonnalagadda, N. Bharti, Y. Patil, S. Ozarkar, S. M. K, R. Joshi, H. Norton, Identifying signatures of positive selection in pigmentation genes in two South Asian populations. Am. J. Hum. Biol. 29, e23012(2017). doi:10.1002/ajhb.23012pmid:28439965

  17. C. Basu Mallick, F. M. Iliescu, M. Möls, S. Hill, R. Tamang, G. Chaubey, R. Goto, S. Y. W. Ho, I. Gallego Romero, F. Crivellaro, G. Hudjashov, N. Rai, M. Metspalu, C. G. N. Mascie-Taylor, R. Pitchappan, L. Singh,M. Mirazon-Lahr, K. Thangaraj, R. Villems, T. Kivisild, The light skin allele of SLC24A5 in South Asians and Europeans shares identity by descent. PLOS Genet. 9, e1003912 (2013). doi:10.1371/journal.pgen.1003912pmid:24244186

  18. I. Mathieson, I. Lazaridis, N. Rohland, S. Mallick, N. Patterson, S. A. Roodenberg, E. Harney, K.Stewardson, D. Fernandes, M. Novak, K. Sirak, C. Gamba, E. R. Jones, B. Llamas, S. Dryomov, J. Pickrell, J. L. Arsuaga, J. M. B. de Castro, E. Carbonell, F. Gerritsen, A. Khokhlov, P. Kuznetsov, M. Lozano, H.Meller, O. Mochalov, V. Moiseyev, M. A. R. Guerra, J. Roodenberg, J. M. Vergès, J. Krause, A. Cooper, K. W. Alt, D. Brown, D. Anthony, C. Lalueza-Fox, W. Haak, R. Pinhasi, D. Reich, Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528, 499–503 (2015). doi:10.1038/nature16152pmid:26595274

  19. L. Pagani, T. Kivisild, A. Tarekegn, R. Ekong, C. Plaster, I. Gallego Romero, Q. Ayub, S. Q. Mehdi, M. G.Thomas, D. Luiselli, E. Bekele, N. Bradman, D. J. Balding, C. Tyler-Smith, Ethiopian genetic diversity reveals linguistic stratification and complex influences on the Ethiopian gene pool. Am. J. Hum. Genet. 91, 83–96 (2012). doi:10.1016/j.ajhg.2012.05.015pmid:22726845

  20. F. Tekola-Ayele, A. Adeyemo, G. Chen, E. Hailu, A. Aseffa, G. Davey, M. J. Newport, C. N. Rotimi, Novel genomic signals of recent selection in an Ethiopian population. Eur. J. Hum. Genet. 23, 1085–1092(2015). doi:10.1038/ejhg.2014.233pmid:25370040

  21. C. M. Schlebusch, P. Skoglund, P. Sjödin, L. M. Gattepaille, D. Hernandez, F. Jay, S. Li, M. De Jongh, A.Singleton, M. G. B. Blum, H. Soodyall, M. Jakobsson, Genomic variation in seven Khoe-San groups reveals adaptation and complex African history. Science 338, 374–379 (2012). doi:10.1126/science.1227721pmid:22997136

  22. J. K. Pickrell, N. Patterson, C. Barbieri, F. Berthold, L. Gerlach, T. Güldemann, B. Kure, S. W. Mpoloka, H.Nakagawa, C. Naumann, M. Lipson, P.-R. Loh, J. Lachance, J. Mountain, C. D. Bustamante, B. Berger, S. A.Tishkoff, B. M. Henn, M. Stoneking, D. Reich, B. Pakendorf, The genetic prehistory of southern Africa. Nat. Commun. 3, 1143–1146 (2012). doi:10.1038/ncomms2140pmid:23072811

  23. L. Pagani, S. Schiffels, D. Gurdasani, P. Danecek, A. Scally, Y. Chen, Y. Xue, M. Haber, R. Ekong, T. Oljira, E. Mekonnen, D. Luiselli, N. Bradman, E. Bekele, P. Zalloua, R. Durbin, T. Kivisild, C. Tyler-Smith, Tracing the route of modern humans out of Africa by using 225 human genome sequences from Ethiopians and Egyptians. Am. J. Hum. Genet. 96, 986–991 (2015). doi:10.1016/j.ajhg.2015.04.019pmid:26027499

  24. C. Ehret, An African Classical Age: Eastern and Southern Africa in World History, 1000 BC to AD 400(Oxford, 1998).

  25. M. G. Madej, S. Dang, N. Yan, H. R. Kaback, Evolutionary mix-and-match with MFS transporters. Proc. Natl. Acad. Sci. U.S.A. 110, 5870–5874 (2013). doi:10.1073/pnas.1303538110pmid:23530251

  26. R. Yu, R. Broady, Y. Huang, Y. Wang, J. Yu, M. Gao, M. Levings, S. Wei, S. Zhang, A. Xu, M. Su, J. Dutz, X.Zhang, Y. Zhou, Transcriptome analysis reveals markers of aberrantly activated innate immunity in vitiligo lesional and non-lesional skin. PLOS ONE 7, e51040 (2012). doi:10.1371/journal.pone.0051040pmid:23251420

  27. D. Richter, R. Grün, R. Joannes-Boyau, T. E. Steele, F. Amani, M. Rué, P. Fernandes, J.-P. Raynal, D.Geraads, A. Ben-Ncer, J.-J. Hublin, S. P. McPherron, The age of the hominin fossils from Jebel Irhoud, Morocco, and the origins of the Middle Stone Age. Nature 546, 293–296 (2017). doi:10.1038/nature22335pmid:28593967

  28. M. Przeworski, The signature of positive selection at randomly chosen loci. Genetics 160, 1179–1189(2002). pmid:11901132

  29. V. Le Corre, A. Kremer, The genetic differentiation at quantitative trait loci under local adaptation. Mol. Ecol. 21, 1548–1566 (2012). doi:10.1111/j.1365-294X.2012.05479.xpmid:22332667

  30. Roadmap Epigenomics Consortium, Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015). doi:10.1038/nature14248pmid:25693563

  31. D. Hnisz, B. J. Abraham, T. I. Lee, A. Lau, V. Saint-André, A. A. Sigova, H. A. Hoke, R. A. Young, Super-enhancers in the control of cell identity and disease. Cell 155, 934–947 (2013). doi:10.1016/j.cell.2013.09.053pmid:24119843

  32. E. V. Sviderskaya, S. P. Hill, T. J. Evans-Whipp, L. Chin, S. J. Orlow, D. J. Easty, S. C. Cheong, D. Beach, R. A. DePinho, D. C. Bennett, p16Ink4a in melanocyte senescence and differentiation. J. Natl. Cancer Inst. 94, 446–454 (2002). doi:10.1093/jnci/94.6.446pmid:11904317

  33. K. Howe, M. D. Clark, C. F. Torroja, J. Torrance, C. Berthelot, M. Muffato, J. E. Collins, S. Humphray, K.McLaren, L. Matthews, S. McLaren, I. Sealy, M. Caccamo, C. Churcher, C. Scott, J. C. Barrett, R. Koch, G.-J. Rauch, S. White, W. Chow, B. Kilian, L. T. Quintais, J. A. Guerra-Assunção, Y. Zhou, Y. Gu, J. Yen, J.-H. Vogel, T. Eyre, S. Redmond, R. Banerjee, J. Chi, B. Fu, E. Langley, S. F. Maguire, G. K. Laird, D. Lloyd, E. Kenyon, S. Donaldson, H. Sehra, J. Almeida-King, J. Loveland, S. Trevanion, M. Jones, M. Quail, D. Willey, A. Hunt, J. Burton, S. Sims, K. McLay, B. Plumb, J. Davis, C. Clee, K. Oliver, R. Clark, C. Riddle, D. Elliot, G.Threadgold, G. Harden, D. Ware, S. Begum, B. Mortimore, G. Kerry, P. Heath, B. Phillimore, A. Tracey, N. Corby, M. Dunn, C. Johnson, J. Wood, S. Clark, S. Pelan, G. Griffiths, M. Smith, R. Glithero, P. Howden, N.Barker, C. Lloyd, C. Stevens, J. Harley, K. Holt, G. Panagiotidis, J. Lovell, H. Beasley, C. Henderson, D.Gordon, K. Auger, D. Wright, J. Collins, C. Raisen, L. Dyer, K. Leung, L. Robertson, K. Ambridge, D.Leongamornlert, S. McGuire, R. Gilderthorp, C. Griffiths, D. Manthravadi, S. Nichol, G. Barker, S.Whitehead, M. Kay, J. Brown, C. Murnane, E. Gray, M. Humphries, N. Sycamore, D. Barker, D. Saunders, J.Wallis, A. Babbage, S. Hammond, M. Mashreghi-Mohammadi, L. Barr, S. Martin, P. Wray, A. Ellington, N.Matthews, M. Ellwood, R. Woodmansey, G. Clark, J. Cooper, A. Tromans, D. Grafham, C. Skuce, R.Pandian, R. Andrews, E. Harrison, A. Kimberley, J. Garnett, N. Fosker, R. Hall, P. Garner, D. Kelly, C. Bird, S. Palmer, I. Gehring, A. Berger, C. M. Dooley, Z. Ersan-Ürün, C. Eser, H. Geiger, M. Geisler, L. Karotki, A. Kirn, J. Konantz, M. Konantz, M. Oberländer, S. Rudolph-Geiger, M. Teucke, C. Lanz, G. Raddatz, K. Osoegawa, B. Zhu, A. Rapp, S. Widaa, C. Langford, F. Yang, S. C. Schuster, N. P. Carter, J. Harrow, Z. Ning, J. Herrero, S. M. Searle, A. Enright, R. Geisler, R. H. Plasterk, C. Lee, M. Westerfield, P. J. de Jong, L. I. Zon, J. H. Postlethwait, C. Nüsslein-Volhard, T. J. Hubbard, H. Roest Crollius, J. Rogers, D. L. Stemple, The zebrafish reference genome sequence and its relationship to the human genome. Nature 496, 498–503 (2013). doi:10.1038/nature12111pmid:23594743

  34. R. N. Kelsh, M. Brand, Y. J. Jiang, C. P. Heisenberg, S. Lin, P. Haffter, J. Odenthal, M. C. Mullins, F. J. van Eeden, M. Furutani-Seiki, M. Granato, M. Hammerschmidt, D. A. Kane, R. M. Warga, D. Beuchle, L.Vogelsang, C. Nüsslein-Volhard, Zebrafish pigmentation mutations and the processes of neural crest development. Development 123, 369–389 (1996). pmid:9007256

  35. S. Le Guyader, S. Jesuthasan, Analysis of xanthophore and pterinosome biogenesis in zebrafish using methylene blue and pteridine autofluorescence. Pigment Cell Res. 15, 27–31 (2002). doi:10.1034/j.1600-0749.2002.00045.xpmid:11837453

  36. J. L. Bloom, D. S. Falconer, “Grizzled,” a mutant in linkage group X of the mouse. Genet. Res. 7, 159–167 (1966). doi:10.1017/S0016672300009587

  37. T. Kobayashi, W. D. Vieira, B. Potterf, C. Sakai, G. Imokawa, V. J. Hearing, Modulation of melanogenic protein expression during the switch from eu- to pheomelanogenesis. J. Cell Sci. 108, 2301–2309(1995). pmid:7673350

  38. G. Raposo, D. Tenza, D. M. Murphy, J. F. Berson, M. S. Marks, Distinct protein sorting and localization to premelanosomes, melanosomes, and lysosomes in pigmented melanocytic cells. J. Cell Biol. 152, 809–824 (2001). doi:10.1083/jcb.152.4.809pmid:11266471

  39. G. Chu, E. Chang, Xeroderma pigmentosum group E cells lack a nuclear factor that binds to damaged DNA. Science 242, 564–567 (1988). doi:10.1126/science.3175673pmid:3175673

  40. A. L. Kadekaro, S. Leachman, R. J. Kavanagh, V. Swope, P. Cassidy, D. Supp, M. Sartor, S.Schwemberger, G. Babcock, K. Wakamatsu, S. Ito, A. Koshoffer, R. E. Boissy, P. Manga, R. A. Sturm, Z. A. Abdel-Malek, Melanocortin 1 receptor genotype: An important determinant of the damage response of melanocytes to ultraviolet radiation. FASEB J. 24, 3850–3860 (2010). doi:10.1096/fj.10-158485pmid:20519635

  41. Y. Zhang, S. Feng, F. Chen, H. Chen, J. Wang, C. McCall, Y. Xiong, X. W. Deng, Arabidopsis DDB1-CUL4 ASSOCIATED FACTOR1 forms a nuclear E3 ubiquitin ligase with DDB1 and CUL4 that is involved in multiple plant developmental processes. Plant Cell 20, 1437–1455 (2008). doi:10.1105/tpc.108.058891pmid:18552200

  42. C. Yu, Y.-L. Zhang, W.-W. Pan, X.-M. Li, Z.-W. Wang, Z.-J. Ge, J.-J. Zhou, Y. Cang, C. Tong, Q.-Y. Sun, H.-Y.Fan, CRL4 complex regulates mammalian oocyte survival and reprogramming by activation of TET proteins. Science 342, 1518–1521 (2013). doi:10.1126/science.1244587pmid:24357321

  43. M. Lieberman, O. Segev, N. Gilboa, A. Lalazar, I. Levin, The tomato homolog of the gene encoding UV-damaged DNA binding protein 1 (DDB1) underlined as the gene that causes the high pigment-1 mutant phenotype. Theor. Appl. Genet. 108, 1574–1581 (2004). doi:10.1007/s00122-004-1584-1pmid:14968305

  44. K. Takata, H. Yoshida, M. Yamaguchi, K. Sakaguchi, Drosophila damaged DNA-binding protein 1 is an essential factor for development. Genetics 168, 855–865 (2004). doi:10.1534/genetics.103.025965pmid:15514059

  45. J. H. Lee, J. L. Silhavy, J. E. Lee, L. Al-Gazali, S. Thomas, E. E. Davis, S. L. Bielas, K. J. Hill, M. Iannicelli, F. Brancati, S. B. Gabriel, C. Russ, C. V. Logan, S. M. Sharif, C. P. Bennett, M. Abe, F. Hildebrandt, B. H.Diplas, T. Attié-Bitach, N. Katsanis, A. Rajab, R. Koul, L. Sztriha, E. R. Waters, S. Ferro-Novick, C. G. Woods, C. A. Johnson, E. M. Valente, M. S. Zaki, J. G. Gleeson, Evolutionarily assembled cis-regulatory module at a human ciliopathy locus. Science 335, 966–969 (2012). doi:10.1126/science.1213506pmid:22282472

  46. P. Laurette, T. Strub, D. Koludrovic, C. Keime, S. Le Gras, H. Seberg, E. Van Otterloo, H. Imrichova, R.Siddaway, S. Aerts, R. A. Cornell, G. Mengus, I. Davidson, Transcription factor MITF and remodeller BRG1 define chromatin organisation at regulatory elements in melanoma cells. eLife 4, e06857 (2015). doi:10.7554/eLife.06857pmid:25803486

  47. G. Li, X. Ruan, R. K. Auerbach, K. S. Sandhu, M. Zheng, P. Wang, H. M. Poh, Y. Goh, J. Lim, J. Zhang, H. S. Sim, S. Q. Peh, F. H. Mulawadi, C. T. Ong, Y. L. Orlov, S. Hong, Z. Zhang, S. Landt, D. Raha, G. Euskirchen, C.-L. Wei, W. Ge, H. Wang, C. Davis, K. I. Fisher-Aylor, A. Mortazavi, M. Gerstein, T. Gingeras, B. Wold, Y.Sun, M. J. Fullwood, E. Cheung, E. Liu, W.-K. Sung, M. Snyder, Y. Ruan, Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148, 84–98 (2012). doi:10.1016/j.cell.2011.12.014pmid:22265404

  48. L. Teng, B. He, J. Wang, K. Tan, 4DGenome: A comprehensive database of chromatin interactions. Bioinformatics 31, 2560–2564 (2015). doi:10.1093/bioinformatics/btv158pmid:25788621

  49. A.-S. Malaspinas, M. C. Westaway, C. Muller, V. C. Sousa, O. Lao, I. Alves, A. Bergström, G.Athanasiadis, J. Y. Cheng, J. E. Crawford, T. H. Heupink, E. Macholdt, S. Peischl, S. Rasmussen, S.Schiffels, S. Subramanian, J. L. Wright, A. Albrechtsen, C. Barbieri, I. Dupanloup, A. Eriksson, A.Margaryan, I. Moltke, I. Pugach, T. S. Korneliussen, I. P. Levkivskyi, J. V. Moreno-Mayar, S. Ni, F. Racimo, M. Sikora, Y. Xue, F. A. Aghakhanian, N. Brucato, S. Brunak, P. F. Campos, W. Clark, S. Ellingvåg, G. Fourmile, P. Gerbault, D. Injie, G. Koki, M. Leavesley, B. Logan, A. Lynch, E. A. Matisoo-Smith, P. J.McAllister, A. J. Mentzer, M. Metspalu, A. B. Migliano, L. Murgha, M. E. Phipps, W. Pomat, D. Reynolds, F.-X. Ricaut, P. Siba, M. G. Thomas, T. Wales, C. M. Wall, S. J. Oppenheimer, C. Tyler-Smith, R. Durbin, J. Dortch, A. Manica, M. H. Schierup, R. A. Foley, M. M. Lahr, C. Bowern, J. D. Wall, T. Mailund, M. Stoneking, R. Nielsen, M. S. Sandhu, L. Excoffier, D. M. Lambert, E. Willerslev, A genomic history of Aboriginal Australia. Nature 538, 207–214 (2016). doi:10.1038/nature18299pmid:27654914

  50. M. H. Beltrame, M. A. Rubel, S. A. Tishkoff, Inferences of African evolutionary history from genomic data. Curr. Opin. Genet. Dev. 41, 159–166 (2016). doi:10.1016/j.gde.2016.10.002pmid:27810637

  51. M. Visser, M. Kayser, R.-J. Palstra, HERC2 rs12913832 modulates human pigmentation by attenuating chromatin-loop formation between a long-range enhancer and the OCA2 promoter. Genome Res. 22, 446–455 (2012). doi:10.1101/gr.128652.111pmid:22234890

  52. M. Kayser, F. Liu, A. C. J. W. Janssens, F. Rivadeneira, O. Lao, K. van Duijn, M. Vermeulen, P. Arp, M. M.Jhamai, W. F. J. van Ijcken, J. T. den Dunnen, S. Heath, D. Zelenika, D. D. G. Despriet, C. C. W. Klaver, J. R.Vingerling, P. T. V. M. de Jong, A. Hofman, Y. S. Aulchenko, A. G. Uitterlinden, B. A. Oostra, C. M. van Duijn,Three genome-wide association studies and a linkage analysis identify HERC2 as a human iris color gene. Am. J. Hum. Genet. 82, 411–423 (2008). doi:10.1016/j.ajhg.2007.10.003pmid:18252221

  53. J. Han, P. Kraft, H. Nan, Q. Guo, C. Chen, A. Qureshi, S. E. Hankinson, F. B. Hu, D. L. Duffy, Z. Z. Zhao, N. G. Martin, G. W. Montgomery, N. K. Hayward, G. Thomas, R. N. Hoover, S. Chanock, D. J. Hunter, A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation. PLOS Genet. 4, e1000074 (2008). doi:10.1371/journal.pgen.1000074pmid:18483556

  54. N. W. Bellono, I. E. Escobar, A. J. Lefkovith, M. S. Marks, E. Oancea, An intracellular anion channel critical for pigmentation. eLife 3, e04543 (2014). doi:10.7554/eLife.04543pmid:25513726

  55. M. H. Brilliant, The mouse p (pink-eyed dilution) and human P genes, oculocutaneous albinism type 2 (OCA2), and melanosomal pH. Pigment Cell Res. 14, 86–93 (2001). doi:10.1034/j.1600-0749.2001.140203.xpmid:11310796

  56. A. Rafati et al., Association of rs12913832 in the HERC2 gene affecting human iris color variation. ASJ12, 9–16 (2015).

  57. H. Eiberg, J. Troelsen, M. Nielsen, A. Mikkelsen, J. Mengel-From, K. W. Kjaer, L. Hansen, Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the HERC2 gene inhibiting OCA expression. Hum. Genet. 123, 177–187 (2008). doi:10.1007/s00439-007-0460-xpmid:18172690

  58. H. E. Seberg, E. Van Otterloo, S. K. Loftus, H. Liu, G. Bonde, R. Sompallae, D. E. Gildea, J. F. Santana, J. R. Manak, W. J. Pavan, T. Williams, R. A. Cornell, TFAP2 paralogs regulate melanocyte differentiation in parallel with MITF. PLOS Genet. 13, e1006636 (2017). doi:10.1371/journal.pgen.1006636pmid:28249010

  59. W. S. Oetting, S. S. Garrett, M. Brott, R. A. King, P gene mutations associated with oculocutaneous albinism type II (OCA2). Hum. Mutat. 25, 323 (2005). doi:10.1002/humu.9318pmid:15712365

  60. R. Kerr, G. Stevens, P. Manga, S. Salm, P. John, T. Haw, M. Ramsay, Identification of P gene mutations in individuals with oculocutaneous albinism in sub-Saharan Africa. Hum. Mutat. 15, 166–172 (2000).doi:10.1002/(SICI)1098-1004(200002)15:2<166::AID-HUMU5>3.0.CO;2-Zpmid:10649493

  61. A. R. Wood, T. Esko, J. Yang, S. Vedantam, T. H. Pers, S. Gustafsson, A. Y. Chu, K. Estrada, J. Luan, Z. Kutalik, N. Amin, M. L. Buchkovich, D. C. Croteau-Chonka, F. R. Day, Y. Duan, T. Fall, R. Fehrmann, T.Ferreira, A. U. Jackson, J. Karjalainen, K. S. Lo, A. E. Locke, R. Mägi, E. Mihailov, E. Porcu, J. C. Randall, A.Scherag, A. A. E. Vinkhuyzen, H.-J. Westra, T. W. Winkler, T. Workalemahu, J. H. Zhao, D. Absher, E. Albrecht, D. Anderson, J. Baron, M. Beekman, A. Demirkan, G. B. Ehret, B. Feenstra, M. F. Feitosa, K.Fischer, R. M. Fraser, A. Goel, J. Gong, A. E. Justice, S. Kanoni, M. E. Kleber, K. Kristiansson, U. Lim, V.Lotay, J. C. Lui, M. Mangino, I. Mateo Leach, C. Medina-Gomez, M. A. Nalls, D. R. Nyholt, C. D. Palmer, D.Pasko, S. Pechlivanis, I. Prokopenko, J. S. Ried, S. Ripke, D. Shungin, A. Stancáková, R. J. Strawbridge, Y. J. Sung, T. Tanaka, A. Teumer, S. Trompet, S. W. van der Laan, J. van Setten, J. V. Van Vliet-Ostaptchouk, Z. Wang, L. Yengo, W. Zhang, U. Afzal, J. Arnlöv, G. M. Arscott, S. Bandinelli, A. Barrett, C. Bellis, A. J.Bennett, C. Berne, M. Blüher, J. L. Bolton, Y. Böttcher, H. A. Boyd, M. Bruinenberg, B. M. Buckley, S.Buyske, I. H. Caspersen, P. S. Chines, R. Clarke, S. Claudi-Boehm, M. Cooper, E. W. Daw, P. A. De Jong, J. Deelen, G. Delgado, J. C. Denny, R. Dhonukshe-Rutten, M. Dimitriou, A. S. F. Doney, M. Dörr, N. Eklund, E.Eury, L. Folkersen, M. E. Garcia, F. Geller, V. Giedraitis, A. S. Go, H. Grallert, T. B. Grammer, J. Gräßler, H.Grönberg, L. C. P. G. M. de Groot, C. J. Groves, J. Haessler, P. Hall, T. Haller, G. Hallmans, A. Hannemann, C. A. Hartman, M. Hassinen, C. Hayward, N. L. Heard-Costa, Q. Helmer, G. Hemani, A. K. Henders, H. L.Hillege, M. A. Hlatky, W. Hoffmann, P. Hoffmann, O. Holmen, J. J. Houwing-Duistermaat, T. Illig, A. Isaacs, A. L. James, J. Jeff, B. Johansen, Å. Johansson, J. Jolley, T. Juliusdottir, J. Junttila, A. N. Kho, L.Kinnunen, N. Klopp, T. Kocher, W. Kratzer, P. Lichtner, L. Lind, J. Lindström, S. Lobbens, M. Lorentzon, Y.Lu, V. Lyssenko, P. K. E. Magnusson, A. Mahajan, M. Maillard, W. L. McArdle, C. A. McKenzie, S. McLachlan, P. J. McLaren, C. Menni, S. Merger, L. Milani, A. Moayyeri, K. L. Monda, M. A. Morken, G.Müller, M. Müller-Nurasyid, A. W. Musk, N. Narisu, M. Nauck, I. M. Nolte, M. M. Nöthen, L. Oozageer, S.Pilz, N. W. Rayner, F. Renstrom, N. R. Robertson, L. M. Rose, R. Roussel, S. Sanna, H. Scharnagl, S.Scholtens, F. R. Schumacher, H. Schunkert, R. A. Scott, J. Sehmi, T. Seufferlein, J. Shi, K. Silventoinen, J. H. Smit, A. V. Smith, J. Smolonska, A. V. Stanton, K. Stirrups, D. J. Stott, H. M. Stringham, J. Sundström, M. A. Swertz, A.-C. Syvänen, B. O. Tayo, G. Thorleifsson, J. P. Tyrer, S. van Dijk, N. M. van Schoor, N. van der Velde, D. van Heemst, F. V. A. van Oort, S. H. Vermeulen, N. Verweij, J. M. Vonk, L. L. Waite, M.Waldenberger, R. Wennauer, L. R. Wilkens, C. Willenborg, T. Wilsgaard, M. K. Wojczynski, A. Wong, A. F. Wright, Q. Zhang, D. Arveiler, S. J. L. Bakker, J. Beilby, R. N. Bergman, S. Bergmann, R. Biffar, J. Blangero, D. I. Boomsma, S. R. Bornstein, P. Bovet, P. Brambilla, M. J. Brown, H. Campbell, M. J. Caulfield, A.Chakravarti, R. Collins, F. S. Collins, D. C. Crawford, L. A. Cupples, J. Danesh, U. de Faire, H. M. den Ruijter, R. Erbel, J. Erdmann, J. G. Eriksson, M. Farrall, E. Ferrannini, J. Ferrières, I. Ford, N. G. Forouhi, T.Forrester, R. T. Gansevoort, P. V. Gejman, C. Gieger, A. Golay, O. Gottesman, V. Gudnason, U. Gyllensten, D. W. Haas, A. S. Hall, T. B. Harris, A. T. Hattersley, A. C. Heath, C. Hengstenberg, A. A. Hicks, L. A. Hindorff, A. D. Hingorani, A. Hofman, G. K. Hovingh, S. E. Humphries, S. C. Hunt, E. Hypponen, K. B. Jacobs, M.-R.Jarvelin, P. Jousilahti, A. M. Jula, J. Kaprio, J. J. P. Kastelein, M. Kayser, F. Kee, S. M. Keinanen-Kiukaanniemi, L. A. Kiemeney, J. S. Kooner, C. Kooperberg, S. Koskinen, P. Kovacs, A. T. Kraja, M. Kumari, J. Kuusisto, T. A. Lakka, C. Langenberg, L. Le Marchand, T. Lehtimäki, S. Lupoli, P. A. F. Madden, S.Männistö, P. Manunta, A. Marette, T. C. Matise, B. McKnight, T. Meitinger, F. L. Moll, G. W. Montgomery, A. D. Morris, A. P. Morris, J. C. Murray, M. Nelis, C. Ohlsson, A. J. Oldehinkel, K. K. Ong, W. H. Ouwehand, G. Pasterkamp, A. Peters, P. P. Pramstaller, J. F. Price, L. Qi, O. T. Raitakari, T. Rankinen, D. C. Rao, T. K. Rice, M. Ritchie, I. Rudan, V. Salomaa, N. J. Samani, J. Saramies, M. A. Sarzynski, P. E. H. Schwarz, S. Sebert, P. Sever, A. R. Shuldiner, J. Sinisalo, V. Steinthorsdottir, R. P. Stolk, J.-C. Tardif, A. Tönjes, A. Tremblay, E.Tremoli, J. Virtamo, M.-C. Vohl, P. Amouyel, F. W. Asselbergs, T. L. Assimes, M. Bochud, B. O. Boehm, E.Boerwinkle, E. P. Bottinger, C. Bouchard, S. Cauchi, J. C. Chambers, S. J. Chanock, R. S. Cooper, P. I. W. de Bakker, G. Dedoussis, L. Ferrucci, P. W. Franks, P. Froguel, L. C. Groop, C. A. Haiman, A. Hamsten, M. G.Hayes, J. Hui, D. J. Hunter, K. Hveem, J. W. Jukema, R. C. Kaplan, M. Kivimaki, D. Kuh, M. Laakso, Y. Liu, N. G. Martin, W. März, M. Melbye, S. Moebus, P. B. Munroe, I. Njølstad, B. A. Oostra, C. N. A. Palmer, N. L.Pedersen, M. Perola, L. Pérusse, U. Peters, J. E. Powell, C. Power, T. Quertermous, R. Rauramaa, E. Reinmaa, P. M. Ridker, F. Rivadeneira, J. I. Rotter, T. E. Saaristo, D. Saleheen, D. Schlessinger, P. E.Slagboom, H. Snieder, T. D. Spector, K. Strauch, M. Stumvoll, J. Tuomilehto, M. Uusitupa, P. van der Harst, H. Völzke, M. Walker, N. J. Wareham, H. Watkins, H.-E. Wichmann, J. F. Wilson, P. Zanen, P. Deloukas, I. M.Heid, C. M. Lindgren, K. L. Mohlke, E. K. Speliotes, U. Thorsteinsdottir, I. Barroso, C. S. Fox, K. E. North, D. P. Strachan, J. S. Beckmann, S. I. Berndt, M. Boehnke, I. B. Borecki, M. I. McCarthy, A. Metspalu, K. Stefansson, A. G. Uitterlinden, C. M. van Duijn, L. Franke, C. J. Willer, A. L. Price, G. Lettre, R. J. F. Loos, M. N. Weedon, E. Ingelsson, J. R. O’Connell, G. R. Abecasis, D. I. Chasman, M. E. Goddard, P. M. Visscher, J. N. Hirschhorn, T. M. Frayling, Electronic Medical Records and Genomics (eMEMERGEGE) Consortium,MIGen Consortium, PAGEGE Consortium, LifeLines Cohort Study, Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014). doi:10.1038/ng.3097pmid:25282103

  62. R. M. Harding, E. Healy, A. J. Ray, N. S. Ellis, N. Flanagan, C. Todd, C. Dixon, A. Sajantila, I. J. Jackson, M. A. Birch-Machin, J. L. Rees, Evidence for variable selective pressures at MC1R. Am. J. Hum. Genet. 66, 1351–1361 (2000). doi:10.1086/302863pmid:10733465

  63. D. Reich, R. E. Green, M. Kircher, J. Krause, N. Patterson, E. Y. Durand, B. Viola, A. W. Briggs, U. Stenzel, P. L. F. Johnson, T. Maricic, J. M. Good, T. Marques-Bonet, C. Alkan, Q. Fu, S. Mallick, H. Li, M. Meyer, E. E.Eichler, M. Stoneking, M. Richards, S. Talamo, M. V. Shunkov, A. P. Derevianko, J.-J. Hublin, J. Kelso, M. Slatkin, S. Pääbo, Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468, 1053–1060 (2010). doi:10.1038/nature09710pmid:21179161

  64. N. G. Jablonski, G. Chaplin, The colours of humanity: The evolution of pigmentation in the human lineage. Philos. Trans. R. Soc. Lond. B Biol. Sci. 372, 20160349 (2017). doi:10.1098/rstb.2016.0349pmid:28533464

  65. A. R. Rogers, D. Iltis, S. Wooding, Genetic variation at the MCIR locus and the time since loss of human body hair. Curr. Anthropol. 45, 105–108 (2004). doi:10.1086/381006

  66. K. C. Ang, M. S. Ngu, K. P. Reid, M. S. Teh, Z. S. Aida, D. X. R. Koh, A. Berg, S. Oppenheimer, H. Salleh, M. M. Clyde, B. M. Md-Zain, V. A. Canfield, K. C. Cheng, Skin color variation in Orang Asli tribes of Peninsular Malaysia. PLOS ONE 7, e42752 (2012). doi:10.1371/journal.pone.0042752pmid:22912732

  67. L. Pagani, D. J. Lawson, E. Jagoda, A. Mörseburg, A. Eriksson, M. Mitt, F. Clemente, G. Hudjashov, M.DeGiorgio, L. Saag, J. D. Wall, A. Cardona, R. Mägi, M. A. Wilson Sayres, S. Kaewert, C. Inchley, C. L.Scheib, M. Järve, M. Karmin, G. S. Jacobs, T. Antao, F. M. Iliescu, A. Kushniarevich, Q. Ayub, C. Tyler-Smith, Y. Xue, B. Yunusbayev, K. Tambets, C. B. Mallick, L. Saag, E. Pocheshkhova, G. Andriadze, C.Muller, M. C. Westaway, D. M. Lambert, G. Zoraqi, S. Turdikulova, D. Dalimova, Z. Sabitov, G. N. N. Sultana,J. Lachance, S. Tishkoff, K. Momynaliev, J. Isakova, L. D. Damba, M. Gubina, P. Nymadawa, I. Evseeva, L.Atramentova, O. Utevska, F.-X. Ricaut, N. Brucato, H. Sudoyo, T. Letellier, M. P. Cox, N. A. Barashkov, V.Škaro, L. Mulahasanovic′, D. Primorac, H. Sahakyan, M. Mormina, C. A. Eichstaedt, D. V. Lichman, S.Abdullah, G. Chaubey, J. T. S. Wee, E. Mihailov, A. Karunas, S. Litvinov, R. Khusainova, N. Ekomasova, V.Akhmetova, I. Khidiyatova, D. Marjanović, L. Yepiskoposyan, D. M. Behar, E. Balanovska, A. Metspalu, M.Derenko, B. Malyarchuk, M. Voevoda, S. A. Fedorova, L. P. Osipova, M. M. Lahr, P. Gerbault, M. Leavesley,A. B. Migliano, M. Petraglia, O. Balanovsky, E. K. Khusnutdinova, E. Metspalu, M. G. Thomas, A. Manica, R.Nielsen, R. Villems, E. Willerslev, T. Kivisild, M. Metspalu, Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538, 238–242 (2016). doi:10.1038/nature19792pmid:27654910

  68. Materials and methods are available as supplementary materials.

  69. J.K. Wagner, C. Jovel, H. L. Norton, E. J. Parra, M. D. Shriver , Comparing quantitative measures of erythema, pigmentation and skin response using reflectometry. Pigment Cell Res. 15, 379–384 (2002).doi:10.1034/j.1600-0749.2002.02042.xpmid:12213095

  70. S. R. Browning, B. L. Browning, Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81,1084–1097 (2007). doi:10.1086/521987pmid:17924348

  71. A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, M. Daly, M. A. DePristo, The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).doi:10.1101/gr.107524.110pmid:20644199

  72. H. Li, Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014). doi:10.1093/bioinformatics/btu356pmid:24974202

  73. T. Rausch, T. Zichner, A. Schlattl, A. M. Stütz, V. Benes, J. O. Korbel, DELLY: Structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).doi:10.1093/bioinformatics/bts378pmid:22962449

  74. J. O’Connell, D. Gurdasani, O. Delaneau, N. Pirastu, S. Ulivi, M. Cocca, M. Traglia, J. Huang, J. E.Huffman, I. Rudan, R. McQuillan, R. M. Fraser, H. Campbell, O. Polasek, G. Asiki, K. Ekoru, C. Hayward, A. F. Wright, V. Vitart, P. Navarro, J.-F. Zagury, J. F. Wilson, D. Toniolo, P. Gasparini, N. Soranzo, M. S. Sandhu, J. Marchini , A general approach for haplotype phasing across the full spectrum of relatedness. PLOS Genet. 10, e1004234 (2014). doi:10.1371/journal.pgen.1004234pmid:24743097

  75. O. Delaneau, J. Marchini, J.-F. Zagury, A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2011). doi:10.1038/nmeth.1785pmid:22138821

  76. S. Das, L. Forer, S. Schönherr, C. Sidore, A. E. Locke, A. Kwong, S. I. Vrieze, E. Y. Chew, S. Levy, M.McGue, D. Schlessinger, D. Stambolian, P. R. Loh, W. G. Iacono, A. Swaroop, L. J. Scott, F. Cucca, F.Kronenberg, M. Boehnke, G. R. Abecasis, C. Fuchsberger, Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016). pmid:27571263

  77. C. Y. McLean, D. Bristor, M. Hiller, S. L. Clarke, B. T. Schaar, C. B. Lowe, A. M. Wenger, G. Bejerano,GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).doi:10.1038/nbt.1630pmid:20436461

  78. J. Yang, S. H. Lee, M. E. Goddard, P. M. Visscher, GCTA: A tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011). doi:10.1016/j.ajhg.2010.11.011pmid:21167468

  79. J. Yang, B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders, D. R. Nyholt, P. A. Madden, A. C. Heath,N. G. Martin, G. W. Montgomery, M. E. Goddard, P. M. Visscher, Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010). pmid:20562875

  80. B. S. Weir, C. C. Cockerham, Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984). pmid:28563791

  81. B. F. Voight, S. Kudaravalli, X. Wen, J. K. Pritchard, A map of recent positive selection in the human genome. PLOS Biol. 4, e72 (2006). doi:10.1371/journal.pbio.0040072pmid:16494531

  82. F. Tajima, Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595 (1989). pmid:2513255

  83. P. Danecek, A. Auton, G. Abecasis, C. A. Albers, E. Banks, M. A. DePristo, R. E. Handsaker, G. Lunter, G. T. Marth, S. T. Sherry, G. McVean, R. Durbin, 1000 Genomes Project Analysis Group , The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).doi:10.1093/bioinformatics/btr330pmid:21653522

  84. Z. A. Szpiech, R. D. Hernandez , selscan: An efficient multithreaded program to perform EHH-based scans for positive selection. Mol. Biol. Evol. 31, 2824–2827 (2014).doi:10.1093/molbev/msu211pmid:25015648

  85. H. J. Bandelt, P. Forster, A. Röhl, Median-joining networks for inferring intraspecific phylogenies. Mol. Biol. Evol. 16, 37–48 (1999). doi:10.1093/oxfordjournals.molbev.a026036pmid:10331250

  86. Fluxus Engineering (1999); www.fluxus-engineering.com.

  87. R. B. Lyngsø, Y. S. Song, J. Hein, in Algorithms in Bioinformatics, R. Casadio, G. Myers, Eds. (Lecture Notes in Computer Science series, Springer, 2005), vol. 3692, pp. 239–250.

  88. J. Terhorst, J. A. Kamm, Y. S. Song, Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 49, 303–309 (2017). pmid:28024154

  89. J. Ernst, M. Kellis, ChromHMM: Automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012). doi:10.1038/nmeth.1906pmid:22373907

  90. L. H. Chadwick, The NIH Roadmap Epigenomics Program data resource. Epigenomics 4, 317–324(2012). doi:10.2217/epi.12.18pmid:22690667

  91. A. R. Quinlan, I. M. Hall, BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010). doi:10.1093/bioinformatics/btq033pmid:20110278

  92. J. Zhou, O. G. Troyanskaya, Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015). doi:10.1038/nmeth.3547pmid:26301843

  93. D. Lee, D. U. Gorkin, M. Baker, B. J. Strober, A. L. Asoni, A. S. McCallion, M. A. Beer, A method to predict the impact of regulatory variants from DNA sequence. Nat. Genet. 47, 955–961 (2015).pmid:26075791

  94. T. L. Bailey, M. Boden, F. A. Buske, M. Frith, C. E. Grant, L. Clementi, J. Ren, W. W. Li, W. S. Noble, MEME SUITE: Tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009).doi:10.1093/nar/gkp335pmid:19458158

  95. A. Mathelier, O. Fornes, D. J. Arenillas, C. Y. Chen, G. Denay, J. Lee, W. Shi, C. Shyr, G. Tan, R. Worsley-Hunt, A. W. Zhang, F. Parcy, B. Lenhard, A. Sandelin, W. W. Wasserman, JASPAR 2016: A major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 44, D110–D115 (2016). doi:10.1093/nar/gkv1176pmid:26531826

  96. A. Dobin, C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, T. R.Gingeras, STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).doi:10.1093/bioinformatics/bts635pmid:23104886

  97. B. Li, C. N. Dewey, RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011). doi:10.1186/1471-2105-12-323pmid:21816040

  98. O. Stegle, L. Parts, R. Durbin, J. Winn, A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLOS Comput. Biol. 6, e1000770 (2010). doi:10.1371/journal.pcbi.1000770pmid:20463871

  99. C. E. Bonferroni, Teoria statistica delle classi e calcolo delle probabilita (Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 1936), vol. 8.

  100. C. Delevoye, I. Hurbain, D. Tenza, J.-B. Sibarita, S. Uzan-Gafsou, H. Ohno, W. J. C. Geerts, A. J. Verkleij,J. Salamero, M. S. Marks, G. Raposo , AP-1 and KIF13A coordinate endosomal sorting and positioning during melanosome biogenesis. J. Cell Biol. 187, 247–264 (2009).doi:10.1083/jcb.200907122pmid:19841138

  101. P. A. Calvo, D. W. Frank, B. M. Bieler, J. F. Berson, M. S. Marks, A cytoplasmic sequence in human tyrosinase defines a second class of di-leucine-based sorting signals for late endosomal and lysosomal delivery. J. Biol. Chem. 274, 12780–12789 (1999). doi:10.1074/jbc.274.18.12780pmid:10212263

  102. P. Cingolani, A. Platts, L. Wang, M. Coon, T. Nguyen, L. Wang, S. J. Land, X. Lu, D. M. Ruden, A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly (Austin) 6, 80–92 (2012). doi:10.4161/fly.19695pmid:22728672

  103. M. Jonnalagadda, S. Ozarkar, R. Ashma, S. Kulkarni , Skin pigmentation variation among populations of West Maharashtra, India. Am. J. Hum. Biol. 28, 36–43 (2016).doi:10.1002/ajhb.22738pmid:26126512

  104. H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). doi:10.1093/bioinformatics/btp352pmid:19505943

  105. C. C. Chang, C. C. Chow, L. C. A. M. Tellier, S. Vattikuti, S. M. Purcell, J. J. Lee, Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015). doi:10.1186/s13742-015-0047-8pmid:25722852

  106. D. L. Ayres, A. Darling, D. J. Zwickl, P. Beerli, M. T. Holder, P. O. Lewis, J. P. Huelsenbeck, F. Ronquist, D. L.Swofford, M. P. Cummings, A. Rambaut, M. A. Suchard, BEAGLE: An application programming interface and high-performance computing library for statistical phylogenetics. Syst. Biol. 61, 170–173 (2012).doi:10.1093/sysbio/syr100pmid:21963610

  107. D. H. Alexander, J. Novembre, K. Lange, Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009). doi:10.1101/gr.094052.109pmid:19648217

  108. 1000 Genomes Project Consortium, G. R. Abecasis, D. Altshuler, A. Auton, L. D. Brooks, R. M. Durbin, R. A.Gibbs, M. E. Hurles, G. A. McVean, A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010). doi:10.1038/nature09534pmid:20981092

  109. S. Gazal, M. Sahbatou, M.-C. Babron, E. Génin, A.-L. Leutenegger, High level of inbreeding in final phase of 1000 Genomes Project. Sci. Rep. 5, 17453 (2015). doi:10.1038/srep17453pmid:26625947

  110. A. L. Price, N. J. Patterson, R. M. Plenge, M. E. Weinblatt, N. A. Shadick, D. Reich, Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).doi:10.1038/ng1847pmid:16862161

  111. Cortex Technology, DSM II ColorMeter, (2017); www.cortex.dk.

  112. X. Zhou, M. Stephens, Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012). pmid:22706312

  113. D. Karolchik, A. S. Hinrichs, T. S. Furey, K. M. Roskin, C. W. Sugnet, D. Haussler, W. J. Kent, The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32, D493–D496 (2004).doi:10.1093/nar/gkh103pmid:14681465

  114. F. Hormozdiari, M. van de Bunt, A. V. Segrè, X. Li, J. W. J. Joo, M. Bilow, J. H. Sul, S. Sankararaman, B.Pasaniuc, E. Eskin, Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016). doi:10.1016/j.ajhg.2016.10.003pmid:27866706

  115. C. A. Sloan, E. T. Chan, J. M. Davidson, V. S. Malladi, J. S. Strattan, B. C. Hitz, I. Gabdank, A. K. Narayanan, M. Ho, B. T. Lee, L. D. Rowe, T. R. Dreszer, G. Roe, N. R. Podduturi, F. Tanaka, E. L. Hong, J. M. Cherry, ENCODE data at the ENCODE portal. Nucleic Acids Res. 44, D726–D732 (2016).doi:10.1093/nar/gkv1160pmid:26527727

  116. H. Thorvaldsdóttir, J. T. Robinson, J. P. Mesirov, Integrative Genomics Viewer (IGV): High-performance genomics data visualization and exploration. Brief. Bioinform. 14, 178–192 (2013).doi:10.1093/bib/bbs017pmid:22517427

  117. J. T. Robinson, H. Thorvaldsdóttir, W. Winckler, M. Guttman, E. S. Lander, G. Getz, J. P. Mesirov, Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011). doi:10.1038/nbt.1754pmid:21221095

  118. C. E. Grant, T. L. Bailey, W. S. Noble, FIMO: Scanning for occurrences of a given motif. Bioinformatics 27,1017–1018 (2011). doi:10.1093/bioinformatics/btr064pmid:21330290

  119. S. E. Castel, A. Levy-Moonshine, P. Mohammadi, E. Banks, T. Lappalainen, Tools and best practices for data processing in allelic expression analysis. Genome Biol. 16, 195 (2015). doi:10.1186/s13059-015-0762-6pmid:26381377

  120. Y. Benjamini, Y. Hochberg, Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).

  121. G. K. Varshney, B. Carrington, W. Pei, K. Bishop, Z. Chen, C. Fan, L. Xu, M. Jones, M. C. LaFave, J. Ledin, R.Sood, S. M. Burgess, A high-throughput functional genomics workflow based on CRISPR/Cas9-mediated targeted mutagenesis in zebrafish. Nat. Protoc. 11, 2357–2375 (2016).doi:10.1038/nprot.2016.141pmid:27809318

  122. G. K. Varshney, W. Pei, M. C. LaFave, J. Idol, L. Xu, V. Gallardo, B. Carrington, K. Bishop, M. Jones, M. Li, U.Harper, S. C. Huang, A. Prakash, W. Chen, R. Sood, J. Ledin, S. M. Burgess, High-throughput gene targeting and phenotyping in zebrafish using CRISPR/Cas9. Genome Res. 25, 1030–1042 (2015).doi:10.1101/gr.186379.114pmid:26048245

  123. B. Carrington, G. K. Varshney, S. M. Burgess, R. Sood, CRISPR-STAT: An easy and reliable PCR-based method to evaluate target-specific sgRNA activity. Nucleic Acids Res. 43, e157 (2015).doi:10.1093/nar/gkv802pmid:26253739

  124. G. K. Chen, P. Marjoram, J. D. Wall, Fast and flexible simulation of DNA sequence data. Genome Res. 19,136–142 (2009). doi:10.1101/gr.083634.108pmid:19029539

  125. A. Kong, G. Thorleifsson, D. F. Gudbjartsson, G. Masson, A. Sigurdsson, A. Jonasdottir, G. B. Walters, A.Jonasdottir, A. Gylfason, K. T. Kristinsson, S. A. Gudjonsson, M. L. Frigge, A. Helgason, U.Thorsteinsdottir, K. Stefansson, Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467, 1099–1103 (2010). doi:10.1038/nature09525pmid:20981099

  126. H. Chen, J. Hey, M. Slatkin, A hidden Markov model for investigating recent positive selection through haplotype structure. Theor. Popul. Biol. 99, 18–30 (2015). doi:10.1016/j.tpb.2014.11.001

  127. J. Smith, G. Coop, M. Stephens, J. Novembre, Estimating time to the common ancestor for a beneficial allele. 071241 [Preprint]. 24 August 2016. .doi:10.1101/071241

  128. GTEx Consortium, The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science 348, 648–660 (2015). doi:10.1126/science.1262110pmid:25954001

  129. K. D. Haltaufderhyde, E. Oancea, Genome-wide transcriptome analysis of human epidermal melanocytes. Genomics 104, 482–489 (2014). doi:10.1016/j.ygeno.2014.09.010pmid:25451175

  130. K. Wang, M. Li, H. Hakonarson, ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).doi:10.1093/nar/gkq603pmid:20601685

  131. L. B. Barreiro, G. Laval, H. Quach, E. Patin, L. Quintana-Murci, Natural selection has driven population differentiation in modern humans. Nat. Genet. 40, 340–345 (2008). doi:10.1038/ng.78pmid:18246066

  132. Acknowledgments: We thank J. Akey, R. McCoy for Melanesian genotype data, A. Clark, C. Brown, YS Park for critical review of the manuscript, members of the Tishkoff lab for helpful discussion, Dr. A. Weeraratna at Wistar Institute, Philadelphia for providing the WM88 melanocytic cell line, Dr. D. Parichy for the aox5:palmGFP plasmid, Dr. G. Xu and Dr. R Yang at University of Pennsylvania for technical assistance, Dr. M. Burmeister at University of Michigan for grizzled mouse samples, L. Garrett at the Embryonic Stem Cell and Transgenic Mouse Core (NHGRI), R. Sood in the Zebrafish Core (NHGRI), and the African participants. We acknowledge the contribution of the staff members of the Cancer Genomics Research Laboratory (NCI), the NIH Intramural Sequencing Center, NCI Center for Cancer Research Sequencing Facility, the Yale University Skin SPORE Specimen Resource Core and the Botswana-University of Pennsylvania Partnership. This work utilized computational resources of the NIH HPC Biowulf cluster. This research was funded by the following grants: NIH grants 1R01DK104339-0, 1R01GM113657-01 and NSF grant BCS-1317217 to SAT. NIH grant R01 AR048155 from NIAMS to MM. NIH grant R01 AR066318 from NIAMS to EO, NIH grants 5R24OD017870-04 and 1U54DK110805-01to LZ and YZ, NIH grant R01-GM094402 to YSS, NIH K12 GM081259 from NIGMS to SB. MHB was partly supported by a “Science Without Borders” fellowship from CNPq – Brazil. YSS is a Chan Zuckerberg Biohub investigator. This work was supported in part by the Center of Excellence in Environmental Toxicology (NIH P30-ES013508, T32-ES019851 to MH) (NIEHS), the Intramural Program of the National Human Genome Research Institute and the Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, federal funds from NCI under Contract HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. L.I.Z. is a founder and stock holder of Fate Therapeutics, Marauder Therapeutics, and Scholar Rock. Data are available at dbGAP accession number phs001396.v1.p1 and SRA BioProject PRJNA392485.