The 1000 Genomes Project
The 1000 Genomes Project, launched in January 2008, is an international research effort to establish by far the most detailed catalogue of human genetic variation. Scientists plan to sequence the genomes of at least one thousand anonymous participants from a number of different ethnic groups within the next three years, using newly developed technologies which are faster and less expensive. In 2010, the project finished its pilot phase, which was described in detail in a publication in Nature. As of late 2010, the project is in its production phase with a target of sequencing upwards of 2,000 individuals.

The project unites multidisciplinary research teams from institutes around the world, including the United Kingdom, China and the United States. Each will contribute to the enormous sequence dataset and to a refined human genome map, which will be freely accessible through public databases to the scientific community and the general public alike.

By providing an overview of all genetic variation, not only what is biomedically relevant, the consortium will generate a valuable tool for all fields of natural science, especially in the disciplines of genetics, medicine, pharmacology, biochemistry and bioinformatics.


Background

Within the past few decades, advances in human population genetics and comparative genomics have made it possible to gain increasing insight into the nature of genetic diversity. However, we are just beginning to understand how processes like the random sampling of gametes, structural variations (insertions/deletions (indels), copy number variations (CNVs), retroelements), single-nucleotide polymorphisms (SNPs) and natural selection have shaped the level and pattern of variation within species and also between species.

Human genetic variation

The random sampling of gametes during sexual reproduction leads to genetic drift (a random fluctuation in the population frequency of a trait) in subsequent generations, and would result in the loss of all variation in the absence of external influence. It is postulated that the rate of genetic drift is inversely proportional to population size, and that it may be accelerated in specific situations such as bottlenecks, where the population size is reduced for a certain period of time, and by the founder effect (individuals in a population tracing back to a small number of founding individuals).
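
The dependence of drift on population size can be illustrated with a small simulation. The sketch below assumes a simple Wright-Fisher model (fixed population size, non-overlapping generations, no selection or mutation), which is a textbook idealisation and not a method used by the project. The function names and parameters are hypothetical.

```python
import random


def generations_until_fixation(pop_size, start_freq=0.5, seed=0,
                               max_generations=100_000):
    """Wright-Fisher drift for one biallelic locus.

    Returns the number of generations until the tracked allele is
    either lost or fixed, i.e. until variation at the locus is gone.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size                   # diploid: two allele copies each
    count = round(start_freq * copies)      # copies of the tracked allele
    for generation in range(max_generations):
        if count == 0 or count == copies:
            return generation
        freq = count / copies
        # Every copy in the next generation is drawn at random
        # from the current gene pool (random sampling of gametes).
        count = sum(1 for _ in range(copies) if rng.random() < freq)
    return max_generations


def mean_fixation_time(pop_size, replicates=200):
    """Average time to loss of variation over independent replicates."""
    times = [generations_until_fixation(pop_size, seed=s)
             for s in range(replicates)]
    return sum(times) / len(times)


if __name__ == "__main__":
    for n in (10, 50):
        print(f"N={n}: mean generations to loss of variation "
              f"= {mean_fixation_time(n):.1f}")
```

In this model smaller populations lose variation in fewer generations on average, which is the sense in which a bottleneck or a small founding group accelerates drift.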

Anzai et al. demonstrated that indels account for 90.4% of all observed variations in the sequence of the major histocompatibility locus (MHC) between humans and chimpanzees. After taking multiple indels into consideration, the high degree of genomic similarity between the two species (98.6% nucleotide sequence identity) drops to only 86.7%. For example, a large deletion of 95 kilobases (kb) between the loci of the human MICA and MICB genes results in a single hybrid chimpanzee MIC gene, linking this region to a species-specific handling of several retroviral infections and the resultant susceptibility to various autoimmune diseases. The authors conclude that instead of more subtle SNPs, indels were the driving mechanism in primate speciation.
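
Why counting indels lowers a similarity figure can be shown on a toy alignment. The following sketch is not the procedure used by Anzai et al.; it only assumes two already aligned sequences in which `-` marks a gap, and the function name and example sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, count_gaps):
    """Percent identity of two aligned sequences ('-' marks a gap).

    With count_gaps=False, columns containing a gap are ignored, so only
    substitutions reduce identity. With count_gaps=True, every gap column
    counts as a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        is_gap = a == "-" or b == "-"
        if is_gap and not count_gaps:
            continue                      # skip indel columns entirely
        compared += 1
        if a == b and not is_gap:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    seq_a = "ACGTACGTAC"
    seq_b = "ACGTTCG---"                  # one substitution, one 3-base deletion
    print(percent_identity(seq_a, seq_b, count_gaps=False))
    print(percent_identity(seq_a, seq_b, count_gaps=True))
```

The same pair of sequences yields a lower identity once the deleted bases are counted, which mirrors the direction of the drop from 98.6% to 86.7% reported above.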

Besides SNPs, structural variants such as copy-number variants (CNVs) contribute to the genetic diversity in human populations. Using microarrays, almost 1,500 copy number variable regions, covering around 12% of the genome and containing hundreds of genes, disease loci, functional elements and segmental duplications, have been identified in the HapMap sample collection. Although the specific function of CNVs remains elusive, the fact that CNVs span more nucleotide content per genome than SNPs emphasizes the importance of CNVs in genetic diversity and evolution.

Investigating human genomic variations holds great potential for identifying genes that might underlie differences in disease resistance (e.g. MHC region) or drug metabolism.

Natural selection

Natural selection in the evolution of a trait can be divided into three classes. Directional or positive selection refers to a situation where a certain allele has a greater fitness than other alleles, consequently increasing its population frequency (e.g. antibiotic resistance of bacteria). In contrast, stabilizing or negative selection (also known as purifying selection) lowers the frequency of alleles, or even removes them from a population, because of disadvantages associated with them relative to other alleles. Finally, a number of forms of balancing selection exist; these increase genetic variation within a species by being overdominant (heterozygous individuals are fitter than homozygous individuals, e.g. G6PD, the gene involved in G6PD deficiency and malaria resistance) or can vary spatially within a species that inhabits different niches, thus favouring different alleles. Some genomic differences may not affect fitness. Neutral variation, previously thought to be “junk” DNA, is unaffected by natural selection, resulting in higher genetic variation at such sites when compared to sites where variation does influence fitness.
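
The three classes can be compared with the standard one-locus, two-allele selection recursion from population genetics. The sketch below assumes an infinitely large, randomly mating population (so drift is ignored) and uses fitness values chosen purely for illustration; none of the numbers come from the studies cited here.

```python
def next_frequency(p, w_AA, w_Aa, w_aa):
    """Frequency of allele A after one generation of viability selection."""
    q = 1.0 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_fitness


def iterate(p, fitness, generations):
    """Apply the recursion repeatedly; fitness is (w_AA, w_Aa, w_aa)."""
    for _ in range(generations):
        p = next_frequency(p, *fitness)
    return p


# Illustrative (hypothetical) genotype fitnesses for allele A.
SCENARIOS = {
    "positive selection (A favoured)":      (1.00, 0.95, 0.90),
    "negative selection (A deleterious)":   (0.90, 0.95, 1.00),
    "balancing selection (Aa fittest)":     (0.90, 1.00, 0.90),
}

if __name__ == "__main__":
    for name, fitness in SCENARIOS.items():
        final = iterate(0.1, fitness, generations=500)
        print(f"{name}: frequency of A goes from 0.10 to {final:.3f}")
```

Under these assumptions the favoured allele approaches fixation, the deleterious allele is removed, and the overdominant case settles at an intermediate frequency at which both alleles are retained.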

It is not fully clear how natural selection has shaped population differences; however, genetic candidate regions under selection have been identified recently. Patterns of DNA polymorphisms can be used to reliably detect signatures of selection and may help to identify genes that might underlie variation in disease resistance or drug metabolism. Barreiro et al. found evidence that negative selection has reduced population differentiation at the amino acid-altering level (particularly in disease-related genes), whereas positive selection has ensured regional adaptation of human populations by increasing population differentiation in gene regions (mainly nonsynonymous and 5'-untranslated region variants).

It is thought that most complex and Mendelian diseases (except diseases with late onset, assuming that older individuals no longer contribute to the fitness of their offspring) will have an effect on survival and/or reproduction; thus, genetic factors underlying those diseases should be influenced by natural selection. However, diseases that have late onset today could have been childhood diseases in the past, as genes delaying disease progression could have undergone selection. Gaucher disease (mutations in the GBA gene), Crohn’s disease (mutation of NOD2) and familial hypertrophic cardiomyopathy (mutations in CMH1, CMH2, CMH3 and CMH4) are all examples of negative selection. These disease mutations are primarily recessive and segregate as expected at a low frequency, supporting the hypothesized negative selection. There is evidence that the genetic basis of type 1 diabetes may have undergone positive selection. Few cases have been reported in which disease-causing mutations appear at the high frequencies supported by balancing selection. The most prominent example is mutations of the G6PD locus which, if homozygous, result in G6PD enzyme deficiency, but in the heterozygous state are partially protective against malaria. Other possible explanations for segregation of disease alleles at moderate or high frequencies include genetic drift, recent alterations towards positive selection due to environmental changes such as diet, and genetic hitch-hiking.

Genome-wide comparative analyses of different human populations, as well as between species (e.g. human versus chimpanzee), are helping us to understand the relationship between diseases and selection, and provide evidence that mutations in constrained genes are disproportionately associated with heritable disease phenotypes. Genes implicated in complex disorders tend to be under less negative selection than Mendelian disease genes or non-disease genes.

Goals

There are two kinds of genetic variants related to disease. The first are rare genetic variants that have a severe effect predominantly on simple traits (e.g. cystic fibrosis, Huntington disease). The second, more common, genetic variants have a mild effect and are thought to be implicated in complex traits (e.g. diabetes, heart disease). Between these two types of genetic variants lies a significant gap of knowledge, which the 1000 Genomes Project is designed to address.

The primary goal of this project is to create a complete and detailed catalogue of human genetic variations, which in turn can be used for association studies relating genetic variation to disease. By doing so the consortium aims to discover >95% of the variants (e.g. SNPs, CNVs, indels) with minor allele frequencies as low as 1% across the genome and 0.1-0.5% in gene regions, as well as to estimate the population frequencies, haplotype backgrounds and linkage disequilibrium patterns of variant alleles.
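
The minor allele frequency thresholds above can be made concrete with a short calculation. The sketch below computes the minor allele frequency from diploid genotype counts at a biallelic site, and the chance that an allele of a given frequency is present at all in a sample of unrelated individuals. The second function assumes independent sampling of chromosomes and says nothing about whether sequencing would actually detect the allele; both function names are hypothetical.

```python
def minor_allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """Minor allele frequency from genotype counts at a biallelic site."""
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_freq = (n_het + 2 * n_alt_hom) / total_alleles
    # The minor allele is whichever of the two alleles is less common.
    return min(alt_freq, 1.0 - alt_freq)


def prob_present_in_sample(allele_freq, n_individuals):
    """Probability that at least one copy of the allele is sampled,
    assuming 2 * n_individuals independently drawn chromosomes."""
    return 1.0 - (1.0 - allele_freq) ** (2 * n_individuals)


if __name__ == "__main__":
    # 1,000 individuals, 20 of them heterozygous carriers.
    print(minor_allele_frequency(980, 20, 0))
    for freq in (0.01, 0.001):
        print(freq, prob_present_in_sample(freq, 1000))
```

Under this simplification an allele at 1% frequency is almost certain to be carried by someone in a sample of 1,000 people, while an allele at 0.1% is missed in a noticeable share of samples.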

Secondary goals will include the support of better SNP and probe selection for genotyping platforms in future studies and the improvement of the human reference sequence. Furthermore, the completed database will be a useful tool for studying regions under selection, variation in multiple populations and understanding the underlying processes of mutation and recombination.

Outline

The human genome consists of approximately 3 billion DNA base pairs and is estimated to carry 20,000–25,000 protein-coding genes. In designing the study the consortium needed to address several critical issues regarding the project metrics such as technology challenges, data quality standards and sequence coverage.

Over the course of the next three years, scientists at the Sanger Institute, BGI Shenzhen and the National Human Genome Research Institute’s Large-Scale Sequencing Network are planning to sequence a minimum of 1,000 human genomes. Due to the large amount of sequence data that need to be generated and analyzed, it is possible that other participants may be recruited over time.

Almost 10 billion bases will be sequenced per day over the two-year production phase. This equates to more than two human genomes every 24 hours, a groundbreaking capacity. Challenging the leading experts of bioinformatics and statistical genetics, the sequence dataset will comprise 6 trillion DNA bases, 60-fold more sequence data than what has been published in DNA databases over the past 25 years.
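
The throughput figures can be checked with simple arithmetic. The sketch below uses only numbers quoted in this article, treats "almost 10 billion" as an upper bound of exactly 10 billion, and takes a two-year phase to be 730 days, both of which are simplifying assumptions. A "genome equivalent" here means one pass over a 3-billion-base genome, not a finished genome.

```python
GENOME_SIZE = 3_000_000_000          # ~3 billion base pairs
BASES_PER_DAY = 10_000_000_000       # upper bound for "almost 10 billion"
PRODUCTION_DAYS = 2 * 365            # two-year production phase (assumed 730 days)
TOTAL_BASES = 6_000_000_000_000      # 6 trillion bases in the final dataset

# How many single-pass genome equivalents one day of sequencing represents.
genome_equivalents_per_day = BASES_PER_DAY / GENOME_SIZE

# Upper bound on output if the daily rate were sustained for the whole phase.
max_bases_in_production = BASES_PER_DAY * PRODUCTION_DAYS

# Size of the final dataset expressed in single-pass genome equivalents.
total_genome_equivalents = TOTAL_BASES / GENOME_SIZE

# Volume of previously published sequence implied by the "60-fold" comparison.
implied_prior_bases = TOTAL_BASES / 60

if __name__ == "__main__":
    print(f"genome equivalents per day: {genome_equivalents_per_day:.2f}")
    print(f"maximum bases over the production phase: {max_bases_in_production:,}")
    print(f"dataset in genome equivalents: {total_genome_equivalents:,.0f}")
    print(f"implied previously published bases: {implied_prior_bases:,.0f}")
```

The daily rate is consistent with "more than two human genomes every 24 hours", and the 6-trillion-base target fits within what that rate could produce in two years.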

To determine the final design of the full project, three pilot studies were designed and will be carried out within the first year of the project. The first pilot intends to genotype 180 people of 3 major geographic groups at low coverage (2x). For the second pilot study, the genomes of two nuclear families (both parents and an adult child) are going to be sequenced with deep coverage (20x per genome). The third pilot study involves sequencing the coding regions (exons) of 1,000 genes in 1,000 people with deep coverage (20x).
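
The difference between low (2x) and deep (20x) coverage can be sketched numerically. The example below estimates the raw sequence needed for the first two pilots and, under the common Poisson approximation in which reads fall uniformly at random along the genome, the expected fraction of bases that no read covers. The uniformity assumption is an idealisation that real sequencing data do not satisfy exactly, and the function names are hypothetical.

```python
import math

GENOME_SIZE = 3_000_000_000   # approximate human genome length in base pairs


def raw_bases_needed(n_genomes, depth, genome_size=GENOME_SIZE):
    """Total bases to sequence for n_genomes at a given mean depth."""
    return n_genomes * depth * genome_size


def expected_fraction_uncovered(depth):
    """Poisson model: probability that a given base is hit by zero reads."""
    return math.exp(-depth)


if __name__ == "__main__":
    # Pilot 1: 180 people at 2x. Pilot 2: two trios (6 people) at 20x.
    print(f"pilot 1 raw bases: {raw_bases_needed(180, 2):,}")
    print(f"pilot 2 raw bases: {raw_bases_needed(6, 20):,}")
    for depth in (2, 20):
        print(f"{depth}x: expected uncovered fraction "
              f"= {expected_fraction_uncovered(depth):.2e}")
```

At 2x a substantial fraction of each individual genome is expected to remain unread under this model, whereas at 20x almost every base is covered, which is one way to read the trade-off between sampling many people shallowly and a few people deeply.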

It has been estimated that the project would likely cost more than $500 million if standard DNA sequencing technologies were used. Therefore, several new technologies (e.g. Solexa, 454, SOLiD) will be applied, lowering the expected costs to between $30 million and $50 million. The major support will be provided by the Wellcome Trust Sanger Institute in Hinxton, England; the Beijing Genomics Institute, Shenzhen (BGI Shenzhen), China; and the NHGRI, part of the National Institutes of Health (NIH).

The compiled genome sequence data will be made freely available.

Human genome samples

Based on the overall goals for the project, the samples will be chosen to provide power in populations where association studies for common diseases are being carried out. Furthermore, the samples do not need to have medical or phenotype information since the proposed catalogue will be a basic resource on human variation.

For the pilot studies, human genome samples from the HapMap collection will be sequenced. It will be useful to focus on samples that have additional data available (such as ENCODE sequence, genome-wide genotypes, fosmid-end sequence, structural variation assays, and gene expression) to be able to compare the results with those from other projects.

Complying with extensive ethical procedures, the 1000 Genomes Project will then use samples from volunteer donors. The following populations will be included in the study: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Peruvians in Peru; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States.

See also

  • Human Genome Project
  • HapMap Project
  • Personal genomics
  • Population groups in biomedicine
  • 1000 Plant Genomes Project

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 