Reference genome
A reference genome is a digital nucleic acid sequence database, assembled by scientists as a representative example of a species' genetic code. As they are often assembled from the sequencing of DNA from a number of donors, reference genomes do not accurately represent the genetic code of any single individual. Instead, a reference provides a haploid mosaic of different DNA sequences from each donor. For example, GRCh37, the Genome Reference Consortium human genome (build 37), is derived from thirteen anonymous volunteers from Buffalo, New York. The ABO blood group system differs among humans, but the human reference genome contains only an O allele (although the other alleles are annotated).

As the cost of DNA sequencing falls, and new full genome sequencing technologies emerge, more genome sequences continue to be generated. Reference genomes are typically used as a guide on which new genomes are built, enabling them to be assembled much more quickly and cheaply than the initial Human Genome Project. Most individuals with their entire genome sequenced, such as James D. Watson, had their genome assembled in this manner. For much of a genome, the reference provides a good approximation of the DNA of any single individual. But in regions with high allelic diversity, such as the major histocompatibility complex in humans and the major urinary proteins of mice, the reference genome may differ significantly from other individuals. Comparison between the reference (build 36) and Watson's genome revealed 3.3 million single nucleotide polymorphism differences, while about 1.4 percent of his DNA could not be matched to the reference genome at all. For regions where there is known to be large-scale variation, sets of alternate loci are assembled alongside the reference locus.
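
As a rough illustration of what comparing an individual's sequence against a reference involves, the sketch below lists single-base differences between two short sequences. It assumes the sequences are already aligned and of equal length, which real pipelines achieve with a dedicated read aligner and variant caller; the function name `snp_differences` and the toy sequences are hypothetical and are not taken from any real genome build.

```python
def snp_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions where either sequence has 'N' (unknown base) are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    differences = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, sample_base) in enumerate(pairs, start=1):
        if "N" in (ref_base, sample_base):
            continue  # an unknown base cannot be called as a difference
        if ref_base != sample_base:
            differences.append((position, ref_base, sample_base))
    return differences


# Toy sequences, invented for illustration only.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
print(snp_differences(reference, sample))  # [(5, 'A', 'T'), (10, 'C', 'A')]
```

A position-by-position comparison like this only captures substitutions. Insertions, deletions and sequence that cannot be matched to the reference at all need alignment-aware methods.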

The human and mouse reference genomes are maintained and improved by the Genome Reference Consortium (GRC), a group of fewer than 20 scientists from a number of genome research institutes, including the European Bioinformatics Institute, the National Center for Biotechnology Information, the Sanger Institute and Washington University in St. Louis. The GRC continues to improve reference genomes by building new alignments that contain fewer gaps and by fixing misrepresentations in the sequence. As of 2010, the human reference genome is in its 19th version. The GRCh37 build contains around 250 gaps, whereas the first version had about 150,000 gaps.
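
Gap counts of this kind can be approximated from the sequence itself, since unresolved stretches in assembly sequence files are conventionally written as runs of the letter N. The following is a minimal sketch under that assumption. The helper name `count_gaps` and the `min_length` threshold are hypothetical, and official gap counts come from the assembly's own records rather than from a scan like this one.

```python
import re


def count_gaps(sequence: str, min_length: int = 1) -> int:
    """Count runs of 'N' that are at least min_length bases long.

    Each maximal run of consecutive N characters is treated as one gap.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # The greedy quantifier makes each match span one maximal run of N.
    pattern = re.compile(r"N{%d,}" % min_length, re.IGNORECASE)
    return sum(1 for _ in pattern.finditer(sequence))


# Toy sequence, invented for illustration only.
toy_sequence = "ACGTNNNNACGTNNACGT"
print(count_gaps(toy_sequence))                # 2
print(count_gaps(toy_sequence, min_length=3))  # 1
```

Raising `min_length` ignores isolated ambiguous bases, which are not assembly gaps in the usual sense.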

Reference genomes can be accessed online at several locations, using dedicated browsers such as Ensembl or the UCSC Genome Browser.


