Genome project
Genome projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus) and to annotate protein-coding genes and other important genome-encoded features. The genome sequence of an organism includes the collective DNA sequences of each chromosome in the organism. For a bacterium containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of autosomes and 2 sex chromosomes, a complete genome sequence will involve 46 separate chromosome sequences.
The Human Genome Project was a landmark genome project that is already having a major impact on research across the life sciences, with potential for spurring numerous medical and commercial developments.
In 2011, scientists of the Indian Council of Agricultural Research (ICAR) were the first in the world to sequence the pigeon pea genome. It was an entirely Indian effort by 31 scientists led by Nagendra Kumar Singh of the National Research Centre for Plant Biotechnology (NRCPB). The first draft of the sequence was published in J. Plant Biochem. Biotechnol.
Genome assembly
Genome assembly refers to the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which the DNA originated. In a shotgun sequencing project, all the DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines, which can read up to 1000 nucleotides, or bases, at a time. (The four bases are adenine, guanine, cytosine, and thymine, represented by the letters A, G, C and T.) A genome assembly algorithm works by taking all the pieces, aligning them to one another and detecting all places where two of the short sequences, or reads, overlap. These overlapping reads can be merged, and the process is repeated with the merged sequences.
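A greedy overlap-and-merge loop is one minimal way to sketch the procedure just described. It assumes error-free reads taken from a single strand and a hypothetical minimum overlap of 3 bases; the function names are illustrative only. Production assemblers typically build overlap graphs or de Bruijn graphs rather than merging greedily, partly because greedy merging is easily misled by repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` bases long.
    """
    start = 0
    while True:
        # Find the next place where b's first min_len bases occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The earliest hit whose tail is a prefix of b is the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # A read contained inside another read adds no new sequence.
    seqs = [r for r in seqs if not any(r != o and r in o for o in seqs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return seqs  # no overlaps left: each remaining sequence is a contig
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)


reads = ["ATGGCGT", "GCGTGCA", "TGCAATC", "AATCCGT"]
print(greedy_assemble(reads))  # → ['ATGGCGTGCAATCCGT']
```

Each pass scans every ordered pair of sequences, so this toy version is quadratic per merge; that is acceptable for a handful of reads but not for the millions produced by a real shotgun project.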
Genome assembly is a very difficult computational problem, made more difficult because many genomes contain large numbers of identical sequences, known as repeats. These repeats can be thousands of nucleotides long, and some occur in thousands of different locations, especially in the large genomes of plants and animals.
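The toy example below illustrates why repeats are a problem. It treats every length-k substring of a genome as a read, an idealised assumption of error-free, perfectly even coverage. Two hypothetical genomes that differ only in the order of the segments lying between three copies of a repeat then produce exactly the same reads whenever the reads are shorter than the repeat, so no assembler could tell them apart from those reads alone. All sequences are made up for illustration.

```python
from collections import Counter


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Count every length-k substring: an idealised, error-free set of reads."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical toy segments; R is a repeat present in three copies.
a, b, c, d = "ACGTTG", "CCATGA", "GGTACT", "TTAGCA"
R = "GATCGATCGA"  # 10 bases, longer than the 5-base reads below

genome1 = a + R + b + R + c + R + d
genome2 = a + R + c + R + b + R + d  # b and c swap places between repeat copies

print(genome1 == genome2)                                        # → False
print(kmer_spectrum(genome1, 5) == kmer_spectrum(genome2, 5))    # → True
# Reads long enough to span a whole repeat copy break the tie.
print(kmer_spectrum(genome1, 12) == kmer_spectrum(genome2, 12))  # → False
```

With 5-base reads, every read lies inside one segment or spans a single junction, and both genomes contain the same junctions; only reads that cover an entire repeat copy plus both flanks reveal the true order.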
The resulting (draft) genome sequence is produced by combining the information from sequenced contigs and then employing linking information to create scaffolds. Scaffolds are positioned along the physical map of the chromosomes, creating a "golden path".
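Scaffolding can be sketched as joining contigs in a known order and orientation, with unknown stretches between them filled by placeholder bases. In this minimal sketch the order, orientation and estimated gap sizes are supplied by hand; in practice they would be derived from linking information such as paired reads. Writing unknown bases as runs of N is a common convention, while the function names and contigs here are hypothetical, and placing scaffolds along a physical map is not modelled.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(contigs: dict[str, str], layout: list[tuple[str, str, int]]) -> str:
    """Join contigs in order, flipping '-' contigs and padding gaps with N.

    layout entries are (contig_id, orientation, estimated_gap_after), where the
    order, orientation and gap sizes are assumed to come from linking information.
    """
    parts = []
    for contig_id, orientation, gap in layout:
        seq = contigs[contig_id]
        if orientation == "+":
            parts.append(seq)
        elif orientation == "-":
            parts.append(reverse_complement(seq))
        else:
            raise ValueError(f"orientation must be '+' or '-', got {orientation!r}")
        parts.append("N" * gap)  # unknown bases of the estimated gap length
    return "".join(parts)


contigs = {"ctg1": "ATGGCG", "ctg2": "TTGCA", "ctg3": "CCGTA"}
layout = [("ctg1", "+", 3), ("ctg2", "-", 2), ("ctg3", "+", 0)]
print(build_scaffold(contigs, layout))  # → ATGGCGNNNTGCAANNCCGTA
```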
Assembly software
Originally, most large-scale DNA sequencing centers developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centers has increased. An example of such an assembler is the Short Oligonucleotide Analysis Package (SOAP), developed by BGI (known as the Beijing Genomics Institute prior to 2008) for de novo assembly of human-sized genomes, alignment, SNP detection, resequencing, indel finding, and structural variation analysis.
Genome annotation
Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:
- identifying elements on the genome, a process called gene prediction, and
- attaching biological information to these elements.
Automatic annotation tools try to perform all this by computer analysis, as opposed to manual annotation (a.k.a. curation) which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline.
At the most basic level, annotation uses BLAST to find similarities to previously characterized sequences and then annotates genomes based on those similarities. However, more and more additional information is now added to the annotation platform. This additional information allows manual annotators to resolve discrepancies between genes that are given the same annotation. Some databases use genome context information, similarity scores, experimental data, and integrations of other resources to provide genome annotations through their Subsystems approach. Other databases (e.g. Ensembl) rely on both curated data sources and a range of different software tools in their automated genome annotation pipeline.
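Similarity-based annotation can be sketched as "score the new sequence against sequences whose function is already known, and copy the label of the best hit". BLAST itself is a fast seed-and-extend heuristic that ranks hits by statistical significance (E-values); this minimal sketch swaps in exact Smith-Waterman local alignment with a raw-score cut-off, which is only practical at toy scale. The scoring values, the threshold and the reference labels are arbitrary, hypothetical choices.

```python
from typing import Optional


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b, with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at 0 so a local alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def annotate_by_best_hit(query: str, references: dict[str, str], min_score: int) -> Optional[str]:
    """Copy the label of the best-scoring reference, or None if no hit is good enough."""
    best_label, best_score = None, 0
    for label, ref_seq in references.items():
        score = smith_waterman(query, ref_seq)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_score else None


references = {"annotation A": "ATGGCGTGCA", "annotation B": "TTTTCCCCGG"}
print(annotate_by_best_hit("GGCGTGC", references, min_score=10))  # → annotation A
print(annotate_by_best_hit("AAAAAA", references, min_score=10))   # → None
```

Returning None below the threshold mirrors the practice of leaving a gene unannotated rather than transferring a label from a weak match.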
Structural annotation consists of the identification of genomic elements.
- ORFs and their localisation
- gene structure
- coding regions
- location of regulatory motifs
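Finding ORFs, the first item in the list above, can be sketched as a scan of each reading frame for an ATG start codon followed by the first in-frame stop codon. This minimal sketch assumes the standard genetic code's stop codons (TAA, TAG, TGA), looks only at the forward strand, ignores introns and splicing, and uses a hypothetical minimum length; real gene predictors rely on statistical models of gene structure rather than this bare rule.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # keep the first ATG so each ORF is as long as possible
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # → [(2, 2, 14)]
```

Here the ORF "ATG AAA TTT TAG" is found in the third frame (frame 2), while the TGA at position 3 in frame 0 is ignored because no start codon precedes it.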
Functional annotation consists of attaching biological information to genomic elements.
- biochemical function
- biological function
- regulation and interactions in which the element is involved
- expression
These steps may involve both biological experiments and in silico analysis.
A variety of software tools have been developed to permit scientists to view and share genome annotations.
Genome annotation is the next major challenge for the Human Genome Project, now that the genome sequences of human and several model organisms are largely complete. Identifying the locations of genes and other genetic control elements is often described as defining the biological "parts list" for the assembly and normal operation of an organism. Scientists are still at an early stage in the process of delineating this parts list and in understanding how all the parts "fit together".
Genome annotation is an active area of investigation and involves a number of different organizations in the life science community, which publish the results of their efforts in publicly available biological databases accessible via the web and other electronic means. Here is an alphabetical listing of ongoing projects relevant to genome annotation:
- ENCyclopedia Of DNA Elements (ENCODE)
- Ensembl
- Entrez Gene
- GENCODE
- Gene Ontology Consortium
- GeneRIF
- RefSeq
- UniProt
- Vertebrate and Genome Annotation Project (Vega)
On Wikipedia, genome annotation has started to become automated under the auspices of the Gene Wiki portal, which operates a bot that harvests gene data from research databases and creates gene stubs on that basis.
When is a genome project finished?
When sequencing a genome, there are usually regions that are difficult to sequence (often regions with highly repetitive DNA). Thus, 'completed' genome sequences are rarely complete, and terms such as 'working draft' or 'essentially complete' have been used to describe the status of such genome projects more accurately. Even when every base pair of a genome sequence has been determined, errors are still likely to be present because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of mitochondria and (for plants) chloroplasts, as these organelles have their own genomes.
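One rough way to quantify how far a draft is from complete is to count its unresolved gaps. This minimal sketch assumes the common convention that gaps in an assembled sequence are written as runs of the letter N; the function name is hypothetical, and the measure says nothing about the base-level sequencing errors mentioned above.

```python
import re


def gap_summary(assembly: str) -> dict:
    """Count runs of N (unresolved gaps) and the fraction of unknown bases."""
    gaps = [m.end() - m.start() for m in re.finditer(r"N+", assembly.upper())]
    unknown = sum(gaps)
    return {
        "length": len(assembly),
        "gap_count": len(gaps),
        "unknown_bases": unknown,
        "fraction_unknown": unknown / len(assembly) if assembly else 0.0,
    }


print(gap_summary("ACGTNNNNACGTNNACGT"))
# → {'length': 18, 'gap_count': 2, 'unknown_bases': 6, 'fraction_unknown': 0.3333333333333333}
```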
It is often reported that the goal of sequencing a genome is to obtain information about the complete set of genes in that particular genome sequence. The proportion of a genome that encodes genes may be very small (particularly in eukaryotes such as humans, where coding DNA may account for only a few percent of the entire sequence). However, it is not always possible (or desirable) to sequence only the coding regions. Also, as scientists understand more about the role of this noncoding DNA (often referred to as junk DNA), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism.
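Once coding regions have been annotated, the coding proportion of a genome can be computed from their coordinates. Because annotated coding sequences can overlap (for example, alternative transcripts sharing exons), this minimal sketch merges overlapping intervals so each base is counted once. The half-open interval convention, the function name and the numbers in the example are hypothetical.

```python
def coding_fraction(genome_length: int, cds_intervals: list[tuple[int, int]]) -> float:
    """Fraction of the genome covered by coding intervals, each half-open [start, end)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(cds_intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block before starting a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


print(coding_fraction(1000, [(100, 200), (150, 250), (600, 650)]))  # → 0.2
```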
Genome projects often do more than determine an organism's DNA sequence. Such projects may also include gene prediction to find out where the genes are in a genome and what those genes do. There may also be related projects to sequence ESTs (expressed sequence tags) or mRNAs to help find out where the genes actually are.
Historical and technological perspectives
Historically, when sequencing eukaryotic genomes (such as that of the worm Caenorhabditis elegans), it was common to first map the genome to provide a series of landmarks across it. Rather than sequencing a chromosome in one go, researchers would sequence it piece by piece, with prior knowledge of approximately where each piece is located on the larger chromosome. Changes in technology, and in particular improvements in the processing power of computers, mean that genomes can now be 'shotgun sequenced' in one go (although this approach has caveats compared with the traditional approach).
Improvements in DNA sequencing
DNA sequencing
DNA sequencing includes several methods and technologies that are used for determining the order of the nucleotide bases—adenine, guanine, cytosine, and thymine—in a molecule of DNA....
technology has meant that the cost of sequencing a new genome sequence has steadily fallen (in terms of cost per base pair
) and newer technology has also meant that genomes can be sequenced far more quickly.
When research agencies decide which new genomes to sequence, the emphasis has been on species that are either of high importance as model organisms
or relevant to human health (e.g. pathogenic bacteria
or vectors of disease such as mosquito
s) or species which have commercial importance (e.g. livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in molecular evolution (e.g. the common chimpanzee
).
In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow for complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of human genetic diversity
.
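Once complete genome sequences from several individuals of one species are available, comparing them site by site is the basic operation behind studies of genetic diversity. The following minimal Python sketch lists the variable sites and computes nucleotide diversity, which is the average fraction of differing sites across all pairs of sequences. Nucleotide diversity is a standard population-genetics summary that the text above does not itself mention. The sketch assumes the sequences are already aligned to the same length. The function names and example sequences are invented for illustration.

```python
from itertools import combinations


def _aligned_length(seqs: list[str]) -> int:
    """Return the common length of the sequences, or raise if they differ."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return length


def variable_sites(seqs: list[str]) -> list[int]:
    """0-based positions where the aligned sequences do not all agree."""
    length = _aligned_length(seqs)
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean proportion of differing sites over all pairs of sequences (pi)."""
    length = _aligned_length(seqs)
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / (len(pairs) * length)


individuals = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]  # invented toy data
print(variable_sites(individuals))                  # [2, 7]
print(round(nucleotide_diversity(individuals), 4))  # 0.1667 (4 diffs / (3 pairs * 8 sites))
```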
Example genome projects
Many organisms have genome projects that have either been completed or will be completed shortly, including:
- Humans, Homo sapiens; see Human Genome Project
- Palaeo-Eskimo, an ancient human
- Neanderthal, Homo neanderthalensis (partial); see Neanderthal Genome Project
- Common chimpanzee, Pan troglodytes; see Chimpanzee Genome Project
- Domestic cow
- Bovine Genome
- Honey Bee Genome Sequencing Consortium
- Human Microbiome Project
- International Grape Genome Program
- International HapMap Project
See also
- Joint Genome Institute
- Model organism
- National Center for Biotechnology Information
- Illumina, a private company involved in genome sequencing
- Knome, a private company offering genome analysis and sequencing