Human genome

The human genome is the genome of Homo sapiens, which is stored on 23 chromosome pairs plus the small mitochondrial DNA. Of the 23 pairs, 22 are autosomal chromosome pairs, while the remaining pair is sex-determining. The haploid human genome occupies a total of just over 3 billion DNA base pairs. The Human Genome Project (HGP) produced a reference sequence of the euchromatic human genome, which is used worldwide in biomedical sciences.

The haploid human genome contains ca. 23,000 protein-coding genes, far fewer than had been expected before its sequencing. In fact, only about 1.5% of the genome codes for proteins, while the rest consists of non-coding RNA genes, regulatory sequences, introns, and noncoding DNA (once known as "junk DNA").

Genes

There are estimated to be between 20,000 and 25,000 human protein-coding genes. The estimate of the number of human genes has been repeatedly revised down as genome sequence quality and gene finding methods have improved. In the late 1960s, predictions estimated that human cells had as many as 2,000,000 genes.

Surprisingly, the number of human genes seems to be less than a factor of two greater than that of many much simpler organisms, such as the roundworm (Caenorhabditis elegans) and the fruit fly (Drosophila melanogaster). However, a larger proportion of human genes are related to central nervous system and especially brain development.

Human genes are distributed unevenly across the chromosomes. Each chromosome contains various gene-rich and gene-poor regions, which seem to be correlated with chromosome bands and GC-content. The significance of these nonrandom patterns of gene density is not well understood. In addition to protein-coding genes, the human genome contains thousands of RNA genes, including tRNA, ribosomal RNA, microRNA, and other non-coding RNA genes.

Regulatory sequences

The human genome has many different regulatory sequences which are crucial to controlling gene expression. These are typically short sequences that appear near or within genes. A systematic understanding of these regulatory sequences and how they together act as a gene regulatory network is only beginning to emerge from computational, high-throughput expression and comparative genomics studies. Some types of non-coding DNA are genetic "switches" that do not encode proteins, but do regulate when and where genes are expressed.

Identification of regulatory sequences relies in part on evolutionary conservation. The evolutionary branch between the primates and mouse, for example, occurred 70–90 million years ago. Non-coding sequences that computer comparisons find to be conserved across that span are therefore likely to be important for functions such as gene regulation.

Another comparative genomic approach to locating regulatory sequences in humans is the genome sequencing of the puffer fish. These vertebrates have essentially the same genes and regulatory sequences as humans, but with only one-eighth the noncoding DNA. The compact DNA sequence of the puffer fish makes it much easier to locate the regulatory sequences.

Other DNA

Protein-coding sequences (specifically, coding exons) make up less than 1.5% of the human genome. Aside from genes and known regulatory sequences, the human genome contains vast regions of DNA the function of which, if any, remains unknown. These regions in fact make up the vast majority, by some estimates 97%, of the human genome size. Much of this is composed of:

Repeat elements

  • Tandem repeats
    • Satellite DNA
    • Minisatellite
    • Microsatellite
  • Interspersed repeats
    • SINEs
    • LINEs

Transposons

  • Retrotransposons
    • LTR
      • Ty1-copia
      • Ty3-gypsy
    • Non-LTR
      • SINEs
      • LINEs
  • DNA Transposons

Noncoding DNA

Many DNA sequences that do not code for protein have important biological functions, as indicated by comparative genomics studies that report some sequences of noncoding DNA that are highly conserved, sometimes on time-scales representing hundreds of millions of years, implying that these noncoding regions are under strong evolutionary pressure and purifying selection. These noncoding sequences were once referred to as "junk" DNA, and many of them are likely to function, but in ways that are not fully understood. Recent experiments using microarrays have revealed that a substantial fraction of non-genic DNA is in fact transcribed into RNA, which leads to the possibility that the resulting transcripts may have some unknown function. Also, the evolutionary conservation across mammalian genomes of much more sequence than can be explained by protein-coding regions indicates that many, and perhaps most, functional elements in the genome remain unknown. The investigation of the vast quantity of sequence information in the human genome whose function remains unknown is currently a major avenue of scientific inquiry. Meanwhile, considering the genome's DNA information as a whole could provide new ways to understand a possible global-level function of noncoding DNA.

Information content

The 2.9 billion base pairs of the haploid human genome correspond to a maximum of about 725 megabytes of data, since every base pair can be coded by 2 bits (2.9 billion × 2 bits ÷ 8 bits per byte). Since individual genomes vary by less than 1% from each other, an individual genome can be losslessly compressed to roughly 4 megabytes when stored as its differences from a reference sequence.
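The 2-bit arithmetic can be checked with a short Python sketch. The bit assignment in `CODE` and the function names are illustrative assumptions, not part of any standard genome file format; megabytes are taken as 10^6 bytes.

```python
# Hypothetical 2-bit code; which base gets which bit pattern is arbitrary.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def raw_megabytes(base_pairs: int, bits_per_base: int = 2) -> float:
    """Storage needed for `base_pairs` bases, in megabytes of 10**6 bytes."""
    return base_pairs * bits_per_base / 8 / 1_000_000


def pack(sequence: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(sequence), 4):
        chunk = sequence[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so every byte holds four 2-bit slots.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


print(raw_megabytes(2_900_000_000))  # 725.0
print(len(pack("ATTCGATTCGATTCG")))  # 4 (bytes for 15 bases)
```

A real encoding would also need to record the sequence length and handle unknown bases (commonly written N), which this sketch ignores.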

The entropy rate of the genome differs significantly between coding and non-coding sequences. It is close to the maximum of 2 bits per base pair for the coding sequences (about 45 million base pairs), but less for the non-coding parts. It ranges between 1.5 and 1.9 bits per base pair for the individual chromosomes, except for the Y-chromosome, which has an entropy rate below 0.9 bits per base pair.
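As a rough illustration of what "bits per base pair" measures, the Python sketch below computes the Shannon entropy of a sequence from its single-base frequencies. This is a simplification and an assumption on our part: a true entropy rate also accounts for dependence between neighbouring bases, so this zeroth-order figure is only an upper bound on it, and the article does not say how its own values were obtained. The function name is hypothetical.

```python
import math
from collections import Counter


def base_entropy(sequence: str) -> float:
    """Zeroth-order Shannon entropy in bits per base.

    Uses only single-base frequencies, so it ignores any correlation
    between neighbouring bases and overestimates the true entropy rate.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    counts = Counter(sequence)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


# Four equally frequent bases reach the 2-bit maximum.
print(base_entropy("ACGT" * 10))  # 2.0
# A skewed composition carries less information per base.
print(base_entropy("AAAAAAAT") < 2.0)  # True
```

Highly repetitive sequence, such as that found on the Y chromosome, lowers the entropy rate mainly through repeated patterns rather than skewed base composition, which is why a context-aware estimator is needed to reproduce figures like those quoted above.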

Information content of the haploid human genome by chromosome:

Haploid means we only count one of each chromosome pair. For this reason, the total information content for a woman (XX) is less than for a man (XY), where both the X and the Y are counted.
Columns: million base pairs (Mbp); megabytes (raw data); megabytes (zipped ASCII text, Human Genome Project file); entropy rate in bits per base pair.

Chromosome    Mbp      MB (raw)   MB (zipped)   Entropy rate
total (XY)    3,080    770        827           1.70
total (XX)    3,022    756        819           1.71
1             247      61.8       65.1          1.82
2             243      60.7       68.2          1.80
3             199      49.9       57.4          1.82
4             191      47.8       52.3          1.82
5             181      45.2       51.3          1.83
6             171      42.7       48.8          1.82
7             159      39.7       45.3          1.81
8             146      36.6       38.6          1.83
9             140      35.1       33.9          1.59
10            135      33.9       39.1          1.83
11            134      33.6       39.8          1.84
12            132      33.1       38.8          1.59
13            114      28.5       28.8          1.56
14            106      26.6       26.5          1.53
15            100      25.1       22.9          1.66
16            89       22.2       22.5          1.82
17            79       19.7       22.7          1.87
18            76       19.0       22.2          1.58
19            63       16.0       16.4          1.86
20            62       15.6       18.9          1.82
21            47       11.7       10.4          1.62
22            50       12.4       10.4          1.83
X             155      38.7       38.6          1.80
Y             58       14.4       8.0           0.84

Sequencing

DNA sequencing determines the order of the nucleotide bases in a genome.

Composite

The Human Genome Project and a parallel project by Celera Genomics each produced and published a haploid human genome sequence, both of which were a composite of the DNA sequence of several individuals.

Personal

A personal genome sequence is a complete sequencing of the chemical base pairs that make up the DNA of a single person. Because genetic variations such as single-nucleotide polymorphisms (SNPs) cause medical treatments to affect different people differently, the analysis of personal genomes may lead to personalized medical treatment based on individual genotypes.

The completion of the fifth such map was announced in December 2008. The genome mapped was that of a Korean researcher, Seong-Jin Kim. Genome maps had previously been completed for Craig Venter of the U.S. in 2007, James Watson of the U.S. in April 2008, Yang Huanming of China in November 2008, and Dan Stoicescu in January 2008.

Personal genomes had not been sequenced in the Human Genome Project to protect the identity of volunteers who provided DNA samples. That sequence was derived from the DNA of several volunteers from a diverse population. Another distinction is that the HGP sequence is haploid, whereas the sequence maps for Venter and Watson, for example, are diploid, representing both sets of chromosomes.

Kim's genome had 1.58 million SNPs that had never been reported before, which was taken to indicate that six out of 10,000 DNA bases are unique to Koreans. Kim's sequence map can be used to assist in building a standard Korean genome, against which the genomes of other Korean individuals can then be compared for personalized medical treatments.

Mapping

Whereas a genome sequence lists the order of every DNA base in a genome, a genome map identifies the landmarks. A genome map is less detailed than a genome sequence and aids in navigating around the genome.

Variation

An example of a variation map is the HapMap being developed by the International HapMap Project. The HapMap is a haplotype map of the human genome, "which will describe the common patterns of human DNA sequence variation." It catalogs the patterns of small-scale variations in the genome that involve single DNA letters, or bases.

Researchers published the first sequence-based map of large-scale structural variation across the human genome in the journal Nature in May 2008. Large-scale structural variations are differences in the genome among people that range from a few thousand to a few million DNA bases; some are gains or losses of stretches of genome sequence and others appear as re-arrangements of stretches of sequence. These variations include differences in the number of copies individuals have of a particular gene, deletions, translocations and inversions.

Variation

Most studies of human genetic variation have focused on single-nucleotide polymorphisms (SNPs), which are substitutions in individual bases along a chromosome. Most analyses estimate that SNPs occur once in every 1,000 base pairs, on average, in the euchromatic human genome, although they do not occur at a uniform density. Thus follows the popular statement that "we are all, regardless of race, genetically 99.9% the same", although this would be somewhat qualified by most geneticists. For example, a much larger fraction of the genome is now thought to be involved in copy number variation. A large-scale collaborative effort to catalog SNP variations in the human genome is being undertaken by the International HapMap Project.
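The step from "1 SNP per 1,000 base pairs" to "99.9% the same" is simple arithmetic, sketched below in Python. The function names are hypothetical, and the 3 billion base pair figure is the approximate haploid genome size given earlier in the article; the calculation counts only single-base substitutions and ignores copy number and other structural variation.

```python
def expected_snp_count(genome_bp: int, bases_per_snp: int = 1000) -> int:
    """Expected number of SNP sites at an average of one per `bases_per_snp`."""
    return genome_bp // bases_per_snp


def percent_identical(bases_per_snp: int = 1000) -> float:
    """Percentage of positions that match when one in `bases_per_snp` differs."""
    return 100 - 100 / bases_per_snp


# Roughly 3 million differing positions in a ~3 billion bp haploid genome.
print(expected_snp_count(3_000_000_000))  # 3000000
print(round(percent_identical(), 1))      # 99.9
```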

The genomic loci and length of certain types of small repetitive sequences are highly variable from person to person, which is the basis of DNA fingerprinting and DNA paternity testing technologies. The heterochromatic portions of the human genome, which total several hundred million base pairs, are also thought to be quite variable within the human population (they are so repetitive and so long that they cannot be accurately sequenced with current technology). These regions contain few genes, and it is unclear whether any significant phenotypic effect results from typical variation in repeats or heterochromatin.

Most gross genomic mutations in germ cells (gametes) probably result in inviable embryos; however, a number of human diseases are related to large-scale genomic abnormalities. Down syndrome, Turner syndrome, and a number of other diseases result from nondisjunction of entire chromosomes. Cancer cells frequently have aneuploidy of chromosomes and chromosome arms, although a cause and effect relationship between aneuploidy and cancer has not been established.

Genetic disorders

Most aspects of human biology involve both genetic (inherited) and non-genetic (environmental) factors. Some inherited variation influences aspects of our biology that are not medical in nature (height, eye color, ability to taste or smell certain compounds, etc.). Moreover, some genetic disorders only cause disease in combination with the appropriate environmental factors (such as diet). With these caveats, genetic disorders may be described as clinically defined diseases caused by genomic DNA sequence variation. In the most straightforward cases, the disorder can be associated with variation in a single gene. For example, cystic fibrosis is caused by mutations in the CFTR gene and is the most common recessive disorder in Caucasian populations, with over 1,300 different mutations known. Disease-causing mutations in specific genes are usually severe in terms of gene function and are fortunately rare; genetic disorders are thus individually rare as well. However, since there are many genes that can vary to cause genetic disorders, in aggregate they make up a significant component of known medical conditions, especially in pediatric medicine. Molecularly characterized genetic disorders are those for which the underlying causal gene has been identified; currently there are approximately 2,200 such disorders annotated in the OMIM database.

Genetic disorders are often investigated through family-based studies. In some instances population-based approaches are employed, particularly in the case of so-called founder populations such as those in Finland, French Canada, Utah, Sardinia, etc. Diagnosis and treatment of genetic disorders are usually performed by a geneticist-physician trained in clinical or medical genetics. The results of the Human Genome Project are likely to provide increased availability of genetic testing for gene-related disorders, and eventually improved treatment. Parents can be screened for hereditary conditions and counselled on the consequences, the probability that a condition will be inherited, and how to avoid or ameliorate it in their offspring.

As noted above, there are many different kinds of DNA sequence variation, ranging from complete extra or missing chromosomes down to single nucleotide changes. It is generally presumed that much naturally occurring genetic variation in human populations is phenotypically neutral, i.e. has little or no detectable effect on the physiology of the individual (although there may be fractional differences in fitness defined over evolutionary time frames). Genetic disorders can be caused by any or all known types of sequence variation. To molecularly characterize a new genetic disorder, it is necessary to establish a causal link between a particular genomic sequence variant and the clinical disease under investigation. Such studies constitute the realm of human molecular genetics.

With the advent of the Human Genome Project and the International HapMap Project, it has become feasible to explore subtle genetic influences on many common disease conditions such as diabetes, asthma, migraine and schizophrenia. Although some causal links have been made between genomic sequence variants in particular genes and some of these diseases, often with much publicity in the general media, these conditions are usually not considered to be genetic disorders per se, as their causes are complex and involve many different genetic and environmental factors. Thus there may be disagreement in particular cases about whether a specific medical condition should be termed a genetic disorder.

Evolution

Comparative genomics studies of mammalian genomes suggest that approximately 5% of the human genome has been conserved by evolution since the divergence of extant lineages approximately 200 million years ago. This conserved fraction contains the vast majority of genes. Intriguingly, since genes and known regulatory sequences probably comprise less than 2% of the genome, this suggests that there may be more unknown functional sequence than known functional sequence. A smaller, yet substantial, fraction of human genes seems to be shared among most known vertebrates. The published chimpanzee genome differs from the human genome by 1.23% in direct sequence comparisons. Around 20% of this figure is accounted for by variation within each species, leaving only ~1.06% consistent sequence divergence between humans and chimps at shared genes. This nucleotide-by-nucleotide difference is dwarfed, however, by the portion of each genome that is not shared, including around 6% of functional genes that are unique to either humans or chimps. In other words, the considerable observable differences between humans and chimps may be due as much or more to genome-level variation in the number, function and expression of genes as to DNA sequence changes in shared genes. Indeed, even within humans, a previously unappreciated amount of copy number variation (CNV) has been found, which can make up as much as 5 to 15% of the human genome. In other words, between humans there could be a difference of roughly 500,000,000 base pairs of DNA, some being active genes, others inactivated or active at different levels. The full significance of this finding remains to be seen. On average, a typical human protein-coding gene differs from its chimpanzee ortholog by only two amino acid substitutions; nearly one third of human genes have exactly the same protein translation as their chimpanzee orthologs. A major difference between the two genomes is human chromosome 2, which is equivalent to a fusion product of chimpanzee chromosomes 12 and 13 (later renamed chromosomes 2A and 2B, respectively).
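
The percentages quoted above can be turned into approximate base-pair counts with simple arithmetic. The following Python sketch is illustrative only: it assumes a rounded haploid genome size of 3 billion base pairs (the article's introduction gives "just over 3 billion"), and the names HAPLOID_GENOME_BP and percent_to_bp are hypothetical helpers, not part of any published tool. It also derives, from the 1.23% and ~1.06% figures, the share of observed human-chimpanzee divergence that would be attributable to variation within each species.

```python
# Rounded haploid genome size (assumption: "just over 3 billion" base pairs).
HAPLOID_GENOME_BP = 3_000_000_000


def percent_to_bp(percent, genome_bp=HAPLOID_GENOME_BP):
    """Convert a whole-number percentage of the genome into base pairs."""
    return genome_bp * percent // 100


# Copy number variation: quoted as 5 to 15% of the human genome.
cnv_low_bp = percent_to_bp(5)
cnv_high_bp = percent_to_bp(15)

# Human-chimpanzee divergence: 1.23% observed, ~1.06% consistent (fixed).
observed_divergence = 0.0123
fixed_divergence = 0.0106
within_species_share = (observed_divergence - fixed_divergence) / observed_divergence

print(f"CNV range: {cnv_low_bp:,} to {cnv_high_bp:,} bp")
print(f"Within-species share of observed divergence: {within_species_share:.1%}")
```

Under these assumptions, copy number variation spans about 150 to 450 million base pairs, the upper end of which is in line with the rough 500,000,000 figure. The two divergence percentages imply a within-species share of about 14%, so the "around 20%" quoted in the text is best read as a loose estimate.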

Humans have undergone an extraordinary loss of olfactory receptor genes during our recent evolution, which explains our relatively crude sense of smell compared to most other mammals. Evolutionary evidence suggests that the emergence of color vision in humans and several other primate species has diminished the need for the sense of smell.

Mitochondrial genome

The human mitochondrial genome, while usually not included when referring to the "human genome", is of tremendous interest to geneticists, since it undoubtedly plays a role in mitochondrial disease. It also sheds light on human evolution; for example, analysis of variation in the human mitochondrial genome has led to the postulation of a recent common ancestor for all humans on the maternal line of descent (see Mitochondrial Eve).

Because it lacks a system for checking copying errors, mitochondrial DNA (mtDNA) varies more rapidly than nuclear DNA. This 20-fold higher mutation rate allows mtDNA to be used for more accurate tracing of maternal ancestry. Studies of mtDNA in populations have allowed ancient migration paths to be traced, such as the migration of Native Americans from Siberia or of Polynesians from southeastern Asia. It has also been used to show that there is no trace of Neanderthal DNA in the European gene mixture inherited through purely maternal lineage.

Epigenome

Epigenetics describes a variety of features of the human genome that transcend its primary DNA sequence, such as chromatin packaging, histone modifications and DNA methylation, and that are important in regulating gene expression, genome replication and other cellular processes. Epigenetic markers strengthen or weaken transcription of certain genes but do not affect the actual sequence of DNA nucleotides.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 