Consensus sequence
Encyclopedia
In molecular biology
Molecular biology
Molecular biology is the branch of biology that deals with the molecular basis of biological activity. This field overlaps with other areas of biology and chemistry, particularly genetics and biochemistry...

 and bioinformatics
Bioinformatics
Bioinformatics is the application of computer science and information technology to the field of biology and medicine. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software...

, consensus sequence refers to the most common nucleotide or amino acid at a particular position after multiple sequences are aligned. A consensus sequence is a way of representing the results of a multiple sequence alignment
Sequence alignment
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are...

, where related sequences are compared to each other, and similar functional sequence motif
Sequence motif
In genetics, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and has, or is conjectured to have, a biological significance...

s are found. The consensus sequence shows which residues are most abundant in the alignment at each position.

Developing software for pattern recognition
Pattern recognition
In machine learning, pattern recognition is the assignment of some sort of output value to a given input value , according to some specific algorithm. An example of pattern recognition is classification, which attempts to assign each input value to one of a given set of classes...

 is a major topic in genetics
Genetics
Genetics , a discipline of biology, is the science of genes, heredity, and variation in living organisms....

, molecular biology
Molecular biology
Molecular biology is the branch of biology that deals with the molecular basis of biological activity. This field overlaps with other areas of biology and chemistry, particularly genetics and biochemistry...

, and bioinformatics
Bioinformatics
Bioinformatics is the application of computer science and information technology to the field of biology and medicine. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software...

. Specific sequence motif
Sequence motif
In genetics, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and has, or is conjectured to have, a biological significance...

s can function as regulatory sequence
Regulatory sequence
A regulatory sequence is a segment of DNA where regulatory proteins such as transcription factors bind preferentially. These regulatory proteins bind to short stretches of DNA called regulatory regions, which are appropriately positioned in the genome, usually a short distance 'upstream' of the...

s controlling biosynthesis, or as signal sequence
Signal sequence
Signal sequence can refer to:*Protein targeting*Signal peptide*DNA uptake signal sequence...

s that direct a molecule to a specific site within the cell or regulate its maturation. Since the regulatory function of these sequences is important, they are thought to be conserved across long periods of evolution
Evolution
Evolution is any change across successive generations in the heritable characteristics of biological populations. Evolutionary processes give rise to diversity at every level of biological organisation, including species, individual organisms and molecules such as DNA and proteins.Life on Earth...

. In some cases, evolutionary relatedness can be estimated by the amount of conservation of these sites.

The conserved sequence motifs are called consensus sequences and they show which residues are conserved and which residues are variable. Consider the following example DNA
DNA
Deoxyribonucleic acid is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms . The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in...

 sequence:
A[CT]N{A}YR


In this notation, A means that an A is always found in that position; [CT] stands for either C or T; N stands for any base; and {A} means any base except A. Y represents any pyrimidine
Pyrimidine
Pyrimidine is a heterocyclic aromatic organic compound similar to benzene and pyridine, containing two nitrogen atoms at positions 1 and 3 of the six-member ring...

, and R indicates any purine
Purine
A purine is a heterocyclic aromatic organic compound, consisting of a pyrimidine ring fused to an imidazole ring. Purines, including substituted purines and their tautomers, are the most widely distributed kind of nitrogen-containing heterocycle in nature....

.

In this example, the notation [CT] does not give any indication of the relative frequency of C or T occurring at that position. An alternative method of representing a consensus sequence uses a sequence logo
Sequence logo
In bioinformatics, a sequence logo is a graphical representation of the sequence conservation of nucleotides or amino acids .-Logo creation:...

. This is a graphical representation of the consensus sequence, in which the size of a symbol is related to the frequency that a given nucleotide (or amino acid) occurs at a certain position. In sequence logos the more conserved the residue, the larger the symbol for that residue is drawn, the less frequent, the smaller the symbol. Sequence logos can be generated using WebLogo, or using the Gestalt Workbench, a publicly availablable visualization tool written by Gustavo Glusman at the Institute for Systems Biology. Further discussion on the limitations of consensus sequences is given in a paper 'Consensus Sequence Zen'.

A protein binding site, represented by a consensus sequence, may be a short sequence of nucleotide
Nucleotide
Nucleotides are molecules that, when joined together, make up the structural units of RNA and DNA. In addition, nucleotides participate in cellular signaling , and are incorporated into important cofactors of enzymatic reactions...

s which is found several times in the genome
Genome
In modern molecular biology and genetics, the genome is the entirety of an organism's hereditary information. It is encoded either in DNA or, for many types of virus, in RNA. The genome includes both the genes and the non-coding sequences of the DNA/RNA....

 and is thought to play the same role in its different locations. For example, many transcription factors recognize particular patterns in the promoters of the gene
Gene
A gene is a molecular unit of heredity of a living organism. It is a name given to some stretches of DNA and RNA that code for a type of protein or for an RNA chain that has a function in the organism. Living beings depend on genes, as they specify all proteins and functional RNA chains...

s they regulate. In the same way restriction enzymes usually have palindromic consensus sequences, usually corresponding to the site where they cut the DNA. Transposons act in much the same manner in their identification of target sequences for transposition. Finally splice sites (sequences immediately surrounding the exon
Exon
An exon is a nucleic acid sequence that is represented in the mature form of an RNA molecule either after portions of a precursor RNA have been removed by cis-splicing or when two or more precursor RNA molecules have been ligated by trans-splicing. The mature RNA molecule can be a messenger RNA...

-intron
Intron
An intron is any nucleotide sequence within a gene that is removed by RNA splicing to generate the final mature RNA product of a gene. The term intron refers to both the DNA sequence within a gene, and the corresponding sequence in RNA transcripts. Sequences that are joined together in the final...

 boundaries) can also be considered as consensus sequences.

Thus a consensus sequence is a model for a putative DNA recognition site: it is obtained by aligning all known examples of a certain recognition site and defined as the idealized sequence that represents the predominant base at each position. All the actual examples shouldn't differ from the consensus by more than a few substitutions, but counting mismatches in this way can lead to inconsistencies.

Any mutation allowing a mutated nucleotide in the core promoter sequence to look more like the consensus sequence is known as an up mutation. This kind of mutation will generally make the promoter stronger and thus the RNA polymerase forms a tighter bind to the DNA it wishes to transcribe and transcription is up regulated. On the contrary, mutations that destroy conserved nucleotides in the consensus sequence are known as down mutations. These types of mutations down regulate transcription since RNA polymerase can no longer bind as tightly to the core promoter sequence.

See also

  • Degenerate nucleotide
  • Position-specific scoring matrix
    Position-specific scoring matrix
    A position weight matrix , also called position-specific weight matrix or position-specific scoring matrix , is a commonly used representation of motifs in biological sequences....

  • Sequence motif
    Sequence motif
    In genetics, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and has, or is conjectured to have, a biological significance...

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK