Sequence logo
In bioinformatics, a sequence logo is a graphical representation of the sequence conservation of nucleotides (in a strand of DNA/RNA) or amino acids (in protein sequences).

Logo creation

To create sequence logos, related DNA, RNA or protein sequences, or DNA sequences that share conserved binding sites, are aligned so that the most conserved parts line up. A sequence logo can then be created from the resulting multiple sequence alignment. The logo shows how well residues are conserved at each position: the fewer distinct residues occur at a position, the better the conservation there and the taller the letters. Different residues at the same position are scaled according to their frequency. The height of the entire stack of residues is the information content of that position, measured in bits. Sequence logos can be used to represent conserved DNA binding sites, where transcription factors bind.
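
The first step described above, turning an alignment into per-position residue frequencies, can be sketched in Python as follows. This is a minimal illustration, not the method of any particular logo tool: the function name `column_frequencies` and the toy alignment `sites` are hypothetical, and the sketch assumes gap-free sequences of equal length.

```python
from collections import Counter


def column_frequencies(alignment):
    """Return one dict per alignment column, mapping residue -> relative frequency.

    Assumes every sequence has the same length and contains no gap characters.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    freqs = []
    for i in range(length):
        # Count how often each residue occurs in column i.
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts.values())
        freqs.append({res: c / total for res, c in counts.items()})
    return freqs


# Hypothetical toy alignment of four DNA binding sites.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
freqs = column_frequencies(sites)
```

In this toy alignment the first column contains only T, so it is fully conserved, while the third column is split between T and C. Within a stack, letters would be scaled by these frequencies.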

The information content (y-axis) of position $i$ is given by:

$$R_i = \log_2(20) - (H_i + e_n)$$ for amino acids,

$$R_i = 2 - (H_i + e_n)$$ for nucleic acids (since $\log_2(4) = 2$),

where $H_i$ is the uncertainty (sometimes called the Shannon entropy) of position $i$:

$$H_i = -\sum_{a} f_{a,i} \log_2 f_{a,i}$$

Here, $f_{a,i}$ is the relative frequency of base or amino acid $a$ at position $i$, and $e_n$ is the small-sample correction for an alignment of $n$ letters. The height of letter $a$ in column $i$ is given by

$$\text{height} = f_{a,i} \times R_i$$

The approximation for the small-sample correction, $e_n$, is given by:

$$e_n = \frac{1}{\ln 2} \times \frac{s - 1}{2n}$$

where $s$ is 4 for nucleotides, 20 for amino acids, and $n$ is the number of sequences in the alignment.
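
The calculation can be sketched in Python as below. The sketch assumes the usual definitions: information content R_i = log2(s) - (H_i + e_n), Shannon entropy H_i = -sum over a of f_{a,i} * log2(f_{a,i}), small-sample correction e_n = (s - 1) / (2 * ln(2) * n), and letter height f_{a,i} * R_i. The function name `logo_heights` and the example alignment are hypothetical, and the code assumes gap-free sequences of equal length.

```python
import math
from collections import Counter


def logo_heights(alignment, s=4):
    """Return one dict per column, mapping residue -> letter height in bits.

    s is the alphabet size: 4 for nucleotides, 20 for amino acids.
    """
    n = len(alignment)  # number of sequences in the alignment
    # Small-sample correction: e_n = (1 / ln 2) * (s - 1) / (2 n)
    e_n = (s - 1) / (2 * math.log(2) * n)

    columns = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment)
        freqs = {a: c / n for a, c in counts.items()}
        # Shannon entropy (uncertainty) of column i, in bits.
        h_i = -sum(f * math.log2(f) for f in freqs.values())
        # Information content of column i: the total stack height.
        r_i = math.log2(s) - (h_i + e_n)
        # Each letter takes a share of the stack proportional to its frequency.
        columns.append({a: f * r_i for a, f in freqs.items()})
    return columns


# Hypothetical toy alignment of four DNA binding sites.
heights = logo_heights(["TATAAT", "TATGAT", "TACAAT", "TATAAT"])
```

For a fully conserved nucleotide column the entropy is zero, so the stack height is 2 bits minus the correction. The sketch does not handle the case where the correction pushes R_i below zero, which can happen for variable columns in very small alignments.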

External links


Tools for creating sequence logos

  • WebLogo Python Code (BSD license, somewhat difficult to use)
  • WebLogo 3.0 (Online)
  • MoRAine (Online application with integrated binding site re-annotation)
  • GENIO (Online)
  • PWM-based logo (Online application for motif PWM-based models)
  • LogoBar (Java application)
  • CorreLogo An online server for 3D sequence logos of RNA and DNA alignments
  • seqlogo C function to generate DNA sequence logos
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 