Clustal
Clustal is a widely used multiple sequence alignment computer program. The latest version is 2.1. There are two main variations:
  • ClustalW: This version has a command-line interface.
  • ClustalX: This version has a graphical user interface. It is available for Windows, Mac OS, and Unix/Linux.


This program is available from the Clustal Homepage or the European Bioinformatics Institute FTP server (ftp://ftp.ebi.ac.uk/pub/software/).

Input/Output

This program accepts a wide range of input formats, including NBRF/PIR, FASTA, EMBL/Swiss-Prot, Clustal, GCG/MSF, GCG9 RSF, and GDE.
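To illustrate one of these input formats, the following is a minimal sketch of a FASTA reader in Python. It is not part of Clustal; the function name `read_fasta` and the example sequences are hypothetical, and the sketch assumes that the sequence identifier is the first word after the ">" character.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {name: sequence} dictionary."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: keep the identifier, drop the free-text comment.
            parts = line[1:].split()
            name = parts[0] if parts else f"unnamed_{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            # Sequence line: append it to the current record.
            records[name] += line
    return records


# Hypothetical example input with two short protein sequences.
example = """>seq1 first example
MKTAYIAK
QRQISFVK
>seq2
MKTAYIAR
""".splitlines()

sequences = read_fasta(example)
print(sequences)
```

A sequence may span several lines, so the reader concatenates every line that follows a header until the next header appears.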

The output format can be one or more of the following: Clustal, NBRF/PIR, GCG/MSF, PHYLIP, GDE, or NEXUS.

Multiple sequence alignment

There are three main steps:
  1. Do a pairwise alignment
  2. Create a phylogenetic tree (or use a user-defined tree)
  3. Use the phylogenetic tree to carry out a multiple alignment


These are done automatically when you select "Do Complete Alignment".
Other options are "Do Alignment from guide tree" and "Produce guide tree only".
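
The first two steps can be sketched in Python as follows. This is an illustration under stated assumptions, not Clustal's actual implementation: the pairwise step uses a plain Needleman-Wunsch alignment with a linear gap cost, the distance is one minus the fraction of identical columns, and the tree is built with UPGMA clustering, chosen here only for brevity. The function names, scoring values, and example sequences are all hypothetical. The third step (aligning profiles along the tree) is omitted.

```python
from itertools import combinations


def nw_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b (Needleman-Wunsch, linear gap cost) and
    return the fraction of alignment columns holding identical residues."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, counting identical columns.
    i, j, same, cols = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        cols += 1
    return same / cols if cols else 1.0


def upgma(names, dist):
    """Build a guide tree as nested tuples; dist maps frozenset({x, y})
    to the distance between two leaves."""
    size = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(size) > 1:
        # Merge the two closest clusters.
        x, y = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged, total = (x, y), size[x] + size[y]
        for z in size:
            if z != x and z != y:
                # Size-weighted average distance to the new cluster.
                d[frozenset((merged, z))] = (
                    size[x] * d[frozenset((x, z))]
                    + size[y] * d[frozenset((y, z))]) / total
        del size[x], size[y]
        size[merged] = total
    return next(iter(size))


# Hypothetical example sequences.
seqs = {"seqA": "GATTACA", "seqB": "GATTACC", "seqC": "TTTTTTT"}
dist = {frozenset((p, q)): 1.0 - nw_identity(seqs[p], seqs[q])
        for p, q in combinations(seqs, 2)}
tree = upgma(list(seqs), dist)
print(tree)
```

In a progressive alignment, the guide tree determines the order in which sequences are merged: the most similar sequences are aligned first, and more distant ones are added later.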

Settings

Users can align the sequences using the default settings, but occasionally it may be useful to customize the parameters.

The main parameters are the gap opening penalty and the gap extension penalty.
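
As a sketch of how these two parameters might be set from a script, the following Python function builds a ClustalW command line. The executable name `clustalw2`, the input file name, and the penalty values are assumptions made for illustration; the `-INFILE`, `-ALIGN`, `-GAPOPEN`, and `-GAPEXT` options should be checked against the help output of the installed version before use.

```python
from typing import List


def clustalw_command(infile: str, gap_open: float, gap_ext: float,
                     executable: str = "clustalw2") -> List[str]:
    """Return an argument list for a full multiple alignment run.

    The executable name is an assumption; some installations use a
    different name for the command-line program.
    """
    return [
        executable,
        f"-INFILE={infile}",     # input sequences
        "-ALIGN",                # do the complete alignment
        f"-GAPOPEN={gap_open}",  # gap opening penalty
        f"-GAPEXT={gap_ext}",    # gap extension penalty
    ]


# Hypothetical file name and penalty values.
cmd = clustalw_command("sequences.fasta", gap_open=10.0, gap_ext=0.2)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a system where the program is installed. A higher gap opening penalty makes the alignment less likely to introduce new gaps, while a higher gap extension penalty favors shorter gaps.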

See also

  • Sequence alignment software
  • T-Coffee
  • Align-m
  • DIALIGN-T
  • DIALIGN-TX
  • JAligner
  • MAFFT
  • MAVID
  • MUSCLE
  • ProbCons



The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 