UGENE

UGENE is free, open-source, cross-platform bioinformatics software.

It integrates dozens of well-known biological tools and algorithms and provides both graphical and command-line interfaces. Using the UGENE Workflow Designer, one can arrange the required tools and algorithms into a workflow schema.

To maximize performance, UGENE uses multicore CPUs and GPUs to optimize some of its computational routines. Another way to speed up computations is to use Amazon EC2 cloud resources.

Key features

The software supports the following features:
  • Creating, editing and annotating nucleic acid and protein sequences
  • Searching online databases: NCBI, PDB, UniProtKB/Swiss-Prot, UniProtKB/TrEMBL
  • Multiple sequence alignment: Clustal, MUSCLE, Kalign, MAFFT, T-Coffee
  • Online and local BLAST search
  • Restriction analysis with integrated REBASE restriction enzyme database
  • Integrated Primer3 package for PCR primer design
  • Search for direct, inverted and tandem repeats in DNA sequences
  • Constructing dotplots for nucleic acid sequences
  • Search for transcription factor binding sites (TFBS) with weight matrix and SITECON algorithms
  • Aligning short reads with Bowtie and UGENE genome aligner
  • Search for ORFs
  • Cloning in silico
  • 3D structure viewer for files in PDB and MMDB formats, anaglyph view support
  • Protein secondary structure prediction with GOR IV and PSIPRED algorithms
  • Integration of the HMMER2 and HMMER3 packages
  • Building (using integrated PHYLIP package) and viewing phylogenetic trees
  • Local sequence alignment with optimized Smith-Waterman algorithm
  • Combining various algorithms into custom workflows with UGENE Workflow Designer
  • Search for a pattern of various algorithms' results in a nucleic acid sequence with UGENE Query Designer
  • Visualization of next generation sequencing data (BAM files) using UGENE Assembly Browser
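
As an illustration of one item in the list above, local sequence alignment with the Smith-Waterman algorithm can be sketched as follows. This is a minimal, unoptimized version with a linear gap penalty. The function name `smith_waterman` and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for the example and do not reflect UGENE's optimized implementation.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Illustrative only: scoring values are assumed, gaps are scored linearly.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            # The floor of 0 is what makes the alignment local
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("AAACGTAAA", "TTTCGTTTT")` finds the shared `CGT` region. Production implementations typically use affine gap penalties and vectorized or GPU code, which this sketch omits.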

User interface

The software has the following main views for displaying biological data on the user’s screen:

1. The Sequence view is used to visualize, analyze and modify nucleic acid or protein sequences. Depending on the sequence type and the options selected, the following views can be presented inside the Sequence view window:
  • 3D structure view
  • Circular view
  • Chromatogram view
  • Dotplot view


2. The Alignment editor is used to visualize, analyze and modify a nucleic acid or protein multiple sequence alignment.

3. The Assembly Browser is used to visualize and browse next-generation sequencing data.

4. The Phylogenetic tree viewer.
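
The Dotplot view listed under the Sequence view compares two sequences by marking the positions where they match. A minimal sketch of the idea follows, assuming exact matching of a fixed-size window. The function names `dotplot` and `render` and the `window` parameter are illustrative and are not taken from UGENE.

```python
def dotplot(seq_x, seq_y, window=3):
    """Return the set of (i, j) where seq_x[i:i+window] == seq_y[j:j+window]."""
    # Index every window of seq_y by its content for quick lookup
    index = {}
    for j in range(len(seq_y) - window + 1):
        index.setdefault(seq_y[j:j + window], []).append(j)

    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in index.get(seq_x[i:i + window], []):
            dots.add((i, j))
    return dots


def render(seq_x, seq_y, dots):
    """Draw the plot as text: one row per seq_y position, '*' marks a match."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for i in range(len(seq_x)))
        for j in range(len(seq_y))
    )
```

Diagonal runs of `*` in the rendered plot correspond to regions that the two sequences share; a sequence plotted against itself shows repeats as off-diagonal runs.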

UGENE Workflow Designer

UGENE Workflow Designer allows creating and running complex computational workflow schemas.

The elements that a schema consists of correspond to the bulk of algorithms integrated into UGENE. Using the Workflow Designer one can also create custom workflow elements.

The workflow schemas can be run both locally and remotely, either from the graphical interface or from the command line.
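
Conceptually, a workflow schema is a chain of elements in which each element consumes the output of the previous one. The sketch below illustrates that idea only. It is not the Workflow Designer's schema format or API, and the names `run_workflow`, `reverse_complement`, and `min_length` are hypothetical.

```python
def run_workflow(elements, data):
    """Pass data through each element in order and return the final output."""
    for element in elements:
        data = element(data)
    return data


def reverse_complement(sequences):
    """Element: reverse-complement every DNA sequence (A, C, G, T only)."""
    pairs = str.maketrans("ACGT", "TGCA")
    return [s.translate(pairs)[::-1] for s in sequences]


def min_length(n):
    """Build a custom element that keeps sequences of at least n bases."""
    def element(sequences):
        return [s for s in sequences if len(s) >= n]
    return element


# A two-step schema: filter short sequences, then reverse-complement the rest
schema = [min_length(4), reverse_complement]
result = run_workflow(schema, ["ACGT", "AC", "AAACCC"])
```

Because every element has the same shape (a list of sequences in, a list of sequences out), a custom element such as `min_length(4)` can be placed anywhere in the chain.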

UGENE Query Designer

UGENE Query Designer allows a user to analyze a nucleotide sequence with several algorithms (repeats finder, ORF finder, weight matrix matching, etc.) while imposing constraints on the positional relationship of the results obtained from those algorithms.

A schema of the algorithms and constraints is either created in the GUI or edited as plain text.

The results are saved as a set of annotations to a specified file in the GenBank format.
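
The idea behind such a query can be sketched in a few lines: run two independent searches over a sequence, then keep only the pairs of results that satisfy a positional constraint. The sketch below is not the Query Designer schema format, which is not described here. The functions `find_orfs`, `find_motif`, `within_distance`, and `query`, as well as the distance constraint itself, are hypothetical stand-ins.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Yield (start, end) of forward-strand ORFs (ATG ... stop), end exclusive.

    min_codons counts the codons before the stop codon, including ATG.
    """
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    yield (start, pos + 3)
                start = None


def find_motif(seq, motif):
    """Yield (start, end) of every exact motif hit, overlapping hits included."""
    for m in re.finditer("(?=" + re.escape(motif) + ")", seq):
        yield (m.start(), m.start() + len(motif))


def within_distance(upstream, downstream, max_gap):
    """Constraint: upstream ends at most max_gap bases before downstream starts."""
    gap = downstream[0] - upstream[1]
    return 0 <= gap <= max_gap


def query(seq, motif, max_gap):
    """Return (motif_hit, orf) pairs where the motif lies shortly upstream of the ORF."""
    orfs = list(find_orfs(seq))
    return [(hit, orf)
            for hit in find_motif(seq, motif)
            for orf in orfs
            if within_distance(hit, orf, max_gap)]
```

Each returned pair could then be written out as annotations; tightening `max_gap` filters out pairs whose elements are too far apart.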

UGENE Assembly Browser

The UGENE Assembly Browser project was started in 2010 as an entry for the Illumina iDEA Challenge 2011. The Assembly Browser allows a user to visualize and browse large next-generation sequence assemblies of up to hundreds of millions of short reads. The only format currently supported is BAM, the binary version of SAM. To browse assembly data in UGENE, the input file must first be converted to a UGENE database file. This approach has both advantages and disadvantages. The disadvantages are that the conversion may take time for a large BAM file and that there must be enough disk space to store the database. On the other hand, the database makes it possible to get an overview of the whole assembly, navigate within it and jump to well-covered regions rather rapidly. In addition, before the conversion the user can choose which contigs to extract from the BAM file. In this way it is possible to open big files such as 1000 Genomes Project data.
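
The trade-off described above, a one-time conversion cost in exchange for fast navigation, can be illustrated with a small indexed database. The sketch below is a toy under stated assumptions: it uses Python's built-in `sqlite3` module, a made-up table layout (`reads` with `contig`, `start_pos`, `end_pos`), and in-memory tuples in place of a real BAM file. It does not reflect the actual layout of a UGENE database file.

```python
import sqlite3


def build_database(reads, wanted_contigs=None):
    """One-time conversion: load (contig, start, end) reads into an indexed table.

    If wanted_contigs is given, only reads on those contigs are stored.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE reads (contig TEXT, start_pos INTEGER, end_pos INTEGER)")
    rows = [r for r in reads if wanted_contigs is None or r[0] in wanted_contigs]
    db.executemany("INSERT INTO reads VALUES (?, ?, ?)", rows)
    # The index lets region queries avoid scanning every read
    db.execute("CREATE INDEX reads_by_pos ON reads (contig, start_pos)")
    db.commit()
    return db


def reads_in_region(db, contig, lo, hi):
    """Return reads overlapping the half-open region [lo, hi) on a contig."""
    cur = db.execute(
        "SELECT start_pos, end_pos FROM reads "
        "WHERE contig = ? AND start_pos < ? AND end_pos > ? "
        "ORDER BY start_pos, end_pos",
        (contig, hi, lo),
    )
    return cur.fetchall()


def coverage(db, contig, lo, hi):
    """Return the per-base read depth over [lo, hi)."""
    depth = [0] * (hi - lo)
    for start, end in reads_in_region(db, contig, lo, hi):
        for pos in range(max(start, lo), min(end, hi)):
            depth[pos - lo] += 1
    return depth
```

Selecting contigs at conversion time (the `wanted_contigs` argument here) keeps the database small, which mirrors the contig selection step described above.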

Supported biological data formats

  • Sequences and annotations: FASTA (.fa), GenBank (.gb), EMBL (.emb), GFF (.gff)
  • Multiple sequence alignments: Clustal (.aln), MSF (.msf), Stockholm (.sto), Nexus (.nex)
  • 3D structures: PDB (.pdb), MMDB (.prt)
  • Chromatograms: ABIF (.abi), SCF (.scf)
  • Short reads: Sequence Alignment/Map (SAM) (.sam), binary version of SAM (.bam), ACE (.ace), FASTQ (.fastq)
  • Phylogenetic trees: Newick (.nwk)
  • Other formats: Bairoch (enzymes info), HMM (HMMER profiles), PWM and PFM (position matrices), etc.
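
Of the formats listed above, FASTA is simple enough to parse in a few lines: each record starts with a `>` header line followed by one or more lines of sequence. The following minimal reader is a sketch for illustration (the function name `parse_fasta` is an assumption). It skips blank lines and does not validate sequence characters.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Sequence lines are joined without separators, so a record wrapped over several lines is returned as one continuous string.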

Release cycle

UGENE is primarily developed by Unipro LLC. Each iteration lasts about 6 weeks, and a release comes out at the end of each iteration. A development snapshot of the software can also be downloaded.

The features to be included in the next release are mostly proposed by users.

See also

  • Sequence alignment software
  • Bioinformatics
  • Computational biology
  • List of open source bioinformatics software

Related software

  • Discovery Studio
  • Gene Designer
  • Vector NTI
  • Geneious
  • CLC Main Workbench
  • MacVector
  • QuickGene
  • Ape
  • SerialCloner

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 