SOSUI
SOSUI is a free online tool that predicts part of the secondary structure of proteins from a given amino acid sequence (AAS). Its main objective is to determine whether the protein in question is a soluble or a transmembrane protein.
History
SOSUI's algorithm was developed in 1996 at Tokyo University of Agriculture and Technology. The name means roughly "hydrophobic" in Japanese, an allusion to its molecular "clients".
How SOSUI works
First, SOSUI looks for α helices, which are relatively easy to predict from the known helical potentials of the amino acids in the given sequence. The much more difficult task is to differentiate between the α helices of soluble proteins and those of transmembrane proteins, since the α helix is a very common secondary structure pattern in proteins.
SOSUI uses four characteristics of the AAS in its prediction:
- "hydropathy index" (Kyte and Doolittle 1982)
- weighted presence of amphiphilic amino acids (AA) and their localization: "amphiphilicity index"
- the charge of the amino acids
- the length of the AAS
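The first characteristic in the list can be illustrated with a sliding-window hydropathy profile. The following is a minimal sketch, not SOSUI's actual implementation: the scale values are those published by Kyte and Doolittle, while the function name `hydropathy_profile` and the default window of 19 residues are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq.

    The window size is an illustrative assumption, not a SOSUI parameter.
    Sequences shorter than the window yield an empty list.
    """
    seq = seq.upper()
    if window < 1 or len(seq) < window:
        return []
    values = [KD[aa] for aa in seq]  # raises KeyError on non-standard letters
    return [
        sum(values[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]
```

Stretches where the profile stays clearly positive over about a window's length are candidates for hydrophobic, possibly membrane-spanning helices; the remaining three characteristics are what help separate these from hydrophobic helices in soluble proteins.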
An important improvement over Kyte and Doolittle's hydropathy index, which relies on a single characteristic, is the introduction of the so-called "amphiphilicity index". It is calculated by assigning every amino acid with an amphiphilic side chain a value derived from its molecular structure. To meet SOSUI's criteria for amphiphilicity, the polar, hydrophilic group must not be attached directly to the beta carbon; at least one apolar carbon must lie in between. Therefore only lysine, arginine, histidine, glutamic acid, glutamine, tryptophan and tyrosine are relevant.
SOSUI then looks for accumulations of amphiphilic amino acids at the ends of α helices, which seem to be typical of transmembrane α helices. Placing amphiphilic amino acids at the lipid-water boundary makes the membrane-spanning position the energetically most favorable one for these helices and thus contributes to the protein's correct localization.
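The idea of looking for amphiphilic amino acids at helix ends can be sketched as follows. This is a simplified illustration under stated assumptions: it counts the seven amino acids named above without weighting them, whereas the real amphiphilicity index assigns each a value that is not given here. The function name `amphiphilic_end_counts` and the flank width of five residues are hypothetical.

```python
# One-letter codes for lysine, arginine, histidine, glutamic acid,
# glutamine, tryptophan and tyrosine (the amino acids listed in the text).
AMPHIPHILIC = set("KRHEQWY")


def amphiphilic_end_counts(seq, start, end, flank=5):
    """Count amphiphilic residues at both ends of the helix seq[start:end].

    Returns (count in the first `flank` residues, count in the last `flank`
    residues). Counts are unweighted, which is a simplification. If the
    helix is shorter than 2 * flank, the two end regions overlap.
    """
    helix = seq.upper()[start:end]
    n_term = helix[:flank]
    c_term = helix[-flank:] if flank > 0 else ""
    return (
        sum(aa in AMPHIPHILIC for aa in n_term),
        sum(aa in AMPHIPHILIC for aa in c_term),
    )
```

For a candidate helix such as `"KWLLLLLLLLLLLLLLLLYR"` with `flank=3`, both ends contain two amphiphilic residues, the pattern the text describes as typical of transmembrane helices.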
The charge of the amino acids is also taken into consideration. The length matters because biological lipid membranes have a characteristic thickness, which determines how long a membrane-spanning segment must be.
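The link between membrane thickness and segment length can be made concrete with a small calculation. This sketch assumes an α helix oriented perpendicular to the membrane with the standard rise of 1.5 Å per residue, and a hydrophobic core of about 30 Å, which is a typical textbook figure rather than a value taken from SOSUI. The function name `residues_to_span` is hypothetical.

```python
import math


def residues_to_span(thickness_angstrom=30.0, rise_per_residue=1.5):
    """Estimate how many helical residues are needed to cross a membrane.

    Both defaults are illustrative assumptions: roughly 30 angstroms for the
    hydrophobic core of a lipid bilayer and 1.5 angstroms of axial rise per
    residue in an alpha helix.
    """
    return math.ceil(thickness_angstrom / rise_per_residue)


print(residues_to_span())  # 20
```

Under these assumptions a hydrophobic helix needs about 20 residues to span the membrane, which is why much shorter hydrophobic stretches are unlikely to be transmembrane segments.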
According to a study published by SOSUI's developers, it correctly distinguished 99% of a chosen group of proteins with known structure (Hirokawa et al. 1998). However, another study, which ran several prediction tools on the amino acid sequences of 122 known proteins, reported that SOSUI predicted the correct number of α helices in only about 60% of cases (Ikeda et al. 2000). Even if the number of transmembrane domains is not always exact, the differentiation between soluble and transmembrane proteins often works, because it only requires finding out whether a protein has such a domain at all. Membrane proteins that lack transmembrane α helices (e.g. porins) or that are anchored to the membrane by a covalent bond cannot be detected by SOSUI.
Results
The result page first shows general information (sequence length, average hydrophobicity). If the protein in question is a transmembrane protein, the number of transmembrane domains and their positions are listed. A hydropathy profile with the hydrophobic parts highlighted in color and helical wheel diagrams of the potential transmembrane domains are shown as well. The last image gives a schematic overview of the transmembrane protein's location.
Sources
- Hirokawa, Boon-Chieng, Mitaku, SOSUI: Classification and secondary structure prediction system for membrane proteins, Bioinformatics 14(4): 378–379 (1998) http://bioinformatics.oxfordjournals.org/cgi/reprint/14/4/378
- Masami Ikeda, Masafumi Arai, Toshio Shimizu, Evaluation of transmembrane topology prediction methods by using an experimentally characterized topology dataset, Genome Informatics 11: 426–427 (2000) http://www.jsbi.org/journal/GIW00/GIW00P094.pdf