GOR method
Encyclopedia
The GOR method is an information theory
Information theory
Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental limits on signal processing operations such as compressing data and on reliably storing and...

-based method for the prediction
Secondary structure prediction
Secondary structure prediction is a set of techniques in bioinformatics that aim to predict the secondary structures of proteins and nucleic acid sequences based only on knowledge of their primary structure...

 of secondary structure
Secondary structure
In biochemistry and structural biology, secondary structure is the general three-dimensional form of local segments of biopolymers such as proteins and nucleic acids...

s in protein
Protein
Proteins are biochemical compounds consisting of one or more polypeptides typically folded into a globular or fibrous form, facilitating a biological function. A polypeptide is a single linear polymer chain of amino acids bonded together by peptide bonds between the carboxyl and amino groups of...

s. It was developed in the late 1970s shortly after the simpler Chou-Fasman method
Chou-Fasman method
The Chou–Fasman method are an empirical technique for the prediction of secondary structures in proteins, originally developed in the 1970s. The method is based on analyses of the relative frequencies of each amino acid in alpha helices, beta sheets, and turns based on known protein structures...

. Like Chou-Fasman, the GOR method is based on probability
Probability
Probability is ordinarily used to describe an attitude of mind towards some proposition of whose truth we arenot certain. The proposition of interest is usually of the form "Will a specific event occur?" The attitude of mind is of the form "How certain are we that the event will occur?" The...

 parameters derived from empirical studies of known protein tertiary structure
Tertiary structure
In biochemistry and molecular biology, the tertiary structure of a protein or any other macromolecule is its three-dimensional structure, as defined by the atomic coordinates.-Relationship to primary structure:...

s solved by X-ray crystallography
X-ray crystallography
X-ray crystallography is a method of determining the arrangement of atoms within a crystal, in which a beam of X-rays strikes a crystal and causes the beam of light to spread into many specific directions. From the angles and intensities of these diffracted beams, a crystallographer can produce a...

. However, unlike Chou-Fasman, the GOR method takes into account not only the propensities of individual amino acid
Amino acid
Amino acids are molecules containing an amine group, a carboxylic acid group and a side-chain that varies between different amino acids. The key elements of an amino acid are carbon, hydrogen, oxygen, and nitrogen...

s to form particular secondary structures, but also the conditional probability
Conditional probability
In probability theory, the "conditional probability of A given B" is the probability of A if B is known to occur. It is commonly notated P, and sometimes P_B. P can be visualised as the probability of event A when the sample space is restricted to event B...

 of the amino acid to form a secondary structure given that its immediate neighbors have already formed that structure. The method is therefore essentially Bayesian in its analysis.

The GOR method analyzes sequences to predict alpha helix
Alpha helix
A common motif in the secondary structure of proteins, the alpha helix is a right-handed coiled or spiral conformation, in which every backbone N-H group donates a hydrogen bond to the backbone C=O group of the amino acid four residues earlier...

, beta sheet
Beta sheet
The β sheet is the second form of regular secondary structure in proteins, only somewhat less common than the alpha helix. Beta sheets consist of beta strands connected laterally by at least two or three backbone hydrogen bonds, forming a generally twisted, pleated sheet...

, turn
Turn (biochemistry)
A turn is an element of secondary structure in proteins where the polypeptide chain reverses its overall direction.- Definition :According to the most common definition, a turn is a structural motif where the Cα atoms of two residues separated by few peptide bonds are in close approach A turn is...

, or random coil
Random coil
A random coil is a polymer conformation where the monomer subunits are oriented randomly while still being bonded to adjacent units. It is not one specific shape, but a statistical distribution of shapes for all the chains in a population of macromolecules...

 secondary structure at each position based on 17-amino-acid sequence windows. The original description of the method included four scoring matrices of size 17×20, where the columns correspond to the log-odds score, which reflects the probability of finding a given amino acid at each position in the 17-residue sequence. The four matrices reflect the probabilities of the central, ninth amino acid being in a helical, sheet, turn, or coil conformation. In subsequent revisions to the method, the turn matrix was eliminated due to the high variability of sequences in turn regions (particularly over such a large window). The method was considered as best requiring at least four contiguous residues to score as alpha helices to classify the region as helical, and at least two contiguous residues for a beta sheet.

The mathematics and algorithm of the GOR method were based on an earlier series of studies by Robson and colleagues reported mainly in the Journal of Molecular Biology (e.g. ) and The Biochemical Journal (e.g. ). The latter describes the information theoretic expansions in terms of conditional information measures. The use of the word "simple" in the title of the GOR paper reflected the fact that the above earlier methods provided proofs and techniques somewhat daunting by being rather unfamiliar in protein science in the early 1970s; even Bayes methods were then unfamiliar and controversial. An important feature of these early studies, which survived in the GOR method, was the treatment of the sparse protein sequence data of the early 1970s by expected information measures. That is, expectations on a Bayesian basis considering the distribution of plausible information measure values given the actual frequencies (numbers of observations). The expectation measures resulting from integration over this and similar distributions may now be seen as composed of “incomplete” or extended zeta functions, e.g. z(s,observed frequency) − z(s,expected frequency) with incomplete zeta function z(s, n) = 1 + (1/2)s + (1/3)s+ (1/4)s + …. +(1/n)s. The GOR method used s=1. Also, in the GOR method and the earlier methods, the measure for the contrary state to e.g. helix H, i.e. ~H, was subtracted from that for H, and similarly for beta sheet, turns, and coil or loop. Thus the method can be seen as employing a zeta function estimate of log predictive odds. An adjustable decision constant could also be applied, which thus also implies a decision theory approach; the GOR method allowed the option to use decision constants to optimize predictions for different classes of protein. The expected information measure used as a basis for the information expansion was less important by the time of publication of the GOR method because protein sequence data became more plentiful, at least for the terms considered at that time. Then, for s=1, the expression z(s,observed frequency) −
z(s,expected frequency) approaches the natural logarithm of (observed frequency / expected frequency) as frequencies increase. However, this measure (including use of other values of s) remains important in later more general applications with high dimensional data, where data for more complex terms in the information expansion are inevitably sparse (e.g. ).
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK