DNA-binding domain
A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence (a recognition sequence) or have a general affinity for DNA. Some DNA-binding domains may also include nucleic acids in their folded structure.
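
The idea of a recognition sequence can be made concrete with a short sketch that scans a DNA string for every occurrence of a given site on either strand. The site used here (GAATTC, the EcoRI restriction site) and the input sequence are illustrative assumptions, and the function names are hypothetical; real DBDs often tolerate some variation in their sites, which this exact-match search does not model.

```python
# Illustrative only: exact-match search for a recognition sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_sites(dna: str, site: str) -> list:
    """Return (0-based position, strand) for every occurrence of `site`.

    The reverse strand is searched by looking for the reverse complement
    of `site` on the forward strand. Palindromic sites are reported once,
    on the "+" strand.
    """
    dna, site = dna.upper(), site.upper()
    rc = reverse_complement(site)
    hits = []
    for i in range(len(dna) - len(site) + 1):
        window = dna[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits

# Made-up sequence containing two copies of the site.
example = "TTGAATTCAAGGATCCGAATTCTT"
print(find_sites(example, "GAATTC"))
```

A domain with only general affinity for DNA would correspond to no site at all: every position is a candidate, and structure rather than sequence decides where binding occurs.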

Function

One or more DNA-binding domains are often part of a larger protein consisting of additional domains with differing functions. The additional domains often regulate the activity of the DNA-binding domain. The function of DNA binding is either structural or involves transcription regulation, with the two roles sometimes overlapping.

DNA-binding domains with functions involving DNA structure have biological roles in the replication, repair, storage, and modification of DNA, such as methylation.

Many proteins involved in the regulation of gene expression contain DNA-binding domains. For example, proteins that regulate transcription by binding DNA are called transcription factors. The final output of most cellular signaling cascades is gene regulation.

The DBD interacts with the nucleotides of DNA in a sequence-specific or non-sequence-specific manner, but even non-sequence-specific recognition involves some sort of molecular complementarity between protein and DNA. DNA recognition by the DBD can occur at the major or minor groove of DNA, or at the sugar-phosphate DNA backbone (see the structure of DNA). Each specific type of DNA recognition is tailored to the protein's function. For example, the DNA-cutting enzyme DNase I cuts DNA almost randomly and so must bind to DNA in a non-sequence-specific manner. Even so, DNase I recognizes a certain three-dimensional DNA structure, yielding a somewhat specific DNA cleavage pattern that can be useful for studying DNA recognition by a technique called DNA footprinting.

Many DNA-binding domains must recognize specific DNA sequences, such as DBDs of transcription factors that activate specific genes, or those of enzymes that modify DNA at specific sites, like restriction enzymes and telomerase. The hydrogen bonding pattern in the DNA major groove is less degenerate than that of the DNA minor groove, providing a more attractive site for sequence-specific DNA recognition.
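
The claim that the major groove is less degenerate can be illustrated with the textbook donor/acceptor codes for each base pair. These codes are not given in the text above and are supplied here as an assumption (A = hydrogen-bond acceptor, D = donor, M = methyl group, H = nonpolar hydrogen); the function name is hypothetical. The sketch simply groups base pairs that present the same pattern to a protein.

```python
# Textbook groove patterns (an assumption; not listed in the article text).
# A = H-bond acceptor, D = H-bond donor, M = methyl group, H = nonpolar hydrogen.
MAJOR_GROOVE = {"A-T": "ADAM", "T-A": "MADA", "G-C": "AADH", "C-G": "HDAA"}
MINOR_GROOVE = {"A-T": "AHA", "T-A": "AHA", "G-C": "ADA", "C-G": "ADA"}

def distinguishable_classes(groove: dict) -> list:
    """Group base pairs that present an identical pattern in a groove."""
    by_pattern = {}
    for pair, pattern in groove.items():
        by_pattern.setdefault(pattern, []).append(pair)
    return sorted(by_pattern.values())

# Four classes in the major groove: every base pair looks different.
print(distinguishable_classes(MAJOR_GROOVE))
# Two classes in the minor groove: A-T cannot be told from T-A, nor G-C from C-G.
print(distinguishable_classes(MINOR_GROOVE))
```

Under these codes a protein reading the major groove can in principle tell all four base pairs apart, whereas one reading only the minor groove cannot distinguish a base pair from its flipped counterpart.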

The specificity of DNA-binding proteins can be studied using many biochemical and biophysical techniques, such as gel electrophoresis, analytical ultracentrifugation, calorimetry, DNA mutation, protein structure mutation or modification, nuclear magnetic resonance, X-ray crystallography, surface plasmon resonance, electron paramagnetic resonance, cross-linking, and microscale thermophoresis (MST).

Helix-turn-helix

Originally discovered in bacteria, the helix-turn-helix motif is commonly found in repressor proteins and is about 20 amino acids long. In eukaryotes, the homeodomain contains a helix-turn-helix motif of 2 helices, one of which recognizes the DNA (the recognition helix). Homeodomains are common in proteins that regulate developmental processes (PROSITE HTH).

Zinc finger

The zinc finger domain is generally between 23 and 28 amino acids long and is stabilized by coordinating zinc ions with regularly spaced zinc-coordinating residues (either histidines or cysteines). The most common class of zinc finger (Cys2His2) coordinates a single zinc ion and consists of a recognition helix and a 2-strand beta-sheet. In transcription factors these domains are often found in arrays (usually separated by short linker sequences), and adjacent fingers are spaced at 3-basepair intervals when bound to DNA.
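
The 3-basepair spacing implies simple arithmetic: an array of n fingers spans roughly 3n base pairs. The sketch below, with hypothetical function names, lays out the interval each finger would occupy and splits a target site into per-finger subsites. It is a simplification that treats subsites as non-overlapping and does not model which finger binds which end of the site.

```python
BP_PER_FINGER = 3  # spacing of adjacent fingers, as stated in the text

def finger_subsites(n_fingers: int, start: int = 0) -> list:
    """Return the (start, end) base-pair interval, end exclusive, per finger."""
    if n_fingers < 1:
        raise ValueError("need at least one finger")
    return [
        (start + i * BP_PER_FINGER, start + (i + 1) * BP_PER_FINGER)
        for i in range(n_fingers)
    ]

def split_target(dna: str) -> list:
    """Split a target site into consecutive 3-bp subsites, one per finger."""
    if len(dna) % BP_PER_FINGER:
        raise ValueError("target length must be a multiple of 3")
    return [dna[i:i + BP_PER_FINGER] for i in range(0, len(dna), BP_PER_FINGER)]

# A three-finger array covers a 9-bp site (example sequence is arbitrary).
print(finger_subsites(3))
print(split_target("GCGTGGGCG"))
```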

Leucine zipper

The basic leucine zipper (bZIP) domain contains an alpha helix with a leucine at every 7th amino acid. If two such helices find one another, the leucines can interact like the teeth of a zipper, allowing dimerization of two proteins. When binding to DNA, basic amino acid residues bind to the sugar-phosphate backbone while the helices sit in the major groove. Proteins containing this domain regulate gene expression.
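
The "leucine at every 7th amino acid" rule suggests a simple scan for candidate zipper regions. The following is a minimal sketch under stated assumptions: the function name, the threshold of four repeats, and the toy sequence are all hypothetical, and real zippers tolerate substitutions that this strict check would miss.

```python
def leucine_heptad_runs(protein: str, min_repeats: int = 4) -> list:
    """Find runs of leucines spaced exactly 7 residues apart.

    Returns (start_index, repeat_count) for each maximal run containing at
    least `min_repeats` leucines; indices are 0-based.
    """
    protein = protein.upper()
    runs = []
    for start in range(len(protein)):
        if protein[start] != "L":
            continue
        # Skip positions that merely continue a run begun 7 residues earlier.
        if start >= 7 and protein[start - 7] == "L":
            continue
        count, pos = 0, start
        while pos < len(protein) and protein[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start, count))
    return runs

# Synthetic sequence (not a real protein): leucines at positions 3, 10, 17, 24.
toy = "".join("L" if i % 7 == 3 else "A" for i in range(28))
print(leucine_heptad_runs(toy))
```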

Winged helix

Consisting of about 110 amino acids, the winged helix (WH) domain has four helices and a two-strand beta-sheet.

Winged helix turn helix

The winged helix turn helix domain (wHTH) is typically 85-90 amino acids long. It is formed by a 3-helical bundle and a 4-strand beta-sheet (wing).

Helix-loop-helix

The helix-loop-helix domain is found in some transcription factors and is characterized by two α helices connected by a loop. One helix is typically smaller and, owing to the flexibility of the loop, allows dimerization by folding and packing against another helix. The larger helix typically contains the DNA-binding regions.

HMG-box

HMG-box domains are found in high mobility group proteins, which are involved in a variety of DNA-dependent processes such as replication and transcription. The domain consists of three alpha helices separated by loops.

Immunoglobulin fold

The immunoglobulin domain consists of a beta-sheet structure with large connecting loops, which serve to recognize either DNA major grooves or antigens. Usually found in immunoglobulin proteins, these domains are also present in STAT proteins of the cytokine pathway. This is likely because the cytokine pathway evolved relatively recently and has made use of systems that were already functional, rather than creating its own.

B3 domain

The B3 DBD is found exclusively in transcription factors from higher plants and in the restriction endonucleases EcoRII and BfiI, and typically consists of 100-120 residues. It includes seven beta strands and two alpha helices, which form a DNA-binding pseudobarrel protein fold.

TAL effector DNA-binding domain

TAL effectors are found in bacterial plant pathogens and are involved in regulating the genes of the host plant in order to facilitate bacterial virulence, proliferation, and dissemination. They contain a central region of tandem 33-35 residue repeats, and each repeat recognizes a single DNA base in the TALE's binding site.
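
The one-repeat-per-base relationship can be sketched as a lookup table. The mapping below uses the commonly cited repeat-variable diresidue (RVD) code, which the text above does not list and which is included here as an assumption; the function name is hypothetical, and NN is simplified to G although it is also reported to bind A.

```python
# Commonly cited RVD-to-base code (an assumption; not given in the article).
# NN is simplified to G here, although it is also reported to recognize A.
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}

def predicted_target(rvds: list) -> str:
    """Translate an ordered list of RVDs into the DNA bases they specify."""
    bases = []
    for rvd in rvds:
        key = rvd.upper()
        if key not in RVD_TO_BASE:
            raise ValueError(f"unknown RVD: {rvd}")
        bases.append(RVD_TO_BASE[key])
    return "".join(bases)

# Five repeats specify a five-base target, one base per repeat.
print(predicted_target(["NI", "HD", "NG", "NN", "HD"]))
```

Because each repeat contributes one base independently in this model, the length of the predicted binding site equals the number of repeats.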

See also

  • DNA-binding protein

  • Nucleic acid simulations

External links

  • DBD database of predicted transcription factors: uses a curated set of DNA-binding domains to predict transcription factors in all completely sequenced genomes
  • Table of DNA-binding motifs
  • DNA-binding domains in PROSITE

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 