Chemical table file

File formats

Chemical table files come in various formats. In addition to the formats discussed below, other formats include RGfiles, Rxnfiles, RDfiles, XDfiles and Clipboard.

Molfile

An MDL Molfile is a file format created by MDL (now Symyx, which has merged with Accelrys) for holding information about the atoms, bonds, connectivity and coordinates of a molecule. The molfile consists of some header information, the Connection Table (CT) containing atom information, then bond connections and types, followed by sections for more complex information.

The molfile is sufficiently common that most, if not all, cheminformatics software systems and applications are able to read the format, though not always to the same degree. It is also supported by some computational software such as Mathematica.

The current de facto standard version is molfile V2000, although the V3000 format has more recently been circulating widely enough to present a potential compatibility issue for software that is not yet V3000-capable.
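A reader that handles only one version can check which one it has been given before parsing. The following is a minimal sketch, assuming the version tag sits in columns 34-39 of the counts line (the fourth line of the file); the function name `molfile_version` is illustrative and not part of any library. It falls back to the last whitespace-separated token for files whose column alignment has been lost.

```python
def molfile_version(text: str) -> str:
    """Return 'V2000', 'V3000', or 'unknown' based on the counts line."""
    lines = text.splitlines()
    if len(lines) < 4:
        return "unknown"
    counts_line = lines[3]  # three header lines precede the counts line
    # Assumption: the version tag occupies columns 34-39 (0-based slice 33:39).
    tag = counts_line[33:39].strip()
    if tag in ("V2000", "V3000"):
        return tag
    # Fallback for loosely formatted files: look at the last token instead.
    tokens = counts_line.split()
    if tokens and tokens[-1] in ("V2000", "V3000"):
        return tokens[-1]
    return "unknown"


header = "benzene\n  ACD/Labs0812062058\n\n  6  6  0  0  0  0  0  0  0  0  1 V2000\n"
print(molfile_version(header))  # V2000
```

A program that is not V3000-capable can use such a check to reject a file with a clear message rather than misreading it.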

MDL publishes a specification of their Connection-Table formats, which include Molfile and SD formats.

Following are the contents of a Molfile of benzene created in ChemSketch, as seen in a text editor:


benzene
  ACD/Labs0812062058

  6  6  0  0  0  0  0  0  0  0  1 V2000
    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  3  1  2  0  0  0  0
  4  2  2  0  0  0  0
  5  3  1  0  0  0  0
  6  4  1  0  0  0  0
  6  5  2  0  0  0  0
M  END
$$$$

Lines   Section                   Description
1-3     Header
1                                 Molecule name ("benzene")
2                                 User/Program/Date/etc information
3                                 Comment (blank)
4-17    Connection table (Ctab)
4                                 Counts line: 6 atoms, 6 bonds, ..., V2000 standard
5-10                              Atom block (1 line for each atom): x, y, z, element, etc
11-16                             Bond block (1 line for each bond): 1st atom, 2nd atom, type, etc
17                                Properties block (empty)
18      $$$$                      See note

Note: According to the official molfile specification, the '$$$$' delimiter applies only to SDF files, not to molfiles, so ChemSketch molfiles that end with '$$$$' may not be read correctly by every program.
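The layout described above can be read with a few lines of code. The following is a minimal sketch, not a complete reader: it assumes the fixed-width V2000 column layout (3-character count and bond fields, 10-character coordinates, element symbol in columns 32-34), ignores the properties block, and does no error handling. The names `Atom`, `Bond` and `parse_v2000` are illustrative only.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    x: float
    y: float
    z: float
    symbol: str


class Bond(NamedTuple):
    first: int   # 1-based index of the first atom
    second: int  # 1-based index of the second atom
    order: int   # 1 = single, 2 = double, 3 = triple, 4 = aromatic


# The benzene example with its fixed-width columns (the trailing $$$$ is omitted).
BENZENE = "\n".join([
    "benzene",
    "  ACD/Labs0812062058",
    "",
    "  6  6  0  0  0  0  0  0  0  0  1 V2000",
    "    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  2  1  1  0  0  0  0",
    "  3  1  2  0  0  0  0",
    "  4  2  2  0  0  0  0",
    "  5  3  1  0  0  0  0",
    "  6  4  1  0  0  0  0",
    "  6  5  2  0  0  0  0",
    "M  END",
])


def parse_v2000(text):
    """Return (name, atoms, bonds) from the header and connection table."""
    lines = text.splitlines()
    name = lines[0].strip()
    counts = lines[3]
    n_atoms = int(counts[0:3])  # first 3-character field of the counts line
    n_bonds = int(counts[3:6])  # second 3-character field
    atoms = [
        Atom(float(l[0:10]), float(l[10:20]), float(l[20:30]), l[31:34].strip())
        for l in lines[4:4 + n_atoms]
    ]
    bonds = [
        Bond(int(l[0:3]), int(l[3:6]), int(l[6:9]))
        for l in lines[4 + n_atoms:4 + n_atoms + n_bonds]
    ]
    return name, atoms, bonds


name, atoms, bonds = parse_v2000(BENZENE)
print(name, len(atoms), len(bonds))  # benzene 6 6
```

Slicing by column rather than splitting on whitespace matters because adjacent fields can run together, for example when an atom or bond count reaches three digits.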

SDF

SDF is one of a family of chemical-data file formats developed by MDL; it is intended especially for structural information. "SDF" stands for structure-data file, and SDF files wrap the molfile (MDL Molfile) format. Multiple compounds are delimited by lines consisting of four dollar signs ($$$$). A feature of the SDF format is its ability to include associated data.

Associated data items are denoted as follows:

> <Unique_ID>
XCA3464366

>
5.825

>
Sigma

>
499.611

Some programs that can import SDF files (e.g. ISIS/Base) require that the first data field after the molecule data (in the example above, Unique_ID) be a unique identifier for each record.
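Splitting an SDF file into records and collecting the associated data can be sketched as follows. This is an illustration under stated assumptions rather than a full reader: each record is taken to end at a `$$$$` line, the molfile part to end at the `M  END` line, and each data item to begin with a header line that starts with `>` and carries the field name in angle brackets. The function names, the empty placeholder structure, and the field name `Supplier` are hypothetical; `Unique_ID` and the values come from the example above.

```python
import re

# A data header starts with ">" and gives the field name in angle brackets.
FIELD_HEADER = re.compile(r"^>.*?<([^<>]+)>")


def parse_record(lines):
    """Split one SDF record into its molfile text and a dict of data items."""
    end = next((i for i, l in enumerate(lines) if l.split() == ["M", "END"]), None)
    if end is None:
        return "\n".join(lines), {}
    molblock = "\n".join(lines[:end + 1])
    data, current = {}, None
    for line in lines[end + 1:]:
        header = FIELD_HEADER.match(line)
        if header:
            current = header.group(1)  # start a new data item
            data[current] = []
        elif not line.strip():
            current = None             # a blank line closes the item
        elif current is not None:
            data[current].append(line)
    return molblock, {name: "\n".join(value) for name, value in data.items()}


def split_sdf(text):
    """Yield (molblock, data) for each record delimited by a '$$$$' line."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield parse_record(record)
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):  # last record without a delimiter
        yield parse_record(record)


# Placeholder record with an empty structure and two data items.
SAMPLE = "\n".join([
    "example", "", "",
    "  0  0  0  0  0  0  0  0  0  0  0 V2000",
    "M  END",
    "> <Unique_ID>", "XCA3464366", "",
    "> <Supplier>", "Sigma", "",
    "$$$$",
])

for molblock, data in split_sdf(SAMPLE):
    print(data)  # {'Unique_ID': 'XCA3464366', 'Supplier': 'Sigma'}
```

A dictionary preserves insertion order in Python 3.7 and later, so the first key of `data` is the first data field of the record, which is the one that programs such as ISIS/Base expect to be unique.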

A data item may span multiple lines. The MDL SDF-format specification requires that a hard carriage return be inserted into any text field whose content exceeds 200 characters. This requirement is frequently violated in practice, as many SMILES and InChI strings exceed that length.
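A file can be checked against the 200-character rule before it is passed to a strict reader. The sketch below assumes that data items begin with a header line starting with `>` and end at a blank line; the name `overlong_data_lines` and the `limit` parameter are illustrative.

```python
MAX_DATA_LINE = 200  # limit stated in the text above


def overlong_data_lines(sdf_text, limit=MAX_DATA_LINE):
    """Return (line_number, length) for each data line longer than `limit`."""
    hits = []
    in_item = False
    for number, line in enumerate(sdf_text.splitlines(), start=1):
        if line.strip() == "$$$$" or not line.strip():
            in_item = False  # a blank line or record delimiter closes the item
        elif line.startswith(">") and not in_item:
            in_item = True   # header line opens a data item
        elif in_item and len(line) > limit:
            hits.append((number, len(line)))
    return hits


text = "\n".join(["> <SMILES>", "C" * 201, "", "$$$$"])
print(overlong_data_lines(text))  # [(2, 201)]
```

The function only reports offending lines. Wrapping them automatically is left out on purpose, since inserting a line break into a SMILES or InChI string changes the value that a reader will see.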

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.