GEDCOM
GEDCOM, an acronym for GEnealogical Data COMmunication, is a proprietary and open de facto specification for exchanging genealogical data between different genealogy software. GEDCOM was developed by The Church of Jesus Christ of Latter-day Saints as an aid to genealogical research.

A GEDCOM file is plain text (usually either ANSEL or ASCII) containing genealogical information about individuals, and metadata linking these records together. Most genealogy software supports importing from and/or exporting to GEDCOM format. However, some genealogy programs use proprietary extensions to the GEDCOM format, which are not always recognized by other genealogy programs, for example the GEDCOM 5.5 EL (Extended Locations) specification.

GEDCOM model

GEDCOM uses a lineage-linked data model. This data model is based on the nuclear family and the individual. This contrasts with evidence-based models, where data are structured to reflect the supporting evidence. In the GEDCOM lineage-linked data model, all data are structured to reflect the believed reality, that is, actual (or hypothesized) nuclear families and individuals.

GEDCOM file structure

A GEDCOM file consists of a header section, records, and a trailer section. Within these sections, records represent people (INDI records), families (FAM records), sources of information (SOUR records), and other miscellaneous records, including notes. Every line of a GEDCOM file begins with a level number. All top-level records (HEAD, TRLR, SUBN, and each INDI, FAM, OBJE, NOTE, REPO, SOUR, and SUBM) begin with a line at level 0, while the lines beneath them carry positive integer level numbers.
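
The line layout described above can be sketched in Python. This is a minimal illustration rather than a full parser: the names `GedcomLine` and `parse_line` are hypothetical, and the sketch assumes fields are separated by single spaces, as in the sample file later in this article.

```python
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int            # 0 for top-level records, positive below them
    xref: Optional[str]   # e.g. "@I1@" on record lines, otherwise None
    tag: str              # e.g. "INDI", "NAME", "FAMS"
    value: str            # rest of the line, possibly empty


def parse_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into level, optional ID, tag, and value."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record line such as "0 @I1@ INDI": the ID precedes the tag.
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        # Ordinary line such as "1 NAME Bob /Cox/" or "0 HEAD".
        xref = None
        tag = parts[1] if len(parts) > 1 else ""
        value = parts[2] if len(parts) > 2 else ""
    return GedcomLine(level, xref, tag, value)


print(parse_line("0 @I1@ INDI"))
print(parse_line("1 NAME Bob /Cox/"))
```

Note that a pointer value such as `@F1@` in `1 FAMS @F1@` ends up in `value`, not `xref`, because it follows the tag rather than the level number.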

Although it is theoretically possible to write a GEDCOM file by hand, the format was designed to be used with software and thus is not especially human-friendly. A GEDCOM validator that can be used to validate the structure of a GEDCOM file is included as part of the PhpGedView project, though it is not meant to be a standalone validator. For standalone validation, you can use "The Windows GEDCOM Validator" or the older, unmaintained Gedcheck from the LDS.

During 2001, the GEDCOM TestBook Project evaluated how well four popular genealogy programs conformed to the GEDCOM 5.5 standard using the Gedcheck program. Findings showed that a number of problems existed and that the most commonly found fault leading to data loss was the failure to read the NOTE tag at all the possible levels at which it may appear. In 2005, the Genealogical Software Report Card evaluation (by Bill Mumford, who participated in the original GEDCOM TestBook Project) also included testing against the GEDCOM 5.5 standard using the Gedcheck program.

Example

sample.ged

0 HEAD
1 SOUR Reunion
2 VERS V8.0
2 CORP Leister Productions
1 DEST Reunion
1 DATE 11 FEB 2006
1 FILE test
1 GEDC
2 VERS 5.5
1 CHAR MACINTOSH
0 @I1@ INDI
1 NAME Bob /Cox/
1 SEX M
1 FAMS @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @I2@ INDI
1 NAME Joann /Para/
1 SEX F
1 FAMS @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @I3@ INDI
1 NAME Bobby Jo /Cox/
1 SEX M
1 FAMC @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
0 TRLR

The listing above is a sample GEDCOM file. The first column of each line gives its level number, which acts as an indentation level.

The header (HEAD) includes the source program and version (Reunion, V8.0), the GEDCOM version (5.5), and the character encoding (MACINTOSH). This encoding value is invalid: according to the GEDCOM 5.5 specification, the valid choices are ANSEL, UNICODE, or ASCII.
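
The header check described above can be sketched as follows. This is a hedged illustration only: the function name `header_charset` and the constant `VALID_CHARSETS_55` are invented for this sketch, and the set of valid values is taken from the text of this article rather than checked against the specification.

```python
# Valid CHAR values for GEDCOM 5.5, as listed in the article text (assumption).
VALID_CHARSETS_55 = {"ANSEL", "UNICODE", "ASCII"}


def header_charset(lines):
    """Return the value of the level-1 CHAR line inside HEAD, or None."""
    in_head = False
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        level, tag = parts[0], parts[1]
        if level == "0":
            # A new top-level record starts; only HEAD is of interest here.
            in_head = (tag == "HEAD")
        elif in_head and level == "1" and tag == "CHAR":
            return parts[2] if len(parts) > 2 else ""
    return None


sample_header = [
    "0 HEAD",
    "1 SOUR Reunion",
    "2 VERS V8.0",
    "1 GEDC",
    "2 VERS 5.5",
    "1 CHAR MACINTOSH",
    "0 @I1@ INDI",
]

charset = header_charset(sample_header)
print(charset, charset in VALID_CHARSETS_55)
```

For the sample header, the function returns MACINTOSH, which is not in the allowed set.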

The individual records (INDI) define Bob Cox (ID 1, written @I1@), Joann Para (ID 2), and Bobby Jo Cox (ID 3).

The family record (FAM) links the husband (HUSB), wife (WIFE), and child (CHIL) by their ID numbers.
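
As a rough sketch of how a program might follow these links, the Python below groups the sample's lines into records keyed by ID and then resolves the HUSB, WIFE, and CHIL pointers of a family to names. The function names (`load_records`, `display_name`, `describe_family`) are hypothetical, and only level-0 and level-1 lines are handled.

```python
SAMPLE = """\
0 @I1@ INDI
1 NAME Bob /Cox/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Joann /Para/
1 FAMS @F1@
0 @I3@ INDI
1 NAME Bobby Jo /Cox/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR
"""


def load_records(text):
    """Group level-1 lines under their level-0 record, keyed by ID."""
    records = {}
    current = None
    for line in text.splitlines():
        parts = line.split(" ", 2)
        if parts[0] == "0":
            if len(parts) == 3 and parts[1].startswith("@"):
                current = {"type": parts[2], "fields": []}
                records[parts[1]] = current
            else:
                current = None  # HEAD and TRLR carry no ID
        elif current is not None and parts[0] == "1":
            value = parts[2] if len(parts) > 2 else ""
            current["fields"].append((parts[1], value))
    return records


def display_name(record):
    """Return the first NAME value with the surname slashes removed."""
    for tag, value in record["fields"]:
        if tag == "NAME":
            return " ".join(value.replace("/", " ").split())
    return "(unknown)"


def describe_family(records, fam_id):
    """Follow HUSB, WIFE, and CHIL pointers from a FAM record to names."""
    result = {"HUSB": [], "WIFE": [], "CHIL": []}
    for tag, value in records[fam_id]["fields"]:
        if tag in result and value in records:
            result[tag].append(display_name(records[value]))
    return result


records = load_records(SAMPLE)
print(describe_family(records, "@F1@"))
```

The individuals point back at the family through FAMS (family as spouse) and FAMC (family as child), so the same lookup table can be walked in either direction.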

Versions

The current version of the specification is GEDCOM 5.5, which was released on 12 January 1996. A subsequent draft GEDCOM 5.5.1 specification was issued in 1999, introducing nine new tags, including WWW, EMAIL and FACT, and adding UTF-8 as an approved character encoding. This draft has not been formally approved, but its provisions have been adopted in part by a number of genealogy programs, and it is used by FamilySearch.org. While PAF 5.2 does support GEDCOM 5.5, it uses UTF-8 as its internal character set and can output a UTF-8 GEDCOM, a feature that was introduced in the GEDCOM 5.5.1 draft.

On 23 January 2002, a draft (beta) version of GEDCOM 6.0 was released for developer study only. It was not a complete specification, and developers were advised not to begin implementing it in their software. For example, descriptions of the meaning and expected contents of tags were not included. GEDCOM 6.0 was to be the first version to store data in XML format, and was to change the preferred character set from ANSEL to Unicode.

Lineage-linked GEDCOM is the deliberate de facto common denominator. Although version 5.5 of the GEDCOM standard was first published in 1996, many genealogical software suppliers have yet to support multilingual Unicode text (instead of the ANSEL character set), a feature introduced with that version of the specification. Uniform use of Unicode would allow for the usage of international character sets. An example is the storage of East Asian names in their original Chinese, Japanese and Korean (CJK) characters, without which they could be ambiguous and of little use for genealogical or historical research.

Release history


GEDCOM Version                   Release Date        Notes
1.0                              1984                -
2.0                              Dec 1985            PAF 2.0
2.1                              Feb 1987            GEDCOM for PAF 2.1
2.3 Draft                        7 August 1985       with PAF2.0 GEDCOM implementation conventions
2.4 Draft                        13 December 1985    with PAF2.0 GEDCOM implementation conventions
3.0 Standard                     9 October 1987      PAF 2.0 and 2.1 implementation of 3.0
4.0 Standard                     August 1989         PAF 2.1 - 2.31
4.1 Draft                        -                   -
4.2 Draft                        25 January 1990     -
5.0 Draft                        31 December 1991    Lineage-linked structures were introduced.
5.1 Draft                        18 September 1992   -
5.2 Draft                        22 January 1992     -
5.3 Draft                        4 November 1993     Unicode standard (ISO/IEC 10646) was introduced as an additional character set.
5.4 Draft                        21 August 1995      -
5.5 Standard                     11 December 1995    PAF 3, 4 and 5
5.5 Standard + Errata Sheet      2 January 1996      PAF 3, 4 and 5
GEDCOM (Future Direction) Draft  1 May 1998          "it used an entirely new data model"
5.5.1 Draft                      2 October 1999      Used by FamilySearch.org. UTF-8 added as an approved character encoding.
5.6 Private Draft                -                   "Jed Allen sent those two files to a few people only for sort of 'private comments'"
6.0 XML Draft                    28 December 2001    Was not a complete specification; developers were advised not to begin software implementations.

Limitations

Support for multi-person events and sources

A GEDCOM file can contain information on events such as births, deaths, census records, ship's records, marriages, etc. A general rule of thumb is that an event is something that took place at a specific time and a specific place (even if the time and place are not known). GEDCOM files can also contain attributes such as physical description, occupation, and total number of children; unlike events, attributes generally cannot be associated with a specific time or place.

The GEDCOM specification requires that each event or attribute is associated with exactly one individual or family. This causes redundancy for events such as census records, where the actual census entry often contains information on multiple individuals. In the GEDCOM file, a separate census (CENS) event must be added for each individual referenced. Some genealogy programs, such as The Master Genealogist, have elaborate database structures for sources that are used, among other things, to represent multi-person events. When databases are exported from one of these programs to GEDCOM, these database structures cannot be represented in GEDCOM due to this limitation, with the result that the event or source information, including all of the relevant citation reference information, must be duplicated in each place that it is used. This duplication makes it difficult for the user to maintain the information related to sources.

In the GEDCOM specification, events that are associated with a family, such as a marriage, are stored only once, as part of the family (FAM) record, and both spouses are then linked to that single family record.
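
The duplication of per-individual events can be illustrated with a small Python sketch. Everything specific here is invented for illustration: the function name `census_events`, the household IDs, the date, the place, and the source ID `@S1@`. The point is only that an exporter has to repeat the same CENS block once per person.

```python
def census_events(person_ids, date, place, source_id):
    """Emit one identical CENS block per individual.

    GEDCOM attaches an event to exactly one individual or family, so a
    census entry covering a whole household is repeated for each member.
    """
    blocks = {}
    for pid in person_ids:
        blocks[pid] = [
            "1 CENS",
            f"2 DATE {date}",
            f"2 PLAC {place}",
            f"2 SOUR {source_id}",
        ]
    return blocks


# Hypothetical household of three people sharing one census entry.
household = ["@I1@", "@I2@", "@I3@"]
blocks = census_events(household, "1 JUN 1880", "Example Town", "@S1@")

for pid in household:
    print(pid)
    print("\n".join(blocks[pid]))
```

If the citation later needs correcting, every copy has to be updated separately, which is the maintenance burden described above.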

Ambiguity in the specification

The GEDCOM specification was made purposefully flexible to support many ways of encoding data, particularly in the area of sources. This flexibility has led to a great deal of ambiguity, and has produced the side effect that some genealogy programs which import GEDCOM do not import all of the data from a file.

Explicit support for non-marriage relationships

GEDCOM does not explicitly support data representation of many types of close interpersonal relationships, such as same-sex marriages, domestic partnerships, cohabitation, polyamory, polygamy or incest. However, these and any other relationships can be represented using the ASSO tag.

Ordering of events that do not have dates

The GEDCOM specification does not offer explicit support for keeping a known order of events. In particular, the order of relationships (FAMS) for a person and the order of the children within a relationship (FAM) can be lost. In many cases the sequence of events can be derived from the associated dates, but dates are not always known, in particular when dealing with data from centuries ago. For example, a person may have had two relationships, both with unknown dates, where descriptions nevertheless establish which of the two came second. The order in which these FAMS are recorded in GEDCOM's INDI record will depend on the exporting program. In Aldfaer, for instance, the sequence depends on the ordering of the data by the user (alphabetical, chronological, reference, etc.). The proposed XML GEDCOM standard does not address this issue either.

Lesser-known features

GEDCOM has many features that are not commonly used, and hence are unknown to some people. Some software packages do not support all the features that the GEDCOM standard allows.

Multimedia

The GEDCOM standard does support the inclusion of multimedia objects (for example, photos of individuals). Such multimedia objects can be either included in the GEDCOM file itself (called the "embedded form") or in an external file where the name of the external file is specified in the GEDCOM file (called the "linked form"). Embedding multimedia directly in the GEDCOM file makes transmission of data easier, in that all of the information (including the multimedia data) is in one file, but the resulting file can be enormous. Linking multimedia keeps the size of the GEDCOM file under control, but then when transmitting the file, the multimedia objects must either be transmitted separately or archived together with the GEDCOM into one larger file. Support for embedding media directly was dropped in the draft 5.5.1 standard.

Conflicting information

The GEDCOM standard does allow for the specification of multiple opinions or conflicting data, simply by specifying multiple records of the same type. For example, if an individual's birth date was recorded as 10 January 1800 on the birth certificate, but 11 January 1800 on the death certificate, two BIRT records for that individual would be included, the first with the 10 January 1800 date and giving the birth certificate as the source, and the second with the 11 January 1800 date and giving the death certificate as the source. The preferred record is usually listed first.

This example encoded in GEDCOM might look like this:
0 @I1@ INDI
1 NAME John /Doe/
1 BIRT
2 DATE 10 JAN 1800
2 SOUR @S1@
3 DATA
4 TEXT Transcription from birth certificate would go here
3 NOTE This birth record is preferred because it comes from the birth certificate
3 QUAY 2
1 BIRT
2 DATE 11 JAN 1800
2 SOUR @S2@
3 DATA
4 TEXT Transcription from death certificate would go here
3 QUAY 2
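
A reader of such a file might collect the competing claims along the following lines. This is a simplified sketch: the function name `birth_claims` is hypothetical, the record is a shortened copy of the example above, and only the DATE and SOUR lines directly under each BIRT are read.

```python
RECORD = """\
0 @I1@ INDI
1 NAME John /Doe/
1 BIRT
2 DATE 10 JAN 1800
2 SOUR @S1@
3 QUAY 2
1 BIRT
2 DATE 11 JAN 1800
2 SOUR @S2@
3 QUAY 2
"""


def birth_claims(text):
    """Return one dict per BIRT event, in file order."""
    claims = []
    current = None
    for line in text.splitlines():
        parts = line.split(" ", 2)
        level = parts[0]
        tag = parts[1] if len(parts) > 1 else ""
        value = parts[2] if len(parts) > 2 else ""
        if level in ("0", "1"):
            # Any level-0 or level-1 line closes the previous event.
            current = None
            if level == "1" and tag == "BIRT":
                current = {"date": None, "source": None}
                claims.append(current)
        elif current is not None and level == "2":
            if tag == "DATE":
                current["date"] = value
            elif tag == "SOUR":
                current["source"] = value
    return claims


claims = birth_claims(RECORD)
preferred = claims[0]  # by convention the preferred record is usually first
print(preferred)
print(claims[1:])
```

Treating the first claim as preferred follows the convention mentioned above; it is a convention, not something the file states explicitly.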

Conflicting data may also be the result of user errors. The standard does not specify in any way that the contents must be consistent. A birth date like "10 APR 1819" might mistakenly have been recorded as "10 APR 1918" long after the person's death. The only way to reveal such inconsistencies is by rigorous validation of the content data.
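
One simple content check of this kind is to flag a birth recorded after the death. The sketch below is an assumption-laden illustration: the function names are hypothetical, only exact dates of the form "10 APR 1819" are parsed, and the death date used in the example is invented.

```python
from datetime import date

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_exact_date(text):
    """Parse 'DD MON YYYY'; return None for anything else."""
    parts = text.split()
    if len(parts) != 3 or parts[1].upper() not in MONTHS:
        return None
    try:
        return date(int(parts[2]), MONTHS[parts[1].upper()], int(parts[0]))
    except ValueError:
        return None  # non-numeric day or year, or an impossible date


def birth_after_death(birth_text, death_text):
    """True only when both dates parse and the birth is later than the death."""
    birth = parse_exact_date(birth_text)
    death = parse_exact_date(death_text)
    if birth is None or death is None:
        return False  # cannot judge approximate or missing dates
    return birth > death


# Hypothetical death date, used only to show the check.
print(birth_after_death("10 APR 1918", "3 MAR 1890"))
print(birth_after_death("10 APR 1819", "3 MAR 1890"))
```

The mistyped year 1918 is flagged, while the intended 1819 passes. Dates that are approximate or incomplete are deliberately left unjudged in this sketch.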

Internationalization

The GEDCOM standard supports internationalization in several ways. First, newer versions of the standard allow data to be stored in Unicode (or, more recently, UTF-8), so text in any language can be stored. Secondly, in the same way that you can have multiple events on a person, GEDCOM allows you to have multiple names for a person, so names can be stored in multiple languages (although there is no standardized way to indicate which instance is in which language). Finally, in the latest draft version (5.5.1, not yet in widespread use), the NAME field also supports a phonetic variation (FONE) and a romanized variation (ROMN) of the name.

Alternatives to GEDCOM

Commsoft, the authors of the Roots series of genealogy software and Ultimate Family Tree, defined a version called Event-Oriented GEDCOM (also known as "Event GEDCOM" and originally called InterGED), which included events as first class (zero-level) items. Although it is event based, it is still a model built on assumed reality rather than evidence. Event GEDCOM was more flexible, as it allowed some separation between believed events and the participants. However, Event GEDCOM was not widely adopted by other developers due to its semantic differences. With Roots and Ultimate Family Tree no longer available, very few people today are using Event GEDCOM.
  • Event-Oriented GEDCOM specification - Draft Release 1.0 (12 September 1994) (Microsoft Word in a ZIP file)
  • GEDC - An XML-Based Standard for Genealogy - originated as an outgrowth of the GEDCOM 6.0 Beta specification and Gentech's Genealogical Data Model (GDM).
  • FamilyML
  • GDMUML - Genealogical Data Models in the Unified Modeling Language
  • GedML: Genealogical Data in XML - combines the GEDCOM data model with the XML standard.
  • Gendatam - genealogical data model
  • The GENTECH Genealogical Data Model
    • gdmxml, a RELAX NG Schema to validate XML documents with genealogical information according to the GENTECH Genealogical Data Model.
    • GeneaPro, an attempt to create genealogy software based on the GenTech data model.
  • GeniML - Genealogical Information Markup Language is a data model and XML vocabulary for recording and exchanging genealogical data.
  • GenXML is a file format for exchange of data between genealogy programs.
  • GRAMPS XML
  • GREnDL - Genealogical Record Exchange and Description Language - XML specification, Draft Release 1.1 (10 January 2004)
  • GEDCOM 5.5 XML - "attempts to be a 100 percent one-to-one translation of GEDCOM 5.5 into XML; it even includes the superfluous (and empty) element." - By Chad Albers - neomantic.com

See also

  • FamilySearch
    • Ancestral File Number
    • International Genealogical Index
  • GENDEX - Genealogical index
  • Genealogical numbering systems
  • GNTP - Genealogy Network Transfer Protocol
  • Tiny Tafel Format - encoded "ancestor table"

External links

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 