YAML
YAML is a human-readable data serialization format that takes concepts from programming languages such as C, Perl, and Python, and ideas from XML and the data format of electronic mail (RFC 2822). YAML was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. It is available for several programming languages.

YAML is a recursive acronym for "YAML Ain't Markup Language". Early in its development, YAML was said to mean "Yet Another Markup Language", but it was retronymed to distinguish its purpose as data-oriented, rather than document markup.

Features

YAML syntax was designed to be easily mapped to data types common to most high-level languages: list, associative array, and scalar. Its familiar indented outline and lean appearance make it especially suited for tasks where humans are likely to view or edit data structures, such as configuration files, dumping during debugging, and document headers (e.g. the headers found on most e-mails are very close to YAML). Although well-suited for hierarchical data representation, it also has a compact syntax for relational data. Its line and whitespace delimiters make it friendly to ad hoc grep/Python/Perl/Ruby operations. A major part of its accessibility comes from eschewing enclosures such as quotation marks, brackets, braces, and open/close-tags, which can be hard for the human eye to balance in nested hierarchies.

Sample document

Data structure hierarchy is maintained by outline indentation.

---
receipt:     Oz-Ware Purchase Invoice
date:        2007-08-06
customer:
    given:   Dorothy
    family:  Gale

items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)
      price:     1.47
      quantity:  4

    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      size:      8
      price:     100.27
      quantity:  1

bill-to:  &id001
    street: |
            123 Tornado Alley
            Suite 16
    city:   East Centerville
    state:  KS

ship-to:  *id001

specialDelivery:  >
    Follow the Yellow Brick
    Road to the Emerald City.
    Pay no attention to the
    man behind the curtain.
...


Notice that strings do not require enclosure in quotation marks. The specific number of spaces in the indentation is unimportant as long as parallel elements have the same left justification and the hierarchically nested elements are indented further. This sample document defines an associative array with seven top-level keys: one of the keys, "items", contains a two-element array (or "list"), each element of which is itself an associative array with differing keys. Relational data and redundancy removal are displayed: the "ship-to" associative array content is copied from the "bill-to" associative array's content as indicated by the anchor ( & ) and reference ( * ) labels. Optional blank lines can be added for readability. Multiple documents can exist in a single file/stream and are separated by "---". An optional "..." can be used at the end of a file (useful for signaling an end in streamed communications without closing the pipe).
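
As a rough illustration, the structure described above can be written out as a Python literal. This is a hand-written sketch of what a loader might hand back, not the output of any particular parser; the variable names bill_to and invoice are made up for the example, and the date is kept as a plain string for simplicity.

```python
# Hand-written equivalent of the sample invoice; no YAML library is involved.
# The names bill_to and invoice are illustrative only.
bill_to = {
    "street": "123 Tornado Alley\nSuite 16\n",  # literal style ( | ) keeps newlines
    "city": "East Centerville",
    "state": "KS",
}

invoice = {
    "receipt": "Oz-Ware Purchase Invoice",
    "date": "2007-08-06",  # many loaders would produce a date object here
    "customer": {"given": "Dorothy", "family": "Gale"},
    "items": [
        {"part_no": "A4786", "descrip": "Water Bucket (Filled)",
         "price": 1.47, "quantity": 4},
        {"part_no": "E1628", "descrip": 'High Heeled "Ruby" Slippers',
         "size": 8, "price": 100.27, "quantity": 1},
    ],
    "bill-to": bill_to,
    "ship-to": bill_to,  # the reference ( *id001 ) points at the anchored node
    "specialDelivery": (
        "Follow the Yellow Brick Road to the Emerald City. "
        "Pay no attention to the man behind the curtain.\n"
    ),  # folded style ( > ) joins the wrapped lines with spaces
}

print(len(invoice))                              # 7
print(invoice["ship-to"] is invoice["bill-to"])  # True
```

Binding both keys to the same object mirrors the anchor and reference pair: the address is written once and shared, not copied by hand.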

Basic components of YAML

YAML offers both an indented and an "in-line" style for denoting associative arrays and lists. Here is a sampler of the components.

Lists 

Conventional block format uses a hyphen+space to begin a new item in a list.
--- # Favorite movies
- Casablanca
- North by Northwest
- The Man Who Wasn't There

Optional inline format is delimited by comma+space and enclosed in brackets (similar to JSON).
--- # Shopping list
[milk, pumpkin pie, eggs, juice]

Associative arrays

Keys are separated from values by a colon+space.
--- # Indented Blocks, common in YAML data files, use indentation and new lines to separate the key: value pairs
name: John Smith
age: 33
--- # Inline Blocks, common in YAML data streams, use commas to separate the key: value pairs between braces
{name: John Smith, age: 33}

Newlines preserved

--- |
  There once was a man from Darjeeling
  Who got on a bus bound for Ealing
      It said on the door
      "Please don't spit on the floor"
  So he carefully spat on the ceiling

By default, the leading indent (of the first line) and trailing white space are stripped, though other behavior can be explicitly specified.

Newlines folded

--- >
  Wrapped text
  will be folded
  into a single
  paragraph

  Blank lines denote
  paragraph breaks

Folded text converts newlines to spaces and removes leading whitespace.

Lists of associative arrays

- {name: John Smith, age: 33}
- name: Mary Smith
  age: 27

Associative arrays of lists

men: [John Smith, Bill Jones]
women:
- Mary Smith
- Susan Williams

Advanced components of YAML

Two features that distinguish YAML from other data serialization languages are relational trees and data typing.

Data types

Explicit data typing is seldom seen in the majority of YAML documents since YAML autodetects simple types. Data types can be divided into three categories: core, defined, and user-defined. Core types are those expected to exist in any parser (e.g. floats, ints, strings, lists, maps, ...). Many more advanced data types, such as binary data, are defined in the YAML specification but not supported in all implementations. Finally, YAML defines a way to extend the data type definitions locally to accommodate user-defined classes, structures or primitives (e.g. quad precision floats).

Casting data types

YAML autodetects the datatype of the entity. Sometimes one wants to cast the datatype explicitly. The most common situation is a single-word string that looks like a number, boolean or tag; it can be disambiguated by surrounding it with quotes or by using an explicit datatype tag.

---
a: 123 # an integer
b: "123" # a string, disambiguated by quotes
c: 123.0 # a float
d: !!float 123 # also a float via explicit data type prefixed by ( !! )
e: !!str 123 # a string, disambiguated by explicit type
f: !!str Yes # a string via explicit type
g: Yes # a boolean True (YAML 1.1 rules; the YAML 1.2 core schema reads it as a string)
h: Yes we have No bananas # a string, "Yes" and "No" disambiguated by context.
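
The autodetection described above can be approximated in a few lines. The following is a deliberately simplified sketch, not the resolution procedure of the YAML specification or of any real parser: the helper name resolve_scalar and its patterns are assumptions made for illustration, it follows YAML 1.1 style boolean words, and it ignores explicit tags such as !!str.

```python
import re

# Simplified sketch of implicit type detection; real YAML resolvers follow
# the specification's rules and recognize many more forms.
INT_RE = re.compile(r"[-+]?[0-9]+$")
FLOAT_RE = re.compile(r"[-+]?[0-9]+\.[0-9]*$")
TRUE_WORDS = {"yes", "true", "on"}    # YAML 1.1 style boolean words
FALSE_WORDS = {"no", "false", "off"}


def resolve_scalar(text):
    """Guess a Python value for one scalar (hypothetical helper)."""
    text = text.strip()
    # Quoting forces a string, whatever the content looks like.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        return text[1:-1]
    if INT_RE.match(text):
        return int(text)
    if FLOAT_RE.match(text):
        return float(text)
    if text.lower() in TRUE_WORDS:
        return True
    if text.lower() in FALSE_WORDS:
        return False
    return text  # anything else stays a string


for sample in ['123', '"123"', '123.0', 'Yes', 'Yes we have No bananas']:
    print(repr(sample), '->', repr(resolve_scalar(sample)))
```

The last sample stays a string because the whole scalar, not each word in it, is compared against the boolean words.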

Other specified data types

Not every implementation of YAML has every specification-defined data type. These built-in types use a double exclamation sigil prefix ( !! ). Particularly interesting ones not shown here are sets, ordered maps, timestamps, and hexadecimal. Here's an example of base64 encoded binary data.


---
picture: !!binary |
  R0lGODlhDAAMAIQAAP//9/X
  17unp5WZmZgAAAOfn515eXv
  Pz7Y6OjuDg4J+fn5OTk6enp
  56enmleECcgggoBADs=mZmE

Extension for user-defined data types

Many implementations of YAML can support user defined data types. This is a good way to serialize an object. Local data types are not universal data types but are defined in the application using the YAML parser library. Local data types use a single exclamation mark ( ! ).

---
myObject: !myClass { name: Joe, age: 15 }

Syntax

A compact cheat-sheet (actually written in YAML) as well as a full specification are available at the YAML web site. The following is a synopsis of the basic elements.
  • YAML streams are encoded using the set of printable Unicode characters, either in UTF-8 or UTF-16.
  • Whitespace indentation is used to denote structure; however, tab characters are never allowed as indentation.
  • Comments begin with the number sign ( # ), can start anywhere on a line, and continue until the end of the line.
  • List members are denoted by a leading hyphen ( - ) with one member per line, or enclosed in square brackets ( [ ] ) and separated by comma+space ( , ).
  • Associative arrays are represented using colon+space ( : ) in the form key: value, either one per line or enclosed in curly braces ( { } ) and separated by comma+space ( , ).
    • An associative array key may be prefixed with a question mark ( ? ) to allow for liberal multi-word keys to be represented unambiguously.
  • Strings (scalars) are ordinarily unquoted, but may be enclosed in double-quotes ( " ), or single-quotes ( ' ).
    • Within double-quotes, special characters may be represented with C-style escape sequences starting with a backslash ( \ ). According to the documentation the only octal escape supported is \0.
  • Block scalars are delimited with indentation with optional modifiers to preserve ( | ) or fold ( > ) newlines.
  • Multiple documents within a single stream are separated by three hyphens ( --- ).
    • Three periods ( ... ) optionally end a file within a stream.
  • Repeated nodes are initially denoted by an ampersand ( & ) and thereafter referenced with an asterisk ( * ).
  • Nodes may be labeled with a type or tag using the exclamation point ( !! ) followed by a string which can be expanded into a URI.
  • YAML documents in a stream may be preceded by directives composed of a percent sign ( % ) followed by a name and space delimited parameters. Two directives are defined in YAML 1.1:
    • The %YAML directive is used to identify the version of YAML in a given document.
    • The %TAG directive is used as a shortcut for URI prefixes. These shortcuts may then be used in node type tags.


YAML requires that colons and commas used as list separators be followed by a space so that scalar values containing embedded punctuation (such as 5,280 or http://www.wikipedia.org) can generally be represented without needing to be enclosed in quotes.

Two additional sigil characters are reserved in YAML for possible future standardisation: the at sign ( @ ) and accent grave ( ` ).

Comparison to other data structure format languages

While YAML shares similarities with JSON, XML and SDL, it also has characteristics that are unique in comparison to many other similar format languages.

JSON

JSON syntax is a subset of YAML version 1.2, which was promulgated with the express purpose of bringing YAML "into compliance with JSON as an official subset." Though prior versions of YAML were not strictly compatible, the discrepancies were rarely noticeable and most JSON documents can be parsed by YAML parsers. This is because JSON's semantic structure is equivalent to the optional "inline-style" of writing YAML. While extended hierarchies can be written in inline-style like JSON, this is not a recommended YAML style except when it aids clarity.
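
To make the overlap concrete, the sketch below serializes the name/age record from the associative array examples with Python's standard json module. The output is the quoted form of the inline-style mapping {name: John Smith, age: 33}. Feeding it to a YAML parser is left out because that would need a third-party library; the claim that a YAML 1.2 parser accepts it rests on the subset relationship described above, not on this code.

```python
import json

person = {"name": "John Smith", "age": 33}

# By default json.dumps separates items with ", " and keys from values
# with ": ", which is the same spacing inline-style YAML uses.
text = json.dumps(person)
print(text)  # {"name": "John Smith", "age": 33}

# Round trip through the JSON parser to confirm nothing was lost.
assert json.loads(text) == person
```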

YAML has many additional features lacking in JSON, including extensible data types, relational anchors, strings without quotation marks, and mapping types preserving key order.

XML and SDL

YAML lacks the notion of tag attributes that are found in XML and SDL (Simple Declarative Language). For data structure serialization, tag attributes are, arguably, a feature of questionable utility since the separation of data and meta-data adds complexity when represented by the natural data structures (associative arrays, lists) in common languages. Instead YAML has extensible type declarations (including class types for objects).

YAML itself does not have XML's language-defined document schema descriptors that allow, for example, a document to self-validate. However, there are several externally defined schema descriptor languages for YAML (e.g. Doctrine, Kwalify and Rx) that fulfill that role. Moreover, the semantics provided by YAML's language-defined type declarations in the YAML document itself frequently relax the need for a validator in simple, common situations. Additionally, YAXML, which represents YAML data structures in XML, allows XML schema importers and output mechanisms like XSLT to be applied to YAML.

Indented delimiting

Because YAML primarily relies on outline indentation for structure, it is especially resistant to delimiter collision. YAML's insensitivity to quotes and braces in scalar values means one may embed XML, SDL, JSON or even YAML documents inside a YAML document by simply indenting it in a block literal:

---
example: >
    HTML goes into YAML without modification
message: |

    "Three is always greater than
    two, even for large values of two"

    --Author Unknown

date: 2007-06-01


YAML may be placed in JSON and SDL by quoting and escaping all interior quotes. YAML may be placed in XML by escaping reserved characters and converting whitespace; or by placing it in a CDATA-section.

Non-hierarchical data models

Unlike SDL and JSON, which can only represent data in a hierarchical model with each child node having a single parent, YAML also offers a simple relational scheme that allows repeats of identical data to be referenced from two or more points in the tree rather than entered redundantly at those points. This is similar to the IDREF facility built into XML. The YAML parser then expands these references into the fully populated data structures they imply when read in, so whatever program is using the parser does not have to be aware of a relational encoding model, unlike XML processors, which do not expand references. This expansion can enhance readability while reducing data entry errors in configuration files or processing protocols where many parameters remain the same in a sequential series of records while only a few vary. For example, the "ship-to" and "bill-to" records in an invoice are nearly always the same data.

Practical considerations

YAML is line-oriented and thus it is often simple to convert the unstructured output of existing programs into YAML format while having them retain much of the look of the original document. Because there are no close-tags, braces, or quotation marks to balance, it is generally easy to generate well-formed YAML directly from distributed print statements within unsophisticated programs. Likewise, the white space delimiters facilitate quick-and-dirty filtering of YAML files using the line oriented commands in grep, awk, perl, ruby, and python.
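
The quick-and-dirty filtering mentioned above might look like the following sketch, which pulls every part number out of an invoice fragment with a single regular expression and no YAML parser. The helper name grep_part_numbers and the pattern are assumptions for illustration; the approach only works while each key and its value sit on one line.

```python
import re

# A fragment of the sample invoice, used here as stand-in input.
INVOICE_TEXT = """\
items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)
      price:     1.47
      quantity:  4

    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      size:      8
      price:     100.27
      quantity:  1
"""

# Optional "- " list marker, the key, a colon, whitespace, then the value.
PART_NO = re.compile(r"^\s*(?:-\s+)?part_no:\s+(\S+)\s*$")


def grep_part_numbers(lines):
    """Yield the value of every part_no line, much like grep plus cut."""
    for line in lines:
        match = PART_NO.match(line)
        if match:
            yield match.group(1)


print(list(grep_part_numbers(INVOICE_TEXT.splitlines())))
# ['A4786', 'E1628']
```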

In particular, unlike mark-up languages, chunks of consecutive YAML lines tend to be well-formed YAML documents themselves. This makes it very easy to write parsers that do not have to process a document in its entirety (e.g. balancing open- and close-tags and navigating quoted and escaped characters) before they begin extracting specific records within. This property is particularly expedient when iterating in a single, stateless pass, over records in a file whose entire data structure is too large to hold in memory, or for which reconstituting the entire structure to extract one item would be prohibitively expensive.

Counterintuitively, although its indented delimiting might seem to complicate deeply nested hierarchies, YAML handles indents as small as a single space, and this may achieve better compression than markup languages. Additionally, extremely deep indentation can be avoided entirely by either: 1) reverting to "inline-style" (i.e. JSON-like format) without the indentation; or 2) using relational anchors to unwind the hierarchy to a flat form that the YAML parser will transparently reconstitute into the full data structure.

Security

YAML is purely a data representation language and thus has no executable commands. This means that parsers will be (or at least should be) safe to apply to tainted data without fear of a latent command-injection security hole. For example, because JSON is native JavaScript, it is tempting to use the JavaScript interpreter itself to evaluate the data structure into existence, leading to command injection holes when the input is inadequately verified. While validation and safe parsing are inherently possible in any data language, implementation is such a notorious pitfall that YAML's lack of an associated command language may be a relative security benefit.
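
The pitfall is not specific to JavaScript. As an analogy only, the sketch below uses Python's built-in eval in place of a JavaScript interpreter and the standard json module as the data-only parser. The input string and the helper name load_data_only are made up for the example, and the "hostile" payload is deliberately harmless.

```python
import json

# Harmless stand-in for hostile input: a legal Python expression that is
# not pure data, because it contains a function call.
untrusted = "[1, 2, len('abc')]"

# Evaluating the text as code executes the call hidden inside it.
print(eval(untrusted))  # [1, 2, 3]


def load_data_only(text):
    """Parse text as JSON data; raise ValueError for anything else."""
    return json.loads(text)  # json.JSONDecodeError subclasses ValueError


try:
    load_data_only(untrusted)
except ValueError as error:
    print("rejected:", error)

print(load_data_only("[1, 2, 3]"))  # [1, 2, 3]
```

A parser that only builds data has nothing to execute, which is the property the paragraph above attributes to YAML.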

However, YAML allows language specific tags so that arbitrary local objects can be created by a parser that supports those tags. Any YAML parser that allows sophisticated object instantiation to be executed opens the potential for an injection attack. Perl parsers that allow loading of objects of arbitrary class create so-called "blessed" values. Using these values may trigger unexpected behavior, e.g. if the class uses overloaded operators. This may lead to execution of arbitrary Perl code.

The situation is similar for Python parsers. According to the PyYAML documentation:


Note that the ability to construct an arbitrary Python object may be dangerous if you receive a YAML document from an untrusted source such as the Internet. The function yaml.safe_load limits this ability to simple Python objects like integers or lists.

Data processing and representation

The XML and YAML specifications provide very different logical models for data node representation, processing, and storage.

XML: The primary logical structures in an XML instance document are: 1) Element; and 2) Attribute. For these primary logical structures, the base XML specification does not define constraints regarding such factors as duplication of elements or the order in which they are allowed to appear. In defining conformance for XML processors, the XML specification generalizes them into two types: 1) validating; and 2) non-validating. The XML specification asserts no detailed definitions for: an API; processing model; or data representation model; although several are defined in separate specifications that a user or specification implementor may choose independently. These include the Document Object Model and XQuery.

A richer model for defining valid XML content is the W3C XML Schema standard. This allows for full specification of valid XML content and is supported by a wide range of open source, free and commercial processors and libraries.

YAML: The primary logical structures in a YAML instance document  are: 1) Scalar; 2) Sequence; and 3) Mapping. The YAML specification also indicates some basic constraints that apply to these primary logical structures. For example, according to the specification, mapping keys do not have an order. In every case where node order is significant, a sequence must be used.

Moreover, in defining conformance for YAML processors, the YAML specification defines two primary operations: 1) Dump; and 2) Load. All YAML-compliant processors must provide at least one of these operations, and may optionally provide both. Finally, the YAML specification defines an information model or "representation graph" which must be created during processing for both Dump and Load operations, although this representation need not be made available to the user through an API.

Pitfalls and implementation defects

  • Editors:
    • An editor mode that autoexpands tabs to spaces and displays text in a fixed-width font is recommended. Tab expansion mismatch is a pitfall when pasting text copied from Web pages.
    • The editor needs to handle UTF-8 and UTF-16 correctly (otherwise, it will be necessary to use only ASCII as a subset of UTF-8).
  • Strings:
    • YAML allows one to avoid quoted strings which can enhance readability and avoid the need for nested escape sequences. However, this leads to a pitfall when inline strings are ambiguous single words (e.g. digits or boolean words) or when the unquoted phrase accidentally contains a YAML construct (e.g., a leading exclamation point or a colon-space after a word: "!Caca de vaca!" or "Caution: lions ahead"). This is not an issue that anyone using a proper YAML emitter will confront, but can come up in ad hoc scripts or human editing of files. In such a case a better approach is to use block literals rather than inline string expressions as these have no such ambiguities to resolve.
  • Anticipating implementation idiosyncrasies:
    • Some implementations of YAML, such as Perl's YAML.pm, will load an entire file (stream) and parse it en masse. Conversely, YAML::Tiny only reads the first document in the stream and stops. Other implementations, like PyYAML, are lazy and iterate over the next document only upon request. For very large files in which one plans to handle the documents independently, instantiating the entire file before processing may be prohibitive. Thus with YAML.pm, one must occasionally chunk a file into documents and parse those individually. Fortunately, YAML makes this easy, since it simply requires splitting on the document separator, m/^---/.
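The chunking approach mentioned in the last item might be sketched in Python as follows. The function name `split_documents` is hypothetical, and the sketch assumes that every separator sits alone on its line and that no block scalar contains a line consisting only of `---`; a real parser handles more cases.

```python
import re

def split_documents(stream: str) -> list:
    """Naively split a YAML stream into one string per document."""
    # A separator here is a line consisting only of '---' (an assumption).
    chunks = re.split(r"^---[ \t]*$\n?", stream, flags=re.MULTILINE)
    # Drop the empty chunk that precedes a leading separator.
    return [chunk for chunk in chunks if chunk.strip()]

stream = "---\na: 1\n---\nb: 2\n---\nc: 3\n"
documents = split_documents(stream)
print(len(documents))
```

Each chunk can then be handed to a parser on its own, so that only one document is held in memory at a time.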

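The string pitfall in the list above can be guarded against in ad hoc scripts with a simple check before a value is written unquoted. The following is a heuristic sketch, not the resolution rules of any particular YAML processor: the name `needs_quoting`, the word list, and the indicator list are all assumptions chosen for illustration.

```python
import re

# Hypothetical word list; the words a parser treats as booleans or nulls vary.
AMBIGUOUS_WORDS = {"true", "false", "yes", "no", "on", "off", "null", "~"}
NUMBER_LIKE = re.compile(r"[-+]?\d[\d_]*(\.\d*)?([eE][-+]?\d+)?")
# Characters assumed to start a YAML construct when they lead a scalar.
LEADING_INDICATORS = "!&*[]{}|>@`\"'%#"

def needs_quoting(text: str) -> bool:
    """Heuristic: could this string be misread if written as a plain scalar?"""
    if text == "" or text != text.strip():
        return True
    if text.lower() in AMBIGUOUS_WORDS or NUMBER_LIKE.fullmatch(text):
        return True
    if text[0] in LEADING_INDICATORS or text[:2] in ("- ", "? ", ": "):
        return True
    # A colon-space, a trailing colon, or a space-hash changes the meaning.
    return ": " in text or text.endswith(":") or " #" in text

for sample in ["!Caca de vaca!", "Caution: lions ahead", "12345", "plain words"]:
    print(repr(sample), needs_quoting(sample))
```

A script that emits YAML by hand could quote, or use a block literal for, any value for which this check returns true.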
Portability

Simple YAML files (e.g. key-value pairs) are readily parsed with regular expressions without resorting to a formal YAML parser. YAML emitters and parsers written in the pure native language exist for many popular languages, making YAML portable in a self-contained manner. Bindings to C libraries also exist for when speed is needed.
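A minimal sketch of the regular-expression approach, in Python, is shown below. It assumes a flat file of unindented `key: value` lines with optional comment lines; the function name `parse_flat_pairs` and the key pattern are hypothetical, and anything nested or quoted is rejected rather than interpreted.

```python
import re

# One unindented 'key: value' pair per line (an assumption of this sketch).
PAIR = re.compile(r"^([A-Za-z_][\w.-]*)\s*:\s+(.*?)\s*$")

def parse_flat_pairs(text: str) -> dict:
    """Parse flat 'key: value' lines into a dict of strings."""
    pairs = {}
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = PAIR.match(line)
        if match is None:
            raise ValueError(f"not a simple 'key: value' line: {line!r}")
        key, value = match.groups()
        pairs[key] = value
    return pairs

config = parse_flat_pairs("name: demo\n# a comment\nport: 8080\n")
print(config)
```

All values stay strings in this sketch; a formal YAML parser would also resolve types such as integers and booleans.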

C libraries

  • LibYAML As of June 2007, this implementation of YAML 1.1 is stable and recommended by the YAML specification authors for production use (despite the 0.1.2 version number and a mild caution that the API may still evolve). A tutorial on its use is also available.
  • SYCK This implementation supports most of the YAML 1.0 specification and is in widespread use. It is optimized for use with higher-level interpreted languages, obtaining speed by writing directly to the symbol table of the higher-level language when it can. As of 2005 it is no longer maintained but remains available. This implementation is available under the user's choice of two licenses: the 2-clause BSD license, and an unusual "Death and Repudiation" license which restricts its use to dead people only.

C++ library

  • yaml-cpp C++ YAML library, compatible with the YAML 1.2 specification

Bindings

Native implementations and C library bindings for YAML exist for the following languages:
  • ActionScript
    • as3yaml A direct port of jvyaml for ActionScript 3

  • C++

  • Go
    • goyaml based on LibYAML but written entirely in Go

  • Haskell

  • Java
    • jvyaml based on the Syck API and patterned after RbYAML
    • JYaml small, pure Java implementation
    • SnakeYAML YAML 1.1 parser and emitter for Java 5

  • JavaScript

  • Objective-C

  • Perl
    • YAML is a common interface to several YAML parsers.
    • YAML::Tiny implements a useful subset of YAML; small, pure Perl, and faster than the full implementation.
    • YAML::Syck Binding to the SYCK C library. Offers fast, highly featured YAML.
    • YAML::XS Binding to LibYAML. Better YAML 1.1 compatibility.

  • PHP
    • Spyc is a pure PHP implementation
    • PHP-Syck (binding to the SYCK library)
    • Symfony YAML was initially released as part of the Symfony framework
    • PECL Yaml (binding to the LibYAML library)

  • Python
    • PyYAML Highly featured. Pure Python or optionally uses LibYAML.
    • PySyck Binding to the SYCK C library

  • Ruby (YAML included in the standard library since 1.8, based on SYCK)
    • Ya2YAML with full UTF-8 support
    • ZAML far faster than the default library.
    • RbYAML A YAML parser in pure Ruby

  • R

  • Tcl

  • XML
    • YAXML (currently draft only)

See also

  • Comparison of data serialization formats
  • List of lightweight markup languages


Other human-readable serialization formats include:
  • AsciiDoc
  • JSON, the JavaScript Object Notation discussed above
  • OGDL
  • Plist, the object serialization format from NEXTSTEP
  • S-expressions
  • SDL (Simple Declarative Language)
  • Simple Outline XML
  • Struxt, a C-style equivalent to XML
  • Candle Markup, unified markup and object notation

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.