XML Schema
XML Schema, published as a W3C recommendation in May 2001, is one of several XML schema languages. It was the first separate schema language for XML to achieve Recommendation status by the W3C. Because of confusion between XML Schema as a specific W3C specification and the use of the same term to describe schema languages in general, some parts of the user community referred to this language as WXS, an initialism for W3C XML Schema, while others referred to it as XSD, an initialism for XML Schema Document (a document written in the XML Schema language, typically containing the "xsd" XML namespace prefix and stored with the ".xsd" filename extension). In Version 1.1 (a Candidate Recommendation as of July 2011), the W3C has chosen to adopt XSD as the preferred name, and that is the name used in this article.
Like all XML schema languages, XSD can be used to express a set of rules to which an XML document must conform in order to be considered 'valid' according to that schema. However, unlike most other schema languages, XSD was also designed with the intent that determination of a document's validity would produce a collection of information adhering to specific data types. Such a post-validation infoset can be useful in the development of XML document processing software, but the schema language's dependence on specific data types has provoked criticism.
History
In its appendix of references, the XSD specification acknowledges the influence of DTDs and other early XML schema efforts such as DDML, SOX, XML-Data, and XDR. It has adopted features from each of these proposals but is also a compromise among them. Of those languages, XDR and SOX continued to be used and supported for a while after XML Schema was published. A number of Microsoft products supported XDR until the release of MSXML 6.0 (which dropped XDR in favor of XML Schema) in December 2006. Commerce One, Inc. supported its SOX schema language until declaring bankruptcy in late 2004.
The most obvious features offered in XSD that are not available in XML's native Document Type Definitions (DTDs) are namespace awareness and datatypes, that is, the ability to define element and attribute content as containing values such as integers and dates rather than arbitrary text.
The XSD 1.0 specification was originally published in 2001, with a second edition following in 2004 to correct a large number of errors. A sequence of drafts of its successor, XSD 1.1, has been published, the latest being a Candidate Recommendation dated 21 July 2011.
Schemas and Schema Documents
Technically, a schema is an abstract collection of metadata, consisting of a set of schema components: chiefly element and attribute declarations and complex and simple type definitions. These components are usually created by processing a collection of schema documents, which contain the source language definitions of these components. In popular usage, however, a schema document is often referred to as a schema.
Schema documents are organized by namespace: all the named schema components belong to a target namespace, and the target namespace is a property of the schema document as a whole. A schema document may include other schema documents for the same namespace, and may import schema documents for a different namespace.
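As a rough illustration of how a schema document declares its target namespace and its dependencies, the following Python sketch uses only the standard library's xml.etree.ElementTree to list a document's xs:include and xs:import directives. The schema text, the file names, and the urn:example namespace URIs are hypothetical, and the sketch does not fetch or merge the referenced documents as a real schema processor would.

```python
import xml.etree.ElementTree as ET

# Namespace of XSD itself, in ElementTree's "{uri}local" notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema document: it includes a document for its own target
# namespace and imports one for a different namespace.
ORDER_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:orders">
  <xs:include schemaLocation="order-types.xsd"/>
  <xs:import namespace="urn:example:customers" schemaLocation="customers.xsd"/>
</xs:schema>"""


def schema_dependencies(xsd_text):
    """Return (kind, namespace, location) tuples: includes first, then imports."""
    root = ET.fromstring(xsd_text)
    target = root.get("targetNamespace")
    deps = []
    for inc in root.findall(f"{XS}include"):
        # Included components join this document's own target namespace.
        deps.append(("include", target, inc.get("schemaLocation")))
    for imp in root.findall(f"{XS}import"):
        # Imported components belong to the namespace named on the directive.
        deps.append(("import", imp.get("namespace"), imp.get("schemaLocation")))
    return deps


for dep in schema_dependencies(ORDER_XSD):
    print(dep)
```

The function deliberately reports the namespace for each dependency, since that is exactly what distinguishes an include (same namespace) from an import (different namespace).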
When an instance document is validated against a schema (a process known as assessment), the schema to be used for validation can either be supplied as a parameter to the validation engine, or it can be referenced directly from the instance document using two special attributes, xsi:schemaLocation and xsi:noNamespaceSchemaLocation. (Referencing the schema from the instance document requires the client invoking validation to trust the document sufficiently to know that it is being validated against the correct schema. "xsi" is the conventional prefix for the namespace "http://www.w3.org/2001/XMLSchema-instance".)
XML Schema Documents usually have the filename extension ".xsd". A unique Internet Media Type is not yet registered for XSDs, so "application/xml" or "text/xml" should be used, as per RFC 3023.
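The value of xsi:schemaLocation is a whitespace-separated list of pairs, each giving a namespace URI followed by the location of a schema document for that namespace, while xsi:noNamespaceSchemaLocation gives a single location for components with no target namespace. The Python sketch below (standard library only; the document, namespaces, and file names are hypothetical) extracts these hints from an instance's root element. It only reads the hints; deciding whether to trust them is left to the caller.

```python
import xml.etree.ElementTree as ET

# The XMLSchema-instance namespace, conventionally bound to the "xsi" prefix.
XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

# Hypothetical instance document carrying schema location hints.
INSTANCE = """<order xmlns="urn:example:orders"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="urn:example:orders orders.xsd
                           urn:example:customers customers.xsd"/>"""


def schema_location_hints(xml_text):
    """Map namespace URI (None for no namespace) to the hinted schema location.

    Only the root element is inspected, although the hints may also
    appear on other elements.
    """
    root = ET.fromstring(xml_text)
    hints = {}
    pairs = root.get(f"{XSI}schemaLocation")
    if pairs:
        tokens = pairs.split()
        if len(tokens) % 2 != 0:
            raise ValueError("xsi:schemaLocation needs namespace/location pairs")
        # Tokens alternate: namespace URI, location, namespace URI, location, ...
        for namespace, location in zip(tokens[0::2], tokens[1::2]):
            hints[namespace] = location
    single = root.get(f"{XSI}noNamespaceSchemaLocation")
    if single:
        hints[None] = single.strip()
    return hints


print(schema_location_hints(INSTANCE))
```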
Data types
Unlike DTDs, an XML Schema allows the content of an element or attribute to be validated against a data type. For example, an attribute might be constrained to hold only a valid date or a decimal number.
XSD provides a set of 19 primitive data types (anyURI, base64Binary, boolean, date, dateTime, decimal, double, duration, float, hexBinary, gDay, gMonth, gMonthDay, gYear, gYearMonth, NOTATION, QName, string, and time). It allows new data types to be constructed from these primitives by three mechanisms:
- restriction (reducing the set of permitted values),
- list (allowing a sequence of values), and
- union (allowing a choice of values from several types).
Twenty-five derived types are defined within the specification itself, and further derived types can be defined by users in their own schemas.
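The three derivation mechanisms can be mimicked in a few lines of Python. In this sketch a "type" is simply a function that turns a lexical string into a value or raises ValueError; the type names (percentage, percentages, score) and the helper functions are hypothetical, and only the minInclusive, maxInclusive, and enumeration facets are modeled.

```python
import re

# Lexical form of xs:integer after whitespace collapsing: optional sign, digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def integer(lexical):
    """Toy stand-in for xs:integer: parse the lexical form or raise ValueError."""
    text = lexical.strip()
    if not _INTEGER.fullmatch(text):
        raise ValueError(f"not an integer: {lexical!r}")
    return int(text)


def restriction(base, min_inclusive=None, max_inclusive=None):
    """Derive by restriction: the base type's values, minus those outside the facets."""
    def check(lexical):
        value = base(lexical)
        if min_inclusive is not None and value < min_inclusive:
            raise ValueError(f"{value} < minInclusive {min_inclusive}")
        if max_inclusive is not None and value > max_inclusive:
            raise ValueError(f"{value} > maxInclusive {max_inclusive}")
        return value
    return check


def enumeration(*allowed):
    """Restriction of a string type by an enumeration facet."""
    def check(lexical):
        if lexical not in allowed:
            raise ValueError(f"{lexical!r} not in {allowed}")
        return lexical
    return check


def list_of(item_type):
    """Derive by list: a whitespace-separated sequence of item_type values."""
    def check(lexical):
        return [item_type(token) for token in lexical.split()]
    return check


def union(*member_types):
    """Derive by union: the first member type that accepts the value wins."""
    def check(lexical):
        for member in member_types:
            try:
                return member(lexical)
            except ValueError:
                continue
        raise ValueError(f"{lexical!r} matches no member type")
    return check


def is_valid(type_check, lexical):
    """Report whether the lexical string belongs to the type."""
    try:
        type_check(lexical)
        return True
    except ValueError:
        return False


# Hypothetical user-defined types built with the three mechanisms.
percentage = restriction(integer, min_inclusive=0, max_inclusive=100)
percentages = list_of(percentage)
score = union(percentage, enumeration("absent", "withdrawn"))

print(percentage("42"))          # 42
print(percentages("10 20  30"))  # [10, 20, 30]
print(score("withdrawn"))        # withdrawn
print(is_valid(score, "150"))    # False
```

Because each derived type is built from another type, the mechanisms compose freely, which mirrors how user schemas layer restrictions, lists, and unions on top of the built-in types.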
Post-Schema-Validation Infoset
After XML Schema-based validation, it is possible to express an XML document's structure and content in terms of the data model that was implicit during validation. The XML Schema data model includes:
- the vocabulary (element and attribute names)
- the content model (relationships and structure)
- the data types.
This collection of information is called the Post-Schema-Validation Infoset (PSVI). The PSVI gives a valid XML document its "type" and facilitates treating the document as an object, using object-oriented programming (OOP) paradigms.
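To make the idea concrete, the following Python sketch imitates a very small part of the PSVI: it attaches a type name and a typed value to selected elements of a parsed document. The type table is hypothetical and hard-coded; a schema-aware processor would compute the equivalent annotations from the schema during assessment and record much more (validity, the governing declaration, default values, and so on).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Hypothetical declarations: element name -> (XSD type name, converter).
# These toy converters do not check facets (e.g. positivity of xs:positiveInteger)
# and do not accept every lexical form the real types allow.
DECLARED_TYPES = {
    "quantity": ("xs:positiveInteger", int),
    "price": ("xs:decimal", Decimal),
    "shipDate": ("xs:date", date.fromisoformat),
}


@dataclass
class TypedValue:
    """An element's name, its assigned type name, and its typed value."""
    name: str
    type_name: str
    value: object


def annotate(xml_text):
    """Attach a type and a typed value to every element with a known declaration."""
    results = []
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag in DECLARED_TYPES:
            type_name, convert = DECLARED_TYPES[elem.tag]
            results.append(TypedValue(elem.tag, type_name, convert(elem.text.strip())))
    return results


ITEM = """<item>
  <quantity>3</quantity>
  <price>19.99</price>
  <shipDate>2011-07-21</shipDate>
</item>"""

typed = {tv.name: tv.value for tv in annotate(ITEM)}
# Typed values support arithmetic and date operations directly.
print(typed["quantity"] * typed["price"])  # 59.97
print(typed["shipDate"].year)              # 2011
```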
Example
This is an example of a rather simple schema document to describe an address.
A number of development tools can be used to create a graphical representation of a schema. Many of them create diagrams similar to the one shown below:
An example of an XML document that conforms to this schema
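As an illustration only, the Python sketch below uses a hypothetical address schema with a single Address element whose content is an xs:sequence of child elements, some of them optional (minOccurs="0"), and checks an instance against that sequence. The schema, the instance data, and the helper functions are invented; the checker handles only a flat sequence, ignores element types and the root element's name, and matches children greedily, which is roughly adequate for a flat sequence that obeys the Unique Particle Attribution rule. It is a teaching aid, not a schema validator.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical address schema (illustrative only).
ADDRESS_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Recipient" type="xs:string"/>
        <xs:element name="House" type="xs:string"/>
        <xs:element name="Street" type="xs:string"/>
        <xs:element name="Town" type="xs:string"/>
        <xs:element name="County" type="xs:string" minOccurs="0"/>
        <xs:element name="PostCode" type="xs:string"/>
        <xs:element name="Country" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Hypothetical instance: the optional County and Country are omitted.
ADDRESS_XML = """<Address>
  <Recipient>A. N. Example</Recipient>
  <House>12</House>
  <Street>High Street</Street>
  <Town>Anytown</Town>
  <PostCode>AB1 2CD</PostCode>
</Address>"""


def sequence_particles(xsd_text, element_name):
    """Return [(child name, minOccurs, maxOccurs or None)] for a top-level
    element whose anonymous complexType is a flat xs:sequence."""
    root = ET.fromstring(xsd_text)
    for decl in root.findall(f"{XS}element"):
        if decl.get("name") == element_name:
            particles = []
            for child in decl.findall(f"{XS}complexType/{XS}sequence/{XS}element"):
                max_raw = child.get("maxOccurs", "1")
                max_occurs = None if max_raw == "unbounded" else int(max_raw)
                particles.append((child.get("name"), int(child.get("minOccurs", "1")), max_occurs))
            return particles
    raise KeyError(f"no top-level declaration for {element_name}")


def check_sequence(particles, xml_text):
    """Greedily match the instance's child elements against the particles; return problems."""
    names = [child.tag for child in ET.fromstring(xml_text)]
    pos, problems = 0, []
    for name, min_occurs, max_occurs in particles:
        count = 0
        while pos < len(names) and names[pos] == name and (max_occurs is None or count < max_occurs):
            pos, count = pos + 1, count + 1
        if count < min_occurs:
            problems.append(f"missing {name} before position {pos}")
    if pos < len(names):
        problems.append(f"unexpected {names[pos]} at position {pos}")
    return problems


particles = sequence_particles(ADDRESS_XSD, "Address")
print(check_sequence(particles, ADDRESS_XML))  # []
print(check_sequence(particles, ADDRESS_XML.replace("House", "Flat")))
```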
Secondary uses for XML Schemas
The primary reason for defining an XML schema is to formally describe an XML document; however, the resulting schema has a number of other uses that go beyond simple validation.
Code generation
The schema can be used to generate code, referred to as XML Data Binding. This code allows contents of XML documents to be treated as objects within the programming environment.
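The following Python sketch shows the idea behind schema-driven data binding in miniature: it reads named complex types from a schema, generates a class for each with the standard library's dataclasses.make_dataclass, and then binds an instance document to that class. The schema, the type mapping, and the helper names are hypothetical; real data-binding tools handle far more of the language (nesting, attributes, namespaces, occurrence constraints).

```python
import xml.etree.ElementTree as ET
from dataclasses import fields, make_dataclass
from decimal import Decimal

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical mapping from a few built-in XSD type names to Python types.
# It assumes the schema binds the XSD namespace to the "xs" prefix; a real
# generator would resolve the QName properly.
PY_TYPES = {"xs:string": str, "xs:int": int, "xs:decimal": Decimal}

PRODUCT_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Product">
    <xs:sequence>
      <xs:element name="sku" type="xs:string"/>
      <xs:element name="quantity" type="xs:int"/>
      <xs:element name="price" type="xs:decimal"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def generate_classes(xsd_text):
    """Create one dataclass per named complexType with a flat xs:sequence."""
    classes = {}
    for ctype in ET.fromstring(xsd_text).findall(f"{XS}complexType"):
        spec = [
            (el.get("name"), PY_TYPES[el.get("type")])
            for el in ctype.findall(f"{XS}sequence/{XS}element")
        ]
        classes[ctype.get("name")] = make_dataclass(ctype.get("name"), spec)
    return classes


def bind(cls, xml_text):
    """Unmarshal an instance element into an object of a generated class."""
    elem = ET.fromstring(xml_text)
    # Each field's type doubles as the converter from the child element's text.
    values = {f.name: f.type(elem.find(f.name).text.strip()) for f in fields(cls)}
    return cls(**values)


Product = generate_classes(PRODUCT_XSD)["Product"]
item = bind(Product, "<product><sku>A-1</sku><quantity>3</quantity><price>19.99</price></product>")
print(item)  # Product(sku='A-1', quantity=3, price=Decimal('19.99'))
```

Generating the classes at run time keeps the sketch short; many binding tools instead emit source files ahead of time, but the mapping from declarations to typed fields is the same idea.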
Document generation
The schema can be used to generate human-readable documentation; this is especially useful where the authors have made use of the annotation elements. No formal standard exists for documentation generation, but a number of tools are available, such as the xs3p stylesheet, which produces high-quality, readable HTML and printed material.
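In the same spirit as such tools, though far simpler and producing plain text rather than HTML, the Python sketch below collects the xs:documentation text from each top-level element declaration. The schema and the function name are hypothetical, and the sketch ignores documentation attached to types, attributes, and nested declarations.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema with one documented and one undocumented element.
ANNOTATED_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order" type="xs:string">
    <xs:annotation>
      <xs:documentation>A customer order,
        identified by its order number.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="note" type="xs:string"/>
</xs:schema>"""


def document_schema(xsd_text):
    """Produce one plain-text line per top-level element declaration."""
    lines = []
    for decl in ET.fromstring(xsd_text).findall(f"{XS}element"):
        doc = decl.find(f"{XS}annotation/{XS}documentation")
        if doc is not None and doc.text:
            # Collapse the whitespace used to wrap the text in the source.
            text = " ".join(doc.text.split())
        else:
            text = "(undocumented)"
        lines.append(f"{decl.get('name')}: {text}")
    return "\n".join(lines)


print(document_schema(ANNOTATED_XSD))
```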
Criticism
Although XML Schema is successful in that it has been widely adopted and largely achieves what it set out to achieve, it has been the subject of a great deal of severe criticism, perhaps more so than any other W3C Recommendation.
A good summary of the criticisms is provided by James Clark (who promotes his own alternative, RELAX NG):
- A schema written using XSD is difficult to read and understand.
- There are many surprises in the language, for example that restriction of elements works differently from restriction of attributes.
- The W3C Recommendation itself is extremely difficult to read.
- XSD lacks any formal mathematical specification.
- XSD provides no facilities to state that the value or presence of one attribute is dependent on the values or presence of other attributes (so-called co-occurrence constraints).
- XSD offers very weak support for unordered content.
- The set of XSD datatypes on offer is highly arbitrary.
- There is no way for an XSD schema to indicate which elements are permitted at the top level of a document.
- The use of xsi:schemaLocation, an attribute that appears within an instance to identify the schema to be used for validation, causes security and interoperability problems.
- The two tasks of validation and augmentation (adding type information and default values) should be kept separate.
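To make the idea of a co-occurrence constraint concrete, the Python sketch below checks one such rule on an element's attributes. The element name, the attribute names, and the rule are hypothetical; the point is that the rule relates the presence and values of several attributes, which XSD 1.0 attribute declarations cannot state. The XSD 1.1 assertion quoted in the comment is likewise an assumption about how such a rule might be written.

```python
import xml.etree.ElementTree as ET


# Hypothetical co-occurrence rule: if kind="range", then both min and max must
# be present and min must not exceed max. XSD 1.0 cannot express this; an
# XSD 1.1 assertion along the lines of
#   <xs:assert test="not(@kind = 'range') or (@min le @max)"/>
# could, assuming min and max are declared as xs:integer attributes.
def cooccurrence_problems(elem):
    """Return a list of rule violations for one element (empty if it conforms)."""
    problems = []
    if elem.get("kind") == "range":
        low, high = elem.get("min"), elem.get("max")
        if low is None or high is None:
            problems.append("kind='range' requires both min and max")
        elif int(low) > int(high):
            problems.append("min must not exceed max")
    return problems


for text in ['<limit kind="range" min="1" max="5"/>',
             '<limit kind="range" min="1"/>',
             '<limit kind="fixed" min="9" max="2"/>']:
    print(text, cooccurrence_problems(ET.fromstring(text)))
```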
Version 1.1
As of July 2011, XSD 1.1 is a Candidate Recommendation, which means it is in the final review phase before becoming an approved W3C specification. This requires completion of conformance testing reports from two implementations. (Xerces and Saxon have both released implementations that are around 90% complete.)
Significant new features in XSD 1.1 are:
- The ability to define assertions against the document content by means of XPath 2.0 expressions (an idea borrowed from Schematron)
- The ability to select the type against which an element will be validated based on the values of the element's attributes ("conditional type assignment")
- Relaxing the rules whereby explicit elements in a content model must not match wildcards also allowed by the model
- The ability to specify wildcards (for both elements and attributes) that apply to all types in the schema, so that they all implement the same extensibility policy
Until the latest draft, XSD 1.1 also proposed the addition of a new numeric data type, precisionDecimal. This proved controversial, and was therefore dropped from the specification at a late stage of development.
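Conditional type assignment can be pictured as a small, ordered type table consulted before an element is validated. The Python sketch below is a hypothetical model of that selection logic, loosely following the behavior of XSD 1.1's xs:alternative: the first alternative whose test holds supplies the type, an alternative with no test acts as a catch-all, and otherwise the element's declared type applies. The element, the attribute, and the tests are invented, and the tests are Python functions rather than XPath expressions.

```python
import re
from typing import Callable, List, Optional, Tuple

# A test inspects the element's attributes; a validator checks its text content.
Test = Optional[Callable[[dict], bool]]
Validator = Callable[[str], bool]


def is_integer(text: str) -> bool:
    return re.fullmatch(r"[+-]?[0-9]+", text.strip()) is not None


def is_boolean(text: str) -> bool:
    # Lexical forms of xs:boolean.
    return text.strip() in {"true", "false", "1", "0"}


def any_string(text: str) -> bool:
    return True


def select_type(attrs: dict, alternatives: List[Tuple[Test, Validator]],
                default: Validator) -> Validator:
    """Pick the first alternative whose test holds (a None test always holds);
    fall back to the declared (default) type if nothing matches."""
    for test, validator in alternatives:
        if test is None or test(attrs):
            return validator
    return default


# Hypothetical type table for a <value kind="..."> element.
VALUE_ALTERNATIVES = [
    (lambda a: a.get("kind") == "int", is_integer),
    (lambda a: a.get("kind") == "bool", is_boolean),
]


def validate_value(attrs: dict, text: str) -> bool:
    return select_type(attrs, VALUE_ALTERNATIVES, any_string)(text)


print(validate_value({"kind": "int"}, "42"))     # True
print(validate_value({"kind": "int"}, "forty"))  # False
print(validate_value({"kind": "bool"}, "true"))  # True
print(validate_value({}, "anything"))            # True
```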
See also
- List of XML schemas - list of XML schemas in use on the Internet sorted by purpose
- RELAX NG - another XML schema language (an ISO international standard) that is often used with XML Schema datatypes
- XML Schema Editor - Information about XML Schema Editing Tools
- XML Schema Language Comparison - Comparison to other XML Schema languages.
- Unique Particle Attribution
Further reading
- Definitive XML Schema, Priscilla Walmsley, Prentice-Hall, 2001, ISBN 0130655678
- XML Schema, Eric van der Vlist, O'Reilly, 2001, ISBN 0596002521
- The XML Schema Companion, Neil Bradley, Addison-Wesley, 2003, ISBN 0321136179
- Professional XML Schemas, Jon Duckett et al., Wrox Press, 2001, ISBN 1861005474
- XML Schemas, Lucinda Dykes et al., Sybex, ISBN 0782140459
External links
W3C XML Schema 1.0 Specification
W3C XML Schema 1.1 Specification