XML database
An XML database is a data persistence software system that allows data to be stored in XML format. This data can then be queried, exported and serialized into the desired format.
Two major classes of XML database exist:
- XML-enabled: these map all XML to a traditional database (such as a relational database), accepting XML as input and rendering XML as output. This term implies that the database does the conversion itself (as opposed to relying on middleware).
- Native XML (NXD): the internal model of such databases depends on XML and uses XML documents as the fundamental unit of storage, which are, however, not necessarily stored in the form of text files.
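The XML-enabled approach can be illustrated with a minimal sketch. It assumes a hypothetical `book` table and made-up element names, and uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules as a stand-in for a real product: XML comes in, is mapped to relational rows, and is rendered back as XML on the way out.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input document; element and attribute names are illustrative only.
XML_IN = """<books>
  <book id="1"><title>XML Basics</title><year>2003</year></book>
  <book id="2"><title>Query Languages</title><year>2007</year></book>
</books>"""


def store(conn, xml_text):
    """Map ("shred") an XML document into rows of a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS book "
        "(id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
    )
    for book in ET.fromstring(xml_text).findall("book"):
        conn.execute(
            "INSERT INTO book (id, title, year) VALUES (?, ?, ?)",
            (int(book.get("id")), book.findtext("title"), int(book.findtext("year"))),
        )


def render(conn):
    """Rebuild an XML document from the relational rows."""
    root = ET.Element("books")
    rows = conn.execute("SELECT id, title, year FROM book ORDER BY id")
    for book_id, title, year in rows:
        book = ET.SubElement(root, "book", id=str(book_id))
        ET.SubElement(book, "title").text = title
        ET.SubElement(book, "year").text = str(year)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
store(conn, XML_IN)
xml_out = render(conn)
print(xml_out)
```

In this sketch the document itself is never stored; only its data survives the round trip, which is the main contrast with a native XML database.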
Rationale for XML in databases
O'Connell (2005, 9.2) gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa". It may prove more efficient (in terms of conversion costs) and easier to store the data in XML format.
Native XML databases
The term "native XML database" (NXD) can lead to confusion. Many NXDs do not function as standalone databases at all, and do not really store the native (text) form.
The formal definition from the XML:DB initiative (which appears to have been inactive since 2003) states that a native XML database:
- Defines a (logical) model for an XML document, as opposed to the data in that document, and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order. Examples of such models include the XPath data model, the XML Infoset, and the models implied by the DOM and the events in SAX 1.0.
- Has an XML document as its fundamental unit of (logical) storage, just as a relational database has a row in a table as its fundamental unit of (logical) storage.
- Need not have any particular underlying physical storage model. For example, NXDs can use relational, hierarchical, or object-oriented database structures, or use a proprietary storage format (such as indexed, compressed files).
Additionally, many XML databases provide a logical model of grouping documents, called "collections". Databases can set up and manage many collections at one time. In some implementations, a hierarchy of collections can exist, much in the same way that an operating system's directory structure works.
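A hierarchy of collections can be pictured with a small in-memory sketch. The `Collection` class and its `resolve` method are hypothetical names invented for illustration and do not correspond to any particular product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A named group of documents that may contain child collections."""
    name: str
    documents: dict = field(default_factory=dict)  # document name -> XML text
    children: dict = field(default_factory=dict)   # child name -> Collection

    def resolve(self, path, create=False):
        """Walk a slash-separated path, much like a directory lookup."""
        node = self
        for part in filter(None, path.split("/")):
            if part not in node.children:
                if not create:
                    raise KeyError(path)
                node.children[part] = Collection(part)
            node = node.children[part]
        return node


root = Collection("db")
root.resolve("library/fiction", create=True).documents["novel.xml"] = "<book/>"
root.resolve("library/reference", create=True).documents["atlas.xml"] = "<book/>"

print(sorted(root.resolve("library").children))
```

Here `library` plays the role of a directory, `fiction` and `reference` are subdirectories, and the documents are the files inside them.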
All XML databases support at least one form of querying syntax. Minimally, just about all of them support XPath for performing queries against documents or collections of documents. XPath provides a simple pathing system that allows users to identify nodes that match a particular set of criteria.
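As a rough illustration of path-based node selection, the following sketch uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module. The document and its element names are made up, and a real XML database would typically offer fuller XPath support than this subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are illustrative only.
doc = ET.fromstring("""<library>
  <book lang="en"><title>XML Basics</title><year>2003</year></book>
  <book lang="de"><title>Datenbanken</title><year>2005</year></book>
  <book lang="en"><title>Query Languages</title><year>2007</year></book>
</library>""")

# Path step plus attribute predicate: every <book> whose lang attribute is "en".
english = [b.findtext("title") for b in doc.findall("./book[@lang='en']")]

# Descendant search plus child-text predicate: any <book> whose <year> is 2005.
from_2005 = [b.findtext("title") for b in doc.findall(".//book[year='2005']")]

print(english)
print(from_2005)
```

Each expression combines a path (where to look) with a predicate in square brackets (which nodes to keep).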
In addition to XPath, many XML databases support XSLT as a method of transforming documents or query results retrieved from the database. XSLT provides a declarative language written using an XML grammar. It aims to define a set of XPath filters that can transform documents (in part or in whole) into other formats, including plain text, XML, or HTML.
Many XML databases also support XQuery to perform querying. XQuery includes XPath as a node-selection method, but extends XPath to provide transformational capabilities. Users sometimes refer to its syntax as "FLWOR" because the query may include the following clauses: 'for', 'let', 'where', 'order by' and 'return'. Traditional RDBMS vendors, whose engines previously supported only SQL, now ship hybrid SQL and XQuery engines. These hybrid engines can query XML data alongside relational data in the same query expression, which helps in combining the two.
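The shape of a FLWOR expression can be mimicked in ordinary code. The sketch below is not XQuery and does not run on an XQuery engine; it is a Python analogue over a made-up document, with each step annotated by the clause it roughly corresponds to. The function name `recent_titles` and the element names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names are illustrative only.
doc = ET.fromstring("""<library>
  <book><title>Query Languages</title><year>2007</year></book>
  <book><title>XML Basics</title><year>2003</year></book>
  <book><title>Datenbanken</title><year>2005</year></book>
</library>""")


def recent_titles(root, min_year):
    rows = []
    for book in root.findall("book"):          # for:   iterate over selected nodes
        year = int(book.findtext("year"))      # let:   bind an intermediate value
        if year >= min_year:                   # where: filter the bindings
            rows.append((year, book.findtext("title")))
    rows.sort()                                # order by: year, ties broken by title
    result = ET.Element("recent")              # return: construct new XML
    for year, title in rows:
        ET.SubElement(result, "title", year=str(year)).text = title
    return result


print(ET.tostring(recent_titles(doc, 2005), encoding="unicode"))
```

The final step builds new elements rather than returning the selected nodes unchanged, which is the transformational capability that goes beyond plain node selection.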
Some XML databases support an API called the XML:DB API (or XAPI) as a form of implementation-independent access to the XML datastore. For XML databases, XAPI plays a role similar to that of ODBC and JDBC for relational databases. On 24 June 2009, the Java Community Process released the final version of the XQuery API for Java specification (XQJ), "a common API that allows an application to submit queries conforming to the W3C XQuery 1.0 specification and to process the results of such queries".
XML Databases with database APIs (XQJ, XML:DB, RESTful)
XML Database | License | Language | XQJ API | XML:DB API | RESTful API |
---|---|---|---|---|---|
BaseX | BSD License | Java | | | |
eXist | LGPL License | Java | | | |
MarkLogic Server | Commercial | C++ | | | |
MonetDB/XQuery | Proprietary | C++ | | | |
Oracle | Commercial | C++ | | | |
Sedna | Apache License | C++ | | | |
External references
- XML Databases - The Business Case, Charles Foster, June 2008. Discusses the current state of databases and data persistence, argues that the relational database model is starting to crack at the seams, and presents an alternative for today's requirements.
- An XML-based Database of Molecular Pathways (2005-06-02) Speed / Performance comparisons of eXist, X-Hive, Sedna and Qizx/open
- XML Native Database Systems: Review of Sedna, Ozone, NeoCoreXMS 2006
- XML Data Stores: Emerging Practices
- Bhargava, P.; Rajamani, H.; Thaker, S.; Agarwal, A. (2005) XML Enabled Relational Databases, Texas, The University of Texas at Austin.
- O'Connell, S. Advanced Databases Course Notes, Southampton, University of Southampton, 2005
- Initiative for XML Databases
- XML and Databases, Ronald Bourret, September 2005
- XML Database Products, Ronald Bourret, 2000–2009
- The State of Native XML Databases, Elliotte Rusty Harold, August 13, 2007