PureXML
Encyclopedia
pureXML is the native XML
storage feature in the IBM DB2
data server. pureXML provides query language
s, storage technologies, indexing technologies, and other features to support XML data. The word pure in pureXML was chosen to indicate that DB2 natively stores and natively processes XML data in its inherent hierarchical structure, as opposed to treating XML data as plain text or converting it into a relational format.
to work with the data.
XML data is stored in columns of DB2 tables that have the XML data type. XML data is stored in a parsed format that reflects the hierarchical nature of the original XML data. As such, pureXML uses trees and nodes as its model for storing and processing XML data. If you instruct DB2 to validate XML data against an XML schema prior to storage, DB2 annotates all nodes in the XML hierarchy with information about the schema types; otherwise, it will annotate the nodes with default type information. Upon storage, DB2 preserves the internal structure of XML data, converting its tag names and other information into integer values. Doing so helps conserve disk space and also improves the performance of queries that use navigational expressions. However, users aren't aware of this internal representation. Finally, DB2 automatically splits XML nodes across multiple database pages, as needed.
XML schemas specify which XML elements are valid, in what order these elements should appear in XML data, which XML data types are associated with each element, and so on. pureXML allows you to validate the cells in a column of XML data against no schema, one schema, or multiple schemas. pureXML also provides tools to support evolving XML schemas.
IBM has enhanced its programming language
interfaces to support access to its XML data. These enhancements span Java
(JDBC), C
(embedded SQL and call-level interface), COBOL
(embedded SQL), PHP
, and Microsoft
's .NET Framework
(through the DB2.NET provider).
, Unix
, and Microsoft Windows
release, which was codenamed Viper, in June 2006. It was available on DB2 9 for z/OS
in March 2007. In October 2007, IBM released DB2 9.5 with improved XML data transaction performance and improved storage savings. In June 2009, IBM released DB2 9.7 with XML supported for database-partitioned, range-partitioned, and multi-dimensionally clustered tables as well as compression of XML data and indices.
with its 11g
product and Microsoft with its SQL Server
product.
pureXML also competes with XML-only databases like BaseX, eXist
, MarkLogic or Sedna
.
The following books are also available for purchase:
XML
Extensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....
storage feature in the IBM DB2
IBM DB2
The IBM DB2 Enterprise Server Edition is a relational model database server developed by IBM. It primarily runs on Unix , Linux, IBM i , z/OS and Windows servers. DB2 also powers the different IBM InfoSphere Warehouse editions...
data server. pureXML provides query language
Query language
Query languages are computer languages used to make queries into databases and information systems.Broadly, query languages can be classified according to whether they are database query languages or information retrieval query languages...
s, storage technologies, indexing technologies, and other features to support XML data. The word pure in pureXML was chosen to indicate that DB2 natively stores and natively processes XML data in its inherent hierarchical structure, as opposed to treating XML data as plain text or converting it into a relational format.
Technical information
DB2 includes two distinct storage mechanisms: one for efficiently managing traditional SQL data types, and another for managing XML data. The underlying storage mechanism is transparent to users and applications; they simply use SQL (including SQL with XML extensions or SQL/XML) or XQueryXQuery
- Features :XQuery provides the means to extract and manipulate data from XML documents or any data source that can be viewed as XML, such as relational databases or office documents....
to work with the data.
XML data is stored in columns of DB2 tables that have the XML data type. XML data is stored in a parsed format that reflects the hierarchical nature of the original XML data. As such, pureXML uses trees and nodes as its model for storing and processing XML data. If you instruct DB2 to validate XML data against an XML schema prior to storage, DB2 annotates all nodes in the XML hierarchy with information about the schema types; otherwise, it will annotate the nodes with default type information. Upon storage, DB2 preserves the internal structure of XML data, converting its tag names and other information into integer values. Doing so helps conserve disk space and also improves the performance of queries that use navigational expressions. However, users aren't aware of this internal representation. Finally, DB2 automatically splits XML nodes across multiple database pages, as needed.
XML schemas specify which XML elements are valid, in what order these elements should appear in XML data, which XML data types are associated with each element, and so on. pureXML allows you to validate the cells in a column of XML data against no schema, one schema, or multiple schemas. pureXML also provides tools to support evolving XML schemas.
IBM has enhanced its programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....
interfaces to support access to its XML data. These enhancements span Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...
(JDBC), C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
(embedded SQL and call-level interface), COBOL
COBOL
COBOL is one of the oldest programming languages. Its name is an acronym for COmmon Business-Oriented Language, defining its primary domain in business, finance, and administrative systems for companies and governments....
(embedded SQL), PHP
PHP
PHP is a general-purpose server-side scripting language originally designed for web development to produce dynamic web pages. For this purpose, PHP code is embedded into the HTML source document and interpreted by a web server with a PHP processor module, which generates the web page document...
, and Microsoft
Microsoft
Microsoft Corporation is an American public multinational corporation headquartered in Redmond, Washington, USA that develops, manufactures, licenses, and supports a wide range of products and services predominantly related to computing through its various product divisions...
's .NET Framework
.NET Framework
The .NET Framework is a software framework that runs primarily on Microsoft Windows. It includes a large library and supports several programming languages which allows language interoperability...
(through the DB2.NET provider).
History
pureXML was first included in the DB2 9 for LinuxLinux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...
, Unix
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...
, and Microsoft Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...
release, which was codenamed Viper, in June 2006. It was available on DB2 9 for z/OS
Z/OS
z/OS is a 64-bit operating system for mainframe computers, produced by IBM. It derives from and is the successor to OS/390, which in turn followed a string of MVS versions.Starting with earliest:*OS/VS2 Release 2 through Release 3.8...
in March 2007. In October 2007, IBM released DB2 9.5 with improved XML data transaction performance and improved storage savings. In June 2009, IBM released DB2 9.7 with XML supported for database-partitioned, range-partitioned, and multi-dimensionally clustered tables as well as compression of XML data and indices.
Competition
DB2 is a hybrid data server—it offers data management for traditional relational data, as well as providing native XML data management. Other vendors that offer data management for both relational data and native XML storage include OracleOracle Corporation
Oracle Corporation is an American multinational computer technology corporation that specializes in developing and marketing hardware systems and enterprise software products – particularly database management systems...
with its 11g
Oracle Database
The Oracle Database is an object-relational database management system produced and marketed by Oracle Corporation....
product and Microsoft with its SQL Server
Microsoft SQL Server
Microsoft SQL Server is a relational database server, developed by Microsoft: It is a software product whose primary function is to store and retrieve data as requested by other software applications, be it those on the same computer or those running on another computer across a network...
product.
pureXML also competes with XML-only databases like BaseX, eXist
EXist
eXist is an open source database management system entirely built on XML technology, also called a native XML database. Unlike most relational database management systems, eXist uses XQuery, which is a , to manipulate its data.- eXist Benefits :...
, MarkLogic or Sedna
Sedna (database)
Sedna is an open source database management system that provides native storage for XML data.The distinctive design decisions employed in Sedna are schema-based clustering storage strategy for XML data and memory management based on layered address space.- Data Organization :Data organization in...
.
User groups
The International DB2 Users Group (IDUG) is an independent, not-for-profit association of IT professionals who use IBM DB2. IDUG provides education, technical resources, peer networking opportunities, online resources and other programs for DB2 users.Books
IBM International Technical Support Organization (ITSO) has published the following books, which are available in print or as free e-books:The following books are also available for purchase:
Education and training
The following pureXML classroom and online courses are available from IBM Education:- Query and Manage XML Data with DB2 9. IBM course CG130. Classroom. Duration: 4 days.
- Query XML Data with DB2 9. IBM course CG100. Classroom. Duration: 2 days (first 2 days of CG130).
- Managing XML Data in DB2 9. IBM course CG160. Classroom. Duration: 2 days (last 2 days of CG130).
- DB2 pureXML. IBM Course CT140. Self-paced study plus Live Virtual Classroom.