XPath

XPath is a language for selecting nodes from an XML document. In addition, XPath may be used to compute values (strings, numbers, or boolean values) from the content of an XML document. The current version of the language is XPath 2.0, but version 1.0 is still more widely used.

The XPath language is based on a tree representation of the XML document, and provides the ability to navigate around the tree, selecting nodes by a variety of criteria. In popular use (though not in the official specification), an XPath expression is often referred to simply as an XPath.

XPath was originally motivated by a desire to provide a common syntax and behavior model between XPointer and XSLT. Subsets of the XPath query language are also used in other W3C specifications such as XML Schema and XForms.

Syntax and semantics

The most important kind of expression in XPath is a location path. A location path consists of a sequence of location steps. Each location step has three components:
  • an axis
  • a node test
  • zero or more predicates.


An XPath expression is evaluated with respect to a context node. An axis specifier such as 'child' or 'descendant' specifies the direction to navigate from the context node. The node test and the predicates are used to filter the nodes specified by the axis specifier: for example, the node test 'A' requires that all nodes navigated to must be named 'A'. A predicate can be used to specify that the selected nodes have certain properties, which are themselves specified by XPath expressions.

The XPath syntax comes in two flavours. The abbreviated syntax is more compact and allows XPaths to be written and read easily using intuitive and, in many cases, familiar characters and constructs. The full syntax is more verbose, but it makes every axis explicit (including the axes that have no abbreviation) and is more descriptive if read carefully.

Abbreviated syntax

The compact notation allows many defaults and abbreviations for common cases. Given source XML containing at least


<A>
<B>
<C/>
</B>
</A>


the simplest XPath takes a form such as
  • /A/B/C

which selects C elements that are children of B elements that are children of the A element that forms the outermost element of the XML document. The XPath syntax is designed to mimic URI (Uniform Resource Identifier) and Unix-style file path syntax.

More complex expressions can be constructed by specifying an axis other than the default 'child' axis, a node test other than a simple name, or predicates, which can be written in square brackets after any step. For example, the expression
  • A//B/*[1]

selects the first element ('[1]'), whatever its name ('*'), that is a child ('/') of a B element that itself is a child or other, deeper descendant ('//') of an A element that is a child of the current context node (the expression does not begin with a '/'). If there are several suitable B elements in the document, this actually returns a set of all their first children. ("(A//B/*)[1]" returns just the first such node.)
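To make the difference between A//B/*[1] and (A//B/*)[1] concrete, here is a toy evaluator (not a real XPath engine) written against Python's standard xml.etree.ElementTree. It models element nodes only and just the child and descendant-or-self axes; the sample document and the helper names step and a_desc_b_children are hypothetical. The point it illustrates is that a predicate inside a step is applied once per context node, whereas a predicate on a parenthesized expression is applied to the whole result.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the context node <root> has one A child, which
# contains two B elements at different depths.
DOC = """<root>
  <A>
    <B><C/><D/></B>
    <X><B><E/><F/></B></X>
  </A>
</root>"""


def child_axis(node):
    return list(node)


def descendant_or_self_axis(node):
    return list(node.iter())  # the node itself, then its descendants in document order


def step(root, nodes, axis, name, position=None):
    """Evaluate one location step for every context node in `nodes`.

    `name` is a name test ('*' and 'node()' both match any element, since this
    sketch only models element nodes). `position` is an optional numeric
    predicate, applied separately to each context node's selection.
    """
    order = {id(e): i for i, e in enumerate(root.iter())}
    picked = {}
    for ctx in nodes:
        selected = [n for n in axis(ctx) if name in ("*", "node()") or n.tag == name]
        if position is not None:
            selected = selected[position - 1:position]
        for n in selected:
            picked[id(n)] = n  # a node-set holds each node at most once
    return sorted(picked.values(), key=lambda e: order[id(e)])  # document order


def a_desc_b_children(root, position=None):
    """child::A/descendant-or-self::node()/child::B/child::*, evaluated from <root>."""
    nodes = step(root, [root], child_axis, "A")
    nodes = step(root, nodes, descendant_or_self_axis, "node()")
    nodes = step(root, nodes, child_axis, "B")
    return step(root, nodes, child_axis, "*", position)


root = ET.fromstring(DOC)
per_b = a_desc_b_children(root, position=1)  # A//B/*[1]
overall = a_desc_b_children(root)[:1]        # (A//B/*)[1]
print([e.tag for e in per_b])    # ['C', 'E']
print([e.tag for e in overall])  # ['C']
```

Sorting the merged result by document order mirrors the way XPath processors return node-sets, which is what makes "the first such node" well defined in the parenthesized form.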

Expanded syntax

In the full, unabbreviated syntax, the two examples above would be written
  • /child::A/child::B/child::C
  • child::A/descendant-or-self::node()/child::B/child::*[position()=1]


Here, in each step of the XPath, the axis (e.g. child or descendant-or-self) is explicitly specified, followed by :: and then the node test, such as A or node() in the examples above.
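For simple paths the mapping from abbreviated to full syntax is mechanical. The following Python function is a hypothetical sketch, not part of any library: it handles only steps made of ., .., a name or *, an optional @, and at most one numeric predicate, and it does not parse general predicates, function calls or unions.

```python
import re


def expand_step(step: str) -> str:
    """Expand one abbreviated step: '.', '..', 'name', '@name' or '*',
    optionally followed by a single numeric predicate such as [1]."""
    m = re.fullmatch(r"(@?)([^\[\]]+)(?:\[(\d+)\])?", step)
    if m is None:
        raise ValueError(f"step not supported by this sketch: {step!r}")
    at, test, pos = m.groups()
    if test == ".":
        out = "self::node()"
    elif test == "..":
        out = "parent::node()"
    else:
        out = ("attribute::" if at else "child::") + test
    if pos is not None:
        out += f"[position()={pos}]"  # a bare number n in a predicate means position()=n
    return out


def expand(path: str) -> str:
    """Rewrite a simple abbreviated location path into the unabbreviated syntax."""
    out = []
    for token in re.split(r"(//|/)", path):  # keep the separators as tokens
        if token == "//":
            out.append("/descendant-or-self::node()/")
        elif token == "/":
            out.append("/")
        elif token:
            out.append(expand_step(token))
    return "".join(out)


print(expand("/A/B/C"))     # /child::A/child::B/child::C
print(expand("A//B/*[1]"))  # child::A/descendant-or-self::node()/child::B/child::*[position()=1]
```

Treating // as a separator token, rather than as part of a step, is what lets it expand to /descendant-or-self::node()/ between two ordinary steps.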

Axis specifiers

The Axis Specifier indicates navigation direction within the tree representation of the XML document. The axes available are:
  • ancestor
  • ancestor-or-self
  • attribute (abbreviated @; @abc is short for attribute::abc)
  • child (the default axis, so no prefix is needed; xyz is short for child::xyz)
  • descendant
  • descendant-or-self (abbreviated //; // is short for /descendant-or-self::node()/)
  • following
  • following-sibling
  • namespace
  • parent (abbreviated ..; .. is short for parent::node())
  • preceding
  • preceding-sibling
  • self (abbreviated .; . is short for self::node())
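Python's standard xml.etree.ElementTree stores no parent pointers and offers only child navigation, so several of the element axes listed above can be sketched on top of it with a parent map. This is an illustration under stated assumptions: it covers element nodes only, omits the attribute and namespace axes, and returns the reverse axes (ancestor, preceding, preceding-sibling) nearest node first, which is the order in which XPath assigns proximity positions on those axes. The sample document and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a contains b (with c and d), e (with f), then g.
root = ET.fromstring("<a><b><c/><d/></b><e><f/></e><g/></a>")
parent = {child: p for p in root.iter() for child in p}  # ElementTree keeps no parent links
doc_order = list(root.iter())                            # every element in document order


def tags(nodes):
    return [n.tag for n in nodes]


def ancestor(n):
    """Reverse axis: nearest ancestor first."""
    out = []
    while n in parent:
        n = parent[n]
        out.append(n)
    return out


def descendant(n):
    return list(n.iter())[1:]  # iter() yields n itself first


def following_sibling(n):
    if n not in parent:
        return []
    siblings = list(parent[n])
    return siblings[siblings.index(n) + 1:]


def preceding_sibling(n):
    """Reverse axis: nearest sibling first."""
    if n not in parent:
        return []
    siblings = list(parent[n])
    return siblings[:siblings.index(n)][::-1]


def following(n):
    """Everything after n in document order, except n's descendants."""
    skip = set(descendant(n))
    return [m for m in doc_order[doc_order.index(n) + 1:] if m not in skip]


def preceding(n):
    """Everything before n in document order, except n's ancestors (nearest first)."""
    skip = set(ancestor(n))
    return [m for m in doc_order[:doc_order.index(n)] if m not in skip][::-1]


d = root.find("b/d")
print(tags(ancestor(d)))                # ['b', 'a']
print(tags(following(d)))               # ['e', 'f', 'g']
print(tags(preceding(root.find("e"))))  # ['d', 'c', 'b']
```

Note that ancestor, descendant, following, preceding and self together partition the document's elements around any given node, which is a handy consistency check for an implementation like this one.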


As an example of using the attribute axis in abbreviated syntax, //a/@href selects the attribute called href in a elements anywhere in the document tree.
The expression . (an abbreviation for self::node()) is most commonly used within a predicate to refer to the currently selected node.
For example, h3[.='See also'] selects an element called h3 in the current context, whose text content is See also.
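Python's xml.etree.ElementTree implements a limited XPath subset that is enough to approximate both examples. It cannot return attribute nodes, so the href value is read from each selected a element instead. The fragment is hypothetical, and the [.='text'] predicate assumes Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML-like fragment; <body> is used as the context node.
body = ET.fromstring("""<body>
  <h3>Overview</h3>
  <p>See <a href="intro.html">the intro</a> or <a href="help.php">help</a>.</p>
  <h3>See also</h3>
</body>""")

# //a/@href: select the a elements that have an href attribute,
# then read the attribute value from each of them.
hrefs = [a.get("href") for a in body.iterfind(".//a[@href]")]

# h3[.='See also']: ElementTree compares the element's full text content
# (including descendants) with the literal.
see_also = body.findall("h3[.='See also']")

print(hrefs)                       # ['intro.html', 'help.php']
print([h.text for h in see_also])  # ['See also']
```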

Node tests

Node tests may consist of specific node names or more general expressions. In the case of an XML document in which the namespace prefix gs has been defined, //gs:enquiry will find all the enquiry elements in that namespace, and //gs:* will find all elements, regardless of local name, in that namespace.

Other node test formats are:
comment() :finds an XML comment node, e.g. <!-- Comment -->
text() :finds a node of type text, e.g. the hello in <k>hello<m> all</m></k>
processing-instruction() :finds XML processing instructions such as <?php echo $a; ?>. In this case, processing-instruction('php') would match.
node() :finds any node at all.
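The namespace and comment() examples can also be approximated with ElementTree. In this sketch the namespace URI is a placeholder, and the {prefix: URI} map plays the role of the namespace bindings an XPath processor takes from its evaluation context, so the prefix in the query does not have to match the one in the document. The gs:* wildcard and the insert_comments option assume Python 3.8 or later; ElementTree's path language has no text() or processing-instruction() tests, so those are not shown.

```python
import xml.etree.ElementTree as ET

GS = "http://example.com/gs"  # hypothetical namespace URI bound to the gs prefix

doc = ET.fromstring(f"""<root xmlns:gs="{GS}">
  <gs:enquiry id="1"/>
  <gs:reply id="2"/>
  <enquiry id="3"/>
</root>""")
ns = {"gs": GS}  # prefix bindings supplied by the caller

enquiries = doc.findall(".//gs:enquiry", ns)  # //gs:enquiry
in_gs = doc.findall(".//gs:*", ns)            # //gs:*

# comment(): ElementTree discards comments unless the tree builder keeps them.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
tree = ET.fromstring("<r><!-- Comment --><k>hello</k></r>", parser=parser)
comments = [n.text for n in tree.iter() if n.tag is ET.Comment]

print([e.get("id") for e in enquiries])  # ['1']
print([e.get("id") for e in in_gs])      # ['1', '2']
print(comments)                          # [' Comment ']
```

The unprefixed enquiry element is not matched by gs:enquiry because it is in no namespace, which is the same rule XPath applies.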

Predicates

Predicates, written as expressions in square brackets, can be used to restrict a node-set to select only those nodes for which some condition is true. For example a[@href='help.php'] will select those a elements (among the children of the context node) having an href attribute whose value is help.php.

There is no limit to the number of predicates in a step, and they need not be confined to the last step in an XPath. They can also be nested to any depth. Paths specified in predicates begin at the context of the current step (i.e. that of the immediately preceding node test) and do not alter that context. All predicates must be satisfied for a match to occur.

When the value of the predicate is numeric, it is interpreted as a test on the position of the node. So p[1] selects the first p element child, while p[last()] selects the last.

In other cases, the value of the predicate is automatically converted to a boolean. When the predicate evaluates to a node-set, the result is true when the node-set is non-empty. Thus p[@x] selects those p elements that have an attribute named x.

A more complex example: the expression a[/html/@lang='en'][@href='help.php'][1]/@target selects the value of the target attribute of the first a element among the children of the context node that has its href attribute set to help.php, provided the document's html top-level element also has a lang attribute set to en. The reference to an attribute of the top-level element in the first predicate affects neither the context of other predicates nor that of the location step itself.

Predicate order is significant if predicates test the position of a node. Each predicate 'filters' a location step's selected node-set in turn. So a[1][@href='help.php'] will find a match only if the first a child of the context node satisfies the condition @href='help.php', while a[@href='help.php'][1] will find the first a child that satisfies this condition.
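Predicate order can be modelled in a few lines: each predicate filters the node-set left by the previous one, and the context position and size are recomputed at every stage. The Python sketch below is hypothetical (filter_step and the predicate helpers belong to no library) and represents predicates as plain functions instead of parsing XPath. ElementTree's own [n] predicate is not used because it counts positions among an element's same-named siblings rather than within the already filtered node-set, so it cannot reproduce the second case.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node with three a children.
ctx = ET.fromstring("""<div>
  <a href="index.php">Home</a>
  <a href="help.php" target="_blank">Help</a>
  <a href="help.php" target="_self">More help</a>
</div>""")


def filter_step(nodes, *predicates):
    """Apply predicates left to right, XPath-style.

    Each predicate sees only the nodes kept by the previous one. A predicate is
    either an int (shorthand for position() = n) or a callable taking
    (node, position, size) and returning a bool; positions start at 1.
    """
    for pred in predicates:
        size = len(nodes)
        if isinstance(pred, int):
            nodes = [n for pos, n in enumerate(nodes, 1) if pos == pred]
        else:
            nodes = [n for pos, n in enumerate(nodes, 1) if pred(n, pos, size)]
    return nodes


def href_is_help(node, pos, size):  # [@href='help.php']
    return node.get("href") == "help.php"


def is_last(node, pos, size):  # [last()], i.e. position() = last()
    return pos == size


def has_target(node, pos, size):  # [@target]: the attribute node-set is non-empty
    return "target" in node.attrib


a_children = ctx.findall("a")                               # child::a, in document order
first_then_help = filter_step(a_children, 1, href_is_help)  # a[1][@href='help.php']
help_then_first = filter_step(a_children, href_is_help, 1)  # a[@href='help.php'][1]
last_a = filter_step(a_children, is_last)                   # a[last()]
with_target = filter_step(a_children, has_target)           # a[@target]

print(len(first_then_help))              # 0: the first a links to index.php
print(help_then_first[0].get("target"))  # _blank
print([a.text for a in last_a])          # ['More help']
```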

Functions and operators

XPath 1.0 defines four data types: node-sets (sets of nodes with no intrinsic order), strings, numbers and booleans.

The available operators are:
  • The "/", "//" and "[...]" operators, used in path expressions, as described above.
  • A union operator, "|", which forms the union of two node-sets.
  • Boolean operators "and" and "or", and a function "not"
  • Arithmetic operators "+", "-", "*", "div" (divide), and "mod"
  • Comparison operators "=", "!=", "<", ">", "<=", ">="


The function library includes:
  • Functions to manipulate strings: concat, substring, contains, substring-before, substring-after, translate, normalize-space, string-length
  • Functions to manipulate numbers: sum, round, floor, ceiling
  • Functions to get properties of nodes: name, local-name, namespace-uri
  • Functions to get information about the processing context: position, last
  • Type conversion functions: string, number, boolean
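The type conversion rules are simple but easy to get wrong; for instance, the string "0" converts to true. The Python sketch below models a node-set as a list of ElementTree elements assumed to be in document order and covers string(), number() and boolean() for the common cases. Converting numbers to strings, which has its own formatting rules, is deliberately left out, and all function names are hypothetical.

```python
import math
import re
import xml.etree.ElementTree as ET

# The XPath 1.0 Number production surrounded by optional XML whitespace:
# an optional '-', digits with an optional fraction (or a leading '.'),
# and no exponent or '+' sign.
_NUMBER = re.compile(r"[ \t\r\n]*(-?(?:\d+(?:\.\d*)?|\.\d+))[ \t\r\n]*")


def to_string(value):
    """string() for node-sets, strings and booleans (number formatting omitted)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, list):  # node-set: string-value of the first node only
        return "".join(value[0].itertext()) if value else ""
    if isinstance(value, str):
        return value
    raise TypeError("number-to-string formatting is not covered by this sketch")


def to_number(value):
    """number(): booleans become 1/0, strings must match the Number production."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    m = _NUMBER.fullmatch(to_string(value))
    return float(m.group(1)) if m else math.nan


def to_boolean(value):
    """boolean(): numbers are true unless zero or NaN; strings and node-sets unless empty."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0 and not math.isnan(value)
    return len(value) > 0


doc = ET.fromstring("<r><p>7</p><p>8</p></r>")
ps = doc.findall("p")     # the node-set selected by child::p
print(to_string(ps))      # 7 (only the first node counts)
print(to_boolean("0"))    # True: any non-empty string is true
print(to_number("1e3"))   # nan: exponent notation is not an XPath 1.0 number
```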


Some of the more commonly useful functions are detailed below. For a complete description, see the W3C Recommendation document.

Node set functions

position() :returns a number representing the position of this node in the sequence of nodes currently being processed (for example, the nodes selected by an xsl:for-each instruction in XSLT).
count(node-set) :returns the number of nodes in the node-set supplied as its argument.

String functions

string(object?) :converts any of the four XPath data types into a string according to built-in rules. If the value of the argument is a node-set, the function returns the string-value of the first node in document order, ignoring any further nodes.
concat(string, string, string*) :concatenates two or more strings
starts-with(s1, s2) :returns true if s1 starts with s2
contains(s1, s2) :returns true if s1 contains s2
substring(string, start, length?) :returns the substring of string that begins at position start (counting from 1) and is length characters long, or runs to the end of string if length is omitted; example: substring("ABCDEF",2,3) returns "BCD".
substring-before(s1, s2) :example: substring-before("1999/04/01","/") returns "1999"
substring-after(s1, s2) :example: substring-after("1999/04/01","/") returns "04/01"
string-length(string?) :returns the number of characters in string
normalize-space(string?) :all leading and trailing whitespace is removed and any sequences of whitespace characters are replaced by a single space. This is very useful when the original XML may have been pretty-printed, which could make further string processing unreliable.
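The string functions can be prototyped in Python to check their edge cases. This is a sketch of the XPath 1.0 rules, not a library API: substring uses 1-based positions and rounds start and length (finite numbers only; NaN and Infinity are not handled), normalize-space treats only space, tab, carriage return and line feed as whitespace, and translate deletes characters that have no counterpart in its third argument.

```python
import math
import re


def xpath_round(x):
    # XPath round(): nearest integer, halves rounded towards positive infinity.
    return math.floor(x + 0.5)


def substring(s, start, length=None):
    """Characters at 1-based positions p with round(start) <= p < round(start) + round(length)."""
    first = xpath_round(start)
    if length is None:
        return "".join(c for p, c in enumerate(s, 1) if p >= first)
    end = first + xpath_round(length)
    return "".join(c for p, c in enumerate(s, 1) if first <= p < end)


def substring_before(s1, s2):
    i = s1.find(s2)
    return s1[:i] if i >= 0 else ""


def substring_after(s1, s2):
    i = s1.find(s2)
    return s1[i + len(s2):] if i >= 0 else ""


def normalize_space(s):
    # Split on runs of XML whitespace and rejoin with single spaces.
    return " ".join(t for t in re.split(r"[ \t\r\n]+", s) if t)


def translate(s, src, dst):
    """Replace each character of src by the one at the same index in dst;
    characters of src with no counterpart in dst are deleted."""
    table = {}
    for i, ch in enumerate(src):
        table.setdefault(ch, dst[i] if i < len(dst) else "")  # first occurrence wins
    return "".join(table.get(c, c) for c in s)


print(substring("ABCDEF", 2, 3))               # BCD
print(substring("12345", 1.5, 2.6))            # 234
print(substring_before("1999/04/01", "/"))     # 1999
print(normalize_space("  hello \n\t world "))  # hello world
```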

Boolean functions

not(boolean) :negates any boolean expression.

true() :evaluates to true.

false() :evaluates to false.

Number functions

sum(node-set) :converts the string values of all the nodes found by the XPath argument into numbers, according to the built-in casting rules, then returns the sum of these numbers.
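A minimal Python sketch of count() and sum() over element nodes follows. The order document and the helper names are hypothetical, and number() follows the XPath 1.0 numeric syntax, so a single non-numeric value makes the whole sum NaN.

```python
import math
import re
import xml.etree.ElementTree as ET

_NUMBER = re.compile(r"[ \t\r\n]*(-?(?:\d+(?:\.\d*)?|\.\d+))[ \t\r\n]*")


def number(text):
    """number() applied to a string, following the XPath 1.0 Number production."""
    m = _NUMBER.fullmatch(text)
    return float(m.group(1)) if m else math.nan


def xpath_sum(nodes):
    """sum(node-set): add up number(string-value) of every node; NaN propagates."""
    return sum((number("".join(n.itertext())) for n in nodes), 0.0)


# Hypothetical order document.
order = ET.fromstring("""<order>
  <line><qty>2</qty></line>
  <line><qty>3</qty></line>
  <line><qty> 5 </qty></line>
</order>""")

qtys = order.findall("line/qty")
print(len(qtys))        # 3, i.e. count(line/qty)
print(xpath_sum(qtys))  # 10.0, i.e. sum(line/qty)
print(xpath_sum(qtys + [ET.fromstring("<qty>n/a</qty>")]))  # nan
```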

Usage examples

Expressions can be created inside predicates using the operators: =, !=, <=, <, >= and >. Boolean expressions may be combined with parentheses and the boolean operators and and or, as well as the not() function described above. Numeric calculations can use *, +, -, div and mod. Strings can consist of any Unicode characters.

//item[@price > 2*@discount] selects items whose price attribute is greater than twice the numeric value of their discount attribute.

Entire node-sets can be combined ('unioned') using the vertical bar character |. Node-sets that meet one or more of several conditions can be found by combining the conditions inside a predicate with 'or'.

v[x or y] | w[z] will return a single node-set consisting of all the v elements that have x or y child-elements, as well as all the w elements that have z child-elements, that were found in the current context.
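Both usage examples can be approximated with ElementTree plus a little Python. Comparisons convert attribute values with a number() helper, so a missing attribute never satisfies >, just as a comparison against an empty node-set is false; the union helper merges node-sets without duplicates and returns them in document order. The documents and helper names are hypothetical, and because ElementTree predicates have no or, v[x] | v[y] stands in for v[x or y].

```python
import math
import re
import xml.etree.ElementTree as ET

_NUMBER = re.compile(r"[ \t\r\n]*(-?(?:\d+(?:\.\d*)?|\.\d+))[ \t\r\n]*")


def number(text):
    """number() of an attribute value; a missing attribute (None) yields NaN,
    so any comparison involving it is false."""
    if text is None:
        return math.nan
    m = _NUMBER.fullmatch(text)
    return float(m.group(1)) if m else math.nan


def union(root, *node_sets):
    """The | operator: merge node-sets, drop duplicates, keep document order."""
    order = {id(e): i for i, e in enumerate(root.iter())}
    merged = {id(n): n for nodes in node_sets for n in nodes}
    return sorted(merged.values(), key=lambda e: order[id(e)])


shop = ET.fromstring("""<shop>
  <item id="a" price="10" discount="3"/>
  <item id="b" price="10" discount="5"/>
  <item id="c" price="7"/>
</shop>""")
# //item[@price > 2*@discount]
pricey = [i.get("id") for i in shop.iter("item")
          if number(i.get("price")) > 2 * number(i.get("discount"))]

doc = ET.fromstring("""<r>
  <v id="1"><x/></v>
  <w id="2"><z/></w>
  <v id="3"><y/></v>
  <v id="4"/>
  <w id="5"/>
  <v id="6"><x/><y/></v>
</r>""")
# v[x or y] | w[z]
picked = union(doc, doc.findall("v[x]"), doc.findall("v[y]"), doc.findall("w[z]"))

print(pricey)                         # ['a']
print([e.get("id") for e in picked])  # ['1', '2', '3', '6']
```

Element 6 has both x and y children, so it appears in two of the intermediate node-sets; the union still lists it once.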

Examples

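Given a sample XML document

<wikimedia>
  <projects>
    <project name="Wikipedia">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
        <edition language="French">fr.wikipedia.org</edition>
        <edition language="Polish">pl.wikipedia.org</edition>
        <edition language="Spanish">es.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition language="French">fr.wiktionary.org</edition>
        <edition language="Vietnamese">vi.wiktionary.org</edition>
        <edition language="Turkish">tr.wiktionary.org</edition>
        <edition language="Spanish">es.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>
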
The XPath expression
/wikimedia/projects/project/@name
selects the name attributes of all projects,
/wikimedia//editions
selects all editions of all projects,
/wikimedia/projects/project/editions/edition[@language="English"]/text()
selects the addresses of all English Wikimedia projects (the text of all edition elements whose language attribute is equal to English), and
/wikimedia/projects/project[@name="Wikipedia"]/editions/edition/text()
selects the addresses of all Wikipedias (the text of all edition elements that exist under a project element whose name attribute is Wikipedia).
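The four expressions can be tried with Python's xml.etree.ElementTree, using the wikimedia element as the starting point. The document string assumes the element and attribute structure implied by the paths, with the non-English language names inferred from the domain codes. Because ElementTree cannot select attribute or text nodes directly, @name and text() are read from the selected elements instead.

```python
import xml.etree.ElementTree as ET

WIKIMEDIA = """<wikimedia>
  <projects>
    <project name="Wikipedia">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
        <edition language="French">fr.wikipedia.org</edition>
        <edition language="Polish">pl.wikipedia.org</edition>
        <edition language="Spanish">es.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition language="French">fr.wiktionary.org</edition>
        <edition language="Vietnamese">vi.wiktionary.org</edition>
        <edition language="Turkish">tr.wiktionary.org</edition>
        <edition language="Spanish">es.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>"""

wikimedia = ET.fromstring(WIKIMEDIA)  # plays the role of /wikimedia

# /wikimedia/projects/project/@name
names = [p.get("name") for p in wikimedia.findall("projects/project")]
# /wikimedia//editions
editions = wikimedia.findall(".//editions")
# /wikimedia/projects/project/editions/edition[@language="English"]/text()
english = [e.text for e in
           wikimedia.findall("projects/project/editions/edition[@language='English']")]
# /wikimedia/projects/project[@name="Wikipedia"]/editions/edition/text()
wikipedias = [e.text for e in
              wikimedia.findall("projects/project[@name='Wikipedia']/editions/edition")]

print(names)    # ['Wikipedia', 'Wiktionary']
print(english)  # ['en.wikipedia.org', 'en.wiktionary.org']
```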

Implementations

Command Line Tools
  • XMLStarlet


ActionScript


C/C++
  • libxml2
  • Pathan
  • Sedna XML Database
  • VTD-XML


Delphi


Implementations for Database Engines
  • OpenLink Virtuoso

Java
  • Saxon XSLT supports XPath 1.0 and XPath 2.0 (as well as XSLT 1.0, XSLT 2.0, and XQuery 1.0)
  • BaseX (also supports XPath 2.0 and XQuery)
  • VTD-XML
  • Sedna XML Database (both XML:DB and proprietary APIs)


The Java package javax.xml.xpath has been part of Java Standard Edition since Java 5. Technically this is an XPath API rather than an XPath implementation, and it allows the programmer to select a specific implementation that conforms to the interface.
JavaScript
  • jQuery (basic support)

.NET Framework
  • In the System.Xml and System.Xml.XPath namespaces
  • Sedna XML Database

Perl

PHP
  • Sedna XML Database

Python
  • libxml2
  • Amara
  • Sedna XML Database

Ruby
  • libxml2

Scheme
  • Sedna XML Database

SQL
  • MySQL supports a subset of XPath from version 5.1.5 onwards
  • PostgreSQL supports XPath and XSLT from version 8.4 on

Use in schema languages

XPath is increasingly used to express constraints in schema languages for XML.
  • The (now ISO standard) schema language Schematron pioneered the approach.
  • A streaming subset of XPath is used in W3C XML Schema 1.0 for expressing uniqueness and key constraints. In XSD 1.1, the use of XPath is extended to support conditional type assignment based on attribute values, and to allow arbitrary boolean assertions to be evaluated against the content of elements.
  • XForms uses XPath to bind types to values.
  • The approach has even found use in non-XML applications, such as PMD, a source code analyzer for Java: the Java source is converted to a DOM-like parse tree, and rules are then written as XPath expressions over that tree.

See also

  • XPath 2.0
  • XML
  • XSL, XSLT, XSL-FO
  • XQuery
  • XLink, XPointer
  • XML Schema
  • Schematron
  • Navigational database
  • XML database


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 