1 Introduction

XPath is the result of an effort to provide a common syntax and semantics for functionality shared between XSL Transformations [XSLT] and XPointer [XPTR]. The primary purpose of XPath is to address parts of an XML [XML] document. In support of this primary purpose, it also provides basic facilities for manipulating strings, numbers and booleans. XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax. XPath gets its name from its use of a path notation, as in URLs, for navigating through the hierarchical structure of an XML document.

In addition to its use for addressing, XPath is also designed so that it has a natural subset that can be used for matching (testing whether or not a node matches a pattern); this use of XPath is described in [XSLT].

XPath models an XML document as a tree of nodes. There are different types of nodes, including element nodes, attribute nodes and text nodes. XPath defines a way to compute a [string-value] for each type of node. Some types of nodes also have names. XPath fully supports XML Namespaces [XMLNAMES]. Thus, the name of a node is modeled as a pair consisting of a local part and a possibly null namespace URI; this is called an [expanded-name]. The data model is described in detail in Data Model.
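As a rough illustration of the expanded-name idea, the sketch below uses Python's standard `xml.etree.ElementTree`, which stores element names as `{namespace-URI}local-part`. The document, the namespace URI and the helper `expanded_name` are invented for this example. Joining an element's text with `itertext()` only approximates the string-value of an element node.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the namespace URI and element names are made up.
DOC = """<inv:order xmlns:inv="http://example.org/inventory">
  <inv:item sku="A1">Widget</inv:item>
  <note>no namespace here</note>
</inv:order>"""


def expanded_name(tag):
    """Split ElementTree's '{uri}local' tag into (namespace URI, local part)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # null namespace URI


root = ET.fromstring(DOC)

# Expanded names of all element nodes, in document order.
names = [expanded_name(el.tag) for el in root.iter()]

# Approximate string-value of the first child element:
# the concatenation of its descendant text.
item_text = "".join(root[0].itertext())
```

Here `names` pairs each local part with its namespace URI, and the unprefixed `note` element gets `None` as its namespace URI because no default namespace is declared.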

The primary syntactic construct in XPath is the expression. An expression matches the production [Expr]. An expression is evaluated to yield an object, which has one of the following four basic types:

node-set (an unordered collection of nodes without duplicates)

boolean (true or false)

number (a floating-point number)

string (a sequence of UCS characters)

Expression evaluation occurs with respect to a context. XSLT and XPointer specify how the context is determined for XPath expressions used in XSLT and XPointer respectively. The context consists of:

a node (the context node)

a pair of non-zero positive integers (the context position and the context size)

a set of variable bindings

a function library

the set of namespace declarations in scope for the expression

The context position is always less than or equal to the context size.

The variable bindings consist of a mapping from variable names to variable values. The value of a variable is an object, which can be of any of the types that are possible for the value of an expression, and may also be of additional types not specified here.

The function library consists of a mapping from function names to functions. Each function takes zero or more arguments and returns a single result. This document defines a core function library that all XPath implementations must support (see Core Function Library). For a function in the core function library, arguments and result are of the four basic types. Both XSLT and XPointer extend XPath by defining additional functions; some of these functions operate on the four basic types; others operate on additional data types defined by XSLT and XPointer.

The namespace declarations consist of a mapping from prefixes to namespace URIs.
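The components of the context can be pictured as a single record. The following is a minimal sketch, not a prescribed data structure: the class name `EvalContext`, its field names and the sample values are assumptions made for illustration. The only rule it enforces is the one stated above, that the context position never exceeds the context size.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class EvalContext:
    """Hypothetical container for the evaluation context of an expression."""
    node: Any                      # the context node
    position: int = 1              # the context position
    size: int = 1                  # the context size
    variables: Dict[str, Any] = field(default_factory=dict)       # name -> value
    functions: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    namespaces: Dict[str, str] = field(default_factory=dict)      # prefix -> URI

    def __post_init__(self):
        # Position and size are positive, and position <= size.
        if not (1 <= self.position <= self.size):
            raise ValueError("expected 1 <= context position <= context size")


# Sample context; the node is a placeholder string rather than a real tree node.
ctx = EvalContext(
    node="<para>",
    position=2,
    size=5,
    variables={"limit": 10.0},
    functions={"concat": lambda *parts: "".join(parts)},
    namespaces={"xsl": "http://www.w3.org/1999/XSL/Transform"},
)
joined = ctx.functions["concat"]("a", "b", "c")
```

Modelling the variable bindings, function library and namespace declarations as plain mappings mirrors the wording above; a real implementation is free to organise them differently.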

The variable bindings, function library and namespace declarations used to evaluate a subexpression are always the same as those used to evaluate the containing expression. The context node, context position, and context size used to evaluate a subexpression are sometimes different from those used to evaluate the containing expression. Several kinds of expressions change the context node; only predicates change the context position and context size (see Predicates). When the evaluation of a kind of expression is described, it will always be explicitly stated if the context node, context position, and context size change for the evaluation of subexpressions; if nothing is said about the context node, context position, and context size, they remain unchanged for the evaluation of subexpressions of that kind of expression.
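The way a predicate changes the context position and context size can be sketched as follows. The function `filter_with_predicate` is hypothetical. It assumes the nodes are already listed in the order relevant to the expression, and it stands in for a predicate with a Python callable that receives the node, its position and the size.

```python
def filter_with_predicate(nodes, predicate):
    """Keep the nodes for which predicate(node, position, size) is true.

    Each node is tested with its own context position (1-based) and with
    the context size set to the number of nodes being filtered.
    """
    size = len(nodes)
    return [
        node
        for position, node in enumerate(nodes, start=1)
        if predicate(node, position, size)
    ]


paras = ["p1", "p2", "p3", "p4"]  # placeholder nodes

# Select the last node: position equals size.
last = filter_with_predicate(paras, lambda node, pos, size: pos == size)

# Chained predicates: the second one sees fresh positions and a new size.
after_first = filter_with_predicate(paras, lambda node, pos, size: pos > 1)
first_of_rest = filter_with_predicate(after_first, lambda node, pos, size: pos == 1)
```

In the chained case the second filter runs over three nodes, so its position 1 refers to `p2` rather than `p1`.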

XPath expressions often occur in XML attributes. The grammar specified in this section applies to the attribute value after XML 1.0 normalization. So, for example, if the grammar uses the character <, this must not appear in the XML source as < but must be quoted according to XML 1.0 rules by, for example, entering it as &lt;. Within expressions, literal strings are delimited by single or double quotation marks, which are also used to delimit XML attributes. To avoid a quotation mark in an expression being interpreted by the XML processor as terminating the attribute value, the quotation mark can be entered as a character reference (&quot; or &apos;). Alternatively, the expression can use single quotation marks if the XML attribute is delimited with double quotation marks or vice-versa.
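The effect of these quoting rules can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`. The element names `rules`, `when` and `pick` and their attributes are invented for this example. The parser hands back the attribute value with the escapes already resolved, and that resolved value is the string the XPath grammar applies to.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML source carrying XPath expressions in attributes.
SOURCE = """<rules>
  <when test="@price &lt; 10"/>
  <pick select="item[@name=&quot;bolt&quot;]"/>
  <pick select='item[@name="nut"]'/>
</rules>"""

root = ET.fromstring(SOURCE)

# '<' had to be written as &lt; in the source.
test_expr = root[0].get("test")

# Double quotes inside a double-quoted attribute, written as &quot;.
quoted_expr = root[1].get("select")

# Alternative: delimit the attribute with single quotes instead.
mixed_expr = root[2].get("select")
```

The two `pick` attributes show the two options described above: escape the inner quotation marks, or delimit the attribute with the other kind of quotation mark.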

One important kind of expression is a location path. A location path selects a set of nodes relative to the context node. The result of evaluating an expression that is a location path is the node-set containing the nodes selected by the location path. Location paths can recursively contain expressions that are used to filter sets of nodes. A location path matches the production [LocationPath].
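For a concrete feel of location paths, the sketch below uses Python's standard `xml.etree.ElementTree`, which implements only a limited subset of XPath and returns lists of elements rather than true node-sets. The sample document is invented for this example. The element on which `find` or `findall` is called plays the role of the context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for this example.
BOOK = ET.fromstring("""<book>
  <chapter id="c1"><title>Basics</title><para>One</para><para>Two</para></chapter>
  <chapter id="c2"><title>Paths</title><para>Three</para></chapter>
</book>""")

# Relative to the book element: the title children of its chapter children.
all_titles = [t.text for t in BOOK.findall("chapter/title")]

# A predicate filters the chapters by attribute value.
c2 = BOOK.find("chapter[@id='c2']")

# The same relative path selects different nodes from a different context node.
c2_paras = [p.text for p in c2.findall("para")]

# All para descendants of the book element.
all_paras = [p.text for p in BOOK.findall(".//para")]
```

Evaluating `para` against `c2` selects one element, while `.//para` against the `book` element selects three, which shows how the result depends on the context node.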

In the following grammar, the non-terminals QName and NCName are defined in [XMLNAMES], and S is defined in [XML]. The grammar uses the same EBNF notation as [XML] (except that grammar symbols always have initial capital letters).

Expressions are parsed by first dividing the character string to be parsed into tokens and then parsing the resulting sequence of tokens. Whitespace can be freely used between tokens. The tokenization process is described in Lexical Structure.
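The two-stage approach can be illustrated with a deliberately simplified tokenizer. This is a sketch under stated assumptions, not the tokenization defined in Lexical Structure. The pattern `TOKEN_RE` and the function `tokenize` are hypothetical, names are approximated with ASCII-oriented rules, and the disambiguation rules for tokens such as `*` and operator names are not implemented.

```python
import re

# Simplified token pattern (an approximation, not the full XPath lexical rules).
TOKEN_RE = re.compile(r"""
    (?P<number>\d+(?:\.\d*)?|\.\d+)
  | (?P<literal>"[^"]*"|'[^']*')
  | (?P<operator>::|//|\.\.|!=|<=|>=|[/|+\-=<>*@()\[\],.])
  | (?P<name>\$?[A-Za-z_][\w.\-]*(?::[A-Za-z_][\w.\-]*)?)
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(expr):
    """Split an expression string into tokens, discarding whitespace."""
    tokens = []
    pos = 0
    while pos < len(expr):
        match = TOKEN_RE.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character {expr[pos]!r} at offset {pos}")
        if match.lastgroup != "space":
            tokens.append(match.group())
        pos = match.end()
    return tokens


compact = tokenize("child::para[position()=1]")
spaced = tokenize("child :: para [ position ( ) = 1 ]")
```

Both calls produce the same token sequence, which reflects the statement that whitespace can be used freely between tokens. A parser would then consume the token list rather than the raw character string.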
