Stylus Studio XML Editor

3.6 Strings

Strings consist of a sequence of zero or more characters, where a character is defined as in the XML Recommendation [XML]. A single character in XPath thus corresponds to a single Unicode abstract character with a single corresponding Unicode scalar value (see [UNICODE]). This is not the same thing as a 16-bit Unicode code value: the Unicode coded character representation for an abstract character with a Unicode scalar value greater than U+FFFF is a pair of 16-bit Unicode code values (a surrogate pair). In many programming languages, a string is represented by a sequence of 16-bit Unicode code values; implementations of XPath in such languages must take care to ensure that a surrogate pair is correctly treated as a single XPath character.
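The difference between XPath characters and 16-bit code values can be illustrated with a short Python sketch. It is not part of the specification, and the helper names utf16_code_units and xpath_string_length are hypothetical. It relies on the fact that Python strings are indexed by code point, so len() is taken here as a stand-in for the XPath character count.

```python
def utf16_code_units(s: str) -> int:
    """Count the 16-bit code values needed to store s in UTF-16."""
    # Each UTF-16 code unit occupies two bytes; "utf-16-le" adds no byte-order mark.
    return len(s.encode("utf-16-le")) // 2


def xpath_string_length(s: str) -> int:
    """Count characters the way XPath does: one per Unicode scalar value.

    Assumption: s contains no unpaired surrogates, so len() counts scalar values.
    """
    return len(s)


# MUSICAL SYMBOL G CLEF has scalar value U+1D11E, which is greater than U+FFFF.
clef = "\U0001D11E"

print(xpath_string_length(clef))  # one XPath character
print(utf16_code_units(clef))     # stored as a surrogate pair in UTF-16
```

An implementation whose native strings are sequences of 16-bit code values would report the second number unless it explicitly treats each surrogate pair as one character.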

NOTE: 

It is possible in Unicode for two strings to be treated as identical even though they consist of distinct sequences of Unicode abstract characters. For example, some accented characters may be represented in either a precomposed or a decomposed form. Therefore, XPath expressions may return unexpected results unless the characters in both the XPath expression and the XML document have been normalized into a canonical form. See [CHARMOD].
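The following Python sketch shows the effect described in this note. It is an illustration only: the choice of normalization form NFC and the helper name nfc_equal are assumptions made for the example, not requirements stated here.

```python
import unicodedata

# Two representations of the same accented letter.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one character
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two characters


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to form NFC (assumed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A character-by-character comparison sees two different strings.
print(precomposed == decomposed)
# After normalization, both strings are the same sequence of characters.
print(nfc_equal(precomposed, decomposed))
```

Normalizing both the expression and the document to the same form before evaluation avoids comparisons that fail only because of representation.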