Why does validation fail with a named ENTITY for carriage return and line
Hi Folks,

I want to specify the format of a "From:" field for email messages. The requirement is:

1. It starts with the literal "From:"
2. Then there are one or more characters, a - z
3. Then the @ symbol
4. Then there are one or more characters, a - z
5. Then there is a carriage return (decimal 13) followed by a line feed (decimal 10)

A regular expression in the XML Schema pattern facet is well-suited for expressing that requirement:

    <xs:element name="from">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:pattern value="From:[a-z]+@[a-z\.]+&#13;&#10;"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

Great. Here is a sample instance document:

    <from>From:jdoe@machine.example&#13;&#10;</from>

That validates beautifully against the XML Schema.

Now, many email fields must end with CRLF, so I declared an XML ENTITY that I can reuse:

    <!ENTITY CRLF "&#13;&#10;">

I then changed the pattern facet to reference the named ENTITY:

    <xs:element name="from">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:pattern value="From:[a-z]+@[a-z\.]+&CRLF;"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

When I validate the above instance document I get this error:

    The content "From:jdoe@machine.example\r\n" of element <from> does not match the required simple type. Value "From:jdoe@machine.example\r\n" contravenes the pattern facet "From:[a-z]+@[a-z\.]+  " of the type of element <from>.

Huh? What's going on? Why does the instance document validate when the character entities are explicitly provided in the pattern facet, but fail validation when a named ENTITY is used in the pattern facet?

The problem is not with the XML Schema validator. The problem is at a lower level: it is with the XML parser. Look again at the pattern facet:

    <xs:pattern value="From:[a-z]+@[a-z\.]+&CRLF;"/>

Ignore the fact that it is XML Schema stuff. It is XML.
We have an element <xs:pattern> and it has one attribute, value, which has this value:

    From:[a-z]+@[a-z\.]+&CRLF;

What does an XML parser do to attribute values? Answer: it normalizes attribute values. (http://www.w3.org/TR/REC-xml/#AVNormalize)

The XML normalization algorithm says this:

    For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.

Okay, let's replace &CRLF; with its replacement text:

    <xs:pattern value="From:[a-z]+@[a-z\.]+&#13;&#10;"/>

The normalization algorithm then says:

    For a white space character (#32, #13, #10, #9), append a space character (#32) to the normalized value.

Okay, that yields:

    <xs:pattern value="From:[a-z]+@[a-z\.]+  "/>

Note the two spaces at the end of the regular expression.

So normalization of this:

    <xs:pattern value="From:[a-z]+@[a-z\.]+&CRLF;"/>

produces this:

    <xs:pattern value="From:[a-z]+@[a-z\.]+  "/>

Hold on! Why doesn't this:

    <xs:pattern value="From:[a-z]+@[a-z\.]+&#13;&#10;"/>

also normalize to this:

    <xs:pattern value="From:[a-z]+@[a-z\.]+  "/>

I'm confused. Why does validation fail with named ENTITIES and succeed with character entities?

/Roger
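The difference can be reproduced at the parser level, with no schema validator involved. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree (which is built on expat) as the XML parser and Python's re module as a rough stand-in for the XML Schema regex engine. The root element name and the id attributes are hypothetical, added only to tell the two patterns apart. It reflects the fact that the XML specification expands character references inside an entity declaration when the declaration is read, so the replacement text of CRLF holds a literal carriage return and line feed, which attribute-value normalization then turns into spaces. Character references written directly in the attribute value are appended as-is. The exact number of spaces a given parser emits may vary, so the sketch does not depend on it.

```python
import re
import xml.etree.ElementTree as ET

# Raw string so that the backslash in the regex reaches the XML parser untouched.
# "root" and the "id" attributes are hypothetical names used only for this sketch.
DOC = r"""<!DOCTYPE root [
  <!ENTITY CRLF "&#13;&#10;">
]>
<root>
  <pattern id="charref" value="From:[a-z]+@[a-z\.]+&#13;&#10;"/>
  <pattern id="entity" value="From:[a-z]+@[a-z\.]+&CRLF;"/>
</root>"""

root = ET.fromstring(DOC)

# Attribute values as the parser reports them, i.e. after normalization.
patterns = {p.get("id"): p.get("value") for p in root.iter("pattern")}

# The content of <from> in the instance document, as the validator sees it.
instance_value = "From:jdoe@machine.example\r\n"

for name, pattern in patterns.items():
    matches = re.fullmatch(pattern, instance_value) is not None
    print(f"{name:8} pattern={pattern!r} matches={matches}")
```

With the character references in place, the pattern still ends in CR LF and matches the instance value. With the entity reference, the pattern ends in white space only, so the match fails, which mirrors the validation error quoted above.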