Couldn't illegal XML characters be used simply by escaping them?
Hi Folks,

This week I was in a discussion where the topic of illegal XML characters came up, and someone asked: "Couldn't illegal XML characters simply be escaped?" Here is my response. Is it correct? Complete? Easy to understand?

We need to distinguish between a reserved XML character and an illegal character.

The '<' symbol is a reserved XML character. If data contains that symbol it will confuse an XML Parser, because the Parser will think, "Oh, a new element is being started." For example, consider this:

    <Equation>if A < B then ...</Equation>

That '<' symbol needs to be escaped. We can escape it using the built-in &lt; entity, or with a character reference giving the decimal or the hexadecimal value of the symbol. Let's do the latter:

    <Equation>if A &#x3C; B then ...</Equation>

Now the XML Parser is not confused into thinking that the XML is trying to start a new element. Note that the XML Parser does resolve the character reference, and the output of the Parser is this:

    <Equation>if A < B then ...</Equation>

We've made it past the Parser, so that '<' symbol is no longer a problem.

An important thing to note is that the '<' symbol is (obviously) a legal character. The XML 1.0 specification lists those characters that may be used in an XML document (see below for a partial list). So some characters cannot be used in XML documents. For example, hex 0 (null) is not a legal XML character.

[Person I was talking to] your suggestion is to escape illegal characters like so:

    <Test> Here is a null character: &#x0;</Test>

What will an XML Parser do with that character reference? It will resolve it (let (null) represent the null character):

    <Test> Here is a null character: (null)</Test>

But now the output of the XML Parser is an XML document that contains an illegal character. Thus an error is thrown: the XML 1.0 specification requires that the character named by a character reference be a legal XML character.

Recap: reserved characters may be used where they ordinarily would cause confusion by escaping them. But illegal characters may never be used, and escaping them does not help.
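The difference between the two cases can be checked with any conforming parser. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `try_parse` and the three sample documents are illustrative assumptions, not part of the original discussion.

```python
import xml.etree.ElementTree as ET


def try_parse(doc: str) -> str:
    """Return 'ok: <text>' if doc is well-formed, else 'error: <parser message>'."""
    try:
        return "ok: " + repr(ET.fromstring(doc).text)
    except ET.ParseError as err:
        return "error: " + str(err)


# Reserved character, escaped with a hexadecimal character reference.
escaped_reserved = "<Equation>if A &#x3C; B then ...</Equation>"
# Reserved character, left unescaped: the parser sees the start of markup.
raw_reserved = "<Equation>if A < B then ...</Equation>"
# Illegal character (null), "escaped" with a character reference.
escaped_illegal = "<Test>Here is a null character: &#x0;</Test>"

for name, doc in [
    ("escaped reserved", escaped_reserved),
    ("raw reserved", raw_reserved),
    ("escaped illegal", escaped_illegal),
]:
    print(f"{name}: {try_parse(doc)}")
```

Only the first document should parse. The escaped '<' comes back as a plain '<' in the element text, while the reference to the null character is rejected just like the unescaped '<'.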
/Roger

Decimal value of US-ASCII character | Is an XML character?
------------------------------------------
1      | No
2      | No
3      | No
4      | No
5      | No
6      | No
7      | No
8      | No
9      | Yes
10     | Yes
11     | No
12     | No
13     | Yes
14     | No
15     | No
16     | No
17     | No
18     | No
19     | No
20     | No
21     | No
22     | No
23     | No
24     | No
25     | No
26     | No
27     | No
28     | No
29     | No
30     | No
31     | No
32-127 | Yes
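The table can be reproduced from the `Char` production in the XML 1.0 specification, which allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. A short sketch, where the function name `is_xml_char` is an assumption made for illustration:

```python
def is_xml_char(code_point: int) -> bool:
    """True if code_point matches the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= code_point <= 0xD7FF        # space up to just below the surrogates
        or 0xE000 <= code_point <= 0xFFFD      # excludes #xFFFE and #xFFFF
        or 0x10000 <= code_point <= 0x10FFFF   # supplementary planes
    )


# Control characters (0-31) that are legal in XML 1.0.
legal_controls = [cp for cp in range(32) if is_xml_char(cp)]
# All legal characters in the US-ASCII range (0-127).
legal_ascii = [cp for cp in range(128) if is_xml_char(cp)]

print(legal_controls)
print(len(legal_ascii))
```

Running this over 0-31 should leave only 9, 10 and 13, matching the "Yes" rows above. Decimal 0 (null), which the table does not list, is rejected as well.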