Re: [ Revised ] 15 elementary truths about XML
On 01/11/2011 13:40, Costello, Roger L. wrote:
> Hi Folks,
>
> Thank you John, Bjoern, Peter, Michael, Andrew, Michael, and Toby for your excellent feedback.
>
> Based on your feedback, I revised the statements. Do you agree with the current formulation of each statement? /Roger
>
> 1. An XML document is a sequence of zeros and ones called bits.

This need not be true at all.

>
> 2. A byte consists of 8 bits.

This is (as usually interpreted) true, but irrelevant to XML.

>
> 3. Thus, the content an XML document is a sequence of bytes.

No, the XML spec says:

  A parsed entity contains text, a sequence of characters, ...
  character is an atomic unit of text as specified by ISO/IEC 10646:200

Nowhere does it say that an XML document is a sequence of bytes.

>
> 4. Here is an example of a byte: 00110001

That is a binary number, but isn't an elementary truth about XML.

>
> 5. That byte may be interpreted in various ways by software applications. For example, it may be interpreted as:
>
> - corresponding to an integer in base two.
>   In base 10 it represents the integer 49.
>
> - corresponding to a character.
>   In the ASCII character encoding scheme it
>   represents the character 1.

This is true but isn't an elementary truth about XML, just a tangentially related fact.

>
> 6. XML processors always interpret the bytes in XML documents as characters.

Documents consist of characters, not bytes. The XML processor may or may not, depending on the encoding, treat the bytes in the encoding of an entity as characters.

>
> 7. Thus, XML processors interpret the content of XML documents as a sequence of characters.

That is how the XML spec defines documents: as a sequence of entities, each of which is a sequence of characters, where a character is an atomic unit as defined by Unicode and ISO/IEC 10646.

>
> 8. There are various character encoding schemes, such as ASCII and UTF-8. Some character encoding schemes require more than one byte to encode a character.
>

Yes.

> 9.
> An XML processor may identify the character encoding scheme used by an XML document either by its encoding attribute in the XML declaration or by some out-of-band means.

Yes or no, depending on how you parse that. The document identifies its encoding to the XML processor by the means you specify.

> 10. An XML processor is software that reads the bytes in an XML document and makes them available to XML applications.

It may read characters that are not composed of bytes. An XML document consists of characters, not bytes. Characters may be encoded by whatever means. An XML processor must be able to decode at least UTF-8 and UTF-16 (which do encode each character as a sequence of bytes).

> 11. An XML application is software that processes the output of an XML processor. Metaphorically, an XML application is a layer of software on top of an XML processor.
>
> 12. An XML Schema validator is an XML application.
>
> 13. XML applications may interpret the bytes in XML documents differently than how an XML processor interprets the bytes.

XML applications do not interpret the bytes at all, as they are not reported by the XML processor.

> 14. For example, consider the XML Schema that declares an element A with a Boolean data type:
>
> <element name="A" type="boolean" />
>
> Suppose the value of <A> is the byte 00110001.
> The element declaration informs the XML Schema validator
> and the XML Schema validator interprets the byte as the
> Boolean value "true."

That's an example of something, not an elementary truth (just commenting on your numbering, not the actual example). However, to comment on the example: the value (by which I assume you mean content) of an element is never a byte, but a sequence of characters.

> 15. Thus, an XML processor interprets the byte 00110001 as representing the character 1 whereas an XML Schema validator interprets the same byte as representing the Boolean value "true."
>

No, the processor sees the Unicode character U+0031 and may make of that what it wishes.

David
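The distinction drawn in the replies above (bytes belong to the encoding; the document and the application see characters) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the document `<A>1</A>` is made up for the example, the names `RAW_BYTE`, `DOC_TEMPLATE`, `serialize`, `element_text`, and `parse_xsd_boolean` are hypothetical, and the last function is a stand-in for what a schema-aware layer might do with the `xs:boolean` lexical forms, not a real XML Schema validator.

```python
import xml.etree.ElementTree as ET

# The byte from the thread: 00110001.
RAW_BYTE = bytes([0b00110001])

# Hypothetical document for illustration: element A whose content is "1".
DOC_TEMPLATE = '<?xml version="1.0" encoding="{enc}"?><A>1</A>'


def serialize(encoding: str) -> bytes:
    """Encode the same sequence of characters using the given encoding."""
    return DOC_TEMPLATE.format(enc=encoding).encode(encoding)


def element_text(document_bytes: bytes) -> str:
    """Parse the bytes and return the characters the parser reports for A."""
    return ET.fromstring(document_bytes).text


def parse_xsd_boolean(lexical: str) -> bool:
    """Map an xs:boolean lexical form to a bool (hypothetical helper)."""
    value = lexical.strip(" \t\r\n")  # xs:boolean collapses whitespace
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not an xs:boolean lexical form: {lexical!r}")


if __name__ == "__main__":
    # Read as an integer the byte is 49; decoded as ASCII it is U+0031.
    print(RAW_BYTE[0], repr(RAW_BYTE.decode("ascii")))

    utf8_doc = serialize("UTF-8")
    utf16_doc = serialize("UTF-16")  # Python prepends a byte order mark

    # Different byte sequences on disk...
    print(len(utf8_doc), len(utf16_doc))

    # ...but the parser reports the same character content for both.
    print(repr(element_text(utf8_doc)), repr(element_text(utf16_doc)))

    # A schema-aware layer works from those characters, not from bytes.
    print(parse_xsd_boolean(element_text(utf16_doc)))
```

In the UTF-16 serialization the content of A is not stored as the single byte 00110001, yet the parser still reports the character "1" and the boolean mapping still applies, which is the point made in the reply to statement 15.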