
Re: [ Revised ] 15 elementary truths about XML

  • From: David Carlisle <davidc@nag.co.uk>
  • To: "Costello, Roger L." <costello@mitre.org>
  • Date: Tue, 01 Nov 2011 14:01:57 +0000

On 01/11/2011 13:40, Costello, Roger L. wrote:
> Hi Folks,
>
> Thank you John, Bjoern, Peter, Michael, Andrew, Michael, and Toby for your excellent feedback.
>
> Based on your feedback, I revised the statements.  Do you agree with the current formulation of each statement?  /Roger
>
> 1. An XML document is a sequence of zeros and ones called bits.

This need not be true at all.
>
> 2. A byte consists of 8 bits.
This is (as usually interpreted) true, but irrelevant to XML.
>
> 3. Thus, the content of an XML document is a sequence of bytes.
No, the XML spec says:
    A parsed entity contains text, a sequence of characters, ...
    A character is an atomic unit of text as specified by ISO/IEC 10646:200
Nowhere does it say that an XML document is a sequence of bytes.

>
> 4. Here is an example of a byte: 00110001

That is a binary number, but isn't an elementary truth about XML.

>
> 5. That byte may be interpreted in various ways by software applications. For example, it may be interpreted as:
>
>      - corresponding to an integer in base two.
>        In base 10 it represents the integer 49.
>
>      - corresponding to a character.
>        In the ASCII character encoding scheme it
>       represents the character 1.

This is true but isn't an elementary truth about XML, just a 
tangentially related fact.
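
A minimal Python sketch of the two readings in statement 5 (the variable
names are illustrative assumptions, and nothing here is specific to XML):

```python
# The bit pattern from statement 4, written as a Python binary literal.
byte = 0b00110001

# Reading 1: the bits as an integer in base two.
as_integer = byte

# Reading 2: the same single byte decoded as an ASCII-encoded character.
as_ascii = bytes([byte]).decode("ascii")

print(as_integer)  # 49
print(as_ascii)    # 1
```

Both results come from the same eight bits; only the interpretation differs.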
>
> 6. XML processors always interpret the bytes in XML documents as characters.

Documents consist of characters, not bytes. The XML processor may or may 
not, depending on the encoding, treat the bytes in the encoding of an 
entity as characters.
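
As a hedged illustration of why the encoding matters, the Python sketch
below decodes one two-byte sequence under two different encodings. The byte
values and variable names are assumptions chosen for the example, not
anything taken from the XML spec:

```python
# Two bytes that form a single character in UTF-8.
raw = b"\xc3\xa9"

# Decoded as UTF-8, the two bytes together encode one character, U+00E9.
as_utf8 = raw.decode("utf-8")

# Decoded as ISO-8859-1, each byte is its own character: U+00C3, U+00A9.
as_latin1 = raw.decode("iso-8859-1")

print(len(as_utf8), [hex(ord(c)) for c in as_utf8])
print(len(as_latin1), [hex(ord(c)) for c in as_latin1])
```

The bytes are identical in both cases; the characters only exist once an
encoding has been chosen.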
>
> 7. Thus, XML processors interpret the content of XML documents as a sequence of characters.
That is how the XML spec defines documents: as a sequence of entities, 
each of which is a sequence of characters, where a character is an atomic 
unit as defined by Unicode and ISO/IEC 10646.
>
> 8. There are various character encoding schemes, such as ASCII and UTF-8. Some character encoding schemes require more than one byte to encode a character.
>
yes
> 9. An XML processor may identify the character encoding scheme used by an XML document either by its encoding attribute in the XML declaration or by some out-of-band means.

Yes or no, depending on how you parse that. The document identifies its 
encoding to the XML processor by the means you specify.

> 10. An XML processor is software that reads the bytes in an XML document and makes them available to XML applications.
It may read characters that are not composed of bytes.

An XML document consists of characters, not bytes. Characters may be 
encoded by whatever means. An XML processor must be able to decode at 
least UTF-8 and UTF-16 (which do encode each character as a sequence of bytes).
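
A small sketch of that point, assuming Python's standard
xml.etree.ElementTree parser as the XML processor (the helper name
encoded_doc and the sample document are hypothetical): the same document
serialised as UTF-8 and as UTF-16 yields different bytes, but the processor
reports the same characters to the application.

```python
import xml.etree.ElementTree as ET


def encoded_doc(encoding):
    """Return the sample document as bytes in the given encoding."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><A>1</A>'
    return doc.encode(encoding)


utf8_bytes = encoded_doc("utf-8")
utf16_bytes = encoded_doc("utf-16")  # Python prepends a byte order mark

# The byte sequences differ ...
print(utf8_bytes != utf16_bytes)

# ... but the parser hands the application the same character content.
content8 = ET.fromstring(utf8_bytes).text
content16 = ET.fromstring(utf16_bytes).text
print(content8 == content16)
```

The application only ever sees the decoded string, never the bytes that
happened to carry it.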


> 11. An XML application is software that processes the output of an XML processor. Metaphorically, an XML application is a layer of software on top of an XML processor.
>
> 12. An XML Schema validator is an XML application.
>
> 13. XML applications may interpret the bytes in XML documents differently than how an XML processor interprets the bytes.

XML applications do not interpret the bytes at all, as the bytes are not 
reported by the XML processor.

> 14. For example, consider the XML Schema that declares an element A with a Boolean data type:
>
>      <element name="A" type="boolean" />
>
>      Suppose the value of <A> is the byte 00110001.
>      The element declaration informs the XML Schema validator
>      and the XML Schema validator interprets the byte as the
>      Boolean value "true."
That's an example of something, not an elementary truth (just commenting 
on your numbering, not the actual example). However, to comment on the 
example: the value (by which I assume you mean content) of an element is 
never a byte, but a sequence of characters.

> 15. Thus, an XML processor interprets the byte 00110001 as representing the character 1 whereas an XML Schema validator interprets the same byte  as representing the Boolean value "true."
>
No, the processor sees the Unicode character U+0031 and may make of that 
what it wishes.
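
To sketch what a validator might then do with those characters: the XML
Schema boolean type accepts the lexical forms "true", "false", "1" and "0".
The Python function below is a hypothetical illustration of that mapping,
not the code of any real validator, and its name is an assumption. Note that
its input is a string of characters, never a byte.

```python
def parse_xsd_boolean(content: str) -> bool:
    """Map the character content of an element to a Boolean value."""
    # Trim XML whitespace (space, tab, CR, LF) from both ends, which
    # approximates the whitespace collapsing applied to boolean values.
    lexical = content.strip(" \t\r\n")
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"not a valid boolean lexical form: {content!r}")


# U+0031 is the character "1", as reported by the XML processor.
print(parse_xsd_boolean("\u0031"))  # True
```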


David


