Re: [Shannon: information ~ uncertainty] Ramifications to XML
----- Original Message -----
From: "Roger L. Costello" <costello@m...>
To: <xml-dev@l...>
Sent: Monday, October 11, 2004 5:45 AM
Subject: [Shannon: information ~ uncertainty] Ramifications to XML data exchange?

..
> QUESTIONS
>
> 1. How does this aspect (information ~ uncertainty) of Shannon's work relate
> to data exchange using XML? (I realize that this is a very broad question.
> Its intent is to stimulate discussion on the application of Shannon's
> information/uncertainty ideas to XML data exchange)
>
> 2. A schema is used to restrict the allowable forms that an instance
> document may take. So doesn't a schema reduce information?

My response is not meant to insult the intelligence and training of the members of this list, but to present the concepts in elementary fashion, in what seemed to be the spirit of Roger's questions.

If you ask me to tell you a number from 1 to 8, the 8 possible answers (taken as equally likely) represent 3 bits of information. This is the information content of the response I send you.

But how many bits do I need to send the information? This depends upon the agreed-upon encoding. If we exchange other kinds of messages, then some other bits might be needed to distinguish my answer to the 'pick a number from 1 to 8' question from the other things I could be telling you. If there were a total of 16 distinct questions I might be answering, another 4 bits would be the minimum needed to encode my choice. If you asked me the same question repeatedly and I didn't answer in order, your question might have some extra bits attached identifying which one in the sequence it was, and I might have to pass those back in the answer.

Clearly, XML represents a large family of possible encodings of the information, but every bit beyond the 3 that carry the answer, plus whatever is needed to identify the question my response corresponds to, is wasteful excess.
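As a rough sketch of the arithmetic above, the following computes the minimum number of whole bits for the answer and for the question type. The helper name `bits_needed` is made up for illustration, and it assumes all alternatives are equally likely.

```python
def bits_needed(choices: int) -> int:
    """Minimum whole bits needed to distinguish `choices` equally likely alternatives."""
    if choices < 1:
        raise ValueError("need at least one alternative")
    # (choices - 1).bit_length() is ceil(log2(choices)) without floating-point error
    return (choices - 1).bit_length()

answer_bits = bits_needed(8)     # a number from 1 to 8 -> 3 bits
question_bits = bits_needed(16)  # one of 16 distinct questions -> 4 bits
total_bits = answer_bits + question_bits

print(answer_bits, question_bits, total_bits)  # 3 4 7
```

Sequence numbers for out-of-order answers would add further bits on top of this total, depending on how many outstanding questions have to be told apart.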
The point is to use our communication bandwidth to send the minimum number of bits necessary to accomplish the purpose of the communication, or in other words, to maximize the useful information content of the message that is sent. In general it is presumed we are sending many messages, so an out-of-band description of the encoding can be assumed to be amortized over those many messages (not transmitted each time).

If I tell you that all my responses conform to an XML Schema I've sent you in the past, or which you can look up somewhere, the encoding's excess bits may be reduced relative to some other XML form of exchange. Thus, I could send you the XML <q3>7</q3>, indicating that my answer to the question of type 3 is 7. This is still a lot more than the 7 bits that is the minimum (3 for the answer plus 4 for the question type), but a lot fewer than <answer><question>Pick a number from 1 to 8.</question>The number from 1 to 8 that I picked is 7.</answer>.

Given the appropriate level of need and regularity of communication, the schema-described XML could be transmitted over the wire in binary form using an ASN.1 encoding or something else, getting a little closer to the minimum information content, though you and I would see the same text at our respective ends of the channel.

Jeff
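To make the size comparison in the message concrete, here is a minimal sketch that measures the two XML forms and packs the same answer into 7 bits. The `pack`/`unpack` layout (4 bits of question type followed by 3 bits of answer) is a hypothetical encoding chosen for illustration, not an ASN.1 encoding and not something specified in the message; the XML is assumed to be sent as UTF-8.

```python
def pack(question_type: int, answer: int) -> int:
    """Pack a question type (1..16) and an answer (1..8) into a 7-bit integer."""
    if not 1 <= question_type <= 16:
        raise ValueError("question_type must be in 1..16")
    if not 1 <= answer <= 8:
        raise ValueError("answer must be in 1..8")
    # High 4 bits: question type; low 3 bits: answer (both stored zero-based)
    return ((question_type - 1) << 3) | (answer - 1)

def unpack(code: int) -> tuple[int, int]:
    """Recover (question_type, answer) from a 7-bit integer produced by pack()."""
    return (code >> 3) + 1, (code & 0b111) + 1

terse = "<q3>7</q3>"
verbose = ("<answer><question>Pick a number from 1 to 8.</question>"
           "The number from 1 to 8 that I picked is 7.</answer>")

terse_bits = len(terse.encode("utf-8")) * 8      # 10 bytes -> 80 bits
verbose_bits = len(verbose.encode("utf-8")) * 8
packed = pack(3, 7)                              # fits in 7 bits

print(terse_bits, verbose_bits, format(packed, "07b"), unpack(packed))
```

Both ends need the out-of-band agreement (here, the bit layout) to interpret the 7 bits, which is the role the shared schema plays for the XML forms.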