RE: [Shannon: information ~ uncertainty] Ramifications to XML
It's worth noting that information, in Shannon and Weaver's definition, does not equal 'meaning'. That's why it seems 'counterintuitive':

"The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is, they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages." -- The Mathematical Theory of Communication, Shannon and Weaver, 1949

You could contrast this with Boltzmann entropy, where entropy is a measure of disorder and is tied to processes that are 'irreversible' (not time-symmetric). This is related to 'addressability'. As the article cited below notes, distinctness is an essential aspect when all possible states are equally probable, a condition Shannon assumes. While a process may appear reversible in theory, in practice it can be so difficult, or require so much precision, as to be impossible; explosions are the usual example. The process is reversible only as long as the particles don't interact.

Entropy in this view is related to the number of possible states of an isolated system; more precisely, it grows as the logarithm of that number. Are all states equally probable given the initial states? If not, is that what 'meaning' means, and is it a fundamental requirement for time?

So is the point that schemas increase the value of information (make it meaningful) by reducing the number of potential states regardless of the length (Cantorian madness in two steps)? Keep in mind that a priori information is 'meaningful', and that in a temporal system the schema is a priori.
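The schema-as-entropy-reducer point above can be sketched numerically (my illustration, not from the thread; the document shapes and counts are invented for the example). Shannon entropy in bits is H = -sum(p * log2 p), and a schema that rules out most candidate states shrinks H:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)) in bits for a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Unconstrained: 8 equally likely document shapes -> 3 bits of uncertainty.
print(entropy([1/8] * 8))  # 3.0

# A schema that admits only 2 of those shapes cuts the uncertainty to 1 bit.
print(entropy([1/2] * 2))  # 1.0

# A schema admitting exactly one shape leaves no uncertainty at all.
print(entropy([1.0]))      # 0.0
```

In Shannon's sense the constrained channel carries less information per message; in Len's framing, that lost uncertainty is exactly what makes the remaining messages more meaningful.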
http://www.mathpages.com/home/kmath552/kmath552.htm

And you may also be a few steps from the 'why SGML is a better design than XML' thread, where SGML provides more entropy-reducing constraints, while XML is designed to cope with human laziness by enabling more errors/interpretations/implementations of the same information, thus increasing the information by reducing the meaningfulness. The puzzle you have to solve is that this is done to enable local meanings instead of relying on a global definition.

len

From: Roger L. Costello [mailto:costello@m...]

Hi Folks,

I am trying to get an understanding of Claude Shannon's work on information theory. Below I describe one small part of Shannon's work. I would like to hear your thoughts on its ramifications for information exchange using XML.

INFORMATION

Shannon defines information as follows: information is proportional to uncertainty. High uncertainty equates to a high amount of information; low uncertainty equates to a low amount of information. More specifically, Shannon talks about a set of possible messages: a set of 10 possible messages contains less information than a set of 100. This may seem rather counterintuitive, but bear with me as I give an example.

In a book I am reading [1] the author gives an example which provides a nice intuition for Shannon's statement that information is proportional to uncertainty.

EXAMPLE

Imagine that a man is in prison and wants to send a message to his wife. Suppose that the prison allows only one message to be sent: "I am fine". Even if the prisoner is deathly ill, all he can send is "I am fine". Clearly there is no information in this message. Here the set of possible messages is one: there is no uncertainty, and there is no information.

Suppose that the prison allows one of two messages to be sent, "I am fine" or "I am ill". If the prisoner sends one of these messages then some information will be passed to his wife.
Here the set of possible messages is two. There is uncertainty about which message will be sent; when the prisoner selects one of the two messages and sends it to his wife, some information is passed.

Suppose that the prison allows one of four messages to be sent:

1. I am healthy and happy
2. I am healthy but not happy
3. I am happy but not healthy
4. I am not happy and not healthy

If the prisoner sends one of these messages then even more information will be passed. Thus, the bigger the set of potential messages, the more uncertainty; and the more uncertainty there is, the more information there is.

Interestingly, it doesn't matter what the messages are. All that matters is the "number" of messages in the set. Thus, there is the same amount of information in the set {"I am fine", "I am ill"} as there is in the set {A, B}.

SIDE NOTES

a. Part of Shannon's goal was to measure the "amount" of information. In the example above with two possible messages, the amount of information is 1 bit. In the example with four possible messages, the amount of information is 2 bits.

b. Shannon refers to uncertainty as "entropy". Thus, the higher the entropy (uncertainty), the higher the information; the lower the entropy, the lower the information.

QUESTIONS

1. How does this aspect (information ~ uncertainty) of Shannon's work relate to data exchange using XML? (I realize that this is a very broad question. Its intent is to stimulate discussion on the application of Shannon's information/uncertainty ideas to XML data exchange.)

2. A schema is used to restrict the allowable forms that an instance document may take. So doesn't a schema reduce information?
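The bit counts in side note (a) follow directly from the size of the message set: for N equally likely messages, the information content is log2(N) bits. A minimal sketch (mine, not from the original post) using the prisoner's message sets:

```python
import math

def bits_of_information(message_set):
    """Information in bits for a set of equally likely messages: log2 of its size."""
    return math.log2(len(message_set))

# One permitted message: no uncertainty, no information.
print(bits_of_information({"I am fine"}))               # 0.0

# Two permitted messages: 1 bit, regardless of what the messages say.
print(bits_of_information({"I am fine", "I am ill"}))   # 1.0
print(bits_of_information({"A", "B"}))                  # 1.0

# Four permitted messages: 2 bits.
print(bits_of_information({
    "I am healthy and happy", "I am healthy but not happy",
    "I am happy but not healthy", "I am not happy and not healthy"}))  # 2.0
```

This also makes question 2 concrete: a schema that shrinks the set of valid instance documents shrinks log2(N), so in Shannon's technical sense it does reduce information.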