Re: [Shannon: information ~ uncertainty] Ramifications to XML
You mean

  <xs:element name="strikeNumber">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="digit" type="xs:integer" minOccurs="16" maxOccurs="16"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

has more info value than

  <xs:element name="strikeNumber">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="digit" type="xs:integer" minOccurs="8" maxOccurs="8"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

?

----- Original Message -----
From: "Ed Lai" <data_mechanic@y...>
To: "TAN Kuan Hui" <kuanhui@x...>; "Roger L. Costello" <costello@m...>; <xml-dev@l...>
Sent: Wednesday, October 13, 2004 12:09 AM
Subject: Re: [Shannon: information ~ uncertainty] Ramifications to XML data exchange?

> The discussion on unequal probability is unnecessary
> and only side-tracks the discussion.
>
> Lots of possible choices is not information; knowing
> which particular choice is taken is information.
>
> It is like the lotto ticket numbers: knowing that the
> number you picked is just one of many millions of
> possibilities is not much information. Knowing the
> actual winning number is information.
>
> The more choices there are, the more information you
> get when you know which choice was made.
>
> Without a schema, when you get the actual XML data,
> you know which choice it is out of infinity; that is
> a lot of information.
>
> With a schema that only allows 4 valid XML instances,
> getting the XML data gives you the knowledge of which
> of the four; that is not a lot of information.
>
> So a schema does reduce the information an XML message
> can carry.
>
> However, that comes from knowing the schema. If you
> know the schema, then there is less information in the
> message, but this comes about because you already
> know a lot: you know the schema. The schema carries a
> lot of information, so once you know the schema there
> is not much more you can know.
>
> Back to the lotto example: if you know the first 5
> balls, you already have a lot of information, and the
> lotto drawing would carry very little information.
> Knowing the schema is like knowing the first 5 balls.
>
> So if someone knows the first 5 numbers of the next
> lotto winning ticket, please tell me.
>
> Ed Lai
>
> --- TAN Kuan Hui <kuanhui@x...> wrote:
>
> > > EXAMPLE
> > >
> > > Imagine that a man is in prison and wants to send a message to his wife.
> > > Suppose that the prison only allows one message to be sent, "I am fine".
> > > Even if the person is deathly ill all he can send is, "I am fine".
> > > Clearly there is no information in this message.
> > >
> > > Here the set of possible messages is one. There is no uncertainty and
> > > there is no information.
> > >
> > > Suppose that the prison allows one of two messages to be sent, "I am fine"
> > > or "I am ill". If the prisoner sends one of these messages then some
> > > information will be passed to his wife.
> > >
> > > Here the set of possible messages is two. There is uncertainty (of which
> > > message will be sent). When one of the two messages is selected by the
> > > prisoner and sent to his wife some information is passed.
> > >
> > > Suppose that the prison allows one of four messages to be sent:
> > >
> > > 1. I am healthy and happy
> > > 2. I am healthy but not happy
> > > 3. I am happy but not healthy
> > > 4. I am not happy and not healthy
> > >
> > > If the person sends one of these messages then even more information will
> > > be passed.
> > >
> > > Thus, the bigger the set of potential messages the more uncertainty. The
> > > more uncertainty there is the more information there is.
> > >
> > > Interestingly, it doesn't matter what the messages are. All that matters
> > > is the "number" of messages in the set.
> > > Thus, there is the same amount of information in this set:
> > >
> >
> > You are making the assumption that all possibilities occur with equal
> > probability. This is equivalent to white noise. If the probability
> > distribution of your data set is close to white noise, then we can
> > conclude that there is no redundancy in your encoded data. This
> > would be ideal.
> >
> > > {"I am fine", "I am ill"}
> > >
> > > as there is in this set:
> > >
> > > {A, B}
> > >
> > > SIDE NOTES
> > >
> > > a. Part of Shannon's goal was to measure the "amount" of information.
> > >    In the example above where there are two possible messages the amount
> > >    of information is 1 bit. In the example where there are four
> > >    possible messages the amount of information is 2 bits.
> > >
> > > b. Shannon refers to uncertainty as "entropy". Thus, the higher the
> > >    entropy (uncertainty) the higher the information. The lower the
> > >    entropy the lower the information.
> > >
> > > QUESTIONS
> > >
> > > 1. How does this aspect (information ~ uncertainty) of Shannon's work
> > > relate to data exchange using XML? (I realize that this is a very broad
> > > question. Its intent is to stimulate discussion on the application of
> > > Shannon's information/uncertainty ideas to XML data exchange)
> > >
> > > 2. A schema is used to restrict the allowable forms that an instance
> > > document may take. So doesn't a schema reduce information?
> > >
> >
> > A schema will constrain the data into a conforming set of data, but
> > that does not mean that every possible combination is useful;
> > some permutations probably never occur and cannot factor into
> > your evaluation of the value of the information carried.
> > In real data, a small set of data would occur with significantly higher
> > probability than others; therefore you have to factor in the
> > "probability distribution profile" (pdf) to make your discussion meaningful.
> >
> > Let's say you have an XML schema that constrains your data
> > into 1 of 10 possible combinations, with each one occurring with equal
> > probability, versus another with 100 possibilities but with 1 combination
> > occurring 99% of the time. Which data feed then has greater information
> > value?
> >
> > The key to your discussion, in my humble opinion, is in
> > "information predictability" rather than "information uncertainty".
> >
> > rgds,
> > Kuan Hui
> >
> > -----------------------------------------------------------------
> > The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> > initiative of OASIS <http://www.oasis-open.org>
> >
> > The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> > To subscribe or unsubscribe from this list use the subscription
> > manager: <http://www.oasis-open.org/mlmanage/index.php>
>
> =====
> Ed Lai
> data_mechanic@y...
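
For anyone who wants to put numbers on the examples in this thread, here is a minimal Python sketch of Shannon's entropy formula, H = -sum(p * log2(p)). It is not from any of the posters. The names `entropy_bits` and `uniform` are made up for illustration, and two inputs are assumptions the thread does not state: that the remaining 1% in the 100-possibility feed is spread evenly over the other 99 combinations, and that each `digit` element in the strikeNumber schema holds a single decimal digit 0-9 chosen uniformly (the schema itself only says `xs:integer`).

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def uniform(n):
    """n equally likely messages."""
    return [1.0 / n] * n


# The prisoner example: 2 and 4 equally likely messages.
two = entropy_bits(uniform(2))
four = entropy_bits(uniform(4))

# Kuan Hui's question: 10 equally likely combinations ...
ten = entropy_bits(uniform(10))

# ... versus 100 combinations with one occurring 99% of the time.
# ASSUMPTION: the other 99 combinations share the remaining 1% evenly.
skewed = entropy_bits([0.99] + [0.01 / 99] * 99)

# The strikeNumber schemas at the top of the message.
# ASSUMPTION: every digit is an independent, uniform choice from 0-9.
digits16 = 16 * math.log2(10)
digits8 = 8 * math.log2(10)

print(f"2 messages:          {two:.2f} bits")
print(f"4 messages:          {four:.2f} bits")
print(f"10 equiprobable:     {ten:.2f} bits")
print(f"100, one at 99%:     {skewed:.2f} bits")
print(f"16 digits / 8 digits: {digits16:.2f} / {digits8:.2f} bits")
```

Under those assumptions the 10-way equiprobable feed comes to about 3.32 bits per message and the skewed 100-way feed to roughly 0.15 bits, so the number of allowed instances alone does not settle which feed carries more. Likewise, sixteen uniform digits carry twice the bits of eight, but only if the digits really are unpredictable to the receiver.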