Re: [Shannon: information ~ uncertainty] Ramifications to XML data exchange?


shannon lai
You mean

<xs:element name="strikeNumber">
 <xs:complexType>
  <xs:sequence>
   <xs:element name="digit" type="xs:integer" minOccurs="16" maxOccurs="16"/>
  </xs:sequence>
 </xs:complexType>
</xs:element>

has more info value than

<xs:element name="strikeNumber">
 <xs:complexType>
  <xs:sequence>
   <xs:element name="digit" type="xs:integer" minOccurs="8" maxOccurs="8"/>
  </xs:sequence>
 </xs:complexType>
</xs:element>

?
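
One rough way to put numbers on this question is sketched below. It assumes each digit is a single decimal digit (0-9) and that all values are equally likely. Neither assumption is enforced by the schema as written, since xs:integer is unbounded. The function name capacity_bits is only illustrative.

```python
import math

def capacity_bits(num_digits: int, symbols_per_digit: int = 10) -> float:
    """Bits needed to single out one of symbols_per_digit ** num_digits messages.

    Assumes every digit value is equally likely (an assumption, not something
    the schema guarantees).
    """
    return num_digits * math.log2(symbols_per_digit)

bits_16 = capacity_bits(16)  # strikeNumber with 16 digits
bits_8 = capacity_bits(8)    # strikeNumber with 8 digits

print(f"16 digits: {bits_16:.2f} bits")
print(f" 8 digits: {bits_8:.2f} bits")
```

Under those assumptions the 16-digit form can carry exactly twice as many bits per instance as the 8-digit form.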

----- Original Message ----- 
From: "Ed Lai" <data_mechanic@y...>
To: "TAN Kuan Hui" <kuanhui@x...>; "Roger L. Costello"
<costello@m...>; <xml-dev@l...>
Sent: Wednesday, October 13, 2004 12:09 AM
Subject: Re:  [Shannon: information ~ uncertainty] Ramifications to
XML data exchange?


> The discussion of unequal probabilities is unnecessary
> and only sidetracks the discussion.
>
> Having lots of possible choices is not information; knowing
> which particular choice was taken is information.
>
> It is like lotto ticket numbers: knowing that the
> number you picked is just one of many millions of
> possibilities is not much information. Knowing the
> actual winning number is information.
>
> The more choices there are, the more information you get
> when you learn which choice was made.
>
> Without a schema, when you get the actual XML data,
> you learn which choice it is out of infinitely many; that is
> a lot of information.
>
> With a schema that allows only 4 valid XML instances,
> getting the XML data tells you which
> of the four it is; that is not a lot of information.
>
> So a schema does reduce the information an XML message
> can carry.
>
> However, that comes from knowing the schema. If you
> know the schema, then there is less information in the
> message, but only because you already
> know a lot: you know the schema. The schema carries a
> lot of information, so once you know the schema there is
> not much more to learn.
>
> Back to the lotto example: if you know the first 5
> balls, you already have a lot of information, and the
> lotto drawing would carry very little information.
> Knowing the schema is like knowing the first 5 balls.
>
> So if someone knows the first 5 numbers of the next
> lotto winning ticket, please tell me.
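
The lotto comparison can be made concrete with a small sketch. The game parameters here (6 balls drawn in order, without replacement, from a pool of 49, every ball equally likely) are assumptions for illustration only; the message does not say which lotto is meant, and draw_bits is a hypothetical helper.

```python
import math

def draw_bits(pool_size: int, balls_drawn: int, already_known: int = 0) -> float:
    """Bits of information in the still-unknown balls of an ordered draw
    without replacement, assuming every remaining ball is equally likely."""
    return sum(math.log2(pool_size - i) for i in range(already_known, balls_drawn))

full_draw = draw_bits(49, 6)                   # nothing known in advance
last_ball = draw_bits(49, 6, already_known=5)  # first 5 balls already known

print(f"whole draw:      {full_draw:.2f} bits")
print(f"last ball only:  {last_ball:.2f} bits")
```

With five balls known, only one choice among the 44 remaining balls is left, so most of the information in the draw has already been received. Knowing the schema plays the same role for an XML message.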
>
> Ed Lai
>
> --- TAN Kuan Hui <kuanhui@x...> wrote:
>
> > > EXAMPLE
> > >
> > > Imagine that a man is in prison and wants to send a message to his wife.
> > > Suppose that the prison only allows one message to be sent, "I am fine".
> > > Even if the person is deathly ill all he can send is, "I am fine".
> > > Clearly there is no information in this message.
> > >
> > > Here the set of possible messages is one.  There is no uncertainty and
> > > there is no information.
> > >
> > > Suppose that the prison allows one of two messages to be sent, "I am fine"
> > > or "I am ill".  If the prisoner sends one of these messages then some
> > > information will be passed to his wife.
> > >
> > > Here the set of possible messages is two.  There is uncertainty (of which
> > > message will be sent).  When one of the two messages is selected by the
> > > prisoner and sent to his wife some information is passed.
> > >
> > > Suppose that the prison allows one of four messages to be sent:
> > >
> > > 1. I am healthy and happy
> > > 2. I am healthy but not happy
> > > 3. I am happy but not healthy
> > > 4. I am not happy and not healthy
> > >
> > > If the person sends one of these messages then even more information will
> > > be passed.
> > >
> > > Thus, the bigger the set of potential messages the more uncertainty. The
> > > more uncertainty there is the more information there is.
> > >
> > > Interestingly, it doesn't matter what the messages are.  All that matters
> > > is the "number" of messages in the set.  Thus, there is the same amount of
> > > information in this set:
> > >
> >
> > You are making the assumption that all possibilities occur with equal
> > probability. This is equivalent to white noise. If the probability
> > distribution of your data set is close to white noise, then we can
> > conclude that there is no redundancy in your encoded data. This
> > would be ideal.
> >
> >
> > >    {"I am fine", "I am ill"}
> > >
> > > as there is in this set:
> > >
> > >    {A, B}
> > >
> > > SIDE NOTES
> > >
> > > a. Part of Shannon's goal was to measure the "amount" of information.
> > >    In the example above where there are two possible messages the amount
> > >    of information is 1 bit.  In the example where there are four
> > >    possible messages the amount of information is 2 bits.
> > >
> > > b. Shannon refers to uncertainty as "entropy".  Thus, the higher the
> > >    entropy (uncertainty) the higher the information.  The lower the
> > >    entropy the lower the information.
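
The bit counts in side note (a) follow from taking the base-2 logarithm of the number of messages, provided all messages are equally likely (an assumption the example makes implicitly). A minimal sketch, with information_bits as an illustrative name:

```python
import math

def information_bits(num_messages: int) -> float:
    """Information, in bits, gained by learning which one of num_messages
    equally likely messages was sent."""
    if num_messages < 1:
        raise ValueError("need at least one possible message")
    return math.log2(num_messages)

# The three prison scenarios: one, two, and four allowed messages.
for n in (1, 2, 4):
    print(f"{n} possible message(s): {information_bits(n):.0f} bit(s)")
```

A single allowed message gives 0 bits, which matches the "I am fine" case where nothing is learned.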
> > >
> > > QUESTIONS
> > >
> > > 1. How does this aspect (information ~ uncertainty) of Shannon's work
> > > relate to data exchange using XML?  (I realize that this is a very broad
> > > question. Its intent is to stimulate discussion on the application of
> > > Shannon's information/uncertainty ideas to XML data exchange)
> > >
> > > 2. A schema is used to restrict the allowable forms that an instance
> > > document may take.  So doesn't a schema reduce information?
> > >
> > A schema will constrain the data to a conforming set, but
> > that does not mean that every possible combination is useful;
> > some permutations probably never occur and cannot factor into
> > your evaluation of the value of the information carried.
> > In real data, a small set of values would occur with significantly higher
> > probability than others; therefore you have to factor in the
> > "probability distribution profile" (pdf) to make your discussion meaningful.
> >
> > Let's say you have an XML schema that constrains your data
> > to 1 of 10 possible combinations, each occurring with equal
> > probability, versus another with 100 possibilities but
> > with 1 combination occurring 99% of the time. Which data feed then has
> > greater information value?
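
The comparison can be worked through with Shannon's entropy formula, H = -sum(p * log2(p)). The sketch below needs one assumption the message does not state: the remaining 1% of probability is spread evenly over the other 99 combinations. The names entropy_bits, uniform_10 and skewed_100 are illustrative.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits: the average information per message."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Feed A: 10 combinations, all equally likely.
uniform_10 = [1 / 10] * 10

# Feed B: 100 combinations, one occurring 99% of the time.
# Assumption: the other 99 share the remaining 1% equally.
skewed_100 = [0.99] + [0.01 / 99] * 99

h_uniform = entropy_bits(uniform_10)
h_skewed = entropy_bits(skewed_100)

print(f"10 equiprobable combinations: {h_uniform:.2f} bits per message")
print(f"100 combinations, one at 99%: {h_skewed:.2f} bits per message")
```

Under that assumption the 10-way uniform feed carries far more information per message on average, even though it allows fewer combinations.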
> >
> > The key to your discussion, in my humble opinion, is in
> > "information predictability" rather than "information uncertainty".
> >
> > rgds,
> > Kuan Hui
> >
> >
> >
> >
> >
> >
> >
> -----------------------------------------------------------------
> > The xml-dev list is sponsored by XML.org
> > <http://www.xml.org>, an
> > initiative of OASIS <http://www.oasis-open.org>
> >
> > The list archives are at
> > http://lists.xml.org/archives/xml-dev/
> >
> > To subscribe or unsubscribe from this list use the
> > subscription
> > manager:
> > <http://www.oasis-open.org/mlmanage/index.php>
> >
> >
>
>
> =====
> Ed Lai
> data_mechanic@y...

