RE: [Shannon: information ~ uncertainty] Ramifications to XML
Correct. Discussions of Markov models are appropriate. Note that the systems are seldom isolated and that entropy requires an arrow of time. http://mathworld.wolfram.com/MarkovChain.html

Analysis of letter frequencies is applied to text categorization, and other pattern-based analyses are used for prediction. Imagine a tool that scans texts and, based on this analysis, creates a schematic description of the frequencies of occurrence of some set of categorical types (a rough sketch of such a scanner appears at the end of this message). Would that output be close or equivalent to a DTD or Schema? Is a DTD/Schema a pattern generated by a learning/negotiation process?

In the following, I will summarize from http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html

The three principal applications of Markov modeling are:

o Evaluation: finding the probability of an observed sequence given a model (apply the forward algorithm)

o Decoding: finding the sequence of hidden states that most probably produced an observed sequence (apply the Viterbi algorithm)

o Learning: given one or more observed sequences, estimating the hidden model that produced them (apply the forward-backward algorithm)

In fact, the majority of texts we exchange are not random, and in the Shannon sense not all choices are equally probable. They are 'meaningful'. To understand how texts acquire the property of meaning is to understand how multiple interacting systems reduce or increase entropy, even when within each system some choices are equally probable (non-deterministic) and some are not (relatively deterministic). Determinism varies system by system.

The arrow of time does not in and of itself produce a steady increase in entropy. Only thermodynamically isolated systems fit that model. A system interoperating with other systems and exchanging energy changes that outcome.

A Markov model assumes we can predict a future state based on past states. In the system we think of as the WWW, the URI = Energy. XML can be used to control the states of messages exchanged as identified by URIs. You could assign probabilities to the URIs identifying transitions among states (first-order Markov systems and so on). If you have M states and the probabilities do not vary in time, you have M squared possible transitions. These can be modeled as a transition matrix.

A Markov process model is a triple:

o States: all possible states of the system

o Initial vector: the probability of each state at time zero

o Transition matrix: the probability of each state given the previous state

Note well: this is a discrete system (not continuous, so no infinities to worry about): discrete time steps, discrete state values, and probabilities that do not vary in time. That is obviously not a model of the world as we observe it.

System behavior may be determined by hidden processes that give rise to the observable states. Such behavior is modeled as a first-order hidden Markov process, in which a 'confusion matrix' holds the probabilities of each observable state given each hidden state. The hidden Markov model is likewise modeled as a triple (RDFers unite!):

o Vector of initial state probabilities

o State transition matrix

o Confusion matrix

Again, the probabilities do not vary in time, and that is an unrealistic assumption and a weakness of Markov models. Sketches of both the plain and the hidden model follow the scanner sketch below.
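Here is the promised sketch of the scanning tool, in Python. It assumes the 'categorical types' are XML element names and their child sequences; the file names are hypothetical, and a real schema-inference tool would go on to generalize the tallied patterns into a grammar.

    # Tally the categorical types (element names and child sequences)
    # observed across a corpus of XML documents. Whether such tallies
    # approximate a DTD/Schema is the open question above.
    import xml.etree.ElementTree as ET
    from collections import Counter

    def scan(paths):
        elements = Counter()   # frequency of each element name
        patterns = Counter()   # frequency of each parent/child-sequence pair
        for path in paths:
            for node in ET.parse(path).iter():
                elements[node.tag] += 1
                patterns[(node.tag, tuple(child.tag for child in node))] += 1
        return elements, patterns

    # elements, patterns = scan(["doc1.xml", "doc2.xml"])  # hypothetical corpus

A DTD content model is, loosely, the high-frequency child sequences generalized into a grammar, e.g. (a, b*, c?), which is one way such a tool's output could come close to a schema.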
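Next, a minimal sketch of the plain Markov triple, with made-up message states and probabilities standing in for the URI-identified states described above:

    # A first-order Markov process as the triple above: states, an initial
    # probability vector, and an M x M transition matrix (M states give
    # M squared transition probabilities, fixed in time).
    import random

    states = ["request", "response", "error"]   # illustrative states, M = 3
    initial = [0.8, 0.1, 0.1]                   # P(each state at time zero)
    transition = [                              # transition[i][j] = P(j | i)
        [0.1, 0.8, 0.1],
        [0.7, 0.2, 0.1],
        [0.5, 0.0, 0.5],
    ]

    def walk(steps):
        # Generate a state sequence by sampling each discrete time step.
        i = random.choices(range(3), weights=initial)[0]
        sequence = [states[i]]
        for _ in range(steps - 1):
            i = random.choices(range(3), weights=transition[i])[0]
            sequence.append(states[i])
        return sequence

    # print(walk(10))  # e.g. ['request', 'response', 'request', ...]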
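And the hidden triple, together with the evaluation problem from the list above: the forward algorithm computes the probability of an observed sequence by summing over all hidden paths, in time proportional to M squared per observation rather than exponential in the sequence length. The two-hidden-state model and its numbers are illustrative assumptions:

    # Evaluation problem: P(observations | model) via the forward algorithm.
    # The model is the triple (pi, A, B) described above.
    def forward(pi, A, B, observations):
        M = len(pi)
        # alpha[i] = P(observations so far, and hidden state i now)
        alpha = [pi[i] * B[i][observations[0]] for i in range(M)]
        for obs in observations[1:]:
            alpha = [B[j][obs] * sum(alpha[i] * A[i][j] for i in range(M))
                     for j in range(M)]
        return sum(alpha)

    pi = [0.6, 0.4]                # vector of initial state probabilities
    A  = [[0.7, 0.3], [0.4, 0.6]]  # state transition matrix
    B  = [[0.9, 0.1], [0.2, 0.8]]  # confusion matrix: P(symbol | hidden state)
    # print(forward(pi, A, B, [0, 1, 0]))  # probability of observing 0, 1, 0

The decoding problem replaces the sum with a max (Viterbi); the learning problem re-estimates (pi, A, B) from the forward and backward passes.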
len

From: TAN Kuan Hui [mailto:kuanhui@x...]

You are making the assumption that all possibilities occur with equal probability. This is equivalent to white noise. If the probability distribution of your data set is close to white noise, then we can conclude that there is no redundancy in your encoded data. This would be ideal. A schema will constrain the data into a conforming set, but that does not mean that every possible combination is useful; some permutations probably never occur and so cannot factor into your evaluation of the value of the information carried.

In real data, a small set of values will occur with significantly higher probability than the rest; therefore you have to factor in the "probability distribution profile" (pdf) to make your discussion meaningful. Let's say you have one XML schema that constrains your data to 1 of 10 possible combinations, each occurring with equal probability, versus another with 100 possibilities but with one combination occurring 99% of the time. Which data feed then has the greater information value?

The key to your discussion, in my humble opinion, is "information predictability" rather than "information uncertainty".
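To make the 10-versus-100 comparison above concrete, the Shannon entropies of the two feeds can be computed directly. A quick check in Python, assuming (the post does not specify) that the remaining 1% of probability in the second feed is spread evenly over the other 99 combinations:

    # Shannon entropy in bits per message for the two feeds compared above.
    from math import log2

    def entropy(probabilities):
        return -sum(p * log2(p) for p in probabilities if p > 0)

    feed_a = [0.1] * 10                 # 10 combinations, all equally likely
    feed_b = [0.99] + [0.01 / 99] * 99  # 100 combinations, one dominant

    print(entropy(feed_a))  # ~3.32 bits: every message is maximally surprising
    print(entropy(feed_b))  # ~0.15 bits: the feed is almost entirely predictable

On this accounting the uniform 10-combination feed carries far more information per message: predictability, not the raw count of possibilities, is what sets the information value.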