[OFFLIST] A new approach to XML Validation
I hope all is well with you. We are busy as usual over here... I'm sending this to you offlist as I want to get my thoughts in shape before I go public with this on xml-dev.

As you know, in a previous life I worked in the development of financial trading systems in London. What you probably don't know is that I worked on so-called "technical analysis" of historical price and volume movements in financial markets. Many traders swear by technical analysis as it helps them develop an intuition - a "feel" for how a market is likely to move. Simply put, technical analysis is a mathematical analysis of historical data used to generate visual/numerical guides to future market movements. Some traders claim to use technical analysis exclusively - others combine it with analysis of market fundamentals - e.g. economic data. Most of our clients in those days used both.

Anyway, I thought it would be interesting to apply the same technical analysis ideas from the financial markets to XML validation. In the XML world, developers learn the hard way to develop a "feel" for documents and learn to spot those that are likely to be troublesome to process. This intuition is orthogonal to the analysis achievable with grammar or rule based validation. Documents can parse beautifully yet be very difficult to program against, and vice versa.

I've written a couple of XSLT programs (using some Java extensions) that generate some numbers based on a technical-analysis-inspired treatment of XML instance data. The embedded formulae are based on smoothing the element/PCDATA structure of an XML instance into a Fourier series. Elements are used to generate sine waves, with attributes used to calculate the modulation. PCDATA is used to generate cosine waves, with moving averages over character count used to calculate the modulation. The two are then combined into an infinite series, which is summed to generate some numbers.
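To make the raw inputs concrete, here is a minimal Python sketch (not the XSLT/Java code itself, which is not shown here). It only extracts per-element attribute counts and a moving average over PCDATA character counts. The function names `structure_counts` and `moving_average`, the window size, and the choice to ignore whitespace-only text are all assumptions made for illustration. How these values would feed the sine and cosine terms is not specified above, so the sketch stops short of that step.

```python
import xml.etree.ElementTree as ET


def structure_counts(xml_text):
    """Return (attribute counts per element, PCDATA lengths), both in document order.

    Hypothetical helper: whitespace-only text nodes are skipped by assumption.
    """
    root = ET.fromstring(xml_text)
    # One entry per element: how many attributes it carries.
    attr_counts = [len(elem.attrib) for elem in root.iter()]
    # One entry per non-blank run of character data.
    text_lengths = [len(t.strip()) for t in root.itertext() if t.strip()]
    return attr_counts, text_lengths


def moving_average(values, window):
    """Simple moving average: one value per full window of `window` items."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    sample = '<play title="x" id="1"><act n="1">Hello</act><act>Hi there</act></play>'
    attrs, lengths = structure_counts(sample)
    print(attrs, lengths, moving_average(lengths, 2))
```

A short window reacts quickly to local changes in text length, while a longer one smooths them out; which is appropriate here is an open choice.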
I've created a basic syntax for IBVL and I've run some IBVL schemas against some XML collections - Jon Bosak's Shakespeare, an ebXML test suite and some RSS feeds. The current algorithms are designed to generate a single number [0..<10]. I've generated a bunch of these from the above XML documents. The number 1.618 has occurred a number of times on documents that I would consider "programmer friendly" and the number 4.669 on documents that are dogs to write software for. I need to chase down the significance (if any) of these numbers. It could be a problem in the XSLT extension functions.

I think of these algorithms as a new form of XML validation. We have grammar based validation (DTD, XSD, RNG), rule based validation (Schematron), example based validation (Examplotron) and now intuition based validation (IBVL) with my stuff.

Anyway, I just wanted to let you know what I'm up to and prepare you for the day I ask your help on some of this! Your knowledge of the territory could help me avoid making Foolish mistakes in the IBVL algorithms.

Will be in touch.

Sean