Re: Parsing efficiency? - why not 'compile'????
Tahir Hashmi wrote:
> Robin Berjon wrote:
>> It would be horrible. Quite simply horrible. But then, it would
>> never have taken off so we wouldn't be discussing it.
>
> Let me modify Karl's assumption a little:
>
> Let's assume we /now have/ a binary XML specification [snip],
> everything basically the same, just a binary streaming format, but
> the same Infoset, same APIs /as/ for reporting XML content.
>
> And again ask these questions:
>
> What would be the difference? For the programmer? For the platforms?

(Note that your question is a bit flawed, as we already have standard
specifications for binary infosets.) You basically have two groups of
people:

 - those that don't need it. For them, it'll make no difference. They
   wouldn't use it. This is not the WXS type of technology that
   dribbles its way through many others.

 - those that do need it. These folks will be able to use XML where
   they couldn't before. And when I say XML, I mean
   AngleBracketedUnicode. Conversion to binary will only happen in the
   steps where it is needed, so most of what those people see will be
   actual XML.

> Extreme optimization based on the knowledge of Schema might be
> unattractive because:
>
> # Interpreting involved binary constructs could be more difficult:
>
> Consider the variable-length symbols that I have used in Xqueeze[1]
> (as has Dennis Sosnoski in XMLS, IIRC). The symbols are easy to
> understand - unsigned integers serialized as octets in big-endian
> order, with the least significant bit of each octet acting as a
> continuation flag. However, parsing them requires a loop that runs
> as many times as there are octets in the symbol. Each iteration
> involves one comparison (check if the LSb is 1), a multiplication
> (promotion of the previous octet by 8 bits) and an addition (the
> value of the current octet). It's not difficult to see the
> computation involved in arriving at "Wed Jan 3rd 2003, 14:00 GMT"
> from a variable-length integer that counts the number of seconds
> since the Epoch[2].

Errr... I really am not sure what you mean, notably by "involved
binary constructs". I think you can distinguish between two
situations: a) the application wants a date, in which case seconds
since the Epoch or a time_t struct might be exactly what it wants -
it'll be cheaper than strptime(3) for sure; b) the application wants
a string containing a date, in which case you're free to store dates
as strings in your binary format.
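To make (a) concrete, here is a rough C sketch of the decode loop you
describe, plus both date situations. This is only one plausible
reading of the scheme - flag bit masked out, seven value bits per
octet, illustrative bytes - not Xqueeze's actual code:

    #define _XOPEN_SOURCE 600  /* for strptime(3), a POSIX extension */
    #include <stdio.h>
    #include <time.h>

    /* Decode one variable-length unsigned integer: big-endian octets,
     * least significant bit of each octet = continuation flag (1 means
     * more octets follow).  Assumes the other 7 bits carry the value. */
    static unsigned long decode_symbol(const unsigned char *buf, size_t *pos)
    {
        unsigned long value = 0;
        unsigned char octet;

        do {
            octet = buf[(*pos)++];
            value = (value << 7) | (octet >> 1); /* promote, then add */
        } while (octet & 1);                     /* one comparison */

        return value;
    }

    int main(void)
    {
        /* 1041602400 seconds since the Epoch, i.e. 2003-01-03 14:00 GMT,
         * hand-encoded in the scheme above */
        const unsigned char buf[] = { 0x07, 0xE1, 0xAD, 0x5D, 0xC0 };
        size_t pos = 0;
        struct tm parsed = { 0 };
        time_t t;

        t = (time_t)decode_symbol(buf, &pos);

        /* (a) the application wants a date: one gmtime() call and
         * we're done, no text parsing at all */
        printf("%s", asctime(gmtime(&t)));

        /* (b) the application wants a string: a text format would
         * instead pay for strptime(3) to reach the same struct tm */
        strptime("2003-01-03 14:00:00", "%Y-%m-%d %H:%M:%S", &parsed);

        return 0;
    }

A handful of shift-and-or iterations per symbol is hard to beat; the
textual route pays for strptime(3)'s tokenizing and range checking to
arrive at the same struct tm.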
> # Forced validation:
>
> The above situation would be even more ironic if the application
> didn't care about the actual value of the date and was only
> interested in some string that looked like a date. With XML,
> validation of data types is an option; the above scheme enforces it
> as a requirement. Even where validation is required, how far can a
> parser validate? A value may be syntactically or semantically
> acceptable but contextually invalid (lame e.g. - a date of birth
> being in the future). My point: validation is and should remain an
> option.

This is completely orthogonal to the subject.

> # Tight coupling between schema revisions:
>
> XML is quite resilient to changes in the schema as long as the
> changes are done smartly enough to allow old documents to pass
> validation through the new schema. The greater the binary encoding's
> dependence on the schema, the more restricted this flexibility
> becomes. (I have yet to reach XML's level of compatibility in
> Xqueeze Associations (the data dictionary). Fortunately, achieving
> that wouldn't require changes in the grammar of the encoding.)

This is a solved problem in BinXML; multiple versions of the same
schema can co-exist.

> # What is gained in the end?
>
> With schema-based compaction done as aggressively as possible, how
> much would be gained over a simple markup binarization scheme?
> Perhaps a compaction factor of, say, 5 over XML. Would this really
> be significant compared to a factor of, say, 4 achieved by markup
> binarization? This is an optimization issue - the smaller the
> binary scheme, the more computation is required to extract
> information out of it. I'm not totally against a type-aware
> encoding, but for a standard binary encoding to evolve, it would
> have to be in a "sweet spot" on the size vs. computation vs.
> generality plane.

I'm all for finding a sweet spot, but pulling random numbers out of a
hat and making broad assumptions about size vs. computation won't
contribute much to getting there. I am talking about empirically
proven, tested, retested, put to work in a wide variety of situations,
factors of 10, 20 or 50 (or more, but testing on SOAP is cheating ;).
As for your remark on the speed of decompaction, you may be right for
a naive implementation, but there's compsci literature out there on
making such tasks fast.

-- 
Robin Berjon <robin.berjon@e...>
Research Engineer, Expway        http://expway.fr/
7FC0 6F5F D864 EFB8 08CE  8E74 58E6 D5DB 4889 2488