Validation vs performance - was Re: Fast text output from SAX?
On Apr 16, 2004, at 2:30 PM, Elliotte Rusty Harold wrote:

> I have seen any number of binary formats that achieve speed gains
> precisely by doing this. And it is my contention that if this is
> disallowed (as I think it should be) much, perhaps all, of the speed
> advantages of these binary formats disappears.

Well, there is an immense amount of truth in this, but I have to take issue with the "as I think it should be" aside. For example, there are AFAIK plenty of enterprise systems out there that do a billion transactions a day during peak times. Even on big honking hardware, that doesn't allow many cycles per transaction for data validation if you have to do it more than 10,000 times per second.

As best I understand it, people get this kind of performance in an enterprise environment by various methods, including a) doing the business-rule validation and data cleansing earlier in the pipeline, b) trusting the overall business process to have produced valid data at crunch time, and c) auditing the results so that if somebody tries to exploit this trust, sooner or later they will be caught.

The same basic approaches are available in "XML" environments, e.g. validating and optimizing the data early in the pipeline, and using efficiently formatted and trusted data for downstream processing. AFAIK essentially everyone using XML in a performance-critical environment (such as a DBMS or an enterprise messaging system) does something along these lines, including a couple of mega-corporations who do not see the value of *standardizing* the efficient XML formats. <duck>

Echoes of the great RSS well-formedness debate: the choice isn't between unquestioningly accepting whatever data you are given and doing draconian checking at every single step in the pipeline; it's a question of how to set up the pipeline to detect corrupt data early on and do what it takes to get it fixed or rejected, and then efficiently process the data in those parts of the pipeline where speed is critical. Sometimes XML syntax-level validation against a DTD or schema is useful as part of this, sometimes not. Sometimes double- and triple-checking of data validity against business rules by procedural code makes good business sense, sometimes not. Sometimes you can get away with throwing the data back at the originator to fix, and sometimes you gotta fix it yourself.

I cringe at the "Right Thing vs the Cowboy Way" characterizations at various points in these threads. There are a lot of ways to set up a business process or transformation/aggregation pipeline to get both scalability and validity, and recommendations "disallowing" particular approaches at one step by global fiat are certain to be ignored. It would be nice to get these threads turned into a discussion of best practices that people see in real life to find the optimal tradeoffs between desirable but somewhat incompatible properties such as loose coupling and high performance ... and away from discussion of alleged universal principles that should be promoted or disallowed.
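To make the "validate early, trust downstream" split concrete, here is a minimal sketch in Java/JAXP -- my own illustration, not anything from the thread, and the class and method names are made up. The boundary stage schema-validates an incoming document once; the hot path then runs a plain, non-validating SAX parse over data the pipeline already trusts.

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;
    import org.xml.sax.helpers.DefaultHandler;

    public class PipelineSketch {

        // Boundary stage: validate the incoming document against a schema
        // once, rejecting bad data before it enters the trusted part of
        // the pipeline.
        static boolean validateAtBoundary(File schemaFile, File doc) {
            try {
                SchemaFactory sf =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
                Schema schema = sf.newSchema(schemaFile);
                Validator validator = schema.newValidator();
                validator.validate(new StreamSource(doc));
                return true;          // document is schema-valid
            } catch (Exception e) {
                return false;         // reject, or route back to the originator to fix
            }
        }

        // Downstream stage: a plain, non-validating SAX parse of data the
        // pipeline already trusts -- no per-message schema checking here.
        static void processTrusted(File doc, DefaultHandler handler) throws Exception {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            spf.setValidating(false); // skip validation on the performance-critical path
            SAXParser parser = spf.newSAXParser();
            parser.parse(doc, handler);
        }
    }

In a real messaging system you would presumably build the Schema once and reuse it across workers (a Validator itself is not thread-safe, so one per thread), and the DefaultHandler would be whatever downstream stage does the actual work; the point is only that the expensive checking happens at the edge, not at every hop.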