XML-DEV Mailing List Archive: RE: RE: Pushing SAX events out onto the Web?
Hi,

Mike said:

> Most of you probably write a lot more code than I do these days, so forgive this brief pontification: I'm not sure about Roger Costello's situation, but there may be a general "antipattern" that assumes a priori that the overhead of XML text parsing is a significant bottleneck, and goes to great lengths to preserve the parsed representation somehow. Of course, you'll just have to profile your own applications to figure this out, but I know I have wasted months trying to figure out how to avoid the "inefficiency" of parsing XML data multiple times as it flowed between different modules of a system. Subsequent analysis showed the overhead of parsing the XML to be roughly comparable to the overhead of converting binary data structures back and forth, and sending the text around greatly simplified the architecture.

Didier replies:

I agree with your conclusion, and let me share some of the experience gained in Didier's labs. When you integrate systems, you have to decide whether this is done (a) through function calls (i.e. integration through functions/procedures) or (b) through messaging (i.e. integration through data). What is funny is that, from a processing standpoint, RPCs (case a) and XML documents (case b) carry about the same overhead. If you use RPCs, you have to use some marshaling protocol, which means the system must encode, package, and decode each procedure/function call; that is not free in time or CPU, and you pay it on every call. If you integrate through data with XML documents, you pay the document-parsing overhead instead. The funny thing is that either way you have processing overhead. In Didier's lab, I have found that the rule of thumb is to think at the global level: look at the entire system, not at any one of its elements.
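To make the "roughly the same overhead" claim concrete, here is a rough, illustrative micro-benchmark of my own (not from the original post): it parses an XML message and deserializes an equivalent binary (pickled) structure, timing each. The payload shape, sizes, and repetition count are arbitrary assumptions; as Mike says, only profiling your own application settles the question.

```python
# Sketch: compare XML text parsing against binary deserialization of an
# equivalent payload. All figures here are illustrative assumptions.
import pickle
import time
import xml.etree.ElementTree as ET

# Build the same logical payload in both representations.
records = [{"id": i, "name": f"item{i}", "price": i * 0.5} for i in range(1000)]
xml_text = "<items>" + "".join(
    f'<item id="{r["id"]}" name="{r["name"]}" price="{r["price"]}"/>'
    for r in records
) + "</items>"
binary_blob = pickle.dumps(records)

def avg_seconds(fn, n=50):
    """Average wall-clock time of n calls to fn."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n

xml_cost = avg_seconds(lambda: ET.fromstring(xml_text))
bin_cost = avg_seconds(lambda: pickle.loads(binary_blob))
print(f"XML parse: {xml_cost * 1e3:.3f} ms per message")
print(f"unpickle:  {bin_cost * 1e3:.3f} ms per message")
```

Which side wins depends on the parser, the data shape, and the runtime; the point is that both sides cost something, and the difference is often smaller than intuition suggests.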
For instance, at the global level, if you pick integration through function/procedure calls, you may end up with a lot of interaction between the two systems. Once you count network latencies, this can translate into a much longer overall processing time. In general, from both the theoretical and the practical point of view, it is better to limit the interactions between the systems, since accessing local memory and processing locally is a lot faster than transmitting data through the network. The image I use in my mind to remember this is a processing unit waiting for something to do: I can clearly see the processor getting what it needs a lot faster from local memory than from a remote machine, especially when the data has to travel routes that sometimes resemble a Lord of the Rings adventure.

Conclusion: prefer local processing and minimize data transmission, whether as data or as function/procedure calls.

However, I have never run experiments involving huge and very huge XML documents. As a good (or fair) mathematician, I know that the whole system may behave differently in that case, and that we may not infer that the behavior will scale linearly from what we measured with smaller documents. All my experiments involved XML documents of at most 1 MB, over DSL, T1, Fast Ethernet (100 Mbit/s), and cable broadband lines.

Cheers
Didier PH Martin
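The latency argument above can be sketched with a back-of-the-envelope model (my own illustration, not from the post): each round trip pays the network latency once, so n fine-grained calls cost roughly n times the latency more than one batched message carrying the same bytes. The latency and bandwidth figures below are assumed, not measured.

```python
# Sketch: total transfer time for n round trips, each moving a fixed
# number of bytes. Latency is paid per call; bandwidth is paid per byte.
def transfer_time_s(n_calls: int, bytes_per_call: int,
                    latency_s: float, bandwidth_bps: float) -> float:
    """Total seconds for n_calls round trips of bytes_per_call each."""
    return n_calls * latency_s + (n_calls * bytes_per_call * 8) / bandwidth_bps

# 1000 fine-grained RPCs of 1 KB each, versus one 1000 KB document,
# over a 100 Mbit/s link with an assumed 0.5 ms round-trip latency.
chatty = transfer_time_s(1000, 1024, 0.0005, 100e6)
batched = transfer_time_s(1, 1024 * 1000, 0.0005, 100e6)
print(f"chatty:  {chatty:.3f} s")   # latency dominates
print(f"batched: {batched:.3f} s")  # bandwidth dominates
```

Under these assumed numbers the chatty version spends about half a second purely on latency, which is why limiting the number of interactions usually matters more than shaving the per-message parsing cost.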