XML Performance in Client-Server Interactions
Hi Folks,

I am interested in knowing the state-of-the-art practice for enhancing the performance of XML-based client-server interactions.

Let us consider the process of a client sending XML to a server. Below I identify 3 "parts" to this process:

Part 1: Client prepares the XML
Part 2: Transmittal of the XML
Part 3: Server processes the XML

Now let us consider each part in turn, with the goal of determining the state-of-the-art practice for enhancing the performance of each part.

Part 1: Client prepares the XML

At some point the client decides to compose and prepare XML for transmittal to the server.

Compose the XML

The method employed to compose XML is highly variable. For example, XML could be composed from a Java program, or from a database query. I will restrict this investigation to XML composition from a database query.

The time required to compose XML from a database query will vary depending on which database is used (Oracle, SQL Server, MySQL, etc.) and on whether it is a native XML database or a relational one.

Question: has anyone done a study comparing the time required to compose XML by the different databases?

Prepare the XML

Oftentimes the client will choose to validate the XML prior to transmittal. Validating XML could potentially take a significant amount of time. The time required will vary depending upon these factors:

- Validation language: which language you use (DTD, XML Schema, RELAX NG, Schematron, OASIS CAM) can determine how long the validation will take.
- Parser: which parser you use (e.g., Apache Xerces, XML Spy, etc.) can also impact the time required to validate.

Question: has anyone done a study comparing validation times across validation languages and across parsers?

Part 2: Transmittal of the XML

There is a delay between the moment the client sends the XML and the moment the server receives it. Assertion: the dominating factor in determining the length of the delay is the size of the XML.
Small XML chunks get from client to server quicker than large XML chunks.

What are the options for reducing the delay? I am aware of 4 techniques:

1. Compression
2. Binary encoding
3. Streaming
4. Minimize markup

Technique 1: Compression

There are numerous compression tools that can be applied to XML. I will list 2 such tools here:

- XMill (XML-specific)
- bzip2 (general-purpose)

Technique 2: Binary encoding

The W3C has an XML Binary Characterization (XBC) Working Group that is actively investigating a standard binary encoding for XML. I believe that the fruits of their labor will not be usable for several years.

Technique 3: Streaming

The idea of both HTML streaming and XML streaming is to break the data to be transmitted into small chunks and then transmit one chunk at a time. The SAX event-based model is a form of streaming.

Question: is it viable to use SAX in a client-server interaction? For example, if you are transmitting a SOAP message, would it be reasonable to stream the SOAP? Is there such a thing as "SOAP Streaming"?

Question: is the streaming technique viable for Web Services?

Technique 4: Minimize markup

Assertion: XML tags are the root cause of the increase in size of the XML data. In recognition of this, one solution is to design your XML to minimize the number of tags used. One approach for doing this is to maximize the use of attributes.

Question: is the "attribute heavy" approach effective for reducing delay? Is it a good approach?

Question: all 4 techniques above attempt to reduce the delay by reducing the "size" of the data. Are there other things that can be done to the data that would reduce the delay?

Part 3: Server processes the XML

The server has now received the XML. The server may choose to validate it. In Part 1 above we discussed the impact on time due to validation.

After validating, the server "processes" the XML. Clearly, what it means to "process" XML is highly variable. I shall restrict the discussion just to storing the XML into a database.
This is the mirror of the case considered in Part 1, where we were interested in the time required to construct XML from a database query. The same issues arise: what database is being used? Is the database a native XML database or a relational database?

Question: has anyone done a performance analysis of storing XML into a database?

Summary

Above I discussed the delays introduced when a client sends XML to a server. Below is a summary of all the delays:

database ---> XML ---> validate ---> transmit ---> validate ---> database
          T1       T2            T3            T2            T4

Here T1 is the time to compose the XML from the database, T2 is the time to validate (incurred once on the client and once on the server), T3 is the time to transmit, and T4 is the time to store the XML into the database.

The total time for all the delays is: T1 + 2 * T2 + T3 + T4

Have I missed any steps/delays?

/Roger

Notes

On the Part 2 assertion about size: obviously there are many factors other than the size of the data which affect the delay, such as network problems. Those are problems that the client has no control over. I am focused on the delays due to the information itself (which the client does have control over).

On Technique 4 and attributes: whereas elements have a start-tag/end-tag pair, attributes don't have the concept of an "end attribute tag". Thus, by using attributes you can effectively reduce by half the amount of markup.
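
One rough way to probe the attribute question and the compression technique together is to build the same records both ways and compare byte counts, raw and after bzip2. The following is only a sketch under stated assumptions: the record shape, the field names, and the helper names (ROWS, element_heavy, attribute_heavy, item_count, sizes) are all invented for illustration, and Python's standard bz2 module stands in for a command-line compressor. It measures size only, not actual transmission delay.

```python
import bz2
import xml.etree.ElementTree as ET

# Hypothetical sample data: 500 made-up order lines (id, name, qty, price).
ROWS = [(i, f"Widget {i}", 10 + i % 7, f"{1.5 * i:.2f}") for i in range(1, 501)]


def element_heavy(rows):
    """Encode every field as a child element (start-tag plus end-tag per field)."""
    items = "".join(
        f"<item><id>{i}</id><name>{n}</name><qty>{q}</qty><price>{p}</price></item>"
        for i, n, q, p in rows
    )
    return f"<order>{items}</order>".encode("utf-8")


def attribute_heavy(rows):
    """Encode every field as an attribute on a single empty <item/> element."""
    items = "".join(
        f'<item id="{i}" name="{n}" qty="{q}" price="{p}"/>' for i, n, q, p in rows
    )
    return f"<order>{items}</order>".encode("utf-8")


def item_count(doc):
    """Parse the document (which also checks well-formedness) and count <item> children."""
    return len(ET.fromstring(doc).findall("item"))


def sizes(doc):
    """Return (raw byte count, bzip2-compressed byte count) for a document."""
    return len(doc), len(bz2.compress(doc))


if __name__ == "__main__":
    for label, build in (("elements", element_heavy), ("attributes", attribute_heavy)):
        doc = build(ROWS)
        raw, packed = sizes(doc)
        print(f"{label:10s} items={item_count(doc)} raw={raw} bytes bzip2={packed} bytes")
```

Running it on your own data rather than this synthetic sample would be more telling. Repetitive tag names tend to compress well, so the gap between the two designs may look quite different after compression than before.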