RE: Saxon and Sun Serializer problems?
Hi Ken,

Comments below.

-----Original Message-----
From: G. Ken Holman [mailto:gkholman@CraneSoftwrights.com]
Sent: Saturday, May 30, 2009 4:26 PM
To: xml-dev@lists.xml.org
Subject: RE: Saxon and Sun Serializer problems?

Thank you for engaging me on these details, Jim.

At 2009-05-30 15:10 -0700, Jim Tivy wrote:
>Hi Ken
>
>I read what you said below. The gist seems to be:
>
>Why would you want to do this?

I'm sorry I didn't make myself clear. My gist was: what feature(s) does having the DOCTYPE give you? Which is different.

There are so many other reasons why an XML document cannot be round-tripped through XSLT that just providing the DOCTYPE feature won't solve. I cited the lack of preservation of CDATA sections, the lack of preservation of the entity references (which includes numeric character references (not even resolved by a DOCTYPE), internal parsed general entities, external parsed general entities), and there are others including no link to NOTATION declarations for processing instruction target de-referencing (a very sore point of mine that the designers of XML processing interfaces have never felt the need to support).

[<JT>] When using Xml as a content format, users never see the syntax of the underlying Xml. So all the things you mentioned above are not a problem, except dropping the DocType. If you are using the DocType to indicate the validation rules of the document, then dropping that DocType means you can no longer validate the document. As well, you may use the schema information in the DTD for context-sensitive help in an XML Editor, to show the next sibling or child element that can be inserted. Numeric character references are not "dropped"; they are converted into their equivalent form according to the encoding. If entities are inlined in the parsing process, then they are not "lost"; rather, they are inlined. The contents of CDATA sections are characters, so they are not "lost" either; only the fact that they were written in a special CDATA section is.
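A minimal Java sketch of the distinction Jim draws here, assuming the JDK's built-in JAXP SAX parser. It parses a small document and records what the application is actually told: the character reference and the CDATA section arrive as plain characters, while the DOCTYPE name and identifiers are reported separately through the SAX2 `LexicalHandler` extension. The class name `DoctypeProbe`, the sample document, and the empty-DTD entity resolver (used only so the example does not need the real DTD file) are illustrative assumptions, not anything from the thread.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper class, for illustration only.
final class DoctypeProbe {

    // Sample input: a DOCTYPE, a numeric character reference (&#66; is "B")
    // and a CDATA section.
    static final String SAMPLE =
        "<!DOCTYPE topic PUBLIC \"-//OASIS//DTD DITA Topic//EN\" \"/SysSchema/dita/topic.dtd\">"
        + "<topic>A&#66;<![CDATA[<C>]]></topic>";

    /** Returns { doctype name, public id, system id, character content }. */
    static String[] probe(String xml) throws Exception {
        final String[] seen = new String[4];
        final StringBuilder text = new StringBuilder();

        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                seen[0] = name;
                seen[1] = publicId;
                seen[2] = systemId;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Character references and CDATA content both arrive here.
                text.append(ch, start, length);
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // The lexical handler is registered as a SAX2 property, not via a setter.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        // Assumption for the sketch: substitute an empty external DTD subset
        // so the parser does not try to open the real DTD file.
        reader.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));

        reader.parse(new InputSource(new StringReader(xml)));
        seen[3] = text.toString();
        return seen;
    }

    public static void main(String[] args) throws Exception {
        for (String value : probe(SAMPLE)) {
            System.out.println(value);
        }
    }
}
```

The same handler also receives `startCDATA`/`endCDATA` and `startEntity`/`endEntity` callbacks if it overrides them, so SAX can report more of the syntax than this sketch records.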
So given so many features for round-tripping that are not there, just putting in the DOCTYPE won't fix any of the ones I've cited.

[<JT>] How many of these are fully lossy, and how many have a logical equivalent? How many are we trying to discourage for fully interoperable Xml? My point is that a DocType limited to Name, PublicId and SystemId is an important thing to round trip - SAX does it.

>I should point out the "this" had to do with using SAX in Java with the JAXP
>Identity Transform. However, I now extend it more tentatively to include
>the "no DocType in the XDM" problem.

Yes, I saw that. I was trying to figure out what it was about the DOCTYPE that you would get when you can't get other things left out of the infoset or XDM.

>To give you some context of what I am doing - my need is primarily pragmatic
>- I am a Java programmer trying to get from A to B.

Fine ... I won't hold that against you. :{)}

>In an Xml content management system users use a variety of Xml processors
>(or programs if you would prefer) like diverse Xml Editors - XMetal, Epic,
>XmlMind and the content management systems that have file Store and Retrieve
>capabilities as well as link extract and other Xml processing needs. All of
>these parts "process" Xml.

Actually, they process XML syntax, they don't process the information in an XML document. XSLT and XQuery were designed to build new structures from the information in structured sources. They were not designed to process the syntax of an XML document. XML editors, in particular, are designed to process the syntax of an XML document, and as we old (er, long-time) SGML'ers learned long ago you can't base an XML editor on an XML processor in the same way you can't base an SGML editor on an SGML processor.

[<JT>] I am not sure I agree that XML editors process the syntax of Xml serializations. Many XML editors operate on DOMs.

Now the DOM *does* have a few features that process some (not all!)
of the syntax of an XML document, but the perspective is different. In the DOM the input tree *is* the output tree, unlike XSLT and XQuery where the input tree is read-only and the output tree is write-only: created, from scratch, in a single pass, without backtrack or repair or inspection.

[<JT>] "Without backtrack" is a bit unclear, since most XSLT processors are based on DOM.

>All of these parts rely on the DocType for
>validation or element insertion help or just need it to "round trip" the Xml
>so other processors can use that DocType. Without the DocType, the
>serialization loses some serious part of its capability.

Well now you've lost me again, because the limited number of serialization features in XSLT/XQuery renders the information found in the DOCTYPE quite irrelevant. The XSLT feature of adding a SYSTEM identifier is there as I see it really only for the validation bit. Because what is serialized is the information that was used to build the result tree ... not the syntax borrowed from the source tree.

[<JT>] Why does this feature exist in XSLT if DocTypes are irrelevant, as you suggested in your first question above?

>Most of these parts operate on the serialization of the Xml from time to
>time. Editors read serializations, users import serializations -
>serializations are the standard way of exchanging and making xml processors
>interoperable.

Ummmmm .... I can't agree for anything other than XML editors which are XML syntax applications not XML information applications.

[<JT>] By syntax I assume you mean "exact serialized form syntax". XML Editors do not have to be "syntax" based applications - they often operate on a DOM (XMetal, for example).

XML-based applications are interoperable because the XML processors all deliver the same content information to the applications using them.
And the decision by designers of DOM to include syntax related issues (note again, not all syntax related issues) can enable many aspects of input syntax preservation because the DOM is acting *on* the document. XSLT and XQuery are not acting *on* the document, they are acting on the information found in the document.

>Not being able to use powerful tools like XSLT and SAX to process Xml when
>"round" tripping of the serialization is required, is restrictive to say the
>least, as these technologies have their own strengths - eg: DOM is not XSLT
>is not SAX.
>
>Fortunately SAX is usable on Java - just make sure to use Sun's TrAX
>serializer, which keeps the docType, as the Saxon one drops the docType. (see
>earlier post).
>
>Does this begin to motivate the reason why?

I hear what you are trying to say, and I had already interpreted the need for syntax preservation to be to round trip the syntax of an XML document, but I haven't yet heard a justification for adding the DOCTYPE to XDM. Adding the DOCTYPE to XDM doesn't give you round-tripping of an arbitrary XML document because so much more would be needed. And all of it would be out of scope for XSLT/XQuery.

[<JT>] I am not saying syntax should be preserved. I am saying that information items should not be "dropped" or lost, especially when they are not replaced by some other "logical" equivalent. And DocType is an information item that has a purpose in its own right and it should not be dropped, unlike character references, which are converted into their equivalent underlying character.

This comes up often in the classroom from students who thought XSLT and XQuery could/should be used for XML document syntax preservation. Because XSLT and XQuery are node-tree-transformation tools and not XML syntax tools, they cannot be used for syntax preservation. XSLT and XQuery are not angle-bracket processors, they are node-tree processors. Serialization is not needed when the processor is embedded in, say, an XSL-FO engine.
Serialization is a nice-to-have that allows one to create artefacts that can be useful as input to other XML-based tools.

[<JT>] My focus is on the idea of progress. Perhaps in the name of progress we should not use DocTypes and DTDs but instead use Xml Schema to store our validation information, since the schema location will not be lost as it is an attribute in the XDM. This will not happen, since many people agree DTDs are here to stay. Then perhaps we should make the DocType, with its public and system IDs, accessible in the XDM and thus accessible in the input document.

<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "/SysSchema/dita/topic.dtd">

Consider source tree data projection: if I do an XSLT or XQuery transformation on a source node tree created from a non-XML source, what is the definition of the DOCTYPE? More to the point, what information might there have been put into a DOCTYPE in the interpretation of the projection to be useful in the node-tree transformation? I claim there is no such information.

[<JT>] If I serialize it I have validation rules available for the next Xml Processor.

And I haven't found such information in the response that you've given.

Thank you again for trying to help me better understand what you need. I really am trying to be supportive here to reveal what specific features of DOCTYPE you will find helpful.

. . . . . . . . . . . . . . Ken

--
XQuery/XSLT/XSL-FO hands-on training - Los Angeles, USA 2009-06-08
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson: http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview: http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman mailto:gkholman@CraneSoftwrights.com
Male Cancer Awareness Nov'07 http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers: http://www.CraneSoftwrights.com/legal
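For readers following the thread, here is a hedged Java sketch of the serialization feature both posters refer to: in JAXP, the DOCTYPE on the output is an output property of the serializer, set by the caller, rather than something carried in the tree being transformed. The sketch assumes the JDK's built-in identity `Transformer`; other implementations may behave differently, which is the subject of this thread. The class name `DoctypeOnOutput` and the sample input are illustrative, and the identifiers are the ones from the DOCTYPE line quoted above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
final class DoctypeOnOutput {

    /** Copies the input unchanged and asks the serializer to emit a DOCTYPE. */
    static String serializeWithDoctype(String xml, String publicId, String systemId)
            throws Exception {
        // newTransformer() with no stylesheet gives the identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();

        // These correspond to the doctype-public / doctype-system attributes
        // of xsl:output; the root element name is taken from the result tree.
        identity.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, publicId);
        identity.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);

        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The input has no DOCTYPE, so nothing here needs to load a DTD.
        String input = "<topic id=\"t1\"><title>Hello</title></topic>";
        System.out.println(serializeWithDoctype(
            input, "-//OASIS//DTD DITA Topic//EN", "/SysSchema/dita/topic.dtd"));
    }
}
```

An application that wants to carry a DOCTYPE from input to output would have to capture the identifiers itself (for example through a SAX `LexicalHandler`) and pass them in as shown; that is a design choice in the calling code, not something the data model does for it.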