OpenOffice.org DOCTYPE declaration
As most people on this list will know, OpenOffice.org documents are stored as XML within a ZIP format file. The main file within the ZIP is called content.xml and starts with:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE office:document-content
      PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN"
      "office.dtd">

(line breaks added for mailability).

The "office.dtd" system identifier here is a relative URI but, because the DTD is not in the document ZIP and is probably not even in the same directory as the document, it is awfully confusing to any XML processor.

A while ago I processed some OOo document contents using Saxon by the expedient of hand editing the small number of files to delete the doctype declarations - which felt dreadfully wrong.

Now I want to read the contents of some other files using JDom, which throws an exception when it can't resolve the DTD - the document is coming from an InputStream from the Java zip library, so it doesn't have a base URI.

Solutions considered include filtering the DOCTYPE declaration out of the file or writing a custom EntityResolver. I tried a toy EntityResolver but it didn't seem to get called; maybe that's an issue with my program or with JDom, though - the O'Reilly book OpenOffice.org XML Essentials on the xml.openoffice.org site says to use this technique with a programmatic call to invoke an XSLT transformation.

My questions:

1. Is having a system id which doesn't actually refer to a DTD a sign of faulty XML (i.e., not valid or not well formed)?

2. Is this true even if the public identifier is OK?

3. What is the best way to deal with this case in a program using a SAX reader?

4. What is the best way to deal with it when using a standalone XML tool like an XSLT program?

5. Would it help if OOo included standalone="no" in the XML declaration? (If the processor isn't validating and knows the document is standalone then presumably it doesn't have any reason to read the DTD?)

Ed Davies
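
For illustration, the custom EntityResolver option mentioned above might look roughly like the following minimal sketch. It uses the SAX parser bundled with the JDK rather than JDom, and the class name OfficeDtdSkipper and the method rootElementName are invented for this example. It assumes a non-validating parse, where answering every external entity request with an empty document is acceptable; any attribute defaults declared in the real office.dtd would not be applied.

```java
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses a content.xml stream without needing office.dtd.
final class OfficeDtdSkipper {

    /** Parses the stream and returns the qualified name of the root element. */
    public static String rootElementName(InputStream in) throws Exception {
        // Default factory settings: non-validating, not namespace aware.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Answer every external entity request (including the external DTD
        // subset) with an empty document, so "office.dtd" is never fetched.
        reader.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));

        // Record only the first element seen, which is the root element.
        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName;
                }
            }
        });

        reader.parse(new InputSource(in));
        return root[0];
    }
}
```

A stricter variant could compare publicId against "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" and return null for anything else, which hands resolution of other entities back to the parser's default behaviour.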