
OpenOffice.org DOCTYPE declaration

  • To: xml-dev@l...
  • Subject: OpenOffice.org DOCTYPE declaration
  • From: Ed Davies <edavies@n...>
  • Date: Fri, 30 Apr 2004 16:58:22 +0100

As most people on this list will know, OpenOffice.org documents
are stored as XML within a ZIP format file.  The main file 
within the ZIP is called content.xml and starts with:

  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE office:document-content 
      PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN"
      "office.dtd">

(line breaks added for mailability).

The "office.dtd" system identifier here is a relative URI but,
because the DTD is not in the document ZIP and is probably not
even in the same directory as the document, it is awfully
confusing to any XML processor that tries to resolve it.

A while ago I processed some OOo document contents using Saxon 
by the expedient of hand editing the small number of files to
delete the doctype declarations - which felt dreadfully wrong.

Now I want to read the contents of some other files using JDom,
which throws an exception when it can't resolve the DTD - the
document is coming from an InputStream from the Java zip library
so it doesn't have a base URI.

Solutions considered include filtering the DOCTYPE declaration
out of the file or writing a custom EntityResolver.  I tried with
a toy EntityResolver but it didn't seem to get called; maybe
that's an issue with my program or with JDom, though - the O'Reilly
book OpenOffice.org XML Essentials on the xml.openoffice.org
site says to use this technique with a programmatic call to
invoke an XSLT transformation.
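
For illustration, here is a minimal sketch of the EntityResolver
approach using only the JAXP classes that ship with the JDK, not
JDom.  The class name OfficeDtdSketch, the resolver constant and the
sample document (including its namespace URI) are assumptions made
up for this example.  The resolver answers every external entity
request with an empty stream, so the parser never goes looking for
office.dtd; any attribute defaults or entities the real DTD would
supply are lost.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class OfficeDtdSketch {
    // Hypothetical resolver: hands back an empty external subset for any
    // public/system identifier, so nothing is fetched from disk or network.
    static final EntityResolver EMPTY_DTD =
            (publicId, systemId) -> new InputSource(new StringReader(""));

    // Parses a stream that has no base URI (e.g. one taken from a ZIP entry).
    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(EMPTY_DTD);
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for content.xml; the namespace URI is an assumption.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE office:document-content "
                + "PUBLIC \"-//OpenOffice.org//DTD OfficeDocument 1.0//EN\" "
                + "\"office.dtd\">\n"
                + "<office:document-content "
                + "xmlns:office=\"http://openoffice.org/2000/office\"/>";
        Document doc = parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getLocalName());
    }
}
```

The same resolver object should be usable anywhere a SAX
EntityResolver is accepted; whether a given wrapper library passes
it through to the underlying parser is something to check for that
library.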

My questions:

1. Is having a system id which doesn't actually refer to a DTD 
   a sign of faulty XML (i.e., not valid or not well formed)?  

2. Is this true even if the public identifier is OK?

3. What is the best way to deal with this case in a program
   using a SAX reader?

4. What is the best way to deal with it when using a standalone 
   XML tool like an XSLT program?

5. Would it help if OOo included standalone="yes" in the XML
   declaration?  (If the processor isn't validating and knows
   the document is standalone then presumably it doesn't have
   any reason to read the DTD?)
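
As one possible angle on question 3, the sketch below reads
content.xml straight out of a ZIP stream with a SAX reader and asks
the parser not to load the external DTD at all.  It assumes a
Xerces-based parser (such as the one bundled with the JDK), because
the load-external-dtd feature URI is Xerces-specific and other
parsers may reject it.  The class and method names are hypothetical,
and the ZIP built in main is a stand-in for a real document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SkipExternalDtdSketch {
    // Xerces-specific feature; assumed to be supported by the parser in use.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Returns the qualified name of the root element of content.xml,
    // or null if the ZIP has no such entry.
    static String rootNameOfContentXml(InputStream zipBytes) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first element seen
                }
            }
        });

        try (ZipInputStream zip = new ZipInputStream(zipBytes)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals("content.xml")) {
                    reader.parse(new InputSource(zip)); // no base URI available
                    break;
                }
            }
        }
        return root[0];
    }

    // Builds a one-entry ZIP in memory to stand in for an OOo document.
    static byte[] sampleZip() throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE office:document-content "
                + "PUBLIC \"-//OpenOffice.org//DTD OfficeDocument 1.0//EN\" "
                + "\"office.dtd\">\n"
                + "<office:document-content "
                + "xmlns:office=\"http://openoffice.org/2000/office\"/>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("content.xml"));
            out.write(xml.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
                rootNameOfContentXml(new ByteArrayInputStream(sampleZip())));
    }
}
```

Turning the feature off is a blunter instrument than a resolver: it
skips the external subset for every document the reader parses, so
it only suits non-validating use.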

Ed Davies

