RE: Handling internal general entities with SAX
At 9:29 AM -0700 2001-10-22, David Brownell wrote:
> At 8:37 PM -0700 2001-10-21, Devlin, Kurt wrote:
[deleted...]
>> The reason for this is that we are taking our XML to several
>> different output formats and each will want to handle some
>> entities differently.
>
>The normal way to do that involves each output stream having
>different entity declarations.  That means each must have a
>different DTD, either with different external subsets or with
>conditional sections or (most simply) like
>
>    <!DOCTYPE my-app-rootnode
>        SYSTEM "http://www.example.com/dtds/my-app.dtd"
>    [
>        <!ENTITY test "[this is a test]">
>    ]>
>
>Alternatively, some folk have adopted "no DTD" policies for
>the data they interchange, and then paste their own DTDs
>(with entity declarations) in front of files.  It's easy enough to
>splice one Reader (or InputStream) in front of another, using
>an InputStream.
>
>- Dave
>
>

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Kurt,

Since you're at WestGroup, I suspect that you might be working
with a pretty large set of documents, and you might find Dave's
suggestion of using conditional sections worthwhile.

Here's an example of conditional sections that I set up this past
summer for either (a) producing HTML files or (b) loading an OODB
system.  The "load-oodb" entity is set to INCLUDE when documents
are being loaded into an OODB that back-ends some web servers.
The "make-html" entity is set to INCLUDE when generating plain
HTML files to go onto CD-ROM.

<!-- NOTE!  Activate either load-oodb or make-html by
     setting it to "INCLUDE".  Set the other one to "IGNORE".

     The "obj-article", "obj-chapter", and "obj-page" entities
     identify the doctypes/object types in the Versant OODB.
-->
<!ENTITY % load-oodb "IGNORE" >
<!ENTITY % make-html "INCLUDE" >

<!-- for loading the OODB -->
<![ %load-oodb; [
<!ENTITY servlet      "/servlet/handler?id=" >
<!ENTITY obj-article  "&amp;obj=Article" >
<!ENTITY obj-chapter  "&amp;obj=Chapter" >
<!ENTITY obj-page     "&amp;obj=Page" >
<!ENTITY main_nav     SYSTEM "main_nav-servlet.inc" >
]]>

<!-- for making HTML files -->
<![ %make-html; [
<!ENTITY servlet      "" >
<!ENTITY obj-article  ".html" >
<!ENTITY obj-chapter  ".html" >
<!ENTITY obj-page     ".html" >
<!ENTITY main_nav     SYSTEM "main_nav-html.inc" >
]]>

There is a single list of entities for all of the 1,200 or so web
pages that currently exist or can be generated, one entity per
page, in a single collection of declarations like this:

<!ENTITY link-ab_partners  '&servlet;ab_partners&obj-page;' >
<!ENTITY link-askus        '&servlet;askus&obj-page;' >
<!ENTITY link-contact_us   '&servlet;contact_us&obj-page;' >

et cetera, and throughout our XML the links are encoded like this:

    learn about <LINK ref="&link-ab_partners;">our partners</LINK>.

which, because the big list of entity declarations containing
link-ab_partners has already been read, gets replaced with

    &servlet;ab_partners&obj-page;

which is then replaced with one of these:

    /servlet/handler?id=ab_partners&obj=Page
    ab_partners.html

depending on which of the two initial entities (load-oodb or
make-html) has been set to INCLUDE.

In a similar fashion, I've set up the use of entities throughout
the (current) 1,600+ documents for over 200 images, several
hundred "body" inclusions, and about 3,650 links, and both Xerces
and SP exhibit virtually no performance hit compared to processing
the equivalent set of documents with all of the entities already
fully expanded.

The system could easily be extended to support SVG, CGM, EPS,
Flash, or some other graphic format for specialized CD-ROMs or
print output, or some other linking mechanism for proprietary
systems like DynaText or Folio.
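
A minimal sketch of the two-step expansion described above, in
Python only for brevity. Conditional sections are not allowed in
an internal DTD subset, so the sketch stands in for them by
building the internal subset from one of two hypothetical
profiles. PROFILES, LINKS, DOC, and resolve_link are illustrative
names, not part of the system described. The ampersand in the
OODB value is written as &amp; because a bare & is not allowed
in an entity value.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two conditional sections: each dict holds
# the declarations that would be active when that section is INCLUDEd.
PROFILES = {
    "load-oodb": {"servlet": "/servlet/handler?id=", "obj-page": "&amp;obj=Page"},
    "make-html": {"servlet": "", "obj-page": ".html"},
}

# The shared link list, written once and independent of the output format.
LINKS = "<!ENTITY link-ab_partners '&servlet;ab_partners&obj-page;'>"

# A fragment of document content that refers only to the link entity.
DOC = 'learn about <LINK ref="&link-ab_partners;">our partners</LINK>.'


def resolve_link(profile: str) -> str:
    """Parse DOC under the given profile and return the expanded LINK ref."""
    decls = "\n".join(
        f'<!ENTITY {name} "{value}">' for name, value in PROFILES[profile].items()
    )
    # Assumption: an internal subset replaces the external DTD of the original.
    text = f"<!DOCTYPE p [\n{decls}\n{LINKS}\n]>\n<p>{DOC}</p>"
    root = ET.fromstring(text)
    return root.find("LINK").get("ref")


if __name__ == "__main__":
    for name in PROFILES:
        print(name, "->", resolve_link(name))
```

The document content never changes between runs; only the
low-level declarations do, which is the point of layering the
link entities on top of them.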

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

At 11:45 AM -0700 2001-10-22, Devlin, Kurt wrote:
>
>Yes, I realize that I want to "break" the XML rules, but I feel
>like my intentions are good.
>
>We definitely fall into the "no DTD" group for our data
>exchange.  I had considered chaining an InputStream in before the
>Reader to "import" the entity declarations.  This handles the
>case for all of the known entities, but not for unknown ones.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Kurt,

Danger, Will Robinson -- unknown monster coming toward us!

I'm assuming that you are going to be processing documents such as
statutes or case law, not data-centric XML such as purchase orders.

Without DTDs you will almost certainly end up building all sorts
of custom "validation" into your software.  It rarely turns out
nicely.  The code is usually developed and then maintained with
fixes, updates, and patches accumulating as new variations in the
input documents are discovered.

It's better to analyze the documents and develop your own DTDs for
them if you can't get DTDs from the people who are creating the
documents.  As well as being used for XML validation, the DTDs can
act as the documentation of your understanding of the allowable
structures in the documents instead of burying that understanding
in your programming.  When you get some new variation that isn't
valid, a validating parser makes clear what the variation is, which
makes updating your software much cleaner and safer.

/s/ Ernest G. Allen
    Sunnyvale, CA, USA
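
A rough sketch of the "paste declarations in front of the file"
idea quoted above, again in Python for brevity. It assumes the
incoming document has no XML declaration and no DOCTYPE of its
own. DECLS and parse_with_decls are hypothetical names. A known
entity expands; an undeclared one is rejected by the parser as a
well-formedness error, which is the gap Kurt describes.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarations that one output format would supply.
DECLS = '<!ENTITY test "[this is a test]">'


def parse_with_decls(body: str, root_name: str, decls: str = DECLS) -> ET.Element:
    """Prepend a DOCTYPE carrying decls to a DTD-less document, then parse it.

    Assumes body has no XML declaration and no DOCTYPE of its own.
    """
    return ET.fromstring(f"<!DOCTYPE {root_name} [\n{decls}\n]>\n{body}")


if __name__ == "__main__":
    known = parse_with_decls("<doc>&test;</doc>", "doc")
    print(known.text)

    try:
        parse_with_decls("<doc>&mystery;</doc>", "doc")
    except ET.ParseError as err:
        # An entity missing from the spliced declarations stops the parse.
        print("rejected:", err)
```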