Re: practical question re: Java/XML handling

  • From: Uche Ogbuji <uche@ogbuji.net>
  • To: Mike Sokolov <sokolov@ifactory.com>
  • Date: Thu, 3 Sep 2009 08:13:59 -0600

On Thu, Sep 3, 2009 at 7:26 AM, Mike Sokolov <sokolov@ifactory.com> wrote:
After all the discussion about "What is data?" I don't know if this list is the place to discuss actual details of implementation, but please feel free to send me elsewhere if you can think of a better venue.

For my part, I find it refreshing to have a place where one can discuss such fundamental matters as well as the lineaments of running code.  I think you'll find plenty of discussion of code in the archives, and plenty of code-free discussion alike.

 
I have a need to handle XML that references a non-existent DTD.  The DTD is irrelevant to the actual processing of the XML, and isn't available anywhere, but it is declared in the DOCTYPE.  I'm sure many of you have encountered this situation: it's practically the norm, in my experience.

After years of dealing with this inherently unsatisfactory situation in a variety of ways, I came up with a new one that I am liking at the moment, which is to insert a Stream into a Java XML processing stack that strips out the prolog of the XML document before handing it off to a parser.  This has the nice property that it doesn't require modifications to the stored XML files.  It loses PIs and comments and the XML decl, but I can live with that.

Expat allows you to specify a standalone flag, which in effect expunges all external parameter entity declarations (and other such external resources incompatible with standalone="yes").  This certainly skates the edges of XML spec compliance, but I think it's legit, because I see it as an implicit transform.  Anyway, your Java tools might have the equivalent.  FWIW, I know that Jython 2.5 includes Expat wrapped for the core XML libs, so that might be an option.

In Amara 2.x we expose this flag very conveniently.  You can do:

import amara
doc = amara.parse(myxml, standalone=True) #flag uses boolean values, not strings

And it will in effect ignore those pesky parameter entity decls, including the declaration of the external subset.
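
For anyone without Amara to hand, here is a minimal sketch of a roughly comparable setting in the Expat binding from the Python standard library (xml.parsers.expat). It is not Amara's implementation, and it may not be the exact flag described above: it assumes that turning off parameter entity parsing is close enough for the "DTD is missing and irrelevant" case. The sample document, the example.invalid URL and the function name are made up for illustration.

```python
import xml.parsers.expat as expat

# Hypothetical document whose DOCTYPE points at a DTD that does not exist.
SAMPLE = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE doc SYSTEM "http://example.invalid/missing.dtd">\n'
    b'<doc><item>hello</item></doc>'
)


def parse_without_external_dtd(data):
    """Parse data with Expat, never reading the external DTD subset.

    Returns the element names in document order and the joined text.
    """
    parser = expat.ParserCreate()
    # Deliver character data in whole chunks rather than fragments.
    parser.buffer_text = True
    # Assumption: this is the nearest stdlib analogue to the flag in the
    # post. It tells Expat not to process parameter entities, which
    # includes the external subset named in the DOCTYPE.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_NEVER)

    names, text = [], []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.CharacterDataHandler = text.append
    parser.Parse(data, True)  # True marks this as the final chunk
    return names, "".join(text)


names, text = parse_without_external_dtd(SAMPLE)
print(names, text)
```

Nothing is fetched from the network here, so the missing DTD never gets a chance to cause an error. A document that relies on entities declared only in that DTD would still need separate handling.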

The rest of your post is Java-specific, so I'll snip and run like hell :)


--
Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
Linked-in profile: http://www.linkedin.com/in/ucheogbuji
Articles: http://uche.ogbuji.net/tech/publications/
Friendfeed: http://friendfeed.com/uche
Twitter: http://twitter.com/uogbuji
Join me at Balisage:
* http://www.balisage.net/

