Subject: Correcting unbound namespace prefixes
From: Tony Nassar <tnassar@xxxxxxxxxxxx>
Date: Mon, 2 Aug 2010 09:08:43 -0700
I'm not sure this is the correct place to post. This may be a question about
JAXP, or simply about good standard operating procedure for bad input data.
I've got some XML that I know is broken (it uses a namespace prefix that is
never declared), but I'm not in a position to get the customer to fix it.
Here's what it looks like:
<document>
  <text>Four score and twenty years ago..,</text>
  <pp:metadata publication-date="2010-07-31T12:30:00Z" />
  ...
You get the idea (I hope): clearly someone began with XML in no namespace,
extracted metadata in a post-processing step, and inserted the corresponding
markup without ever declaring a namespace for the "pp" prefix. A
namespace-aware parser rejects this, and I don't know of a way to fix it
through the JAXP API (i.e. by injecting the missing prefix mapping at parse
time). Or am I better off just preprocessing this XML via Perl or Python
before it's ever parsed?
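
For what it's worth, the preprocessing route might look roughly like the
Python sketch below. It is only a sketch under assumptions: the namespace URI
is a placeholder I made up (the real one would have to come from whoever owns
"pp"), declare_prefix is a hypothetical helper name, and the regex simply
patches the first start tag it finds, so a comment containing markup ahead of
the root element would fool it.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder URI (an assumption): the source data never says what "pp" maps to.
PP_URI = "urn:example:post-processing"

# First start tag in the text: "<" followed by a name-start character.
# This skips "<?xml ...?>" and "<!DOCTYPE ...>" but not markup inside comments.
_ROOT_TAG = re.compile(r"<([A-Za-z_][^\s/>]*)")


def declare_prefix(xml_text, prefix="pp", uri=PP_URI):
    """Return xml_text with xmlns:<prefix> declared on the root element."""
    if re.search(r"\bxmlns:%s\s*=" % re.escape(prefix), xml_text):
        return xml_text  # prefix already declared somewhere; leave the text alone
    decl = ' xmlns:%s="%s"' % (prefix, uri)
    # Append the declaration right after the root element's name.
    return _ROOT_TAG.sub(lambda m: m.group(0) + decl, xml_text, count=1)


broken = (
    "<document>"
    "<text>Four score and twenty years ago..,</text>"
    '<pp:metadata publication-date="2010-07-31T12:30:00Z" />'
    "</document>"
)

fixed = declare_prefix(broken)
root = ET.fromstring(fixed)  # the namespace-aware parse now succeeds
meta = root.find("{%s}metadata" % PP_URI)
print(meta.get("publication-date"))  # prints 2010-07-31T12:30:00Z
```

Declaring the prefix once on the root covers every "pp:" element further
down, which is why the sketch never touches the metadata elements themselves.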
Tony Nassar Ph.D.
Palantir Technologies | Forward Deployed Engineer