Re: External subset processing by browsers

  • From: "Andrew Welch" <andrew.j.welch@g...>
  • To: elharo@m...
  • Date: Mon, 8 Dec 2008 11:10:53 +0000

Hi Elliotte,

2008/12/5 Elliotte Rusty Harold <elharo@m...>:
> Firefox. There are two separate issues here:
>
> 1. Whether Firefox should read the external DTD subset.
> 2. How it should treat unrecognized entities when it doesn't read the
> external subset.
>
> Let me check the spec, but my recollection is that if the external DTD
> subset is not read, unrecognized entities are not a fatal error.

I have a similar issue: some RSS feeds contain entity references but
no doctype, for example:

<foo>foo &euro; bar</foo>

I was trying to handle them by supplying a LexicalHandler (to trap
and convert them to numeric refs) and setting a few Xerces features,
but the parser always throws an exception before the startEntity event.

Sample code (using Xerces 2.9.0):

import java.io.StringReader;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;

public class Test extends XMLFilterImpl implements LexicalHandler {

    public static void main(String... args) throws Exception {
        new Test();
    }

    public Test() throws Exception {

        String xml = "<foo>foo &euro; bar</foo>";

        XMLReader xmlReader = XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser");
        // Register this object to receive startEntity/endEntity events.
        xmlReader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", this);
        xmlReader.setFeature(
                "http://apache.org/xml/features/scanner/notify-char-refs", true);
        xmlReader.setFeature(
                "http://apache.org/xml/features/validation/unparsed-entity-checking", false);
        xmlReader.setFeature(
                "http://xml.org/sax/features/external-parameter-entities", false);
        xmlReader.setEntityResolver(this);
        xmlReader.parse(new InputSource(new StringReader(xml)));
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
    }

    // LexicalHandler callbacks; only startEntity does anything here.
    public void startEntity(String name) throws SAXException {
        System.out.println("Start ent: " + name);
    }

    public void endEntity(String name) throws SAXException { }
    public void startCDATA() throws SAXException { }
    public void endCDATA() throws SAXException { }
    public void startDTD(String name, String publicId, String systemId)
            throws SAXException { }
    public void endDTD() throws SAXException { }
    public void comment(char[] ch, int start, int length)
            throws SAXException { }
}

The output when running this is:

[Fatal Error] :1:16: The entity "euro" was referenced, but not declared.
Exception in thread "main" org.xml.sax.SAXParseException: The entity "euro" was referenced, but not declared.
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at Test.<init>(Test.java:37)


It would be really nice to handle this non-well-formed input using XML
tools without resorting to a regex replace across every feed... I'm
not sure it's possible, but the features make it seem like it should
be. Any ideas?
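
For what it's worth, one workaround that stays within XML tooling is to
prepend a DOCTYPE whose internal subset declares the entity, so the
reference is declared by the time the parser reaches it. Below is a
minimal sketch under several assumptions: the feed has no DOCTYPE and no
XML declaration of its own, the root element name is known up front, and
only &euro; needs declaring. The class name PrependDoctype and the helpers
withDoctype and parseText are made up for illustration, and the sketch uses
whatever parser JAXP returns by default rather than Xerces 2.9.0
specifically.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class PrependDoctype {

    // Hypothetical helper: prepends an internal DTD subset that declares
    // &euro; as the numeric character reference &#8364;.
    // Assumes the input has no XML declaration and no DOCTYPE already.
    static String withDoctype(String xml, String rootName) {
        return "<!DOCTYPE " + rootName + " [<!ENTITY euro \"&#8364;\">]>" + xml;
    }

    // Hypothetical helper: parses the document and returns all of its
    // character data concatenated, with entity references expanded.
    static String parseText(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<foo>foo &euro; bar</foo>";
        System.out.println(parseText(withDoctype(xml, "foo")));
    }
}
```

This is still string manipulation on the raw feed, but it is a single
prefix rather than a regex pass over the content, and the parser does the
actual entity expansion. A real feed would need the full set of expected
entities declared, and the DOCTYPE placed after any XML declaration.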


thanks
-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/

