Re: Problem parsing XML file with Xerces-J
My two questions are still unanswered:

1) Which method is faster: implementing EntityResolver or pre-editing the XML file? This matters to me. My program polls a remote process very frequently (every 15 seconds) and fetches the XML documents. If the two methods differ in performance, my program's response time will suffer with the slower one, so I must pick whichever is faster. I am sure somebody has the answer.

2) Is there a way to avoid creating a redundant resource (like x.dtd below)?

    public InputSource resolveEntity(java.lang.String publicId, java.lang.String systemId) {
        InputSource is = new InputSource();
        is.setSystemId("file:///C:/x.dtd");
        return is;
    }

Somehow this doesn't look good to me. I'd be happy if something like "" or null could work ;)

One other question: if I don't override the resolveEntity method (i.e. don't implement EntityResolver) and let my Java program fetch the DTD from the remote location, how can I set a property in my program to increase the timeout? The error I am getting is "connection timed out". The DTD does exist on the remote server. (I can explore this option too.)

Another weird thought: suppose my program has to fetch the DTD from the remote location (i.e. solve the problem the non-EntityResolver way). My program does these steps:

    a) DocumentBuilderFactoryImpl factory = new DocumentBuilderFactoryImpl();
    b) DocumentBuilder builder = factory.newDocumentBuilder();
    c) Document document = builder.parse(new InputSource(new StringReader(rsp)));

So at line c) the parser will parse the XML (and also fetch the remote DTD). It will use HTTP transport to fetch the DTD (the timeout error also refers to the java.net package). My PC is behind a proxy server, and I have a proxy server user ID and password for internet access. So the parser's "HTTP hook" must have this proxy server user ID and password available to connect to the internet.
Is there a way to provide a user ID and password like this? Put another way: how will the parser's "HTTP hook" behave if it finds that the HTTP connection is behind a proxy server that requires a user ID and password for authentication?

My other thoughts: at present the XML I am fetching does not contain any external entity references. Everything can be resolved within the XML document itself, and I don't need to perform any validation. But at a future date I may need to resolve references from the DTD. I think the best solution for that is to keep a local copy of the DTD, override the resolveEntity method as usual, and point the entity resolver at the local file.

I am fetching XML from a real-world service provider. They may change the XML structure in future (and may store entity definitions in the DTD, which the parser must then resolve).

Please let me know your thoughts.

Best regards,

On Apr 1, 2005 1:40 PM, Michael Kay <mike@s...> wrote:
> I'm glad you've got it working. Looks good.
>
> Michael Kay
> http://www.saxonica.com/
>
> > -----Original Message-----
> > From: Midsummer Sun [mailto:midsummer.sun@g...]
> > Sent: 01 April 2005 08:35
> > To: Michael Kay
> > Cc: xml-dev@l...
> > Subject: Re: Problem parsing XML file with Xerces-J
> >
> > > I think pre-editing of the response XML (i.e. stripping the DTD
> > > declaration) is better "for me". For my requirement, the DTD in
> > > the XML is useless to me. Implementing EntityResolver imposes
> > > significant performance overhead on my program. The parser is
> > > always polling for callback events. So I think pre-editing with a
> > > simple string method is far more efficient.
> >
> > I amend my above observation slightly.
> >
> > My program is doing:
> >
> >     DocumentBuilderFactoryImpl factory = new DocumentBuilderFactoryImpl();
> >     DocumentBuilder builder = factory.newDocumentBuilder();
> >     Document document = builder.parse(new InputSource(new StringReader(rsp)));
> >
> > So I am using a DOM parser! But a DOM parser underneath is probably
> > using a SAX handler (to implement a DOM), i.e. a SAX handler is
> > despatching events to the DOM parser as it reads the XML document,
> > and the DOM implementation constructs a DOM object by "assembling
> > input from the SAX implementation". I read this in a nice article
> > somewhere.
> >
> > My class implements the EntityResolver interface and calls
> > builder.setEntityResolver(obj); i.e. it registers the class object
> > itself (obj) as the handler for EntityResolver. This is probably a
> > very lightweight reference within the JVM, and nothing expensive
> > worth worrying about.
> >
> > So the DOM parser starts to parse the document. If it encounters a
> > DTD reference it will call the resolveEntity method. It will probably
> > call this method after a full DOM tree is constructed (so that all
> > entity references can be resolved). The resolveEntity method will be
> > called only once. So there is no expensive processing going on, as I
> > thought before ;)
> >
> > Please do correct me if I am wrong.
> >
> > If the resource consumption of implementing EntityResolver is the
> > same as that of the pre-editing solution (or the difference is very
> > marginal), I'll prefer implementing the EntityResolver interface! It
> > could be a USP in my application!
> >
> > I am eagerly waiting for your opinion.
> >
> > Best regards,
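
For illustration only, here is a minimal sketch of one way question 2 above (avoiding a placeholder file such as x.dtd) is often handled: the resolver hands the parser an empty in-memory stream instead of a file URL. The class name EmptyDtdResolverSketch, the sample document, and the host example.invalid are hypothetical. The sketch also assumes the standard JAXP DocumentBuilderFactory.newInstance() rather than the Xerces-specific DocumentBuilderFactoryImpl used in the post, and a non-validating parse, as the post describes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class EmptyDtdResolverSketch {

    // Answers every external entity request with an empty character stream,
    // so the parser needs neither a local placeholder file nor the network.
    static final EntityResolver EMPTY_RESOLVER = new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }
    };

    // Mirrors steps a) to c) from the post, with the resolver registered
    // on the builder before parsing.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(EMPTY_RESOLVER);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical response whose DOCTYPE points at an unreachable DTD.
        String rsp = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE response SYSTEM \"http://example.invalid/x.dtd\">"
                + "<response><status>ok</status></response>";
        Document document = parse(rsp);
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```

One caveat with this approach: because the parser sees an empty DTD, any entities declared only in the provider's real DTD would not be available to it. If the provider later starts relying on such entities, pointing the resolver at a local copy of the DTD, as suggested in the message, would be the safer variant.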