Re: Announcement: SAX 1.0gamma

  • From: James Clark <jjc@j...>
  • To: David Megginson <ak117@f...>
  • Date: Sun, 03 May 1998 23:41:36 +0700

David Megginson wrote:
> 
> James Clark writes:

>  > It should be specified whether a byte order mark at the beginning
>  > of an XML byte stream is included as part of the character stream.
>  > I don't think it should be, since the byte-order mark isn't included
>  > in the XML document production, and the XML spec says explicitly that
>  > the byte order mark "is an encoding signature, not part of either
>  > the markup or the character data of the XML document".
> 
> My first hunch is the opposite: the XML productions deal with
> characters, not bytes.  When I provide a raw byte stream
> (java.io.InputStream), I'm requiring the XML parser to take on two
> logical tasks:
> 
> 1) convert the bytes to characters
> 
> 2) apply the XML productions to the characters
> 
> You have already mentioned that, unlike many XML parsers (including
> AElfred), XP does not perform these as independent, serial steps;
> conceptually, however, the tasks are still distinct.  The BOM is part
> of the raw byte stream, but not part of the character stream.
> 
> I think that it also simplifies Java implementation if the parser can
> behave the same way with an InputStream from a URLConnection and an
> InputStream supplied explicitly by an application.

I'm a bit confused by your reply.  You say you're disagreeing with me,
but the points you make don't seem to contradict my suggestion. I agree
there are conceptually two stages.  My point is that the BOM bytes are
removed as part of the first stage because they are part of the encoding
signature, not part of the sequence of characters that matches the
document production.  Thus the InputStream should include the BOM bytes,
but the Reader shouldn't include the 0xFEFF character.
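A minimal Java sketch of that first stage, assuming only the three common Unicode signatures (UTF-8 EF BB BF, UTF-16 big-endian FE FF, UTF-16 little-endian FF FE) and defaulting to UTF-8 otherwise. The BOM bytes are consumed from the InputStream, so the Reader handed to the second stage never yields U+FEFF. The class and method names (BomStripper, openReader) are made up for illustration; a real parser would also consult the encoding declaration and handle UCS-4.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomStripper {
    // Stage 1 (bytes -> characters): consume a byte order mark if present,
    // so the returned Reader never yields U+FEFF as its first character.
    public static Reader openReader(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pin.readNBytes(head, 0, 3); // may be < 3 for short streams

        Charset cs = StandardCharsets.UTF_8; // assumption: no BOM means UTF-8
        int bomLen = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            bomLen = 3; // UTF-8 signature
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            bomLen = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            bomLen = 2;
        }

        // Push back any peeked bytes that are not part of the BOM.
        if (n > bomLen) {
            pin.unread(head, bomLen, n - bomLen);
        }
        return new InputStreamReader(pin, cs);
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        Reader r = openReader(new ByteArrayInputStream(withBom));
        System.out.println((char) r.read()); // prints "<", not U+FEFF
    }
}
```

The InputStream given to openReader still contains the BOM bytes; only the Reader it returns omits the 0xFEFF character.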

>  > How are relative system identifiers supposed to be handled in
>  > DTDHandler?  Suppose I have a DTD with a system id of dir/foo.dtd,
>  > which declares an unparsed entity with a system id of foo.eps
>  > (which refers to dir/foo.eps). If the systemId argument to
>  > DTDHandler.unparsedEntityDecl is foo.eps, then the application is
>  > going to have problems.  There's a similar issue with
>  > EntityResolver.resolveEntity.
> 
> This does seem to be a serious problem.  One solution is to require
> the parser to fully resolve system identifiers before reporting them
> (as AElfred already does).  This approach will work well with URLs,
> but may break for other URI schemes.
> 
> Any other solutions?

In XP, my analog of InputSource has both an InputStream and optionally a
URL to use as a base URL for system identifiers in that InputStream.  In
each case where the application is passed a system identifier (whether
for parsed or unparsed entity), the parser passes both the specified
system identifier and the base URL from the InputSource analog.  This
gives the application complete control over resolving relative URLs,
although at the cost of some complexity.
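Roughly, the shape of that design in Java might look like the following. The names InputSourceWithBase and UnparsedEntityListener are hypothetical, not XP's or SAX's actual classes, and java.net.URI stands in for URL. The parser reports the system identifier exactly as written plus the base of the entity containing the declaration, and the application decides how to resolve it.

```java
import java.io.InputStream;
import java.net.URI;

// Hypothetical analog of InputSource: a byte stream plus an optional base URI.
final class InputSourceWithBase {
    final InputStream byteStream;
    final URI base; // null when the caller supplied only a stream

    InputSourceWithBase(InputStream byteStream, URI base) {
        this.byteStream = byteStream;
        this.base = base;
    }
}

// Hypothetical callback: receives the system id as written plus the base
// of the entity in which the declaration appeared.
interface UnparsedEntityListener {
    void unparsedEntityDecl(String name, String systemId, URI base);
}

public class BaseUriDemo {
    public static void main(String[] args) {
        UnparsedEntityListener app = (name, systemId, base) -> {
            // The application, not the parser, chooses how to resolve.
            String where = (base == null) ? systemId : base.resolve(systemId).toString();
            System.out.println(name + " -> " + where);
        };

        // Simulate the parser: the declaration was read from dir/foo.dtd.
        InputSourceWithBase dtd = new InputSourceWithBase(
                InputStream.nullInputStream(), URI.create("http://example.com/dir/foo.dtd"));
        app.unparsedEntityDecl("fig", "foo.eps", dtd.base);
        // prints "fig -> http://example.com/dir/foo.eps"
    }
}
```

With the dir/foo.dtd example, the listener sees foo.eps together with the DTD's base and can produce dir/foo.eps itself; when the base is null, the application has to decide on its own what the identifier means.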

In implementing the SAX driver for XP, I try to make an absolute URL from
the specified system identifier and the base; if that succeeds, I pass
the result (after conversion to a String); if it fails (for example
because it is parsing from an InputStream with no specified system
identifier), I pass the specified system identifier.  That is the
approach I would suggest for SAX.
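A sketch of that fallback, using a hypothetical resolveSystemId helper and java.net.URI in place of the URL class (an assumption). URI.resolve does not throw when the base is opaque (e.g. a urn:), it simply returns the relative reference, so the sketch checks isAbsolute() rather than relying on an exception.

```java
import java.net.URI;

public class SystemIdResolver {
    // Try to build an absolute URI from the specified system identifier and
    // the base; if that works, report the absolute form, otherwise report
    // the identifier exactly as specified.
    public static String resolveSystemId(String systemId, URI base) {
        if (base == null) {
            // e.g. parsing from an InputStream with no specified system identifier
            return systemId;
        }
        try {
            URI resolved = base.resolve(systemId);
            if (resolved.isAbsolute()) {
                return resolved.toString();
            }
        } catch (IllegalArgumentException e) {
            // systemId is not a legal URI reference; fall through
        }
        return systemId;
    }

    public static void main(String[] args) {
        URI dtd = URI.create("http://example.com/dir/foo.dtd");
        System.out.println(resolveSystemId("foo.eps", dtd));  // http://example.com/dir/foo.eps
        System.out.println(resolveSystemId("foo.eps", null)); // foo.eps
        System.out.println(resolveSystemId("foo.eps", URI.create("urn:example:doc"))); // foo.eps
    }
}
```

The last call illustrates the concern about other URI schemes: an opaque base cannot be used for resolution, so the identifier is passed through unchanged instead of being mangled.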

James

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

