Re: Entities and Expat
Thank you both for your rapid response. I should have mentioned that I had previously tried to include entity declarations within my document (there is presently no DTD associated with the document). When I included the entity declaration at the beginning of the document (outside the root element), Expat returned XML_ERROR_INVALID_TOKEN ("not well-formed"). When the entity declaration was included within the root element, Expat returned XML_ERROR_SYNTAX ("syntax error"). I double-checked this by copying the string from Joshua's email, with the same results. Perhaps the real issue is Expat's handling of "<!ENTITY..>" declarations in standalone documents? Or am I still missing something?

My application parses XML documents that contain HTML entity references ("&copy;", etc.), indexes the text, and builds a full-text database composed of HTML "documents". The app doesn't need to expand, translate, or index the entity strings -- it just needs a string length to keep the document's word offsets straight, and to copy the string to the output stream. I had hoped to do this in the DefaultHandler callback, but of course I'm never getting there.

Joshua E. Smith wrote:
> If you want your application to "just know" about some entities which you
> have failed to define anywhere, I don't think that documents relying on
> that behavior would even be considered well-formed.

David Brownell wrote:
> Expat doesn't read external parameter entities, including
> "the" external subset, but it does understand that if it
> doesn't come across one of those, all entities must be
> defined through the internal subset...

John Cowan wrote:
> All you actually have to do is to ensure that the next character
> (if not #, see above) is a NAMESTRT character, and that all characters
> until ; are either NAME or NAMESTRT characters. There is no need (and
> in fact it is forbidden) to look up the supposed entity name anywhere.
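The purely lexical check John Cowan describes, combined with the offset bookkeeping my application needs, might look roughly like the sketch below. It is written in Python for brevity rather than against the Expat C API, and the names REFERENCE and find_references are made up for illustration. The character classes are an ASCII-only approximation of the spec's Name production (an assumption: the real production admits many non-ASCII characters). No entity table is consulted.

```python
import re

# ASCII-only approximation of the XML 1.0 Name production. This is an
# assumption made for brevity: the real name-start and name character
# classes also admit many non-ASCII characters.
REFERENCE = re.compile(
    r"&(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][A-Za-z0-9._:\-]*);"
)


def find_references(text):
    """Return (offset, length, raw_text) for each reference-shaped substring.

    Only the shape is checked: '&' Name ';' or a character reference.
    The supposed entity name is never looked up anywhere.
    """
    return [(m.start(), len(m.group()), m.group())
            for m in REFERENCE.finditer(text)]
```

For example, find_references("a &copy; b") reports one reference of length 6 at offset 2, which is all an indexer needs to keep word offsets straight and copy the string through unchanged. Note that this is a scan over raw text, not an XML parse, so it says nothing about whether the document is well-formed.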
The first two statements seem to contradict the third (which is the statement that sparked my first message). I must admit I remain a little confused about the boundary between well-formed and valid XML documents when it comes to general entities. My inclination would be to agree with John Cowan's statement -- if an EntityRef is physically valid (see "[68]" below), why should the parser, or any other intermediate processor, care whether the referenced entity 'Name' exists?

However, when I went back to the XML spec, it seems that Joshua E. Smith is indeed correct that my document is *not* well-formed, and therefore Expat is processing it correctly.

The following references and excerpts are from Tim Bray's truly excellent annotated XML 1.0 Specification (http://www.xml.com/axml/testaxml.htm or http://www.xml.com/axml/target.html to omit the explanation frame). I've added the text in curly braces (e.g. "{43}") to describe Tim's hyperlinks.

======= Begin spec excerpt =======
4.3.2 Well-Formed Parsed Entities
[snip]
An internal general parsed entity is well-formed if its replacement text matches the production labeled content {43}. All internal parameter entities are well-formed by definition.
======= End spec excerpt =======

======= Begin spec excerpt =======
[43] content ::= (element | CharData | Reference {67} | CDSect | PI | Comment)*
======= End spec excerpt =======

======= Begin spec excerpt =======
[67] Reference ::= EntityRef | CharRef
[68] EntityRef ::= '&' Name ';'    [ WFC: Entity Declared ]
                                   [ VC: Entity Declared ]
                                   [ WFC: Parsed Entity ]
                                   [ WFC: No Recursion ]
======= End spec excerpt =======

======= Begin spec excerpt =======
4.1 Character and Entity References
[snip]
Well-Formedness Constraint: Entity Declared
In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with "standalone='yes'", the Name given in the entity reference must match that in an entity declaration, except that well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration. Note that if entities are declared in the external subset or in external parameter entities, a non-validating processor is not obligated to read and process their declarations; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'.
======= End spec excerpt =======

According to the table in section "4.4 XML Processor Treatment of Entities and References", an "Internal General Entity" that is "Reference[d] in Content" is to be "Included".

======= Begin spec excerpt =======
4.4.3 Included If Validating
When an XML processor recognizes a reference to a parsed entity, in order to validate the document, the processor must include its replacement text. If the entity is external, and the processor is not attempting to validate the XML document, the processor may, but need not, include the entity's replacement text.
If a non-validating parser does not include the replacement text, it must inform the application that it recognized, but did not read, the entity.
======= End spec excerpt =======

-Nik O, Content Mgmt Solutions, Jackson, Wyo.

xml-dev: A list for W3C XML Developers.
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
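The "Entity Declared" constraint quoted above can be observed with a small experiment. The sketch below uses Python's xml.parsers.expat binding, not the C API discussed in this thread, and the helper names raw_entity_refs and parse_error are invented for illustration. It assumes that a plain DefaultHandler (as opposed to DefaultHandlerExpand) leaves internal entity references unexpanded and passes them to the handler as raw text, which is how the binding documents it.

```python
from xml.parsers import expat


def raw_entity_refs(document):
    """Parse `document`; return general entity references left unexpanded."""
    refs = []

    def on_default(data):
        # The default handler also receives prolog tokens (pieces of the
        # DOCTYPE, whitespace), so keep only strings shaped like '&name;'.
        if data.startswith("&") and data.endswith(";"):
            refs.append(data)

    parser = expat.ParserCreate()
    # Claim ordinary content so that it does not reach the default handler.
    parser.StartElementHandler = lambda name, attrs: None
    parser.EndElementHandler = lambda name: None
    parser.CharacterDataHandler = lambda text: None
    # Assumption: DefaultHandler, unlike DefaultHandlerExpand, inhibits
    # expansion of internal entities and reports the reference itself.
    parser.DefaultHandler = on_default
    parser.Parse(document, True)
    return refs


def parse_error(document):
    """Return Expat's error message for `document`, or None if it parses."""
    try:
        raw_entity_refs(document)
    except expat.ExpatError as exc:
        return expat.ErrorString(exc.code)
    return None


UNDECLARED = "<doc>&copy; 1999</doc>"
# Declaration placed in an internal DTD subset, inside a DOCTYPE.
DECLARED = '<!DOCTYPE doc [<!ENTITY copy "&#169;">]>' + UNDECLARED
# Declaration placed bare in the prolog, outside any DOCTYPE.
BARE = '<!ENTITY copy "&#169;">' + UNDECLARED
```

Under these assumptions, parse_error(UNDECLARED) reports an undefined entity, parse_error(BARE) reports some error (the exact code may differ between Expat versions), and raw_entity_refs(DECLARED) hands back the literal "&copy;" string, whose length is what an offset-preserving indexer needs.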