At 18:13 15/01/98 -0600, Jeremie Miller wrote:
>I've just updated my JavaScript parser < http://www.jeremie.com/xparse/ >,
>and have a few questions...
>
>First, the update. Unlike normal software aging, I cut the code size by 50%
>(below 5k w/o my comments) and increased the speed and compatibility. It
>should work with almost _any_ incarnation of JavaScript. It now properly
>and according to spec for a well-formed parser understands elements,
>attributes, the prolog, comments, processing instructions, and CDATA
>sections. What I am working on yet is entities and DOM compatibility (just
>have to print out the spec and read it).

Excellent.

>
>My question is this, being a fairly simple parser, how should I handle
>entities? I'm confused by the spec as to how a well-formed parser should
>handle them. Should I parse <!ENTITY definitions in an included DTD, or
>simply handle &amp; &lt; &gt; &quot; &apos; ? If those are all I should
>handle, which ones where? The spec does talk about these things, but I
>don't feel right about my interpretation of it.

You are not alone :-). There is a difficult decision here for parser
writers - do they implement everything in the spec or do they go for a
subset? If the latter they are not full XML implementations (and therefore
cannot use the label "XML parser"). If the former, they have a *lot* of
work to do in understanding the spec and getting it right. I have heralded
my own incompetence in understanding NOTATION on this list :-)

Every software writer therefore has to decide whether they are going to
write a fully conformant XML processor. I am not sure whether *anyone* has
yet done this other than James Clark (and those who adapt SGML systems to
process XML). [XML *is* SGML, of course, but you have to use a customised
SGML declaration for standard SGML tools to read XML.] Most of my work is
done with Lark and AElfred and I think they both may have some small bits
to fill in (please forgive if I'm wrong :-).

For my own parser (Jumbo) I gave up about 6 months ago and do not process
entities (other than the hardcoded ones). That means that if I get a
document which uses them, my parser fails and I switch to Larkfred. (In
fact I'll make one of them the default as soon as the dust settles...)

So you have the following choice:
- encode the *whole* spec (and nothing but the spec - i.e. no tricky
  non-compliant extensions) and give yourself the label "conforming XML
  tool".
- encode the bits you feel are cost effective and label it "processes
  most XML documents, but gives 'Sorry' messages for some".

>Other question: Either I can't find it or I am reading right by it, but how
>do I handle whitespace in attribute values as a well-formed parser, just
>allow anything, including \n?

It depends on the type of the attribute value. See 3.3.3 (Attribute Value
Normalization). If the attribute value is of type CDATA it stays as is,
else it gets normalised. How do you tell if it's not CDATA?
- there has to be an ATTLIST for the element. This is in the external
  or internal subsets. So you have to be able to process those.
- these subsets can use Parameter Entities. So you have to be able to
  process those.

The alternative is not to process any ATTLISTs. This has the slight
disadvantage that it can totally change the meaning of the document. E.g.
an attribute value can be an ENTITY, which effectively means it is a
pointer to a chunk of information, whereas if it is assumed to be CDATA
it's just a string.

So the bottom line is that *if* the document author uses ENTITYs and your
software doesn't, then you will end up with something radically different
from what the author intended. This may or may not matter. If you are the
author of the document as well as the parser, then you can make a bargain
with yourself that you will never use ENTITYs so your software doesn't
need to.

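To make the "hardcoded entities only" option concrete, here is a minimal sketch in Python (chosen only for illustration; it is not code from xparse, Jumbo, Lark or AElfred). It expands the five predefined entities and numeric character references, and refuses anything else with an explicit 'Sorry' rather than passing it through. The names `expand_references` and `UnsupportedEntity` are made up for this example, and the sketch assumes no DTD has been read, so every other entity counts as undeclared.

```python
import re

# The five entities an XML processor recognises without any declaration.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Matches &name;, decimal &#65; and hexadecimal &#x41; references.
# Simplification: names are limited to ASCII-style start characters.
REFERENCE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_:][\w.:-]*);")


class UnsupportedEntity(Exception):
    """Hypothetical error: the document uses an entity we do not process."""


def expand_references(text):
    """Expand predefined entities and character references in one pass.

    Any other entity reference raises UnsupportedEntity instead of being
    silently left in the output. A stray '&' that is not part of a
    reference is left untouched here; a real parser would report it as
    a well-formedness error.
    """
    def replace(match):
        ref = match.group(1)
        if ref.startswith("#x"):
            return chr(int(ref[2:], 16))
        if ref.startswith("#"):
            return chr(int(ref[1:]))
        if ref in PREDEFINED:
            return PREDEFINED[ref]
        raise UnsupportedEntity("Sorry, cannot process entity &%s;" % ref)

    # re.sub does not rescan replaced text, so "&amp;lt;" becomes the
    # literal string "&lt;" and is not expanded a second time.
    return REFERENCE.sub(replace, text)
```

A caller would wrap `expand_references` in a `try`/`except UnsupportedEntity` block and report that the document cannot be processed, instead of carrying on with text that still contains unexpanded references.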
If you then want other people to use your software you either have to add
in entity processing OR give them a statement that you cannot process the
document. What you must not do (IMO) is to ignore ENTITYs and assume the
result is more or less OK :-)

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)