Parser considerations (was: MS XML parser only works with IE...)
At 14:13 25/11/97 -0500, [many people] wrote about MSXML

Some of the things we mustn't forget at this time are:

- there is as yet no frozen XML 'recommendation' (I hope that's the
correct term). Under those circumstances it is unlikely that there are
any completely conforming parsers; the spec is still changing and so
any parser has addressed a moving target.

- for many people helping in the development of XML the question of
'best parser' is not appropriate at this stage - and I suspect not for
at least 3 months. The spec is quite large and is a lot of effort to
implement (those of us who have hacked parsers know). Many of us give
up on points we don't understand (for me it was parameter entities, and
that caused others grief as well :-). So until we see the next spec [is
there a later public one than Aug 7?] we can't be sure whether a parser
'gets PEs right' :-). I sympathise with anyone who has failed to
implement part of the current spec, and I hope that people trying out
parsers and other software will take a constructive view of such
'failings'.

- I believe that all parser writers at present would like their parsers
validated. Validation *of* a parser seems to me to include checks on
    - reporting errors in non-conforming XML documents
    - asserting that a conforming XML document is conforming
    - carrying out defined transformations on the original input
All of these require a set of test inputs, which I believe we badly
need at present. It is very likely that a parser writer at present will
overlook something in the spec. Checking the transformations is less
easy as there is no defined output. How, for example, do we check that
parser A transforms all the entities correctly? An important way is to
make sure that the outputs of two independent parsers agree.
To this extent, whatever we think about 'steenking ESIS' [a quote from
the source code of a well known XML parser], it is at least checkable
:-)

- the really hard bit comes when the semantics of behaviour are
unclear. Does the statement
    <!DOCTYPE CML SYSTEM "cml.dtd">
require the parser to *do* anything? Different authors will certainly
have different ideas
    - some see it as a request by the author that the document must be
      validated
    - others that, if the reader wishes to validate it, then this is
      the doctype that should be used.
There are many subtleties of this sort.

I believe that the development of XML has been one of the outstanding
achievements of the WWW. It has been fast, rigorous, fair, open, and
required extraordinary commitment and patience from those involved.
Often the SIG has had 50 emails a day, and many have required a great
deal of careful reading. I have been very gratified by the level and
amount of constructive contributions to XML-DEV, as this is an
important area for ironing out the parts the spec cannot reach.

I remember the agonies of early C++ compilers where every platform and
vendor had messages 'this feature not supported' and so on. I believe
that all contributors on this list want to avoid this and to ensure
that 'any valid XML document can be parsed with any XML parser'. Since
some parsers may purport to be XML compliant but not be, it is critical
that this fact can be recognised, and a test suite of documents seems
to be a key instrument. I hope very much that authors of such parsers
will be able to find the energy to mend them :-)

If - at some future time - I were looking for attractive features in an
XML parser then, after discarding the non-compliant ones, I would want
to consider a wide range, and I doubt that any one parser would 'win'
in all aspects. To this end I am trying to make JUMBO accept a range of
parsers by a simple commandline switch (or button).
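A switch of this kind can be pictured as a small registry keyed by
parser name. The sketch below is only an illustration in Python and is
not JUMBO's code: the registry, the function names and the sample
document are all assumptions, and two Python standard-library parsers
stand in for NXP and Lark. Because both back ends return the same
simple structure, their outputs can also be compared directly.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET


def element_names_etree(text):
    """Back end 1: element names in document order, via ElementTree."""
    return [el.tag for el in ET.fromstring(text).iter()]


def element_names_minidom(text):
    """Back end 2: the same result, via minidom."""
    doc = xml.dom.minidom.parseString(text)
    names = []

    def walk(node):
        # Record element nodes only; text and comment nodes are skipped.
        if node.nodeType == node.ELEMENT_NODE:
            names.append(node.tagName)
        for child in node.childNodes:
            walk(child)

    walk(doc.documentElement)
    return names


# Hypothetical registry: the keys play the role of "NXP" or "Lark".
PARSERS = {
    "etree": element_names_etree,
    "minidom": element_names_minidom,
}


def select_parser(args, default="etree"):
    """Pick a back end from a 'parser=NAME' argument, if one is present."""
    name = default
    for arg in args:
        if arg.startswith("parser="):
            name = arg.split("=", 1)[1]
    if name not in PARSERS:
        raise ValueError(f"unknown parser {name!r}; choose from {sorted(PARSERS)}")
    return PARSERS[name]


# Made-up sample document, used only to exercise the two back ends.
SAMPLE = "<doc><a/><b/></doc>"

if __name__ == "__main__":
    parse = select_parser(["foo.xml", "parser=minidom"])
    print(parse(SAMPLE))
```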
Thus:

    java jumbo.sgml.SGMLTree foo.xml parser=NXP     (or Lark)

I can quite envisage a case where a user wants to use parser A to read
in the initial document (perhaps because it is large, or
tree-structured) and parser B to read the entities.

I am delighted to hear about WORA-MSXML, and shall hope to look at it
shortly. I hope it's easy to bolt into JUMBO. I am slightly
disappointed that Xapi-J seems to have become dormant, because with it
the work inside JUMBO would be minimal. At present most of the parsers
I have encountered are event-driven (e.g. doStartTag, doError...) and
not all build trees (JUMBO is happy to build trees from streams). If,
indeed, this is the model most people use, then let's get a standard
terminology (Element, PI, ElementType, Attribute, etc.). It would make
things so much simpler. I also expect we could get a very very simple
API defined...

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences,
domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)