Re: ANNOUNCEMENT: Proposed SAX Revisions
* David Megginson
|
| If you are interested in SAX, either as a parser writer or as an
| application writer, please take a few minutes to read through this
| web page. I will look forward to receiving your comments,
| corrections, and suggestions.

Before my comments I should perhaps add some background: A Python special interest group for XML processing has recently been established, and as a part of that effort there are now several budding XML parsers in Python (plus one C module), a SAX library, a prototypical DOM implementation etc. See <URL:http://www.python.org/sigs/xml-sig/> for more details.

My part in all this has so far been an XML parser as well as the Python SAX translation and drivers. (This is available from the link above. A parser version with full well-formedness checking will probably be released later this weekend.)

Apart from the comments below I agree with all the changes, and mostly with the rationales for the changes as well.

However, I don't like adding the Location argument to every *Handler method. IMHO, that clutters the interfaces too much. I'd much prefer this alternative: Make a new interface BaseHandler, which only has two methods, getLocator and setLocator, which can be used to give the handler a Location object[1] it can ask about the current location. This interface can then be implemented by DTDHandler, EntityHandler, DocumentHandler and ErrorHandler. It would simplify those four interfaces (by removing an argument from every method in each interface) and probably simplify both implementation and the transition to the new SAX version. (SAX version numbers might perhaps be an idea?)

The specification should perhaps also specify exactly where the Location object should point to. The most obvious choice is the first character of the reported construct, but IMHO that should be spelled out.

The last issue is that of AttributeList.
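The BaseHandler alternative proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Locator class, its advance/getLineNumber/getColumnNumber methods and the events list are hypothetical names chosen for the example, not part of the proposed SAX revision; only getLocator and setLocator come from the suggestion itself.

```python
class Locator:
    """Hypothetical locator: answers questions about the current position."""

    def __init__(self):
        self._line = 1
        self._column = 1

    def advance(self, line, column):
        # Assumed hook: a parser would call this as it scans the document.
        self._line, self._column = line, column

    def getLineNumber(self):
        return self._line

    def getColumnNumber(self):
        return self._column


class BaseHandler:
    """Holds the locator, so handler methods need no Location argument."""

    def __init__(self):
        self._locator = None

    def setLocator(self, locator):
        self._locator = locator

    def getLocator(self):
        return self._locator


class DocumentHandler(BaseHandler):
    """Example handler that records where each element started."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # The handler asks the locator only when it cares about position.
        loc = self.getLocator()
        self.events.append((name, loc.getLineNumber(), loc.getColumnNumber()))


# Usage: the parser hands over one locator up front, then just fires events.
locator = Locator()
handler = DocumentHandler()
handler.setLocator(locator)
locator.advance(3, 5)
handler.startElement("para", {})
```

Handlers that never need position information simply ignore the locator, which is the simplification argued for above.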
In Python (and many other languages), lists, hash tables and tuples are "native" types, and this is basically what AttributeList is. Also, Java is now going to have a standardized Collections API with Java 1.2. I think AttributeList should be in a form that makes it implementable with the "native" types where that is natural, and still make it conform with the Collections API of Java 1.2.

One way to do that might be to have an Attribute object with Name, Type and Value attributes and just make AttributeList a hash table that maps attribute names to those objects. In Python/Common Lisp/Perl this might be implemented with hash tables and lists/tuples. Alternatively, one could throw out the type information and just use a plain hash table/associative array.

Below this point I have two ideas that may clash with what people want from SAX. If they are out of the question that's OK; I just want to hear the reactions to them.

One thing that would be very nice would be to make it possible for SAX clients to do validation themselves in case the underlying parser does not support it. This would make it possible to build a validating XML parser in languages like Python/tcl/Scheme from three components: a C module for fast document scanning, a Python/tcl/Scheme module for the same in case the C one hasn't been compiled in, and finally the validation itself, written in Python/tcl/Scheme.

What's necessary for this is basically the doctype method, access to the internal subset somehow and access to the XML declaration.[2] These things should perhaps be in a separate interface, since they are pretty different from most of the other things SAX is concerned with. (DTDHandler is probably the wrong place for them, since that would probably only be implemented if the parser already does validation.)
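Returning to the AttributeList suggestion above, a minimal Python sketch of both variants might look like this. The names Attribute and make_attribute_list, and the sample attribute data, are assumptions made up for illustration; nothing here is taken from an actual SAX binding.

```python
from collections import namedtuple

# Hypothetical Attribute record with the three fields suggested above.
Attribute = namedtuple("Attribute", ["name", "type", "value"])


def make_attribute_list(triples):
    """Build a dict mapping attribute names to Attribute records."""
    return {name: Attribute(name, type_, value) for name, type_, value in triples}


# Variant 1: hash table of attribute name -> Attribute object.
attrs = make_attribute_list([("id", "ID", "sec1"), ("lang", "CDATA", "en")])

# Variant 2: throw away the type information and keep a plain mapping.
plain = {name: attr.value for name, attr in attrs.items()}
```

With the first variant a client would write attrs["id"].type to get the declared type; with the second, plain["id"] gives the value directly and nothing else is available.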
Client-side validation of this kind is of course not a matter of life and death, since it's possible to do this without SAX, but IMHO it would be nice, as it would provide a clean decoupling of XML scanning and XML validation.

Finally, one thought that struck me when I read this API was that SAX seemed to be biased towards parsers that read the DTD, while I'm biased the other way. This is probably due to the differences between the situation Python is in (and Perl/... probably will be in) and the current state of affairs in Java.

This makes me think that it might perhaps be an idea to add a second level to SAX: one that provides logical information about elements, entities, attributes and notations as they are declared in the DTD. The existing SAX can then be simplified to only provide logical information about the document itself. The second level will of course only be supported by the parsers that actually parse the DTD, and if my suggestions above are adopted, the second level can be built on the first.

Doing this would also solve the AttributeList and DTDHandler "problems", since AttributeList would then become a plain hash table in most languages[3] while DTDHandler would become part of the DTD interfaces. One advantage of this is that there would no longer be several methods in SAX level 1 that many simple parsers will not support.

Just an idea. The problem is of course how to provide access to the complex stuff: element content models. (Possibly by avoiding the issue entirely and instead having methods that ask the DTD "is this element allowed here?")

[1] With this change it would probably be best to rename Location to Locator, since that's really what happens. Locator might also perhaps be merged with Parser.

[2] It would be nice if the SAX spec could specify whether or not the XML declaration should be reported as an ordinary processing instruction. Reporting it would solve the problem with access to it, but would add complexity for the users.
[3] Probably an assoc list in R5RS Scheme and object/record arrays in VB/Delphi/C/....

--
"These are, as I began, cumbersome ways / to kill a man. Simpler, direct, and much more neat / is to see that he is living somewhere in the middle / of the twentieth century, and leave him there." -- Edwin Brock

http://www.stud.ifi.uio.no/~larsga/
http://birk105.studby.uio.no/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)