sean.mcgrath@p... (Sean McGrath) writes:
> Correctness or input fidelity - pick one - you cannot have both.

Of course you can have both, if you haven't been lulled to sleep by chants of "Infoset, Infoset" or "XPath is the data model." Heck, you can even have both and deal with the PSVI, if you're that much of a masochist.

When XML first appeared, it seemed important that parsers be small and easy to write. XML 1.0 gave parser writers escape hatches on a number of things, and developers frequently wrote to that minimum. XML 1.0 locked some functionality in the parser, and developers never went to the effort of exposing it. Since then, we've built huge edifices of code on top of these parsers, but I haven't seen anyone go back to retrieve what was thrown away in the first round.

The Desperate Perl Hacker has been quite thoroughly betrayed, first by XML 1.0, then by namespaces, then by a variety of other devices that further separated the text from its supposed meaning.

There's nothing inherent in XML or in the languages used to process XML that requires this division. Java is plenty capable of providing text renditions to accompany events or objects, if anyone thinks it valuable. Perl, Python, C# - heck, I think I could do this in Pascal or AppleSoft BASIC if I really had to do it. The problem isn't the code - it's the will.

It certainly takes extra effort. I've been poking at this for years now, stuffing bits of code between books and other projects. I wrote up pretty much my whole process at http://lists.xml.org/archives/xml-dev/200303/msg00568.html, and I'm finally reaching the point where a framework is emerging that supports text, events, and objects.

When I'm done, you'll be able to collect a series of parsing events into an object tree, play with the text, re-serialize that into a tree, and drop that tree into events.
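
To make "text renditions to accompany events" concrete, here is a rough Python sketch. It is not the framework described in this message, only an illustration: the names Event, events, and serialize, and the regular expression itself, are assumptions invented for the example, and the tokenizer is deliberately naive (it does not handle ">" inside attribute values or CDATA sections, for instance). Each event keeps the exact source text it came from, so joining the events reproduces the input, entity references and odd whitespace included.

```python
import re
from dataclasses import dataclass

# Hypothetical, non-conforming token pattern: comments, anything else in angle
# brackets, entity/character references, and runs of plain character data.
TOKEN = re.compile(r"<!--.*?-->|<[^>]*>|&[^;\s]+;|[^<&]+", re.DOTALL)


@dataclass
class Event:
    kind: str  # "comment", "start", "end", "empty", "other", "ref", or "text"
    text: str  # the exact source text this event was read from


def classify(raw: str) -> str:
    """Give a token a coarse event kind based on how it begins and ends."""
    if raw.startswith("<!--"):
        return "comment"
    if raw.startswith("</"):
        return "end"
    if raw.startswith("<?") or raw.startswith("<!"):
        return "other"
    if raw.startswith("<"):
        return "empty" if raw.endswith("/>") else "start"
    if raw.startswith("&"):
        return "ref"
    return "text"


def events(source: str) -> list:
    """Split source into events, refusing input the pattern cannot cover."""
    out, pos = [], 0
    for match in TOKEN.finditer(source):
        if match.start() != pos:
            raise ValueError(f"unrecognized input at offset {pos}")
        out.append(Event(classify(match.group(0)), match.group(0)))
        pos = match.end()
    if pos != len(source):
        raise ValueError(f"unrecognized input at offset {pos}")
    return out


def serialize(evts: list) -> str:
    """Re-emit the document by joining each event's stored text."""
    return "".join(e.text for e in evts)


doc = '<p  class="x">AT&amp;T &copy; 2003<br/></p>'
evts = events(doc)
assert serialize(evts) == doc  # nothing was normalized away

evts[5].text = " 2004"         # change one text event only
print(serialize(evts))
```

The point of the design is that the edit touches one event's text and nothing else: the doubled space in the start tag and both entity references come back out exactly as they went in.
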
You'll be able to make changes to the events or the object tree and have your changes made with minimal impact on the original surrounding text - no need to obliterate all your entity references to make changes in a document.

I'm not claiming that this framework will be the most efficient way to process XML, or that it will solve all problems. There's a huge amount of work yet to do (an XPath implementation is crucial, and I've not yet started that), and the primary interface for it is still through javadoc and code.

I intend, however, to demonstrate that "you can have both", and hopefully other programmers will pick up on that and let more of us have the benefits of both.

--
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com -- http://monasticxml.org



