Re: Suggestions for a slightly less verbose (and easier to a
[Paul Prescod]:
>>If the instances are generated under your control by a machine, then by
>>definition they won't use the short-tag feature if your regexps don't
>>support it. The complexity argument also does not wash: entities and
>>CDATA sections easily add the most complexity to XML of any feature.

[Tim Bray]
>Machine-generated XML usually doesn't do entities or CDATA. It does do
><someTag>
> ..stuff..
> ..stuff..
></someTag>
>and perl is just the ticket.

The problem, of course, is that there is no way to tell whether the 1 Gig XML instance you are about to process contains any entities, CDATA sections, etc. So you have to bake assumptions about the processing environment into your code. Such assumptions make me nervous and make Walter Perry very nervous indeed (they are tantamount to assumptions about XML vocabulary semantics).

I see three possibilities to make this work reliably:

1) An XML-Lint type utility that flags the presence of such things, so that assumption-laden Perl is protected from making erroneous processing decisions. Such lint-like utilities would make excellent components in XPipe, Schemamachine, Ant, Cocoon or DSDL pipelines.

2) A canonical XML representation guaranteed to have resolved away all the funnies, e.g. Canonical XML or PYX.

3) A manifest mechanism in XML that allows a human or machine to declare which features the XML instance uses, e.g. XFM. This would be of the hint variety, subject to formal confirmation by an XML-Lint type utility, but very useful in stopping "grep", Perl etc. in their tracks if the manifest asserts something that contradicts the processing assumptions.

4) A PSVI that .... (only joking!!!!!!)

Personally (surprise, surprise), I think the lint utility in a *pipeline* is the way to go. That way, people can re-invent all of SGML's tag minimization features in a layered way, without heaping them all into a monolithic morass whose complexity trickles down to all XML tools.
This trickle-down effect is what made SGML such an exasperatingly powerful pain in the ass. Let's not re-invent it.

Sean
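A minimal sketch of the XML-Lint idea from option 1, written in Python. The script name `xml_lint.py`, the feature list and the exit-code convention are assumptions for illustration, not an existing tool. It deliberately over-reports: a plain regex scan will also flag, say, `<!--` text sitting inside a CDATA section. That is acceptable for a gate whose only job is to stop assumption-laden Perl before it runs, not to parse. It also assumes an ASCII-compatible encoding such as UTF-8; a UTF-16 instance would need decoding first.

```python
import re
import sys
from typing import Dict, Iterable, List

# Hypothetical feature list: constructs that line-oriented regex/Perl code
# typically assumes are absent. Each pattern matches only an opening
# delimiter, and none of these delimiters can contain a newline, so a
# line-by-line scan cannot split one across two lines.
FEATURES = {
    "doctype": re.compile(rb"<!DOCTYPE"),  # may carry entity declarations
    "cdata": re.compile(rb"<!\[CDATA\["),
    "comment": re.compile(rb"<!--"),
    # "<?" that is not the XML declaration ("<?xml" followed by whitespace).
    "processing_instruction": re.compile(rb"<\?(?!xml\s)"),
    # In well-formed XML, a "&" outside comments, CDATA and PIs always starts
    # an entity or character reference, so flagging every "&" never misses one.
    "entity_or_char_ref": re.compile(rb"&"),
}


def lint_lines(lines: Iterable[bytes]) -> Dict[str, int]:
    """Return the first 1-based line number at which each feature appears."""
    found: Dict[str, int] = {}
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in FEATURES.items():
            if name not in found and pattern.search(line):
                found[name] = lineno
        if len(found) == len(FEATURES):
            break  # every feature already flagged; stop reading early
    return found


def main(argv: List[str]) -> int:
    if len(argv) != 2:
        print("usage: xml_lint.py FILE", file=sys.stderr)
        return 2
    # Binary mode reads the file one line at a time rather than loading
    # the whole instance into memory.
    with open(argv[1], "rb") as f:
        found = lint_lines(f)
    for name, lineno in sorted(found.items(), key=lambda item: item[1]):
        print(f"{argv[1]}:{lineno}: {name}")
    # Non-zero exit lets a pipeline stop before assumption-laden processing.
    return 1 if found else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

In a pipeline, something like `python xml_lint.py big.xml && perl process.pl big.xml` (where `process.pl` stands in for your own hypothetical script) would only run the Perl stage when none of the listed features was found.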