From: "Jonathan Robie" <jonathan.robie@d...>

> This is what I had thought most people would expect - regular expressions
> are not normally what you use to parse something described by a BNF.
...
> Isn't the lesson simply that you need a parser to interpret XML? And if so,
> why is that a problem? Most languages I use require a parser...

The usual approach for people who want to do text processing on an XML document is to accumulate a set of standard normalizing tools, so that the XML is in a single format. This simplifies the expressions needed for text processing. (For example, Omnimark provided a script for normalizing, and I think SPAM was the normalizing tool for SP.)

A conclusion that you cannot use text processing on XML and should always use a parser is just wrong, and against experience. People hear that XML simplifies SGML to make it more parseable, but XML addresses the issues of needing a DTD or SGML declaration to parse the document (RCDATA content, shortrefs, minimization, etc). It does this so that a more simply-parseable form of a document is possible when canonicalized (which used to be called normalized, but other things use that term now).

Desperate Perl Hackers should canonicalize first: get rid of CDATA sections, use a single literal delimiter, play with the namespaces, handle entity and character references. Of course, you might see this as splitting the parsing up into separate stages.

The reason for doing text processing rather than 'parsing' (i.e. into objects) is usually to take advantage of text streams rather than objects, for example where you want to process a very large document without creating zillions of objects.

Cheers
Rick Jelliffe
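The canonicalization steps above can be sketched in a few lines. This is a minimal, illustrative Python sketch, not any of the tools mentioned; the function name and the particular set of rewrites are my own assumptions, and the regexes are deliberately naive (single-pass, no nested constructs):

```python
import re

def canonicalize(xml: str) -> str:
    """Rewrite a few XML constructs into one regex-friendly form (illustrative sketch)."""
    # Get rid of CDATA sections: unwrap them, escaping the contents
    # as ordinary character data so later regexes see one syntax.
    def unwrap(m):
        text = m.group(1)
        return (text.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;"))
    xml = re.sub(r"<!\[CDATA\[(.*?)\]\]>", unwrap, xml, flags=re.S)

    # Use a single literal delimiter: rewrite 'value' attribute literals
    # as "value" (naive: assumes no double quotes inside the value).
    xml = re.sub(r"=\s*'([^']*)'", lambda m: '="%s"' % m.group(1), xml)

    # Handle numeric character references by expanding them to literal characters.
    xml = re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), xml)
    return xml
```

After a pass like this, the text-processing expressions downstream only ever have to match one quoting style and one form of character data, which is the whole point of normalizing before pattern matching.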