Re: Random Access XML
rjelliffe scripsit:

> Yes, if people are happy to keep comments and PIs after the prolog, I
> don't mind. (But I thought James' idea was to reduce the different
> number of nodes types in the parse tree, because multiple node types
> apparently freaks programmers out?)

Well, you know the zero-one-infinite rule. Without PIs, you need only
element nodes; with PIs, you need document nodes, element nodes, and PI
nodes. That's triple the number of node types.

MicroLark normally reports an error if a PI appears, but if you set the
PI feature to true, its push and pull parsers will report PIs, but the
tree builder will ignore them. Only PIs that look like well-formed
start-tags (except for the question marks) are allowed, which covers
things like xml-stylesheet and xml-model. XML declarations are still
disallowed.

>> The only reason [MicroXML] doesn't ban > in attribute values is that
>> they are required for compatibility with Canonical XML.
>
> Oh, is that a requirement?

No, but it's convenient because it means that XML->MicroXML converters
already exist in the form of XML canonicalizers.

> (I think using non-ascii characters for token separators wont
> get any traction, unless encodings are restricted to UTF-*. [...])

MicroXML limits the encoding to UTF-8, with ASCII as a degenerate case.

> BTW, the idea of using paths in names to allow random access is not new
> or mine. IIRC the Dynatext readers indexed their SGML into a one element
> per line format, with a long path name at the beginning of each line.
> This allowed fast contextual searches using normal line-oriented text
> matching. I think Steve deRose had the patent on this, but I'd think it
> would be expired by now.

A good thing, since I have such a script not for indexing but for
pipelining: it produces lines of the form "path\tvalue" for every
element path in a document, where "value" is the XPath value of the
element. There's a switch to allow paths ending in "/@foo" as well.
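For illustration only, here is a minimal sketch of what a script of that
kind might look like. It is not the actual script: the function names
(path_lines, escape), the with_attrs switch, the choice of Python's
standard xml.etree.ElementTree, and the backslash-escaping of tabs and
newlines inside values are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def escape(text):
    # Assumed convention: keep each record on one line by escaping
    # backslashes, tabs, and newlines that occur inside a value.
    return text.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def path_lines(xml_text, with_attrs=False):
    """Yield one 'path<TAB>value' line per element, in document order.

    With with_attrs=True, also yield a 'path/@name<TAB>value' line for
    each attribute, right after the line for its owning element.
    """
    root = ET.fromstring(xml_text)

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        # XPath string-value of an element: the concatenation of all
        # descendant text nodes in document order.
        value = "".join(elem.itertext())
        yield path + "\t" + escape(value)
        if with_attrs:
            for name, attr_value in elem.attrib.items():
                yield path + "/@" + name + "\t" + escape(attr_value)
        for child in elem:
            yield from walk(child, path)

    yield from walk(root, "")
```

Printing each line of path_lines(text, with_attrs=True) gives output
that ordinary line-oriented tools (grep, cut, sort) can consume. Note
that in this sketch repeated sibling elements share the same path; no
positional predicates are added.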
-- 
All Gaul is divided into three parts: the part          John Cowan
that cooks with lard and goose fat, the part            http://ccil.org/~cowan
that cooks with olive oil, and the part that            cowan@ccil.org
cooks with butter.  --David Chessler