Re: The subsetting has begun
Sean McGrath wrote:
> The instance is UnicodeWithAngleBrackets for sure. But an XML compliant
> parser must turn this mixture into a tree. If it can't, surely, the instance
> is not WF? I don't see how a parser can match production [1] of the XML spec
> without turning the UnicodeWithAngleBrackets into a tree. The tree might be
> communicated in its entirety to the application (a la DOM) or in a stream of
> events (a la SAX) but there is always a tree there.

An XML-compliant parser *need* not turn an XML instance into any one particular output. Production [1] is the syntactic criterion, required by the Rec, that *input* to an XML parser must meet in order to be accepted as a document. Nothing in that production (nor in any other) says anything about the form of output that a parser must give to an entity matching this definition of document.

And that is just the point. Different styles of parsers natively produce different styles of output. It could not be otherwise: parsers are processors like any other and, like all processors, give a form to their output which reflects a particular understanding of it. Subsequent users of that output are not obliged to bring that same particular understanding to their own processing of it. To say that a processor is general-purpose is to say that the form which it gives to its output does not preclude any subsequent use of that same output understood in entirely different terms.

In practice, there will be uses of parser output which are precluded by the form the parser has given to that output. This is inevitable in the specific implementation of processors, whether parsers or any other. In such cases that particular parser will not be sufficiently general-purpose for that particular subsequent process, but the difficulty can be cured by changing to a different style of parser whose native output is sufficiently general for the subsequent process required.
Just such considerations will often decide whether a SAX or a DOM or some other style of parser is appropriate to a particular case. It does not mean, however, that 'there is always a tree there'. In the case of either SAX or DOM a tree can perhaps be built, if that is what a process subsequent to parsing chooses to do; but in processing the input XML instance, a SAX parser emits SAX events and a DOM parser renders a data structure defined by its particular DOM.

> At one level of interpretation - mid-parse as it were - prior to entity
> expansion, the parser's internal model might have shared sub-trees given that
> the same entity can occur more than once. But past entity resolution, the
> stuff passed on to the application is a tree.

I would argue that there is no visible 'mid-parse' which we might reasonably discuss. There is only the input XML instance and, if it survives draconian error handling, the particular output in the particular style of the parser. That output is the transitional state, and though in many cases a tree might be instantiated upon it, it is itself of a form native to the style of parser.

> The beauty of always starting with the UnicodeWithAngleBrackets is that it
> forces a separation between the process-specific and that which is innate in
> the data.

Amen. And from the *parser's* perspective (as opposed to that of some subsequent processor), all that is innate in the data is compliance with the WFCs, and perhaps the VCs, or the lack of it.

> In SGML, we had a name for the latter "markup-aware" as distinct from
> "structure controlled".

I believe that I understand the distinction. In performing its job qua parser, the parser is necessarily markup-aware. It cannot be structure controlled, because the structure which you are expecting to find is instantiated on the output of parsing, not inherent in the (pre-parsed!) input instance.
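To make concrete the claim that structure is instantiated on the output of parsing rather than inherent in the input, here is a minimal sketch in Python using only the standard library. The sample document, the `EventRecorder` handler and the helper functions are hypothetical, chosen for illustration. The SAX parser's native output is a flat sequence of callbacks; a tree exists only if a subsequent process, here `events_to_tree`, chooses to build one; and a DOM-style parser (minidom) hands back its own in-memory data structure instead.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample instance, used only for illustration.
DOC = "<doc><a>one</a><b>two<c/></b></doc>"


class EventRecorder(xml.sax.ContentHandler):
    """Records SAX callbacks as a flat list of (kind, value) tuples; builds no tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # SAX may deliver one run of text in several chunks; merge adjacent ones.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))


def sax_events(text):
    """SAX style: the parser's output is a stream of events, nothing more."""
    handler = EventRecorder()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


def events_to_tree(events):
    """One possible post-parsing choice: nest the flat events into (name, children) tuples."""
    root = ("#root", [])
    stack = [root]
    for kind, value in events:
        if kind == "start":
            node = (value, [])
            stack[-1][1].append(node)  # attach to the current parent
            stack.append(node)         # and make it the new parent
        elif kind == "end":
            stack.pop()
        else:
            stack[-1][1].append(value)  # text becomes a child of the open element
    return root[1][0]


def dom_top_level_children(text):
    """DOM style: the parser hands back a document object, navigated on its own terms."""
    document = minidom.parseString(text)
    root = document.documentElement
    return root.tagName, [child.tagName for child in root.childNodes]
```

Nothing in `sax_events` requires that its output ever become a tree; `events_to_tree` is just one consumer, and another might count elements or stream text without nesting anything.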
The distinction which you make is really a distinction in what various post-parsing processes should operate upon, given their particular natures.

Respectfully,

Walter Perry