RE: Syntax Sugar and XML information models
> > > Conceptually, perhaps we have:
> > >
> > > The "Syntax Sugar InfoSet" (SSIS) that exposes everything worth
> > > round-tripping in the XML syntax... [even different quote characters
> > > and whitespace???]
> >
> > That list could be endless - you did not even mention attribute order.

Well, that's the nub of the issue here: the W3C InfoSet is widely interpreted as decreeing that everything not in the InfoSet is "mere syntax sugar". Some of these distinctions are clearly rooted in the XML spec and existing practice, such as the fact that the order of attributes is insignificant, the type of quotation marks around attribute values is insignificant, etc. Others are more controversial, such as CDATA sections. [For example, would you really want your XML database to take in XML documents with scripts escaped with CDATA sections and return them escaped with &lt; etc.?] Others really MUST be interpreted differently by authoring tools than the InfoSet specifies -- for example, the whole POINT of parsed entities is lost if an editor doesn't round-trip them; likewise a database should either let its client resolve external entities, or resolve them at retrieval time rather than storage time. (Entities are the only thing supported in a Recommendation that enables control of redundant information ...)

So, there seem to be two classes of things that the InfoSet doesn't cover: the "mere syntax" that no reasonable application (except maybe a "diff") would care about, and the gray-area stuff that some XML tools must care about but that the InfoSet says nothing about. My suggestion is to make this distinction more formally, based on input from the folks "in the trenches" about which details of XML syntax are "significant" and which aren't. Maybe there is an endless list of things that some people care about and some don't, but I'd at least like to see some discussion before giving up.
So, does ANYBODY care about round-tripping a) the specific quote characters around attribute values; b) the order of attributes; c) character entity references for characters that are in the specified character set; d) the two different syntaxes for empty elements; ...? Are there other bits that the InfoSet doesn't represent but have some practical significance to real applications? (Let's not discuss whitespace ... the complexities there are well-known and too painful to think about.)
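
To make the question concrete, here is a rough sketch (not tied to any particular tool discussed in this thread) of how an ordinary parser treats these details. It assumes Python's standard xml.etree.ElementTree as a stand-in for "an InfoSet-ish API"; the two sample documents and the helper name `shape` are made up for illustration. The documents differ only in quote characters, attribute order, empty-element syntax, a CDATA section versus &lt; escaping, and a numeric character reference.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that differ only in "syntax sugar":
# quote style, attribute order, empty-element form, CDATA vs. &lt;,
# and a character reference vs. the literal character.
doc_a = "<item id=\"1\" name='x'><empty/><s><![CDATA[a < b]]></s><t>&#65;</t></item>"
doc_b = "<item name=\"x\" id='1'><empty></empty><s>a &lt; b</s><t>A</t></item>"


def shape(elem):
    """Reduce an element to what the parser exposes: tag, attributes, text, children."""
    return (
        elem.tag,
        dict(elem.attrib),          # dict comparison ignores attribute order
        elem.text or "",            # treat "no text" and "empty text" alike
        elem.tail or "",
        [shape(child) for child in elem],
    )


tree_a = ET.fromstring(doc_a)
tree_b = ET.fromstring(doc_b)

# True if the parser sees no difference between the two documents.
same = shape(tree_a) == shape(tree_b)

# Re-serializing doc_a shows that the CDATA section does not survive.
reserialized = ET.tostring(tree_a, encoding="unicode")

print(same)
print(reserialized)
```

With this parser the two trees compare equal, and the CDATA section does not come back when doc_a is written out again, which is exactly the database scenario in the bracketed example above. Parsed entities and external entities are a separate matter that this sketch does not exercise.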