Re: Random Access XML

From: rjelliffe <rjelliffe@allette.com.au>
To: <xml-dev@lists.xml.org>
Date: Sun, 20 Feb 2011 12:31:33 +1100


On Sat, 19 Feb 2011 15:36:46 -0500, John Cowan <cowan@mercury.ccil.org> wrote:
> rjelliffe scripsit:
>
>> 1) For a start, we need to be able to know whether "<", "</", and ">"
>> are tag delimiters without knowing context. So we must ban direct use
>> of "<" and ">" in attributes and also get rid of CDATA sections. We
>> should get rid of comments and PIs too, for the same reasons.
>> (Actually, we only need to ban comments and PIs from after the first
>> start tag. For other reasons, we might like to treat the first
>> start-tag and what comes before it specially.)
>
> Of course, random < is already banned everywhere, so if you ban > in
> character content as well as attribute values, you get full
> reversibility: each of <, </, <?, <!--, >, />, and --> is guaranteed
> to be the open or close delimiter of a markup construct.

Yes, if people are happy to keep comments and PIs after the prolog, I
don't mind. (But I thought James' idea was to reduce the number of
different node types in the parse tree, because multiple node types
apparently freak programmers out?)
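
With "<" and ">" banned everywhere except as delimiters (and, to keep it simple, with CDATA sections, comments, and PIs absent), a reader can seek to an arbitrary offset and resynchronize just by scanning for the next "<". A minimal Python sketch under those assumptions; next_tag and tag_kind are hypothetical helper names, not an existing API:

```python
from __future__ import annotations


def next_tag(doc: str, offset: int) -> tuple[int, str] | None:
    """Return (position, tag_text) for the first complete tag at or after offset.

    Assumes "<" and ">" appear only as tag delimiters (no literal "<" or ">"
    in attribute values or character content, no CDATA, comments, or PIs),
    so the next "<" is always the start of a tag, even if offset landed in
    the middle of text or in the middle of another tag.
    """
    start = doc.find("<", offset)
    if start == -1:
        return None  # no more tags after offset
    end = doc.find(">", start)
    if end == -1:
        return None  # truncated document
    return start, doc[start:end + 1]


def tag_kind(tag: str) -> str:
    """Classify a tag found by next_tag as 'end', 'empty', or 'start'."""
    if tag.startswith("</"):
        return "end"
    if tag.endswith("/>"):
        return "empty"
    return "start"


doc = "<book><section><title>Intro</title></section></book>"
pos, tag = next_tag(doc, 7)  # offset 7 lands inside "<section>"
print(pos, tag, tag_kind(tag))  # prints: 15 <title> start
```

Landing inside a tag simply skips the rest of it, which is only safe because neither "<" nor ">" may occur in attribute values under these rules.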

> MicroXML already bans > in character content so that it doesn't have
> to special-case ]]>, as required for full XML compatibility. The only
> reason it doesn't ban > in attribute values is that they are required
> for compatibility with Canonical XML.

 Oh, is that a requirement?

>> 3) The generic identifier would have to be more like an XPath.
>
> This could be achieved by convention, using a legal but rarely
> employed delimiter like U+00B7 MIDDLE DOT, or any of the vast number
> of delimiters allowed by XML 1.0 Fifth Edition.

Yes, let's make the 5th edition useful! :-) Using special characters ad
hoc in names may be bad, but using them as systematic delimiters could
be good. (I think using non-ASCII characters as token separators won't
get any traction unless encodings are restricted to UTF-*, or unless a
built-in entity reference is allowed for the chosen delimiter.)

For the sake of argument, say we use ‣ [triangle], e.g.
<book‣section‣personalName>, which is like a breadcrumb-bar notation. A
SAX processor for Random Access XML would plug in after a normal SAX
parser and replace element names like 'book‣section‣personalName' or
'section‣personalName' with 'personalName' (i.e. report back just the
element name, the last item). If sections only appear in books, then the
start tags <book‣section‣personalName> and <section‣personalName> should
not alter the infoset.
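
Such a processor could be sketched as a SAX filter, here using Python's xml.sax.saxutils.XMLFilterBase. One caveat: U+2023 is not a name character in XML 1.0 Fifth Edition, so a conforming parser would reject <book‣section‣personalName>; this sketch therefore assumes U+00B7 MIDDLE DOT (a legal NameChar, as John suggests) as the delimiter. PathNameFilter, NameCollector, and element_names are hypothetical names:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase

# Assumption: U+00B7 MIDDLE DOT separates path steps in element names.
DELIM = "\u00b7"


class PathNameFilter(XMLFilterBase):
    """Hypothetical SAX filter that reports only the last step of a path-style name."""

    def __init__(self, parent, delim=DELIM):
        super().__init__(parent)
        self.delim = delim

    def _last_step(self, name):
        # 'book·section·personalName' -> 'personalName'; plain names pass through.
        return name.rsplit(self.delim, 1)[-1]

    def startElement(self, name, attrs):
        super().startElement(self._last_step(name), attrs)

    def endElement(self, name):
        super().endElement(self._last_step(name))


class NameCollector(ContentHandler):
    """Downstream handler that records the element names it receives."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names(xml_text):
    """Parse xml_text through the filter and return the reported element names."""
    collector = NameCollector()
    sax_filter = PathNameFilter(xml.sax.make_parser())
    sax_filter.setContentHandler(collector)
    sax_filter.parse(io.BytesIO(xml_text.encode("utf-8")))
    return collector.names


long_form = "<book\u00b7section\u00b7personalName>Rick</book\u00b7section\u00b7personalName>"
short_form = "<section\u00b7personalName>Rick</section\u00b7personalName>"
print(element_names(long_form), element_names(short_form))
```

Both forms should be reported as the same element, which is the "should not alter the infoset" property; the filter sits between the ordinary parser and the application, so nothing downstream needs to know about the path syntax.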

If we wanted to reduce name lengths, we could allow simple wildcards or
ellipses too, e.g. <b…‣s…‣personalName>.
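
A processor could check such abbreviated paths against the real element context with a simple matcher. A sketch, assuming a step ending in "…" is a prefix wildcard and that a path must match the trailing steps of the element's actual ancestor-or-self chain; step_matches and path_matches are hypothetical. Since this is plain string processing rather than XML parsing, it can use the ‣ and … characters from the example directly:

```python
from __future__ import annotations

ELLIPSIS = "\u2026"  # "…"
DELIM = "\u2023"     # "‣", the delimiter chosen for the sake of argument


def step_matches(step: str, name: str) -> bool:
    # Assumption: a step ending in an ellipsis matches any name with that prefix.
    if step.endswith(ELLIPSIS):
        return name.startswith(step[:-1])
    return step == name


def path_matches(tag_name: str, context: list[str]) -> bool:
    """True if the steps of tag_name match the last steps of context.

    context is the element's real ancestor-or-self chain, outermost first,
    e.g. ["book", "section", "personalName"].
    """
    steps = tag_name.split(DELIM)
    if len(steps) > len(context):
        return False
    tail = context[len(context) - len(steps):]
    return all(step_matches(s, n) for s, n in zip(steps, tail))


print(path_matches("b…‣s…‣personalName", ["book", "section", "personalName"]))  # True
print(path_matches("b…‣s…‣personalName", ["book", "chapter", "personalName"]))  # False
```

Matching against trailing steps means <section‣personalName> and <book‣section‣personalName> both validate for the same element, consistent with the two forms not altering the infoset.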

Cheers
Rick Jelliffe

BTW, the idea of using paths in names to allow random access is neither
new nor mine. IIRC the Dynatext readers indexed their SGML into a
one-element-per-line format, with a long path name at the beginning of
each line. This allowed fast contextual searches using normal
line-oriented text matching. I think Steve DeRose had the patent on
this, but I'd expect it to have expired by now.
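
The line-per-element index is easy to sketch. The actual Dynatext index format is not described here, so this assumes a hypothetical "path<TAB>direct text" line per element; line_index and grep are illustrative names. Once the document is flattened, a contextual search is just a regular-expression match over lines:

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET


def line_index(xml_text: str) -> list[str]:
    """Flatten a document into one line per element: '/path/to/elem<TAB>text'.

    Only the text immediately after each start tag (elem.text) is kept;
    mixed-content tails are ignored in this sketch.
    """
    lines: list[str] = []

    def walk(elem: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{elem.tag}"
        lines.append(f"{path}\t{(elem.text or '').strip()}")
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return lines


def grep(lines: list[str], pattern: str) -> list[str]:
    """Line-oriented contextual search, like running grep over the index."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


doc = ("<book><section><personalName>Rick</personalName></section>"
       "<appendix><personalName>John</personalName></appendix></book>")
index = line_index(doc)
print(grep(index, r"/section/personalName\t"))  # ['/book/section/personalName\tRick']
```

The long path at the start of each line is what makes the search contextual: the personalName inside the appendix is not matched.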

Follow-Ups:
- Re: Random Access XML
  - From: John Cowan <cowan@mercury.ccil.org>
- Re: Random Access XML
  - From: Dave Pawson <davep@dpawson.co.uk>

References:
- Random Access XML
  - From: rjelliffe <rjelliffe@allette.com.au>
- Re: Random Access XML
  - From: John Cowan <cowan@mercury.ccil.org>
