
Re: Random Access XML

  • From: rjelliffe <rjelliffe@allette.com.au>
  • To: <xml-dev@lists.xml.org>
  • Date: Sun, 20 Feb 2011 12:31:33 +1100

 On Sat, 19 Feb 2011 15:36:46 -0500, John Cowan <cowan@mercury.ccil.org> 
 wrote:
> rjelliffe scripsit:
>
>> 1) For a start, we need to be able to know whether "<" "</" and ">"
>> are tag delimiters without knowing context. So we must ban direct use
>> of "<" and ">" in attributes and also get rid of CDATA sections. We
>> should get rid of comments and PIs too, for the same reasons.
>> (Actually, we only need to ban comments and PIs from after the first
>> start tag. For other reasons, we might like to treat the first
>> start-tag and before it specially.)
>
> Of course, random < is already banned everywhere, so if you ban > in
> character content as well as attribute values, you get full
> reversibility: each of <, </, <?, <!--, >, />, and --> is guaranteed
> to be the open or close delimiter of a markup construct.

 Yes, if people are happy to keep comments and PIs after the prolog, I 
 don't mind. (But I thought James' idea was to reduce the number of 
 different node types in the parse tree, because having multiple node 
 types apparently freaks programmers out?)

> MicroXML already bans > in character content so that it doesn't have
> to special-case ]]>, as required for full XML compatibility.  The only
> reason it doesn't ban > in attribute values is that they are required
> for compatibility with Canonical XML.

 Oh, is that a requirement?

>> 3) The generic identifier would have to be more like an XPath.
>
> This could be achieved by convention, using a legal but rarely
> employed delimiter like U+00B7 MIDDLE DOT, or any of the vast number
> of delimiters allowed by XML 1.0 Fifth Edition.

 Yes, let's make the 5th edition useful! :-)  Using special characters ad 
 hoc in names may be bad, but using them as systematic delimiters could 
 be good.  (I think using non-ASCII characters as token separators won't 
 get any traction unless encodings are restricted to UTF-*, or unless we 
 allow a built-in entity reference for the chosen delimiter.)

 For the sake of argument, say we use ‣ [triangle], e.g. 
 <book‣section‣personalName>, which is like a breadcrumb-bar notation.  A 
 SAX processor for Random Access XML would plug in after a normal SAX 
 parser and replace element names like 'book‣section‣personalName' or 
 'section‣personalName' with 'personalName'. (I.e. it reports back just 
 the element name, which is the last item. If sections only appear in 
 books, then the start tags <book‣section‣personalName> and 
 <section‣personalName> should not alter the infoset.)
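
 A rough sketch of such a filter, using Python's standard xml.sax 
 module. Two assumptions: the separator here is U+00B7 MIDDLE DOT 
 (John's suggestion) rather than ‣, because as far as I can tell 
 U+2023 is not a legal name character even under the Fifth Edition, so 
 a stock parser would reject it; and the names PathNameFilter, 
 NameCollector, last_step and element_names are invented for the 
 example.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase

SEP = "\u00b7"  # MIDDLE DOT, assumed here as the path separator


def last_step(name, sep=SEP):
    """Return the final step of a path-style element name."""
    return name.rsplit(sep, 1)[-1]


class PathNameFilter(XMLFilterBase):
    """Sits after a normal SAX parser and reports only the last path step."""

    def startElement(self, name, attrs):
        super().startElement(last_step(name), attrs)

    def endElement(self, name):
        super().endElement(last_step(name))


class NameCollector(xml.sax.ContentHandler):
    """Downstream handler that records the element names it is told about."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names(xml_text):
    """Parse xml_text through the filter and list the reported names."""
    reader = PathNameFilter(xml.sax.make_parser())
    collector = NameCollector()
    reader.setContentHandler(collector)
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return collector.names


sample = (
    "<book><book·section>"
    "<book·section·personalName>Rick</book·section·personalName>"
    "</book·section></book>"
)
print(element_names(sample))
```

 The downstream handler never sees the path prefixes, so whether a 
 start tag carries the full path or a shortened one makes no 
 difference to the application.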

 If we wanted to reduce name lengths, we could allow simple wildcards or 
 ellipses too, e.g. <b…‣s…‣personalName>


 Cheers
 Rick Jelliffe

 BTW, the idea of using paths in names to allow random access is not new 
 or mine. IIRC the DynaText readers indexed their SGML into a 
 one-element-per-line format, with a long path name at the beginning of 
 each line. This allowed fast contextual searches using normal 
 line-oriented text matching. I think Steve DeRose had the patent on 
 this, but I'd think it would have expired by now.

