
Re: The Goals of XML at 25, and the one thing that XML now needs

  • From: Rick Jelliffe <rjelliffe@allette.com.au>
  • To: xml-dev <xml-dev@lists.xml.org>
  • Date: Fri, 23 Jul 2021 01:22:00 +1000

Hi Henry!

Yes I am very familiar with the XML Screamer paper, big fan, and it is one of the primary studies that informed many of the points I am making.

Let's note what they leave out:

XML Features Not Supported

  • 1. DTD external or internal subsets. Note that external subsets are optional in the XML Recommendation, but internal subsets are required [21]. In this respect XML Screamer is non-conforming.
  • 2. Support for encodings other than UTF-8 or UTF-16. Our architecture is in principle capable of supporting other encodings, but because our parsers are hand crafted to optimize for the characteristics of particular encodings, the work involved is significant.
  • 3. Very large instance documents, i.e. those too large to fit in a contiguous memory buffer.

XML Schema Features Not Supported

  • 4. Facets on simple types (these are accepted but not checked; among the facets not checked is the pattern facet.) [Note 5]
  • 5. Non-deterministic content models (such as certain models with nested numeric occurrence constraints.) [Note 6]
  • 6. Identity constraints (accepted but not checked)
  • 7. Validity checking of types other than anySimpleType, date, integer, decimal, nonnegativeInteger, boolean, positiveInteger, negativeInteger, nonpositiveInteger, and string (all types are accepted, but validation of lexical forms and conversion to binary is available only for the listed types.)

In previous decades, when I read these kinds of papers and came to these omission sections, my reaction was to doubt that the method was in fact practical. (Benchmarking against Xerces, the slowest parser, did not help either: how it compares to MSXML is more compelling.)

I understand that a research paper has finite resources, but it seemed to me that the omissions were often not arbitrary but genuine pain points. And that there was a pattern to them.

And that made me think: is this actually a good rational basis for enhancing XML? Instead of seeing the implementation omissions (in this and most other papers) as implementation flaws, if not academic lapses, are they really "telling us" that the features are roadblocks which prevent or dilute many different implementation approaches? Not drowning, waving.

Hence my starting proposal also follows items 1 and 2, and it also (I think) draws from item 3 the idea that we want to avoid anything that prevents in-place contiguous parsing. (It also validates only some simple primitive types.)

However, the problem of entity references causing extra buffer allocations, etc., isn't necessarily a problem: if a parse method can support numeric character references, then it can also support general entity references with the same characteristics, namely those whose replacement text contains no tags or references (a CDATA entity) and does not expand to more characters than the reference itself, e.g. the standard entity sets.
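
To make that concrete, here is a minimal sketch in Python (not taken from the XML Screamer paper or from any existing parser) of expanding references inside the same buffer. It assumes UTF-8 input and measures length in bytes rather than characters. The ENTITIES table and the name expand_in_place are hypothetical and exist only for illustration; the table is a tiny stand-in for the standard entity sets. The sketch does not check that the resulting code point is a legal XML character.

```python
# Hypothetical table: a tiny subset of the standard entity sets.
# Every replacement is plain text (no tags, no further references).
ENTITIES = {
    b"amp": "&",
    b"lt": "<",
    b"gt": ">",
    b"eacute": "\u00e9",
    b"nbsp": "\u00a0",
}


def expand_in_place(buf: bytearray) -> int:
    """Expand references inside buf itself and return the new logical length.

    The buffer is never resized: the write cursor can only fall behind
    the read cursor, because no expansion is longer than its reference.
    """
    read = write = 0
    n = len(buf)
    while read < n:
        if buf[read] != 0x26:  # not '&': copy the byte down
            buf[write] = buf[read]
            read += 1
            write += 1
            continue
        end = buf.find(b";", read + 1)
        if end == -1:
            raise ValueError("unterminated reference")
        name = bytes(buf[read + 1:end])
        if name.startswith(b"#x"):      # hexadecimal character reference
            text = chr(int(name[2:].decode("ascii"), 16))
        elif name.startswith(b"#"):     # decimal character reference
            text = chr(int(name[1:].decode("ascii")))
        else:                           # named entity (KeyError if unknown)
            text = ENTITIES[name]
        out = text.encode("utf-8")
        if len(out) > end + 1 - read:
            raise ValueError("expansion is longer than the reference")
        # Same-length slice assignment, so the bytearray keeps its size.
        buf[write:write + len(out)] = out
        write += len(out)
        read = end + 1
    return write


buf = bytearray(b"a &lt; b &#x41;&eacute;")
original_size = len(buf)
length = expand_in_place(buf)
result = bytes(buf[:length]).decode("utf-8")
```

The point of the sketch is only that named entities take the same code path as numeric character references: one lookup, one bounded write, and no second buffer.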

Hoping you are well,
Rick


On Thu, 22 Jul. 2021, 22:36 Henry S. Thompson, <ht@markup.co.uk> wrote:
You might find this interesting/useful wrt your high-speed parsing
goals:

  http://www.ra.ethz.ch/cdstore/www2006/devel-www2006.ecs.soton.ac.uk/programme/files/xhtml/5011/p5011-mendelsohn.html

Take care,

ht
--
                    Henry S. Thompson, Markup Systems Ltd.
               Cavers Garden Farm, Denholm; by Hawick; TD9 8LN
                            +44 (0) 7866 471 388
               Fax: (44) 131 651-1426, e-mail: ht@markup.co.uk
                        URL: http://www.markup.co.uk/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

