On Sun, 18 Jul 2021 18:37:34 +1000, you wrote:

| "For several decades I have dabbled with methods to speed up parsing UTF-8
| and XML using SIMD and parallel parsing: my conclusion is that the approach
| I am suggesting here is the only feasible way for XML to not be sidelined
| as slow and complex. [...]"

I don't think XML will ever get away from being "slow and complex". "Local" lookups - the benefit of your argument to dispense with entity expansions - can get pretty expensive too.

The comparison with JSON has to do with representations of data sets and configuration files - essentially, trees of name-value pairs. But is parsing of JSON in any language other than JavaScript significantly easier? All these other languages hide the gory details in a library or module or whatever, just like they do with XML, so the argument, if there is one, is about the performance of these add-ons, which is not very productive or enlightening. The fact remains that XML is still hideously verbose for just collections of name-value pairs (as well as setting traps for the naive who elect to put data content into attribute values).

Then there's the production side: editing, editing tools, and the travails of good old-fashioned manual input. Arguably, "shorthand" didn't start with Markdown or Wikitext, but with Ian Feldman's setext (1991 or so?). SGML's SHORTREF facility got left on the cutting room floor when XML was being spec'd, though I'd hazard the guess that Markdown et al. have workable representations for an SGML parser. (But SHORTREF needs entity declarations in SGML syntax, so maybe that's a no-go?)

Sadly, a paper from Balisage 2012 on this subject didn't go far, AFAICT.

http://www.balisage.net/Proceedings/vol8/html/Blazevic01/BalisageVol8-Blazevic01.html

(Also see https://marginput.blogspot.com/2012/08/shortref-redux.html)

Personally, I think XML has fallen on the wrong side of the "easy to produce and consume" divide. That is not necessarily a bad thing, but it does militate against quick-n-dirty use of XML. By the same token, I'm not convinced that XML parsing can be made _significantly_ faster, at least not by enough to warrant the effort.