Re: RE: Two hugely significant conversions that XML parsers do

From: Tim Bray <tbray@textuality.com>
To: Roger L Costello <costello@mitre.org>
Date: Thu, 15 Apr 2021 11:40:55 -0700

It's less important than it used to be. A very high proportion of online text is now UTF-8, and it'd be frankly weird to send data over the wire with Microsoft CRLF line endings.

On Thu, Apr 15, 2021 at 11:36 AM Roger L Costello <costello@mitre.org> wrote:

Hi Folks,

I find it totally fascinating that XML parsers convert their input to a standard character set (Unicode) and normalize line endings to linefeed characters. Applications that operate (reason) on the post-parsed input know exactly what they are working on.

Wicked neat!

Do other data format specifications specify that their parsers perform similar conversions?

Do JSON parsers convert their input to a standard character set (Unicode) and normalize line endings to linefeed characters?

Do CSV parsers (Comma Separated Value parsers) convert their input to a standard character set (Unicode) and normalize line endings to linefeed characters?

Do YAML parsers (YAML Ain't Markup Language parsers) convert their input to a standard character set (Unicode) and normalize line endings to linefeed characters?

Do Protocol Buffer parsers convert their input to a standard character set (Unicode) and normalize line endings to linefeed characters?

Or, does XML stand apart from other text data formats in this regard?
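
For the JSON question, here is a small experiment rather than an answer. It is a sketch that assumes Python's standard json module as the parser; it shows what that one implementation does, not what the JSON specification requires of every parser. The sample strings are made up for illustration.

```python
import json

# Illustrative input: a JSON string whose line break is written with the
# escapes \r\n (JSON does not allow a raw line break inside a string).
escaped_line_break = json.loads('"one\\r\\ntwo"')

# The carriage return survives parsing: this parser decodes the escapes
# but does not rewrite CR LF to a single linefeed.
print(repr(escaped_line_break))

# Bytes input is decoded to a Python str, that is, a sequence of Unicode
# code points.
decoded = json.loads('"caf\u00e9"'.encode("utf-8"))
print(type(decoded).__name__, repr(decoded))
```

With this parser, the characters arrive as Unicode, but the line ending inside the string is passed through unchanged.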

/Roger

From: Roger L Costello <costello@mitre.org>
Sent: Thursday, April 15, 2021 11:25 AM
To: xml-dev@lists.xml.org
Subject: Two hugely significant conversions that XML parsers do

Hi Folks,

An XML parser does two hugely significant conversions.

Suppose we provide input to an XML parser. Here are the conversions that the parser does to the input:

1. The parser converts the characters in the input to Unicode.

2. The parser converts line endings in the input to a linefeed character (hex 0A).

What are the consequences of these conversions?

Answer: your applications can operate on the parsed input with the understanding that the characters are Unicode and the lines end with a linefeed character.

I like the term that Amy used: your applications can _reason_ about the parsed input with the understanding that the characters are Unicode and the lines end with a linefeed character.
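
The two conversions can be observed with a short experiment. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as the parser; the sample document, its ISO-8859-1 encoding, and the mix of line endings are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative input: bytes encoded as ISO-8859-1, containing one CR LF
# line ending and one bare CR line ending inside the element content.
raw = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\r\n'
    '<note>caf\u00e9\r\nline two\rline three</note>'
).encode("iso-8859-1")

root = ET.fromstring(raw)
text = root.text

# Conversion 1: the content is now a Python str (Unicode code points),
# whatever encoding the bytes on the wire used.
print(type(text).__name__)

# Conversion 2: both CR LF and the bare CR have become a single linefeed.
print(repr(text))
```

The application sees only Unicode characters and linefeeds, so it does not need to know how the input was encoded or which line-ending convention it used.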

/Roger

