
Re: RE: Two hugely significant conversions that XML parsers do

  • From: Tim Bray <tbray@textuality.com>
  • To: Roger L Costello <costello@mitre.org>
  • Date: Thu, 15 Apr 2021 11:40:55 -0700

It's less important than it used to be. A very high proportion of online text is now UTF-8, and it'd be frankly weird to send data over the wire with Microsoft CRLF line endings.

On Thu, Apr 15, 2021 at 11:36 AM Roger L Costello <costello@mitre.org> wrote:

Hi Folks,

I find it totally fascinating that XML parsers convert their input into a standard character encoding scheme (Unicode) and normalize line endings to linefeed characters. Applications that operate (reason) on the post-parsed input know exactly what they are working on.

Wicked neat!

Do other data format specifications specify that their parsers perform similar conversions?

Do JSON parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?

Do CSV parsers (Comma Separated Value parsers) convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?

Do YAML parsers (YAML Ain't Markup Language parsers) convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?

Do Protocol Buffer parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?

Or, does XML stand apart from other text data formats in this regard?
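
For one of the formats asked about, a quick experiment is possible. The following is only a sketch, assuming Python's standard `json` module is a fair stand-in for "a JSON parser" (other implementations may behave differently); the sample document is made up for illustration. It suggests that the input is decoded to Unicode strings, but that a CR LF pair carried inside a string value (as the escapes `\r\n`) comes through unchanged rather than being normalized to a linefeed:

```python
import json

# Hypothetical input: UTF-8 bytes. The string value contains the JSON
# escapes \r\n (backslash-r, backslash-n), not raw control characters.
raw = b'{"note": "caf\xc3\xa9\\r\\nsecond line"}'

doc = json.loads(raw)   # bytes are decoded to Unicode (str) by the parser
note = doc["note"]

print(repr(note))       # the CR LF pair is still present in the value
print("\r\n" in note)
```

Raw line endings between JSON tokens are just insignificant whitespace, so the question only really arises for line breaks inside string values, as above.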

/Roger

From: Roger L Costello <costello@mitre.org>
Sent: Thursday, April 15, 2021 11:25 AM
To: xml-dev@lists.xml.org
Subject: Two hugely significant conversions that XML parsers do

Hi Folks,

An XML parser does two hugely significant conversions.

Suppose we provide input to an XML parser. Here are the conversions that the parser does to the input:

1. The parser converts the characters in the input to Unicode.

2. The parser converts line endings in the input to a linefeed character (hex 0A).
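
The two conversions can be observed with a small experiment. This is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module (which is built on the Expat parser); the sample document and the variable names are made up for illustration. The input is ISO-8859-1 bytes with CR LF line endings, and the application sees a Unicode string whose lines end with a single linefeed:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: ISO-8859-1 encoded bytes with Windows (CR LF) line endings.
# \xe9 is "e with acute accent" in ISO-8859-1.
raw = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>\r\n'
    b'<note>caf\xe9\r\nsecond line</note>'
)

root = ET.fromstring(raw)
text = root.text

print(repr(text))             # a Unicode str; the line break is a lone linefeed
print(isinstance(text, str))
print("\r" in text)           # no carriage return survives parsing
```

The application code after `ET.fromstring` never needs to know what encoding or line-ending convention the original bytes used.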

What are the consequences of these conversions?

Answer: your applications can operate on the parsed input with the understanding that the characters are Unicode and the lines end with a linefeed character.

I like the term that Amy used: your applications can _reason_ about the parsed input with the understanding that the characters are Unicode and the lines end with a linefeed character.

/Roger


