  • From: Tim Bray <tbray@t...>
  • To: Roger L Costello <costello@m...>
  • Date: Thu, 15 Apr 2021 11:40:55 -0700

It's less important than it used to be. A very high proportion of online text is now UTF-8, and it'd be frankly weird to send data over the wire with Microsoft CRLF line endings.

On Thu, Apr 15, 2021 at 11:36 AM Roger L Costello <costello@m...> wrote:

Hi Folks,

I find it totally fascinating that XML parsers convert their input into a standard character encoding scheme (Unicode) and convert line endings to linefeed characters. Applications that operate (reason) on the post-parsed input know exactly what they are working on.

Wicked neat!

Do other data format specifications require their parsers to perform similar conversions?

Do JSON parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?

Do CSV (Comma Separated Values) parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?

Do YAML (YAML Ain't Markup Language) parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?

Do Protocol Buffer parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?

Or does XML stand apart from other text data formats in this regard?

/Roger

From: Roger L Costello <costello@m...>
Sent: Thursday, April 15, 2021 11:25 AM
To: xml-dev@l...
Subject: Two hugely significant conversions that XML parsers do

Hi Folks,

An XML parser does two hugely significant conversions.

Suppose we provide input to an XML parser. Here are the conversions that the parser performs on the input:

1. The parser decodes the input from whatever encoding it arrived in and converts the characters to Unicode.

2. The parser converts each line ending in the input (a carriage return plus linefeed pair, or a lone carriage return) to a single linefeed character (hex 0A).

What are the consequences of these conversions?

Answer: your applications can operate on the parsed input with the understanding that the characters are Unicode and the lines end with a linefeed character.
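
As a rough illustration of the two conversions (a sketch, not part of the original message): the Python snippet below feeds Latin-1 bytes with CR LF line endings to the standard library's xml.etree.ElementTree and looks at what the application receives. The sample document and the parsed_text helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: ISO-8859-1 (Latin-1) bytes with
# Windows-style CR LF line endings. 0xE9 is "e with acute" in Latin-1.
RAW = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>\r\n'
    b'<note>caf\xe9\r\nsecond line</note>'
)


def parsed_text(raw: bytes) -> str:
    """Parse raw XML bytes and return the root element's text content."""
    return ET.fromstring(raw).text


text = parsed_text(RAW)

# The application sees a Unicode string, and the CR LF pair has
# been normalized to a single linefeed.
print(repr(text))         # prints 'café\nsecond line'
print(text.split("\n"))   # prints ['café', 'second line']
```

Whatever encoding and line-ending convention the sender used, the application only has to split on "\n" and work with Unicode characters.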

 

I like the term that Amy used: your applications can _reason_ about the parsed input with the understanding that the characters are Unicode and the lines end with a linefeed character.

/Roger


