Hi Folks,
I find it totally fascinating that XML parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters. Applications that operate (reason) on the post-parsed input know exactly what they are working on.
Wicked neat!
Do other data format specifications specify that their parsers perform similar conversions?
Do JSON parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?
Do CSV parsers (Comma Separated Value parsers) convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?
Do YAML parsers (YAML Ain't Markup Language parsers) convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?
Do Protocol Buffer parsers convert their input into a standard character encoding scheme (Unicode) and line endings to linefeed characters?
Or, does XML stand apart from other text data formats in this regard?
/Roger
Hi Folks,
An XML parser performs two hugely significant conversions.
Suppose we provide input to an XML parser. Here are the conversions that the parser does to the input:
1. The parser converts the characters in the input to Unicode.
2. The parser converts every line ending in the input (a carriage return plus linefeed pair, or a lone carriage return) to a single linefeed character (hex 0A).
What are the consequences of these conversions?
Answer: your applications can operate on the parsed input with the understanding that the characters are Unicode and the lines end with a linefeed character.
I like the term that Amy used: your applications can _reason_ about the parsed input with the understanding that the characters are Unicode and the lines end with a linefeed character.
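Here is a small sketch of both conversions in action, using Python's standard-library xml.etree.ElementTree parser. The sample document and the variable names are my own invention for illustration; the input is ISO-8859-1 bytes with a mix of CRLF and lone CR line endings.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: ISO-8859-1 encoded bytes (0xE9 is "e with acute accent")
# containing one CRLF line ending and one lone CR line ending.
raw = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>\r\n'
    b'<note>caf\xe9\r\nline two\rline three</note>'
)

root = ET.fromstring(raw)
text = root.text

# Conversion 1: the application sees Unicode characters, not ISO-8859-1 bytes.
print(repr(text))

# Conversion 2: every line ending has become a single linefeed (hex 0A).
print("\r" in text)
```

The application code after the parse never has to ask which encoding or which line-ending convention the author of the document used.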
/Roger