Re: Tool converts records to XML
Hi Roger and all,

On Tue, 15 Nov 2022 20:14:36 +0000
Roger L Costello <costello@mitre.org> wrote:

> Michael Kay wrote:
>
> > the "Barnes & Noble" problem. The number #1 blunder
> > when writing XML is not to bother escaping `<` and `&`
> > if they happen to occur in your input.
>
> Ouch!
>
> You are right Michael.
>
> Upon reflection, I realized that there is an even nastier problem lurking
> than the problem of converting & and < in the input record data into &amp;
> and &lt; in the output XML.
>
> My actual record data is encoded in Windows 1252. The records contain many
> occurrences of the degree symbol. In Windows 1252 the degree symbol is a
> single byte (hex B0) but in UTF-8 the degree symbol is two bytes (hex C2 B0).
> My AWK program doesn't convert the one byte Windows 1252 degree symbol to the
> two byte UTF-8 degree symbol. In fact, to be correct every Windows 1252
> character with an encoding above hex 0F must be converted to two bytes, and
> my AWK program doesn't do any of that.
>
> Eeeeeeek!
>
> For fun, I also wrote an XSLT program to convert the records to XML. My
> program uses the unparsed-text() function, which does all character
> conversions behind-the-scene (i.e., you have no idea that all the Windows
> 1252 characters above hex 0F are being converted to two byte UTF-8
> characters).
>
> To implement the character conversions in AWK would be a monumental task.
>
> Eeeeeeek!
>
> Lesson Learned: Don't use AWK to convert records to XML.
>

Re AWK, Perl, and POSIX, see:

* https://shlomifish.livejournal.com/1991.html

* https://twitter.com/shlomif/status/1542047869989011457

* https://github.com/ekalinin/github-markdown-toc/pull/104/files

> Bummer!
>
> /Roger
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>

-- 
Shlomi Fish       https://www.shlomifish.org/
https://github.com/shlomif/validate-your-html - Validate Your HTML

If Chuck Norris is disappointed by you not following his advice, he’ll survive.
On the other hand, you will not.
    — https://www.shlomifish.org/humour/bits/facts/Chuck-Norris/

Please reply to list if it's a mailing list post - https://shlom.in/reply .
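
For illustration, here is a minimal Python sketch of the two fixes discussed in this thread: decoding the Windows-1252 input before writing UTF-8, and escaping `&` and `<` in the record data. The function name `record_to_xml`, the `record` element name, and the sample input are all made up for the example; they are not taken from Roger's AWK or XSLT programs or from his data. Two caveats: the bytes that need converting are the non-ASCII ones (hex 80 and above), and a few of them, such as the euro sign at hex 80, come out as three UTF-8 bytes rather than two.

```python
from xml.sax.saxutils import escape


def record_to_xml(raw: bytes, tag: str = "record") -> bytes:
    """Wrap one Windows-1252 encoded record in a UTF-8 encoded XML element.

    Hypothetical helper for illustration only.
    """
    # Decode first, so that byte 0xB0 becomes the character U+00B0 (degree sign).
    # Python's cp1252 codec raises UnicodeDecodeError for the few byte values
    # that Windows-1252 leaves undefined (for example 0x81).
    text = raw.decode("cp1252")

    # escape() replaces &, < and > with &amp;, &lt; and &gt;.
    element = f"<{tag}>{escape(text)}</{tag}>"

    # Encoding as UTF-8 turns U+00B0 into the two bytes C2 B0.
    return element.encode("utf-8")


sample = b"Barnes & Noble: 72\xb0F, a < b"
print(record_to_xml(sample))
# prints b'<record>Barnes &amp; Noble: 72\xc2\xb0F, a &lt; b</record>'
```

The point of decoding to text before doing anything else is that the escaping and the re-encoding then operate on characters rather than raw bytes, which is roughly what unparsed-text() does behind the scenes in the XSLT version.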