{{ Normalization: for background, for readers who don't know what normalization is: consider Å (A with a ring above, as in the angstrom sign). A legacy character set may use two characters, one for the A and one for the combining ring, or it may use a single precomposed character. Unicode supports both forms (U+0041 U+030A, i.e. NFD, and U+00C5, i.e. NFC). The difference is invisible to the eye but disruptive for simple collating and string matching. So Unicode defines several decomposition and composition operations, collectively called normalization. The W3C has a Character Model specification which recommends using Unicode Normalization Form C. }} First, to confirm the status quo, as I understand it:
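To make the aside concrete, here is a small Python sketch (Python and its standard `unicodedata` module are our choice for illustration, not something from the original message). It shows the two spellings of Å comparing unequal as raw strings and equal once both are normalized to NFC:

```python
import unicodedata

# "Å" spelled two ways.
decomposed = "A\u030A"   # U+0041 LATIN CAPITAL LETTER A + U+030A COMBINING RING ABOVE (NFD)
precomposed = "\u00C5"   # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE (NFC)

# They render identically, but are different code point sequences,
# so a naive string comparison treats them as different.
naive_equal = decomposed == precomposed

# Normalizing both sides to NFC makes them compare equal.
nfc_equal = unicodedata.normalize("NFC", decomposed) == unicodedata.normalize("NFC", precomposed)

# NFD goes the other way: the precomposed letter splits into base letter + combining mark.
nfd_of_precomposed = unicodedata.normalize("NFD", precomposed)

print(naive_equal)                                # False
print(nfc_equal)                                  # True
print([hex(ord(c)) for c in nfd_of_precomposed])  # ['0x41', '0x30a']
```

The separate ANGSTROM SIGN character (U+212B) also normalizes to U+00C5 under NFC, which is another reason simple matching on raw code points is fragile.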
My proposal for my system is that normalization of names (to NFC) is a server-side responsibility; clients may check for it, or they may build name normalization in themselves too. This applies only to tokens that are not in double quotes, not to strings or literals. I will update the documentation for RAN (Random Access Notation) on www.schematron.com with this.

On Sat, Jul 31, 2021 at 8:04 AM John Cowan <johnwcowan@g...> wrote:
I don't understand this. I don't think we disagree, but clearly there are transcoders in the wild that do not produce NFC for every legacy charset. (I think John may be reading "it is needed" as "the only reason it is needed", but I meant "it is at least needed".)
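The proposal earlier in this message, that the server NFC-normalizes unquoted name tokens while leaving double-quoted strings and literals alone, might look roughly like the following. This is a minimal sketch under loudly stated assumptions: it uses a much-simplified tokenization in which a literal is any run between two double quotes with no escape syntax (RAN's real grammar is not described here), and the names `normalize_unquoted_names` and `is_name_normalized` are hypothetical. The check function is what a client could use to confirm that the server's normalization would be a no-op.

```python
import re
import unicodedata

# ASSUMPTION: a literal is '"' ... '"' with no escapes. Hypothetical, not RAN's grammar.
# The capturing group makes re.split keep the literals in the result list.
_QUOTED = re.compile(r'("[^"]*")')


def normalize_unquoted_names(text: str) -> str:
    """Hypothetical helper: NFC-normalize text outside double quotes only."""
    parts = _QUOTED.split(text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd indices are captured literals: preserved exactly as written.
            out.append(part)
        else:
            # Even indices are everything else, i.e. names and punctuation.
            out.append(unicodedata.normalize("NFC", part))
    return "".join(out)


def is_name_normalized(text: str) -> bool:
    """Hypothetical client-side check: would server normalization change anything?"""
    return normalize_unquoted_names(text) == text


source = 'A\u030A = "A\u030A"'         # decomposed name, decomposed literal
result = normalize_unquoted_names(source)
print(result == '\u00C5 = "A\u030A"')   # True: name composed, literal untouched
print(is_name_normalized(source))       # False
print(is_name_normalized(result))       # True
```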
Really, the issue is building normalization checking into the APIs for creating element objects, etc., which requires doing it on the ground floor.

Rick
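Building the check in "on the ground floor" could mean that the constructor for an element object refuses a name that is not in NFC. A minimal sketch, assuming a hypothetical `Element` class and `NotNormalizedError` exception (not any real XML library's API), using `unicodedata.is_normalized` (available since Python 3.8):

```python
import unicodedata
from dataclasses import dataclass, field


class NotNormalizedError(ValueError):
    """Hypothetical error: raised when a name is not in Unicode Normalization Form C."""


@dataclass
class Element:
    # Hypothetical element object; the NFC check runs on every construction.
    name: str
    children: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject rather than silently repair, so non-NFC input is visible to the caller.
        if not unicodedata.is_normalized("NFC", self.name):
            raise NotNormalizedError(f"element name {self.name!r} is not NFC")


ok = Element("\u00C5")           # precomposed: accepted
try:
    Element("A\u030A")           # decomposed: rejected
except NotNormalizedError as exc:
    print(exc)
```

Rejecting is only one design choice; the constructor could instead normalize with `unicodedata.normalize("NFC", name)`, which fits the server-side-normalization proposal but hides the fact that the input was not NFC.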