Re: Best Practice for designing XML vocabularies containingacc
Roger, stop reinventing the wheel. This is all known territory you are exploring.

Read http://www.w3.org/TR/charmod-norm/ and if you think it's wrong, tell us why.

Michael Kay
Saxonica

On 02/02/2013 19:03, Costello, Roger L. wrote:
> Hi Folks,
>
> I propose the following as Best Practice:
>
>     For elements and attributes that have accents,
>     allow users to express them in either composed
>     normalized form (NFC) or decomposed normalized
>     form (NFD).
>
> Example: suppose that your XML vocabulary is to contain this element:
>
>     <résumé>
>
> Notice the two accented characters.
>
> There are two standard, canonical ways to express those accented characters:
>
> 1. Normalization Form Composed (NFC): the accented character is expressed as a single composed character (U+00E9 LATIN SMALL LETTER E WITH ACUTE)
>
> 2. Normalization Form Decomposed (NFD): the accented character is expressed as a decomposed sequence of two characters (U+0065 LATIN SMALL LETTER E, U+0301 COMBINING ACUTE ACCENT)
>
> In the following XML document the first <résumé> element is expressed using NFC. The second is expressed using NFD:
>
>     <?xml version="1.0" encoding="UTF-8"?>
>     <Test>
>         <résumé>____</résumé>
>         <résumé>____</résumé>
>     </Test>
>
> The two <résumé> elements appear the same, don't they? That's a neat thing about NFC and NFD -- visualization tools display them the same way.
>
> In order for users to express accented elements and attributes in either NFC or NFD, design your XML Schemas using an <xs:choice> element. In the following XSD snippet the first résumé is NFC and the second is NFD:
>
>     <xs:choice>
>         <xs:element name="résumé" type="xs:string" />
>         <xs:element name="résumé" type="xs:string" />
>     </xs:choice>
>
> By designing your schemas in this fashion you empower your instance document authors to use whatever normalization form they prefer (or their tools prefer).
>
> I inquired on the Unicode mailing list about NFD. Here are my notes on their responses:
>
> Most text exchanged on the Internet is NFC-encoded. However, you can't count on text to always be NFC-encoded. In fact, there are definite advantages to NFD-encoding text.
>
> Some operating systems store filenames in NFD encoding.
>
> It's easier to remember a handful of useful composing accents than the much larger number of combined forms.
>
> NFD makes the regular expressions used to qualify its contents much, *much* simpler. I imagine that things like fuzzy text matching are easier in NFD.
>
> There are well-documented cases of, for example, keyboards that generate de-normalized sequences, file systems that use other forms, and tools which generate content that is not normalized. This content enters the Web in a non-NFC state.
>
> It is easier to use a few keystrokes for combining accents than to set up compose key sequences for all the possible composed characters.
>
> It's easier to do searches and other text processing on NFD-encoded text.
>
> Some Unicode-defined processes, such as capitalization, are not guaranteed to preserve normalization forms. So the result of converting a lowercase character in NFC may be a decomposed uppercase character sequence (i.e., NFD).
>
> Thoughts?
>
> /Roger
>
> I have approximate answers and possible beliefs
> in different degrees of certainty about different
> things, but I'm not absolutely sure of anything.
> Richard Feynman
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
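For readers who want to see the difference between the two spellings of the element name discussed in the quoted message, here is a minimal Python sketch. It is not part of the original thread; the helper names `code_points` and `same_name` are hypothetical, and it uses only `unicodedata.normalize` from the standard library. It shows that the NFC and NFD forms compare unequal as raw strings but equal once both are normalized to a common form.

```python
import unicodedata

# NFC spelling: each accented letter is the single code point U+00E9.
nfc_name = "r\u00e9sum\u00e9"
# NFD spelling: each accented letter is U+0065 followed by U+0301.
nfd_name = "re\u0301sume\u0301"


def code_points(text):
    """Return the code points of text in U+XXXX notation (illustrative helper)."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def same_name(a, b, form="NFC"):
    """Compare two names after normalizing both to the same form (illustrative helper)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(code_points(nfc_name))
print(code_points(nfd_name))
print(nfc_name == nfd_name)           # raw comparison: the strings differ
print(same_name(nfc_name, nfd_name))  # comparison after normalization
```

A raw string comparison treats the two names as different, which is why the proposal above declares both spellings in the schema. Normalizing before comparison is the other way to make them match.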