Re: Difference between "normalize" and "canonicalize"?
Personally, I see "normalization" as changing the information into something that is common, while "canonicalization" is representing something in a common way without changing it.

When line-ending sequences are normalized, they are changed into new values and their old values are not retained. The different line-ending sequences used on DOS, Mac and mainframe systems are all changed to the line-feed character. Once you have the line-feed character there is no going back to what it was. If there was a lone line-feed in the DOS file, there is no distinguishing that authored line-feed from a line-feed produced by normalizing a line end.

The normalize-space() function changes a sequence of white-space characters into a single space. The information is changing and you can't undo it once you have the normalized string. There's no way to go back to an arbitrary sequence of white-space characters.

The normalize-unicode() function changes a character without the ability to go back. Using NFKC normalization on U+1E9B creates U+1E61 and you can't go back because you've changed the Latin character that is the basis of the Unicode character from a long s to a simple s.

On the other hand, canonicalization doesn't change the information, or the meaning of the information; it merely makes assumptions about how that information is presented or organized. One can then recover another arbitrary representation or organization again without changing the meaning.

Consider empty elements: they can be written either as "<abc/>" or as "<abc></abc>" and the meaning of the two is identical. In an XML processor you cannot distinguish between the two. However, when not using an XML processor you need a common representation of an empty element, so that two users who see an empty element both write it in the same canonical form and other processes then see the same information from their perspective. But the information hasn't changed at all.
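As a rough illustration of the examples above, here is a minimal Python sketch, assuming Python 3.8 or later (needed for xml.etree.ElementTree.canonicalize). The helpers normalize_line_endings and normalize_space are hypothetical names written for this illustration; they only approximate what an XML parser and the XPath function do and are not taken from any library.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET


def normalize_line_endings(text):
    """Hypothetical helper: map CRLF and lone CR line ends to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def normalize_space(text):
    """Hypothetical stand-in for XPath normalize-space(): trim, then
    collapse each run of XML white-space into a single space."""
    xml_ws = " \t\r\n"
    return re.sub("[" + xml_ws + "]+", " ", text.strip(xml_ws))


# Normalization: distinct inputs collapse to one output, so the
# original cannot be recovered from the result.
dos_text = "one\r\ntwo"
unix_text = "one\ntwo"
same_line_ends = (normalize_line_endings(dos_text)
                  == normalize_line_endings(unix_text))

same_spacing = normalize_space("a \t b") == normalize_space("a   b")

long_s_dot = "\u1e9b"   # LATIN SMALL LETTER LONG S WITH DOT ABOVE
plain_s_dot = "\u1e61"  # LATIN SMALL LETTER S WITH DOT ABOVE
nfkc_merges = (unicodedata.normalize("NFKC", long_s_dot)
               == unicodedata.normalize("NFKC", plain_s_dot)
               == plain_s_dot)

# Canonicalization: both spellings of the empty element already carry
# the same information; C14N only fixes the serialization convention.
canon_short = ET.canonicalize("<abc/>")
canon_long = ET.canonicalize("<abc></abc>")

print(same_line_ends, same_spacing, nfkc_merges)
print(canon_short, canon_long)
```

In each normalization case two different inputs end up equal, which is why the step cannot be undone. In the canonicalization case the two inputs were already the same to an XML processor, and only the written form is being agreed on.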
So I see normalization as destructive and canonicalization as not destructive. Normalizing information creates a common form without necessarily being able to recover the original form, because the information is being changed. Canonicalizing information creates a common form merely by convention, and one could then change that to an alternate form simply by following a different convention, without changing the information.

So I personally don't consider the two terms the same. But I also don't think they are always consistently applied with such nuance, and I wouldn't be surprised to find some users of the terms using them interchangeably. But when I'm given the choice I perceive a distinction.

I hope this helps.

. . . . . . . . . . . Ken

At 2009-02-25 06:38 -0500, Costello, Roger L. wrote:
>Hi Folks,
>
>Consider these two sentences:
>
>
>1. When an XML parser reads in an XML document it normalizes all
>line breaks to \n.
>
>2. A canonicalizer tool will canonicalize empty elements to
>start-tag, end-tag pairs.
>
>
>Both "normalize" and "canonicalize" seem to mean:
>
>    Put into a standard form.
>
>Do they in fact mean the same thing? If so, why have two terms? Why
>not have just one term?
>
>/Roger

--
XQuery/XSLT training in Prague, CZ 2009-03 http://www.xmlprague.cz
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson: http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview: http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman mailto:gkholman@C...
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/
Male Cancer Awareness Nov'07 http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers: http://www.CraneSoftwrights.com/legal