Re: Difference between "normalize" and "canonicalize"?

  • From: "G. Ken Holman" <gkholman@C...>
  • To: <xml-dev@l...>
  • Date: Wed, 25 Feb 2009 08:11:41 -0500

Personally, I see "normalization" as changing the information into 
something that is common, while "canonicalization" is representing 
something in a common way without changing it.

When line-ending sequences are normalized, they are changed into new 
values without their old values being retained.  On DOS and Mac and 
mainframe systems, different line-ending sequences are changed to the 
line-feed character.  Once you have the line-feed character there is 
no going back to what it was.  If there was a line-feed in the DOS 
file, there is no distinguishing the authored line-feed from a 
line-feed produced by normalizing a line ending.
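
As a rough illustration of that loss, here is a minimal Python sketch. 
The function name normalize_line_endings is hypothetical, and it only 
handles CR LF and bare CR, which is an assumption about the inputs 
rather than a full account of what any particular parser does.

```python
def normalize_line_endings(text: str) -> str:
    """Map CR LF and bare CR to a single LF (hypothetical helper)."""
    # Replace CR LF first so it is not turned into two line-feeds.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Three different authored inputs...
dos_text = "first\r\nsecond\r\n"
old_mac_text = "first\rsecond\r"
mixed_text = "first\nsecond\r\n"  # contains an authored bare line-feed

# ...all collapse to the same normalized string.
normalized = {
    normalize_line_endings(dos_text),
    normalize_line_endings(old_mac_text),
    normalize_line_endings(mixed_text),
}
```

Since all three inputs end up as one value, nothing in the normalized 
string says which of them it came from.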

The normalize-space() function changes a sequence of white-space 
characters into a single space.  The information is changing and you 
can't undo it once you have the normalized string.  There's no way to 
go back to an arbitrary sequence of white-space characters.
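
A small Python sketch of the same behaviour, written only to show the 
many-to-one mapping.  The name normalize_space here is my own helper, 
not a library call, and it assumes white-space means just space, tab, 
carriage return and line-feed.

```python
import re

# Assumption: the white-space characters are space, tab, CR and LF only.
XML_WHITESPACE = " \t\r\n"


def normalize_space(value: str) -> str:
    """Strip leading/trailing white-space and collapse inner runs to one space."""
    trimmed = value.strip(XML_WHITESPACE)
    return re.sub(r"[ \t\r\n]+", " ", trimmed)


original_a = "  one \t\n two  "
original_b = "one two"
result_a = normalize_space(original_a)
result_b = normalize_space(original_b)
```

Both inputs give the same result, so the result alone cannot tell you 
which white-space the author typed.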

The normalize-unicode() function changes a character without the 
ability to go back.  Using NFKC normalization on U+1E9B creates 
U+1E61 and you can't go back because you've changed the Latin 
character that is the basis of the Unicode character from a long s to 
a simple s.
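
The same NFKC mapping can be checked with Python's standard 
unicodedata module.  This is only a sketch using Python in place of 
normalize-unicode(); the variable names are mine.

```python
import unicodedata

long_s_with_dot = "\u1e9b"  # LATIN SMALL LETTER LONG S WITH DOT ABOVE
s_with_dot = "\u1e61"       # LATIN SMALL LETTER S WITH DOT ABOVE

# NFKC replaces the long s base letter with an ordinary s.
nfkc_of_long_s = unicodedata.normalize("NFKC", long_s_with_dot)

# An authored U+1E61 is left alone, so both inputs end up identical.
nfkc_of_plain_s = unicodedata.normalize("NFKC", s_with_dot)
```

Both results are U+1E61, so the normalized text no longer records 
whether a long s was ever there.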

On the other hand, canonicalization doesn't change the information, 
or the meaning of the information, it merely makes assumptions about 
how that information is presented or organized.  One can then recover 
another arbitrary representation or organization again without 
changing the meaning.  Consider empty elements: they can be created 
either as "<abc/>" or "<abc></abc>" and the meaning of the two 
is identical.  In an XML processor you cannot distinguish between the 
two.  However, when not using an XML processor you need a common 
representation of an empty element, so that two users who each see an 
empty element write it out in the same canonical form and other 
processes then see the same information from their perspective.  But 
the information hasn't changed at all.
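
To make the empty-element case concrete, here is a short Python sketch. 
It assumes Python 3.8 or later, where xml.etree.ElementTree provides a 
canonicalize() function implementing C14N 2.0; that is one particular 
canonical form chosen for illustration, not the only possible 
convention.

```python
from xml.etree.ElementTree import canonicalize, fromstring

short_form = "<abc/>"
long_form = "<abc></abc>"

# Both serializations are written out in the same canonical form.
canonical_short = canonicalize(short_form)
canonical_long = canonicalize(long_form)

# To an XML processor the two were never different: same name, no text,
# no children.
parsed_short = fromstring(short_form)
parsed_long = fromstring(long_form)
same_to_parser = (
    parsed_short.tag == parsed_long.tag
    and parsed_short.text == parsed_long.text
    and len(parsed_short) == len(parsed_long)
)
```

Only the byte-level spelling was made uniform; a tool could just as 
well re-serialize the result as "<abc/>" without losing anything.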

So I see normalization as destructive and canonicalization as not destructive.

Normalizing information creates a common form without necessarily 
being able to recover the original form because the information is 
being changed.

Canonicalizing information creates a common form merely by convention, 
and one could then change that to an alternate form simply by 
following a different convention without changing the information.

So I personally don't consider the two terms the same.

But I also don't think they are always consistently applied with such 
nuance and I wouldn't be surprised to find some users of the terms 
interchanging them.  But when I'm given the choice I perceive a distinction.

I hope this helps.

. . . . . . . . . . . Ken

At 2009-02-25 06:38 -0500, Costello, Roger L. wrote:
>Hi Folks,
>
>Consider these two sentences:
>
>
>1. When an XML parser reads in an XML document it normalizes all 
>line breaks to \n.
>
>2. A canonicalizer tool will canonicalize empty elements to 
>start-tag, end-tag pairs.
>
>
>Both "normalize" and "canonicalize" seem to mean:
>
>    Put into a standard form.
>
>Do they in fact mean the same thing? If so, why have two terms? Why 
>not have just one term?
>
>/Roger


--
XQuery/XSLT training in Prague, CZ 2009-03 http://www.xmlprague.cz
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson:    http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview:  http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman                 mailto:gkholman@C...
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/x/
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal


