RE: Re: What is a better word for "de-duplication"?
All sorts of terms with ambiguous or impenetrable meanings don't help. They muddy the water. A tool need not be pretty to be useful. Is there any doubt about the meaning of "de-duplication"? Not from where I sit.

--
Charles Knell
cknell@xxxxxxxxxx - email

-----Original Message-----
From: Andrew Franz <afranz0@xxxxxxxxxxxxxxxx>
Sent: Tue, 29 Aug 2006 08:12:40 +1000
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: What is a better word for "de-duplication"?

Wendell Piez wrote:

> At 03:33 PM 8/28/2006, Andrew wrote:
>
>> Wendell Piez wrote:
>>
>>> Dear Dimitre,
>>>
>>> At 08:41 PM 8/27/2006, you wrote:
>>>
>>>> I want to use a single, short word to express the act of removing
>>>> duplicates from a node-set. I remember seeing the word "de-duplication"
>>>> used, however it sounds ugly.
>>>
>>>
>> Normalisation
>
>
> Normalization (or 'normalisation' for those who prefer British
> orthography) would rather be the general process of transforming a set
> of values into their normalized forms. So,
>
> <date value="2006">May Day 2006</date>
> <date value="2006-05-01"/>
> <date value="5-1-2006">May 1 2006</date>
>
> might be normalized as
>
> <date value="2006-05-01">May 1 2006</date>
> <date value="2006-05-01">May 1 2006</date>
> <date value="2006-05-01">May 1 2006</date>
>
> but this would not deduplicate them.
>
> These are very different problems, especially for XSLT. Generally
> speaking, deduplicating requires normalization first since
> deduplication works only over canonical forms (or comparing them to
> see which are duplicates becomes very difficult).
>
> Cheers,
> Wendell

Yes, this is one meaning of 'normalisation'.
But 'normalisation' is richer and deeper than that. Think about relational database theory.
2NF = A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation.

In the above example:

<date value="2006">May Day 2006</date>
<date value="2006-05-01"/>
<date value="5-1-2006">May 1 2006</date>

becomes:

<standardDate id="x" year="2006" month="5" day="1" />

plus:

<date id="x" format="t yyyy">May Day</date>
<date id="x" format="yyyy-mm-dd" />
<date id="x" format="Mmm dd yyyy" />

I submit that these are *not* the same. In your example, you simply removed the 'inconvenient' differences. In the database normalisation, the commonalities are "normalised" or "factored" out as a basis for comparison.

In this process (applied to XSLT perhaps), <date> has been "de-duplicated" into <standardDate> but there is no loss of information. Why invent new terminology?
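
The distinction argued over in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything posted by the participants: the wrapping `<dates>` root, the `NAMED_DAYS` lookup, and the helper names `canonical`, `normalise`, `deduplicate` and `factor` are all hypothetical, and `5-1-2006` is assumed to mean month-day-year. `normalise` maps every `<date>` to a canonical key, `deduplicate` then drops repeated keys, and `factor` groups the original forms under their shared key so that nothing is discarded.

```python
import re
import xml.etree.ElementTree as ET

# The three <date> elements from the thread, wrapped in a hypothetical root.
SOURCE = """<dates>
  <date value="2006">May Day 2006</date>
  <date value="2006-05-01"/>
  <date value="5-1-2006">May 1 2006</date>
</dates>"""

# Hypothetical lookup for named days; it holds only what this example needs.
NAMED_DAYS = {"May Day": (5, 1)}


def canonical(elem):
    """Return the yyyy-mm-dd form of one <date> element."""
    value = elem.get("value", "")
    text = (elem.text or "").strip()

    match = re.fullmatch(r"(\d{4})-(\d{1,2})-(\d{1,2})", value)
    if match:  # already year-month-day
        year, month, day = map(int, match.groups())
        return f"{year:04d}-{month:02d}-{day:02d}"

    match = re.fullmatch(r"(\d{1,2})-(\d{1,2})-(\d{4})", value)
    if match:  # assumed to be month-day-year
        month, day, year = map(int, match.groups())
        return f"{year:04d}-{month:02d}-{day:02d}"

    if re.fullmatch(r"\d{4}", value):  # year only: consult the element text
        for name, (month, day) in NAMED_DAYS.items():
            if text.startswith(name):
                return f"{int(value):04d}-{month:02d}-{day:02d}"

    raise ValueError(f"cannot normalise value={value!r} text={text!r}")


def normalise(root):
    """Normalisation: one canonical key per element, duplicates kept."""
    return [canonical(elem) for elem in root.iter("date")]


def deduplicate(keys):
    """De-duplication: keep the first occurrence of each key, in order."""
    return list(dict.fromkeys(keys))


def factor(root):
    """Factoring: group the original (value, text) pairs under each key."""
    groups = {}
    for elem in root.iter("date"):
        groups.setdefault(canonical(elem), []).append(
            (elem.get("value"), elem.text)
        )
    return groups


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    print(normalise(root))
    print(deduplicate(normalise(root)))
    print(factor(root))
```

In this sketch `normalise` alone still yields three entries, which is Wendell's point that normalising does not deduplicate, while `factor` keeps the three original forms attached to the single shared key, which is closer to the relational sense Andrew describes.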