  • To: xml-dev@l...
  • Subject: Re: Normalizing XML [was: XML information modeling best practices]
  • From: Ronald Bourret <rpbourret@r...>
  • Date: Wed, 01 May 2002 00:31:04 -0700
  • References: <000801c1f03e$ad16dd80$6501a8c0@pcukmka>

Michael Kay wrote:
> 
> > First normal form:
> > ------------------
> > Data is in first normal form if it (a) has a primary key and
> > (b) has no repeating fields.
> 
> Actually, *data* can't be in first normal form - only *relations* can.

You're correct -- poor wording on my part.

> So
> applying the concept to a data model that doesn't use relations is pretty
> dicey.

Agreed.

What I'm curious about is whether there is a concept analogous to
normalization that can be applied to the XML data model. This might be
useful in proposing XML information modeling best practices, which was
Simon's original goal.

> Obviously (b) doesn't have any relevance to a hierarchic data model,

That depends on how you choose to interpret what "repeating fields"
means. If you choose to view B in:

   <!ELEMENT A (B+)>

as a (pardon the expression) single, multi-valued attribute, then the
above content model has no repeating fields while the following content
model does:

   <!-- B1 and B2 represent the same real-world entity -->
   <!ELEMENT A (B1, B2)>
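
A minimal sketch of the difference, assuming Python's standard
xml.etree.ElementTree module. The instance documents, the values in
them, and the helper function names are hypothetical and serve only as
illustration. Under the first content model, one query reads every B
no matter how many there are; under the second, the reader has to know
each numbered name in advance:

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of <!ELEMENT A (B+)>: B acts as one multi-valued property.
multi_valued = ET.fromstring("<A><B>x</B><B>y</B><B>z</B></A>")

# Hypothetical instance of <!ELEMENT A (B1, B2)>: the same entity under numbered names.
repeating = ET.fromstring("<A><B1>x</B1><B2>y</B2></A>")


def values_of_b(a):
    """Read every B value; works for any number of B children."""
    return [b.text for b in a.findall("B")]


def values_of_numbered_b(a, names=("B1", "B2")):
    """Read the numbered fields; the element names must be listed in advance."""
    return [a.findtext(name) for name in names]


print(values_of_b(multi_valued))
print(values_of_numbered_b(repeating))
```

Adding a third value to the first shape needs no change to the content
model or to values_of_b, while the second shape would need a new B3
declaration and a longer list of names.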

> > In XML terms, this
> > implies that you only store one "thing" per document
> 
> You've made a magic jump from "data" being normalized to "documents" being
> normalized, and you seem to be assuming that a document should represent one
> tuple in a relation - that's a mighty big jump.

That was my intention from the start, but I obviously wasn't clear about
it.

> Yes, [3NF] does apply to XML, but it certainly doesn't tell us how to split our
> data into multiple documents. It does tell us how to design our hierarchies,
> but not how to partition those hierarchies across documents.

I think it does tell us how to partition documents. It says that data
shared across multiple documents needs to reside in separate documents.
Sales orders are a bad example here, since the "XML normalized" form is
virtually identical to the relational normalized form. Semi-structured
data is probably a better example, since documents containing
semi-structured data are likely to be substantially different from their
relational counterparts.
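
One way to picture this partitioning rule, as a rough sketch only: the
document names, element names, and the in-memory "store" below are all
hypothetical, and Python's standard xml.etree.ElementTree module is
assumed. Two article documents share an author, so the author data
lives in its own document and each article holds only a reference:

```python
import xml.etree.ElementTree as ET

# Hypothetical document store: file name -> XML text. All names are illustrative.
documents = {
    "author-17.xml": "<Author id='17'><Name>J. Smith</Name></Author>",
    "article-1.xml": "<Article><Title>First</Title>"
                     "<AuthorRef doc='author-17.xml'/></Article>",
    "article-2.xml": "<Article><Title>Second</Title>"
                     "<AuthorRef doc='author-17.xml'/></Article>",
}


def author_name(article_doc):
    """Follow the article's AuthorRef to the shared author document."""
    article = ET.fromstring(documents[article_doc])
    ref = article.find("AuthorRef").get("doc")
    return ET.fromstring(documents[ref]).findtext("Name")


# Updating the shared document once is visible from every article that refers to it.
documents["author-17.xml"] = "<Author id='17'><Name>Jane Smith</Name></Author>"
print(author_name("article-1.xml"), author_name("article-2.xml"))
```

Had the author's name been copied into each article, the same
correction would have to be made in every copy, which is the update
problem that normalization is meant to avoid.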

> But all this presupposes that we are designing XML documents for storage and
> query. Most XML documents are designed for messaging of some kind (between
> humans or between software components). Within the context of a message,
> duplication is far less of a problem, for example it doesn't matter if I
> hold product code, description, and price as part of each order-line in an
> order. Many XML databases are actually archives of such messages, so
> duplication of data is a fact of life; and since it's an archive, the update
> problem doesn't arise.

This is the conclusion I came to.

-- Ron
