Re: Flatter is Better (part two)

  • From: Rick Jelliffe <rjelliffe@allette.com.au>
  • To: "Roger L. Costello" <costello@mitre.org>
  • Date: Thu, 4 Dec 2014 13:12:43 +1100

When Codd developed his ideas about relational data-banks, it was to allow the development of 'a universal retrieval language based on the second order predicate calculus.' 

To me, there are strong reasons for endorsing each kind of data model, but also weak or bad ones: Codd's ideal of predicate calculus was a really strong reason; Goldfarb's hope of tree grammars was also a strong reason. Determining what category of data model you have allows efficient implementations, and lets possibilities be explored. (Is your data just facts, or is your data just annotated regions? When it is both, for the same process, we have problems.)

But the more you drill down from that top-level decision, the fewer absolute recommendations can be made, it seems to me. Being able to cut and paste a single arbitrary element, bundling all you need, is a nice idea; but the processing systems need to handle that kind of mix and match. So the question becomes: how can I make data models that may simplify mix-and-match data?

One way I have seen this supported is by first defining the basic categories of information you have, at a level above data types and below conventional semantics: for example, instead of a date element, you have a generalized "event" element that allows the bundling of dates and other properties. (I think the ETL people have good approaches for this too.)

In the example, the fat design would have most elements replaced by e.g. <area kind='state'>, while in the flat example you might have e.g. <place kind='house'>. You could combine them: area elements drill down to the region, and place elements reference that region:

<area kind='country' value='AU'>
    <area kind='state' value='NSW'>
        ..
            <area kind='houseblock' value='26'>
         ...
</area>
<place kind="house"  location="AU/NSW/.../26">
   <year-built>1934</year-built>

This design is 'fat', but it avoids having to drill down through area information if you just need to search by year-built or other non-area properties. And identity matching is easy. So it may have different efficiency characteristics than either Roger's hierarchical or flat models.
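To make the two access paths concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. It is only an illustration under assumptions: the <data> wrapper element, the helper names find_area and houses_built_in, and the shortened three-level location path "AU/NSW/26" are hypothetical, since the snippet above elides the intermediate area levels.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified instance of the combined design. The <data>
# wrapper and the three-level path are assumptions made for this sketch.
DOC = """
<data>
  <area kind="country" value="AU">
    <area kind="state" value="NSW">
      <area kind="houseblock" value="26"/>
    </area>
  </area>
  <place kind="house" location="AU/NSW/26">
    <year-built>1934</year-built>
  </place>
</data>
"""

def find_area(root, location):
    """Follow a slash-separated location path down the nested <area> elements."""
    node = root
    for value in location.split("/"):
        node = next((a for a in node.findall("area") if a.get("value") == value), None)
        if node is None:
            return None  # the path does not match any area
    return node

def houses_built_in(root, year):
    """Search by a non-area property without walking the area hierarchy."""
    return [p for p in root.findall("place[@kind='house']")
            if p.findtext("year-built") == str(year)]

root = ET.fromstring(DOC)
matches = houses_built_in(root, 1934)
area = find_area(root, matches[0].get("location"))
print(len(matches), area.get("kind"), area.get("value"))
```

The year search only ever touches <place> elements; the area tree is consulted only when the region itself is needed, and matching a place to its area is a plain comparison of path segments.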

Rick

On 02/12/2014 9:31 PM, "Costello, Roger L." <costello@mitre.org> wrote:

Hi Folks,

 

The flat design is about creating XML documents that consist of a long series of standalone components:

 

 

A component in the document can be combined with other data (mashup):

 

Let’s take a concrete example to compare the flat design versus the fat design.

 

Here is a flat design:

 

<Iowa>
    <house>
        <street>1009 Arlington Court</street>
        <city>Davenport</city>
        <style>Ranch</style>
        <porch>open</porch>
        <year-built>1951</year-built>
        <square-feet>1700</square-feet>
    </house>
    <house>
        <street>1008 Arlington Court</street>
        <city>Davenport</city>
        <style>Ranch</style>
        <porch>closed</porch>
        <year-built>1955</year-built>
        <square-feet>1850</square-feet>
    </house>
    ...
</Iowa>

 

The document consists of a long series of standalone <house> components. Any of those <house> components could be mashed-up with other data, e.g., mashup a <house> component with a <GPS> component.
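As a rough illustration of such a mashup, the Python sketch below copies one <house> out of the flat document and bundles it with a <GPS> component. The <listing> wrapper, the mashup helper, and the child names and zero values inside <GPS> are all invented placeholders for this sketch, not part of any real format.

```python
import copy
import xml.etree.ElementTree as ET

FLAT = """
<Iowa>
  <house>
    <street>1009 Arlington Court</street>
    <city>Davenport</city>
    <style>Ranch</style>
    <porch>open</porch>
    <year-built>1951</year-built>
    <square-feet>1700</square-feet>
  </house>
</Iowa>
"""

def mashup(house, extra, wrapper_tag="listing"):
    """Bundle a copy of a standalone <house> with another component."""
    bundle = ET.Element(wrapper_tag)
    bundle.append(copy.deepcopy(house))  # the house carries all its own data
    bundle.append(extra)
    return bundle

house = ET.fromstring(FLAT).find("house")

# Placeholder <GPS> component: the child names and values are invented here.
gps = ET.Element("GPS")
ET.SubElement(gps, "latitude").text = "0.0"
ET.SubElement(gps, "longitude").text = "0.0"

listing = mashup(house, gps)
print(ET.tostring(listing, encoding="unicode"))
```

Because the <house> is standalone, the copy needs nothing from its surroundings: no ancestor lookup and no rewriting of its children.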

 

Here is a fat design:

 

<Iowa>
    <city name="Davenport">
        <street name="Arlington Court">
            <house>
                <street-number>1009</street-number>
                <style>Ranch</style>
                <porch>open</porch>
                <year-built>1951</year-built>
                <square-feet>1700</square-feet>
            </house>
            <house>
                <street-number>1008</street-number>
                <style>Ranch</style>
                <porch>closed</porch>
                <year-built>1955</year-built>
                <square-feet>1850</square-feet>
            </house>
        </street>
        ...
    </city>
    <city name="Cedar Rapids"> ... </city>
    ...
</Iowa>

 

The flat design and the fat design are radically different!

 

In the fat design the houses have been grouped into streets and the streets have been grouped into cities. The street name data has been removed from each <house> and also the city name data has been removed from each <house>. Consequently, each <house> is no longer a standalone component. House data is now fragmented, scattered over the document. The ability to do mashups has been lost (or, at least, greatly hampered). The fat design has normalized the data and, as I argued in my last message: Normalization is horrible for data exchange formats.
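To see what that fragmentation costs a consumer, here is a small Python sketch that rebuilds standalone <house> components from the fat design. The function name standalone_houses is hypothetical, and the sketch assumes that a full street value is simply the street number followed by the street name.

```python
import xml.etree.ElementTree as ET

FAT = """
<Iowa>
  <city name="Davenport">
    <street name="Arlington Court">
      <house>
        <street-number>1009</street-number>
        <style>Ranch</style>
        <porch>open</porch>
        <year-built>1951</year-built>
        <square-feet>1700</square-feet>
      </house>
    </street>
  </city>
</Iowa>
"""

def standalone_houses(fat_root):
    """Rebuild self-contained <house> elements by pulling data from ancestors."""
    out = []
    for city in fat_root.findall("city"):
        for street in city.findall("street"):
            for house in street.findall("house"):
                flat = ET.Element("house")
                number = house.findtext("street-number")
                # The street and city names live on the ancestors, not the house.
                ET.SubElement(flat, "street").text = f"{number} {street.get('name')}"
                ET.SubElement(flat, "city").text = city.get("name")
                for child in house:
                    if child.tag != "street-number":
                        ET.SubElement(flat, child.tag).text = child.text
                out.append(flat)
    return out

houses = standalone_houses(ET.fromstring(FAT))
print(ET.tostring(houses[0], encoding="unicode"))
```

Every house needs data from two ancestor levels before it can be cut out of the document, which is the extra work the flat design avoids.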

 

It’s best to exchange the data in the flat design. Consumers can transform it into the fat design, if needed.
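A consumer-side transformation could look roughly like the Python sketch below. It is a sketch under assumptions, not a definitive implementation: the function name flat_to_fat is hypothetical, and it assumes every <street> value has the form "number name", split at the first space.

```python
import xml.etree.ElementTree as ET

FLAT = """
<Iowa>
  <house>
    <street>1009 Arlington Court</street>
    <city>Davenport</city>
    <style>Ranch</style>
    <porch>open</porch>
    <year-built>1951</year-built>
    <square-feet>1700</square-feet>
  </house>
  <house>
    <street>1008 Arlington Court</street>
    <city>Davenport</city>
    <style>Ranch</style>
    <porch>closed</porch>
    <year-built>1955</year-built>
    <square-feet>1850</square-feet>
  </house>
</Iowa>
"""

def flat_to_fat(flat_root):
    """Group standalone houses into <city>/<street> containers."""
    fat = ET.Element(flat_root.tag)
    cities, streets = {}, {}
    for house in flat_root.findall("house"):
        # Assumption: the street value is "<number> <street name>".
        number, _, street_name = house.findtext("street").partition(" ")
        city_name = house.findtext("city")
        if city_name not in cities:
            cities[city_name] = ET.SubElement(fat, "city", name=city_name)
        key = (city_name, street_name)
        if key not in streets:
            streets[key] = ET.SubElement(cities[city_name], "street", name=street_name)
        new_house = ET.SubElement(streets[key], "house")
        ET.SubElement(new_house, "street-number").text = number
        for child in house:
            if child.tag not in ("street", "city"):
                ET.SubElement(new_house, child.tag).text = child.text
    return fat

fat = flat_to_fat(ET.fromstring(FLAT))
print(ET.tostring(fat, encoding="unicode"))
```

The grouping is a single pass with two lookup tables, so a consumer who wants the fat form pays a small, local cost, while consumers who do not want it pay nothing.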

 

Recommendation: When designing a data exchange format create a flat design.

 

Comments?

 

/Roger


