XML-DEV Mailing List Archive

RE: An alternative formulation of the document-centric/data-centric XML divide

  • To: Sean McGrath <sean.mcgrath@p...>, xml-dev@l...
  • Subject: RE: An alternative formulation of the document-centric/data-centric XML divide
  • From: "Kirkham, Pete (UK)" <pete.kirkham@b...>
  • Date: Thu, 03 Jun 2004 12:57:52 +0100
  • Thread-index: AcRJUgqJxM+eUX7IRjmqLbd1hm/f7AABXPBQ
  • Thread-topic: An alternative formulation of the document-centric/data-centric XML divide


I would expect most data-oriented XML to follow an approximate power law too, because there are more nested elements under each element. In the simplest case the counts per level are db: 1, table: t, row: t * (rows per table), field: rows * f (f = fields per table). Typically t and f are small (say 7) but the number of rows is large, so for ~n elements you get root: 1, level 1: ~7, level 2: ~n/7, level 3: n. That is a perfect power law for n ~= 350, since with n = 343 the levels hold 1, 7, 49 and 343 elements, each level 7 times larger than the one above. If the tag names for fields differ, the frequency of each tag is spread more evenly between the tags on the same level, but the cumulative distribution should still be approximately a power law between levels.
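The level arithmetic for the simple database case can be checked in a few lines of Python. This is a minimal sketch, assuming a single table level and the same number of rows in every table; the function name `level_counts` and its parameters are hypothetical, chosen to mirror t, rows per table and f above.

```python
# Sketch of the per-level element counts in the simple db/table/row/field
# model. Names are hypothetical; t, rows_per_table and f mirror the text.

def level_counts(t: int, rows_per_table: int, f: int) -> list[int]:
    """Return element counts for levels 0..3: db, table, row, field."""
    db = 1
    tables = db * t                  # level 1: t tables under the root
    rows = tables * rows_per_table    # level 2: rows across all tables
    fields = rows * f                 # level 3: f field elements per row
    return [db, tables, rows, fields]


if __name__ == "__main__":
    counts = level_counts(t=7, rows_per_table=7, f=7)
    print(counts)  # [1, 7, 49, 343]
    # A constant ratio between consecutive levels means the counts grow
    # geometrically with depth.
    print([b / a for a, b in zip(counts, counts[1:])])  # [7.0, 7.0, 7.0]
```

With many more rows per table (say `rows_per_table=1000`) the row and field levels dominate and the ratio between levels is no longer constant, which is why the exact fit only holds near n ~= 350.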

Similar distributions can be seen for more structured data: each element that is a container has a few properties, and each property that is multivalued has many child elements. Mixed text abbreviates away some of the multivalued property tags, but the 1:many container-to-contained relationship should still hold.

A quick count on two data-oriented XML files holding the same data set (MTBF, mean time between failures, for a fuel system), one derived straight from a database and the other from an XMI instance of the same data (which uses attributes instead of nested elements for some properties), gives the following per-tag frequencies, in ascending order, which look similar:

XMI data (single-valued properties as attributes)
1,1,1,1,1,4,4,5,5,8,10,14,16,18,18,19,21,22,24,29,51,57,59,59,64,64,83,94,95,123,125,148,190,210,242,277,447,

XML data (properties as child elements, elements named after properties)
1,1,1,1,1,1,1,1,1,1,2,2,4,4,4,4,5,5,5,6,7,8,9,10,14,16,18,18,19,21,22,24,24,31,51,57,59,64,64,64,64,83,83,94,95,95,116,116,116,116,120,123,123,125,134,147,210,211,238,242,277,385,447,646,

And for a different data set (XMI model for an application):
1,1,1,1,1,1,1,1,11,14,14,14,30,30,30,43,74,74,101,117,132,148,148,169,169,447,447,595,595,595,652,680,
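Frequency lists like these can be produced with Python's standard `xml.etree.ElementTree` and `collections.Counter`. The sketch below is an assumption about how such a count might be done, not the method used for the numbers above: it counts element names only (attributes are ignored, so properties stored as XMI attributes do not appear), sorts the counts ascending, and fits a least-squares line to log(frequency) against log(rank) as a rough power-law check. The helper names `tag_frequencies` and `loglog_slope` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def tag_frequencies(root: ET.Element) -> list[int]:
    """Count occurrences of each element name in the tree rooted at root
    (root included) and return the counts in ascending order.
    Namespaced names appear as '{uri}local', so XMI tags in different
    namespaces are counted separately. Attributes are not counted."""
    counts = Counter(elem.tag for elem in root.iter())
    return sorted(counts.values())


def loglog_slope(freqs: list[int]) -> float:
    """Least-squares slope of log(frequency) against log(rank), with the
    most frequent tag at rank 1. A roughly straight line with a negative
    slope is the usual informal sign of a power law. Needs >= 2 values."""
    ranked = sorted(freqs, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Frequencies for the XMI model of an application, copied from above.
    xmi_model = [1, 1, 1, 1, 1, 1, 1, 1, 11, 14, 14, 14, 30, 30, 30, 43,
                 74, 74, 101, 117, 132, 148, 148, 169, 169, 447, 447,
                 595, 595, 595, 652, 680]
    print(f"log-log slope: {loglog_slope(xmi_model):.2f}")
```

To count a real file, call `tag_frequencies(ET.parse(path).getroot())` with a path of your own; the slope is only a quick visual-style check, not a statistical test for a power law.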


Pete

