Re: Icebergs - XML file metrics
Stefan, Charles, Ronald,

Thanks to you all for your responses. I am looking through the material, though I am a little surprised that no-one has turned up a set of larger test files. Perhaps that's a challenge for us to create one!

Thanks,
Robin

At 10:59 AM +0200 3/26/01, Stefan Zier wrote:
>http://www-106.ibm.com/developerworks/education/xmljava/xmljava-6-4.html
>
>I think this tool from IBM developerworks is a good basis to start from. It
>collects stats about Document Nodes, Element Nodes, Entity Reference Nodes,
>CDATA Sections, Text Nodes, Processing Instructions in a DOM tree.
>
>---------------------------------------
>Stefan Zier
>Software Developer
>Syntion AG - http://www.syntion.com
>Leonrodplatz 2 - 80636 Munich/Germany
>Phone +49 89 52 30 45-0
>Fax +49 89 52 30 45-20
>
>----- Original Message -----
>From: Robin LaFontaine <robin@m...>
>To: <xml-dev@l...>
>Sent: Friday, March 23, 2001 6:41 PM
>Subject: Icebergs - XML file metrics
>
> > Can anyone help with this: Is there a way of 'profiling' an XML file
> > to indicate its characteristics?
> >
> > We test our XML comparators on large files, but a 5Mb XML file could
> > have twenty XML tags or 20,000 and it could be deeply nested or flat.
> > So, are there any metrics to help in this characterization?
> >
> > Seems sensible to use ratios as far as possible, so that they are
> > comparable for different file sizes, perhaps:
> >
> > 1. File size (not a ratio)
> >
> > 2. No. of elements / file size in kb = no. of elements/kb (or Mb perhaps?)
> >
> > 3. No. of attributes / no. of elements = no. of attributes/element
> >
> > 4. No. of text nodes / no. of elements = no. of text nodes/element
> >
> > 5. No. of text nodes / no. of unique text nodes = text re-use index
> >
> > 6. No. of attribute values / no. of unique attr. values = attribute
> > value re-use index
> >
> > 7. (sum for each element of no. of ancestors for the element) / no.
> > of elements = Average depth (iceberg factor).
> >
> > Last one indicates nesting depth, e.g.
> >
> > <a> <b/><b/><b/><b/> </a>
> > = (0+1+1+1+1)/5 = 4/5 = 0.8
> >
> > <a> <b><b><b><b/></b></b></b> </a>
> > = (0+1+2+3+4)/5 = 10/5 = 2
> >
> > <a> <b><b><b><b><b><b><b><b/></b></b></b></b></b></b></b> </a>
> > = (0+1+2+3+4+5+6+7+8)/9 = 36/9 = 4
> >
> > Perhaps someone has already developed a different set of metrics.
> >
> > Robin
> > --
> > -----------------------------------------------------------------
> > Robin La Fontaine, Monsell EDM Ltd
> > (XML file comparison, Engineering data exchange and management using
> > XML, R&D Project Management)
> > Tel: +44 1684 592 144 Fax: +44 1684 594 504
> > Email: robin@m... http://www.deltaxml.com
> >
> > ------------------------------------------------------------------
> > The xml-dev list is sponsored by XML.org, an initiative of OASIS
> > <http://www.oasis-open.org>
> >
> > The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> > To unsubscribe from this elist send a message with the single word
> > "unsubscribe" in the body to: xml-dev-request@l...
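The ratios proposed in the quoted message (items 2 to 7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code that was posted to the list: the function name `profile_xml` and the dictionary keys are made up for this sketch, it uses the standard-library `xml.etree.ElementTree` parser rather than a DOM, and it ignores whitespace-only text. ElementTree has no text-node objects, so each non-blank `.text` or `.tail` string is counted as one text node, which may differ slightly from a DOM count (for example around CDATA sections or entity references).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def profile_xml(xml_text: str) -> dict:
    """Compute the size-independent ratios listed in the message (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    size_kb = len(xml_text.encode("utf-8")) / 1024

    elements = 0
    depth_sum = 0            # sum over all elements of their ancestor count
    texts = Counter()        # assumption: whitespace-only text is ignored
    attr_values = Counter()

    # Iterative walk so that very deep documents do not hit the recursion limit.
    stack = [(root, 0)]
    while stack:
        elem, depth = stack.pop()
        elements += 1
        depth_sum += depth
        attr_values.update(elem.attrib.values())
        # Text before the first child lives in .text; text that follows
        # a child lives in that child's .tail.
        for chunk in [elem.text] + [child.tail for child in elem]:
            if chunk and chunk.strip():
                texts[chunk.strip()] += 1
        stack.extend((child, depth + 1) for child in elem)

    n_text = sum(texts.values())
    n_attr = sum(attr_values.values())
    return {
        "size_kb": size_kb,
        "elements_per_kb": elements / size_kb,
        "attributes_per_element": n_attr / elements,
        "text_nodes_per_element": n_text / elements,
        "text_reuse_index": n_text / len(texts) if texts else 0.0,
        "attr_value_reuse_index": n_attr / len(attr_values) if attr_values else 0.0,
        "average_depth": depth_sum / elements,   # the "iceberg factor"
    }
```

Calling `profile_xml("<a><b/><b/><b/><b/></a>")["average_depth"]` gives 0.8, and the two nested examples (one `<a>` wrapping four or eight nested `<b>` elements) give 2 and 4, matching the hand calculations. Because the sketch parses the whole document into memory, a streaming parser would likely be a better fit for the multi-megabyte files the thread is concerned with.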