Re: use cases: binary XML for scientific computing
This is interesting and relevant to the discussion of binary payloads of scalar data.

In my opinion, the first level for "binary XML" (i.e., a new, more efficient XML-like format) is to improve the efficiency of the structure. The second level can involve exactly the ideas apparent below in the description of DFDL. It is debatable whether a format spec should include the definition of binary data, standard types, and built-in type notation in a self-contained way, but if these are not in the spec for "binary XML", they would be layered on top, just as the schema specs are layered now.

In my view, a format should be able to contain all standard scalar formats and have an efficient MIME-like way to note the types, but any schema language should be a separate specification. A format could support text, labeled scalar, and opaque binary formatted data, with the choice and placement of metadata controlled by the application. The DFDL work could support both labeled self-described and opaque methods, with the former supporting self-contained instances and the latter normally being more efficient because the metadata is out of band.

sdw

mike.beckerle@a... wrote:

> I do believe that GGF DFDL is relevant to the discussion here.
> https://forge.gridforum.org/projects/dfdl-wg/ is the site, and
> https://forge.gridforum.org/docman2/ViewProperties.php?group_id=113&category_id=803&document_content_id=2973
> (or http://tinyurl.com/435j7 in case email clobbered the long URL) is the
> most recent presentation. Around slide 7 is where you'll find content.
>
> Here's a snippet to give you the "DFDL" idea:
>
> E.g., a description of a million element array of little endian double
> floats would be this XSD:
>
> <sequence>
>   <element name="data" type="double" minOccurs="1000000"
>            maxOccurs="1000000">
>     <annotation><appinfo source="http://dfdl.org">
>       <representation repType="binary" byteOrder="littleEndian"/>
>     </appinfo></annotation>
>   </element>
> </sequence>
>
> Several people and companies have been exploring this notion, so we're
> trying to standardize it.
>
> I feel DFDL differs from binaryXML in being descriptive of format
> rather than prescriptive of format. This matters why?
>
> 1) Legacy data formats - Much of the complexity of DFDL comes from the
> need to handle quite complex legacy formats which are tricky to describe.
>
> 2) New data formats, but you need random-access I/O capabilities or
> the ability to memory map the files into some exact memory layout with
> all the alignments and inter-item offsets exactly specified.
>
> 3) you don't want to bother to have to use any particular XML-oriented
> library to write out your data. So long as the data format is
> describable in DFDL you can do your I/O with ordinary I/O operations.
> In other words minimal investment has to be made up front in worrying
> about data format and data interchange issues. DFDL lets you "just get
> on with it".
>
> If none of these 3 apply, then either XML or binaryXML *should* be the
> right thing depending on your data size and performance needs. If you
> are just after efficiency and density then DFDL may be less effective
> for you since a DFDL-described data file isn't necessarily nicely
> self-contained like an XML or binaryXML file should be. (Though we do
> have a placeholder on the issue of how to associate DFDL descriptors
> tightly to binary data so they can't get separated.)
>
> ...mikeb
> Mike Beckerle
> co-chair DFDL WG, GGF
>
> ------------------------------------------------------------------------
> *From:* Cutler, Roger (RogerCutler) [mailto:RogerCutler@c...]
> *Sent:* Monday, November 22, 2004 5:54 PM
> *To:* Aleksander Slominski; Stephen D. Williams
> *Cc:* Wolfgang Hoschek; xml-dev@l...; public-xml-binary@w...;
> Kenneth Chiu; Madhusudhan Govindaraju
> *Subject:* RE: use cases: binary XML for scientific computing
>
> If you are going to be looking at how this stuff fits in with grid
> computing, perhaps it would be worthwhile also to make some
> comments about DFDL? I posted this suggestion previously (11/1)
> and nobody seems to have picked up on it, so maybe the thought is
> not appropriate for some reason, but at first glance DFDL does
> seem related to me.
>
> -----Original Message-----
> *From:* public-xml-binary-request@w...
> [mailto:public-xml-binary-request@w...] *On Behalf Of* Aleksander Slominski
> *Sent:* Monday, November 22, 2004 4:41 PM
> *To:* Stephen D. Williams
> *Cc:* Wolfgang Hoschek; xml-dev@l...; public-xml-binary@w...;
> Kenneth Chiu; Madhusudhan Govindaraju
> *Subject:* use cases: binary XML for scientific computing
>
> Stephen D. Williams wrote:
>
>>>
>>> what are use cases for nux: what do you plan to use it for?
>>>
>>> are use cases related to XML Binary Characterization
>>> <http://www.w3.org/TR/xbc-use-cases/>?
>>>
>>> i am a bit disappointed that scientific requirements are
>>> completely omitted from XBC use cases - the closest i could find
>>> is http://www.w3.org/TR/xbc-use-cases/#FPenergy but it skips
>>> over whole issue how to transfer array of doubles without
>>> changing endianness ...
>>
>> I have proposed to the group recently that I create one or more
>> use cases that cover supercomputing, grid processing, and sensor
>> networks.
>
> great to hear this.
> i think we worked in all those areas - it seems
> XML became very popular and now with convergence on Grid Web
> Services having efficient binary XML format that can be used
> between "optimized" peers seems to be very important ...
>
>> Your observation seems to validate that point. I would be happy
>> to incorporate anything you could provide. My company builds and
>> maintains Linux supercomputers and I have present and past
>> experience with grid-like processing, so I have some resources
>> and contacts.
>>
>>> we did lot of work in past related to XML performance (in
>>> Indiana University and Binghamton) and are very concerned that
>>> whatever binary XML will be characterized/standardized in W3C
>>> will be of not much use for scientific computing and grids ...
>>
>> Could you provide links or details to any of this work?
>
> we worked on SOAP parsing and optimization for scientific computing:
>
> Madhusudhan Govindaraju, Aleksander Slominski, Venkatash
> Choppella, Randall Bramley, and Dennis Gannon. Requirements for
> and evaluation of RMI protocols for scientific computing
> <http://www.extreme.indiana.edu/xgws/papers/sc00_paper/>. In
> Proceedings of SC00 Conference, Dallas TX, Nov 2000. Available on
> CD-ROM from IEEE.
>
> Kenneth Chiu, Madhusudhan Govindaraju, and Randall Bramley.
> Investigating the limits of SOAP performance for scientific
> computing
> <http://www.computer.org/proceedings/hpdc/1686/16860246abs.htm>.
> In The 11-th IEEE International Symposium on High Performance
> Distributed Computing HPDC-11 2002 (HPDC'02), Jul 2002.
>
> Madhusudhan Govindaraju, Aleksander Slominski, Kenneth Chiu,
> Pu Liu, Robert van Engelen, and Michael J. Lewis. Toward
> Characterizing the Performance of SOAP Toolkits
> <http://www.extreme.indiana.edu/xgws/papers/soap_perf_char_grid2004.pdf>.
> In 5th IEEE/ACM International Workshop on Grid Computing, November
> 2004.
>
> Kenneth Chiu and Wei Lu.
> A Compiler-Based Approach to
> Schema-Specific XML Parsing
> <http://wam.inrialpes.fr/www-workshop2004/ChiuLu.pdf>. In First
> International Workshop on High Performance XML Processing (Satellite
> of WWW2004), May 2004.
>
> Kenneth Chiu. XBS: A Streaming Binary Serializer for High
> Performance Computing. In Proceedings of the High Performance
> Computing Symposium 2004. Society for Computer Simulation
> International, 2004.
>
> however we never got enough forward momentum to come up with a
> proposal for binary XML but still we are willing to work to get
> use cases described.
>
>> How do you think that XML, especially a binary characterized XML,
>> should relate to HDF5?
>
> HDF5 looks to me like a separate problem as it defines its own
> schema for its own representation so that is a big task how to
> make HDF5 to XML Infoset.
>
> we are more interested in how to transfer scientific data (mostly
> arrays of primitive types or simple structs with primitive types
> that can be perfectly well expressed in XML Infoset but are also
> extremely inefficient including dreaded IEEE float conversion to
> string and back) and make it consistent with XML messaging (such
> as SOAP).
>
> thanks,
>
> alek
>
>--
>The best way to predict the future is to invent it - Alan Kay

--
swilliams@h... http://www.hpti.com Per: sdw@l... http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw
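
To make the array-of-doubles use case from this thread concrete, here is a minimal Python sketch of the two encodings being compared: a fixed little-endian binary layout (the kind of layout the DFDL snippet describes) and the text form an XML element would carry. It is only an illustration under stated assumptions, not part of DFDL or any binary XML proposal. The function names and sample values are hypothetical, and the text encoding is simply space-separated decimal strings.

```python
import struct


def pack_doubles_le(values):
    """Serialize floats as consecutive 8-byte IEEE 754 doubles, little-endian."""
    # "<" forces little-endian regardless of the host byte order.
    return struct.pack(f"<{len(values)}d", *values)


def unpack_doubles_le(data):
    """Inverse of pack_doubles_le: read back a list of Python floats."""
    count = len(data) // 8  # each double occupies exactly 8 bytes
    return list(struct.unpack(f"<{count}d", data))


def text_encode(values):
    """Hypothetical text form: space-separated decimal strings."""
    # repr() yields the shortest string that round-trips to the same double.
    return " ".join(repr(v) for v in values)


def text_decode(text):
    """Parse the space-separated text form back into floats."""
    return [float(token) for token in text.split()]


# Illustrative sample data only.
values = [0.1, 1.0 / 3.0, 6.02214076e23, -2.5e-300]

binary = pack_doubles_le(values)
text = text_encode(values)

print(len(binary), "bytes in binary form")
print(len(text), "characters in text form")
```

The binary form needs no parsing or formatting of decimal digits and has a fixed size per element, which is what makes random access and memory mapping practical. The text form round-trips correctly here, but only by paying for a float-to-string and string-to-float conversion on every element.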