re: Representing Large Tabular Data Blocks
> I am converting a number of existing proprietary file formats
> to a general XML format. The nature of the data I am working
> with is very large, very well formatted blocks of data.

Is it formatted well enough that you can write a scrap of code to retrieve element N at random?

> The natural XML solution would be of course to embed each
> data value within an element or element attribute. Such as:
> <Point XYZ="324.1241 121.1214 -12.4521" NORMAL="0.0 0.0 1.0"/>

Actually, I'd see that more intuitively as:

<point x="324.1241" y="121.1214" z="-12.4521" nx="0.0" ny="0.0" nz="1.0"/>

More granularity, y'know?

> However, when you replicate this point element a hundred
> thousand times or so, you get an enormous increase in file
> size. Thus raising the question of XML efficiency.

If I remember right, the XML spec's own design goals say outright that "terseness in XML markup is of minimal importance"; its selling points are clarity and interchangeability.

One thing I could suggest: if your NORMAL value is the default it appears to be, you could declare it as the attribute's default value in the DTD (in the ATTLIST declaration) and leave it out of most elements. I'd really have to see a block of elements to draw a solid conclusion there, though.

> Another possible solution is to use a single element to bound
> all points, using some sort of delimiters to separate records,
> such as:
> <Point TEMPLATE="XYZ,NORMAL,DT" DELIM="|">
> 324.1241 121.1241 -12.4521, 0.0 0.0 1.0, 0.707 0.707 0.0|

First up, defining your own delimiter character runs against the grain of XML: elements are supposed to be containers and/or atoms that delimit themselves. That is the whole point of the tag structure; a tag is structural, so it delimits and/or contains the data. A delimited string inside one element is opaque to an XML parser, so every consumer would have to split it apart again by hand.

> Is there a good way of representing bulk data embedded in
> an XML file, without relying on external compression for
> efficiency?

Not that I'm aware of; XML was not designed for brevity.

> Is the concept of using structured element contents
> a viable method in this case?

You mean like your second example?
No, I wouldn't say so.

One thing I do note is that you seem to have two or three sub-elements with identical structure: each is a triplet of X, Y, and Z values. Hence, it may make more lexical sense to use one element type for that triplet, label each triplet with what it represents, and group them inside an outer element. For instance:

<point>
  <coords x="324.1241" y="121.1241" z="-12.4521" type="xyz"/>
  <coords x="0.0" y="0.0" z="1.0" type="normal"/>
  <coords x="0.707" y="0.707" z="0.0" type="dt"/>
</point>

Another thing to consider: if the original data file can be passed around to other systems, and you can write a platform-independent scrap of code (the one I mentioned earlier) to extract and parse a given element from it, that code could serve as the low-level interface between the data and a virtual XML document. Instead of reading an actual static document, the agent asks the interface for element N in XML format. The interface scans the data file for the Nth line, reads it, converts it internally into a form like one of those above, and hands that back to the agent as the response. Add a cache of recently requested elements, and this could work pretty well.

Since you'd still be using the original data format at the core (or an optimized conversion of it), you shouldn't see any growth in footprint beyond the interface and the agent module themselves.

If you're planning a one-time bulk conversion from the original format anyway, let the interface handle the XML, and use the conversion to produce a data format that the interface can read more easily (and quickly).

At that point, you can either store the core file and the interface in a central location (with the interface acting as the ultimate administrator) or distribute copies and find a way to reconcile and distribute changes regularly. I'd advise the former.
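A minimal Python sketch of that interface, under stated assumptions: the source format here is a hypothetical line-oriented one (one point per line, three whitespace-separated triplets split by `|`), and `record_to_xml` and `VirtualPointDocument` are illustrative names, not an existing API. It scans for line N, builds the nested `<point>`/`<coords>` element shown above, and uses `functools.lru_cache` as the cache:

```python
import functools
import itertools
import os
import xml.etree.ElementTree as ET

# ASSUMPTION: hypothetical source format, one point per line, with three
# whitespace-separated triplets split by "|":
#   324.1241 121.1241 -12.4521 | 0.0 0.0 1.0 | 0.707 0.707 0.0
TRIPLET_TYPES = ("xyz", "normal", "dt")


def record_to_xml(line):
    """Convert one raw record into the nested <point>/<coords> form (hypothetical helper)."""
    triplets = line.strip().split("|")
    if len(triplets) != len(TRIPLET_TYPES):
        raise ValueError(f"malformed record: {line!r}")
    point = ET.Element("point")
    for kind, triplet in zip(TRIPLET_TYPES, triplets):
        x, y, z = triplet.split()  # raises ValueError unless exactly three values
        ET.SubElement(point, "coords", x=x, y=y, z=z, type=kind)
    return ET.tostring(point, encoding="unicode")


class VirtualPointDocument:
    """Hypothetical interface: serves element N as XML; no XML file is ever stored."""

    def __init__(self, path, cache_size=1024):
        self.path = path
        # Keep recently requested elements so repeat requests skip the file scan.
        self.element = functools.lru_cache(maxsize=cache_size)(self._element)

    def _element(self, n):
        # Skip to line n (0-based) of the raw data file and stop there.
        with open(self.path, encoding="ascii") as f:
            line = next(itertools.islice(f, n, None), None)
        if line is None:
            raise IndexError(f"no element {n}")
        return record_to_xml(line)


if __name__ == "__main__":
    import tempfile

    # Throwaway two-record data file for the demo.
    with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
        tmp.write("324.1241 121.1241 -12.4521 | 0.0 0.0 1.0 | 0.707 0.707 0.0\n")
        tmp.write("1.0 2.0 3.0 | 0.0 0.0 1.0 | 0.0 1.0 0.0\n")
    doc = VirtualPointDocument(tmp.name)
    print(doc.element(1))  # first request: scans the file
    print(doc.element(1))  # second request: served from the cache
    os.unlink(tmp.name)
```

The agent only ever sees the XML string for one element. If the raw records were fixed-width, the interface could seek straight to record N instead of scanning line by line, which is exactly the "element N at random" property worth checking in the original data.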
In other words, you'd just be moving a key bit of the logic. Instead of reading the full XML file and extracting an element, you move to the next higher level of abstraction (sorry, but I just had to work that in) and have the extraction call ask the interface for that element instead of doing the grunt work of pulling the element out itself. Since the interface delivers one element at a time, it never needs to keep the data around as full-fledged XML; it only needs to deliver each element in that format, and the extraction call need never know that there is NOT a full XML document. Odds are, you'd save not only disk space but also processing time and disk wear.

Rev. Robert L. Hood | http://rev-bob.gotc.com/
Get Off The Cross! | http://www.gotc.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1