
Re: Packaging (was Re: Interoperability)

  • To: xml-dev@l...
  • Subject: Re: Packaging (was Re: Interoperability)
  • From: adam souzis <adam@k...>
  • Date: Mon, 31 Dec 2001 11:03:08 -0800

xpf format
I'm reading this thread about packaging over a month late, but I'd like to
throw out this idea on packaging. While zip/jar/xar is certainly a proven
and well-understood technique, it would be nice to have a text-based,
human-readable/editable standard that is simple enough to be implemented as
an extension to an XML processor with negligible impact on its code, memory,
and processing footprint.

Other than suggestions to use MIME packaging, I'm not aware of any
discussion of a format that meets the above requirements, so it might be
useful to share some thoughts on what one could look like. The basic idea
is to wrap file-level content in a stripped-down XML-like syntax that has
just one type of "element", which contains only CDATA (well, not even
CDATA, just bytes). Metadata is associated with the content via this
element's attributes, whose semantics may match those of the equivalent
MIME or HTTP headers. Here's an example:

<?xpf http://www.somestandard.org/xpf/1/0 ?>
<file:boundary content-location="file1.xml" content-type='text/xml' 
meta:another-attribute='foo'>
     <?xml version="1.0" ?>
     <?xml-stylesheet href="styles/style.css" type="text/css"?>
     <doc>
     sample doc
     </doc>
</file:boundary>
<file content-location="styles/style.css" content-type='text/css' 
content-length='35' meta:another-attribute='foo'>
BODY { BACKGROUND-IMAGE: url(image/background.bmp) }
</file>
<file content-location='image/background.bmp' content-encoding='gzip'>
[binary data, I wonder how this will show in email]
§…Ú yç©©§õ¾“ à˜€O­ÈSöŠà4&±Nègƒ%bэäÿ8Rg ÈÅ4“@­OKÑx„dÝΗ 
L–Y»?a¹á2äÆwÖX•€¿"©ŸìˆÕç`ÛA'ºr¸âÊבi'|!”Ü=c$õ0r€¢•W£É»Ï‚Ö®ÞX\õ.íÕçõú 
0wJÆ ø3	–w.õқ)?Œ~§l±e)ô­6lÎƌ	?IFu•…@WÄ

The first line contains the header that identifies what kind of content 
this bag of bits is. The "xpf" stands for extensible packaging format (any 
better ideas for the name?), while the URI that follows specifies the 
particular version of this format.

Next are the file elements. This example has three <file> elements that
correspond to three interrelated files, each showing a different way to
define the element's boundaries. If the element has an (arbitrary) string
appended to its name (e.g. file:boundary), the element data ends when the
matching end tag is encountered (in this case "</file:boundary>"). If no
boundary string is specified in the element name, the end of the element
data is found by skipping ahead by the number of bytes specified in the
content-length attribute. If neither is specified, the rest of the package
file is assumed to be this element's data (as shown by the last element).
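The three boundary rules can be sketched as a small byte-level parser. This is a minimal sketch under assumptions of my own, not part of any spec: the package starts with the <?xpf ...?> header, a single line break right after a start tag is not part of the data, attribute values are quoted with ' or ", and a content-length part may be followed by an optional </file> end tag. The names parse_xpf and Part are hypothetical.

```python
import re
from dataclasses import dataclass

# Start tag: "<file", an optional ":boundary" suffix, attributes, then ">".
START_TAG = re.compile(
    rb"<file(?::([\w.-]+))?((?:\s+[\w:.-]+\s*=\s*(?:\"[^\"]*\"|'[^']*'))*)\s*>"
)
ATTR = re.compile(rb"([\w:.-]+)\s*=\s*(?:\"([^\"]*)\"|'([^']*)')")
PLAIN_END = re.compile(rb"\s*</file>")  # optional end tag after a sized part


@dataclass
class Part:
    attrs: dict[str, str]
    data: bytes


def _skip_newline(buf: bytes, i: int) -> int:
    # Assumption: one line break right after the start tag is not data.
    if buf.startswith(b"\r\n", i):
        return i + 2
    if buf.startswith(b"\n", i):
        return i + 1
    return i


def parse_xpf(buf: bytes) -> list[Part]:
    if not buf.startswith(b"<?xpf"):
        raise ValueError("missing <?xpf ...?> header")
    pos = buf.index(b"?>") + 2  # skip the header processing instruction
    parts = []
    while (m := START_TAG.search(buf, pos)) is not None:
        boundary = m.group(1)
        attrs = {k.decode(): (v1 or v2).decode()
                 for k, v1, v2 in ATTR.findall(m.group(2))}
        start = _skip_newline(buf, m.end())
        if boundary is not None:
            # Rule 1: data runs until the matching </file:boundary> tag.
            end_tag = b"</file:" + boundary + b">"
            end = buf.index(end_tag, start)
            parts.append(Part(attrs, buf[start:end]))
            pos = end + len(end_tag)
        elif "content-length" in attrs:
            # Rule 2: take exactly content-length bytes, never scanning them.
            end = start + int(attrs["content-length"])
            parts.append(Part(attrs, buf[start:end]))
            tail = PLAIN_END.match(buf, end)
            pos = tail.end() if tail else end
        else:
            # Rule 3: everything left in the package belongs to this part.
            parts.append(Part(attrs, buf[start:]))
            break
    return parts
```

Note that only rule 1 has to search the data for a terminator; rules 2 and 3 never look inside the content, which is what lets arbitrary binary bytes appear in those parts.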

The rest of the example should be fairly obvious: metadata about each file
appears as attributes of the file element, and an appropriate set of MIME
headers are spelled as attributes and retain their meanings as specified in
the various RFCs. The files in the package can be referenced through the
content-location MIME header, as specified in the MHTML RFC.
This specification would be orthogonal to any manifest or catalog (such as
XPackage); those would just be stored as another file in the package. But
there should be an attribute (e.g. xpf:manifest) to indicate that a file
can be treated as the manifest for this package.
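Reference resolution through content-location can be sketched as follows, assuming the package has already been split into (attributes, data) pairs. Python's urllib.parse.urljoin implements standard relative-URI resolution, used here as an approximation of MHTML (RFC 2557) behaviour. The "xpf:manifest" attribute is only the placeholder name suggested in the text, and treating the value "true" as the flag is an assumption of this sketch.

```python
from urllib.parse import urljoin


def build_location_map(parts):
    """Index (attrs, data) pairs by their content-location attribute."""
    return {attrs["content-location"]: (attrs, data)
            for attrs, data in parts if "content-location" in attrs}


def resolve(location_map, base, ref):
    """Resolve `ref`, found inside the part located at `base`, to a part."""
    return location_map.get(urljoin(base, ref))


def find_manifest(parts):
    """Return the first part flagged with the hypothetical xpf:manifest."""
    for attrs, data in parts:
        if attrs.get("xpf:manifest") == "true":
            return attrs, data
    return None
```

Under standard URI resolution a relative reference is resolved against the location of the file that contains it. So resolve(m, "file1.xml", "styles/style.css") finds the stylesheet, but url(image/background.bmp) inside styles/style.css resolves to styles/image/background.bmp. Reaching the package's image/background.bmp from the stylesheet takes ../image/background.bmp.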

This format can be easily streamed, and it provides random access as long
as the content-length attribute is specified for each file. If the
processor supports gzip content-encoding, it offers compression comparable
to zip. One limitation of the format as currently described is the lack of
an index of the files in the package. Keeping the manifest orthogonal makes
validation of file references and efficient random access difficult, so it
would make sense to define optional index elements that contain the minimum
information about each file (content-location and content-length).
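Skipping over file data with content-length, together with gzip content-encoding, can be sketched as a minimal writer and reader. This sketch assumes every part carries content-length (counting the stored, i.e. compressed, bytes), attribute values are single-quoted and contain no apostrophes, and each start tag is followed by exactly one newline. The names write_part and find_part are hypothetical. The reader still walks the start tags in order, so this is skip-ahead access rather than a true index.

```python
import gzip
import re
from typing import Optional

# Assumed start-tag shape for this sketch: single-quoted attributes, then ">\n".
TAG = re.compile(rb"<file((?:\s+[\w:.-]+\s*=\s*'[^']*')*)\s*>\n")
ATTR = re.compile(rb"([\w:.-]+)\s*=\s*'([^']*)'")


def write_part(location: str, data: bytes, content_type: Optional[str] = None,
               compress: bool = False) -> bytes:
    """Serialize one part, always emitting content-length."""
    attrs = {"content-location": location}
    if content_type:
        attrs["content-type"] = content_type
    if compress:
        data = gzip.compress(data)
        attrs["content-encoding"] = "gzip"
    # content-length counts the bytes actually stored (after compression).
    attrs["content-length"] = str(len(data))
    attr_text = "".join(f" {k}='{v}'" for k, v in attrs.items())
    return f"<file{attr_text}>\n".encode("utf-8") + data + b"\n</file>\n"


def find_part(package: bytes, location: str) -> Optional[bytes]:
    """Return the decoded data for `location`; other bodies are never scanned."""
    pos = package.index(b"?>") + 2  # end of the <?xpf ...?> header
    while (m := TAG.search(package, pos)) is not None:
        attrs = {k.decode(): v.decode() for k, v in ATTR.findall(m.group(1))}
        start = m.end()
        length = int(attrs["content-length"])
        if attrs.get("content-location") == location:
            data = package[start:start + length]
            if attrs.get("content-encoding") == "gzip":
                data = gzip.decompress(data)
            return data
        pos = start + length  # jump straight past this part's body
    return None


header = b"<?xpf http://www.somestandard.org/xpf/1/0 ?>\n"
package = (header
           + write_part("styles/style.css", b"BODY { color: black }", "text/css")
           + write_part("image/background.bmp", bytes(1000), compress=True))
print(find_part(package, "styles/style.css"))  # b'BODY { color: black }'
```

Because the body bytes are skipped rather than searched, compressed binary data can never be mistaken for markup. An index element carrying content-location and content-length would let a reader plan its jumps before touching any start tag.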

Well, that's a brief description; I could go into more detail about its
syntax, encoding issues, etc., but this email is long enough. Actually,
this evolved from an idea for a standard header format for embedding
metadata, which occurred to me while I was trying to piece together data
files from a scrambled hard drive. If anyone's interested, I could describe
that related idea.

-- adam

