
RE: binary base64 definition

  • From: Ian Graham <igraham@i...>
  • To: Danny Vint <dvint@s...>
  • Date: Sat, 06 Jan 2001 19:13:59 -0500 (EST)


On Sat, 6 Jan 2001, Danny Vint wrote:

> At 06:15 PM 1/6/2001 -0500, Jerry Johns wrote:
> >Thanks much for your input. You guys are a tremendous resource.
> >
> >Following the suggestion of describing exactly what I'm trying to do, here
> >it is:
> >
> >I'm trying to implement an interface in XML. The file has to contain some
> >dollar amounts, which is straightforward. The file also needs to contain
> >several "objects" that pertain to the dollar amounts. These objects include
> >an HTML web page and JPEG images. My trading partner will use the dollar
> >amounts for business logic and to store in database. The HTML and JPEG
> >objects will be loaded into an imaging system.
> >
> >I could easily go with the approach of putting the JPEG images as separate
> >files and FTP them along with the XML file. However, I was striving to keep
> >everything in a single file.
> 
> 
> There are some new messaging specs being worked on that would allow you to
> use something like multipart MIME to wrap all the pieces together - but
> this is real early work going on there.
> 
> This was also a problem in the SGML days, but it wasn't as big a deal
> because pretty much everything lived on the file system with the
> documents, and it was only when you were exchanging the information that
> you would want to create a single file.

One option would be to grab some code that already handles MIME
multipart/form-data -- then you could send the messages as an XML document
plus non-XML MIME 'attachments'. I imagine there is Apache server code
that does this for you. There is even a URI specification for
referencing the 'parts' of the message. Kinda messy though.
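
As a rough illustration of the MIME route, here is a minimal Python sketch
(Python is an arbitrary choice; nothing in this thread prescribes it) that
wraps an XML document and a JPEG into one multipart message with the standard
`email` package. It assumes multipart/related with Content-ID headers rather
than multipart/form-data, since that is the variant the cid: URL scheme
addresses. The element names, Content-ID values, function names, and fake
JPEG bytes are all made up for the example.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart

# Hypothetical payloads: the XML refers to the image part by a cid: URL.
XML_DOC = (
    b'<invoice><amount>1234.56</amount>'
    b'<image href="cid:image1@example.invalid"/></invoice>'
)
FAKE_JPEG = b"\xff\xd8\xff\xe0" + bytes(range(256))  # not a real image


def build_package(xml_bytes: bytes, jpeg_bytes: bytes) -> bytes:
    """Bundle the XML document and the image into one MIME message."""
    package = MIMEMultipart("related")

    xml_part = MIMEApplication(xml_bytes, _subtype="xml")
    xml_part.add_header("Content-ID", "<doc@example.invalid>")
    package.attach(xml_part)

    # Passing _subtype explicitly avoids any guessing of the image type.
    image_part = MIMEImage(jpeg_bytes, _subtype="jpeg")
    image_part.add_header("Content-ID", "<image1@example.invalid>")
    package.attach(image_part)

    return package.as_bytes()


def unpack_package(raw: bytes) -> dict:
    """Return a mapping of Content-ID (without angle brackets) to raw bytes."""
    message = message_from_bytes(raw)
    parts = {}
    for part in message.walk():
        content_id = part.get("Content-ID")
        if content_id:
            # decode=True undoes the transfer encoding applied on attach.
            parts[content_id.strip("<>")] = part.get_payload(decode=True)
    return parts


raw = build_package(XML_DOC, FAKE_JPEG)
parts = unpack_package(raw)
print(sorted(parts))
```

The receiver resolves the cid: reference in the XML by looking up the matching
Content-ID, which is the extra layer of bookkeeping that makes this approach
messier than keeping everything inside the XML.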

> >
> >Obviously, the JPEG file contains special characters when looked at on a
> >byte-by-byte basis.
> >
> >One method I'm considering is converting the JPEG file to base64, which
> >results in a string of text characters, as you know. However, this data
> >could also contain special characters, ie: greater-than symbol, less-than
> >symbol, etc. 
> 
> base64 has always been recommended for doing what you want and from what I
> understand it actually guarantees that you won't end up with any
> troublesome characters.

Indeed -- base64 encoding eliminates the less-than, greater-than, and
ampersand characters, leaving only safe ASCII.
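
A quick way to convince yourself of this: the base64 alphabet defined in
RFC 2045 is A-Z, a-z, 0-9, '+', '/', and '=' for padding, none of which is
special in XML content. The short Python sketch below (the function and
variable names are illustrative only) encodes every possible byte value plus
some random data and checks that the output stays within that alphabet.

```python
import base64
import os

# The 64 alphabet characters from RFC 2045, plus '=' used for padding.
BASE64_CHARS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/="
)
# Characters that need escaping somewhere in XML content or attributes.
XML_SPECIAL = set("<>&'\"")


def encode_for_xml(data: bytes) -> str:
    """Base64-encode binary data as an ASCII string (no line breaks)."""
    return base64.b64encode(data).decode("ascii")


# Every byte value once, followed by 1 KB of random bytes.
sample = bytes(range(256)) + os.urandom(1024)
encoded = encode_for_xml(sample)

uses_only_alphabet = set(encoded) <= BASE64_CHARS
has_no_xml_specials = BASE64_CHARS.isdisjoint(XML_SPECIAL)
round_trips = base64.b64decode(encoded) == sample

print(uses_only_alphabet, has_no_xml_specials, round_trips)
```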
 
> >
> >Assuming this was a good approach, I have two issues remaining: 1) how to
> >code and decode the base64 and 2) can I prevent the DOM API from parsing the
> >encoded JPEG data and converting greater-than and less-than symbols into
> >"&lt;" and "&gt;" text strings.

You'd also have to escape the ampersands ... I think base64-ing the whole
thing is likely the easiest to do. As Danny notes, there are lots of
base64 encoders/decoders out there - I am sure a search would find you
a Perl or Java package to do this.
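
For what it's worth, here is a minimal sketch of the whole round trip in
Python, using only the standard library (`base64` and
`xml.etree.ElementTree`). The element names (`invoice`, `amount`, `image`),
the `type`/`encoding` attributes, and the fake JPEG bytes are assumptions for
illustration, not anything agreed in this thread. The encoded text goes in as
ordinary character data, so the parser has nothing to escape.

```python
import base64
import xml.etree.ElementTree as ET


def build_document(amount: str, image_bytes: bytes) -> bytes:
    """Build an XML document holding a dollar amount and a base64 image."""
    root = ET.Element("invoice")
    ET.SubElement(root, "amount").text = amount
    # Attributes label the content so the reader knows how to decode it.
    image = ET.SubElement(
        root, "image", {"type": "image/jpeg", "encoding": "base64"}
    )
    image.text = base64.b64encode(image_bytes).decode("ascii")
    return ET.tostring(root, encoding="utf-8")


def extract_image(xml_bytes: bytes) -> bytes:
    """Parse the document and decode the embedded image back to bytes."""
    root = ET.fromstring(xml_bytes)
    image = root.find("image")
    if image is None or image.get("encoding") != "base64":
        raise ValueError("no base64-encoded <image> element found")
    return base64.b64decode(image.text or "")


# Stand-in for JPEG data, deliberately containing XML-hostile bytes.
fake_jpeg = b"\xff\xd8\xff\xe0<&>]]>\x00\x01" + bytes(range(256))

doc = build_document("1234.56", fake_jpeg)
restored = extract_image(doc)
print(restored == fake_jpeg)
```

The cost is size: base64 output is roughly a third larger than the binary
input.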

> 1) I believe there are some Perl modules for doing this, the description of
> "how to do" base64 is in one of the RFCs which I might be able to find.
> 
> 2) CDATA sections are another way to go, in these all you have to worry
> about is a string of ']]>' - not sure if that helps any. But with a CDATA
> section the parser is hands off except for that particular string. So this
> wouldn't be a problem for the DOM, you might have problems reading it in
> and then writing a CDATA section back out but you could use a standard tag
> <base64> and always read and write a CDATA section around its content.
> 
> 1) You will have to extend your DOM implementation to be able to recognize
> the format and hook some code in to handle it - you won't find this off the
> shelf but I would think it would be relatively straightforward to
> implement. Probably the safest way and allows you to maintain one file.

... but you can probably find base64 encoders/decoders to make things
easier for you.
 
> 2) It isn't DOM you are fighting, it is XML and its parser. CDATA is the
> closest thing to doing this but it isn't a complete solution. Using an
> entity reference and an external file is the safe way from a parser/DOM
> standpoint, but you have the problem of multiple files.

Yes ... the MIME mechanism I mention is a whole other layer of messiness
that you avoid by encoding stuff and putting it inside the XML.

> >
> >Can you please validate my approach and assumptions? Thanks! Jerry
> 
> base64 is usually the first recommendation for doing what you want; the
> cost is having to build the tool to do the work. To remove the work you can
> use the entity method, but then you're stuck with multiple files. No magic
> bullet here in XML, you just get a standard way of addressing the world's
> problems - but they're still problems.

I agree -- going the base64 route is I think the easiest approach, but
there is no magic bullet that makes it trivial.

Ian


> ..dan
> 
> >
> >-----Original Message-----
> >From: Danny Vint [mailto:dvint@s...]
> >Sent: Saturday, January 06, 2001 4:28 PM
> >To: Jerry Johns; 'Ian Graham'
> >Cc: 'xml-dev@l...'
> >Subject: RE: binary base64 definition
> >
> >
> >Notations and other formats have always been application dependent since
> >SGML days. In that arena we were primarily trying to use graphics and
> >usually just display them. So the SGML editors provided a mechanism to map
> >a tool to a format. Seems like you might be able to do something with OLE
> >on Windows for the same sort of functionality. I'm not sure what was being
> >conveyed about the DOM support, but it would seem like there would be a way
> >to hook into knowing what the NOTATION type was (base64 in this case) and
> >launching some other application to deal with it.
> >
> >Maybe if you describe what the actual need for the base64 is we might be
> >able to offer suggestions along other lines.
> >
> >..dan
> >
> >
> >At 03:31 PM 1/6/2001 -0500, Jerry Johns wrote:
> >>What if I ditched DOM and used another tool for managing the XML file;
> >>could I then insert the base64 content and still be within the XML
> >>standards? Is this a limitation of DOM? Thanks. Jerry
> >>
> >>-----Original Message-----
> >>From: Ian Graham [mailto:igraham@i...]
> >>Sent: Saturday, January 06, 2001 10:50 AM
> >>To: Dan Vint
> >>Cc: Jerry Johns; 'xml-dev@l...'
> >>Subject: Re: binary base64 definition
> >>
> >>
> >>
> >>The DOM supports access to notation nodes, but can enforce no statement
> >>about the proper encoding of a referenced external entity (which makes
> >>sense, as it is external to the document).
> >>
> >>Base64 encoding of content inside a document would require custom code for
> >>doing the encoding/decoding, and some attribute-based mechanism for
> >>labeling the 'type' content of the node containing the data. That is
> >>certainly possible, but as far as I can see is outside the scope of the
> >>DOM. 
> >>
> >>Ian
> >>
> >>
> >>On Fri, 5 Jan 2001, Dan Vint wrote:
> >>
> >>> You can't use elements this way, but an alternative would be to create a
> >>> NOTATION type and then an external entity of this type - you would copy
> >>> all the contents of whatever should be base 64 into this external file, it
> >>> would be part of the XML document but it would be outside. Not sure if
> >>> DOM has been set up to understand NOTATIONS, but in an SGML world you
> >>> would be able to associate a "processor" with that notation and have it
> >>> called whenever you needed to read or write that format.
> >>> 
> >>> Your DTD might look like the following:
> >>> 
> >>> <!DOCTYPE .... [
> >>> 
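> >>> <!NOTATION base64 SYSTEM "binary base64">
> >>> <!ENTITY extfile1 SYSTEM "extfile.b64" NDATA base64>
> >>> <!-- "image" is a placeholder element; an unparsed entity is named in an ENTITY attribute -->
> >>> <!ATTLIST image src ENTITY #REQUIRED>
> >>> 
> >>> ]>
> >>> ....
> >>> <image src="extfile1"/>
> >>> ...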
> >>> 
> >>> The syntax is probably not exact but you can look up the details.
> >>> 
> >>> ..dan
> >>> 
> >>> > 
> >>> > In the DTD, can I specify a type of "binary base64" for an element so
> >>> > that when I write to the XML file using DOM, DOM will automatically
> >>> > encode the binary data for me without parsing for control characters?
> >>> > If so, I assume it will do the reverse when I read that element.
> >>> > 
> >>> > Can anyone validate my assumption about the DTD data type? Has anyone
> >>> > seen an example DTD definition with this in it?
> >>> > 
> >>> > Thanks much.
> >>> > Jerry
> >>> > 
> >>> 
> >>> 
> >
> >---------------------------------------------------------------------------
> >Danny Vint
> >http://www.dvint.com
> >
> >Author: "SGML at Work"  
> >http://www.slip.net/~dvint/pubs/sgmlatwork.shtml 
> > mailto:dvint@u...
> >    
> 
> 

