
Re: Fwd: RFC 3548 on The Base16, Base32, and Base64 Data Encodings

  • To: xml-dev@l..., caro@a...
  • Subject: Re: Fwd: RFC 3548 on The Base16, Base32, and Base64 Data Encodings
  • From: "Perry A. Caro" <caro@a...>
  • Date: Thu, 17 Jul 2003 11:40:11 -0700

"Perry A. Caro" wrote:
> I used the following
> ranges:
> 
> U+3400 thru U+4DB5      for 15-bit values of 0 thru 6581
> U+4E00 thru U+9FA5      for 15-bit values of 6582 thru 27483
> U+E000 thru U+F4A5      for 15-bit values of 27484 thru 32767
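
A minimal Python sketch of the mapping quoted above, for illustration only. The names RANGES, value_to_char, and char_to_value are hypothetical and not from the original post. The sketch relies only on the first code point of each range and the first 15-bit value assigned to it.

```python
# (first code point, first 15-bit value) for each range quoted above.
# Hypothetical helper names; only the range starts come from the post.
RANGES = [
    (0x3400, 0),
    (0x4E00, 6582),
    (0xE000, 27484),
]
MAX_VALUE = 0x7FFF  # largest 15-bit value


def value_to_char(value: int) -> str:
    """Map a 15-bit value to its text character."""
    if not 0 <= value <= MAX_VALUE:
        raise ValueError("value must fit in 15 bits")
    # Walk the ranges from the highest start value downwards.
    for start_cp, start_val in reversed(RANGES):
        if value >= start_val:
            return chr(start_cp + (value - start_val))
    raise AssertionError("unreachable")


def char_to_value(ch: str) -> int:
    """Map a text character back to its 15-bit value."""
    cp = ord(ch)
    for i, (start_cp, start_val) in enumerate(RANGES):
        # Last value served by this range: one below the next range's start.
        end_val = RANGES[i + 1][1] - 1 if i + 1 < len(RANGES) else MAX_VALUE
        if start_cp <= cp <= start_cp + (end_val - start_val):
            return start_val + (cp - start_cp)
    raise ValueError(f"U+{cp:04X} is not a data character")


print(hex(ord(value_to_char(6581))))   # 0x4db5
print(hex(ord(value_to_char(6582))))   # 0x4e00
print(char_to_value("\ue000"))         # 27484
```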

Something I forgot to mention was "padding". Rather than padding, it turns
out to be more useful to think about which bits are significant in the very
last text character of the encoded data. Unless the original data was an
exact multiple of 15 bits, there will be from 1 to 14 bits left to encode.
These bits can easily fit into a 16-bit text character, but unless some
additional information is provided, a decoder will not be able to tell how
many of the bits in the final text character are significant.

To solve this problem, a final UTF-16 character is used. This character is
outside of the ranges listed above, so as not to be confused with data, and
is used as a clear termination for the encoded data. It is selected from a
contiguous range of 15 characters that have no normalization issues. I chose
the following range, but there are several possible alternatives:

U+2401 thru U+240F

When this character is encountered, it signals the end of the encoding, and
specifies the number of significant bits in the previous text character.
U+2401 specifies 1 bit is significant, U+2402 specifies 2 bits, etc., thru
U+240F for all 15 bits significant. This means that every encoded sequence
is terminated by one of these characters, regardless of how many bits were
in the original data.

In every text character, including the final one, the data bits are read
from most significant (0x4000) to least significant (0x0001).
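
The termination scheme can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the function names pack and unpack are hypothetical, the sketch works on 15-bit values rather than text characters, it assumes the significant bits of a partial final value are left-aligned (starting at 0x4000), and it rejects empty input because the post does not say how that case is encoded.

```python
TERMINATOR_BASE = 0x2400  # U+2401..U+240F signal 1..15 significant bits


def pack(data: bytes) -> tuple[list[int], str]:
    """Split data into 15-bit values and choose the terminator character."""
    if not data:
        raise ValueError("empty input is not covered by this sketch")
    values, acc, acc_bits = [], 0, 0
    for byte in data:
        acc = (acc << 8) | byte
        acc_bits += 8
        while acc_bits >= 15:
            acc_bits -= 15
            values.append((acc >> acc_bits) & 0x7FFF)
            acc &= (1 << acc_bits) - 1
    if acc_bits:
        # Assumption: leftover bits are left-aligned in the final value.
        values.append(acc << (15 - acc_bits))
        last_bits = acc_bits
    else:
        last_bits = 15  # the final value is completely filled
    return values, chr(TERMINATOR_BASE + last_bits)


def unpack(values: list[int], terminator: str) -> bytes:
    """Rebuild the original bytes from 15-bit values and the terminator."""
    last_bits = ord(terminator) - TERMINATOR_BASE
    if not 1 <= last_bits <= 15 or not values:
        raise ValueError("bad terminator or no data values")
    out, acc, acc_bits = bytearray(), 0, 0
    for i, value in enumerate(values):
        # Only the final value may carry fewer than 15 significant bits.
        n = last_bits if i == len(values) - 1 else 15
        acc = (acc << n) | (value >> (15 - n))
        acc_bits += n
        while acc_bits >= 8:
            acc_bits -= 8
            out.append((acc >> acc_bits) & 0xFF)
            acc &= (1 << acc_bits) - 1
    return bytes(out)


values, term = pack(b"\xff\xff")       # 16 bits: one full value plus 1 bit
print([hex(v) for v in values])        # ['0x7fff', '0x4000']
print(hex(ord(term)))                  # 0x2401
print(unpack(values, term))            # b'\xff\xff'
```

Mapping each 15-bit value to its text character is a separate step, using the ranges given earlier in the thread.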

***********************

A curious thing to note is that a "dumb" transcoding to UTF-16 of UTF-8
text containing base64 encoded data suffers more expansion (1.33*2.00,
about 2.67 times the original data size) than a "dumb" transcoding to UTF-8
of UTF-16 text containing Base32k encoded data (1.06*1.50, about 1.6 times
the original data size).
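
These expansion factors can be checked with a short Python sketch. It is illustrative only: the sample data is arbitrary, the Base32k text is simulated with a placeholder character (U+4E00, taken from one of the data ranges) rather than a real encoding, and the terminator character is ignored.

```python
import base64

# 11520 bytes: a multiple of 3 bytes (no base64 padding) and of 15 bits.
data = bytes(range(256)) * 45

# base64 text, natively UTF-8, "dumbly" transcoded to UTF-16.
b64_text = base64.b64encode(data).decode("ascii")
b64_native = len(b64_text.encode("utf-8")) / len(data)
b64_dumb = len(b64_text.encode("utf-16-le")) / len(data)

# Base32k text, natively UTF-16, "dumbly" transcoded to UTF-8.
# One character per 15 data bits; every data character is in the BMP at
# or above U+0800, so it takes 2 bytes in UTF-16 and 3 bytes in UTF-8.
b32k_text = "\u4e00" * (len(data) * 8 // 15)
b32k_native = len(b32k_text.encode("utf-16-le")) / len(data)
b32k_dumb = len(b32k_text.encode("utf-8")) / len(data)

print(f"base64:  {b64_native:.2f} native, {b64_dumb:.2f} after transcoding")
print(f"Base32k: {b32k_native:.2f} native, {b32k_dumb:.2f} after transcoding")
# base64:  1.33 native, 2.67 after transcoding
# Base32k: 1.07 native, 1.60 after transcoding
```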

Obviously, a "smart" transcoding would first decode the data, and then use
the text encoding appropriate for the final XML encoding: base64 for UTF-8,
Base32k for UTF-16.

I tried to define an encoding similar to Base32k in UTF-8, by packing data
bits into UTF-8 character ranges, but I finally gave up. Some of the design
goals of UTF-8, such as using bits to self-describe the number of bytes in
the sequence, work against efficient data packing. I ultimately concluded
that base64 is about the best that can be achieved for UTF-8.

***********************

I have occasionally thought of drafting an RFC for the Base32k encoding.
Does anyone think that would be useful? Or would a technical note to the XML
Core WG be better?

Perry Caro
Adobe Systems Incorporated
