
Fwd: Re: encoding converters?

  • From: "Simon St.Laurent" <simonstl@s...>
  • To: XML-Dev Mailing list <xml-dev@x...>
  • Date: Sat, 19 Feb 2000 18:30:32 -0500

Rick Jelliffe asked that I forward this to the list - it's yet more answers
on the encoding converter question.

>Date: Sun, 20 Feb 2000 06:32:25 +0800 (CST)
>From: Rick Jelliffe <ricko@g...>
>Subject: Re request on XML-DEV
>
>GLUE and XML-TCS Transcoding Utility Software
>---------------------------------------------
>
>I have made an XML-aware version of TCS. The diff package is available at
>the Chinese XML Now site. It implements "lossless" transcoding, which is
>what I talked about at the XML Conference we met at last year. It
>basically means that you should convert characters the target encoding
>cannot represent to NCRs (numeric character references).
>
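As a rough illustration of the "lossless" idea described above (this is not xml-tcs itself), Python's standard `xmlcharrefreplace` error handler does something similar: any character the target encoding cannot represent is written as a decimal numeric character reference instead of being dropped. The function name `to_lossless` is a hypothetical helper chosen for this sketch.

```python
def to_lossless(text: str, encoding: str) -> bytes:
    """Encode text, replacing unrepresentable characters with &#DDDD; NCRs.

    Hypothetical helper; relies only on Python's built-in
    "xmlcharrefreplace" error handler.
    """
    return text.encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    # U+4E2D has no ISO 8859-1 byte, so it survives as an NCR,
    # while U+00E9 is encoded directly as a single byte.
    print(to_lossless("caf\u00e9 \u4e2d", "iso-8859-1"))
```

An XML parser reading the result recovers the original characters, which is the sense in which nothing is lost.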
>I can only provide diffs because Bell has not AFAIK made tcs
>available for redistribution, even though at least one version of Linux
>does include it. I don't think they care particularly, but without
>confirmation I cannot make up binaries or a unified source
>distribution, unfortunately. The people involved cannot be contacted; the
>project leader is Dennis Ritchie (i.e., UNIX and C) who undoubtedly has
>more pressing matters to attend to.
>
>*HOWEVER* at my site you will also see "The GLUE Project Transcoders"
>
>GLUE (= "GLUE Loses User's Encodings") is a transcoder library I wrote.
>It is specified using XML and converted to C.  At the moment, only the
>x->UTF-8 direction is available, but that seems to be all you want.
>
>I made it because the existing transcoders had problems: the GNU iconv
>ones required their new glibc; and so on. Since then, IBM has released
>their excellent C++ library, ICU, but it too does not do lossless
>transcoding. Also, Java now generates an exception if a character is
>missing instead of just silently swallowing the character; these are steps
>in the right direction.
>
>The mapping tables at Unicode.org have the problem that many encodings are
>better mapped by algorithm rather than by a table. So I made an XML format
>that could express declaratively certain relationships in a way 
>that can be simply translated into code.  Also, many encodings have
>variants, which can be represented well in XML.
>
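The point about mapping by algorithm rather than by table can be sketched as follows. This is not GLUE's XML format or its generated C; it only illustrates the two ideas in Python. ISO 8859-1 needs no table because each byte value equals the Unicode code point, and an ISO 646 national variant can be expressed as ASCII plus a handful of overrides. The function names are hypothetical, and the override set shown is the German variant (DIN 66003) as I understand it.

```python
# Overrides for the German ISO 646 variant, assumed here to follow DIN 66003.
ISO646_DE_OVERRIDES = {
    0x40: "\u00a7",  # section sign replaces @
    0x5B: "\u00c4", 0x5C: "\u00d6", 0x5D: "\u00dc",  # A, O, U with diaeresis
    0x7B: "\u00e4", 0x7C: "\u00f6", 0x7D: "\u00fc",  # a, o, u with diaeresis
    0x7E: "\u00df",  # sharp s replaces ~
}


def latin1_to_utf8(data: bytes) -> bytes:
    """Algorithmic mapping: every ISO 8859-1 byte is its own code point."""
    return "".join(chr(b) for b in data).encode("utf-8")


def iso646_variant_to_utf8(data: bytes, overrides: dict) -> bytes:
    """Variant mapping: ASCII, except for the positions listed in overrides."""
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is outside the 7-bit range")
        chars.append(overrides.get(b, chr(b)))
    return "".join(chars).encode("utf-8")
```

Describing a variant as a base encoding plus a small difference set is the kind of relationship that is easy to state declaratively and then turn into code.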
>GLUE home page is at:
>	http://www.ascc.net/xml/en/utf-8/glue.html
>GLUE handles the following encodings:
>
>                  ASCII 
>                        ISO 646de 
>                        ISO 646en 
>                        ISO 646es 
>                        ISO 646fr 
>                        ISO 646it 
>                        ISO 646sv 
>                  ISO 8859-1 (Latin 1)
>                        CP1252 variant (Windows "ANSI") 
>                  ISO 8859-2 (Latin 2)
>                        CP 1250 variant 
>                  ISO 8859-3 (Latin 3) 
>                  ISO 8859-4 (Latin 4) 
>                  ISO 8859-5 (Cyrillic) 
>                  ISO 8859-6 (Arabic) 
>                  ISO 8859-7 (Greek) 
>                  ISO 8859-8 (Hebrew) 
>                  ISO 8859-9 (Latin 5) 
>                  ISO 8859-10 (Latin 6) 
>                  ISO 8859-11 (Thai) 
>                  ISO 8859-13 (Latin 7) 
>                  ISO 8859-14 (Latin 8) 
>                  ISO 8859-15 (Latin 9) 
>                  MacRoman 
>                        MacRoman with Euro 
>                  UTF-8 
>                  UTF-16 (little endian) 
>                  UTF-16 (big endian) 
>                  Big5 (Chinese, including user-defined area) 
>                  VISCII (Vietnamese) 
>(Note: the variants have not been tested thoroughly. Check them to
>confirm. The current implementation does not support ISO 2022-based
>encodings or non-Unicode encodings (e.g. the massive CCCII) well.)
>
>
>The xml-tcs home page is at
>	http://www.ascc.net/xml/en/utf-8/transcode-index.html
>
>xml-tcs can generate the following NCRs, with single or double delimiting:
>
>                STRIP: no delimiter, 
>                UNKNOWN: put in unknown character indicator "?" or FFFD 
>                UNICODE: Unicode-style U+HHHH 
>                JAVA: Java-style \uHHHH 
>                JAVA_DD: Java-style \\uHHHH 
>                XML: XML-style &#xHHHH; 
>                XML_DD: XML-style &amp;#xHHHH; 
>                SPREAD1: Old SPREAD &U-HHHH; 
>                SPREAD1_DD: Old SPREAD &amp;U-HHHH; 
>                SPREAD2: New SPREAD &UHHHH; 
>                SPREAD2_DD: New SPREAD &amp;UHHHH; 
>                CSS1: CSS1 \HHHH 
>                CSS1_DD: CSS1 \\HHHH 
>                CSS2: CSS2 \00HHHH (space following is delimiter) 
>                CSS2_DD: CSS2 \\00HHHH (space following is delimiter) 
>                SGML: SGML-, HTML (< 4) and Netscape style 
>			decimal &#DDDDDD; 
>                SGML_DD: SGML-style &amp;#DDDDDD; 
>
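To make the single versus double delimiting distinction concrete, here is a minimal Python sketch covering a few of the styles listed above. It is not xml-tcs code: the table `NCR_FORMATS` and the function `transcode_with_ncrs` are hypothetical names, the four-digit zero padding is an assumption read off the "HHHH" notation, and the Java style is only meaningful here for characters in the Basic Multilingual Plane.

```python
# Format templates for a subset of the styles in the list above.
# The _DD ("double delimited") forms escape the delimiter itself, so the
# reference survives one extra round of unescaping.
NCR_FORMATS = {
    "UNICODE": "U+{cp:04X}",
    "JAVA": "\\u{cp:04X}",
    "JAVA_DD": "\\\\u{cp:04X}",
    "XML": "&#x{cp:04X};",
    "XML_DD": "&amp;#x{cp:04X};",
    "SGML": "&#{cp:d};",
    "SGML_DD": "&amp;#{cp:d};",
}


def transcode_with_ncrs(text: str, encoding: str, style: str) -> str:
    """Keep characters the encoding can represent; replace the rest with NCRs."""
    template = NCR_FORMATS[style]
    out = []
    for ch in text:
        try:
            ch.encode(encoding)  # probe only; the encoded bytes are discarded
            out.append(ch)
        except UnicodeEncodeError:
            out.append(template.format(cp=ord(ch)))
    return "".join(out)


if __name__ == "__main__":
    for style in NCR_FORMATS:
        print(style, transcode_with_ncrs("a\u4e2d", "ascii", style))
```

The result is returned as a string that is now safe to encode in the target encoding, since every remaining character is representable.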
>
>
>
>
>Rick Jelliffe
> 
Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@x...&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/threads.html
***************************************************************************
