Fwd: Re: encoding converters?
Rick Jelliffe asked that I forward this to the list - it's yet more answers
on the encoding converter question.

>Date: Sun, 20 Feb 2000 06:32:25 +0800 (CST)
>From: Rick Jelliffe <ricko@g...>
>Subject: Re request on XML-DEV
>
>GLUE and XML-TCS Transcoding Utility Software
>---------------------------------------------
>
>I have made an XML-aware version of TCS. The diff package is available at
>the Chinese XML Now site. It implements "lossless" transcoding, which is
>what I talked about at the XML Conference we met at last year. It
>basically means that you should convert unknown characters to NCRs.
>
>I can only provide diffs because Bell has not AFAIK made tcs
>available for redistribution, even though at least one version of Linux
>does include it. I don't think they care particularly, but without
>confirmation I cannot make up binaries or a unified source
>distribution, unfortunately. The people involved cannot be contacted; the
>project leader is Dennis Ritchie (i.e., UNIX and C) who undoubtedly has
>more pressing matters to attend to.
>
>*HOWEVER* at my site you will also see "The GLUE Project Transcoders"
>
>GLUE (= "GLUE Loses User's Encodings") is a transcoder library I wrote.
>It is specified using XML and converted to C. At the moment, only
>x->UTF-8 is available, but that seems to be all you want.
>
>I made it because the existing transcoders had problems: the GNU iconv
>ones required their new glibc; and so on. Since then, IBM has released
>their excellent C++ libraries ICU, but it too does not do lossless
>transcoding. Also, Java now generates an exception if a character is
>missing instead of just silently swallowing the character; these are steps
>in the right direction.
>
>The mapping tables at Unicode.org have the problem that many encodings are
>better mapped by algorithm rather than by a table. So I made an XML format
>that could express certain relationships declaratively, in a way
>that can be simply translated into code. Also, many encodings have
>variants, which can be represented well in XML.
>
>GLUE home page is at:
>   http://www.ascc.net/xml/en/utf-8/glue.html
>GLUE handles the following encodings:
>
>    ASCII
>    ISO 646de
>    ISO 646en
>    ISO 646es
>    ISO 646fr
>    ISO 646it
>    ISO 646sv
>    ISO 8859-1 (Latin 1)
>    CP1252 variant (Windows "ANSI")
>    ISO 8859-2 (Latin 2)
>    CP 1250 variant
>    ISO 8859-3 (Latin 3)
>    ISO 8859-4 (Latin 4)
>    ISO 8859-5 (Cyrillic)
>    ISO 8859-6 (Arabic)
>    ISO 8859-7 (Greek)
>    ISO 8859-8 (Hebrew)
>    ISO 8859-9 (Latin 5)
>    ISO 8859-10 (Latin 6)
>    ISO 8859-11 (Thai)
>    ISO 8859-13 (Latin 7)
>    ISO 8859-14 (Latin 8)
>    ISO 8859-15 (Latin 9)
>    MacRoman
>    MacRoman with Euro
>    UTF-8
>    UTF-16 (little endian)
>    UTF-16 (big endian)
>    Big5 (Chinese, including user-defined area)
>    VISCII (Vietnamese)
>(Note: the variants have not been tested thoroughly. Check them to
>confirm. The current implementation does not support ISO 2022-based
>encodings or non-Unicode encodings (i.e. the massive CCCII) well.)
>
>
>The xml-tcs home page is at
>   http://www.ascc.net/xml/en/utf-8/transcode-index.html
>
>xml-tcs can generate the following NCRs with single or double delimiting
>
>    STRIP:       no delimiter,
>    UNKNOWN:     put in unknown character indicator "?" or FFFD
>    UNICODE:     Unicode-style U+HHHH
>    JAVA:        Java-style \uHHHH
>    JAVA_DD:     Java-style \\uHHHH
>    XML:         XML-style &#xHHHH;
>    XML_DD:      XML-style &#xHHHH;
>    SPREAD1:     Old SPREAD &U-HHHH;
>    SPREAD1_DD:  Old SPREAD &U-HHHH;
>    SPREAD2:     New SPREAD &UHHHH;
>    SPREAD2_DD:  New SPREAD &UHHHH;
>    CSS1:        CSS1 \HHHH
>    CSS1_DD:     CSS1 \\HHHH
>    CSS2:        CSS2 \\00HHHH (space following is delimiter)
>    CSS2_DD:     CSS2 \\00HHHH (space following is delimiter)
>    SGML:        SGML-, HTML (< 4) and Netscape style
>                 decimal &#DDDDDD;
>    SGML_DD:     SGML-style &#DDDDDD;
>
>
>Rick Jelliffe

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@x...&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/threads.html
***************************************************************************
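
To illustrate the "lossless" transcoding idea described in the forwarded message (a character the target encoding cannot represent is written out as a numeric character reference instead of being dropped), here is a minimal Python sketch. It is not the xml-tcs or GLUE code, which is not shown in the message. The names `to_lossless`, `NCR_STYLES` and `java_escape` are hypothetical, only a handful of the listed NCR styles are covered, and the exact format strings are assumptions based on the patterns in the list above.

```python
# Hypothetical sketch of "lossless" transcoding; not taken from xml-tcs or GLUE.
# Assumes a stateless, ASCII-compatible target encoding (e.g. ASCII, ISO 8859-x).


def java_escape(cp):
    """Java-style \\uHHHH; code points above U+FFFF become a surrogate pair."""
    units = chr(cp).encode("utf-16-be")
    return "".join(
        "\\u%02X%02X" % (units[i], units[i + 1]) for i in range(0, len(units), 2)
    )


# A few of the styles named in the message, keyed by the names used there.
NCR_STYLES = {
    "STRIP": lambda cp: "",               # drop the character entirely
    "UNKNOWN": lambda cp: "?",            # unknown-character indicator
    "UNICODE": lambda cp: "U+%04X" % cp,  # Unicode-style U+HHHH
    "JAVA": java_escape,                  # Java-style \uHHHH
    "XML": lambda cp: "&#x%04X;" % cp,    # XML-style hexadecimal reference
    "SGML": lambda cp: "&#%d;" % cp,      # SGML-style decimal reference
}


def to_lossless(text, encoding, style="XML"):
    """Encode text, replacing unencodable characters with an NCR."""
    make_ncr = NCR_STYLES[style]
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            # The reference itself is plain ASCII, so it fits the target.
            out += make_ncr(ord(ch)).encode("ascii")
    return bytes(out)
```

For example, `to_lossless("caf\u00e9 \u4e2d", "iso-8859-1")` keeps the é as the single byte 0xE9 and writes the Chinese character as `&#x4E2D;`, while the same call with `"ascii"` turns both into references. Python's built-in `xmlcharrefreplace` error handler (`text.encode("ascii", "xmlcharrefreplace")`) covers the decimal, SGML-style case without any extra code.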