Subject:EDI built-in adapter - problem with UTF-8 characters Author:Jan Zaruba Date:10 Nov 2005 08:45 AM
I'd like to use the built-in EDI adapter. I have an input EDI file in UTF-8 encoding. The conversion to XML is almost fine, except for the special Czech characters.
Any thoughts on how to fix it? Where do I set the encoding of the input and output files?
This problem might also concern other built-in adapters, such as CSV.
Subject:EDI built-in adapter - problem with UTF-8 characters Author:Minollo I. Date:10 Nov 2005 08:46 PM
EDI messages use their own peculiar ways to specify their character set. I tried creating a simple EDIFACT message specifying the UNOY character set (http://www.stylusstudio.com/edifact/40000/0001.htm), and Cyrillic UTF-8 characters were properly encoded in the resulting XML.
Can you attach (or email) a sample of the specific file on which you are having problems?
Subject:EDI built-in adapter - problem with UTF-8 characters Author:Jan Zaruba Date:11 Nov 2005 04:15 AM
Thanks for your reply. Please see my comments below:
>EDI messages use their own
>peculiar ways to specify their
>character set. I tried
>creating a simple EDIFACT
>message specifying the UNOY
>cyrillic UTF-8 characters were
>properly encoded in the
Of course you're right. Sorry, I was wrong: I didn't try UTF-8 but ISO-8859-2, which means UNOD in EDI format.
>Can you attach (or email) a
>sample of the specific on
>which you are having problems?
I'm attaching both EDI and CSV examples.
In Order96A_iso8859-2.edi, look for the "czech chars:" string. In onecze_utf8.csv, the UTF-8 characters are on the last line.
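For readers following along, the relationship between the syntax identifiers mentioned so far (UNOY, UNOD) and character encodings can be sketched in Python. This is only an illustration, not how Stylus Studio works internally: the function name, the partial mapping table, and the sample UNB segment are made up for this example, and the code assumes the default `+` element separator (no custom UNA service string).

```python
# Partial mapping from EDIFACT syntax identifiers (UNB segment, element 0001)
# to Python codec names. Only a few identifiers are listed; extend as needed.
SYNTAX_ID_TO_CODEC = {
    "UNOC": "iso8859_1",  # ISO 8859-1 (Latin-1)
    "UNOD": "iso8859_2",  # ISO 8859-2 (Latin-2, covers Czech)
    "UNOY": "utf_8",      # ISO 10646-1 (UTF-8)
}


def codec_for_interchange(raw: bytes) -> str:
    """Return the Python codec implied by the UNB syntax identifier.

    Assumes the default '+' element separator (hypothetical simplification).
    """
    # The identifier itself is plain ASCII, so a lossy ASCII decode is
    # good enough for locating it, whatever the payload encoding is.
    text = raw.decode("ascii", errors="replace")
    start = text.find("UNB+")
    if start < 0:
        raise ValueError("no UNB segment found")
    syntax_id = text[start + 4:start + 8]
    if syntax_id not in SYNTAX_ID_TO_CODEC:
        raise ValueError("unsupported syntax identifier: " + syntax_id)
    return SYNTAX_ID_TO_CODEC[syntax_id]


# Hypothetical interchange header, for illustration only.
sample = b"UNB+UNOD:3+SENDER+RECEIVER+051111:0415+1'"
print(codec_for_interchange(sample))
```

The point is that the bytes in the file must actually match the codec the header declares; otherwise any reader that trusts the header will produce wrong characters.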
Subject:EDI built-in adapter - problem with UTF-8 characters Author:Minollo I. Date:11 Nov 2005 09:58 AM
Thanks for the testcase.
#1, the EDI file:
I believe that the EDI file you have attached is NOT using ISO-8859-2, but instead it's using windows-1250, which is in contradiction with the UNOD setting. I've fixed the character encoding in the EDI file you sent us to be ISO-8859-2, and the EDI converter is now generating what I guess is the expected output. I'm attaching both the fixed original EDI and the converted XML version.
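The two encodings are easy to confuse because most Czech letters share the same byte value in both; only a handful (š, ť, ž and their uppercase forms) differ. A small Python sketch, using the standard `cp1250` and `iso8859_2` codecs, shows the difference and one way to re-encode a file's bytes. The helper name `reencode` is hypothetical.

```python
# Czech letters whose byte values differ between windows-1250 and ISO-8859-2.
DIFFERING = "šťžŠŤŽ"

for ch in DIFFERING:
    # Print each letter with its hex byte in both encodings.
    print(ch, ch.encode("cp1250").hex(), ch.encode("iso8859_2").hex())

# Most other Czech letters are identical in both encodings, e.g. 'č'.
assert "č".encode("cp1250") == "č".encode("iso8859_2")


def reencode(data: bytes, src: str = "cp1250", dst: str = "iso8859_2") -> bytes:
    """Decode bytes with the source codec and encode them with the target."""
    return data.decode(src).encode(dst)


# 'š' and 'ž' as windows-1250 bytes, converted to ISO-8859-2 bytes.
print(reencode(b"\x9a\x9e").hex())
```

Because only a few letters differ, a file in the wrong one of these two encodings can look "almost fine", which matches the symptom described at the start of this thread.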
#2, the CSV file:
In version 6 release 3 the CSV converter always assumes the input file is encoded as ISO-8859-1. Stylus Studio 2006 allows you to override the encoding setting for the tab- and comma-separated flat file to XML converters. In version 6 release 3 you could easily create a convert-to-XML definition in Stylus Studio (File > New > Convert to XML) and set the encoding and other custom information there.
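To see why a fixed ISO-8859-1 assumption breaks a UTF-8 CSV file, here is a minimal Python sketch that parses the same bytes with two different encodings. It uses only the standard `csv` and `io` modules; the helper `read_csv_bytes` and the sample row are invented for illustration and have nothing to do with the converter's actual implementation.

```python
import csv
import io
from typing import List


def read_csv_bytes(data: bytes, encoding: str) -> List[List[str]]:
    """Decode CSV bytes with an explicit encoding and parse them into rows."""
    text = data.decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical sample: a header row plus one row with Czech characters.
utf8_bytes = "id,name\n1,Žluťoučký kůň\n".encode("utf-8")

good = read_csv_bytes(utf8_bytes, "utf-8")
bad = read_csv_bytes(utf8_bytes, "iso8859_1")

print(good[1][1])  # decoded with the right encoding
print(bad[1][1])   # decoded as ISO-8859-1: each multi-byte character is split
```

The structure of the file (rows and columns) survives either way, since the separators are plain ASCII; only the non-ASCII field content is damaged.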
Subject:EDI built-in adapter - problem with UTF-8 characters Author:Jan Zaruba Date:14 Nov 2005 03:50 AM
Thanks for your support. It seems you're right about the encoding of the EDI file. Unfortunately I missed this fact :).
I'm still playing a little bit with the product. Let me describe what I did:
1. I took Order96A_iso8859-2-fixed.edi
2. I opened it in Stylus Studio using the built-in EDI adapter. Up to this point the Czech chars are OK in the XML file.
3. I added more Czech chars to the XML and saved it to the file Order96A_iso8859-2-fixed2.edi, again using the EDI adapter to convert XML ---> EDI format.
4. I repeated steps 1 and 2 with Order96A_iso8859-2-fixed2.edi. The Czech chars in the XML were incorrect.
Subject:EDI built-in adapter - problem with UTF-8 characters Author:Minollo I. Date:14 Nov 2005 09:51 AM
You are right; Stylus Studio is currently forcing the generated EDI to use UTF-8 rather than the encoding required by the UNOD code.
We'll address the issue in the next Stylus Studio update; unfortunately it's too late to include the fix in Stylus Studio 2006 BL501d, which is going public today, but we'll make sure the problem is solved in BL501e, likely available in 2-3 weeks from now.
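The round-trip failure described in the steps above can be reproduced outside the product with a short Python sketch: text is written as UTF-8 but read back as ISO-8859-2, which is what a reader honoring UNOD would do. The helper `roundtrip` and the sample sentence are hypothetical; this only models the encoding mismatch, not the EDI converter itself.

```python
def roundtrip(text: str, write_codec: str, read_codec: str) -> str:
    """Encode text with one codec and decode the bytes with another."""
    return text.encode(write_codec).decode(read_codec)


original = "Příliš žluťoučký kůň"  # sample Czech text, illustration only

# Writer uses UTF-8 while the header says UNOD (ISO-8859-2): text is garbled.
mismatched = roundtrip(original, "utf_8", "iso8859_2")

# Writer and reader both use ISO-8859-2, as UNOD declares: text survives.
matched = roundtrip(original, "iso8859_2", "iso8859_2")

print(mismatched == original)  # False
print(matched == original)     # True
```

Until the fix is available, one possible workaround (an assumption, not something confirmed in this thread) would be to re-encode the generated file from UTF-8 to ISO-8859-2 with an external tool before feeding it back to the adapter.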