Subject: EDI built-in adapter - problem with UTF-8 characters
Author: Jan Zaruba
Date: 10 Nov 2005 08:45 AM
Hi folks,
I'd like to use the built-in EDI adapter. I have an input EDI file in UTF-8 encoding. The conversion to XML is almost fine, except for the special Czech characters.
Any thoughts on how to fix it? Where do I set the encoding of the input and output files?
This problem might also affect other built-in adapters, such as CSV.

Thanks in advance,

Jan

Subject: EDI built-in adapter - problem with UTF-8 characters
Author: Minollo I.
Date: 10 Nov 2005 08:46 PM
EDI messages use their own peculiar ways to specify their character set. I tried creating a simple EDIFACT message specifying the UNOY character set (http://www.stylusstudio.com/edifact/40000/0001.htm), and Cyrillic UTF-8 characters were properly encoded in the resulting XML.
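
As a rough illustration of how an EDIFACT interchange declares its character set, here is a minimal Python sketch that reads the syntax identifier from the UNB segment and maps it to a codec name. The function name `declared_codec` and the table `SYNTAX_ID_TO_CODEC` are hypothetical, the table covers only a few identifiers, and the sketch assumes the default `+` separator; it says nothing about how Stylus Studio implements this internally.

```python
import re
from typing import Optional

# Partial mapping of EDIFACT syntax identifiers to Python codec names.
# Illustrative only: UNOA/UNOB are restricted ASCII subsets, UNOC..UNOF are
# ISO 8859 parts, and UNOY is ISO 10646-1 (UTF-8).
SYNTAX_ID_TO_CODEC = {
    "UNOA": "ascii",
    "UNOB": "ascii",
    "UNOC": "iso8859_1",
    "UNOD": "iso8859_2",
    "UNOE": "iso8859_5",
    "UNOF": "iso8859_7",
    "UNOY": "utf_8",
}


def declared_codec(interchange: bytes) -> Optional[str]:
    """Return the codec implied by the UNB syntax identifier, if recognised."""
    # The header itself is plain ASCII, so a lossy ASCII decode is enough
    # to locate the UNB segment near the start of the interchange.
    header = interchange[:200].decode("ascii", errors="replace")
    match = re.search(r"UNB\+(UNO[A-Z])", header)
    if match is None:
        return None
    return SYNTAX_ID_TO_CODEC.get(match.group(1))


if __name__ == "__main__":
    sample = b"UNB+UNOY:3+SENDER+RECEIVER+051110:0845+1'"
    print(declared_codec(sample))
```

Once the codec is known, the rest of the interchange can be decoded with it; a file whose bytes do not match the declared identifier will come out garbled.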

Can you attach (or email) a sample of the specific file on which you are having problems?

Thanks,
Minollo

Subject: EDI built-in adapter - problem with UTF-8 characters
Author: Jan Zaruba
Date: 11 Nov 2005 04:15 AM
Hi,
thanks for your reply. Please see my comments below:

>EDI messages use their own peculiar ways to specify their character set.
>I tried creating a simple EDIFACT message specifying the UNOY character set
>(http://www.stylusstudio.com/edifact/40000/0001.htm), and Cyrillic UTF-8
>characters were properly encoded in the resulting XML.
Of course you're right. Sorry, I was wrong: I didn't try UTF-8 but ISO-8859-2, which means UNOD in EDI format.

>
>Can you attach (or email) a
>sample of the specific on
>which you are having problems?
>
>Thanks,
>Minollo

I'm attaching both EDI and CSV examples.
In Order96A_iso8859-2.edi, look for the "czech chars:" string. In onecze_utf8.csv, the UTF-8 characters are on the last line.

Regards,
Jan

Subject: EDI built-in adapter - problem with UTF-8 characters
Author: Jan Zaruba
Date: 11 Nov 2005 04:20 AM
Here is the attachment


edi_and_csv_incorrect_output_encoding.zip

Subject: EDI built-in adapter - problem with UTF-8 characters
Author: Minollo I.
Date: 11 Nov 2005 09:58 AM
Thanks for the testcase.

#1, the EDI file:
I believe the EDI file you attached is NOT using ISO-8859-2; it is using windows-1250, which contradicts the UNOD setting. I've re-encoded the EDI file you sent us as ISO-8859-2, and the EDI converter now generates what I guess is the expected output. I'm attaching both the fixed original EDI and the converted XML version.
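
For readers who want to check or repair such a file themselves, here is a minimal Python sketch of the idea. It relies on the fact that ISO-8859-2 reserves bytes 0x80-0x9F for control codes, while windows-1250 places printable letters such as š, ž and ť there. The function names are hypothetical, and the heuristic is only a hint, not a reliable detector.

```python
def looks_like_windows_1250(data: bytes) -> bool:
    """Heuristic: printable text in ISO-8859-2 should not use bytes 0x80-0x9F."""
    return any(0x80 <= b <= 0x9F for b in data)


def reencode_cp1250_to_latin2(data: bytes) -> bytes:
    """Re-encode windows-1250 bytes as ISO-8859-2 so they match a UNOD header."""
    # Raises UnicodeDecodeError / UnicodeEncodeError if the input is not valid
    # windows-1250 or contains characters that ISO-8859-2 cannot represent.
    return data.decode("cp1250").encode("iso8859_2")


if __name__ == "__main__":
    text = "Příliš žluťoučký kůň"
    as_cp1250 = text.encode("cp1250")
    print(looks_like_windows_1250(as_cp1250))
    fixed = reencode_cp1250_to_latin2(as_cp1250)
    print(looks_like_windows_1250(fixed))
    print(fixed.decode("iso8859_2") == text)
```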

#2, the CSV file:
In version 6 release 3, the CSV converter always assumes the input file is encoded as ISO-8859-1. Stylus Studio 2006 allows you to override the encoding setting for the tab- and comma-separated flat file to XML converters. In version 6 release 3, you could easily create a convert-to-XML definition in Stylus Studio (File > New > Convert to XML) and set the encoding and other custom information there.
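
To show why the assumed encoding matters, here is a small Python sketch (independent of Stylus Studio, with a hypothetical helper name `parse_csv_bytes` and made-up sample data) that parses the same UTF-8 CSV bytes twice: once assuming ISO-8859-1 and once with the correct encoding.

```python
import csv
import io
from typing import List


def parse_csv_bytes(data: bytes, encoding: str) -> List[List[str]]:
    """Decode raw CSV bytes with the given encoding and return the rows."""
    text = data.decode(encoding)
    # newline="" leaves line endings untouched, as the csv module recommends.
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    raw = "id,name\n1,Žluťoučký kůň\n".encode("utf_8")
    assumed_latin1 = parse_csv_bytes(raw, "iso8859_1")  # wrong assumption
    explicit_utf8 = parse_csv_bytes(raw, "utf_8")       # correct override
    print(assumed_latin1[1][1])
    print(explicit_utf8[1][1])
```

The first result contains two garbled characters for every accented letter, because each multi-byte UTF-8 sequence is read as separate ISO-8859-1 characters.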

Thanks,
Minollo


Order96A_iso8859-2-fixed.edi
fixed EDI

Order96A_iso8859-2-fixed.xml
converted EDI

Subject: EDI built-in adapter - problem with UTF-8 characters
Author: Jan Zaruba
Date: 14 Nov 2005 03:50 AM
Hi Minollo,
thanks for your support. It seems you're right about the encoding of the EDI file. Unfortunately, I missed this fact :).
I'm still playing a little bit with the product. Let me describe what I did:
1. I took Order96A_iso8859-2-fixed.edi.
2. I opened it in Stylus using the built-in EDI adapter. Up to this point, the Czech chars are OK in the XML file.
3. I added more Czech chars to the XML and saved it to the file Order96A_iso8859-2-fixed2.edi, again using the EDI adapter to convert from XML to EDI format.
4. I repeated steps 1 and 2 with Order96A_iso8859-2-fixed2.edi. The Czech chars in the XML were incorrect.


Thanks,
Jan


Order96A_iso8859-2_fixed2.edi

Subject: EDI built-in adapter - problem with UTF-8 characters
Author: Minollo I.
Date: 14 Nov 2005 09:51 AM
Jan,
you are right; Stylus Studio is currently forcing the generated EDI to use UTF-8 rather than the encoding declared by the UNOD code.
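
The effect of this mismatch can be reproduced outside the product with a few lines of Python. This is only a sketch of the symptom (text written as UTF-8, then read back as ISO-8859-2 because the header says UNOD); the helper name `round_trip` is hypothetical.

```python
def round_trip(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec and decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    original = "Příliš žluťoučký kůň"
    # Writer emits UTF-8, reader trusts the UNOD header (ISO-8859-2).
    garbled = round_trip(original, "utf_8", "iso8859_2")
    # Writer and reader both honour the UNOD header.
    consistent = round_trip(original, "iso8859_2", "iso8859_2")
    print(garbled)
    print(consistent == original)
```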

We'll address the issue in the next Stylus Studio update. Unfortunately, it's too late to include the fix in Stylus Studio 2006 BL501d, which is going public today, but we'll make sure the problem is solved in BL501e, which will likely be available in 2-3 weeks.

Thanks,
Minollo

 