
Re: Newbie question : accent, special chars,...


You're right! The document was not well-formed. I've changed the encoding to "UTF-8" and it now seems to be well-formed, but the Xerces DOMParser still has trouble with it: I get the following error:

org.xml.sax.SAXParseException: Element type "Item" must be followed by either attribute specifications, ">" or "/>".
at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1213)
at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:579)
at org.apache.xerces.framework.XMLDocumentScanner.abortMarkup(XMLDocumentScanner.java:628)
at org.apache.xerces.framework.XMLDocumentScanner.scanElement(XMLDocumentScanner.java:1800)
at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1182)
at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1098)



Here is the corresponding code:

...
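// File f exists
FileInputStream fis = new FileInputStream(f);
// Pass raw bytes: with no encoding set on the InputSource, the parser
// determines the encoding from the XML declaration in the file.
org.xml.sax.InputSource is = new org.xml.sax.InputSource(fis);
org.apache.xerces.parsers.DOMParser parser = new org.apache.xerces.parsers.DOMParser();
// Throws SAXException (e.g. the SAXParseException above) or IOException on failure.
parser.parse(is);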
...
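For comparison, here is a minimal self-contained sketch of the same parse written against the JDK's standard JAXP API (DocumentBuilderFactory) instead of the Xerces 1.x DOMParser class, so it runs without extra jars; ParseItemFile, parseBytes and parseChars are hypothetical names. parseBytes mirrors the snippet above: the parser receives raw bytes and trusts the XML declaration. parseChars assumes you know which charset the file was really saved in and decodes it yourself; a parser given a character stream disregards the encoding declaration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseItemFile {

    // Same approach as the Xerces 1.x snippet: hand the parser raw bytes and
    // let it take the encoding from the file's XML declaration.
    static Document parseBytes(File f)
            throws IOException, SAXException, ParserConfigurationException {
        try (InputStream in = new FileInputStream(f)) {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(in));
        }
    }

    // Alternative: decode the bytes with the charset the file was really
    // saved in. The parser then receives characters, not bytes, so it
    // ignores the encoding named in the XML declaration.
    static Document parseChars(File f, Charset actual)
            throws IOException, SAXException, ParserConfigurationException {
        try (Reader in = new InputStreamReader(new FileInputStream(f), actual)) {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(in));
        }
    }

    public static void main(String[] args) throws Exception {
        File f = new File(args[0]); // path to the XML file
        // Assumption: the file was saved as ISO-8859-1, whatever its declaration says.
        Document doc = parseChars(f, StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getAttribute("description"));
    }
}
```

Decoding the file yourself hides the mismatch rather than removing it, so making the file and its declaration agree is usually the cleaner fix.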

Here is the XML in file f:

<?xml version="1.0" encoding="UTF-8" ?>
<Item description="voici quelques caractères accentués : é ï è à utilisés en français"/>

It seems to me that the problem lies more with the DOM parser than with the XML file. Do I need to configure the parser somehow to make it handle UTF-8 correctly?
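Before configuring the parser, it may be worth checking whether the file's bytes really are UTF-8. Editing the declaration does not re-encode the file, and accented letters saved as ISO-8859-1 (for example è as the single byte 0xE8) are malformed UTF-8. A minimal sketch using java.nio's CharsetDecoder; Utf8Check and firstInvalidUtf8Offset are hypothetical names. It returns the byte offset of the first malformed UTF-8 sequence, or -1 if the whole file decodes cleanly.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns the byte offset of the first malformed UTF-8 sequence in bytes,
    // or -1 if the whole array is valid UTF-8.
    static int firstInvalidUtf8Offset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than it has bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        CoderResult result = decoder.decode(in, out, true);
        // On a malformed sequence the input position is left at its first byte.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0])); // path to the XML file
        int offset = firstInvalidUtf8Offset(bytes);
        if (offset < 0) {
            System.out.println("File is valid UTF-8");
        } else {
            System.out.println("Not UTF-8: malformed byte sequence at offset " + offset);
        }
    }
}
```

A non-negative offset means the file was not saved as UTF-8, so the "UTF-8" declaration does not describe its bytes.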



On 15 Oct. 2004, at 15:44, Liam Quin wrote:

On Fri, Oct 15, 2004 at 02:59:00PM +0200, Benoit Mangez wrote:
Here is the content of a non-valid XML file:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<Item description="voici quelques caractères accentués : é ï è à utilisés en français"/>

It's not valid because of the special characters inside the attribute
"description".

XML uses two terms: well-formed and valid.
As long as you actually use ISO 8859-1 for those characters, the
document should be well-formed. It isn't valid because you don't
have a DTD, but I'll assume you just want it to be well-formed.
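The difference between the two terms shows up directly in a parser: a non-validating parse only checks well-formedness, while switching on DTD validation for a document that has no DTD reports validity errors even though the document still parses. A minimal sketch with JAXP; WellFormedVsValid and validityErrors are hypothetical names, and the exact error messages depend on the parser in use.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedVsValid {

    // Parses xml and collects the messages of recoverable errors, which is how
    // validity problems are reported. Well-formedness errors are fatal: the
    // DefaultHandler rethrows them, so parse() throws instead of returning.
    static List<String> validityErrors(String xml, boolean validating)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating); // DTD validation on or off
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }
        });
        builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // The document from the thread, accented letters written as Unicode escapes.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<Item description=\"voici quelques caract\u00e8res accentu\u00e9s : "
                + "\u00e9 \u00ef \u00e8 \u00e0 utilis\u00e9s en fran\u00e7ais\"/>\n";
        System.out.println("non-validating: " + validityErrors(xml, false));
        System.out.println("validating:     " + validityErrors(xml, true));
    }
}
```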

You didn't include the exact error message, so I can only guess that
your file is in fact in UTF-8 and not ISO-8859-1, in which case changing
the encoding declaration may solve your problem.
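Changing the declaration is one way to make it agree with the bytes; the other is to re-encode the file so that a UTF-8 declaration becomes true. A minimal sketch, assuming the file was saved as ISO-8859-1; ReencodeToUtf8 and reencode are hypothetical names, and the regular expression simply rewrites the first encoding="ISO-8859-1" it finds, which is assumed to be the XML declaration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReencodeToUtf8 {

    // Reads src assuming its bytes are ISO-8859-1, rewrites an
    // encoding="ISO-8859-1" declaration to UTF-8, and writes the result to dst
    // as real UTF-8 bytes, so the declaration and the bytes agree.
    static void reencode(Path src, Path dst) throws IOException {
        String text = new String(Files.readAllBytes(src), StandardCharsets.ISO_8859_1);
        // Assumes the first match is the XML declaration; accepts ' or " quotes.
        text = text.replaceFirst("(?i)encoding\\s*=\\s*([\"'])ISO-8859-1\\1",
                                 "encoding=$1UTF-8$1");
        Files.write(dst, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ReencodeToUtf8 in.xml out.xml
        reencode(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```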

Liam


--
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/

Benoit Mangez

___________________________________________

DENALI sa
- http://www.denali.be
Château de Clerlande - 1340 Ottignies - Belgium
Tel +32 (0) 10 43 99 51 - Fax +32 (0) 10 43 99 52
___________________________________________


