SSDN - encoding and entities for xml well-formedness


Subject: encoding and entities for xml well-formedness
Author: lucrecia chiartano
Date: 09 Nov 2004 10:04 AM

Hello,
I would be grateful to anyone who can help me with this pure XML topic.

Since the ampersand and less-than characters are not allowed as literal text in well-formed XML, they should be replaced with entities.
I looked for information on the w3.org site, but I am not sure I understood it correctly.
What I understood is that a reference to the character being replaced can be written in three ways, the first two built from its code point:
- &#x[hexadecimal-code-point];
- &#[decimal-code-point];
- &[entity-name];
I think I also understood that in the first and second cases the code point must be compatible with the encoding declaration of the same XML document (<?xml encoding=""?>).
Named entities must be declared before use (for example in a DTD), except for four cases (&lt; , &gt; , &amp; , &apos;) that are somehow internally declared in the XML processor.

Assuming I have understood this correctly, it is still not clear to me whether the four internally declared named entities are encoding-independent.
When I write them in my XML they are always recognized, whatever encoding I declare in the encoding declaration, but I cannot tell whether that depends on the XML processor I'm using.

That's all,
Hope I've been clear.

Thanks,

Lucrecia Chiartano


Subject: encoding and entities for xml well-formedness
Author: (Deleted User)
Date: 09 Nov 2004 11:58 AM

Hi Lucrecia,

>[...]
>I think I have understood also that for the first and second
>cases 'code-point' must be compatible with the
>xml-encoding-declaration of the same xml (<?xml
>encoding=""?>).

No, the code-point is the number that has been assigned to that
character in the Unicode specs, regardless of the encoding currently
being used in the XML file.

From § 4.1 of the XML specs:
"If the character reference begins with "&#x", the digits and letters
up to the terminating ; provide a hexadecimal representation of the
character's code point in ISO/IEC 10646"

where ISO/IEC 10646 is the ISO standard that defines the same character set and code points as Unicode.
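
As a rough illustration of that point, here is a minimal sketch using Python's standard xml.etree.ElementTree module (my choice for the example, not something the question depends on). The sample document and the variable names are made up. It declares US-ASCII and refers to U+00E9, once in hexadecimal and once in decimal:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: declared as US-ASCII, yet it refers to U+00E9,
# which lies outside the ASCII range.
doc = b'<?xml version="1.0" encoding="US-ASCII"?><r>&#xE9;|&#233;</r>'

# Split the element text into the hexadecimal and the decimal form.
hex_form, dec_form = ET.fromstring(doc).text.split("|")

# Both references name the same Unicode code point.
print(hex(ord(hex_form)), hex(ord(dec_form)))
```

Both forms should come back as the same character, whatever encoding the declaration names.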

>'Named entities' must be declared before use (for
>example in a dtd), but for four cases (&lt; ,
>&gt; , &amp; , &apos;) that are somehow 'internally-declared' in the
>xml-processor.

There are five predefined entities; there is also &quot;

In any case, both character references and the predefined entities are
encoding-independent.
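
A quick way to check this yourself, sketched with Python's standard xml.etree.ElementTree (an assumption on my part; the BODY constant and the list of encodings are only illustrative). The same content is parsed under three different encoding declarations and the resolved characters are compared:

```python
import xml.etree.ElementTree as ET

# The five predefined entities followed by a character reference (euro sign).
BODY = "<r>&lt;&gt;&amp;&apos;&quot;&#x20AC;</r>"

results = {}
for declared in ("UTF-8", "ISO-8859-1", "US-ASCII"):
    doc = f'<?xml version="1.0" encoding="{declared}"?>{BODY}'
    # The document holds only ASCII bytes, so it is valid in all three encodings.
    results[declared] = ET.fromstring(doc.encode("ascii")).text

for declared, text in results.items():
    print(declared, [hex(ord(ch)) for ch in text])
```

Each declaration should give the same six characters, which suggests the behaviour does not come from one particular processor.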

Hope this helps,
Alberto


Subject: encoding and entities for xml well-formedness
Author: lucrecia chiartano
Date: 10 Nov 2004 08:42 AM

So, am I reading the following in the right way?

From § 4.3.3 of the XML specs:
...
Although an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 MUST begin with a text declaration (see 4.3.1 The Text Declaration) containing an encoding declaration:
....
--> from this I deduce that there are two kinds of encoding declarations: one for XML documents and one for entities.
UTF-8 and UTF-16 are somehow the default encodings for entities, and that is why you said that entities are independent of the XML encoding declaration. Is that right?

Then I can declare an ASCII encoding for the XML and use character references built from Unicode code points even if that particular character does not exist in ASCII.

The XML is still well-formed (I tried it with Xerces and got no error; only the output contains the wrong character).
And the encoding declaration has no control over the XML text or over the entities used within it. Is that so?

Thanks a lot,

Lucrecia


Subject: Re: encoding and entities for xml well-formedness
Author: (Deleted User)
Date: 10 Nov 2004 09:10 AM

Hi Lucrecia,
the paragraph you are reading (§4.3.3) is about external entities, that is, entities that require an external file to be imported into the XML.
The spec allows this external file to be in any encoding, provided that it starts with the proper <?xml ... ?> text declaration.
Character references and entities like &myEnt; are resolved inside the document itself: a character reference always names an ISO/IEC 10646 code point, and an internal entity has no encoding declaration of its own.
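
Here is a minimal sketch of the internal-entity case, again assuming Python's standard xml.etree.ElementTree rather than Xerces. The document, the entity value and the variable names are hypothetical. The entity myEnt is declared in the internal DTD subset of a document declared as US-ASCII, and its value contains a reference to the euro sign:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: US-ASCII declaration, internal entity using U+20AC.
doc = (
    b'<?xml version="1.0" encoding="US-ASCII"?>\n'
    b'<!DOCTYPE r [\n'
    b'  <!ENTITY myEnt "price in &#x20AC;">\n'
    b']>\n'
    b'<r>&myEnt;</r>'
)

root = ET.fromstring(doc)
text = root.text  # the entity is expanded and the euro sign is a real character

# Writing the tree back as US-ASCII cannot store the euro sign as a byte,
# so the serializer emits a numeric character reference instead.
out = ET.tostring(root, encoding="us-ascii")
print(repr(text))
print(out)
```

If the output of your own program shows a wrong character, the output step (the serializer or the console encoding) may be the cause rather than the parser, since the parsed text already holds the right code point.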

Hope this helps,
Alberto
