 Subject: encoding and entities for xml well-formedness Author: lucrecia chiartano Date: 09 Nov 2004 10:04 AM
|
Hello,
I would be grateful to anyone who could help me with this pure XML topic.
As literal ampersand and less-than characters are not allowed in the content of a well-formed XML document, they should be replaced with entities.
I looked for information on the w3.org site but am not sure I have understood it correctly.
What I have understood is that an entity can be constructed from the 'code-point' for the character to be replaced in three ways:
- &#x[hexadecimal-code-point];
- &#[decimal-code-point];
- &[entity-name];
I think I have understood also that for the first and second cases 'code-point' must be compatible with the xml-encoding-declaration of the same xml (<?xml encoding=""?>).
'Named entities' must be declared before use (for example in a DTD), except for four cases (&lt; , &gt; , &amp; , &apos;) that are somehow 'internally declared' in the XML processor.
Supposing I have understood this correctly, it is still not clear to me whether the four internally declared 'named entities' are encoding-independent.
When I write them in my XML they are always recognized, whatever encoding I declare in the XML encoding declaration, but I cannot say whether that depends on the XML processor I'm using.
That's all,
Hope I've been clear.
Thanks,
Lucrecia Chiartano
|
 Subject: encoding and entities for xml well-formedness Author: (Deleted User) Date: 09 Nov 2004 11:58 AM
|
Hi Lucrecia,
>[...]
>I think I have understood also that for the first and second
>cases 'code-point' must be compatible with the
>xml-encoding-declaration of the same xml (<?xml
>encoding=""?>).
No, the code-point is the number that has been assigned to that
character in the Unicode specs, regardless of the encoding currently
being used in the XML file.
From § 4.1 of the XML specs:
"If the character reference begins with "&#x", the digits and letters
up to the terminating ; provide a hexadecimal representation of the
character's code point in ISO/IEC 10646"
where ISO/IEC 10646 is the ISO standard that defines the same character repertoire and code points as Unicode.
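To illustrate the point, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It parses the same character references under three different declared encodings and collects the resulting text. The helper name `parse_with_declared_encoding` and the sample element `<p>` are made up for this example, and it only shows the behaviour of one parser (expat, via Python), not of every XML processor.

```python
import xml.etree.ElementTree as ET


def parse_with_declared_encoding(encoding: str) -> str:
    """Parse a tiny document whose XML declaration names `encoding`.

    The document body is pure ASCII, so the same bytes are valid in all
    three encodings tried below; only the declaration changes.
    """
    doc = (
        f'<?xml version="1.0" encoding="{encoding}"?>'
        "<p>&#xE9; &#233; &#x20AC;</p>"  # hex e-acute, decimal e-acute, hex euro sign
    )
    return ET.fromstring(doc.encode("ascii")).text


# U+20AC (euro sign) cannot be encoded in ISO-8859-1 or US-ASCII at all,
# yet the reference to it still resolves: the number is a Unicode code point,
# not a byte value in the declared encoding.
results = {
    enc: parse_with_declared_encoding(enc)
    for enc in ("UTF-8", "ISO-8859-1", "US-ASCII")
}

for enc, text in results.items():
    print(enc, repr(text))
```

With this parser, all three documents should yield the same three characters, which is what the spec wording quoted above leads one to expect.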
>'Named entities' must be declared before use (for
>example in a dtd), but for four cases (&lt; ,
>&gt; , &amp; , &apos;) that are somehow 'internally-declared' in the
>xml-processor.
There are five predefined entities; there is also &quot;.
In any case, both the character references and the predefined entities are
encoding-independent.
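A quick way to check this on your own setup is sketched below in Python with the standard `xml.etree.ElementTree` module. It resolves each of the five predefined entities under several declared encodings, with no DTD, and then shows that a name that is not predefined (here `eacute`, chosen only as an example) is rejected when it has not been declared. The names `PREDEFINED` and `resolve` are hypothetical helpers for this sketch, and the result only reflects this one parser.

```python
import xml.etree.ElementTree as ET

# The five entities predefined by the XML specification.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}


def resolve(name: str, encoding: str) -> str:
    """Return the text produced by &name; in a DTD-less document."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><e>&{name};</e>'
    # The document is pure ASCII, so these bytes are valid in every encoding used here.
    return ET.fromstring(doc.encode("ascii")).text


resolved = {
    (name, enc): resolve(name, enc)
    for name in PREDEFINED
    for enc in ("UTF-8", "ISO-8859-1", "US-ASCII")
}

# An entity that is neither predefined nor declared is a well-formedness error.
try:
    ET.fromstring("<e>&eacute;</e>")
    undeclared_rejected = False
except ET.ParseError:
    undeclared_rejected = True

print(resolved)
print("undeclared entity rejected:", undeclared_rejected)
```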
Hope this helps,
Alberto