Re: ASN.1 is an XML Schema Language (Fix those lists!)andBinar
John Cowan wrote:
> Alaric B Snell scripsit:
>
>>The ASN.1 'equivalent' of a normal XML parser would just need to support
>>BER, which is the current conventional "minimal" encoding. An ASN.1
>>toolkit that supported "BER, PER, CER, DER, XER, and probably LWER, OER,
>>and SER" would be more closely related to an XML parser that supported
>>US-ASCII, UTF-8, UTF-7, UTF-16, EBCDIC, ISO-8859-[1..15], Shift-JIS,
>>Baudot, etc...
>
> Hardly. Except for some feedback from the encoding declaration, which
> can be handled by a sniffer, charset decoding is a completely separate
> layer from parsing in XML. The differences between BER, PER, and XER
> parsing are so profound as to cause the three parsers to have essentially
> nothing in common.

Oh yes, there is certainly a bigger code difference between the different ASN.1 encoding rulesets than between character encodings.

My point, though, was that the original poster claimed that the ASN.1 notion of multiple encodings is worse than the XML world of a single encoding because it meant that the recipient might not have the right decoder, requiring interactive negotiation mechanisms, and leaving you in trouble if it's a situation where you can't interactively negotiate.

So I pointed out that the XML world is just as bad since you may not have the required decoder.
Some XML written in EBCDIC will look like gibberish when viewed as ASCII :-)

For example, UTF-7:

    +ADw?xml version+AD0AIg-1.0+ACI charset+AD0AIg-UTF-7+ACI?+AD4
    +ADw-document+AD4
    +ADw-title+AD4-Hello World+ADw-/title+AD4
    +ADw-/document+AD4

If it wasn't for the "?xml" which has survived in line 1, you could be forgiven for mistaking it for something like RTF :-)

And another I dare only represent as hex because it contains 'binary' characters:

    0000000 6f4c 94a7 4093 85a5 a299 9689 7e95 f17f
    0000010 f04b 407f 8883 9981 85a2 7ea3 c97f d4c2
    0000020 f0f1 f7f4 6f7f 256e 4c25 9684 a483 8594
    0000030 a395 256e 4040 a34c a389 8593 c86e 9385
    0000040 9693 e640 9996 8493 614c 89a3 93a3 6e85
    0000050 4c25 8461 8396 94a4 9585 6ea3 0025

That's a charset called "IBM1047", an EBCDIC variant.

Both of these are the IANA registered names of the charsets. As I read the XML 1.0 spec, they are valid XML 1.0 documents (I've even declared the charset name in the XML declaration), but according to:

http://www.w3.org/TR/REC-xml#charencoding

...a parser isn't required to be able to read them.

"processors are, of course, not required to support all IANA-registered encodings"

"It is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process."

But the XML world hasn't exactly come tumbling down because of this, has it? It's not as big a problem as you might think. Anybody sending XML knows that if they are worried about it being understood in unguessable circumstances they'd better stick with UTF-8, since XML parsers are required to support it, and it will make at least partial sense wherever US-ASCII is spoken, too.

Likewise, people in the ASN.1 world who want things to be generally readable will have used BER in the past, and now they can even use XML too! Progress, eh?
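To make the "baseline encoding" point concrete, here is a minimal sketch in Python of a receiver that sniffs the first bytes of an entity and then refuses anything outside the encodings it has chosen to support. The names `sniff`, `decode_entity`, `BASELINE` and `FatalEncodingError` are made up for illustration, and `cp037` stands in for IBM1047 on the assumption that a stock Python install does not ship an IBM1047 codec; the two agree on the bytes for "<?xm".

```python
import codecs

# Encodings this hypothetical receiver has agreed to support.
BASELINE = ("utf-8", "utf-16")


class FatalEncodingError(Exception):
    """Raised when the entity uses an encoding the receiver cannot process."""


def sniff(data: bytes) -> str:
    """Guess the encoding family from the first bytes (incomplete on purpose)."""
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"              # byte order mark
    if data.startswith(b"<?xm"):
        return "ascii-compatible"    # read the declaration for the real name
    if data.startswith(b"\x4c\x6f\xa7\x94"):
        return "ebcdic"              # "<?xm" in EBCDIC
    return "unknown"


def decode_entity(data: bytes, declared: str, supported=BASELINE) -> str:
    """Decode only if the declared encoding is one the receiver supports."""
    try:
        name = codecs.lookup(declared).name   # normalise aliases such as "UTF-8"
    except LookupError:
        raise FatalEncodingError(f"no decoder for {declared}")
    if name not in supported:
        raise FatalEncodingError(f"{declared} is outside the supported set")
    return data.decode(name)


doc = '<?xml version="1.0"?>\n<document><title>Hello World</title></document>\n'
ebcdic = doc.encode("cp037")

print(sniff(ebcdic))                     # family guessed from four bytes
print(ebcdic[:12].decode("latin-1"))     # gibberish when read as if it were Latin-1
try:
    decode_entity(ebcdic, "cp037")
except FatalEncodingError as exc:
    print("fatal:", exc)
# A closed-system receiver can simply widen its supported set.
print(decode_entity(ebcdic, "cp037", supported=BASELINE + ("cp037",)) == doc)
```

The sender's side of the bargain is the same as in the text: stick to an encoding in the receiver's baseline unless both ends have agreed on something else.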
But yes - I know it's probably more effort to create an ASN.1 decoder that supports every encoding ever developed than to create an XML decoder that supports every encoding ever developed (although the IANA list of encodings is *pretty* long...), but my point is that this isn't really relevant; nobody ever BOTHERS to write a decoder that supports every possible encoding. You support the commonly agreed baseline encoding(s), and then support others if your closed-system niche application requires it. If you're doing anything outside of a fixed niche, then you try to stick to the baseline, for maximum interoperability.

> When using XER, is one constrained to a specific encoding?

If I remember correctly, when I was involved with discussions about this, we were going with what the XML 1.0 spec says, in order to be compatible with it; since an XER decoder is a compliant XML parser, it has to support at least UTF-8 and UTF-16. IIRC, we may have been more restrictive about output and mandated UTF-8, although I can think of arguments against that (ideographic languages generally take more than two bytes per character in UTF-8, so UTF-16 is more efficient there), so I doubt that was approved.

> Also, I'm curious about which encoding-rules transformations one can
> perform without knowledge of the schema:
>
> BER to PER?
> XER to PER?
> BER to XER?
> XER to BER?

In general, none of them - PER contains no information that can be found in the schema, since in the world it works in - where both ends know the schema - sending information that's available already to both ends is pointless.

BER and XER both have the actual field boundaries in them so both of them can be converted into tree structures, but in the XER, you have no way of knowing how to interpret the textual content, and in BER, you are told how to interpret them (the type information is there) but you don't know what their names are :-)

However, there are a few caveats.
For a start, there is an ASN.1 type for the Infoset being produced, so arbitrary XML that can be parsed into an Infoset could then be encoded in PER or BER. But this isn't actually converting the abstract value itself into PER or BER - the result is not "Here is a person with name Alaric and email address alaric@a...", it's "Here is an element called Alaric with two children, an element called Name with content Alaric, and an element called EmailAddress with content alaric@a...".

ABS
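To illustrate the earlier point that BER and XER both carry field boundaries but lose different halves of the schema, here is a rough Python sketch. `read_tlv` and `walk` are hypothetical helpers that handle only definite-length BER, and the sample value (a SEQUENCE holding a UTF8String and an INTEGER) and the element names `Person`, `name` and `age` are invented for the example rather than taken from any real schema.

```python
import xml.etree.ElementTree as ET


def read_tlv(buf: bytes, pos: int = 0):
    """Read one BER tag-length-value; return (class, constructed, number, value, next_pos)."""
    first = buf[pos]
    pos += 1
    tag_class, constructed, number = first >> 6, bool(first & 0x20), first & 0x1F
    if number == 0x1F:                      # high-tag-number form
        number = 0
        while True:
            octet = buf[pos]
            pos += 1
            number = (number << 7) | (octet & 0x7F)
            if not octet & 0x80:
                break
    length = buf[pos]
    pos += 1
    if length == 0x80:
        raise ValueError("indefinite length is not handled in this sketch")
    if length & 0x80:                       # long form: next n octets hold the length
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag_class, constructed, number, buf[pos:pos + length], pos + length


def walk(buf: bytes):
    """Turn BER into a tree of (class, tag number, value-or-children), with no schema."""
    out, pos = [], 0
    while pos < len(buf):
        tag_class, constructed, number, value, pos = read_tlv(buf, pos)
        out.append((tag_class, number, walk(value) if constructed else value))
    return out


# SEQUENCE { UTF8String "Alaric", INTEGER 42 } in BER.
ber = bytes.fromhex("300b0c06416c6172696302012a")
print(walk(ber))      # types (tags 16, 12, 2) are visible, field names are not

xer_like = "<Person><name>Alaric</name><age>42</age></Person>"
root = ET.fromstring(xer_like)
print([(child.tag, child.text) for child in root])   # names visible, "42" is just text
```

Without the schema, the BER tree knows it holds a string and an integer but not what they are called, and the XML tree knows the names but cannot tell whether "42" is an integer or a string.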