RE: Validation vs performance - was Re: Fast text ou
> -----Original Message-----
> From: Bullard, Claude L (Len) [mailto:clbullar@i...]
> Sent: Sunday, April 18, 2004 14:49
> To: 'Michael Champion'; 'XML DEV'
> Subject: RE: Validation vs performance - was Re:
> Fast text output from SAX?
>
>
> Yes. Dead on. When where and under what conditions
> do applications need alternative formats? Those
> who think they need one should be making the cases
> for those conditions now.
>
> Here's the shakedown: binaries vs text formats as
> Bob W. points out is an old debate. There are:
>
> 1. Those who are developing a generalized binary
> and want to offer that.

Let me point out one fact about ASN.1 that I see overlooked sometimes, especially when people try to compare ASN.1 with XML: **ASN.1 is not inherently binary**.

ASN.1 focuses on a level of data description that is more abstract than a wire representation. (This is one reason why a direct comparison with XML 1.x syntax is difficult or even inappropriate.)

For example, the following ASN.1 type definition:

------------------------------------------
EmailMessage ::= SEQUENCE {
    from      EmailAddress,
    to        SEQUENCE OF address EmailAddress,
    cc        SEQUENCE OF address EmailAddress,
    sent      DATE-TIME,
    received  DATE-TIME,
    subject   UTF8String,
    body      UTF8String
}

EmailAddress ::= UTF8String (PATTERN "(some pattern)")
------------------------------------------

is a complete description of data (from ASN.1's point of view), but says nothing at all about the on-the-wire representation of the data. In particular, there is no implication that the data will be represented in some binary form. The on-the-wire representation can be XML 1.0 just as well.

ASN.1 folks call the data-description level "type definition" or "abstract syntax", and call the on-the-wire representation "encoding" or "transfer syntax".
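To make the split between abstract syntax and transfer syntax a little more concrete, here is a minimal Python sketch that takes one abstract value (a single UTF8String) and writes it out in two different ways. The names `Subject`, `encode_xml` and `encode_ber` are made up for this illustration. The XML form is only in the spirit of XER, and the binary form is a BER-style tag-length-value for one short string (UTF8String has universal tag number 12, i.e. 0x0C). Neither function is a conformant encoder.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape

UTF8STRING_TAG = 0x0C  # universal tag number 12 (UTF8String) in BER


@dataclass
class Subject:
    """The abstract value: a string, with no wire format implied."""
    text: str


def encode_xml(value: Subject) -> bytes:
    # XML-based transfer syntax (XER-like): the value becomes element content.
    # The element name "subject" is an assumption taken from the example type.
    return f"<subject>{escape(value.text)}</subject>".encode("utf-8")


def encode_ber(value: Subject) -> bytes:
    # Binary transfer syntax (BER-style): tag octet, length octet, content octets.
    # This sketch only handles the short-form length (content under 128 bytes).
    content = value.text.encode("utf-8")
    if len(content) >= 128:
        raise ValueError("sketch handles short-form lengths only")
    return bytes([UTF8STRING_TAG, len(content)]) + content


value = Subject("Validation vs. performance")
xml_bytes = encode_xml(value)
ber_bytes = encode_ber(value)

print(xml_bytes.decode("utf-8"))
print(ber_bytes.hex())
```

The same `Subject` value feeds both functions; only the encoding step differs, which is the point of keeping the type definition separate from the wire format.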
The focus on the "abstract syntax" allows multiple distinct "encoding rules" to exist, each specifying a different on-the-wire representation of the data that has been defined at the "abstract syntax" level. Over the years, this has given rise to a number of standard "encoding rules", some of which are binary and some of which use XML 1.0. Every time, there has been a good reason for standardizing a new set of encoding rules, starting from BER, then DER/CER, then PER, then XER, then EXTENDED-XER.

I am not saying that the ASN.1 solution fits all cases (or even most of them). I know that many people prefer thinking in terms of bits-on-the-wire (or in terms of Unicode characters to be encoded in some character encoding before being placed on the wire), and I am not questioning their views here.

However, I suspect that many applications are being built around a schema (now often XML Schema) in such a way that they will not tolerate any XML document that does not conform to the schema. If my suspicion is well-founded, then these applications could be built just as easily around a schema written in ASN.1.

ASN.1 fits a common definition of a schema language, in that it "offers facilities for describing the structure and constraining the contents of XML 1.0 documents, including those which exploit the XML Namespace facility". One special characteristic of ASN.1, (currently) not shared by XML Schema and others, is that it allows multiple standardized on-the-wire representations, some of which are not based on XML 1.0.

Here is an example of a fragment of XML that is valid according to the type definition above:

-----------------------------------------------------
<EmailMessage>
  <from>abcde@x...</from>
  <to>
    <address>1@a...</address>
    <address>2@a...</address>
  </to>
  <cc/>
  <sent>2004-03-05T22:03:55</sent>
  <received>2004-03-05T22:04:55</received>
  <subject>Validation vs. performance</subject>
  <body>This is the body of the email</body>
</EmailMessage>
-----------------------------------------------------

Here is a fragment of XML Schema equivalent to the ASN.1 fragment above:

-----------------------------------------------------
<xs:schema xmlns:xs="(schema namespace)">
  <xs:element name="EmailMessage">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="from" type="EmailAddress"/>
        <xs:element name="to" type="MultipleEmailAddresses"/>
        <xs:element name="cc" type="MultipleEmailAddresses"/>
        <xs:element name="sent" type="xs:dateTime"/>
        <xs:element name="received" type="xs:dateTime"/>
        <xs:element name="subject" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="MultipleEmailAddresses">
    <xs:sequence>
      <xs:element name="address" type="EmailAddress"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="EmailAddress">
    <xs:restriction base="xs:string">
      <xs:pattern value="(some pattern)"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
-----------------------------------------------------

Both the fragment of XML Schema shown here and the fragment of ASN.1 shown above "describe the structure and constrain the content" of a class of XML documents which includes the example shown above.

Does it make any sense to compare ASN.1 with XML? No.

Does it make any sense to compare ASN.1 with XML Schema? Probably yes.

XML Schema has a dualism between value and lexical representation, which is not very far from the ASN.1 dualism between value and encoding. The main differences are:

1) In XML Schema, the concept of value only exists for simple types, whereas in ASN.1, the concept of value exists both for complex types and for simple types.

2) XML Schema specifies one standard mapping between the value space and the lexical representation, whereas ASN.1 specifies multiple standard mappings (encoding rules).
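The value versus lexical representation dualism can be illustrated with a few lines of Python. This is only an analogy under stated assumptions: Python's `datetime` and `Decimal` types stand in for the value spaces of xs:dateTime and xs:decimal, and no schema processor is involved. The literals with time zone offsets are made up for the illustration (the example document above uses dateTime values without a time zone).

```python
from datetime import datetime
from decimal import Decimal

# Two different lexical representations of the same instant in time.
lexical_a = "2004-03-05T22:03:55+00:00"
lexical_b = "2004-03-05T23:03:55+01:00"

value_a = datetime.fromisoformat(lexical_a)
value_b = datetime.fromisoformat(lexical_b)

# The character strings differ, but the values they denote are equal.
same_text = lexical_a == lexical_b
same_instant = value_a == value_b

# The same effect for numbers: "1.0" and "1.00" denote one decimal value.
same_number = Decimal("1.0") == Decimal("1.00")

print(same_text, same_instant, same_number)
```

In ASN.1 terms, the parsed objects play the role of the abstract value and the strings play the role of one particular encoding of it.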
I am sure there are many other points of contact between the two languages. In fact, the X.694 standard, which specifies a translation from XML Schema to ASN.1, would not be possible if XML Schema and ASN.1 were not sufficiently similar. Although there are some features of XML Schema that have no match in ASN.1, most of the language can be mapped faithfully.

So, if ASN.1 can be considered as another schema language for XML, what is so special about it? The fact that a "value" (of a complex or simple type) can have several "lexical representations", some binary, some based on XML 1.0.

This provides **one solution** to the "binary XML" problem. This is not a universal solution, of course, because it requires that a schema be known, shared, and invariable (although ASN.1 has important provisions for extensibility both across space and over time).

Alessandro Triglia
OSS Nokalva