Re: Postel's law, exceptions
On Jan 13, 2004, at 9:24 PM, Michael Champion wrote:
>
> On Jan 13, 2004, at 8:57 PM, Dare Obasanjo wrote:
>>
>> I work on RSS in my free time. The most common well-formedness errors
>> are documents with incorrect encodings or documents that use HTML
>> entities without a reference to the HTML DTD. How exactly do you
>> propose XML 1.x fix these problems?
>>
> I don't; I didn't take the time

I've thought about this more and read some of the other responses. I guess where I come down is that XML per se is fine for what it does and purports to do here; we might want to consider different design patterns, if you will, for using XML in applications.

The conventional approach is to assume that the XML document is in a file, it knows its encoding, and it is ready to face a draconian parser and possibly a validator as well. Some other application put that document in the file, and it had a moral responsibility to do its best to ensure that the XML is well formed (and possibly valid). If there is a DTD in the file or referenced from the file, it does a *lot* -- defining internal entities, external entities, vocabularies, and structure models. That makes XML 1.0 processing sort of a Big Bang that either produces a fully formed Infoset or an error message. It leads to the situation the Atom people are in, where there apparently is a stark choice between doing the Right Thing according to the spec or keeping the customers happy by aggregating the information they asked to have aggregated without annoying them about geeky details that they care nothing about.

An alternative is to think of a processing *pipeline* connecting a data source (which may or may not be XML) through a number of processing steps that eventually lead to an Infoset that meets some vocabulary and content constraints. This provides any number of steps with which to adapt, clean up, transform, and *then* parse and validate XML text. The Atom people who want to be liberal can simply add a step to their processing pipeline that does the sort of fixup that we've talked about -- make sure special characters are escaped properly, fix up the encoding, maybe even normalize escaped HTML into well-formed XHTML by running it through tidy. That would be a service that plugs into the pipeline (as a library call, a SOAP invocation, a REST resource, or whatever) and not something that necessarily affects the rest of the application architecture.

The best statement I know of this point of view is Uche Ogbuji's "Serenity Through Markup" piece: http://www.adtmag.com/article.asp?id=6758

"As documents move between systems, trust the remote system's ability to interpret the document to meet its own local needs. This is known as the principle of loose coupling, and is reminiscent of the idea of late binding in programming theory. In the technology sphere, the idea is that an XML document used in a transaction between multiple systems need not always build in all possible aspects of the combined systems. Source systems design documents to meet their needs, while the target system interprets them to its own needs. ... [This can be done with] pipelines of data, where a document starts out in one form and ends up in one suited for a different environment."

Sean McGrath has written a lot about the pipeline approach too, but all I can find are PPT presentations. Do you have a good link (if you're reading, Sean)?
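To make the pipeline idea concrete, here is a rough sketch of what such a fixup stage might look like in front of a draconian parser. It is illustrative only: the helper names are made up, nothing here comes from the Atom spec or any aggregator, and it assumes only Python 3's standard library.

# Rough sketch of a feed-cleanup pipeline (illustrative; the helper
# names are invented, and it assumes only Python's standard library).
import html
import re
import xml.etree.ElementTree as ET

XML_PREDEFINED = {"&amp;", "&lt;", "&gt;", "&quot;", "&apos;"}

def fix_encoding(raw):
    # Decode optimistically as UTF-8, falling back to Latin-1 so a
    # mislabelled feed still yields some text to keep working with.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")

def escape_html_entities(text):
    # Rewrite HTML-only named entities (e.g. &nbsp;) as numeric character
    # references, since XML without the HTML DTD does not declare them.
    def repl(match):
        entity = match.group(0)
        if entity in XML_PREDEFINED:
            return entity                      # already legal XML
        resolved = html.unescape(entity)
        if resolved == entity:
            return entity                      # unknown entity; let the parser complain
        return "".join("&#%d;" % ord(ch) for ch in resolved)
    return re.sub(r"&[a-zA-Z][a-zA-Z0-9]*;", repl, text)

def parse(text):
    # The draconian step stays at the end of the pipeline, untouched.
    return ET.fromstring(text)

def pipeline(raw):
    # Each stage is an independent, replaceable service; a tidy-style
    # HTML-to-XHTML fixup could slot in as another stage here.
    text = fix_encoding(raw)
    text = escape_html_entities(text)
    return parse(text)

# e.g. pipeline(b"<entry><title>Fish &amp; chips&nbsp;tonight</title></entry>")

The point of the sketch is that the strict parse still happens, unchanged, at the end; the liberal behaviour lives entirely in stages that can be added, swapped, or removed without touching the rest of the application.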