Re: The triples datamodel -- was Re: Semantic Web p
Elliotte Rusty Harold wrote:

> But rigid fixed schemas fail when we're talking about thousands or tens
> of thousands or even millions of disconnected developers who do not have
> prior agreements, who do not know each other, and who are doing very
> different things with the same data. This is the world of the Internet.
> This is the world I work in. This is the world more and more developers
> are working in more and more of the time, and the old practices that
> worked in small, closed systems behind the firewall are failing. It's
> time to learn how to design systems that are flexible and loosely
> coupled enough to work in this new environment. XML is a critical
> component in making this work. Maybe RDF is too, though I'm still not
> convinced (to bring this thread back on topic.) Schemas really aren't.
> At best schemas are a useful diagnostic tool for deciding what kind of
> document you've got so you can dispatch it to the appropriate local
> process. At worst, however, schemas encourage a mindset and assumptions
> that are actively harmful when trying to produce scalable, robust,
> interoperable systems.

What Rusty said. Here are two vignettes from my own experience to
underline his point.

- We will be getting XML messages (via JMS) from a state agency - the
state of California, in fact. Their contractor tells us the messages
conform to such-and-such a schema. The schema happens to be one that we
ourselves wrote; it is a draft version of a to-be standard. But the
first documents we get do not validate against the schema, and
unfortunately they are not just simple extensions. In a few places new
structures have made their way into the document. It seems pretty clear
what has happened. Probably the messages originally validated, but then
the contractor made some changes and forgot that the changes might not
be schema-valid. Or maybe they never tried validating in the first
place. Anyway, no problem - XSLT to the rescue!
- I need to screen-scrape certain data from a web page that is updated
from time to time. The page is put up by a US government agency. The
data is critical medically-related information. The results of the data
extraction go into the front end of a long and complex automated
workflow. I write the front-end parser (this was before John Cowan's
tag-soup parser came out). It turns out that the page is hand-authored
by someone who is not very expert with HTML. With every update the
internal structure changes. It always looks the same in the browser,
but certain key internal parts are actually invalid HTML, and the
nature of the invalidity changes each time. Unfortunately we have to
use those parts to extract indexes that point to the actual data we
want to collect from other parts of the page. We cannot outguess all
the changes, and so from time to time we get parse failures. We cannot
influence the page design. Finally, we give up and use the text-only
version that the agency also hosts. This has no markup, but the visual
structure blocks out the information we need in a consistent way, and
the visual structure matches the actual text format. I write a parser
that emits SAX-like events to feed into the downstream process.
Everything works nicely and robustly after this change.

As Rusty says, that is the world of the internet.

Cheers,

Tom P

-- 
Thomas B. Passin
Explorer's Guide to the Semantic Web (Manning Books)
http://www.manning.com/catalog/view.php?book=passin
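The second vignette's approach - a text-layout parser driving a SAX-style event interface - can be sketched roughly as below. The agency's real text format is not shown in the post, so this assumes a hypothetical layout of "Label: value" lines grouped into blank-line-separated records; the handler interface mirrors SAX's start/characters/end callbacks.

```python
# Minimal sketch of a parser emitting SAX-like events from a plain-text
# report. The record layout ("Label: value" lines, blank-line separated)
# is an assumption standing in for the agency's actual format.
class Handler:
    """SAX-style callback interface for the downstream process."""
    def start(self, name): pass
    def characters(self, text): pass
    def end(self, name): pass

class CollectingHandler(Handler):
    """Example handler that just records the event stream."""
    def __init__(self):
        self.events = []
    def start(self, name): self.events.append(("start", name))
    def characters(self, text): self.events.append(("chars", text))
    def end(self, name): self.events.append(("end", name))

def parse_report(text, handler):
    """Walk the visual structure of the text and fire events for each block."""
    handler.start("report")
    for block in filter(None, (b.strip() for b in text.split("\n\n"))):
        handler.start("record")
        for line in block.splitlines():
            label, _, value = line.partition(":")
            handler.start(label.strip())
            handler.characters(value.strip())
            handler.end(label.strip())
        handler.end("record")
    handler.end("report")

h = CollectingHandler()
parse_report("Drug: Foo\nLot: 123\n\nDrug: Bar\nLot: 456", h)
```

Because the downstream process only sees the event stream, it cannot tell whether the events came from an XML parser or from this text scraper, which is what makes the swap robust.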