Re: XSL and the semantic web
Marcelo Cantos wrote:
>
> Yes, gratuitously. In the context of XSL, formatting can be done by a
> style sheet at the client end, hence there is no need to provide the
> final FO's.

Well, David has already pointed out that there are places where the full XSL engine cannot run on the client. He has also pointed out that there are business reasons for wanting to keep your semantic data private.

> None of the issues raised here applies to the discussion at hand,
> which is automated processing of content. This is what the "semantic
> web" is supposed to be about isn't it? Sure you can word index the
> HTML, but that won't tell you which word is the person's name.
>
> So let me be more explicit: FO's are, for the purposes of automated
> extraction of semantic content, little or no better than GIF's.

In another message in this thread you said that trying to hide information from automated extractors (spam-bots) through data-dumbing was a lost cause. It raised the bar "a half inch off of the ground." Now you're saying that dumbed-down text is as hard to process as a GIF. Which is it?

My personal opinion is that dumbed-down text is not hard to process if you know the dumbing-down algorithm in advance. But it is very hard to process if you are trying to write a bot that will *predict* what a random site's data-dumbing algorithm will be like. Trolling the Web for "shoe prices" is a lot harder when shoe prices are labelled as <P>'s.

My point of view is that making bot-creation harder is an information owner's prerogative. Making bot-creation easier is also the information owner's right. Charging extra money for the bot-friendly version is yet another right.

> I would be surprised if the content provider went to all the effort of
> exposing semantic markup and then didn't bother to tell anyone what it
> meant.

If their goal is not to share then that is exactly what they would do.
That's my point: even if it were possible (which it isn't) to force people to share semantically meaningful data, the fact that it is semantically meaningful *to them* does not mean that it is meaningful *to you* without sufficiently smart software! Forcing them (as if it were possible) to distribute semantic data is only the start of the battle.

> > If Lexis-Nexis publishes its terabytes of data in a proprietary
> > document type, it might as well be Greek. HTML is more useful
> > because I can at least display it.
>
> This is patently false. All you need is a stylesheet, which it would
> be Lexis-Nexis' responsibility to provide you with if they wanted to
> let you display it (if you are arguing for HTML then obviously they
> want you to be able to display it).

So what you are saying is that you need the information owner's help in understanding the information. That's what I'm saying also. Just getting it on the Web is not useful. Information owners need to *want* to build the semantic web so that they can help us interpret their data.

> > Guessing at the structure of a document type from element type names
> > is as dangerous as guessing based on text content like colons and
> > font sizes. If you want the semantic web to be robust, you need
> > people to WANT to publish semantic data in *standardized document
> > types*. Even if we could force them to publish in semantic but
> > non-standard document types we would be no farther ahead!
> >
> > Trees: XSL being used to destroy semantic information.
> >
> > Forest: The hard work of building robust information systems that
> > will even *allow* us to share semantics meaningfully.
>
> This argument amounts to a throwing of the hands up in the air and
> saying, "It's just too hard. We shouldn't even try!"

No it isn't. Please read what I wrote above. Where did I say that we shouldn't try to build a semantic web?
If anything, I said that we shouldn't try to *force organizations* that for some reason do not want to participate into doing so. Not only is it impossible and ill-conceived, it is just plain wrong from an economic and moral point of view.

> Moreover, the example you give could easily be handled by separating
> the formatting parts of the transformation side into two stages, the
> non-formatting-related aspects at the server, and the formatting
> aspects at the client. One might argue that this blurs the
> distinction I am trying to make, but you obviously had no trouble
> categorically asserting that footers are a formatting construct. Is
> it ever really that difficult to discriminate between the two
> concepts?

Sure. Where do you insert boilerplate text? Is that formatting or transformation? In CSS it is formatting (since CSS doesn't do transformation) and in XSL it is transformation (since XSL formatting objects don't have prefixes).

Where do you label something as being a block or inline? In CSS it is formatting. In XSL it is transformation.

Where do you re-order the figure and the figure's caption? In some style languages (not CSS) this is possible without a transformation. In XSL it is a transformation.

Where do you fetch the text from the other end of a cross-reference and stick it in the current location? In some style languages that is just a declaration in a simple style language. In others it is a transformation.

If the only purpose of any transformation is for human display I call it formatting, no matter how sophisticated or complex it is. If you have some better distinction between formatting and transformation I would love to hear it.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

[Woody Allen on Hollywood in "Annie Hall"]
Annie: "It's so clean down here."
Woody: "That's because they don't throw their garbage away. They make it into television shows."

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@i... Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1 To (un)subscribe, mailto:majordomo@i... the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@i... the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@i...)