XML-DEV Mailing List Archive
Re: The triples datamodel -- was Re: SemanticWeb per
On Tue, 2004-06-08 at 09:15, bry@i... wrote:
> >
> > Imagine a document formatting system that just ignores unknown tagging,
> > the way Elliott proposes. Now imagine that an author invents a new
> > admonition tag for a particular market. (The U.S. and Japan have special
> > requirements, so European manuals published in those countries would
> > need some way to distinguish admonitions that must be processed
> > differently than in other countries.) As a result, the market specific
> > part of the document will either be omitted entirely (worst case)
>
> I don't get this, this has always seemed to me to be a strength not a weakness.
> The market specific part of the document will either be omitted entirely in
> markets to which it is not relevant right? I think what you're arguing here is

Only if the software processing it knows what to omit, and when. If the market specific markup is created by an author who has little or no connection with the software developers, how can the software know how to process the new markup correctly?

> about interop, that is to say I have extended a standard with market relevant
> information for my application, it works great in my application, if another

No. What Elliott argues is that it should be generally true that if I extend someone's standard, yours, for example, then it is up to your software to figure out what I meant. For example, if you, the markup and systems designer, have markup like this:

    <name>Henrik Mårtensson</name>

and I, a technical author, extend that:

    <name nationality="Swedish">
      <firstname>Dag</firstname>
      <middlename>Henrik</middlename>
      <lastname>Mårtensson</lastname>
    </name>

now it is up to your software to figure out that even though there are now three names, it is only appropriate to use two of them in most situations. It should also figure out that since I am Swedish, it is likely that I do not want to be called by my first name, but by my middle one.
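
To make the problem concrete, here is a minimal Python sketch of a formatter written only for the original `<name>` markup. The function names are hypothetical, and the two policies shown (skip unknown child elements entirely, or drop the unknown tags but keep their text) are only assumptions about what "ignoring unknown tagging" could mean in practice, not anyone's actual implementation.

```python
import xml.etree.ElementTree as ET

# The markup the systems designer planned for.
ORIGINAL = "<name>Henrik Mårtensson</name>"

# The markup after an author extended it.
EXTENDED = (
    '<name nationality="Swedish">'
    "<firstname>Dag</firstname>"
    "<middlename>Henrik</middlename>"
    "<lastname>Mårtensson</lastname>"
    "</name>"
)


def format_name_skipping_unknown(xml_text):
    """Use only the text directly inside <name>; unknown children are skipped."""
    name = ET.fromstring(xml_text)
    return (name.text or "").strip()


def format_name_flattening(xml_text):
    """Drop unknown tags but keep all text content, in document order."""
    name = ET.fromstring(xml_text)
    return " ".join(part.strip() for part in name.itertext() if part.strip())


print(repr(format_name_skipping_unknown(ORIGINAL)))  # 'Henrik Mårtensson'
print(repr(format_name_skipping_unknown(EXTENDED)))  # ''
print(repr(format_name_flattening(EXTENDED)))        # 'Dag Henrik Mårtensson'
```

Neither policy gives the two-part name the author intended: the first loses the name altogether, the second prints all three parts. Neither looks at the `nationality` attribute, because nothing in the original design said it existed.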
It must also be able to figure out that if I do this:

    <name nationality="SE">
      <lastname>Mårtensson</lastname>
      <firstname>Dag</firstname>
      <middlename>Henrik</middlename>
    </name>

it is still a Swedish name, and when the name is formatted, the order of the names should be changed.

What I am arguing is that:

* it is not likely that anyone can foresee all possible variations and build software flexible enough to handle them
* even in those cases where it is possible, it is often not cost efficient

We are not just discussing automatically generated XML. We are also discussing XML documents authored by humans. Part of the argument is about how much unpredictability a human (or large groups of humans) will introduce if allowed to arbitrarily add new markup.

> application that knows nothing about my applications extensions needs to deal
> with the market relevant information then there can be a problem but I'm not
> sure for whom the problem actually is, is it for the people using my

Part of the problem is that there will be hundreds, maybe thousands, of people inventing new markup for every person who actually writes code to process that markup. Most corporations have many more authors than XML DMS developers. Also, the information receivers tend to be those who have to bear the costs of the markup extensions. (Extending markup can be used as a powerful weapon, as proven by Microsoft and others.)

Elliott argues that it is fairly easy to handle such situations at the developer end. I argue that it is very difficult.

> applications data? If it is data brought into my application from another source
> and then extended I would say no, go get the damn data from the original source
> not second hand. If it is data generated by my application then the question is,
> if you're gonna rely on a particular application why do you not find out if it
> extends things in any way.

The "application" is usually a human, in my part of the XML universe.
If allowed to, they will make their own extensions. In one case I was involved in, an author wrote his own DTD with more than 200 elements, and refused to use anything else. Of course, it wasn't possible to support his private DTD in all processing applications in the company, or even build a filter system just for him, so everything he wrote was essentially useless. His position was that everyone else (60,000+ people) ought to start using his DTD.

>
> Anyway I could go on with all sorts of ways that I think the problems are
> unclear, the thing is that this way of handling document extension has proven to
> be particularly useful for xml applications. I see papers and tutorials
> published all the time about it, one particular popular motif being 'extend x
> with rdf'!!! The argument is being made that this method of extension is so
> efficient in regards to other methods that it should be defaulted to. Your
> argument against it doesn't seem clear enough to me to drop a powerful method.

I am not arguing that anyone should. Different methods are suitable in different circumstances. RDF isn't necessarily much use when writing a user manual. This does not imply that RDF is useless, or that it is useless when publishing the manual, or even when searching for information to include in the manual. Far from it.

>
> >or
> > formatted the wrong way (best case), probably just like any other block
> > of text. Either way, if an accident happens, the company that publishes
> > the manual would be liable to pay damages.
>
>
> see when you're talking about doing something like publishing a manual then I
> think you're arguing about the organisation that has the data has extended the
> data with market relevant information and then they're too incompetent to change
> their publishing methods. If you can't change the handling of the data in your
> own applications then don't extend the data, that's my theory.

Not quite.
First of all, the thing I am against is arbitrary extensions of complex documents by individual authors. When an organisation extends, or changes, an XML schema, it is a bit different. (Well, not always...):

* An organisation has access to XML expertise. An author usually has very little or no training in XML development. (Most have little or no training in structured authoring, or any other kind of authoring. Writing well requires a lot of expertise, but companies skimp on this. In many cases it would be more efficient to train the writers than to build custom software, but no one wants to pay for training humans.)
* An organisation usually organises a project to determine what changes are necessary, and how to implement them as effectively as possible, with minimal disturbance to all processing systems. It also tries to identify those systems that will have to change, updates the systems, tests the new schemas, and times everything so that a changeover has minimal impact. An individual author usually just implements something, without knowing or caring about the overall impact on the systems involved.

Then of course, a key idea with XML is that if you mark up the information well, with descriptive tagging, then you can change the way the data is processed without changing the tagging. In most cases this is true. My experience is that well over two thirds of all markup change requests in corporate projects are unnecessary. The desired functionality can be implemented much more cheaply and efficiently without changing the markup. In many cases, the functionality already exists, it is just that the users get no training, so they do not know about it.

> >
> > With XML, all hell still breaks loose when the format is changed. XML is
> > no different from other formats in this respect.
>
>
> well sometimes when I see all hell break loose when the format is changed I find
> that is because the system was built without a clear idea of how format changes
> should be approached in a particular xml dialect, sometimes this is because the
> dialect itself has not clear idea how format changes should be approached. Most
> often it is because the developers have no idea that there is such a thing as
> different models for how to handle changes, one of which, the most common is the
> one under discussion, that model presupposes that unknown markup is ignored but
> subtrees of known markup are not ignored, that means that any extension to the
> system has to take that model into consideration. I consider that as something
> that should be blamed on the developer, I can see how we might argue that this
> is something that developers should not be blamed for because it is just too
> much to expect them to be able to take into consideration in their hectic
> work-schedule but given that it is something that I have learned to take into
> consideration and pay attention to when I build applications I don't feel like
> giving anyone else a pass on it (especially as I consider it to be something
> that makes application building easier with xml data).

When dealing with technical documents, there are basically two kinds of new markup that may appear in a document:

* Useless crud
* Markup that is present for a good reason

When there is a reason for having the new markup, it is always the same: the new markup is there to enable the processing systems to change their behavior in some way they could not otherwise do. (Or in some manner that would be very difficult without the markup.)

This means that important markup must not be ignored by processing systems. It is meant to have an impact on their behavior. Consequently, a processing system that ignores the markup (and possibly the content), will do something bad. (Or fail to do something good.)
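
A small Python sketch may help show the difference between silently ignoring new markup and treating it as an anomaly. Everything in it is invented for illustration: the element names (`manual`, `para`, `warning`, `us-warning`), the set of known elements, and the two function names are assumptions, not taken from any real schema or system.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of elements the processing system has rules for.
KNOWN_ELEMENTS = {"manual", "para", "warning"}

# Hypothetical manual in which an author added a market specific admonition.
DOC = """<manual>
  <para>Connect the power cable.</para>
  <warning>Disconnect power before opening the case.</warning>
  <us-warning>Risk of electric shock. Do not open.</us-warning>
</manual>"""


def render_ignoring_unknown(root):
    """Render known elements; silently skip anything else, content included."""
    lines = []
    for child in root:
        if child.tag == "para":
            lines.append(child.text.strip())
        elif child.tag == "warning":
            lines.append("WARNING: " + child.text.strip())
    return lines


def unknown_elements(root):
    """List every element the system has no rule for, so a human can review it."""
    return [el.tag for el in root.iter() if el.tag not in KNOWN_ELEMENTS]


root = ET.fromstring(DOC)
print(render_ignoring_unknown(root))
print(unknown_elements(root))  # ['us-warning']
```

With the first function the market specific admonition simply disappears from the output and nobody is told. The second function cannot decide whether the new element matters either, but it at least hands the question to a human.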
Bad ranges from trivial inconveniences to fortunes lost, to people killed, depending on the circumstances. Great monetary loss or people killed have been risks in every XML project I have ever worked in so from my perspective, this is normal. (Well, no, not my own hobby projects, though one or two nearly killed me...)

Useless crud could be safely ignored, of course, but everyone that gets their chance to leave a mark in the world through their very own XML tag sincerely believes that it is the one that will be the difference between making it or breaking it for their company, so how will a piece of software be able to differentiate? It can't. That is why it must treat so many anomalies as something that requires human attention.

As for blaming downstream developers for mistakes made by content authors upstream, I disagree. You are certainly right when you write that developers should make their software as flexible and robust as possible, but then again, many developers lack the training to do that. In my experience, only about one developer in five knows about more than the basics of object oriented programming, design patterns, TDD, refactoring, versioning systems, or any other techniques and tools that you and I may take for granted. You can't blame individual developers for this. They are adapted to the requirements of large corporations and consultancy firms, i.e. they are cheap. If they were more skilled, they would be more expensive, and would get fired. (They would also be effective enough to more than compensate for their higher wages, but I have never seen a customer factor that in for a "generic" programmer, only for a few experts.)

To make matters worse, having one good programmer on a team isn't enough. Competent project managers are even more rare. To produce good quality software, everyone must be well over average, and the team manager must be competent. This combination does not occur often. (In that particular market segment, that is.
There are of course companies that are chock full of competent developers. Somehow, they seem to be very rare in the XML documentation business. Salespeople with good golf handicaps are common though.)

Please note that I am not trying to prescribe to anyone else how to deal with their problems. I believe that different problems require different solutions. I don't believe that Elliott's techniques are bad. On the contrary, I believe he is very good at what he does, and uses techniques appropriate to the task. What I do not buy is that those techniques and strategies would necessarily be appropriate to the very different set of problems that I am dealing with. I do not put an overabundance of faith in golden hammers (including XML itself).

/Henrik