RE: Repositories: From Argument to Architecture
Given the subject is now 'architecture', I hope the following comments are not deemed 'off topic'.

Simon St.Laurent wrote on 30 January 1999 18:10:

> Basically, it would mean that I could retrieve XML documents
> from it using HTTP, using familiar structures like URLs. I'd love
> to see support for XPointer queries on that same server, allowing
> me to pull out fragments, and another standardized query language
> (XQL or whatever) that would let me do more general searches.

I think this must move from 'nice to have' to 'must have'. If we are to implement the next generation of web applications, as opposed to just document management, then there must be hooks into the data at all levels.

For example, to quote from a magazine article - including it inline in your own article, with your own formatting - you should be able to do:

  http://www.mag.com/issue[num=65]/article[num=22]/para[id=7]

or whatever syntax becomes standardised. (We have implemented this already, using an XSL-style syntax for now, because it just looks neater (!) than some of the other syntaxes. I don't like the apparently procedural appearance of some of the other proposals - but we'll use whatever everyone else does, of course.)

Likewise, a portal-type site should be able to pull article information from us, allowing it to create links to the latest articles on our site without re-coding every week or month, e.g.:

  http://www.mag.com/issue[num=65]/article[type=promo]

Equally, a program should be able to pull figures from our company database, so that it can average them, chart them, or do whatever it wants:

  http://www.mag.com/company[ticker=MSFT]/economic[year=1998]/turnover

And finally, a subscription fulfilment house should be able to retrieve any address changes made by subscribers via the magazine site, and synchronise them with its own databases. No more trying to make two different databases talk directly to each other.
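Purely as an illustration of the bracketed-predicate syntax above - not of any standard, and with all names hypothetical - a resolver might split such a path into (element, predicate) steps like this:

```python
import re

# Parse a path like /issue[num=65]/article[num=22]/para[id=7] into a
# list of (element, {attribute: value}) steps. A sketch of the syntax
# discussed above only; real proposals (XPointer, XQL) differ.
STEP = re.compile(r"([A-Za-z_]\w*)(?:\[(\w+)=([^\]]+)\])?")

def parse_query_path(path):
    steps = []
    for part in path.strip("/").split("/"):
        m = STEP.fullmatch(part)
        if not m:
            raise ValueError("bad step: %r" % part)
        name, attr, value = m.groups()
        steps.append((name, {attr: value} if attr else {}))
    return steps

print(parse_query_path("/issue[num=65]/article[num=22]/para[id=7]"))
# → [('issue', {'num': '65'}), ('article', {'num': '22'}), ('para', {'id': '7'})]
```

A server could then walk these steps down its document tree, matching each element name and attribute predicate in turn.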
However, although we HAVE implemented all of this already, the only way we could get the data out fast enough was to use an indexing server on snapshots of the database. Not ideal, but OK for document-based projects where the output data does not have to change the moment the database changes.

> ... but at its foundation I'd like it to look like a vanilla
> Web server, whatever magic it's doing internally.

Definitely. We've done this as described above, but have an interesting issue in relation to pulling out information that requires formatting. Before, we used a dot extension:

  http://www.mag.com/issue/65/article/22.htm
  http://www.mag.com/issue/65/article/22.xml

But this looks 'wrong' in our new syntax:

  http://www.mag.com/issue[num=65]/article[num=22].htm
  http://www.mag.com/issue[num=65]/article[num=22].xml

One possibility is to say that the server has a number of roots:

  http://www.mag.com/xml/issue[num=65]/article[num=22]
  http://www.mag.com/html/issue[num=65]/article[num=22]

and perhaps others (XSL, and so on). I like this myself, because it starts to say that the server is some sort of data repository, rather than just a 'web server'. However, it's not really 'correct', because the article is at the same position in the tree regardless of how you output it.

This is an important issue for us at the moment, because we obviously cannot assume that everyone is using XML-aware browsers to view the site, so we have to merge XML and XSL on the server for older browsers. Maybe we should really have:

  xttp://www.mag.com/issue[num=65]/article[num=22]
  http://www.mag.com/issue[num=65]/article[num=22]

Who knows! Anyway, once all browsers are XML-aware, we will just export XML - all we then have to do is work out how to tell the browser in what way to display it, without embedding that information in the XML document in the database through an explicit link to an XSL stylesheet.
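A toy sketch of the 'number of roots' idea: the server peels off a leading format segment and serves the same node either raw or transformed. The function names and the stub back-ends are hypothetical - the real XML+XSL merge would be far more involved:

```python
def serve(path, get_node, to_html):
    """Dispatch /xml/... and /html/... to the same underlying node.

    get_node: resolves a query path to an XML fragment (a string here).
    to_html: stands in for the server-side XML+XSL merge for old browsers.
    """
    fmt, _, rest = path.strip("/").partition("/")
    if fmt not in ("xml", "html"):
        raise ValueError("unknown root: %r" % fmt)
    xml = get_node("/" + rest)          # same tree position either way
    return xml if fmt == "xml" else to_html(xml)

# Usage with stub back-ends:
node = lambda p: "<article num='22'>...</article>"
html = lambda x: "<html><body>%s</body></html>" % x
print(serve("/html/issue[num=65]/article[num=22]", node, html))
```

The point of the sketch is that the format prefix selects an output pipeline, not a different document - which is exactly why it feels 'incorrect' as an address.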
> The ability to modify and store document fragments would be
> a significant advance, making management and editing a heck
> of a lot simpler than it is now.

Exactly right. We actually use a web interface onto an object-like database, which allows you to drill down to any node in the tree. There's no uploading or downloading; you just edit the node (through a web browser). This brings its own problems with it, though, as I will try to explain.

To spell out the issues first, say we have something like:

  You live in <country id="USA">North America</country> and eat
  <animal>turkey</animal> at Thanksgiving.

and

  I live in <country id="UK">Blighty</country> and have a friend
  in <country id="TKY">Turkey</country>.

This gives us great search potential:

- you could just search for the word Turkey, and get both entries - animal and country
- you could search for the COUNTRY Turkey, and get only the second entry
- you could search for "Great Britain", and also find the second entry

To achieve the latter, you simply say things like:

  <country id="UK">Great Britain</country>
  <country id="UK">UK</country>
  <country id="UK">United Kingdom</country>
  <country id="UK">U.K.</country>
  <country id="UK">perfidious Albion</country>

and so on. A search for any of the strings inside the tag is then converted to a search for id="UK". (This is all 'pseudo-XML'; we actually use a more generalised link syntax.)

So, to return to the problem: at the moment we can only achieve this by the user actually typing these tags into the database. That's not a bad solution - and is a lot better than manipulating 350K files in a text editor - but what we really want is to be able to highlight a word or expression and then apply a tag from a list of available ones. In other words, to achieve what we really want, the user interface is going to be a major project in itself.
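The alias-to-id trick can be sketched very simply: index every string that ever appears inside a tag against that tag's type and id, then rewrite a text search as an id search. This is only an illustration of the idea - the data and all the names are made up, not our actual system:

```python
from collections import defaultdict

# Every tagged occurrence in the database: (tag type, id, visible text).
tagged = [
    ("country", "USA", "North America"),
    ("animal",  None,  "turkey"),
    ("country", "UK",  "Blighty"),
    ("country", "UK",  "Great Britain"),
    ("country", "UK",  "U.K."),
    ("country", "TKY", "Turkey"),
]

alias_index = defaultdict(set)          # text -> {(tag, id)}
for tag, ident, text in tagged:
    alias_index[text.lower()].add((tag, ident))

def search(term, tag=None):
    """Find the tagged entities a term refers to, optionally by type."""
    hits = alias_index.get(term.lower(), set())
    if tag:                             # e.g. only the COUNTRY Turkey
        hits = {h for h in hits if h[0] == tag}
    return sorted(hits, key=str)

print(search("turkey"))                 # animal and country entries
print(search("turkey", tag="country"))  # → [('country', 'TKY')]
print(search("great britain"))          # → [('country', 'UK')]
```

Once a search resolves to id="UK", every alias of the United Kingdom is found in one step, which is the whole point of typing the tags in.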
For example, we also want to be able to automate the tagging of certain obvious connections - especially useful for converting large quantities of legacy data.

> (I love making changes in 350K HTML files and FTPing
> them to their home again and again.)

As I said, thankfully we don't do that.

> Versioning and security would be great as well.

I don't think this is all that difficult. As far as security goes, our system has it on every node already. It's quite cute really, because two people can request the same document, and certain nodes can be denied to one and granted to the other, apparently presenting two different documents.

As to versioning, these issues are not new, and the technology is out there. Even with our relatively crude system, we could easily retain all historical versions of a node, and even apply labelling and commenting, as SourceSafe and PVCS do. Since we create our documents on the fly from the database, you could re-create any document from any point in time, and even search them. It would be more of a step for us to store these as deltas, but the expertise is around.

> The management layer is a whole other set of things to consider,
> and I think I'll let vendors ponder that, but again, I'd love to
> see it managed via the Web.

I agree. Our current interface is all in JavaScript, and doesn't need the DOM. It has a tree structure that allows you to navigate through the nodes in the database. All data is edited by opening a node, and new nodes can be added at certain points, depending on whether they are allowed.

An important next step is being able to work offline and then batch-submit changes, whether just a few nodes or Tim's gigabytes of documents. For that we will need to work out some tracking mechanism to see whether a node has been removed, altered, or whatever, but that isn't really so difficult, and may well just be a simple use of a syntax like XML-RPC.
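Per-node versioning of the kind described - full copies rather than deltas, with optional labels and comments - really is a small amount of machinery. A minimal sketch (class and method names entirely hypothetical):

```python
# Each node in the tree keeps its own history: every edit appends a
# full copy (not a delta), optionally labelled and commented, and any
# node can be read back "as of" an earlier version.
class NodeHistory:
    def __init__(self):
        self.versions = []                    # (text, label, comment)

    def save(self, text, label=None, comment=""):
        self.versions.append((text, label, comment))
        return len(self.versions) - 1         # version number

    def as_of(self, version=-1):
        """Return the node text at a given version (default: latest)."""
        return self.versions[version][0]

h = NodeHistory()
h.save("<para>first draft</para>")
h.save("<para>edited</para>", label="issue-65", comment="pre-press")
print(h.as_of())        # latest version
print(h.as_of(0))       # re-create the node as it was
```

Because documents are assembled on the fly from such nodes, re-creating a whole document from any point in time is just a matter of reading every node "as of" that moment.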
(I'm not trying to trivialise this stage; I know the software will have to, for example, respond in a reasonable way when someone tries to add data that might contain a node they have no rights to, conflicts must be resolved, and so on - but it isn't really the most baffling of tasks.)

The structure of the objects is also defined through this tool, but here, I think, is where we will need to do the most work. The ideal scenario is for there to be a very close relationship between the DTD and the storage structure. At the moment we can do it one way round - use the database structure to 'create' a DTD - which is handy, but what if we don't control the definition of the DTD? Just as you can 'import' your XML files, we want to 'import' other people's DTDs and, presto, have our database structure. And, more excitingly, there are certain types of changes to that DTD which could be immediately reflected by changes in the database. A dynamic database like that would be very useful.

> 'Repository-in-a-box' is what I'd call this ...

Mmm - snappy :-)

> A lot more standards have to settle before there's
> much chance of implementing such boxes

I don't know - I think we can already go a long way. We've already managed to alter our stuff easily to keep up with the changes in XSL, for example, and can't see much ahead that will throw us out, provided we plan carefully (and pay attention to this discussion forum, of course).

From our side the issues are more to do with performance and resilience - the same old issues we've always faced when building large distributed applications. In the short term we need to build on something like Microsoft Transaction Server, for example, to ensure that everything is industrial-strength. But that is an implementation question, not a theoretical one.

<aside> (This is perhaps really for another strand ...) As I've intimated, many of the problems we are addressing are not that new in software terms.
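To make the DTD-import idea concrete, here is a toy illustration only - far from a full DTD parser - of how each <!ELEMENT> declaration could become a record type whose fields are its child elements. The DTD content and all names are invented for the example:

```python
import re

# 'Import' a DTD to get a storage structure: each
# <!ELEMENT name (child1, child2)> declaration becomes a record type
# whose fields are its children. Occurrence markers (?, *, +) and
# #PCDATA are stripped; mixed content, attributes etc. are ignored.
dtd = """
<!ELEMENT article (title, para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
"""

ELEMENT = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)\s*>")

def schema_from_dtd(text):
    schema = {}
    for name, model in ELEMENT.findall(text):
        children = [c.strip(" ?*+") for c in model.split(",")]
        schema[name] = [c for c in children if c != "#PCDATA"]
    return schema

print(schema_from_dtd(dtd))
# → {'article': ['title', 'para'], 'title': [], 'para': []}
```

Run against someone else's DTD, the resulting schema would seed the database structure; diffing two such schemas is one way the 'dynamic database' could detect which DTD changes to apply automatically.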
There are, however, some interesting conceptual issues that do need resolving, which I feel genuinely are new (though even these may be old hat in the SGML world - I know nothing of that, I'm afraid, so I apologise).

For example, the search issues I referred to above present the need for a different type of search engine. Most of the XML search examples I have read would find the country Turkey by:

  "find Turkey within a country tag"

In my example above, though, I would want to search for Turkey, and then see a list that says Country and Animal. I then choose Country, and see all articles that are about Turkey, the country.

We have taken a simple step towards this by having 'search for country', 'search for person', and 'search for industry' pages. They all cross-reference each other, so searching for 'Bill Gates' will find articles that mention him, his individual profile, and Microsoft, because the latter contains an entry for him as CEO. But longer term we want a user-interface model that allows the user to start right at the top, not knowing what 'objects' we have available. (Imagine searching for Gates in a normal search engine, and the first one hundred entries are about gardening and fence suppliers. Our way round, the first search results would be the categories available, not the actual web pages, and so Gates the person would be clearly visible.)

But to make this user interface more usable, I think that DTDs, or XSchema, or whatever, might need extending to make the search results more meaningful. For example, say we had:

  <country><name>Turkey</name></country>

We don't necessarily want a search for Turkey to show:

  Turkey + NAME + COUNTRY + ANIMAL

when the following is far more meaningful:

  Turkey + COUNTRY + ANIMAL

The results of these tags would be even less clear to a user:

  <ctry><nm>Turkey</nm></ctry>

And if the user of the search engine was French, wouldn't we want the available objects to be shown in French?
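The 'categories first' search above can be sketched in a few lines: a query first returns the matching object types - with display names that could come from schema annotations, per locale - and only then, once the user picks a type, the actual documents. The index, labels, and ids below are all invented for the illustration:

```python
# Inverted index: term -> {object type -> matching documents}.
index = {
    "turkey": {"country": ["art-22"], "animal": ["art-7"]},
    "gates":  {"person":  ["profile-1"], "company": ["msft"]},
}
# Human-readable, per-locale labels for the raw tag names - the kind
# of information a DTD or XSchema extension might one day carry.
labels = {
    "en": {"country": "Country", "animal": "Animal",
           "person": "Person", "company": "Company"},
    "fr": {"country": "Pays", "animal": "Animal",
           "person": "Personne", "company": "Société"},
}

def categories(term, locale="en"):
    """Stage one: show the matching object types, not the pages."""
    return sorted(labels[locale][t] for t in index.get(term.lower(), {}))

def documents(term, category):
    """Stage two: the user picks a category, then sees the articles."""
    return index.get(term.lower(), {}).get(category, [])

print(categories("Turkey"))        # → ['Animal', 'Country']
print(categories("Turkey", "fr"))  # → ['Animal', 'Pays']
print(documents("Turkey", "country"))
```

Note how the French user sees 'Pays', not 'country' or 'ctry' - which is exactly why the raw tag names alone are not enough for the search results.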
Anyway, you get the point; I think DTDs themselves might need to carry some more information, or there may need to be some XSchema-type standard to handle this. </aside>

> what it would take to create such a beast
> and make it a commodity product

Less than everyone seems to think. To summarise, I think there is a lot of mileage in merging the right existing technologies together, rather than starting completely from scratch. There are a lot of developments out there that, when put together, get far closer to what you are after than may be obvious at first sight.

Regards,

Mark Birbeck
Managing Director
Intra Extra Digital Ltd.
39 Whitfield Street
London W1P 5RE
w: http://www.iedigital.net/
t: 0171 681 4135
e: Mark.Birbeck@i...

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)