Re: MarkMail: now archiving xml-dev
Quoting Elliotte Rusty Harold <elharo@m...>:

> Jason Hunter wrote:
>
> > I think the reason you *don't* see that is the inherent risk of letting
> > someone else run arbitrary code on your server. What if the user starts
> > calculating Pi to 1,000,000,000 digits?

You don't need to let outsiders run "arbitrary" code.

>
> Perhaps we shouldn't have made XQuery Turing complete? (Side note: I'm
> pretty sure XQuery is Turing complete. Has anyone proved it yet?)

Let's not even talk about XQuery. Do we talk about SQL in systems that have SQL back ends? Normally the functionality is wrapped in other functions and interfaces. Heck, these days it seems most Java "programmers" could not even write a line of SQL if they had to (they'd argue, of course, that they don't).

We must also be clear that XQuery is not "XQuery", especially in the context of information retrieval (e.g. Full-Text). Beyond the observation that SQL too is not "SQL", it would be foolish to promiscuously expose one's RDBMS, despite all the user controls, to every Tom, Dick and Harry. One could design an XQuery scripting extension that would be "safer" for anonymous use (keep in mind that what looks "safe" is not always "safe" from malicious users and bug exploits), but why bother? What's the benefit? Functionality? This, I suggest, could be exposed via other means.

One of my own personal interests is to explore how one can expose the information functionality (the will to retrieve "relevant" bits of information) in the most naive and transparent manner. Since we have a completely flexible unit of retrieval (not bound by "record" or any other unit defined at index time), and the user might not understand or know the details of the structural mark-up used to encode the information, we need to figure out interfaces that get users the information that's relevant to them.
Since the problem is not typically "individualistic" (there are classes of common responses), one should be able to make do without user scripting.

The email archive case is really much, much easier, since the user knows not only much of the structure (subject, sender, etc. in the header; lines, sentences, paragraphs and perhaps some attachments in the message body) but also the semantic rules for the content. "Relevant" retrieval objects are nearly always the message in the context of its thread and its temporal context (other messages that appeared on the list). The only hard bit is to figure out what belongs in a thread: we have Message-IDs, but not always, and we have changing subject lines.

> > What if they start consuming
> > disk or thrashing the disk IO? When you query against hundreds of gigs
> > of content, you don't have to be malicious to mess things up.

It's not 100s of GB. Mailing lists are not that large.

>
> Or for a less constrained approach, try Amazon EC2. Run any code you
> like on their servers.

That's what virtual machines, zones and some other bits and concepts are about. It's not, however, needed, I think, for doing IR on XML. A lot of the functionality of XQuery (holding back from talking about XQuery) is not about the act of searching or retrieving information but about doing things to it. A lot of this "functionality" need not be performed by the "in-the-know" server.

>
> Yes, it's challenging; but I suspect there's a real business model in
> there somewhere. :-)

--
E. Zimmermann, BSn/Munich R&D Unit
Leopoldstrasse 53-55, D-80802 Munich,
Federal Republic of Germany
http://www.nonmonotonic.net