Industrial Strength XML Serving
I'm venturing this question as a general call for input--and pitches--with regard to the following project we're undertaking:

750,000 pages of journals, in both text form and GIF images for "canonical preservation" and cross-checking.

A typed text version in XML (largely using TEI), yielding ~400,000,000 words (our initial estimates suggest something in the range of 30-50 gigs of total content, including GIFs), averaged to ~60,000,000 tag nodes, searchable by the content of tags (word strings), element hierarchy, and attribute values, with the final form changing infrequently (archival/institutional memory).

The primary access point will be MARC records, which we're rendering into highly granular XML for crosswalking to DC/RDF/GILS (we're starting with some 200 megs of MARC records alone).

I've been asking offlist for possible consultants, as our systems staff has a strong inclination toward Oracle 8i and I'm hardly fluent enough in such software to argue the point based on what I know. Based on Oracle's white paper, it sounds viable . . . however:

In some of my offlist correspondence, I've detected a dichotomy. On one side is the view that "it doesn't matter if it's XML, pizzas, or washing machines you're storing, it's the size that counts (no pun intended)" -- so Oracle's great. On the other side is a sense that 8i's newness makes it a potential unknown for XML at this size (we'll also likely be subcontracting the serving of the GIFs, likely out-of-state). The implication was that there are more SGML/XML-native packages out there if we have the budget (we do, within the limits that, say, commissioning a whole new software package is out of the question). :)

Our project is perhaps one of the best funded efforts in the humanities in markup for some time, and surely in a class by itself viz. XML. As it's likely to be a model or case study in various senses, I really want to be sure we commit to the "right" road on this, and be sure of our options along that road.
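To make the three search requirements above concrete (tag content, element hierarchy, attribute values), here is a minimal sketch using Python's standard xml.etree.ElementTree. The sample fragment, its element and attribute names, and the helper function names are all assumptions made up for illustration; they are not the project's actual TEI encoding or any vendor's API.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment; names and text are illustrative only.
SAMPLE = """
<TEI>
  <text>
    <body>
      <div type="article" n="1">
        <head>First Sample Heading</head>
        <p>The journal text would appear here.</p>
      </div>
      <div type="review" n="2">
        <head>Second Sample Heading</head>
        <p>A short review of recent journal literature.</p>
      </div>
    </body>
  </text>
</TEI>
"""


def search_by_content(root, phrase):
    """Return elements whose own text contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [el for el in root.iter() if el.text and needle in el.text.lower()]


def search_by_hierarchy(root, path):
    """Return elements matching a path of nested element names."""
    return root.findall(path)


def search_by_attribute(root, tag, name, value):
    """Return elements with the given tag whose attribute equals value."""
    return root.findall(f".//{tag}[@{name}='{value}']")


root = ET.fromstring(SAMPLE)
content_hits = search_by_content(root, "journal")
hierarchy_hits = search_by_hierarchy(root, "./text/body/div/head")
attribute_hits = search_by_attribute(root, "div", "type", "review")

print([el.tag for el in content_hits])
print([el.text for el in hierarchy_hits])
print([el.get("n") for el in attribute_hits])
```

This parses the whole document into memory, which would not be practical at 30-50 gigs; it only shows the kinds of queries any candidate system, Oracle 8i or an SGML/XML-native package, would need to answer from an index.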
The vision I'm implementing from the XML side is meant to go beyond another research resource to a full-scale research environment which exploits XSLT to make our material--e.g., the MARC--accessible in multiple tag vocabularies (DC, RDF, GILS, etc.), as well as very sophisticated construction of the resources found through the search (e.g., with DOM, etc.).

At any rate, this question in no way obviates my offlist inquiries for a consultant, nor the input received thus far. Instead, since the vichyssoise is not yet ready to be stirred, nor even on the stove, all chefs are needed--if there is a better mousetrap to be made without a reinvention of the wheel, now's the time to know.

TYIA,
jr

=-=-=-=-=-=-=-=-=-==-=-=-=
John Robert Gardner, Ph.D.
XML Engineer
ATLA-CERTR
------------------------------------------------------------
http://vedavid.org/

xml-dev: A list for W3C XML Developers.
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1