Re: XML aggregation question?

  • From: Robert Koberg <rob@xxxxxxxxxx>
  • To: xml-dev@xxxxxxxxxxxxx
  • Date: Sun, 27 Aug 2006 11:30:50 -0400

>>>> So I have been sticking with the filesystem,
>>> Yes - I am a great believer in the filesystem as a simple database engine.
>>>
>>>> Apache's Lucene for indexing
>>> Yes
>>>
>>>>  and CVS or Subversion for version control.
>>> Yes
>>  From the code point of view, it works so nicely. Easy to develop on a 
>> local machine just by doing a server commit and local update. Easy to 
>> change on the server by doing a local commit and a server update (and 
>> probably an Ant build). Same for content/metadata if necessary to change 
>> in different locations -- just need to run a lucene update to keep the 
>> UI experience in sync.
> 
> Well, herein lies the crux of Mike's comments about if you don't start
> with a DB that's likely what you'll end up building to some degree.  

I guess 'to some degree' is the main qualifier. Let's look at it:

- filesystem: this is perhaps the simplest of them all and probably does 
not need comment.
----
- Lucene: this library is one of the cleanest and easiest to 
use/implement. If you are dealing with XML (i.e. not wanting to index 
tag/attribute names), though, you might need to write some code (SAX 
preferably or even some DOMish thing) to handle indexing appropriately. 
This can be used to tune the index as well. For example, you have some 
description-type fields that allow em(phasis) and strong (emphasis) - 
you can weight those terms/phrases more heavily than the 
not-applicable-to-search and/or phatic text. Can you even do this type 
of thing in an XML DB or RDB? (unless you use something like Lucene?)
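
As a rough illustration of the SAX idea above, here is a minimal sketch in Python (standard library only, not Lucene itself). It collects words from character data only, so tag and attribute names never reach the index, and gives words inside em/strong a higher weight. The class name, the function name and the boost values are all made up for the example; in a real setup the resulting weights would be handed to whatever boost mechanism your Lucene version offers.

```python
import xml.sax
from collections import Counter


class WeightedTextHandler(xml.sax.ContentHandler):
    """Collect words from character data, weighting emphasized text more."""

    # Hypothetical boosts for illustration; tune them for your own content.
    BOOSTS = {"em": 2.0, "strong": 3.0}

    def __init__(self):
        super().__init__()
        self.stack = [1.0]        # weight in effect at the current depth
        self.buffer = []          # character data seen since the last tag
        self.weights = Counter()  # word -> accumulated weight

    def _flush(self):
        # SAX may deliver text in several chunks, so join before splitting.
        text = "".join(self.buffer)
        self.buffer = []
        for raw in text.lower().split():
            word = raw.strip(".,;:!?()\"'")
            if word:
                self.weights[word] += self.stack[-1]

    def startElement(self, name, attrs):
        self._flush()
        # Nested emphasis keeps the larger of the inherited and local boost.
        self.stack.append(max(self.stack[-1], self.BOOSTS.get(name, 1.0)))

    def endElement(self, name):
        self._flush()
        self.stack.pop()

    def characters(self, content):
        self.buffer.append(content)


def weighted_terms(xml_text):
    """Return a Counter mapping each word to its accumulated weight."""
    handler = WeightedTextHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.weights


sample = ('<description class="intro">A <strong>fast</strong> parser '
          'with <em>streaming</em> support</description>')
weights = weighted_terms(sample)
```

With the sample above, "fast" ends up weighted higher than "streaming", which in turn is weighted higher than the plain words, and neither "description" nor "class" appears in the result.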

You will also want to determine whether you store the indexed data in 
the index (for quick/easy retrieval) or store references to pick back up 
from the filesystem. (this is where an XML DB has its most charm to me)
----
- CVS/Subversion (oh, and when I say subversion, I mean using the 
filesystem backend, not the Berkeley DB backend (licensing again)): this 
is pretty much a no-brainer as well (I assume?). This allows you to 
check out anywhere you give access and instantly create a work, QA, 
gold-master or runtime environment.

If you need something like version control in your app it will be much 
easier with something that is filesystem based rather than held in some 
binary.
------------------------
Since this is an XML list, I will assume we are talking about XML. I will 
also assume the main benefit of a DB is transactions (I don't think 
eXist has that yet). But that can be handled by code at the level of a 
well-formedness check and a validity check.
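
To make the "handled by code" point concrete, here is a minimal sketch in Python of a check-then-commit write to the filesystem. It assumes a hypothetical commit_xml helper and a caller-supplied is_valid function standing in for real DTD/schema validation (the Python standard library has no validating parser). It only covers a single file at a time, so it is nowhere near a full multi-document transaction.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def commit_xml(path, xml_text, is_valid=lambda root: True):
    """Write xml_text to path only if it parses and passes is_valid.

    Returns True when the file was replaced, False when it was rejected.
    """
    try:
        root = ET.fromstring(xml_text)   # well-formedness check
    except ET.ParseError:
        return False
    if not is_valid(root):               # stand-in for a validity check
        return False

    # Write to a temp file in the same directory, then rename it over the
    # target, so readers never see a half-written document.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(xml_text)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return True
```

A rejected document leaves the previous version on disk untouched, which is roughly the rollback behaviour you would want before handing the file to Subversion and the Lucene update.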

How does using a (XML)DB from the start compare?

best,
-Rob


> If
> I was managing "real" documents or providing more of a content
> management system, this would more than likely work.  However, I want to
> be able to "slice and dice" my XML instances to provide different views
> or ways of accessing the instances based on values of specific
> attributes or elements.
> 
> As I said originally, if I didn't care about wanting to keep the data in
> XML as the "native" format of the system for easy editing by hand (in
> particular for me, using vi on Linux) as well as providing more GUI
> based view/edit capabilities via a Web-based interface (most probably
> using XForms), I'd just forget about the XML aspect and it would be a
> "traditional" RDBMS application.
> 
> Based on all the comments thus far as well as reading some of the
> articles/documentation on eXist, it would seem that an XML database is
> really the only viable choice if I want to keep my data as XML and still
> provide aggregated views across the instances based on values of
> attributes (or other expressions using XPath and/or XQuery).
> 
> If I went with the "traditional" RDBMS approach, I'd be spending most of
> my application's CPU cycles going to and from XML, so the benefits of
> being able to use SQL to pull the list of instances really doesn't seem
> worth it.  At the moment, I'm leaning towards trying eXist to see how
> well it'll work for what I want to do.
> 
> Again, thanks for all the discussion so far.  There's likely to be some
> additional comments during the week, so my decision's far from set in
> stone yet.
> 
> Cheers,
> 
> ast


