
RE: Poll (was: Seeking advice on handling large industry-standard XML da

  • To: 'Jeff Rafter' <jeffrafter@d...>
  • Subject: RE: Poll (was: Seeking advice on handling large industry-standard XML data models)
  • From: Jeff Lowery <jlowery@s...>
  • Date: Tue, 14 Jan 2003 15:41:16 -0800
  • Cc: xml-dev@l...

Thanks, Jeff. This was a good response. 

> -----Original Message-----
> From: Jeff Rafter [mailto:jeffrafter@d...]

> I am working in what is probably considered a much smaller shop-- we are
> handling upwards of 100,000 page hits a month-- which may be small or large
> depending on your viewpoint.

I expect our volume will be pretty light in comparison. Efficiency of
development is my primary concern of the moment, since the spec runs at 470+
dense pages right now.  Of course, volume will grow... but I think _most_ of
the processes in this data flow will remain low volume/moderate content. 

> We have written our system as a hybrid of data-binding/serialization and
> XSLT templates.

When you say data binding, are you using generated code or handwritten?  

> For the interchange aspect any data we get (in XML) is
> transformed to our serialization format and then loaded into our structure.

Are the structures: 
a) very different?
b) very large? (you answer this one later, thx)

> I was working in an environment where everyone didn't have quite the same
> level of gung-ho-ness for XML. This was a good mix. So most of the back-end
> is native object structures (including business rules). But whenever we need
> to interchange or output to the web, we serialize the objects and transform
> to HTML (or fill in the blank...).

I certainly don't want to touch native object structures. Those are well and
good from the application's standpoint.  
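
To make the "serialize the objects" step concrete, here is a minimal sketch in Python, assuming a flat native object. The Customer class and its fields are hypothetical and are not taken from either system discussed in this thread. The XSLT step that would turn the result into HTML is left out, since it needs an XSLT processor outside the standard library.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Customer:
    """Hypothetical native object; stands in for a back-end structure."""
    name: str
    city: str


def to_xml(obj):
    """Serialize a flat dataclass instance to an element, one child per field."""
    elem = ET.Element(type(obj).__name__.lower())
    for field in fields(obj):
        ET.SubElement(elem, field.name).text = str(getattr(obj, field.name))
    return elem


if __name__ == "__main__":
    # The serialized form is what a stylesheet would take as its input.
    print(ET.tostring(to_xml(Customer("Ann", "Oslo")), encoding="unicode"))
```

The native object stays untouched here; only the serializer knows about XML, which is the separation described above.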

> It is a little bit... hmmm... overkill?... but it has actually helped a
> great deal. Being a smaller shop with a smaller customer base we have lots
> of revisions and customizations-- the XSLT step really gives us everything
> we need. We can rewrite the whole application, create custom views, even
> handle versioning without getting into really nasty code. In that way it is
> used as a View in an MVC paradigm. So we lose a little bit of performance
> (which could be improved by adding more memory or another server) but gain a
> lot of simplicity.

Yep, I can dig it.  When do XSLT solutions become too complex, though?  It's
basically a maintainability question.  And I don't know an alternative,
since any object representation of the interchange format will likely be
just as complex and hard to maintain (as will the cross-exchange code between
the two object models [native and interchange]).  I wonder if XQuery offers
something here?  I guess it's time to start digging into that Draft.

The main advantage I see to XSLT is that it avoids API-ness for what is
basically simple data manipulation. But that's an old record in this jukejoint.

> It also enforces the guys who are good at code and the back end to stick
> with it and stay away from the output... our data size probably will not
> compare to yours... our files are 60K-120K on average. In terms of real
> validation (a la XSD or DTD) we only do it, if necessary in the initial
> stages of an interface. After that we turn it off as most/all submissions
> are machine generated.

The XML files we generate right now are small (25k), but will surely become
very large as more information is incorporated. Especially true as new
domains come onboard. I'm trying to worry ahead of time (which of course
never works, but I can't help myself). 

What we could do is profile the larger spec into sub-specs, but as work
flows down the pipe you really can't subdivide the document model any
longer.  And the way the model is designed, it's pretty scattershot where
the data goes in the structures. 

The model is somewhat "baggy", also, in that there's quite a bit of type
information carried in attributes and many co-occurrence constraints in the
data.  Could use a bit of Schematron in the mix, although there's already
some appinfo annotations to handle what XML Schema can't.
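
For illustration, a co-occurrence constraint of the kind mentioned above (a type carried in an attribute decides which children must be present) can be checked outside the schema. The following is a minimal sketch in Python rather than Schematron, and every element name, attribute name, and rule in it is hypothetical, not drawn from the spec under discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: value of the "type" attribute -> child elements
# that must be present. None of these names come from the actual spec.
REQUIRED_CHILDREN = {
    "person": ["givenName", "familyName"],
    "company": ["legalName"],
}


def check_co_occurrence(xml_text):
    """Return a list of human-readable violations (empty if none)."""
    root = ET.fromstring(xml_text)
    violations = []
    for party in root.iter("party"):
        kind = party.get("type")
        if kind not in REQUIRED_CHILDREN:
            violations.append(f"party has unknown type {kind!r}")
            continue
        for child_name in REQUIRED_CHILDREN[kind]:
            if party.find(child_name) is None:
                violations.append(
                    f"party type={kind!r} is missing <{child_name}>"
                )
    return violations


SAMPLE = """<order>
  <party type="person"><givenName>Ann</givenName></party>
  <party type="company"><legalName>Acme</legalName></party>
</order>"""

if __name__ == "__main__":
    for problem in check_co_occurrence(SAMPLE):
        print(problem)
```

A Schematron rule would state the same thing declaratively as an assertion in the context of each party element; the table-driven form here only shows the shape of the check.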

> I am not sure if this is what you were looking for-- I can add more if you
> need/want it.

It's helpful. I've got you pigeonholed as a (3).  Gotta do best fit
approximation, ya know.



