> Date: Friday, 13 September 2013 6:49 AM
>
>> On 12 Sep 2013, at 19:47, David Lee wrote:
>>
>> In my experience, ALL Large XML files are really collections of
>> smaller files.
>> I have never seen a single XML document of any large size that
>> isn't simply
>> <root>
>> <row> document 1 .... </row>
>> ..... 10 bizillion times
>> </root>
>
> That's certainly a very common pattern, but I've seen a few examples
> that don't quite fit it. For example, a database dump of 50 tables
> each of which fits the above pattern. Or GIS data consisting of large
> numbers of objects of a wide variety of different kinds. What does
> seem to be true is that as files get larger, it's rare for the
> hierarchy to get deeper.

I agree with that and wanted to share a brief note on our experience,
dealing primarily with XML that is to be printed in some format.

While XML for things like parts catalogues can get quite large, it
tends to follow the pattern of repeating sets of data. Some of the
larger XML documents we deal with (which are not "database dumps") tend
to be lengthy pieces of legislation.

While legislation can be broken down into provisions and so on, there
are still enough cross-references and relationships within the content
to make it tricky to break up into standalone components.

Having said that, I don't think I've seen a single piece of legislation
(e.g. a Bill or Act) exceed 100MB in XML document size.
-Gareth