Re: Merging two sets of files

Subject: Re: Merging two sets of files
From: Emmanuel Bégué <medusis@xxxxxxxxx>
Date: Tue, 3 Apr 2012 11:50:00 +0200

Do I understand your requirements correctly? You need to output:
- a new version of book.xml with the associated catalog information from
the drug database
- standalone documents from book.xml for topics that don't have
corresponding drug info
- a new version of each of the 10,000 drug files for which information
can be found in book.xml (not sure about that last part -- requirements
3 and 4?)

I would build two temporary documents:
- a new book.xml with all associated catalog information
- a new big catalog file (from all the relevant little drug files),
with all associated book information

and then in a second pass, cut those big documents to output the
required result files as needed.
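
As a rough sketch of the first pass for the catalog side (not a tested solution): every element and attribute name below is an assumption, since the thread never shows the real markup. It supposes that lookup.xml looks like <lookup><entry book-id="..." catalogue-id="..."/></lookup>, that drug records in book.xml are <topic id="..."> elements, and that the little drug files live in a catalogue/ directory next to the stylesheet. Grouping the lookup entries by catalogue id is what deals with requirement 4: all book records that share a catalogue id end up under the same wrapper element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- pass1.xsl (hypothetical name): run against lookup.xml.
     All element/attribute names are assumptions for illustration. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- book.xml is resolved relative to this stylesheet's location -->
  <xsl:variable name="book" select="document('book.xml')"/>

  <!-- Index the (assumed) drug topics in book.xml by their id -->
  <xsl:key name="topic-by-id" match="topic" use="@id"/>

  <xsl:template match="/lookup">
    <merged>
      <!-- One group per catalogue id, however many book records map to it -->
      <xsl:for-each-group select="entry" group-by="@catalogue-id">
        <catalogue-file id="{current-grouping-key()}">
          <!-- Content of the auto-generated drug file for this id -->
          <xsl:copy-of select="document(concat('catalogue/',
                                 current-grouping-key(), '.xml'))/*"/>
          <!-- Every book.xml record that the lookup table maps to this id -->
          <xsl:copy-of select="key('topic-by-id',
                                 current-group()/@book-id, $book)"/>
        </catalogue-file>
      </xsl:for-each-group>
    </merged>
  </xsl:template>

</xsl:stylesheet>
```

The real stylesheet would convert the book records to DITA instead of copying them, and would place them inside the catalogue markup rather than beside it; the sketch only shows the grouping step. The book.xml side (records with no lookup entry) would be built the same way, in a second temporary document.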

One way to do two passes with Saxon is to use the saxon:next-in-chain
extension attribute (holding the intermediate result in variables is
really painful).
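
For illustration only, this is roughly how the chaining is declared. saxon:next-in-chain is a Saxon extension attribute on xsl:output whose value is the URI of the stylesheet that receives this stylesheet's result as its input. The file name split.xsl, the <merged> and <catalogue-file> wrapper elements and the out/ directory are assumptions, and it is worth checking that your Saxon edition supports extension attributes before relying on this.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- First-pass stylesheet: its principal result is not written to disk
     but passed to split.xsl (hypothetical name) as the source document. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <xsl:output method="xml" saxon:next-in-chain="split.xsl"/>

  <!-- ... templates that build the big temporary document ... -->

</xsl:stylesheet>
```

The second pass can then cut the temporary document into files with xsl:result-document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- split.xsl (hypothetical): assumes the first pass produced
     <merged><catalogue-file id="...">...</catalogue-file>...</merged> -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/merged">
    <xsl:for-each select="catalogue-file">
      <!-- One output file per catalogue id, written exactly once -->
      <xsl:result-document href="out/{@id}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because each result file is written only once, in the second pass, the problem of reopening a document created earlier in the same transform does not arise.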

Hope this helps.
Regards,
EB


2012/4/3 Emma Burrows <Emma.Burrows@xxxxxxxxxxx>:
> I'm currently using XSLT 2.0 (using Saxon 9.3 via Oxygen 12) to merge two
> sets of XML files together based on a third file which is a kind of lookup
> table. However, I'm coming across a problem when I need to effectively merge
> two source files into the same output file, and I need some suggestions on a
> change of approach.
>
> I have the following XML files:
>
> - Main document - let's call it book.xml
> This contains various types of topics, including about 4000 topics related
> to drugs, each identified by a unique id.
>
> - Ancillary drug information files auto-generated from an online drug
> database.
> These are about 10,000 little XML files, each named after the unique id of
> the drug information in the online catalogue.
>
> - An XML file - let's call it lookup.xml - that is essentially a look-up
> table, matching ids in book.xml to one or more drug catalogue ids, and vice
> versa. However, not all records in book.xml have an entry in lookup.xml.
>
> Now my requirement is to convert book.xml from its current proprietary
> format into a DITA-based specialisation, and while I'm doing that:
>
> 1- Output the records with no corresponding catalogue entry as standalone
> documents.
>
> 2- Merge each drug record in book.xml that has catalogue entries with the
> corresponding auto-generated catalogue file(s), based on lookup.xml.
>
> 3- If a record in book.xml has more than one catalogue id in lookup.xml, I
> need to copy the book.xml record into every one of the corresponding
> auto-generated files.
>
> 4- If more than one record in book.xml corresponds to one catalogue id in
> lookup.xml, I need to merge all the book.xml records with that same catalogue
> file.
>
> 5- Make sure the converted and merged files are referenced in the correct
> location in the book's hierarchy.
>
> I expect we'll ultimately do something more sensible like use conref rather
> than tamper with the auto-generated files, but merging them is my current
> brief as it stands.
>
> Point 4 is the immediate stumbling block because my solution to fulfilling
> points 2 and 3 was as follows:
>
> 1. Convert the book.xml drug record into the desired DITA format and place
> that in a variable.
> I'm doing this based on a matched template, so this happens whenever the
> processor "encounters" a drug record as it travels book.xml. This ensures that
> I can export records with no catalogue id and keep track of where the record
> was in the hierarchy.
>
> 2. Use the lookup.xml file to find corresponding catalogue ids for that
> record.
>
> 3. For each catalogue id, open the corresponding catalogue file using
> document(), and result-document it to a new file with the contents of the
> variable inserted in the XML.
>
> The problem is that in step 3, I can't reopen a document that was previously
> created by the transform, so I can't "add" a new book.xml record to the
> contents of an already generated catalogue file, even by outputting a new file
> with a different name.
>
> I can see that I'll probably need a process with an intermediate step,
> perhaps using lookup.xml to guide the processing so I can group records with
> the same catalogue id. But the only trouble with that is what to do with
> records that don't appear in lookup.xml...
>
> Anyway, I hope all this is clear and I'm open to ideas. :)
