Re: Generic stylesheet to flatten XML hierarchy

Subject: Re: Generic stylesheet to flatten XML hierarchy
From: Sara Mitchell <samitchell6@xxxxxxxxx>
Date: Mon, 7 Dec 2009 10:49:01 -0800 (PST)

I know that this may not work in every case. Basically the rules are: 

* every attribute on an element becomes a column in a row
* every element that has data content becomes a column in a row
* repeating elements define a row -- with the further restriction that if
  there are hierarchical levels of repeating elements (nested), the final
  lowest level of repeating elements defines a row and ancestor levels get
  repeated
* hierarchical relationships get flattened
* siblings at any level that don't repeat get repeated in each row
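
The rules above can be sketched roughly as follows. This is only a minimal Python illustration of the rules as stated, not XSLT and not the stylesheet discussed in this thread. The function name `flatten`, the "element-attribute" column naming (modeled on the 'rss-attr1' name in the quoted example), the sample document, and the choice to multiply rows when two different repeating groups are siblings are all assumptions.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical input, invented for illustration only.
SAMPLE = """
<rss version="2.0">
  <channel>
    <language>en</language>
    <item id="1"><title>A</title></item>
    <item id="2"><title>B</title></item>
  </channel>
</rss>
"""


def flatten(elem):
    """Return a list of rows (dicts of column name -> value) for elem."""
    # Every attribute becomes a column named "<element>-<attribute>".
    own = {f"{elem.tag}-{name}": value for name, value in elem.attrib.items()}
    # Non-whitespace text content becomes a column named after the element.
    text = (elem.text or "").strip()
    if text:
        own[elem.tag] = text

    # Collect the rows contributed by all children that share a tag name.
    groups = {}
    for child in elem:
        groups.setdefault(child.tag, []).extend(flatten(child))

    rows = [own]
    for child_rows in groups.values():
        # A non-repeating child adds its columns to every row; a repeating
        # child multiplies the rows, so ancestor values are repeated.
        # Assumption: sibling groups that both repeat are cross-multiplied.
        rows = [{**row, **extra} for row, extra in product(rows, child_rows)]
    return rows


if __name__ == "__main__":
    for row in flatten(ET.fromstring(SAMPLE)):
        print(row)
```

For the hypothetical sample this yields two rows, one per item, each repeating the rss attribute and the non-repeating language value. Columns that share a name at different levels overwrite one another in this sketch, and mixed content is ignored.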

I'm going to try one last possible solution using keys and XPath, I
think, and if that does not work I may move on to Michael Kay's suggestion of
a meta-stylesheet. 

Thanks to everyone for the ideas.

--- On Fri, 12/4/09, C. M. Sperberg-McQueen <cmsmcq@xxxxxxxxxxxxxxxxx> wrote:

> From: C. M. Sperberg-McQueen <cmsmcq@xxxxxxxxxxxxxxxxx>
> Subject: Re: Generic stylesheet to flatten XML hierarchy
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Cc: "C. M. Sperberg-McQueen" <cmsmcq@xxxxxxxxxxxxxxxxx>
> Date: Friday, December 4, 2009, 6:35 PM
> On 4 Dec 2009, at 12:37 , Sara Mitchell wrote:
>
> > ...
> > 
> > With input like this:
> > <rss ...some attributes>
> >   ...
> > </rss>
> > 
> > I would like XML output like this:
> > 
> > <root>
> > <row>
> >  <rss-attr1>value</rss-attr1>
> > ...
> > </row>
> > <row>...again rss attributes, channel attributes, non-repeating
> > children of channel followed by fields for second item </row>
> > ...more rows ...
> > </root>
>
> I'm having trouble seeing exactly what should be going on here,
> because I can't see anything in your sample input (elided here
> without loss of generality) that gives rise to the name
> 'rss-attr1'.  It's hard to correlate input with output if
> all the values are spelled 'value' and some details in one
> half of the input / output pair correspond to ellipses in
> the other.
> 
> 
> 
> > 
> > This example is for a single level of repeating descendants, but my
> > solution has to be able to handle any level of repeating descendants.
> > Moreover, the stylesheet has no knowledge of the structure of the
> > input document.
> 
> My very strong gut reaction here is to suspect that such an
> absolutely generic transformation is unlikely to produce helpful
> (or: meaningful) output in some unknown but possibly large
> percentage of cases.
> 
> Perhaps the transformation you have in mind is intended to
> work generically on all XML documents that follow certain
> conventions in structuring the information they represent?
> Can you say what those conventions are?
> 
> Perhaps you have a very clear understanding of the transform you
> want, but so far this discussion has not elicited a clear
> description from you.  The following questions are intended to
> try to elicit some more clarity.
> 
> In a generic XML document, there are elements with parents,
> left and right siblings, children, descendants, and attributes.
> 
> In a generic table, there are rows and columns.  Each row but
> the first or last has a predecessor and a successor, and ditto
> each column but the first or last.
> 
> What is the relationship between the elements, attributes,
> containment and sibling relations in the input, and the
> rows and columns and their sequence relations in the output?
> 
> Given your output table, should I expect to have all the
> information present in the XML?  Can I recreate the XML from
> your table?
> 
> Do all your rows have the same number of columns?  (I suppose
> they must, or it's not much of a table, but perhaps I'd
> better check?)
> 
> When does an XML document give rise to a single row in the output
> table?  When does it give rise to exactly three rows?  When
> does the resulting table have exactly one column?
> 
> What information do the labels of columns convey?
> 
> What tables would you want to produce for the documents
> 
> (1) <e/>
> (2) <e><e n="23"/><e n="45">Pax</e></e>
> (3) <table>
>     <row a="1" b="2" c="34">998</row>
>     <row a="2" b="22" c="34">999</row>
>     <row a="3" b="2" c="3">1000</row>
>     <row a="4" b="24" c="">1001</row>
>     <row a="5" x="Viva Villa!" c="34">998</row>
>     </table>
> (4) <p>This isn't mixed content, because the schema says I'm a string.</p>
> 
> ?
> 
> 
> > 
> > I have a solution that works ok by traversing the input document in
> > doc order -- but it does not handle the siblings of repeating nodes
> > that are not themselves repeating.
> > 
> > I have thought of doing this the opposite way, get a key of all
> > repeating nodes and process only those at the lowest depth to
> > generate rows.  I haven't actually written the logic.
> 
> I gather that the tables you want to generate have something
> to do with multiple occurrences of elements with the same name.
> Does adjacency matter, or would
> 
> 
> <a><b/><b/><b/><c/><c/><c/></a>
> 
> be treated differently from
> 
> 
> <a><b/><c/><b/><c/><b/><c/></a>
> 
> ?  (Assume if you like, for purposes of discussion, that the b and c
> and a elements all have interesting attributes.)
> 
> > 
> > Any better ideas would be welcome.
> 
> Your example reminds me of the contortions I've seen people
> go to trying to represent structured information in RFC 822
> attribute-value pairs.  So the best idea I have at the moment
> is:  Save yourself!  Don't do it!
> 
> But probably you know exactly what you're doing, there is a
> perfectly reasonable algorithm for what you want, and I just
> haven't understood.
> 
> hth
> 
> --****************************************************************
> * C. M. Sperberg-McQueen, Black Mesa Technologies LLC
> * http://www.blackmesatech.com
> * http://cmsmcq.com/mib
> * http://balisage.net
> ****************************************************************
