Re: Grouping of text input file lines

Subject: Re: Grouping of text input file lines
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Sun, 11 Aug 2013 18:44:52 +0100
I've generally done this using your second approach: convert each line to an
element and then use group-starting-with to group them.
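
A minimal XSLT 2.0 sketch of that approach, not a definitive implementation. It makes several assumptions that are not in the original post: the parameter name input-uri, the result element names database, relation and field, that "#" starts a trailing comment, and that "num" only ever occurs as a relation option, so it can be used to close the last field group. Joining of "com" and "acc" entries is left out.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: URI of the text file,
       resolved relative to the stylesheet if not absolute. -->
  <xsl:param name="input-uri" as="xs:string" select="'input.txt'"/>

  <xsl:output indent="yes"/>

  <xsl:template name="main">
    <!-- Step 1: convert each "tag: value" line to <tag>value</tag>.
         The variable holds a temporary tree, so the elements are siblings. -->
    <xsl:variable name="lines">
      <xsl:for-each select="tokenize(unparsed-text($input-uri), '\r?\n')">
        <xsl:analyze-string select="." regex="^\s*([a-z]+):\s*(.*)$">
          <xsl:matching-substring>
            <xsl:element name="{regex-group(1)}">
              <!-- Assumption: '#' starts a comment that is dropped. -->
              <xsl:value-of
                  select="normalize-space(replace(regex-group(2), '#.*$', ''))"/>
            </xsl:element>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </xsl:variable>

    <database>
      <!-- Step 2: outer groups start at each rel line. The lines before
           the first rel form a group of their own (the DB options). -->
      <xsl:for-each-group select="$lines/*" group-starting-with="rel">
        <xsl:choose>
          <xsl:when test="self::rel">
            <relation name="{.}">
              <!-- Inner groups start at each ele line. Assumption: 'num'
                   is relation-only, so it also starts a group and keeps
                   trailing relation options out of the last field. -->
              <xsl:for-each-group select="current-group()[position() gt 1]"
                                  group-starting-with="ele | num">
                <xsl:choose>
                  <xsl:when test="self::ele">
                    <field name="{.}">
                      <xsl:copy-of select="current-group()[position() gt 1]"/>
                    </field>
                  </xsl:when>
                  <xsl:otherwise>
                    <!-- relation options before or after the fields -->
                    <xsl:copy-of select="current-group()"/>
                  </xsl:otherwise>
                </xsl:choose>
              </xsl:for-each-group>
            </relation>
          </xsl:when>
          <xsl:otherwise>
            <!-- DB options preceding the first relation -->
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </database>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon this can be started at the named template, for example with -it:main. Without a relation-only tag such as "num" to split on, the trailing relation options would end up inside the last field, which is the "almost" in the original question.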

In XSLT 3.0 we're allowing patterns to match atomic values, so you can do
group-starting-with on a sequence of strings.
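
A rough sketch of what that could look like, assuming the pattern syntax of the final XSLT 3.0 specification, where a pattern of the form .[predicate] can match any item including a string. The parameter input-uri and the element names database, relation, option and line are hypothetical, and only the outer (relation) level is shown.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: URI of the text file -->
  <xsl:param name="input-uri" as="xs:string" select="'input.txt'"/>

  <xsl:output indent="yes"/>

  <xsl:template name="xsl:initial-template">
    <database>
      <!-- Group the non-blank lines (plain strings, no elements)
           at each line whose tag is "rel". -->
      <xsl:for-each-group
          select="unparsed-text-lines($input-uri)[normalize-space()]"
          group-starting-with=".[matches(., '^\s*rel:')]">
        <xsl:choose>
          <xsl:when test="matches(., '^\s*rel:')">
            <relation name="{normalize-space(substring-after(., ':'))}">
              <!-- remaining lines of this relation, still unparsed -->
              <xsl:for-each select="tail(current-group())">
                <line><xsl:value-of select="normalize-space(.)"/></line>
              </xsl:for-each>
            </relation>
          </xsl:when>
          <xsl:otherwise>
            <!-- lines before the first rel: DB options -->
            <xsl:for-each select="current-group()">
              <option><xsl:value-of select="normalize-space(.)"/></option>
            </xsl:for-each>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </database>
  </xsl:template>

</xsl:stylesheet>
```

The inner level would group tail(current-group()) again, starting with lines that match '^\s*ele:'.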

Michael Kay
Saxonica

On 11 Aug 2013, at 15:46, Wolfgang Laun wrote:

> I'll briefly describe the problem and outline two approaches to a
> solution. I'd be pleased to receive a comment or two.
>
> The task is to convert a plain text file to XML using XSLT 2.0. Every
> line of the text file has the form
>  tag: value
> and these lines are grouped at three levels: "database", "relation"
> and "field", where each entity has some options and one or more
> children of the lower level (except for field, of course).
>
> Example, indentation according to nesting level:
>
> node: abc    # a DB option
> key: CMOS   # a DB option
> rel: rlo_one
>  com: a relation # a relation option
>  alg: direct         # a relation option
>  ele: fa int
>    com: blurb       # element (field) options
>    def: 0
>    acc: px
>    acc: py
>  ele: fb chars
>    com: bla bla
>    def: "----"
>    alg: permute
>  num: 100          # a relation option
> rel: rlo_two
>  com: another relation    # a relation option
>  com: more comment
>  com: yet more comment
>  ele: fx int
>    com: blurb
>    def: 0
>    acc: px
>  ele: fy int
>    com: bla bla
>    def: 42
>  num: 50                   # a relation option
>
> The expected XML structure is obvious, I think: a sequence of DB
> options and relation elements; these contain relation options and
> field elements, which contain field options. Field order must not be
> changed. "com" entries should be joined while observing line breaks,
> and "acc" entries too, but joined with a space.
>
> The first basic idea I used throughout is to maintain another string
> sequence in parallel to the one containing the text lines. That
> sequence contains just the tags, so that index-of can be used to
> compute "interesting" line numbers. This way, subsequences of lines
> for all or individual relations and fields can be conveniently
> extracted.
>
> The second idea is to use grouping. The sequence of lines is converted
> to a sequence of nodes <tag>value</tag> and a nested
> group-starting-with separates relations and fields - almost. As you
> can see, there are some leading lines defining DB options, and each
> relation contains option lines before and after the element groups.
> Most likely, cherry-picking lines and line groups prior to the
> glorious for-each-group has to be done using the technique described
> above.
>
> Any better ideas?
> Thanks
