Re: Nesting a flat XML structure

Subject: Re: Nesting a flat XML structure
From: "ian.proudfoot@xxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 29 Oct 2018 21:52:24 -0000
Peter,
Going a little off topic, but the concept is relatively simple. Many writers
don't make the best use of their word processors. Lists may be manually
indented, with bullets inserted from a character palette. Titles may be
'Normal' text with character overrides for font size and weight. You get the
idea. Careful analysis of many documents showed that between eight and ten
properties have the most effect on the output for character styles and
paragraph styles. These properties are presented as an override code, in a
format that is very compact but still easy for anyone to read. The
combination of any correctly defined style name plus its override code gives
us a key that can be used for mapping to elements in the output.
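
To make the idea concrete, here is a minimal sketch in Python rather than
XSLT. It is not the actual implementation: the property names, the
abbreviations, the 'Style|code' key format and the element names are all
invented for illustration, and only four properties are shown instead of the
eight to ten mentioned above.

```python
from typing import NamedTuple


class Para(NamedTuple):
    style: str       # paragraph style name taken from the word processor
    overrides: dict  # direct formatting that differs from the style definition


# Hypothetical abbreviations; the real set of tracked properties is not
# listed in this thread.
ABBREV = {"bold": "b", "italic": "i", "size": "sz", "indent": "in"}


def override_code(overrides: dict) -> str:
    """Encode overrides as a compact, readable string such as 'b1.sz14'."""
    parts = []
    for prop in ABBREV:  # fixed order, so equal overrides give equal codes
        if prop in overrides:
            value = overrides[prop]
            if isinstance(value, bool):
                value = int(value)  # True/False become 1/0
            parts.append(f"{ABBREV[prop]}{value}")
    return ".".join(parts)


def mapping_key(para: Para) -> str:
    """Combine the style name and the override code into one lookup key."""
    code = override_code(para.overrides)
    return f"{para.style}|{code}" if code else para.style


# Hypothetical mapping table from keys to output elements.
ELEMENT_MAP = {
    "Heading 1": "title",
    "Normal|b1.sz14": "title",  # 'Normal' text dressed up as a heading
    "Normal|in1": "listitem",   # manually indented pseudo-list
    "Normal": "para",
}


def element_for(para: Para) -> str:
    """Look up the full key, then the bare style name, then default to 'para'."""
    key = mapping_key(para)
    return ELEMENT_MAP.get(key, ELEMENT_MAP.get(para.style, "para"))
```

With this table, a 'Normal' paragraph overridden to bold 14pt maps to the
same element as a real 'Heading 1', which is the whole point of keying on the
style name and the override code together.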

This works well when there is some inherent logic to the implied structure of
the source document. Less so when no regard has been given to sensible style
use.

Of course you are correct: when styles have been rigorously applied, the
results can be very good too. In those (rare) cases this method still catches
the occasional accidental override.

~Ian

-----Original Message-----
From: Peter Flynn peter@xxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: 29 October 2018 21:14
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re:  Nesting a flat XML structure

On 29/10/18 21:04, ian.proudfoot@xxxxxxxxxxx wrote:
> Agreed Wendell and Graydon. I am already doing multiple passes to get
> the content in a suitable state to do the nesting part. I find that
> most word processed text is in a poor state for easy conversion to
> good XML that is valid to a specific schema.

Microsoft's excellent marketing has successfully persuaded this planet that
"looking pretty" is the same thing as "being right".

> When based simply on paragraph and character style names the end
> result is often unusable.

IFF the styles are applied rigorously and in conformance with a known
stylesheet, it is actually possible to get fairly good transformations to (eg)
JATS, DocBook, TEI, etc.

> So I use temporary attributes that encode the important stylistic
> overrides - capturing what the author was trying to achieve. I have
> been very pleased with the results.

I'm very intrigued by this: where do you get the author's intentions from?
Traces they leave in the markup (eg italics or bold)?

///Peter
