Re: Nesting a flat XML structure
Peter,

Going a little off topic, but the concept is relatively simple. Many writers don't make the best use of their word processors. Lists may be indented manually, with bullets inserted from a character palette. Titles may be 'Normal' text with character overrides for font size and weight. You get the idea?

Careful analysis of many documents showed that eight to ten properties have the most effect on the output of character and paragraph styles. These are encoded as an override code in a format that is very compact but still easy for anyone to understand. Any correctly defined style name combined with its override code gives us a key that can be used for mapping to elements in the output.

This works well when there is some inherent logic to the implied structure of the source document, and less well when no regard has been given to sensible style use.

Of course you are correct: when styles have been rigorously applied, the results can be very good too. Even in those (rare) cases, this method still catches the occasional accidental override.

~Ian

-----Original Message-----
From: Peter Flynn peter@xxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: 29 October 2018 21:14
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: Nesting a flat XML structure

On 29/10/18 21:04, ian.proudfoot@xxxxxxxxxxx wrote:
> Agreed Wendell and Graydon. I am already doing multiple passes to get
> the content in a suitable state to do the nesting part. I find that
> most word processed text is in a poor state for easy conversion to
> good XML that is valid to a specific schema.

Microsoft's excellent marketing has successfully persuaded this planet
that "looking pretty" is the same thing as "being right".

> When based simply on paragraph and character style names the end
> result is often unusable.
IFF the styles are applied rigorously and in conformance with a known
stylesheet, it is actually possible to get fairly good transformations
to (eg) JATS, DocBook, TEI, etc.

> So I use temporary attributes that encode the important stylistic
> overrides - capturing what the author was trying to achieve. I have
> been very pleased with the results.

I'm very intrigued by this: where do you get the author's intentions
from? Traces they leave in the markup (eg italics or bold)?

///Peter
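The key-building approach Ian describes can be sketched roughly as follows: a style name is combined with a compact override code, and the result is used to look up an output element. This is a hypothetical Python illustration, not Ian's actual implementation. The thread does not give the real override-code format or name the eight to ten properties, so the property subset, the `b+s18`-style notation, the `ELEMENT_MAP` table and the element names are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical subset of the "eight to ten" properties; None means
# "not overridden, inherited from the style definition".
@dataclass(frozen=True)
class Overrides:
    bold: Optional[bool] = None
    italic: Optional[bool] = None
    font_size_pt: Optional[float] = None
    indent_pt: Optional[float] = None
    bullet: Optional[bool] = None


def override_code(o: Overrides) -> str:
    """Encode only the overridden properties in a compact, readable
    (hypothetical) notation, e.g. 'b+s18' = bold on, size 18pt."""
    parts = []
    if o.bold is not None:
        parts.append("b+" if o.bold else "b-")
    if o.italic is not None:
        parts.append("i+" if o.italic else "i-")
    if o.font_size_pt is not None:
        parts.append(f"s{o.font_size_pt:g}")
    if o.indent_pt is not None:
        parts.append(f"n{o.indent_pt:g}")
    if o.bullet:
        parts.append("u")
    return "".join(parts)


def mapping_key(style_name: str, o: Overrides) -> str:
    """Combine the style name with its override code into one lookup key."""
    code = override_code(o)
    return f"{style_name}|{code}" if code else style_name


# Hypothetical mapping from keys to output element names.
ELEMENT_MAP = {
    "Normal": "para",
    "Normal|b+s18": "title",      # 'Normal' text styled to look like a title
    "Normal|n18u": "listitem",    # manually indented, bulleted 'Normal' text
    "Heading 1": "title",
}


def target_element(style_name: str, o: Overrides) -> str:
    key = mapping_key(style_name, o)
    if key in ELEMENT_MAP:
        return ELEMENT_MAP[key]
    # Unmapped combination (e.g. an accidental override): fall back to the
    # mapping for the bare style name, then to a generic 'para'.
    return ELEMENT_MAP.get(style_name, "para")
```

In a multi-pass pipeline like the one described in the thread, a key of this kind could be written out as a temporary attribute in an early pass. A later XSLT pass could then match on that attribute when building the nested structure.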