Converting non-pure trees to pure trees
I'm very new to XML, so please forgive my ignorance.

I have an XML file which I have automatically converted from MS Word. The basic structure is:

<worddocument>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <pagebreak/>
  <p>2/1</p>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <pagebreak/>
  <p>2/2</p>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <p>paragraph <b>hello</b> <i>world</i></p>
</worddocument>

I wish to transform this tree using some knowledge I have about the document: the first page is always the "introduction", whilst all subsequent pages are "monographs":

<semanticdocument>
  <introduction>
    <p>paragraph <b>hello</b> <i>world</i></p>
    <p>paragraph <b>hello</b> <i>world</i></p>
    <p>paragraph <b>hello</b> <i>world</i></p>
  </introduction>
  <monographs>
    <monograph id="2/1">
      <p>paragraph <b>hello</b> <i>world</i></p>
      <p>paragraph <b>hello</b> <i>world</i></p>
      <p>paragraph <b>hello</b> <i>world</i></p>
    </monograph>
    <monograph id="2/2">
      <p>paragraph <b>hello</b> <i>world</i></p>
      <p>paragraph <b>hello</b> <i>world</i></p>
      <p>paragraph <b>hello</b> <i>world</i></p>
    </monograph>
  </monographs>
</semanticdocument>

With the help of a colleague I have managed to get the document into 'sort of' this structure :) but I don't seem to be able to control the monographs correctly. Basically, how do I tell XSLT that I want all the information between pagebreak tags? The problem is caused by the fact that the number of paragraphs on a page is variable.
So, where am I going wrong? The XSLT I have that sort of works is:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Recurse through every element -->
  <xsl:template match="/*">
    <xsl:element name="semanticdocument">
      <introduction>
        <xsl:apply-templates select="p[ not( preceding-sibling::pagebreak ) ]" />
      </introduction>
      <monographs>
        <xsl:apply-templates select="p[ ( preceding-sibling::pagebreak ) ]" />
      </monographs>
    </xsl:element>
  </xsl:template>

  <!-- Anything else is in <monograph/> -->
  <xsl:template match="*">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>

which of course puts ALL the paragraphs into one monographs section, when I would actually like to break them out.

Any thoughts?

Kind Regards,
Philip.

-----Original Message-----
From: owner-xsl-list-digest@xxxxxxxxxxxxxxxx [mailto:owner-xsl-list-digest@xxxxxxxxxxxxxxxx]
Sent: 20 November 2000 13:31
To: xsl-list-digest@xxxxxxxxxxxxxxxx
Subject: The XSL-List Digest V3 #357

The XSL-List Digest    Monday, November 20 2000    Volume 03 : Number 357

In this issue:

    Fw: Bug in SAXON (entity/character set ) ??
    search and replace along with apply-templates
    Re: Q.) Encode URL inside HTML Anchor Tag.
    Re: Fw: Bug in SAXON (entity/character set ) ??
    Re: Fw: Bug in SAXON (entity/character set ) ??
    Re: Fw: Bug in SAXON (entity/character set ) ??
    Re: Fw: Bug in SAXON (entity/character set ) ??
    Re: Fw: Bug in SAXON (entity/character set ) ??
    Re: Fw: Bug in SAXON (entity/character set ) ??
    Re: search and replace along with apply-templates
    RE: Bug in SAXON (entity/character set ) ??
    RE: self axis and attributes
    RE: Standard XSLT API & Debugging
    RE: Q.) Encode URL inside HTML Anchor Tag.
    RE: Fw: Bug in SAXON (entity/character set ) ??
    do I have better option other than mode
    Re: do I have better option other than mode
    RE: do I have better option other than mode
    RE: Q.) Encode URL inside HTML Anchor Tag.
    RE: how can I write in the commandline that dealwith several input xml files
    I'm a newbie, where do I start ??

----------------------------------------------------------------------

Date: Sun, 19 Nov 2000 10:34:51 -0500
From: "Melvyn Rosengarden" <melrose@xxxxxxxxxxxxxxxx>
Subject: Fw: Bug in SAXON (entity/character set ) ??

I am having trouble getting the character • to output correctly using SAXON. Can anyone provide a working example of how to do this?

I created a small XML test file as shown below. I then wrote the XSLT to display @e_test1 and @e_test2. I used both MSXML and SAXON and saved both results to a file. I opened the respective files in both IE and Netscape. The MSXML transform rendered the bullet (•) correctly. The SAXON transform produced "gibberish" characters.

Can anyone offer some help with this?

<?xml version='1.0' encoding="ISO-8859-1" standalone="yes" ?>
<?xml:stylesheet type="text/xsl" href="e_test.xsl"?>
<!DOCTYPE entity_test [
  <!ELEMENT entity_test ANY >
  <!ATTLIST entity_test e_test1 CDATA #IMPLIED>
  <!ATTLIST entity_test e_test2 CDATA #IMPLIED>
  <!ENTITY bullet "&#8226;">
]>
<entity_test e_test1="•" e_test2="•" />

> "You already have zero privacy -- get over it !!
> Melvyn Rosengarden
> melrose@xxxxxxxxxxxxxxxx

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list

------------------------------

Date: Sun, 19 Nov 2000 07:55:37 -0800
From: Robert Koberg <rob@xxxxxxxxxx>
Subject: search and replace along with apply-templates

I have some JavaScript functions that, on click of a glossed word, open a new window and write to it with document.write. That all works fine except there is potential for it to break if the definition contains a single quote. Is there some way to apply-templates and search for the single quote character and prepend the js escape character "\" to the single quote?
Below is where I put together the JS for the switch(case) statement:

<xsl:template match="glossentry">
  <xsl:variable name="theword" select="normalize-space(glossterm)"/>
  case "<xsl:value-of select="$theword"/>":
    item = '<b><xsl:value-of select="$theword"/></b><br/><br/>';
    <!-- this part could contain a single quote -->
    def = '<xsl:apply-templates select="glossdef/example/para" mode="glossary"/>';
    break;
</xsl:template>

An example of the output:

case "expatriate":
  item = '<b>expatriate</b><br><br>';
  def = 'The French <i>expatriates</i> in the U.S. got together to celebrate Bastille Day.<br><br>While managing his company's operations in a small town southeast of Paris, Leon Chester, then an American <i>expatriate</i>, noticed that his French colleagues shook hands every morning.<br><br>';
  break;

- --------

This will fail in JS because of the single quote after "<br>While managing his company's ". Is there a way to search the string delivered by:

<xsl:apply-templates select="glossdef/example/para" mode="glossary"/>

and escape any single quotes, for example:

case "expatriate":
  item = '<b>expatriate</b><br><br>';
  def = 'The French <i>expatriates</i> in the U.S. got together to celebrate Bastille Day.<br><br>While managing his company\'s operations in a small town southeast of Paris, Leon Chester, then an American <i>expatriate</i>, noticed that his French colleagues shook hands every morning.<br><br>';
  break;

tia,
Rob

------------------------------

Date: Sun, 19 Nov 2000 19:37:45 GMT
From: David Carlisle <davidc@xxxxxxxxx>
Subject: Re: Q.) Encode URL inside HTML Anchor Tag.

> XSLT spec only says that non-ASCII characters should be automatically

hmm I suppose it does. Live and learn :-)

David

_____________________________________________________________________
This message has been checked for all known viruses by Star Internet delivered through the MessageLabs Virus Control Centre.
For further information visit http://www.star.net.uk/stats.asp

------------------------------

Date: Sun, 19 Nov 2000 19:51:28 GMT
From: David Carlisle <davidc@xxxxxxxxx>
Subject: Re: Fw: Bug in SAXON (entity/character set ) ??

> I am having trouble getting the character • to output correctly using
> SAXON. Can anyone provide a working example of how to do this?

What do you mean by "correct"? Most likely Saxon output using UTF-8 encoding, which is the default encoding for XML, in which case this character would be output using multiple bytes, and would look like

> The SAXON transform produced "gibberish" characters.

if you look at the file using a Latin-1 encoded editor or any other encoding than UTF-8.

> <!ENTITY bullet "&#8226;">

Why the double escaping here? It could more easily have been written

<!ENTITY bullet "•">

David

------------------------------

Date: Sun, 19 Nov 2000 13:37:05 -0700 (MST)
From: Mike Brown <mike@xxxxxxxx>
Subject: Re: Fw: Bug in SAXON (entity/character set ) ??

David Carlisle wrote:
> > <!ENTITY bullet "&#8226;">
> why the double escaping here?
> It could more easily have been written
> <!ENTITY bullet "•">

It looks like he's trying to see if it's possible to force a character reference into the output by putting the 7 characters & # 8 2 2 6 ; into the stylesheet. Of course, the &, as character data, will always be serialized as &amp; or &#38;, so it's not going to work. :) He should just put the 1 bullet character into the stylesheet by specifying it as • and not worry about how it gets serialized.
- Mike
____________________________________________________________________
Mike J. Brown, software engineer at        My XML/XSL resources:
webb.net in Denver, Colorado, USA          http://www.skew.org/xml/

------------------------------

Date: Sun, 19 Nov 2000 16:26:16 -0500
From: "Melvyn Rosengarden" <melrose@xxxxxxxxxxxxxxxx>
Subject: Re: Fw: Bug in SAXON (entity/character set ) ??

Thanks very much for your response. In my example I had a declaration of

<?xml version='1.0' encoding="ISO-8859-1" standalone="yes" ?>

I looked at the output file using IE and Netscape, both configured to use the Arial font. All things were equal except that one was a SAXON transform and the other MSXML.

- ----- Original Message -----
From: "David Carlisle" <davidc@xxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxx>
Sent: Sunday, November 19, 2000 2:51 PM
Subject: Re: Fw: Bug in SAXON (entity/character set ) ??

> > I am having trouble getting the character • to output correctly using
> > SAXON. Can anyone provide a working example of how to do this?
>
> what do you mean by "correct" most likely saxon output using UTF8
> encoding, which is the default encoding for XML, in which case
> this character would be output using multiple bytes, and would
> look like
>
> > The SAXON transform produced "gibberish" characters.
>
> If you look at the file using a latin1 encoded editor or any other
> encoding than utf8.
>
> > <!ENTITY bullet "&#8226;">
> why the double escaping here?
> It could more easily have been written
> <!ENTITY bullet "•">
>
> David
>
> _____________________________________________________________________
> This message has been checked for all known viruses by Star Internet delivered
> through the MessageLabs Virus Control Centre.
For further information visit
> http://www.star.net.uk/stats.asp

------------------------------

Date: Sun, 19 Nov 2000 16:44:16 -0500
From: "Melvyn Rosengarden" <melrose@xxxxxxxxxxxxxxxx>
Subject: Re: Fw: Bug in SAXON (entity/character set ) ??

I wasn't SURE about the entity declaration. In Kay Michaels book I saw examples of both. Soooo in my test bed I used both ways:

<entity_test e_test1="•" e_test2="•" />

Neither one was rendered correctly in SAXON; both rendered as bullets using MSXML.

- ----- Original Message -----
From: "Mike Brown" <mike@xxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxx>
Sent: Sunday, November 19, 2000 3:37 PM
Subject: Re: Fw: Bug in SAXON (entity/character set ) ??

> David Carlisle wrote:
> > > <!ENTITY bullet "&#8226;">
> > why the double escaping here?
> > It could more easily have been written
> > <!ENTITY bullet "•">
>
> It looks like he's trying to see if it's possible to force a character
> reference into the output by putting the 7 characters & # 8 2 2 6 ; into
> the stylesheet. Of course, the &, as character data, will always be
> serialized as &amp; or &#38;, so it's not going to work. :) He should just
> put the 1 bullet character into the stylesheet by specifying it as
> • and not worry about how it gets serialized.
>
> - Mike
> ____________________________________________________________________
> Mike J. Brown, software engineer at        My XML/XSL resources:
> webb.net in Denver, Colorado, USA          http://www.skew.org/xml/

------------------------------

Date: Sun, 19 Nov 2000 22:23:42 GMT
From: David Carlisle <davidc@xxxxxxxxx>
Subject: Re: Fw: Bug in SAXON (entity/character set ) ??

> Thanks very much for your response.
In my example I had a declaration
> of <?xml version='1.0' encoding="ISO-8859-1" standalone="yes" ?>

That specifies the encoding used in the input file or the stylesheet (whichever it appears in). It does not specify the encoding to be used for output. For that you should use

<xsl:output encoding="...."/>

David

------------------------------

Date: Sun, 19 Nov 2000 18:44:02 -0700 (MST)
From: Mike Brown <mike@xxxxxxxx>
Subject: Re: Fw: Bug in SAXON (entity/character set ) ??

Melvyn Rosengarden wrote:
> I wasn't SURE about the entity declaration. In Kay Michaels book

I really wish Michael Kay, as an Englishman, would make his mail software put his name with his first name first, family name last. :)

> I saw examples of both. Soooo in my test bed I used both ways
>
> <entity_test e_test1="•" e_test2="•" />
>
> Neither one was rendered correctly in SAXON, both rendered as bullets using
> MSXML.

It's an encoding issue, as David was saying. The problem is not so much what's in your stylesheet, but what encoding you're getting upon output, and what your browser is assuming about the encoding. I suspect that if you use xsl:output to ensure that your HTML is output in the encoding your browser expects, or if you ensure that your HTML contains the appropriate meta element declaring the encoding, then you will see the results you expect.

------------------------------

Date: Sun, 19 Nov 2000 17:39:21 -0800
From: "Christopher R.
Maden" <crism@xxxxxxxxxx>
Subject: Re: search and replace along with apply-templates

At 07:55 19-11-2000 -0800, Robert Koberg wrote:
>I have some JavaScript functions that, on click of a glossed word, open a
>new window and writes to it with document.write. That all works fine except
>there is potential for it to break if the definition contains a single quote.
>Is there some way to apply-templates and search for the single quote
>character and prepend the js escape character "\" to the single quote?

<xsl:template name="fixQuotes">
  <xsl:param name="do.quote"/>
  <xsl:param name="string"/>
  <xsl:choose>
    <xsl:when test="$do.quote">
      <xsl:choose>
        <xsl:when test="contains($string, &quot;'&quot;)">
          <xsl:value-of select="substring-before($string, &quot;'&quot;)"/>
          <xsl:text>\'</xsl:text>
          <xsl:call-template name="fixQuotes">
            <xsl:with-param name="do.quote" select="$do.quote"/>
            <xsl:with-param name="string" select="substring-after($string, &quot;'&quot;)"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$string"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$string"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

HTH,
Chris

- --
Christopher R. Maden, Senior XML Analyst, Lexica LLC
222 Kearny St., Ste. 202, San Francisco, CA 94108-4510
+1.415.901.3631 tel./+1.415.477.3619 fax
<URL:http://www.lexica.net/> <URL:http://www.oreilly.com/%7Ecrism/>

------------------------------

Date: Mon, 20 Nov 2000 09:07:11 -0000
From: Kay Michael <Michael.Kay@xxxxxxx>
Subject: RE: Bug in SAXON (entity/character set ) ??

> I am having trouble getting the character • to output
> correctly using SAXON.
>
> The SAXON transform produced "gibberish" characters.

Saxon (like any conformant XSLT processor) will produce UTF-8 characters unless you ask for anything else. UTF-8 looks like gibberish if you try reading it using software that doesn't understand UTF-8.
Try

<xsl:output encoding="iso-8859-1"/>

Mike Kay

------------------------------

Date: Mon, 20 Nov 2000 09:02:23 -0000
From: Kay Michael <Michael.Kay@xxxxxxx>
Subject: RE: self axis and attributes

> Am I missing something? Is there any way of testing the identity of
> namespaced attributes while retaining independence between the source
> and stylesheet and without testing the namespace-uri()?

You're not missing anything, except the self-attribute axis that the language designers forgot to put in the spec. You just have to test

local-name()="x" and namespace-uri()="y"

Mike Kay

------------------------------

Date: Mon, 20 Nov 2000 10:12:50 -0000
From: Kay Michael <Michael.Kay@xxxxxxx>
Subject: RE: Standard XSLT API & Debugging

> As someone who's trying to build vendor-
> independent XSLT tools, I am definitely
> interested in a standard XSLT API.
> But more importantly, I would like to see
> this API address XSLT debugging.

Progress on TrAX (transformation API for XML) is starting to look good, thanks to heroic efforts by Scott Boag, but debugging interfaces are currently out of scope.

Mike Kay

------------------------------

Date: Mon, 20 Nov 2000 09:08:18 -0000
From: Kay Michael <Michael.Kay@xxxxxxx>
Subject: RE: Q.) Encode URL inside HTML Anchor Tag.

> Is Saxon just following the instructions for the HTML output method in
> XSLT1.0:
>
> "The html output method should escape non-ASCII characters in URI
> attribute values using the method recommended in Section B.2.1 of the
> HTML 4.0 Recommendation."
> http://www.w3.org/TR/xslt#section-HTML-Output-Method
>
> Or does it go beyond this?

Saxon implements this behavior by default, but gives you an option to turn it off.
Mike Kay

------------------------------

Date: Mon, 20 Nov 2000 10:23:29 -0000
From: Kay Michael <Michael.Kay@xxxxxxx>
Subject: RE: Fw: Bug in SAXON (entity/character set ) ??

> I really wish Michael Kay, as an Englishman, would make his
> mail software put his name with his first name first, family name last. :)

You wouldn't believe the battles I've had with our corporate IT department on this one! Sorry, I can't change it. Actually they agree they got it wrong but they tell me that reconfiguring a Microsoft Exchange server with 25,000 names in its directory is prohibitively expensive/disruptive. So it comes down to the evil empire again.

I'm only half English anyway.

Mike Kay

------------------------------

Date: Mon, 20 Nov 2000 11:12:25 -0000
From: "Pollington, Lee (ELSLON)" <lee.pollington@xxxxxxxxxxxxx>
Subject: do I have better option other than mode

Hi all,

I'm working on a stylesheet covering over 170 elements. Many elements have an "id" attribute. Now I want to output an HTML target for every element with an "id". I thought <xsl:if test="@id">.....</xsl:if> would be a bit messy in every template that needed it, so I thought I could do:

<xsl:template match="*[@id]">
  <a name="{@id}"/>
  <xsl:apply-templates select="." mode="has-id"/>
</xsl:template>

<xsl:template match="fig" mode="has-id">.....</xsl:template>

However, that still means knowing which templates are going to need that mode, and on the face of it makes the templates themselves less reusable. It doesn't look too efficient either, but I don't know about that. I was hoping to do something a little more generic.

Any suggestions?
Kind regards
Lee

------------------------------

Date: Mon, 20 Nov 2000 11:26:47 GMT
From: David Carlisle <davidc@xxxxxxxxx>
Subject: Re: do I have better option other than mode

Rather than <xsl:if test="@id">.. you could just have

<xsl:apply-templates select="@id"/>

in any templates that might need this, then

<xsl:template match="@id">
  <a name="{.}"/>
</xsl:template>

David

------------------------------

Date: Mon, 20 Nov 2000 11:54:58 -0000
From: Kay Michael <Michael.Kay@xxxxxxx>
Subject: RE: do I have better option other than mode

> I working on a stylesheet covering over 170 elements. Many
> elements have an "id" attribute.

The answer to this one might be xsl:apply-imports. Define the standard behavior for each element in one stylesheet module. Import this into another module that does

<xsl:import href="standard-templates.xsl"/>

<xsl:template match="*[@id]">
  <a name="{@id}"/>
  <xsl:apply-imports/>
</xsl:template>

Mike Kay

------------------------------

Date: Mon, 20 Nov 2000 11:55:57 -0000
From: Kay Michael <Michael.Kay@xxxxxxx>
Subject: RE: Q.) Encode URL inside HTML Anchor Tag.

> > Is Saxon just following the instructions for the HTML output method
> > Or does it go beyond this?
>
> Yes, SAXON goes beyond this. SAXON escapes characters in the ASCII range
> that are reserved in URIs. e.g., the space character becomes %20.

You're right. I thought I'd fixed this a while ago but it seems not.
Mike Kay

------------------------------

Date: Mon, 20 Nov 2000 10:00:17 -0000
From: Kay Michael <Michael.Kay@xxxxxxx>
Subject: RE: how can I write in the commandline that dealwith several input xml files

> I want to know how can I write in the command line
> to deal with several input xml file and one xslt file then output one
> output file

You can't. If there's one output file then there is one transformation and therefore one principal input document; secondary input documents have to be loaded by the stylesheet itself using the document() function.

Mike Kay

------------------------------

Date: Mon, 20 Nov 2000 23:16:09 +1100
From: "Daniel Wong" <wongy@xxxxxxxxxxxx>
Subject: I'm a newbie, where do I start ??

Hi,

I'm new at XSLT and XML. I'm looking to learn more about web services based on XML and XSLT (in transforming into HTML, WML, etc.). Where do I start, and what is your advice as to what parser I should use?

Cheers
Dan

------------------------------

End of The XSL-List Digest V3 #357
**********************************
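For the page-grouping question at the top of this message (collecting everything between one pagebreak and the next), one possible XSLT 1.0 approach is sketched below. It is only a sketch under stated assumptions: every <p> and <pagebreak/> is a direct child of <worddocument>, and the first <p> after each <pagebreak/> holds the monograph id (as "2/1" and "2/2" do in the sample). The key name "p-by-page" is made up for illustration. The idea is to index each <p> by the generated id of its nearest preceding <pagebreak/>, then walk the pagebreaks and pull out each group.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Hypothetical key: index each p by its nearest preceding pagebreak.
       Paragraphs on the first page have no pagebreak, so their key is "". -->
  <xsl:key name="p-by-page" match="p"
           use="generate-id(preceding-sibling::pagebreak[1])"/>

  <xsl:template match="/worddocument">
    <semanticdocument>
      <!-- Page 1: every p with no pagebreak before it -->
      <introduction>
        <xsl:copy-of select="p[not(preceding-sibling::pagebreak)]"/>
      </introduction>
      <monographs>
        <!-- One monograph per pagebreak -->
        <xsl:for-each select="pagebreak">
          <!-- All p elements belonging to the page this pagebreak starts -->
          <xsl:variable name="page" select="key('p-by-page', generate-id())"/>
          <!-- Assumption: the first p on the page carries the id -->
          <monograph id="{normalize-space($page[1])}">
            <xsl:copy-of select="$page[position() &gt; 1]"/>
          </monograph>
        </xsl:for-each>
      </monographs>
    </semanticdocument>
  </xsl:template>
</xsl:stylesheet>
```

Because the grouping is driven by the pagebreak elements rather than by counting paragraphs, the number of paragraphs per page can vary freely. If a page might have no id paragraph, the position() test would need adjusting.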