
Splitting an XML file based on size

Subject: Splitting an XML file based on size
From: Adam Van Den Hoven <Adam.Hoven@xxxxxxxxxxxx>
Date: Tue, 3 Apr 2001 15:50:04 -0700
Hey guys, 

I'm processing an NITF file into HTML. NITF is very much like HTML in that
it has a body with paragraph tags that have mixed content. The HTML that I am
creating with my transforms can quickly grow to several tens of KB.
Since I'm transferring this over a wireless modem to a PocketPC at a maximum
of 14.4 kbps, an HTML file that is 15 KB is entirely too big. 

I need some way to keep track of the number of characters I've processed and
stop when I reach a specific size, stopping at the end of the current paragraph.
I understand that counting characters is not very precise, but I am only
interested in getting the transfer size below 2 KB or so. 

As an example, I might have the following NITF code:

<nitf baselang="en.ca">
   <head><!-- Header Metadata here --></head>
   <body>
      <body.head><!-- Body head stuff here --></body.head>
      <body.content>
         <p>
            Lorem ipsum dolor sit amet, 
            <em>consectetuer adipiscing elit, sed diem</em>
             nonummy nibh euismod tincidunt ut lacreet dolore magna aliguam
             erat volutpat. 
         </p>
         <p>
            Lorem ipsum 
            <q>dolor sit amet, consectetuer adipiscing elit,</q>
             sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
             aliguam erat volutpat. 
         </p>
         <p>
            Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed
            diem 
            <em>nonummy nibh euismod </em>
            tincidunt ut lacreet dolore magna aliguam erat volutpat. 
         </p>
         <p>
            Lorem ipsum dolor sit amet, 
            <em>consectetuer adipiscing elit, </em>
            sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
            aliguam erat volutpat. 
         </p>
         <p>
            Lorem ipsum dolor sit amet, 
            <q>consectetuer adipiscing elit,</q>
             sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
             aliguam erat volutpat. 
         </p>
         <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed
            diem nonummy nibh euismod tincidunt ut lacreet dolore magna aliguam erat
            volutpat. </p>
      </body.content>
      <body.end><!-- tagline here --></body.end>
   </body>
</nitf>

The text there happens to be roughly 860 characters. Let's say that my target
size is 375 characters. That should fall on the "o" in "euismod" in the third
<p> tag. Normally I would create:
<html>
   <head><!-- Header Metadata here --></head>
   <body>
         <p>
            Lorem ipsum dolor sit amet, 
            <em>consectetuer adipiscing elit, sed diem</em>
             nonummy nibh euismod tincidunt ut lacreet dolore magna aliguam
             erat volutpat. 
         </p>
         <p>
            Lorem ipsum 
            <q>dolor sit amet, consectetuer adipiscing elit,</q>
             sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
             aliguam erat volutpat. 
         </p>
         <p>
            Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed
            diem 
            <em>nonummy nibh euismod </em>
            tincidunt ut lacreet dolore magna aliguam erat volutpat. 
         </p>
         <p>
            Lorem ipsum dolor sit amet, 
            <em>consectetuer adipiscing elit, </em>
            sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
            aliguam erat volutpat. 
         </p>
         <p>
            Lorem ipsum dolor sit amet, 
            <q>consectetuer adipiscing elit,</q>
             sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
             aliguam erat volutpat. 
         </p>
         <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed
            diem nonummy nibh euismod tincidunt ut lacreet dolore magna aliguam erat
            volutpat. </p>
   </body>
</html>

but what I want to create is:

<html>
   <head><!-- Header Metadata here --></head>
   <body>
         <p>
            Lorem ipsum dolor sit amet, 
            <em>consectetuer adipiscing elit, sed diem</em>
             nonummy nibh euismod tincidunt ut lacreet dolore magna aliguam
             erat volutpat. 
         </p>
         <p>
            Lorem ipsum 
            <q>dolor sit amet, consectetuer adipiscing elit,</q>
             sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
             aliguam erat volutpat. 
         </p>
         <p>
            Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed
            diem 
            <em>nonummy nibh euismod </em>
            tincidunt ut lacreet dolore magna aliguam erat volutpat. 
         </p>
         <p><a href="someURL">View Entire story</a></p>
   </body>
</html>
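A quick way to check where a character budget lands is to print the text length of each block. This is a minimal XSLT 1.0 sketch, assuming the NITF layout shown above; it measures string-length(normalize-space(.)), i.e. text characters only, with tags ignored and the source indentation collapsed, so it undercounts the bytes actually sent.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- One line per block: element name and its text length,
         ignoring tags and collapsing whitespace runs. -->
    <xsl:for-each select="nitf/body/body.content/*">
      <xsl:value-of
          select="concat(name(), ': ', string-length(normalize-space(.)))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample NITF above, this reports 144 characters for each of the six paragraphs. The running totals are therefore 144, 288 and 432, so a 375-character budget is first exceeded inside the third paragraph, which matches the desired output.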

> I can't be as coarse as counting paragraphs, since I might also have a
> table (essentially an HTML table), lists or something else. Some paragraphs
> will be as short as a single sentence, others will be much longer. 
> 
> I also need to do some additional processing after I reach the end of the
> NITF text, but those pieces have a much more fixed size, so I can simply
> subtract them from the target file size. 
> 
> I had thought about doing something approximately like:
> 
> <xsl:template match="p" mode="block">
> 	<xsl:param name="cursize" select="0"/>
> 	<xsl:variable name="size" select="$cursize"/>
> 	<p>
> 		<xsl:apply-templates select="child::node()" mode="inline">
> 			<!-- +7 characters for the <p></p> tags -->
> 			<xsl:with-param name="cursize" select="$size + 7"/>
> 		</xsl:apply-templates>
> 	</p>
> 	<!-- "<" must be escaped in an attribute; note $size never grows here -->
> 	<xsl:if test="$size &lt;= 400">
> 		<xsl:apply-templates select="following-sibling::p[1]" mode="block">
> 			<xsl:with-param name="cursize" select="$size"/>
> 		</xsl:apply-templates>
> 	</xsl:if>
> </xsl:template>
> 
> but clearly that isn't going to work. I also assume that making a global
> variable called $size wouldn't work either.
> 
> I am getting the feeling that this isn't strictly possible with XSL. I am
> using MSXML 3 so scripting might be a solution but I am loath to use it
> unless I have to. 
> 
> Adam van den Hoven
> Internet Application Developer
> Blue Zone
> tel. 604.685.4310
> fax. 604.685.4391
> Blue Zone makes you interactive.(tm) http://www.bluezone.net/
> 
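The usual XSLT 1.0 substitute for a running counter is sibling recursion: a variable cannot be updated, but a parameter can carry the total forward if each block applies templates only to its next sibling. The stylesheet here is a minimal sketch of that idea under stated assumptions, not a tested production transform. It assumes the NITF layout from the example, counts text characters only via string-length(normalize-space(.)) (markup is not counted), and treats every child element of body.content (p, table, list, ...) as one block. The parameters target and overhead are hypothetical names; overhead stands for the fixed-size material appended after the story, and someURL is the placeholder from the post. It should run in any XSLT 1.0 processor, MSXML 3 included, without scripting.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Hypothetical parameters: character budget for the story text,
       and the fixed size of whatever is appended after it. -->
  <xsl:param name="target" select="375"/>
  <xsl:param name="overhead" select="0"/>

  <xsl:template match="/nitf">
    <html>
      <xsl:copy-of select="head"/>
      <body>
        <!-- Start the chain at the first block; each block passes the
             running total on to its next sibling. -->
        <xsl:apply-templates select="body/body.content/*[1]" mode="block">
          <xsl:with-param name="cursize" select="0"/>
        </xsl:apply-templates>
      </body>
    </html>
  </xsl:template>

  <!-- Any block-level child of body.content: p, table, list, ... -->
  <xsl:template match="*" mode="block">
    <xsl:param name="cursize" select="0"/>
    <!-- Running total after this block (text characters only). -->
    <xsl:variable name="size"
        select="$cursize + string-length(normalize-space(.))"/>

    <!-- Stand-in for the normal NITF-to-HTML templates. -->
    <xsl:copy-of select="."/>

    <xsl:choose>
      <!-- Last block: the whole story fit, so no link is needed. -->
      <xsl:when test="not(following-sibling::*)"/>
      <!-- Still under budget: recurse to the next block only. -->
      <xsl:when test="$size &lt; $target - $overhead">
        <xsl:apply-templates select="following-sibling::*[1]" mode="block">
          <xsl:with-param name="cursize" select="$size"/>
        </xsl:apply-templates>
      </xsl:when>
      <!-- Budget used up: stop at the end of this block. -->
      <xsl:otherwise>
        <p><a href="someURL">View Entire story</a></p>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With target at 375, the sample document should yield the first three paragraphs followed by the "View Entire story" link: the total is 288 after two paragraphs (still under budget) and 432 after the third, where the recursion stops. To approximate markup bytes as well, a fixed per-element allowance (like the +7 for the p tags) could be added to $size. Recursion depth equals the number of blocks emitted, which should be modest for stories trimmed to a couple of KB.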

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

