
Re: Looking for a cleaner way of auditing table cell d

Subject: Re: Looking for a cleaner way of auditing table cell data than this
From: "Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 10 Mar 2023 00:02:17 -0000
I can second the recommendation for BaseX as a tool here: it's easy to
install, it supports XML catalogs out of the box, and you can just point it at
a directory and load it up quick and easy.

If you don't need DTD-aware parsing, it's really fast. For example, on our
corpus of about 40K DITA documents I can load it from disk in about two
minutes with DTD parsing turned off.

From the BaseX GUI you can then do whatever XPath or XQuery you want to
analyze and report on your data.

If you're not familiar with XQuery, I also recommend XQuery for Humanists
(https://www.tamupress.com/book/9781623498290/xquery-for-humanists/) as an
excellent introductory how-to text. The target audience is people familiar
with XML but not necessarily XML experts. I found it to provide a really solid
overview of XQuery as well as useful practical examples that you can follow
along with.

Cheers,

E.

_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
servicenow.com<https://www.servicenow.com>

From: Steven D. Majewski steve.majewski@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thursday, March 9, 2023 at 5:36 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re:  Looking for a cleaner way of auditing table cell data than
this

If you have a substantial library of documents you want to report on, I would
suggest you use an XQuery database like BaseX or eXist that indexes the
documents, and let it do the work with your XPath selector.
If I understand your question, this should select the td cells in your tables
that have significant (i.e. non-whitespace) text of their own and a child
element from the list (and you can make the list a variable).

//table//td[text()[normalize-space(.) != '']][*[local-name() = ('para',
'note', 'cnote', 'critical', 'headline', …)]]
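For prototyping the same selection outside an XQuery database, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. It assumes unnamespaced elements and that `td` only ever occurs inside `table`; `select_mixed_cells` and the `BLOCKS` tuple are hypothetical names, and the list is deliberately incomplete. It tests the cell's own text nodes (what `td/text()` selects) rather than the whole string value of the cell.

```python
import xml.etree.ElementTree as ET

# Hypothetical, incomplete block list. As with the XPath sequence, it can
# live in a variable and be passed in.
BLOCKS = ("para", "note", "cnote", "critical", "headline")


def select_mixed_cells(root, blocks=BLOCKS):
    """Approximates
    //table//td[text()[normalize-space(.) != '']][*[local-name() = $blocks]]
    assuming no namespaces and that td only ever appears inside table."""
    matches = []
    for td in root.iter("td"):  # each td visited once, in document order
        own_text = [td.text] + [child.tail for child in td]  # td/text()
        significant = any(t and t.strip() for t in own_text)
        has_block = any(child.tag in blocks for child in td)
        if significant and has_block:
            matches.append(td)
    return matches


doc = ET.fromstring(
    "<doc><table><tr>"
    "<td>Y</td>"
    "<td><para>blah blah.</para></td>"
    "<td>For example:<para>This is a bad table cell.</para></td>"
    "</tr></table></doc>"
)
for td in select_mixed_cells(doc):
    print(ET.tostring(td, encoding="unicode"))
```

Passing a different `blocks` argument plays the same role as swapping the XPath sequence for a variable.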


On Aug 29, 2022, at 10:37 AM, Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Hi

I have a substantial library of XML documents which include a great number of
tables. As it happens the content model for table cells is promiscuous; a
table cell may contain "block" data:

<td>
  <para>blah blah.</para>
</td>

even to the extent of nested tables:

<td>
  <para>..</para>
  <table>
    <tb>
      ..
    </tb>
  </table>
</td>

or, in the case of very many simple tables, just simple text content:

<td>Y</td>
<td>N</td>

I would like to identify cases where table cells have exploited the
promiscuous schema and mixed both text and block content, for example:

<td>For example:<para>This is a bad table cell.</para></td>

I can't construct the schema so that this is illegal while the earlier
examples are valid. At least I don't think I can. But I would like to identify
these cells (and correct them, but at the moment just reporting them is
sufficient).
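To make the distinction concrete, here is a minimal Python sketch (standard-library `xml.etree.ElementTree`) that classifies the four example cells above. `is_mixed` and the `BLOCKS` subset are hypothetical names for illustration. A cell counts as mixed only if it has non-whitespace text of its own and at least one block child; the indentation whitespace around block children does not count as text.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of the "block" element names.
BLOCKS = {"para", "note", "table", "list", "steps"}


def is_mixed(snippet, blocks=BLOCKS):
    """True when a td has both non-whitespace text of its own and a block
    child, i.e. the case the audit should report."""
    td = ET.fromstring(snippet)
    own_text = (td.text or "") + "".join(child.tail or "" for child in td)
    has_block = any(child.tag in blocks for child in td)
    return bool(own_text.strip()) and has_block


examples = {
    "block only": "<td>\n  <para>blah blah.</para>\n</td>",
    "nested table": "<td>\n  <para>..</para>\n  <table><tb/></table>\n</td>",
    "text only": "<td>Y</td>",
    "mixed": "<td>For example:<para>This is a bad table cell.</para></td>",
}
for label, snippet in examples.items():
    print(f"{label}: {is_mixed(snippet)}")
```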

This is the XSL fragment I have come up with (using XSL 2), but I imagine
there is a much cleaner way of doing it and I might learn a useful technique
if I ask.

<xsl:template name="mixed-cells">
  <xsl:for-each select="//table">
    <xsl:for-each select="descendant::td[child::text()[normalize-space() != '']]">
      <xsl:if test="count(*[self::para | self::note | self::cnote |
                            self::critical | self::headline | self::error |
                            self::define | self::qanda | self::inset |
                            self::ihead | self::steps | self::list | self::ol |
                            self::inlist | self::syntax | self::fragment |
                            self::table]) &gt; 0">
        <xsl:text>Table cell with mixed content: </xsl:text>
        <xsl:call-template name="get-source" />
        <xsl:value-of select="$nl" />
        <xsl:text> content=</xsl:text>
        <xsl:value-of select="normalize-space(.)" />
        <xsl:value-of select="$nl" />
      </xsl:if>
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>

The normalize-space() in the third line is necessary because otherwise it
picks up newlines in a sequence of block children.
The list of "block" elements in the fourth line above is incomplete, and
should probably be sourced from a variable rather than given as a literal
condition the way I have done it here.
The get-source template outputs the input document name and current line
number, and $nl is what you would expect it to be.
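The notes above map fairly directly onto a standalone script. Below is a minimal Python sketch with the standard library's `xml.etree.ElementTree`, assuming unnamespaced elements. The block names are copied from the template (so the list is just as incomplete) and kept in one module-level set, much like an XSLT variable; whitespace-only text nodes are ignored the way `normalize-space()` ignores them in the template; and `source_name` is a hypothetical stand-in for `get-source`, because ElementTree does not record line numbers. It walks every `td` once rather than table by table.

```python
import xml.etree.ElementTree as ET

# Block names copied from the template's predicate (itself incomplete);
# keeping them in one set plays the role of an XSLT variable.
BLOCK_ELEMENTS = frozenset({
    "para", "note", "cnote", "critical", "headline", "error", "define",
    "qanda", "inset", "ihead", "steps", "list", "ol", "inlist", "syntax",
    "fragment", "table",
})


def normalize_space(s):
    """Approximates XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(s.split())


def has_significant_text(td):
    """Like td/text()[normalize-space() != '']: ignores the whitespace-only
    text nodes that sit between indented block children."""
    texts = [td.text] + [child.tail for child in td]
    return any(t and t.strip() for t in texts)


def report_mixed_cells(root, source_name, blocks=BLOCK_ELEMENTS):
    """Return report lines shaped like the mixed-cells template's output.
    source_name is a hypothetical stand-in for get-source (no line numbers)."""
    lines = []
    for td in root.iter("td"):
        if has_significant_text(td) and any(c.tag in blocks for c in td):
            lines.append(f"Table cell with mixed content: {source_name}")
            lines.append(f" content={normalize_space(''.join(td.itertext()))}")
    return lines


doc = ET.fromstring(
    "<doc><table><tr>"
    "<td>\n  <para>Block only.</para>\n  <note>Fine.</note>\n</td>"
    "<td>For example:<para>This is a bad table cell.</para></td>"
    "</tr></table></doc>"
)
print("\n".join(report_mixed_cells(doc, "sample.xml")))
```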

As it stands this template is going to report nested table cells multiple
times; there might be a clever fix for this but at the moment my focus is on
the best way to identify these troublesome cells in the first place.
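The duplicate reports come from the nested loops: the outer `for-each` visits every table, including nested ones, and `descendant::td` from an outer table also reaches the cells of its inner tables. One possible fix is to select the cells directly (for example `//table//td[...]`), since an XPath path expression returns each node only once. The Python sketch below (standard-library `xml.etree.ElementTree`; `per_table`, `per_cell` and the two-element block set are hypothetical) shows the difference on a single nested table.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
<table><tr><td>
  <para>Outer cell.</para>
  <table><tr><td>Inner:<para>mixed</para></td></tr></table>
</td></tr></table>
</doc>"""


def is_mixed(td, blocks=frozenset({"para", "table"})):
    """Non-whitespace text of the cell's own plus at least one block child."""
    own = (td.text or "") + "".join(c.tail or "" for c in td)
    return bool(own.strip()) and any(c.tag in blocks for c in td)


def per_table(root):
    """Mirrors for-each //table, then for-each descendant::td: an inner
    cell is visited once for every table that contains it."""
    return [td for table in root.iter("table")
            for td in table.iter("td") if is_mixed(td)]


def per_cell(root):
    """Visits every td in the document exactly once."""
    return [td for td in root.iter("td") if is_mixed(td)]


root = ET.fromstring(DOC)
print(len(per_table(root)), len(per_cell(root)))
```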

cheers
T
XSL-List info and archive<http://www.mulberrytech.com/xsl/xsl-list>
EasyUnsubscribe<http://lists.mulberrytech.com/unsub/xsl-list/504751> (by
email)

