
SHRVL: hierarchical, summarized SVRL may be useful for implying structure

  • From: Rick Jelliffe <rjelliffe@allette.com.au>
  • To: xml-dev <xml-dev@lists.xml.org>
  • Date: Tue, 14 Sep 2021 00:46:05 +1000

SHRVL: Schematron Hierarchical Report View Language

It is possible to take an SVRL document and convert it into a hierarchical XML document. This gives us another tool in our toolbelt for feature extraction and for detecting implied structures.
 
See https://schematron.com/document/3472.html for the feature-extraction approach, for what it is worth. I am wondering whether this makes life easier for some kinds of processing, because it unwraps the @location XPaths and gives the report a shape more like that of the document (a kind of reverse Examplotron?).

We collate the SVRL and transpose the @location XPaths into element names with position attributes.

An example mockup of a SHRVL-ed SVRL document made from running a Schematron report on a large set of large HTML documents follows:
<shrvl>
  <html>
    <body pos="2">
      <p pos="1" found="p-with-feasible-title" />
      <div pos="2" found="div-with-small-simple-table">
        <table pos="2" found="td-with-feasible-title"/>
      </div>
      <div pos="5" found="div-with-feasible-title" />
    </body>
  </html>
</shrvl>

where the first p element (the one with pos="1") is transposed from an SVRL report along these lines:
<svrl:successful-report id="p2-r88-a88" role="p-with-potential-heading"
    location="/html/*[1][self::body]/*[1][self::p]" >
   <text>A p with the right words may be the main title</text>
</svrl:successful-report>
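
The transposition can be sketched in a few lines of Python. This is only a rough illustration, not the implementation behind the link above. It assumes that every @location is a simple chain of steps of the form name, name[n] or *[n][self::name], and that the found attribute is copied from the report's @role. Both are my assumptions for the sketch, and the function names are made up.

```python
import re
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# One location step: "name", "name[2]" or "*[2][self::name]".
# Assumption: no namespace prefixes or other predicates appear in @location.
STEP = re.compile(
    r"^(?:\*\[(?P<wpos>\d+)\]\[self::(?P<wname>[\w.-]+)\]"
    r"|(?P<name>[\w.-]+)(?:\[(?P<pos>\d+)\])?)$"
)

def parse_location(location):
    """Split a simple @location XPath into (element name, position or None) steps."""
    steps = []
    for raw in location.strip("/").split("/"):
        m = STEP.match(raw)
        if m is None:
            raise ValueError(f"unsupported location step: {raw!r}")
        if m.group("wname"):
            steps.append((m.group("wname"), m.group("wpos")))
        else:
            steps.append((m.group("name"), m.group("pos")))
    return steps

def svrl_to_shrvl(svrl_root):
    """Collate successful-report elements into one tree shaped like the source document."""
    shrvl = ET.Element("shrvl")
    for report in svrl_root.iter(f"{{{SVRL_NS}}}successful-report"):
        location = report.get("location")
        if not location:
            continue  # nothing to transpose
        node = shrvl
        for name, pos in parse_location(location):
            # Reuse an existing child with the same name and position, else create it.
            child = next(
                (c for c in node if c.tag == name and c.get("pos") == pos), None
            )
            if child is None:
                child = ET.SubElement(node, name)
                if pos is not None:
                    child.set("pos", pos)
            node = child
        # Assumption: the feature name comes from @role; the last report wins.
        node.set("found", report.get("role", ""))
    return shrvl
```

Calling svrl_to_shrvl(ET.parse("report.svrl").getroot()) on a (hypothetical) report.svrl file returns the shrvl element, which can be serialized with ET.tostring. Children are created in the order the reports arrive, so a real version would probably want to sort siblings by pos.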

We might then give this document to our XSLT transformer, with the original document, so that its features can guide the transformation of the document. (For example, the XSLT could take the title from the first candidate found, except that a table in the second position trumps a starting p.) 
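
To make that title rule concrete, here is a small sketch of the decision logic. It is written in Python rather than XSLT, purely for illustration. It reads "a table in the second position" as "a found table inside the body child with pos="2"", which is only one possible reading, and it assumes the SHRVL elements are already in document order. The function name is invented.

```python
import xml.etree.ElementTree as ET

def pick_title_source(shrvl_root):
    """Return the SHRVL element whose original node should supply the title, or None."""
    body = shrvl_root.find("html/body")
    if body is None:
        return None
    # Candidates: every element under body that carries a found attribute,
    # in the order they appear in the SHRVL document.
    candidates = [el for el in body.iter() if el is not body and el.get("found")]
    if not candidates:
        return None
    first = candidates[0]
    # Hypothetical override: a found table under the second body child
    # trumps a p in first position.
    if first.tag == "p" and first.get("pos") == "1":
        for child in body:
            if child.get("pos") == "2":
                for el in child.iter("table"):
                    if el.get("found"):
                        return el
    return first
```

With the mockup above, this would pick the table rather than the leading p. The transformation proper would then go to the corresponding node in the original document using the pos values along the path.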

This may reduce spaghetti code, and it provides a clear intermediate artifact that can be inspected to understand what is going on when some new document fails the conversion.

It can also provide a way to do meta-grammars (architectural-forms-ish), where you have a RELAX NG (or Schematron) schema to validate the SHRVL document. This can let you know whether your flat document has the features expected for some transformation, without having to actually run that heavyweight transformation.
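
As a stand-in for such a schema, the check can be mimicked with a few lines of Python. To be clear, this is not RELAX NG or Schematron validation, just a sketch of the same idea: list the features a transformation expects and report which ones the SHRVL document lacks. The requirement list and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical requirements: (ElementTree path below <shrvl>, required @found value).
REQUIRED_FEATURES = [
    ("html/body/p", "p-with-feasible-title"),
    ("html/body/div/table", "td-with-feasible-title"),
]

def missing_features(shrvl_root, required=REQUIRED_FEATURES):
    """Return the (path, found) requirements that no element in the SHRVL tree satisfies."""
    missing = []
    for path, found in required:
        matches = shrvl_root.findall(path)
        if not any(el.get("found") == found for el in matches):
            missing.append((path, found))
    return missing
```

An empty result means the document has the expected features and is worth handing to the heavyweight transformation. Anything else is a cheap early warning.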

Regards
Rick

