Re: 3 approaches to expressing filter rules
If your SME is not computer-savvy, don't you need to rule out almost anything that involves a computer language? You need some kind of web page where they can select rules from a list, and then convert that selection into some computer language (e.g. Schematron QuickFix, JavaScript). You might use some intermediate language:

1. JSON

A different approach I have found useful for allowing fuzzy sorting and matching of tabular data is to have a configuration file that enables or disables particular transforms for each row (and, potentially, for each field). Here is the kind of thing in JSON, but you can see this could easily be generated by some web page:

    {
      "fuzz":    [ 1, "zero", 4 ],
      "redact":  [ 2, "del", "secret" ],
      "zero":    [ 3, "zero", 0 ],
      "default": [ "*", "case-insensitive", 0 ]
    }

Here you are saying that on field 1 you apply some zero() function with a parameter of 4, on field 2 you apply some del() function with the parameter "secret", on field 3 you apply some zero() function with a parameter of 0, and on any other field you do a case-insensitive match. In the old days, we would have made some "little language" for this.

In other words, if you are dealing with tabular data, why does it need any kind of path or selector mechanism (let alone schemas)? The natural thing is to use the table's own names: where positions are necessary, the spreadsheet column letters (A-Z, AA-ZZ) would be better than integers, and the column names would be better still. The user needs to configure rules based on the presentation they see the data in and the particular tools they use, not some abstraction; for tabular data, the natural tool is the spreadsheet.

If you must use a selector, you can hide it, as in

    { "CONTEXT-ID": "this-table-id", "fuzz": [ 1, "zero", ...

or whatever.

2. Schematron

Another approach is to get the SME to write the rules as a list; then you translate it into Schematron and write code to handle it, and perhaps they can tweak the Schematron themselves.
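If the SME's list is captured in a structured form (for instance by the web page mentioned above), the translation into Schematron can be partly mechanical. The following is a minimal sketch in Python and is not part of the original proposal: the (context, role, description) tuple layout, the helper names `sch` and `rules_to_schematron`, and the sample rules are all hypothetical. It emits one sch:rule per entry, each holding the SME's wording in an sch:p and an always-firing sch:report whose role names the transform to apply.

```python
import xml.etree.ElementTree as ET

# ISO Schematron namespace; register it so output uses the "sch:" prefix
SCH_NS = "http://purl.oclc.org/dsdl/schematron"
ET.register_namespace("sch", SCH_NS)


def sch(tag: str) -> str:
    """Return a Clark-notation name such as '{ns}rule' in the Schematron namespace."""
    return f"{{{SCH_NS}}}{tag}"


def rules_to_schematron(rules, pattern_id="FuzzyRules"):
    """Build an sch:pattern with one sch:rule per (context, role, description) entry.

    Hypothetical helper: each rule gets an sch:p carrying the SME's own wording
    and an sch:report whose role names the transform downstream code should apply.
    """
    pattern = ET.Element(sch("pattern"), {"id": pattern_id})
    for context, role, description in rules:
        rule = ET.SubElement(pattern, sch("rule"), {"context": context})
        p = ET.SubElement(rule, sch("p"))
        p.text = description
        # test="." fires on every matching node; a developer can tighten it later
        ET.SubElement(rule, sch("report"), {"test": ".", "role": role})
    return pattern


# Hypothetical rule list, as it might come out of a web form
sme_rules = [
    ("telephone", "fuzz", "Fuzz the telephone number 555-841-9087 to 555-841-0000"),
    ("amount", "zero", "If the field labeled 'amount' is empty, set it to 0"),
    ("text()", "redact", "Remove the word 'secret' from the data"),
]

pattern = rules_to_schematron(sme_rules)
print(ET.tostring(pattern, encoding="unicode"))
```

Because every generated report uses test=".", the result is only a starting point; narrowing the contexts and tests for efficiency remains a manual step for the developer.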
    <sch:pattern id="FuzzyRules">
      <sch:rule context="telephone">
        <sch:p>Example: Fuzz the telephone number 555-841-9087 to 555-841-0000</sch:p>
        <sch:report test="." role="fuzz"/>
      </sch:rule>
      <sch:rule context="amount">
        <sch:p>Example: If the field labeled “amount” is empty, set it to 0</sch:p>
        <sch:report test="." role="zero"/>
      </sch:rule>
      <sch:rule context="text()">
        <sch:p>Example: Remove the word “secret” from the data</sch:p>
        <sch:report test="." role="redact" properties="secret"/>
      </sch:rule>
    </sch:pattern>
    ...
    <sch:properties>
      <sch:property id="secret">
        <s>secret</s>
      </sch:property>
    </sch:properties>

Running this produces SVRL containing an svrl:successful-report element for each match, carrying the XPath location, the role (used to select some function), and any necessary parameters (e.g. the "s"). The developer could flesh it out to make it remotely efficient:

    <sch:pattern id="FuzzyRules">
      <sch:rule context="bribe-table/person/telephone">
        <sch:p>Example: Fuzz the telephone number 555-841-9087 to 555-841-0000</sch:p>
        <sch:report test="." role="fuzz"/>
      </sch:rule>
      <sch:rule context="bribe-table/person/amount">
        <sch:p>Example: If the field labeled “amount” is empty, set it to 0</sch:p>
        <sch:report test="string-length(.) = 0" role="zero"/>
      </sch:rule>
      <sch:rule context="bribe-table/person/*/text()">
        <sch:p>Example: Remove the word “secret” from the data</sch:p>
        <sch:report test="contains(., 'secret')" role="redact" properties="secret"/>
      </sch:rule>
    </sch:pattern>
    ...
    <sch:properties>
      <sch:property id="secret">
        <s>secret</s>
      </sch:property>
    </sch:properties>

Rick

On Tue, Oct 12, 2021 at 6:27 AM Roger L Costello <costello@mitre.org> wrote: