
Regular Expression search

Michael Kay mhk at mhk.me.uk
Fri Dec 16 10:10:53 PST 2005


> Search and querying are very different. Search is basically for
> document-centric XML (like XHTML), whereas querying is for
> data-centric XML (like invoices, etc). If you're using regular
> expressions for data-centric XML, then I'd say you have a design
> flaw - but not if you are using them for document-centric XML.

That seems very simplistic to me, for a number of reasons. 

(1) The distinction between document-centric and data-centric is not a
hard-and-fast one. If you take any real application, for example a system
for managing insurance claims, then it contains a spectrum of information
from highly-structured to very loosely-structured. One of the big benefits
of XML is that we can now handle this full spectrum using a single
technology.

(2) XML structures are often designed primarily for information interchange,
not for storage and query. The database often needs to contain the message
as transmitted or received. The fact that the XML design is not optimized
for query is not a design flaw; it is a consequence of the fact that
information interchange, rather than query, is now the primary driver.

(3) I can think of many perfectly good reasons for using regular expressions
to search highly structured data, even when it was designed primarily for
querying. For example, if I receive an invoice that's damaged in the post and
I can't quite read the purchase order number, I might want to do a regular
expression search for the parts of the number that I can read.

(4) Any argument that says "in data-centric XML there should be no implicit
structure in textual fields, it should all be denoted by explicit markup"
can be applied equally well to document-centric XML. In both cases the
argument is false: it's entirely reasonable to store a UK postcode such as
"RG4 7BS" as a single string even though the "RG4" on its own carries
meaning; the same applies to dates, part numbers, etc. The granularity of
markup involves a design compromise; you can't argue that finer-grained
markup is always better.
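
To make points (3) and (4) concrete, here is a minimal sketch in Python
(not XQuery, and not taken from any real system). The invoice document, its
element names (invoice, poNumber, postcode), the purchase order number
format and both helper functions are assumptions invented for illustration.
The postcode pattern is deliberately simplified and does not validate every
real UK postcode.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical data-centric XML; element names and PO format are assumptions.
INVOICES = """
<invoices>
  <invoice id="a1"><poNumber>PO-48213-X</poNumber><postcode>RG4 7BS</postcode></invoice>
  <invoice id="a2"><poNumber>PO-48913-X</poNumber><postcode>SW1A 1AA</postcode></invoice>
  <invoice id="a3"><poNumber>PO-77213-Y</poNumber><postcode>RG4 8XY</postcode></invoice>
</invoices>
"""


def find_by_partial_po(root, pattern):
    """Return ids of invoices whose whole poNumber matches the pattern."""
    rx = re.compile(pattern)
    return [
        inv.get("id")
        for inv in root.iter("invoice")
        if rx.fullmatch(inv.findtext("poNumber", default=""))
    ]


def outward_code(postcode):
    """Return the outward part of a postcode ("RG4" in "RG4 7BS"), or None.

    Simplified pattern: 1-2 letters, a digit, an optional digit or letter,
    then the inward part (a digit and two letters).
    """
    m = re.fullmatch(
        r"([A-Z]{1,2}[0-9][0-9A-Z]?) ?([0-9][A-Z]{2})",
        postcode.strip().upper(),
    )
    return m.group(1) if m else None


root = ET.fromstring(INVOICES)

# Point (3): the damaged invoice shows "PO-48?13-?", so the legible
# characters are kept literal and each unreadable one becomes a class.
candidates = find_by_partial_po(root, r"PO-48[0-9]13-[A-Z]")

# Point (4): the postcode is stored as one string, yet the implicit
# structure inside it is still recoverable when a query needs it.
in_rg4 = [
    inv.get("id")
    for inv in root.iter("invoice")
    if outward_code(inv.findtext("postcode", default="")) == "RG4"
]
```

With this sample data, candidates holds the two invoices whose numbers fit
the legible fragments, and in_rg4 holds the two invoices in the RG4 area. In
XQuery itself the same kind of test would normally be written with
fn:matches.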

Michael Kay
http://www.saxonica.com/
