
Re: parsing markup with Perl

  • From: Hans-Juergen Rennau <hrennau@yahoo.de>
  • To: Michael Kay <mike@saxonica.com>, Rick Jelliffe <rjelliffe@a...>
  • Date: Sat, 8 Feb 2014 20:01:24 +0000 (GMT)

My experience with Perl is positive, to put it mildly: using it as a preprocessor for XML processing - as an XMLifier, I would call it. The task at hand was a very versatile reporting tool for log data distributed over a dozen log files, each one with a different log event format, all non-XML but most containing embedded XML fragments. (XML was never parsed - Perl just identified the beginning and end of the XML fragments and transferred them as a chunk into the XML output.) It worked wonderfully: Perl raced through the lines of (typically) 50-200 MB of non-XML log messages in less than a minute, applying a few dozen regular expressions which never caused any trouble, and emitted XML; Saxon stepped in and wrought its miracles, which would have been impossible to imagine (not to mention implement) if not designed and defined in terms of XML processing. The Perl-enabled transition to the XML data model was not the implementation of a predefined task - it enabled the very discovery of a task, a radically new perspective on querying and reporting capabilities no one had imagined. (Hitherto, developers had thought that the natural way to analyze log data was grepping.)
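The "XMLifier" idea described above can be illustrated with a minimal sketch. It is written in Python rather than Perl, and it is not the tool from the message: the log line layout (timestamp, level, free text), the element names `event`, `text` and `unparsed`, and the function name `xmlify_line` are all assumptions made up for the example. As in the description, the embedded XML fragment is only located by a regular expression and copied through as a chunk, never parsed.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical line layout: "<timestamp> <LEVEL> <free text, optional XML>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")

# Assumed fragment shape: starts at the first "<name" and runs to the last
# ">" on the line. The fragment is located, not parsed or validated.
FRAGMENT_RE = re.compile(r"<[A-Za-z_][\w.\-]*[\s>/].*>")


def xmlify_line(line):
    """Turn one non-XML log line into one XML element (as a string)."""
    line = line.rstrip("\n")
    m = LINE_RE.match(line)
    if m is None:
        # Unknown format: keep the raw text, escaped, so nothing is lost.
        return "<unparsed>%s</unparsed>" % escape(line)

    msg = m.group("msg")
    frag = FRAGMENT_RE.search(msg)
    if frag:
        # Plain text around the fragment is escaped; the fragment itself
        # is transferred verbatim.
        text = (msg[:frag.start()] + msg[frag.end():]).strip()
        payload = frag.group(0)
    else:
        text, payload = msg, ""

    return "<event ts=%s level=%s><text>%s</text>%s</event>" % (
        quoteattr(m.group("ts")),
        quoteattr(m.group("level")),
        escape(text),
        payload,
    )


def xmlify(lines):
    """Wrap all converted lines in a single root element."""
    yield "<log>"
    for line in lines:
        yield xmlify_line(line)
    yield "</log>"
```

Feeding `xmlify(open(path))` line by line into a file would give a document that an XQuery or XSLT processor can then query. Because the fragments are copied unchecked, a malformed fragment in the log would make the output ill-formed; a real tool would need a policy for that case.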
 
To summarize - regular expressions are a grand way to solve one's dependence on regular expressions - "parse once and for (XQuery) all". (The only alternative would be to write a Domain Specific Language, of course.)
 
One last remark: the main problem with regular expressions for most people is that they never take the time to learn the little language completely, believing that they save time by just "looking up" solutions at stackoverflow & Co, which is an illusion. This is an aspect regex has in common with XPath.
 
As in so many places - integration is the magic word. Not XML or something else, but XML and something else.
 
Hans-Juergen

From: Michael Kay <mike@saxonica.com>
To: Rick Jelliffe <rjelliffe@a...>
CC: "xml-dev@l... OASIS" <xml-dev@l...>
Sent: 15:20 Saturday, 8 February 2014
Subject: Re: parsing markup with Perl

They had unreadable code and it was driving them into the ground. My takeaway was that Perl as it stood then required infeasibly much commenting to be maintainable.


My only encounter with Perl was equally negative. I was called in as a consultant to rescue a system that had serious performance problems (like response times of two minutes for customers checking the balance on their accounts). It all turned out to be due to one module, written in Perl, which was doing regex-based transformations on XHTML pages. It took a while to work out what the 500 lines of Perl were trying to do, but in the end we rewrote it (using Java DOM, I believe - the project wasn't a good place right then for anything innovative) and solved all the problems at a stroke.

Regular expressions seem to have two problems. The first is that they are unreadable. Anything but the simplest regexes are impenetrable to anyone reading the code, and often to the person writing it, which is why debugging is so hard. The second is that performance is highly unpredictable except to people who really understand the technology extremely well. 
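The readability point can be made concrete with a small sketch, here in Python; the date pattern and the names `TERSE` and `READABLE` are invented for illustration and do not come from the system described above. Both patterns accept the same strings. The second uses the verbose flag, which ignores whitespace and allows comments inside the pattern, one common way of making a regex less impenetrable.

```python
import re

# Hypothetical example: a date such as 2014-02-08, written the terse way.
TERSE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

# The same pattern with named groups, layout and comments (re.VERBOSE).
READABLE = re.compile(
    r"""
    ^
    (?P<year>  \d{4} )                    # four-digit year
    -
    (?P<month> 0[1-9] | 1[0-2] )          # month 01..12
    -
    (?P<day>   0[1-9] | [12]\d | 3[01] )  # day 01..31, not checked per month
    $
    """,
    re.VERBOSE,
)

m = READABLE.match("2014-02-08")
print(m.group("year"), m.group("month"), m.group("day"))
```

Perl offers a comparable `/x` modifier. This addresses only the first problem; it does nothing for the second, since a commented pattern can backtrack just as badly as a terse one.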

I think we should treat Perl a bit like certain pesticides; something you're only allowed to use if you've been through the right training courses and have acquired a license, which has to be renewed every year by passing exams.

Michael Kay
Saxonica



