Re: SAX-J and the DPH (DJH?)
[Sean McGrath]
> So this works if:
>
> 1) No more than 1 telephone number per line [Chris]

For my trivial solution. Perl can handle multiple matches per line; I'm just not very sophisticated yet.

> 2) No cdata marked sections [Chris]

Can be handled by looking for CDATA marked section starts and ends, using code similar to the appendix, and adding && !$incdata to all element-matching conditionals.

> 3) The attribute value literal for client does not have any entity
> references [Sean - suggested]
> 4) The target telephone number does not contain entity references
> [Sean - suggested ]

The two real problems in this list.

> 5) appendix elements do not nest [Sean - suggested]

Not a problem - keep a reference counter instead of my trivial boolean approach. (Appendices rarely nest, but this is applicable to other kinds of elements.)

> 6) Telephone numbers do not nest (problem if regexp matching is
> greedy) [Sean - suggested]

The regexp is greedy, but I can use a pattern that will only match single elements.

> Others? I think a little list of "gotchas" like this would find the
> way onto many a DPH's wall (including mine!).

There are only two real problems here, the ones with entity references. These are, on their face, beyond the scope of a DPH. I would either (a) do a quick grep to see if I need to worry about it, or (b) run my script on the output of spam or a similar normalizer.

I don't think anyone has claimed that Perl can address everything; as David (I think) said, there is a large fuzzy gray line between problems in the Perl domain and problems in the full XML processor domain. (The assertion can be proven by the fact that a Perl script can solve arbitrary XML processing problems, but will, in the course of doing so, eventually implement a full XML processor.)

-Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
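
For readers who want to see the replies to items 1, 5 and 6 in code form, here is a minimal sketch. It is not the script from this thread. It assumes hypothetical element names (appendix, tel) and a hypothetical task (collect the content of tel elements that lie outside any appendix), and it uses a depth counter in place of a boolean flag, a /g loop for several matches per line, and a content pattern that cannot span a nested start tag. It still does nothing about entity references or CDATA sections.

```perl
use strict;
use warnings;

# Hypothetical task and element names (assumptions, not from the thread):
# return the content of every <tel> element outside all <appendix> elements.
sub tels_outside_appendix {
    my (@lines) = @_;
    my $depth = 0;    # appendix nesting depth: a counter, not a boolean
    my @found;
    for my $line (@lines) {
        # /g in a while loop visits every tag on the line in document order.
        # [^<]* cannot cross another tag, so only single, non-nested <tel>
        # elements match, however greedy the quantifier is.
        # Limitations: empty-element tags (<appendix/>), elements split
        # across lines, CDATA sections and entity references are not handled.
        while ($line =~ m{(<appendix(?=[\s>])[^>]*>)|(</appendix\s*>)|<tel(?=[\s>])[^>]*>([^<]*)</tel\s*>}g) {
            if    (defined $1)  { $depth++ }
            elsif (defined $2)  { $depth-- if $depth > 0 }
            elsif ($depth == 0) { push @found, $3 }
        }
    }
    return @found;
}

my @sample = (
    '<p><tel>111</tel> and <tel>222</tel></p>',
    '<appendix><appendix><tel>333</tel></appendix>',
    '<tel>444</tel></appendix>',
    '<tel>555</tel>',
);
print join(', ', tels_outside_appendix(@sample)), "\n";
```

With a plain boolean flag, the first </appendix> in the sample would switch matching back on and 444 would be collected by mistake; the counter keeps it suppressed until the outer appendix closes.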