Re: text files & xsd & regex
A few answers:

1) SGML. It allows you to specify regular expressions (content models) together with the delimiters used, to read in text, parse it to SGML, then output as events. If you have many documents with lots of these, run them through an SGML processor.

2) Check out XPath 2. It seems that it will have some kind of syntax for this kind of thing; see http://www.w3.org/TR/xquery-operators/#func-matches and the reference to captured substrings. I guess the most consistent thing would be to follow them in some way.

3) Probably ISO DSDL will have something like this. In particular, which features in addition to Regular Fragmentations are you interested in?

4) In the meantime, you can tokenize many kinds of strings and check them for various constraints using Schematron, which can be embedded in <appinfo> now and extracted using a stylesheet. That does not give you full regular expressions. (Schematron 1.6 will be out within a month, with <let> statements that help you do consecutive substring capturing from strings, though this is not as powerful as full regular expressions.) We have a free Windows tool that supports embedded Schematron in XML Schemas.

5) If you are writing your own script, the embedded Schematron XSLT scripts may be useful anyway: Francis Norton made some really tricky code for extracting things from appinfo, and you may find it useful to hack that code to generate, for example, Perl scripts.
Cheers
Rick Jelliffe
http://www.topologi.com/

----- Original Message -----
From: "KRUMPOLEC Martin" <krumpolec@a...>
To: <xml-dev@l...>
Sent: Friday, March 14, 2003 3:25 AM
Subject: text files & xsd & regex

> Hi,
>
> I would like ask if anyone seen something like this :
>
> - W3C XSL schema annotated (appinfo) with regular expressions
> - regexes consists of groups named after child (text only) elements
> - simple processor reads text file line by line and produces SAX "events"
> - this processor is driven by content model of our schema
> - streamed "infoset" matches the schema
>
> it is similar to "Regular Fragmentations" by Simon St.Laurent,
> just a little bit more complicated ...
>
> PS: if there is nothing like this I'll have to do it myself :-)
>
> Thank you
>
> Martin
>
> --
> Martin Krumpolec <krumpo@p...>
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>
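
For illustration only, the kind of processor described in the original question (a regex whose named groups become text-only child elements, applied line by line to produce SAX events) might be sketched in Python roughly as follows. This is not code from the thread. The pattern, the element names `log` and `entry`, and the helper names `lines_to_sax` and `text_to_xml` are all hypothetical, and the regex is hard-coded here rather than read from a schema's appinfo annotations, which a real version would presumably do.

```python
import io
import re
from xml.sax.saxutils import XMLGenerator

# Hypothetical line pattern: each named group becomes a text-only child element.
LINE_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<level>[A-Z]+)\s+(?P<message>.*)"
)


def lines_to_sax(lines, handler, pattern=LINE_PATTERN, root="log", record="entry"):
    """Send SAX events for every matching line; return numbers of unmatched lines."""
    unmatched = []
    handler.startDocument()
    handler.startElement(root, {})
    for number, line in enumerate(lines, start=1):
        match = pattern.fullmatch(line.rstrip("\n"))
        if match is None:
            # A schema-driven processor would report this as a validity error.
            unmatched.append(number)
            continue
        handler.startElement(record, {})
        for name, value in match.groupdict().items():
            if value is None:  # optional group that did not take part in the match
                continue
            handler.startElement(name, {})
            handler.characters(value)
            handler.endElement(name)
        handler.endElement(record)
    handler.endElement(root)
    handler.endDocument()
    return unmatched


def text_to_xml(text):
    """Convenience wrapper: serialize the SAX events to an XML string."""
    out = io.StringIO()
    unmatched = lines_to_sax(text.splitlines(), XMLGenerator(out, encoding="utf-8"))
    return out.getvalue(), unmatched
```

Any SAX `ContentHandler` can stand in for `XMLGenerator`, so the same events could be fed to a validator or another streaming consumer instead of being serialized.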