Re: Non-XML documents to XML Converter?
"Roger L. Costello" wrote: > > Interestingly, while driving in this morning I realized that this is > what an XSL processor does. The only difference is that an XSL > Processor has (1) hardcoded to use <...> as the delimiter. > > I think that it would be interesting to make an XSL Processor more > generic such that you could "plug in" a format description document. > Thus, the XSL Processor could transform not just XML documents, but any > kind of documents. Comments? >From a formal languages point of view your "format description document" is a grammar and grammar construction is not very easy. I mean your particular non-XML syntax is easy but what about the C++ grammar? I don't think that there is any grammar-based parsing tool that can both handle the full generality of context free languages and have high performance. :( Another way to approach it is to abandon the grammar and just embed the parsing logic directly in some computer program. This is typically what Perl, Python and Omnimark programmers do. (though there are formal parser packages for Perl and Python) For your simple language either mechanism would be easy. In fact it looks like about a fifteen line Python program to me. Here's the start of one that optimizes readability over performance: from string import split from fileinput import FileInput data = FileInput().read() records = split( data, "//" ) counter = 0 for record in recordstrings: counter = counter+1 parts = split( record, "/" ) if parts=="fruit": print "<message%s setid='%s'>"%(counter, parts) ... elif parts=="...": ... You can see how the "parsing" logic is spread through the program. In this case that doesn't matter much because the language is so simple. As an aside: your document type is a little odd. I don't think it is intuitive or convenient to give every message a unique generic identifier ("tagname"). The whole point of the generic identifier is that it should identify a *genre* -- i.e. all messages, or all fruit messages, etc. 
On the other hand, you've got something called "setid", which seems to me to be the right place for an element-unique identifier, but you seem to have put the generic identifier there!

--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

The dress code in Las Cruces New Mexico has been tightened [to] target Gothic clothing, such as dark trench coats. "It is not a witch hunt" Superintendent Jesse L. Gozales said. "It is for the safety of the kids in our schools." - Associated Press, May 16 1999

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
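For completeness, here is a minimal Python 3 sketch that finishes the converter along the lines described in the post and applies the aside's suggestion: each record's genre becomes the element name, and a running counter goes into `setid` as the element-unique identifier. It assumes (this is not confirmed by the post) that records are separated by `//`, fields by `/`, that the first field names the genre and is a valid XML name, and that the remaining fields are plain values. The `convert` function and the `messages`/`field` element names are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def convert(data):
    """Convert '//'-separated records of '/'-separated fields to XML.

    Hypothetical layout: the first field of each record names its genre
    (e.g. "fruit") and must be a valid XML name; the remaining fields are
    plain text values.
    """
    lines = ["<messages>"]
    counter = 0
    for record in data.split("//"):
        record = record.strip()
        if not record:
            continue  # skip empty records, e.g. after a trailing "//"
        counter += 1
        genre, *values = record.split("/")
        # Generic identifier = genre; setid = element-unique identifier.
        lines.append("  <%s setid=%s>" % (genre, quoteattr(str(counter))))
        for value in values:
            # escape() turns &, < and > into entity references
            lines.append("    <field>%s</field>" % escape(value))
        lines.append("  </%s>" % genre)
    lines.append("</messages>")
    return "\n".join(lines)
```

To use it as a filter, call `print(convert(sys.stdin.read()))` after `import sys`. Building the markup as strings keeps the sketch close to the original program; for a larger format, constructing the tree with `xml.etree.ElementTree` would handle naming and escaping more robustly.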