Re: Non-XML documents to XML Converter?

  • From: Paul Prescod <paul@p...>
  • To: "xml-dev@i..." <xml-dev@i...>
  • Date: Tue, 18 May 1999 08:58:16 -0500

"Roger L. Costello" wrote:
> Interestingly, while driving in this morning I realized that this is
> what an XSL processor does.  The only difference is that an XSL
> Processor has (1) hardcoded to use <...> as the delimiter.
> I think that it would be interesting to make an XSL Processor more
> generic such that you could "plug in" a format description document.
> Thus, the XSL Processor could transform not just XML documents, but any
> kind of documents.  Comments?

From a formal languages point of view, your "format description document"
is a grammar, and grammar construction is not easy in general. Your
particular non-XML syntax is simple, but what about the C++ grammar? I don't
think there is any grammar-based parsing tool that can both handle the
full generality of context-free languages and deliver high performance.
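For a syntax as flat as this one, the "format description" can be tiny: just the two delimiters, passed as data to a generic splitter. The sketch below assumes records are separated by "//" and fields by "/", the same delimiters the Python program in this message splits on. `FORMAT` and `parse` are hypothetical names. A description of this kind covers only flat record/field layouts and cannot express nesting, which is exactly where a real grammar, and its construction cost, comes in.

```python
# Hypothetical "format description": for a flat record/field syntax,
# two delimiters are enough. Nested languages (e.g. C++) need a real grammar.
FORMAT = {"record_sep": "//", "field_sep": "/"}


def parse(text, fmt):
    """Split text into a list of records, each a list of fields."""
    records = []
    for raw in text.split(fmt["record_sep"]):
        raw = raw.strip()
        if raw:  # skip empty records, e.g. after a trailing separator
            records.append(raw.split(fmt["field_sep"]))
    return records
```

For example, `parse("fruit/apple//fruit/pear//", FORMAT)` returns `[['fruit', 'apple'], ['fruit', 'pear']]`.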

Another way to approach it is to abandon the grammar and embed the
parsing logic directly in a program. This is typically what Perl,
Python and OmniMark programmers do (though there are formal parser
packages for Perl and Python).

For your simple language either approach would be easy. In fact, it looks
like about a fifteen-line Python program to me. Here's the start of one
that optimizes readability over performance:

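import fileinput

# Read every file named on the command line (or stdin) into one string.
data = "".join(fileinput.input())

# Records are separated by "//"; drop empty ones, such as the piece
# after a trailing "//".
records = [r.strip() for r in data.split("//") if r.strip()]

for counter, record in enumerate(records, 1):
    # Fields within a record are separated by "/".
    parts = record.split("/")
    if parts[0] == "fruit":
        print("<message%s setid='%s'>" % (counter, parts[0]))
    # elif parts[0] == "...":
    #     handle the other record types here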

You can see how the "parsing" logic is spread through the program. In this
case that doesn't matter much because the language is so simple.

As an aside: your document type is a little odd. I don't think it is
intuitive or convenient to give every message a unique generic identifier
("tagname"). The whole point of the generic identifier is that it should
identify a *genre* -- i.e. all messages, or all fruit messages, etc. On
the other hand, you've got something called "setid" which seems to me to
be the right place for an element-unique identifier -- but you seem to
have put the generic identifier there!
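To make the suggestion concrete, here is a minimal sketch that uses the first field of each record as the generic identifier, so every fruit record becomes a `<fruit>` element, and moves the per-message counter into `setid`. The function name `records_to_xml`, the `<messages>` root and the `<field>` children are hypothetical choices. The sketch assumes records have already been split into field lists and that each first field is a legal XML name (ElementTree does not check this).

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Build one <messages> document from a list of field lists."""
    root = ET.Element("messages")
    for counter, fields in enumerate(records, 1):
        # Generic identifier = genre (e.g. "fruit"), shared by all such records.
        elem = ET.SubElement(root, fields[0])
        # The element-unique identifier goes in the setid attribute instead.
        elem.set("setid", "message%d" % counter)
        # Hypothetical layout: each remaining field becomes a <field> child.
        for value in fields[1:]:
            ET.SubElement(elem, "field").text = value
    return ET.tostring(root, encoding="unicode")
```

With this layout, every fruit message can be selected by its element type, while `setid` still distinguishes individual messages.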

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself

The dress code in Las Cruces New Mexico has been tightened [to] target 
Gothic clothing, such as dark trench coats. "It is not a witch hunt"
Superintendent Jesse L. Gozales said. "It is for the safety of the kids
in our schools."  - Associated Press, May 16 1999

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)