XML-DEV Mailing List Archive
Re: how to best integrate XML in a programming language
> I have questions concerning the use of XML in programming languages.

Excellent! This is an area that I have been working on for more than a year! Our project is called Wheat, and can be found at http://www.wheatfarm.org/ -- Wheat is a language designed with several deep connections to XML.

My answers here are about how we are handling these issues in Wheat. Since Wheat is in an early implementation stage, the features described here run the gamut from implemented to designed to just sketched out.

> In order to set the stage, here are three suggestions that should
> connect the XML world to the world of general-purpose programming
> languages like ML, C, C++, Java (using only statically typed ones
> here).

Wheat is a non-statically typed general-purpose programming language in the tradition of Smalltalk, Self, Perl, PHP, Python, etc...

[ I think non-statically typed languages are a much better match to XML processing, but this issue is more of a design stance than a technical argument, so please, no wars over static/non-static. ]

> No matter whether you are more concerned about documents or data,
> writing code in non-XML syntax makes life easier.

Absolutely. Wheat has an abstract syntax that defines what processing in Wheat is. Then it has three (!) concrete syntaxes:

1 - A Wheat native language that looks like most programming languages. This is what any programmer wants to code in.
2 - An XML-based syntax that makes program transformation and manipulation easy.
3 - A C++ "syntax" that allows one to write code with the same semantics as Wheat, but compiled and with access to host OS facilities.

The first two forms are isomorphic - you can round trip to and fro without any loss of information (including formatting).

> No matter whether you are more concerned about documents or data,
> writing XML syntax to describe XML makes life easier.

Not clear to me. I don't find XSLT particularly easy or concise to write lexically.
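As a rough illustration of one abstract syntax with two concrete forms, here is a small Python sketch. It is not Wheat: the tuple-based tree, the element names `num`, `var` and `call`, and the function names are all invented for this example. It renders the same tree as conventional program text and as XML, and reads the XML back into an identical tree. Unlike what is described for Wheat, it makes no attempt to preserve formatting, and it has no parser for the textual form.

```python
import xml.etree.ElementTree as ET

# Hypothetical abstract syntax, as nested tuples:
#   ("num", 3)   ("var", "x")   ("call", "add", [arg, arg, ...])

def to_native(node):
    """Render the tree in a conventional, programmer-oriented syntax."""
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "var":
        return node[1]
    if kind == "call":
        args = ", ".join(to_native(arg) for arg in node[2])
        return f"{node[1]}({args})"
    raise ValueError(f"unknown node kind: {kind}")

def to_xml(node):
    """Render the same tree as an XML element (the tool-oriented syntax)."""
    kind = node[0]
    if kind == "num":
        return ET.Element("num", {"value": str(node[1])})
    if kind == "var":
        return ET.Element("var", {"name": node[1]})
    if kind == "call":
        element = ET.Element("call", {"name": node[1]})
        element.extend(to_xml(arg) for arg in node[2])
        return element
    raise ValueError(f"unknown node kind: {kind}")

def from_xml(element):
    """Rebuild the abstract tree from its XML form."""
    if element.tag == "num":
        return ("num", int(element.get("value")))
    if element.tag == "var":
        return ("var", element.get("name"))
    if element.tag == "call":
        return ("call", element.get("name"), [from_xml(child) for child in element])
    raise ValueError(f"unknown element: {element.tag}")

tree = ("call", "add", [("num", 1), ("call", "mul", [("var", "x"), ("num", 2)])])
xml_text = ET.tostring(to_xml(tree), encoding="unicode")
round_tripped = from_xml(ET.fromstring(xml_text))
```

Because both renderings are derived from the same tree, a transformation written against the XML form can be read back and shown to the programmer in the textual form.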
> No matter which programming language you prefer, if it cannot deal with
> XML it will soon be forgotten or condemned to accomodate numerical
> computations only.

No comment... this is just tinder!

>     out.println( <html>... </html>.serialize() );
>
> because a compiler can check many of the well-formedness constraints. It
> is thus less likely that a typo will break your neck at runtime. Plus a
> very sophisticated type system could even check whether your value
> conforms to some type specified in DTD, Relax NG or Schema - this is
> something like "built-in data binding".

It is unrealistic that a program would want to embed a literal that is an entire XML document. It is more likely that a program wants to output an XML document composed of XML literals and data values from the running program. Therefore, it is highly unlikely that the compiler could do much in the way of ensuring schema conformance.

In Wheat, we use a templating-based approach: If a program wants to generate an XML document of type X, then the programmer creates an actual example XML document of type X. Next the template is marked up with elements and attributes from the template namespace. These define which portions of the document are manipulatable by the running program. The program then calls for the expansion of the template, directing the expansion of the marked up sections.

The template expander can then ensure that a well-formed XML document results. (Yes, even when the program replaces the content of an element with arbitrary XML (say an XHTML fragment), the template engine ensures the final result is well-formed.) The output of the expander could be schema checked, though we don't at present.

We like this approach because it allows the programmer to author the XML template document in a tool that is appropriate to the XML application involved. For example, if it is XHTML, then Dreamweaver can be used.

> - Where are XML literals allowed ?
>   (There must be a clear entry point, at which a language spec links
> to the W3C recommendation. This can be problematic if the language
> interprets symbols like <, or /> as tokens. In XQuery, this problem does
> not arise, because a query language is by definition declarative and
> result-centric... here XML literals, possibly with embedded blocks, are
> just the result of queries )

We don't have literals that are XML. In Wheat there is a close isomorphism between a Wheat object and its XML representation. So, instead, Wheat has object literals. These are always "well formed". The object can be turned into XML on demand.

> - How to specify that XML literals may contain code blocks ?
>   (XQuery uses an escape mechanism <b> { msg } </b> with braces, where
> msg is a text variable in the current scope. How can one call such an
> XML literal with embedded blocks, is it an XML document, a
> half-document, an XML template, an XML form, or what ?)

I actually think this is a big mistake. Frankly, there is little difference between:

    print "<html><head><title>";
    print $title;
    print "</title></head><body>";
    for ($i = 1; $i < 10; $i++) {
        print "<p>";
        print $i;
        print "</p>";
    }
    print "</body></html>";

and

    <html>
      <head>
        <title><?script $title ?></title>
      </head>
      <body>
        <?script for ($i = 1; $i < 10; $i++) { ?>
          <p><?script $i ?></p>
        <?script } ?>
      </body>
    </html>

In fact, as JSP shows, these are completely isomorphic. Each representation is neither beast nor fowl. We all dislike the first representation as it clearly leads to code that is brittle, and it is extremely hard to see and modify the XML document. The second merely swaps these problems: the XML is easier to work with (assuming you have an XML editor that can pick around the script blocks, especially when they appear in the middle of attribute values (!)), at the expense of the code being opaque. Further, in real applications that use the second form (I've written some mighty big PHP scripts...)
the document can easily end up not even being well-formed XML! It is this very problem that led Wheat to use templates. With templates there is clear separation: The XML document looks, feels and is authored as XML. The code is clear, concise and compact. The trick is making sure that the code one writes to connect the two is easy and clear.

> - How to deal with entities ?
>   (In programming language syntax this would correspond to constants.)

Well, except for &amp;, &lt;, &gt; and &quot; I think they are best banished... :-) I am loath to encourage them by providing language features to support the general issues of entities. (Yes, yes, I recognize their utility in DTDs...)

> - How to deal with namespaces ?
>   (A natural correspondence would be java packages, or C++
> namespaces)

I see the same natural relationship. Since Wheat is non-statically typed, the object model allows a further correspondence in how namespaces can be used in XML: In Wheat it is possible for one module to add instance variables to an object that inherits from another. Sort of like an annotation. The base class will happily ignore the instance variables that come from another module.

> - Canonicalization or not ? How ?

In Wheat this isn't an issue, I think. When code is running it is concerned with the semantic meaning of the XML (i.e. we convert it to a tree of objects), so we don't care so much about comparing the whitespace.

> - How does one write comments in XML literals
>   (that is an easy one, just use XML comments)

As we said, there are no XML literals. Of course, the XML templates can have XML comments in them - and they are (or can be) preserved on output.

> - Which type does the XML literal have ?
>   (This demands a class library that represents XML, which is ideally
> not as bloated as DOM)

Well, in Wheat, the XML/object dualism takes one of three forms:

1- Canonical Wheat XML. This is just an XML format that any tree of Wheat objects can be written out as. It captures the Wheat semantics exactly.
2- DOM-like tree. This is a set of Wheat objects and classes that exactly capture the semantics of any arbitrary XML document. It will be DOM-like, but we'll probably roll our own to closely match the object and processing facilities in Wheat.

3- Programmer specified. Here, the programmer supplies the Wheat objects and classes that are used to represent a specific XML application. Presumably (if the programmer did the job correctly) the semantics between the XML and objects are identical. This is similar to the data binding systems out there, though it is bi-directional.

> - How can one navigate in such a XML representation
>   (Ideally, an XPath like syntax or something comparable like pattern
> matching is used)

We always navigate one of the object representations above. XPath is reflected as an object tree pattern matching library.

- Mark

Mark Lentczner
markl@w...
http://www.wheatfarm.org/
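For illustration only, the template approach described in the message can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is not Wheat code and not Wheat's template engine: the namespace URI `urn:example:template`, the `t:slot` marker attribute, and the `expand` function are assumptions made up for this sketch. The template is an ordinary example document; marked elements have their sample content replaced by values from the running program, and because the result is serialized from a tree rather than concatenated from strings, it comes out well-formed.

```python
import xml.etree.ElementTree as ET

T_NS = "urn:example:template"   # hypothetical template namespace
SLOT = f"{{{T_NS}}}slot"        # hypothetical marker attribute, written t:slot

# An example document, authored as plain XML, with two marked-up sections.
TEMPLATE = """\
<html xmlns:t="urn:example:template">
  <head><title t:slot="title">Sample title</title></head>
  <body><div t:slot="content"><p>Sample content</p></div></body>
</html>"""

def expand(template_text, values):
    """Fill marked sections with program values and return well-formed XML text."""
    root = ET.fromstring(template_text)
    # Snapshot the elements first, since the loop body edits the tree.
    for element in list(root.iter()):
        name = element.attrib.pop(SLOT, None)   # strip the marker from the output
        if name is None or name not in values:
            continue                            # unmarked or unfilled: keep as authored
        for child in list(element):
            element.remove(child)               # drop the sample content
        value = values[name]
        if isinstance(value, ET.Element):
            element.text = None
            element.append(value)               # an already-parsed XML fragment
        else:
            element.text = str(value)           # plain text, escaped on output
    return ET.tostring(root, encoding="unicode")

fragment = ET.fromstring("<p>Hi <b>there</b></p>")
output = expand(TEMPLATE, {"title": "Q&A <draft>", "content": fragment})
```

In this sketch a fragment has to be parsed before it can be inserted, so malformed markup fails at `ET.fromstring` instead of reaching the output, and text values such as `Q&A <draft>` are escaped by the serializer.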