Re: XML <-> non-XML filter project
At 99/03/30 13:13 +0800, James Tauber wrote:
>Earlier this month, I posted the following to XSL-LIST. With apologies to
>those who received it there, I'm posting it (modified) here to see if anyone
>is interested in some co-operative effort in this area.
>
>What I would like to see is people taking existing non-XML formats and
>developing:
>
> a) a URI for the non-XML format (for notations and for the namespace of
>the XML format)
> b) a DTD representing the existing non-XML format
> c) an output filter to convert documents conforming to the DTD into the
>non-XML format
> d) (possibly) an input filter to convert the non-XML format into XML
>...
>I would personally find great value in this being done for Makefiles,
>procmail files, simple shell scripts and PalmPilot databases. Others of
>value I can think of include Windows INI files, Unix mailboxes, your
>favourite programming language...

I'm sorry I didn't notice it when reading XSL-list, but I found this last
night on XML-DEV, so I'll post my response to both lists ... apologies in
advance for the duplicates.

The subject line implies *both* directions XML<->non-XML ... but your prose
leans towards only XML->non-XML.

I've just recently added this to my XSL training materials (X-Tech
attendees didn't see it, WWW8 attendees will see it) because I have since
successfully used XML and XSL to produce text-only files (including batch
files, control files, etc.) using an environment created by James Clark
(many thanks, James!)
for his XT program:

At Sun, 17 Jan 1999 10:34:34 +0700 James Clark wrote:
====8<----
Here's what the DTD for such a result namespace might look like:

<!ELEMENT nxml (escape*, (control|data)*)>
<!ATTLIST nxml encoding NMTOKEN "UTF-8">
<!ELEMENT escape (#PCDATA|char)*>
<!ATTLIST escape char CDATA #REQUIRED>
<!ELEMENT control (#PCDATA|char|data|control)*>
<!ELEMENT data (#PCDATA|data|control)*>
<!ELEMENT char EMPTY>
<!ATTLIST char number NMTOKEN #REQUIRED>

The nxml element is the root element; the encoding attribute is a MIME
charset to be used for encoding characters as bytes.

The data element contains data. Within a data element control characters
get escaped. The escape element specifies how a particular control
character gets escaped.

The control element contains control information. Within a control
element, all characters are output directly without escaping.

The char element allows the output of a character that is not allowed by
XML (such as control-L).
====8<----

The encoding= attribute works with the character set encodings supported
by the Java engine running XT ... unfortunately, I haven't found a list of
encodings for XT.EXE (Microsoft VM).

The character sets that I think I'll need personally for all my text-only
work are ISO-8859-1 (Latin 1), IBM Code Page 850 and UTF-8.

From the list of character sets in:

   ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets

... I found through trial and error that for the Symantec Java environment
these are named "Latin1", "IBM850" and "UTF8" respectively.

HELP!!!! - Can anyone help me find the reference list of these (and other)
character encodings supported by the Microsoft Java VM?

Attached is the sample I wrote to help myself understand the features of
the namespace.

Once I found the encodings, I richly marked up in XML the source material
for a number of simple text files, and I now use XT to emit the text from
the XML using this namespace. So far it has covered what I personally need
to emit non-XML text.
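To make the three character sets named above concrete, here is a minimal Python sketch that shows the bytes each one produces for eacute (U+00E9). The codec names used (`latin-1`, `cp850`, `utf-8`) are Python's own and serve only as an illustration; they say nothing about which names XT or any particular Java VM accepts.

```python
# Encode eacute (U+00E9) under the three character sets named above.
# The codec names are Python's, not the names expected by XT or a Java VM.
E_ACUTE = "\u00e9"

CODECS = {
    "ISO-8859-1 (Latin 1)": "latin-1",
    "IBM Code Page 850": "cp850",
    "UTF-8": "utf-8",
}


def encode_all(text):
    """Return {label: bytes} for each character set in CODECS."""
    return {label: text.encode(codec) for label, codec in CODECS.items()}


if __name__ == "__main__":
    for label, raw in encode_all(E_ACUTE).items():
        # Print the encoded bytes as hex so the differences are visible.
        print(f"{label}: {raw.hex()}")
```

The same character comes out as one byte in Latin 1, a different single byte in Code Page 850, and two bytes in UTF-8, which is why the encoding attribute has to match what the consumer of the text file expects.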
I haven't yet needed to emit accented characters, but I'm ready with the
encodings for my Symantec environment ... I'm hoping someone can help me
find the encodings for the Microsoft Java VM.

I hope this helps.

......... Ken

P:\jclark>type nxml.xsl
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
                xmlns="java:com.jclark.xsl.sax.NXMLOutputHandler"
                result-ns="">

<xsl:template match="/">    <!--indicate the kind of text being produced-->
  <nxml encoding="Latin1">  <!--others for Symantec: "IBM850", "UTF8"-->
    <escape char="\">\\</escape>        <!--escape any back slashes-->
    <data><xsl:apply-templates/></data> <!--translate what's in data-->
  </nxml>
</xsl:template>

<xsl:template match="charValue"><!--don't translate what's in control-->
  <control>
    <xsl:text>\</xsl:text>
    <xsl:value-of select="@val"/>-<char number="{@val}"/>
    <xsl:text>\</xsl:text>
  </control>
</xsl:template>

</xsl:stylesheet>

P:\jclark>type nxml.xml
<?xml version="1.0"?>
<test>This is a test with a backslash \ and eacute é in it - plus the latin-1 for eacute <charValue val="233"/> as well
</test>

P:\jclark>call xsljava nxml.xml nxml.xsl nxml.txt

P:\jclark>type nxml.txt
This is a test with a backslash \\ and eacute é in it - plus the latin-1 for eacute \233-é\ as well

P:\jclark>

--
G. Ken Holman                    mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.           http://www.CraneSoftwrights.com/s/
Box 266, Kars, Ontario CANADA K0A-2E0   +1(613)489-0999   (Fax:-0995)
Website: XSL/XML/DSSSL/SGML services outline, XSL/DSSSL shareware,
         stylesheet resource library, conference training schedule,
         commercial stylesheet training materials, on-line XSL CBT.
Next instructor-led XSL Training:                    WWW8:1999-05-11

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
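The nxml.txt result above can be mimicked outside XT. The following is a minimal Python sketch of the behaviour described in James Clark's message: text in data has the escape rules applied, text in control is emitted untouched, and char emits a character by number. The `render` function, the flat `(kind, value)` list and the names `PARTS` and `ESCAPES` are hypothetical simplifications made for this illustration (the real namespace nests char inside control); this is not XT's implementation.

```python
# Toy model of the nxml result namespace: 'data' text has escapes applied,
# 'control' text is emitted verbatim, and 'char' emits a code point by number.
# It mirrors the prose description only; it is not XT's implementation.


def render(parts, escapes, encoding="latin-1"):
    """Render (kind, value) pairs; kind is 'data', 'control' or 'char'."""
    out = []
    for kind, value in parts:
        if kind == "data":
            # Replace each character that has an escape rule.
            out.append("".join(escapes.get(ch, ch) for ch in value))
        elif kind == "control":
            out.append(value)  # no escaping inside control
        elif kind == "char":
            out.append(chr(int(value)))  # character given by its number
        else:
            raise ValueError(f"unknown part kind: {kind}")
    return "".join(out).encode(encoding)


# Same rule as <escape char="\">\\</escape> in nxml.xsl.
ESCAPES = {"\\": "\\\\"}

# Flattened equivalent of nxml.xml after the stylesheet has been applied.
PARTS = [
    ("data", "This is a test with a backslash \\ and eacute \u00e9 in it "
             "- plus the latin-1 for eacute "),
    ("control", "\\233-"),
    ("char", "233"),
    ("control", "\\"),
    ("data", " as well"),
]

if __name__ == "__main__":
    print(render(PARTS, ESCAPES).decode("latin-1"))
```

The backslash typed in the data portion is doubled, while the two backslashes written inside control pass through as single characters, which is the distinction the charValue template relies on.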