Re: Formatless files
Roger L Costello <costello@mitre.org> writes:

> A file has no inherent format.
> ...
> -------
> The above are excerpts from the book, The Art of UNIX Programming,
> page 46-47. The "system" being referred to is the UNIX system.
> How do those excerpts apply to XML?

Pervasively, I would have said. Almost every point raised by your uncredited author (Eric Raymond, if the Web is telling me the truth) has an analog in the XML ecosystem, though in most XML usage the data (and those who create and manage it) tend to be somewhat more central than the programmers.

> The format of a file is determined by the programs that use it.

The specific set of tags used to mark up an XML document is determined, in the usual case, by those who create and use the data, either choosing from a published vocabulary or rolling their own. In some cases, (e.g. the vocabularies used in the XML data in Microsoft Office and its open-source competition), those who create some software choose the vocabulary and hard-code it into their programs; in other cases, the persons or institutions managing the data make the choice.

> Since file types are not determined by the file system, the
> "kernel" can't tell you the type of file: it doesn't know.

Since XML document types can be declared, and in any case are manifest in the document, any program or person who can read a little XML can tell you what vocabulary is in use in a given XML document. Most XML software doesn't care because it can and will handle any XML.

> ...
> Instead of creating distinctions, the system tries to erase/lessen
> them. All text consists of lines terminated by newline characters,
> and most programs understand this simple format.

Instead of using binary formats which require a close matchup between files and the programs that read and write them, XML (like all other text-based formats) uses structures that can be represented as sequences of characters, and all XML processors understand the relatively simple syntax of XML.
(The major difficulties for a programmer in parsing XML come from the fact that in parsing XML you have to bite the bullet and finally learn to deal with Unicode and ISO 10646.)

Most text-based formats leave some room for variation: some parts of a JSON data stream are for user-specified names, and the variable and function names in a programming language are usually chosen by the author of the program ('main' in C is an exception to this rule). XML seems to leave a bit more freedom to the user: XML allows more variation in the internal structure of an XML document than JSON or Markdown (for example) allow in their files.

> There's a good test of file system uniformity, due originally to Doug
> McIlroy. Can the output of a FORTRAN program be used as input to the
> FORTRAN compiler? A remarkable number of systems have trouble with
> this test.

McIlroy's test is memorable and mostly persuasive; I have sometimes seen it in the more general form "it should be possible to feed the output of any program to any other program as input". It would not surprise me if it or something like it was bumping around in the back of people's minds when they specified that the output of an XSLT transform would, by default, be an XML document (or before that, that the output of a DSSSL SGML-to-SGML tree transformation would be an SGML document), and that the result of evaluating an XQuery expression would be a sequence of XDM items on which further operations might be performed, and that the result of most XProc processing steps would be one or more XML documents which can be fed to other XProc steps. (It is of course easy enough to serialize results in non-XML forms when that is required.)

At another level, since the primary purpose of many Fortran programs is numeric computation, I confess that it has never been clear to me why one would want to use their output as input to a compiler.
However, in the generalized form "Can you use a program in programming language L to generate a new program in L?", it's an interesting question. For XSLT, the answer is clearly 'yes'. One of the most common techniques for handling some problems in XML processing is to use an XSLT transform to generate a new XSLT transform. (This may be becoming less common now with XSLT 3.0. And perhaps because it uses a non-XML syntax, XQuery does not seem to have developed this kind of idiom.)

> Why are there so many file formats - the XML file format, the JSON
> file format, the CSV file format, and so on?

Why do human beings have so many different ideas? Perhaps the answer is: because the format of such files is determined by the programs that use them. (By the way, I think you made a typo here: surely you meant to say "the CSV file formats" in the plural, because there are almost as many variants of CSV as there are programs which purport to read or write it.)

> Isn't that contrary to the idea of formatless files?

At some level, yes. At other levels (the operating system and character I/O routines in C), no. Is that a problem?

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
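
For readers who want to see the earlier point about document types being "manifest in the document" in concrete terms, here is a minimal sketch in Python. It is not part of the original message. It assumes only the standard-library xml.etree.ElementTree parser; the function name root_vocabulary and the three sample documents are invented for illustration. The same generic parser reads all three documents and reports the namespace and name of the document element, without knowing anything about the vocabularies involved.

```python
import xml.etree.ElementTree as ET


def root_vocabulary(xml_text):
    """Return (namespace URI or None, local name) of the document element.

    Hypothetical helper for illustration: ElementTree reports a
    namespaced element name as "{namespace-uri}local-name".
    """
    root = ET.fromstring(xml_text)
    tag = root.tag
    if tag.startswith("{"):
        namespace, local_name = tag[1:].split("}", 1)
        return namespace, local_name
    return None, tag


# Three made-up documents in three different vocabularies.
samples = [
    '<svg xmlns="http://www.w3.org/2000/svg"><circle r="1"/></svg>',
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>',
    '<memo><to>xml-dev</to></memo>',
]

for sample in samples:
    print(root_vocabulary(sample))
```

The document element is only a rough indicator, of course: a document may mix several namespaces, and a vocabulary without a namespace can be recognized only by its element names.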