Re: Is XML a language or a data format?
Roger L Costello <costello@mitre.org> writes:

> Yesterday a colleague made a fascinating distinction between
> “language” and “data format”:
> * First he noted that English is a language, not a data
> format. Likewise, Java is a language, not a data format.
> * A language is specified by a grammar. There is a grammar for English
> and a grammar for Java. A language is intended to read by humans.
> * A data format probably does not have a grammar. It oftentimes is
> simply a collection of pieces and parts. It is intended to be
> processed by a machine. An example of a data format is JPEG
> (Exif). There is no grammar for it. It is just a series of parts
> pieced together as the graphic below illustrates.

As Michael Kay has already pointed out, if a data format can be read and processed by machine, then it has a reliably recognizable structure; the chances are very good that that structure can be described by a context-free grammar that matches the set of possible instances of the data format somewhat more closely than the context-free grammars of Java and other programming languages match the set of conforming programs. (That is, many data formats are in fact context-free; programming languages with type systems are not context-free.)

If "there is no grammar for it" means "it is not possible to write a grammar for it", then the claim is false for every data format I can think of off the bat (including JPEG and Exif).

If "there is no grammar for it" means "the people who defined the data format did not write down a formal grammar for it because they couldn't be bothered and formal grammars are for quiche-eaters", then it's a sociological statement about the mentality of the data format specifier. (And whenever I encounter a data format designed by someone who believes that data formats don't have grammars, I do my level best to give it and them a wide berth, since I don't need more aggravation in my life.)
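As a small illustration of the claim that a machine-readable format has a structure a grammar can describe, here is a minimal sketch in Python. The format is a toy invented for this example (it is not JPEG, Exif, or any real format), and the byte values and the names HEADER, TRAILER, FIELD, OPEN, CLOSE, parse_records and is_valid are all assumptions made for illustration. The grammar is written in the leading comment, and the recognizer follows it rule by rule.

```python
# Toy grammar for a hypothetical binary format (invented for illustration):
#   file   ::= HEADER record* TRAILER
#   record ::= FIELD | OPEN record* CLOSE
# Each terminal symbol is a single byte; the values below are arbitrary.

HEADER, TRAILER, FIELD, OPEN, CLOSE = 0x01, 0x04, 0x10, 0x28, 0x29


def parse_records(data: bytes, pos: int) -> int:
    """Consume record* starting at pos; return the offset just past the last record."""
    while pos < len(data):
        if data[pos] == FIELD:
            pos += 1
        elif data[pos] == OPEN:
            # A group is OPEN, any number of nested records, then CLOSE.
            pos = parse_records(data, pos + 1)
            if pos >= len(data) or data[pos] != CLOSE:
                raise ValueError(f"expected CLOSE at offset {pos}")
            pos += 1
        else:
            # Not the start of a record; let the caller decide what follows.
            break
    return pos


def is_valid(data: bytes) -> bool:
    """Return True if data belongs to the language defined by the toy grammar."""
    if not data or data[0] != HEADER:
        return False
    try:
        pos = parse_records(data, 1)
    except ValueError:
        return False
    # Exactly one symbol must remain, and it must be TRAILER.
    return pos == len(data) - 1 and data[pos] == TRAILER
```

Under these assumptions, is_valid(bytes([0x01, 0x10, 0x28, 0x10, 0x29, 0x04])) accepts a file holding one field and one group, while a file with an unclosed group is rejected. The nesting of groups is what makes this toy language context-free rather than regular.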
[If any readers of xml-dev are mystified by the reference to quiche, a search for "Real programmers don't eat quiche" will provide relevant context for the mindset I am attributing to the unnamed data format designers here.]

> Do you agree with that distinction? How do you define language? How do
> you define data format? How do they differ?

For technical purposes, the most useful definition of "language" is as a set of sequences of symbols. Some languages can be defined by grammars; others can be usefully approximated by grammars.

For most purposes, I'd say a "data format" is a form in which data can usefully be stored on persistent media or exchanged with others.

As a rule, a data format worth using has regularities which can be captured with a grammar, and by and large the provision of a formal grammar describing a format is a plausible sign that reasonable care and thought have gone into the design and specification of the format. There may be exceptions, but the most prominent exceptions I can think of are cases of proprietary formats where an explicit grammar would allow access to the data by undesirable people (i.e. those who are not paying royalties to the owner of the data format).

--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com