XML-DEV Mailing List Archive: Re: Penance for misspent attributes
In <1021562444.984.6178.camel@l...>, "Simon St.Laurent" <simonstl@s...> wrote:

| It's striking me more and more that developers, myself included, have
| done a poor job of examining and explaining how markup works and what
| the parts do best. That extends to a key discussion which is generally
| considered dull but radioactive: the elements/attributes distinction.

The litmus test is whether one thinks these are or should be equivalent:

  1) <foo bar="baz"/>
  2) <foo><bar>baz</bar></foo>

I would say that the job so far, poor or not, has almost entirely been one of propounding the equivalence view.

| A lot of people have been storing data in attributes rather than in
| element content. There are a lot of reasons for this, ranging from a more
| compact form to simpler processing in SAX.

And, of course, Keeping Things Safe For Netploder. Is there some taboo on mentioning this?

| To some extent, the misuse arose because attributes had features
| (defaulting, free order, some types, enumeration) that elements didn't
| have. W3C XML Schema condones those practices for attributes and
| extends the same features to elements. Maybe this is an improvement,
| maybe it isn't.

Taking the minority view, I would say that it isn't. That is, rather than trying to unify attributes and (sub)elements - especially those that wind up with the moral equivalent of (#PCDATA) content models - it may be more fruitful to keep them distinct.

| Separating markup from content - and putting attributes squarely in the
| markup side - seems like one means of at least alleviating the headache.

Well, that's how it all started (see, e.g., [1]).

My personal rule of thumb has always been "elements for analysis, attributes for annotation". The key is the sense in which attributes are not directly "analytic". In my own attempts to explain this to (computer-savvy?) people, I've often drawn a parallel with parsing theory, based on the similarities between content models and BNFs (extended regular grammars).
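To make the litmus test concrete, here is a minimal Python sketch using only the standard library; `tree_view`, `BarCollector` and `sax_value` are hypothetical helper names made up for illustration. ElementTree shows that the two forms do not produce the same tree: in form 1, "baz" sits in the root's attribute set and contributes no text content, while in form 2 it is the text of a child element. The SAX handler illustrates the "simpler processing" point from the quoted message: an attribute value arrives whole in `startElement`, whereas element content has to be buffered across `characters` calls, because a parser is allowed to deliver it in several chunks.

```python
import xml.etree.ElementTree as ET
import xml.sax

ATTR_FORM = '<foo bar="baz"/>'           # form 1: value in an attribute
ELEM_FORM = '<foo><bar>baz</bar></foo>'  # form 2: value as element content


def tree_view(doc):
    """Return (attributes, child tags, text content) of the root element."""
    root = ET.fromstring(doc)
    return dict(root.attrib), [child.tag for child in root], "".join(root.itertext())


class BarCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collect 'bar' whether it is an attribute or a child element."""

    def __init__(self):
        super().__init__()
        self.value = None
        self._in_bar = False
        self._buf = []

    def startElement(self, name, attrs):
        if "bar" in attrs:
            # Attribute values arrive complete, in a single callback.
            self.value = attrs["bar"]
        if name == "bar":
            self._in_bar = True
            self._buf = []

    def characters(self, content):
        # Element content may be split across several calls, so buffer it.
        if self._in_bar:
            self._buf.append(content)

    def endElement(self, name):
        if name == "bar":
            self._in_bar = False
            self.value = "".join(self._buf)


def sax_value(doc):
    handler = BarCollector()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.value


if __name__ == "__main__":
    print(tree_view(ATTR_FORM))  # ({'bar': 'baz'}, [], '')
    print(tree_view(ELEM_FORM))  # ({}, ['bar'], 'baz')
    print(sax_value(ATTR_FORM), sax_value(ELEM_FORM))  # baz baz
```

Under these assumptions both forms hand "baz" to the SAX handler, but only form 2 makes it part of the document's text content, which bears directly on the equivalence question.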
Given a set of production rules, a successful parse yields a parse tree with nonterminals as nodes and terminals as leaves. With one twist, the SGML/XML serialization of such a parse tree is obvious. (The twist is in the treatment of what are *taken* to be terminals, in that programs such as Bison allow terminals of two kinds: variables instantiated by a lexer, and string constants. The former actually correspond to #PCDATA elements with obvious expansions, the latter to text directly.) The basic outcome is a complete partitioning of the data into a hierarchy of semantically meaningful categories.

Turning this around, an SGML/XML instance basically represents a *complete parsing* of its text content. That is, while the problem in parsing theory is to recognize input, the primary intent of generalized markup is to express the result of a prior process of recognition in the same formalism of parse trees.

Pushing the analogy further, where attributes make their appearance in the semantic processing of parse trees, markup-attributes are very similar to inherited (as opposed to synthesized) parse-attributes.

The basic lesson: Do not use attributes to *analyse* wholes into parts.

[1] http://www.sgmlsource.com/history/AnnexA.htm
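A minimal Python sketch of the serialization described above, assuming a hypothetical one-rule grammar `sum ::= NUM ("+" NUM)*` and a small hand-written parser (not Bison). Following the twist described there, the nonterminal becomes an element, the lexer-instantiated terminal NUM becomes a #PCDATA element, and the string constant "+" goes in as plain text. Since the markup adds structure but no characters, concatenating the tree's text recovers the input exactly, which is one way to read "a complete parsing of its text content". No attributes are used anywhere.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy grammar, in BNF:
#   sum ::= NUM ( "+" NUM )*
# NUM is a lexer-instantiated terminal; "+" is a string-constant terminal.
TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<PLUS>\+)")


def tokenize(text):
    """Split text into (kind, lexeme) pairs; reject anything else."""
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def append_text(parent, text):
    """Append character data after the last child (or as the parent's text)."""
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def parse_sum(text):
    """Parse text and return its parse tree serialized as an Element."""
    tokens = tokenize(text)
    root = ET.Element("sum")  # nonterminal -> element

    def expect_num(i):
        if i >= len(tokens) or tokens[i][0] != "NUM":
            raise SyntaxError(f"expected a number at token {i}")
        leaf = ET.SubElement(root, "NUM")  # lexer variable -> #PCDATA element
        leaf.text = tokens[i][1]
        return i + 1

    i = expect_num(0)
    while i < len(tokens):
        if tokens[i][0] != "PLUS":
            raise SyntaxError(f"expected '+' at token {i}")
        append_text(root, tokens[i][1])  # string constant -> plain text
        i = expect_num(i + 1)
    return root


if __name__ == "__main__":
    tree = parse_sum("1+22+333")
    print(ET.tostring(tree, encoding="unicode"))
    # The text content of the instance is exactly the original input.
    print("".join(tree.itertext()))
```

For example, `parse_sum("1+22+333")` serializes to `<sum><NUM>1</NUM>+<NUM>22</NUM>+<NUM>333</NUM></sum>`, and its text content is `1+22+333`.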