which came first: content or markup?
I asked a question about manipulating a document through a flat view of its text this morning, and got back a variety of answers that didn't quite seem to do what I was looking for. I'm guessing that the reason for that disjunction is that I didn't make myself very clear. I'll try to tell this as a story, and see if it helps.

When I first got into hypertext, I was using HyperCard. My method of creating hypertexts was pretty simple.[1] First, I created a stack of cards which had text in them. Some of those cards had an understood sequence, because of the limitations of a card approach on a 512x342 screen, but basically I wrote out a collection of small texts with titles but minimal internal structure.

In order to turn text into hypertext, I ran a script that searched through all the texts and added links based on keywords. Effectively it was marking up the document, but it was invisible markup. I made hyperlinked text bold to distinguish it.

Eventually I added a script to convert these stacks into HTML for broader distribution, but it did so by adding textual markup explicitly to the text fields - pretty ugly when I'd been used to pristine text. Then it dumped the fields to files, and I threw away the modified stacks without saving. None of this was brilliant programming, but it did very nicely for 34K of stack overhead.

As I shifted gears toward HTML - heck, the world was moving toward this markup stuff - I still used a similar approach. I'd write up a document in plain text, and then mark it up. Eventually I started putting the text into a template with headers and footers before marking it up, but it was still a pretty simple and straightforward process, one that echoes (I think) the typesetting usage of markup. Textual stories appear, and markup gets added. Over time, I've come to write the markup along with the text, though it's not exactly fun.
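The keyword pass described above (search the plain texts, record links without touching them, and write explicit markup only at export time) might be sketched as follows. This is a minimal Python illustration, not the original HyperTalk script; the function names `find_links` and `to_html`, the keyword table, and the link targets are all hypothetical.

```python
import html
import re


def find_links(text, keywords):
    """Return sorted, non-overlapping (start, end, target) spans.

    `keywords` is a hypothetical mapping of keyword -> link target.
    The text is only read, never changed: the spans are the "invisible markup".
    """
    spans = []
    for word, target in keywords.items():
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), target))
    # Earliest match first; on a tie, prefer the longer match.
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    kept, last_end = [], 0
    for start, end, target in spans:
        if start >= last_end:  # drop matches that overlap an earlier one
            kept.append((start, end, target))
            last_end = end
    return kept


def to_html(text, spans):
    """Export step: only here does the markup become part of the text."""
    out, pos = [], 0
    for start, end, target in spans:
        out.append(html.escape(text[pos:start]))
        out.append('<a href="%s">%s</a>'
                   % (html.escape(target), html.escape(text[start:end])))
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

The source text stays pristine throughout; the list of spans plays the role of the invisible links, and the angle brackets appear only in the string that `to_html` returns.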
I've been looking for an editor that would mesh well with my style of marking up documents, more or less a markup painter, but it seems my perspective must be unusual. (Topologi's[2] editor is extremely cool, probably the best thing I've found.)

In my programming, I've wanted to take a similar approach. Regular Fragmentations [3] is a 'painter' of a sort, though it applies rather rigid rules to information, sort of a paint by number. A number of my projects work with just the marked-up text of a document directly, and also make changes based on sets of rules, though that's as much remodeling as painting.

What I'm finding as I build my applications is that most of the toolkits out there assume that the markup process is already done, and that content should or must be handled as individual nodes of content defined by the markup. There is little or no concept of the content as a coherent whole separate from the markup. (Although such a concept is often cited as a key differentiator of 'documents' rather than data, I suspect that the relative potential chaos of documents is a more important differentiator. I don't have much trouble marking up tables of repetitive information if they're presented to me as text with headers.)

The toolsets I find readily available are delighted to process nodes, but they have very little concept of a text separate from or prior to those nodes. There's not much notion of searching that text or processing that text in a way which modifies the nodes underneath - perhaps deleting, adding, or changing - determined primarily by the contents of the text.

The only spec I've really seen try to address this notion of the text in a document is XPointer, and I'm afraid XPointer is snarled in the same thing I am: everyone's working with nodes these days. The dominant view at the moment seems to be that documents are composed of nodes, and it's the nodes that are primary, not the content of those nodes.
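One way to picture the missing piece, a flat view of the text whose matches can be pushed back down into the nodes, is the rough Python sketch below. It uses the standard library's `xml.etree.ElementTree`; the helper names `flat_view` and `wrap_first` are hypothetical, and the sketch deliberately gives up when a match crosses an element boundary, which is exactly where the real difficulty lies.

```python
import xml.etree.ElementTree as ET


def flat_view(root):
    """Return (text, segments) for the content under root.

    Each segment is (start, end, element, slot), where slot is "text" or
    "tail", so any offset in the flat text can be traced back to a node.
    """
    segments, parts, pos = [], [], 0

    def add(elem, slot):
        nonlocal pos
        s = getattr(elem, slot) or ""
        if s:
            segments.append((pos, pos + len(s), elem, slot))
            parts.append(s)
            pos += len(s)

    def walk(elem):
        add(elem, "text")
        for child in elem:
            walk(child)
            add(child, "tail")

    walk(root)
    return "".join(parts), segments


def wrap_first(root, needle, tag):
    """Wrap the first occurrence of needle in a new <tag> element.

    Assumption: only matches that lie inside a single text run are handled.
    Returns True if the tree was changed, False otherwise.
    """
    text, segments = flat_view(root)
    hit = text.find(needle)
    if hit < 0:
        return False
    end = hit + len(needle)
    parents = {child: parent for parent in root.iter() for child in parent}
    for start, stop, elem, slot in segments:
        if start <= hit and end <= stop:
            run = getattr(elem, slot)
            new = ET.Element(tag)
            new.text = needle
            new.tail = run[end - start:]
            setattr(elem, slot, run[:hit - start])
            if slot == "text":
                elem.insert(0, new)
            else:
                # A tail belongs to the parent's content, right after elem.
                parent = parents[elem]
                parent.insert(list(parent).index(elem) + 1, new)
            return True
    return False
```

The search runs against the flat string alone, and the nodes are touched only afterwards. The flat text is the same before and after the change, which is the sense in which the content comes first and the markup is painted onto it.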
Document structures are containers filled with content that must fit precisely, not information added to a document to reflect its content. I guess that's fine for what most people are doing, but it also means that I'll have to roll my own tools. A content-first view doesn't seem very popular in the XML world at present, and I can't say I see that changing. Markup now seems to come first.

[1] - http://simonstl.com/projects/ht22/
[2] - http://www.topologi.com
[3] - http://simonstl.com/projects/fragment/

--
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com