How to design XML in a way that focuses on the strengths of the computer?

From: "Costello, Roger L." <costello@mitre.org>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Sat, 21 Nov 2015 14:14:07 +0000

Hi Folks,

	Computers have always had a bit of a tenuous
	relationship with text. Although we tend to think
	of text processing as a central task for computer
	hardware and software (indeed, the 8-bit byte is
	the standard design element in modern computers,
	in large part, due to how well suited it is for
	encoding Western character sets), the truth of the
	matter is that the human concept of text is really
	alien to a computer.

	... handling text is not a computer's strength. It is
	a necessary evil best kept to a minimum. [Barski]

What is fundamental to computers? Answer: Among other things, memory addressing is fundamental. 

Can we design XML in a way that focuses on the strengths of the computer?

Let's take an example. Suppose we want to retrieve "west". Consider this XML design:

	<edge>garden west door</edge>

That XML design represents "west" as text. "west" can be retrieved using string manipulation:

	substring-before(substring-after(., ' '), ' ')
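For readers who want to try this outside an XPath engine, here is a minimal Python sketch of the same extraction. The helper names `substring_after` and `substring_before` are hypothetical; they only mimic the XPath 1.0 functions of the same name and are not part of any library.

```python
def substring_after(s, sep):
    """Mimic XPath substring-after(): text after the first sep, or '' if sep is absent."""
    if sep == "":
        return s  # XPath returns the whole string for an empty separator
    _, found, tail = s.partition(sep)
    return tail if found else ""


def substring_before(s, sep):
    """Mimic XPath substring-before(): text before the first sep, or '' if sep is absent."""
    if sep == "":
        return ""  # XPath returns the empty string for an empty separator
    head, found, _ = s.partition(sep)
    return head if found else ""


edge_text = "garden west door"  # the text content of <edge>
direction = substring_before(substring_after(edge_text, " "), " ")
print(direction)
```

Each helper scans the string character by character looking for the separator, which is the kind of work the text-based design forces on the processor.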

You would be shocked at the huge number of machine instructions needed to implement that trivial XPath expression. Hundreds or thousands of machine instructions are needed. 

In an ideal world I should be able to retrieve "west" in a single machine instruction (or, a handful of machine instructions).

Here is an alternate XML design which avoids the use of text:

	<edge>
		<garden/>
		<west/>
		<door/>
	</edge>

Node access is easy and fundamental to the XML language. Now "west" can be retrieved using this simple element reference:

	*[2]/name()
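As an illustration only, the same lookup can be sketched with Python's standard `xml.etree.ElementTree`, where child elements are reached by index. This is not the XPath engine used in the timing tests; note that XPath positions are 1-based while Python indexes are 0-based, so `*[2]` corresponds to index 1.

```python
import xml.etree.ElementTree as ET

# The element-based design, with the same indentation as in the example.
edge = ET.fromstring("""
<edge>
    <garden/>
    <west/>
    <door/>
</edge>
""")

# ElementTree keeps whitespace in .text/.tail, not as child nodes,
# so index 1 is the second child element.
direction = edge[1].tag
print(direction)
```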

It seems to me that this should involve a simple memory address look-up, and the number of machine instructions required should be one (or a few). Alas, I discovered that it is highly dependent on the XPath processor (XML processor). In fact, I did some timing tests and, with the XPath engine that I used, there was no time difference between the above two XPath expressions. Bummer.
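A rough way to repeat this kind of timing comparison is sketched below with Python's `timeit` and `xml.etree.ElementTree`. It is an assumption-laden stand-in: it measures Python operations rather than an XPath engine, the function names are hypothetical, and the numbers will vary by machine and interpreter.

```python
import timeit
import xml.etree.ElementTree as ET

text_edge = ET.fromstring("<edge>garden west door</edge>")
node_edge = ET.fromstring("<edge><garden/><west/><door/></edge>")


def west_from_text():
    # Text design: split the string content and take the second word.
    return text_edge.text.split(" ")[1]


def west_from_nodes():
    # Element design: take the name of the second child element.
    return node_edge[1].tag


runs = 100_000
text_seconds = timeit.timeit(west_from_text, number=runs)
node_seconds = timeit.timeit(west_from_nodes, number=runs)
print(f"text design:    {text_seconds:.4f} s for {runs} runs")
print(f"element design: {node_seconds:.4f} s for {runs} runs")
```

Whatever the result, it says more about the implementation being measured than about the XML design itself, which is the point made above.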

The XML specification is silent on how XML parsers should represent a parsed document in memory. Consequently, a parser might implement this:

	<edge>
		<garden/>
		<west/>
		<door/>
	</edge>

as a linked list, and therefore, with *[2]/name(), the XPath engine must traverse the linked list to obtain the second child element.

Conversely, if an XML parser were to represent child elements using an array:

   edges
   ----------
0 |     ----------> garden
   ----------
1 |     ----------> west
   ----------
2 |     ----------> door
   ----------

then "west" is just a single memory reference away.
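The difference between the two representations can be sketched in Python. The `Node` class and `nth_linked` function are hypothetical, written only to contrast walking sibling links with indexing an array; they do not describe how any particular XML parser stores its tree.

```python
class Node:
    """A child element stored as a singly linked list of siblings."""

    def __init__(self, name, next_sibling=None):
        self.name = name
        self.next_sibling = next_sibling


def nth_linked(first_child, position):
    """Return (node, hops) for a 1-based position by following sibling links."""
    node, hops = first_child, 0
    while node is not None and hops < position - 1:
        node = node.next_sibling
        hops += 1
    return node, hops


# Linked-list representation: garden -> west -> door
door = Node("door")
west = Node("west", door)
garden = Node("garden", west)
linked_result, hops = nth_linked(garden, 2)

# Array representation: children stored contiguously, reached by index.
children = ["garden", "west", "door"]
array_result = children[1]

print(linked_result.name, hops, array_result)
```

With the linked list, the number of hops grows with the position requested; with the array, the lookup cost does not depend on the position. In Python even the array lookup costs many machine instructions, so the sketch shows the shape of the difference rather than the instruction counts.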

Are there any XML parsers that represent XML using arrays?

Is there any way to design XML to take advantage of a computer's strengths?

/Roger

[Barski] "Land of Lisp" by Dr. Conrad Barski
