
RE: HTML5 and almost no namespaces

  • From: "David Lee" <dlee@calldei.com>
  • To: "'Kurt Cagle'" <kurt.cagle@gmail.com>
  • Date: Fri, 3 Jun 2011 11:56:31 -0400


It's "trivial", yes, but it's not "right" IMHO :)

Nor is it necessarily efficient.

 

I wouldn't bet a case of beer that for a large value of attribute x that

               

     points = fn:tokenize( $x , "[ ,]")

 

is more efficient than for a node x with point children

     points = $x/point

 

I can imagine in some processors for some size of $x one or the other is more efficient.
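
A rough way to check this on any one processor is simply to time both forms. The sketch below is in Python rather than XQuery, so it says nothing about a real XQuery engine; re.split stands in for fn:tokenize and findall stands in for $x/point. The point count N and both helper function names are made up for illustration.

```python
import re
import timeit
import xml.etree.ElementTree as ET

N = 10_000  # hypothetical point count, chosen only for illustration

# Attribute form: one string holding "x y,x y,..." as in the SVG example.
attr_doc = ET.fromstring(
    '<polyline points="%s"/>' % ",".join(f"{i} {i * 2}" for i in range(N))
)

# Element form: one <point> child per coordinate pair.
elem_doc = ET.fromstring(
    "<polyline>%s</polyline>"
    % "".join(f'<point x="{i}" y="{i * 2}"/>' for i in range(N))
)

def points_from_attribute(node):
    # Rough analogue of fn:tokenize($x, "[ ,]"): split on space or comma.
    tokens = re.split(r"[ ,]", node.get("points"))
    return list(zip(tokens[0::2], tokens[1::2]))

def points_from_children(node):
    # Rough analogue of $x/point: select the child elements directly.
    return [(p.get("x"), p.get("y")) for p in node.findall("point")]

# Both forms carry the same data; only the access path differs.
assert points_from_attribute(attr_doc) == points_from_children(elem_doc)

print("attribute:", timeit.timeit(lambda: points_from_attribute(attr_doc), number=5))
print("children: ", timeit.timeit(lambda: points_from_children(elem_doc), number=5))
```

Whichever way the numbers come out on one machine, they only describe this library at this size of N, which is rather the point.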

 

But is that a reason to make the design decision for a (potentially) widely used standard schema?

This is a serious question, not rhetorical.

 

 

 

 

 

----------------------------------------

David A. Lee

dlee@calldei.com

http://www.xmlsh.org

 

From: Kurt Cagle [mailto:kurt.cagle@g...]
Sent: Friday, June 03, 2011 11:45 AM
To: David Lee
Cc: Michael Sokolov; Andrew Welch; John Cowan; Pete Cordell; Mukul Gandhi; stephengreenubl@g...; Jesper Tverskov; xml-dev@l...
Subject: Re: HTML5 and almost no namespaces

 

David,

 

I brought up the very question of point set optimization with the SVG working group when the SVG 1.0 spec was still in development. Adobe was essentially calling the shots at that point with the only real working implementation, and they found that, for their processing, parsing lists of points was preferable to querying an XML document with sets of nodes. In retrospect, they were probably right: even in XQuery, retrieving point lists is relatively trivial.

 

 

Kurt Cagle

Managing Editor, XMLToday.org

443-837-8725

 



On Fri, Jun 3, 2011 at 9:09 AM, David Lee <dlee@calldei.com> wrote:

Agree 50%.  Certainly you can optimize a tagset for a particular processor.

But does that mean you *should*?


Once you go down the route of optimizing your XML for a particular processor
all sorts of tricks become useful.
For example MarkLogic works best on lots of small documents instead of very
large ones, so for optimization I split up my 500MB XML file into about a
million small ones.    Other processors have other tricks needed to get them
to work optimally.
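
As a sketch of what such a split can look like (not the actual script used here, and not MarkLogic-specific), the Python below streams through one large document and emits each record as its own small document. It assumes the records are direct children of the root and do not nest; split_records, the "person" tag and the sample data are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def split_records(source, record_tag):
    """Yield every <record_tag> element as its own serialized document.

    Assumes records are direct children of the root and are not nested
    inside one another.
    """
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            elem.tail = None  # keep inter-record whitespace out of the output
            yield ET.tostring(elem, encoding="unicode")
            elem.clear()  # release the record's content once it has been emitted

# Tiny in-memory stand-in for the 500MB file.
big = io.BytesIO(
    b"<people>"
    b"<person><name>Lee</name></person>"
    b"<person><name>Smith</name></person>"
    b"</people>"
)

small_docs = list(split_records(big, "person"))
for doc in small_docs:
    print(doc)
```

The source document stays as it was designed; the split is a separate step that can be redone if the processor changes.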

My personal opinion is that this shouldn't dictate the source schema design,
but should rather be a post-processing phase optimized for a particular
processor. Micro-designing an XML schema for optimization on one processor can
eventually bite you... say when you change processors or they come out with new
performance characteristics in V(n+1).

A good non-processor-specific example is SVG.
I just started using SVG this month as an experiment and am 'horrified' that
it 'abuses' attributes to represent lists of points.
A single graph might have a hundred thousand points stored in a single
attribute value!
While I wasn't there when it was invented, I can guess that this was done
with an eye to compactness/optimization with the assumption that small is
better.
i.e.

<svg:polyline points="1 0,2 120.46,3 97.95,4 104.97,5 124.5,6 97.81,7
97.94,8 92.37,9 100.15,10 99.2,11 ....
1000000  bytes later
...
"/>

This is certainly more *compact* than

<svg:polyline>
    <p x="1" y="0"/>
....
1000000  bytes later
</svg:polyline>


But is it *better*?   I actually found an article about EXI discussing this
exact issue

http://www.svgopen.org/2010/papers/3-Compressing_SVG_with_EXI/index.html


I find this a good example to demonstrate the woes of prematurely optimizing
source data formats based on assumptions about performance.

And consequently I propose that in general one should not do that.  Rather,
design an XML schema for clarity, not for performance on a particular
version of a particular processor (or an imagined one, as in the case above).

You can *usually* post-process data to be optimized for your current
processor at the point of ingest rather than make the world suffer with
predictive optimization.

(by "usually" I mean there are always exceptions.  No statement is always
right, even this one)
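
To make the ingest idea concrete with the polyline example above, here is a minimal Python sketch that takes the clearer element form and packs it into the compact points attribute on the way in. It ignores the svg: namespace prefix for brevity, and pack_points is a made-up helper name, not part of any SVG tool.

```python
import xml.etree.ElementTree as ET

def pack_points(polyline):
    """Return a new element with the <p x= y=> children packed into a
    single points attribute of the form "x y,x y,..."."""
    packed = ET.Element(polyline.tag, dict(polyline.attrib))
    packed.set(
        "points",
        ",".join(f'{p.get("x")} {p.get("y")}' for p in polyline.findall("p")),
    )
    return packed

# Source form, designed for clarity (values taken from the example above).
verbose = ET.fromstring(
    '<polyline><p x="1" y="0"/><p x="2" y="120.46"/><p x="3" y="97.95"/></polyline>'
)

# Ingest-time form, optimized for a consumer that prefers one string.
compact = pack_points(verbose)
print(ET.tostring(compact, encoding="unicode"))
```

Going in this direction is a join; going the other way needs a tokenizer that agrees with the attribute's exact separator rules, which is one reason to keep the element form as the source.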


-David









----------------------------------------
David A. Lee
http://www.xmlsh.org

-----Original Message-----
From: Michael Sokolov [mailto:sokolov@i...]
Sent: Friday, June 03, 2011 8:36 AM
To: David Lee
Cc: Andrew Welch; John Cowan; Pete Cordell; Mukul Gandhi;
stephengreenubl@g...; Jesper Tverskov; xml-dev@l...
Subject: Re: HTML5 and almost no namespaces

On 6/2/2011 10:22 PM, David Lee wrote:
> I do ( use MarkLogic )
> And it appears to work perfectly fine using context sensitive duplicate
> names
> It's true that if you want to fine tune fragmentation or create special
> range indexes it bites you but overall I've had no problems
>
>
> Sent from my iPad (excuse the terseness)
That's ok David - after all, brevity is the soul of wit, as the bard put
it.  Still it is the case that MarkLogic's built-in term indexes (not
the range ones) are based on element (and attribute) names, and although
there are also contextual (parent/child) indexes, you will not get best
performance there if you rely on context sensitivity; e.g. queries for
//name can be resolved straight out of the indexes accurately and don't
require additional filtering, whereas //person/name and //place/name
require (some) extra processing.  For example, to get an accurate count
there, ML has to filter every possible result returned by the indexes.
ML is spiffy and does this really fast, so you usually don't notice, but
if you have 1M docs and want to know exactly how many have a person name
"Lee", you really will notice the difference.
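
The difference can be illustrated with a toy model in Python. This is emphatically not how MarkLogic is implemented; it is a dictionary keyed on element name and value, with no parent context, and the three sample documents are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

docs = {
    1: "<doc><person><name>Lee</name></person></doc>",
    2: "<doc><place><name>Lee</name></place></doc>",
    3: "<doc><person><name>Cagle</name></person></doc>",
}

# Toy term index keyed on (element name, text value) only.
index = defaultdict(set)
for doc_id, xml in docs.items():
    for elem in ET.fromstring(xml).iter():
        if elem.text:
            index[(elem.tag, elem.text)].add(doc_id)

# //name[. = "Lee"]: answered from the index alone, no document is opened.
unfiltered = index[("name", "Lee")]

# //person/name[. = "Lee"]: the index only yields candidates, so each one
# has to be opened and checked against the full path.
filtered = {
    doc_id
    for doc_id in unfiltered
    if any(
        n.text == "Lee"
        for n in ET.fromstring(docs[doc_id]).findall(".//person/name")
    )
}

print(len(unfiltered), "candidates from the index;", len(filtered), "after filtering")
```

With three documents the filtering pass is free; the cost only shows once the candidate set is large.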

I'm not trying to run down MarkLogic - it's a great system for XML work;
merely pointing out that in some cases practical considerations that
have little to do with semantic correctness may inform the design of
your tag set.

-Mike

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@l...
subscribe: xml-dev-subscribe@l...
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php



 


