Re: JSON - The Fat Free Alternative - Redux

From: Ihe Onwuka <ihe.onwuka@gmail.com>
To: James Fuller <james.fuller.2007@gmail.com>
Date: Thu, 2 Oct 2014 20:07:34 +0100


Actually it is worth fleshing out some stuff here.

JSONiq - its very welcome existence doesn't yet mean it can be part of the solution. If it's not available in the environment you are working in you have to send your data to it - in which case, rather than sending the data to the computation, it might be simpler to convert to XML and leave the computation in situ. Additionally, what I am doing (integrating data from a variety of sources) entails a lot of joins. For performance reasons I am really leery about doing that in XQuery or any derivative thereof. I don't want to expend more R&D capital (doing enough of that already) on a problem (efficient joins) which is solved in other environments. Which brings me on to the last point. JSONiq is not an option if XSLT is part of the solution - and in my case, because of all the joins, XSLT (via xsl:key) is.
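To make the join point concrete: the idea behind a keyed join, which xsl:key provides in XSLT, is to build an index over one input once and then probe it for each record of the other input. Below is a minimal sketch of that idea in Python rather than XSLT. The record layout, the "note" field and the function names are hypothetical, made up for illustration only.

```python
from collections import defaultdict

def build_key(records, field):
    """Index records by the value of `field`, roughly what declaring an xsl:key does."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record)
    return index

def keyed_join(left, right, field):
    """Join two lists of dicts on `field`: one pass to index `right`, one pass to probe."""
    index = build_key(right, field)
    joined = []
    for row in left:
        for match in index.get(row[field], []):
            merged = dict(match)   # copy so the inputs are left untouched
            merged.update(row)     # fields from `left` win on a name clash
            joined.append(merged)
    return joined

# Hypothetical records from two sources that share an IMDb id.
freebase = [{"imdb": "tt0259822", "name": ".45"}]
other = [
    {"imdb": "tt0259822", "note": "hypothetical second-source record"},
    {"imdb": "tt0000001", "note": "no matching movie"},
]
print(keyed_join(freebase, other, "imdb"))
```

The cost is roughly one pass over each input, instead of comparing every record on one side with every record on the other.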

David Lee's paper. As I mentioned, this post was precipitated by a debate with somebody who was repeating much of the JSON/NoSQL hype as fact (as it related to data formats at least). The problem with deploying a paper in an argument is that the other side is not bound to agree with its conclusions or interpretations. So lo and behold, David's figures were interpreted as supporting the superiority of JSON and his conclusions were attributed to XML bias, and I couldn't fully respond because I know nothing about jQuery so wasn't able to counter in those areas. I didn't mention his paper at the outset of this discussion because it would have entailed echoing the fellow's critique.

The "advantage" of my evidence in the conversation was that he couldn't claim bias, since it wasn't set up as a benchmark; rather, it was a real-life solution to a problem that required compacting markup.

On Tue, Sep 30, 2014 at 8:54 PM, James Fuller <james.fuller.2007@gmail.com> wrote:

I agree with Michael's assertion and would add that sometimes subtleties (like the truth) get lost in the headlong rush forward.

Part of the problem is using opaque, non specific terms (like 'fat') to describe something ... 'fat' really could mean anything or nothing in terms of computing.

and this is where I bring out David Lee's excellent Balisage paper

http://www.balisage.net/Proceedings/vol10/html/Lee01/BalisageVol10-Lee01.html

What is clear is that there is a cost to modelling something as a fully fledged document versus data and when confronted with a choice, developers choosing json believe they are mitigating their risk (esp. true if your toolchain and execution environment of choice marshals json objects around).

What is interesting to watch is the rarer occasion when someone has chosen json to represent document data ... things can get complicated quickly indeed. Even then, transforming to a higher-fidelity format may not be as cost effective as everyone thinks.

What is fun to watch is the recent upsurge in the use of Web components; watching developers have fun again with markup is great... which is more to say about the 'semantics' of using markup than anything.

I think the performance argument has always been a red herring ... important for some, but not very relevant for most.

Jim Fuller
On Tue, Sep 30, 2014 at 9:34 PM, Michael Kay <mike@s...> wrote:
I think the "fat" referred to by the phrase "fat-free alternative to XML" has very little to do with data size, it has much more to do with complexity: the fact that the JSON specification fits on one sheet of paper, whereas XML extends to thousands of pages if you include the whole stack.

Michael Kay
Saxonica
mike@saxonica.com
+44 (0) 118 946 5893
On 30 Sep 2014, at 19:25, Ihe Onwuka <ihe.onwuka@gmail.com> wrote:
So the story goes something like this.
I get into one of these "JSON is better/slimmer/faster" - "oh no it isn't" arguments and we are getting entrenched in our respective positions when I realise that I actually have data that can test the slimmer argument, created not for the purpose of a benchmark but as a solution to a problem I had.
I am creating a movie data mashup and need to integrate movie data from a JSON repository with some XML movie data. The mashup is an XSLT transformation so the JSON has to be converted. The problem I had was that, apres download and conversion, the XML file was too big for the XSLT processor and I was getting heap space errors.
Here is a snippet of JSON data for one movie.
{
  "result": [
    {
      "initial_release_date": "2006-11-30",
      "rottentomatoes_id": [],
      "key": [{
        "namespace": "/authority/imdb/title",
        "value": "tt0259822"
      }],
      "name": ".45",
      "type": "/film/film",
      "starring": [
        {
          "actor": [{
            "/common/topic/alias": [
              "Milla",
              "Milica Natasha Jovovich",
              "Milica Jovović",
              "Milla Yovovich",
              "Reigning Queen of Kick-Butt",
              "Milica Nataša Jovović"
            ],
            "name": "Milla Jovovich"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Angus McFadyen",
              "Angus MacFadyen"
            ],
            "name": "Angus Macfadyen"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Aisha N. Tyler"
            ],
            "name": "Aisha Tyler"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Stephen Dorff Jr.",
              "Brad Matlock"
            ],
            "name": "Stephen Dorff"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [],
            "name": "Sarah Strange"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Vincent LaResca",
              "Vinnie the kid"
            ],
            "name": "Vincent Laresca"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Dawn Greenhall",
              "Hazel Dawn Greenhalgh"
            ],
            "name": "Dawn Greenhalgh"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Nola Auguston"
            ],
            "name": "Nola Augustson"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Katherine Mary Craven Hawtrey",
              "Kay Hartrey",
              "Kay Hawtry",
              "Katherine Hawtrey"
            ],
            "name": "Kay Hawtrey"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [],
            "name": "Shawn Campbell"
          }]
        }
      ],
      "mid": "/m/0c2l1s",
      "directed_by": [{
        "/common/topic/alias": [],
        "name": "Gary Lennon"
      }]
    }
Following a naive JSON to XML conversion by yours truly I produced this.

<result>
<item>
<key>
<item>
<value>tt0259822</value>
<namespace>/authority/imdb/title</namespace>
</item>
</key>
<type>/film/film</type>
<name>.45</name>
<starring>
<item>
<actor>
<item>
<alias>
<item>Milla</item>
<item>Milica Natasha Jovovich</item>
<item>Milica Jovović</item>
<item>Milla Yovovich</item>
<item>Reigning Queen of Kick-Butt</item>
<item>Milica Nataša Jovović</item>
</alias>
<name>Milla Jovovich</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Angus McFadyen</item>
<item>Angus MacFadyen</item>
</alias>
<name>Angus Macfadyen</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Aisha N. Tyler</item>
</alias>
<name>Aisha Tyler</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Stephen Dorff Jr.</item>
<item>Brad Matlock</item>
</alias>
<name>Stephen Dorff</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias/>
<name>Sarah Strange</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Vincent LaResca</item>
<item>Vinnie the kid</item>
</alias>
<name>Vincent Laresca</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Dawn Greenhall</item>
</alias>
<name>Dawn Greenhalgh</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Nola Auguston</item>
</alias>
<name>Nola Augustson</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Katherine Mary Craven Hawtrey</item>
<item>Kay Hartrey</item>
<item>Kay Hawtry</item>
<item>Katherine Hawtrey</item>
</alias>
<name>Kay Hawtrey</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias/>
<name>Shawn Campbell</name>
</item>
</actor>
</item>
</starring>
<directed_by>
<item>
<alias/>
<name>Gary Lennon</name>
</item>
</directed_by>
<initial_release_date>2006-11-30</initial_release_date>
<alias/>
<mid>/m/0c2l1s</mid>
<rottentomatoes_id/>
</item>

There were a hundred movies per file and the JSON data came in at 325k; by the time it had been converted to the XML above it had ballooned to 1.16MB.
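For readers who want to reproduce the effect, a naive conversion of the kind shown above can be sketched in a few lines of Python. This is not the converter used for the post. It assumes that every array entry becomes an <item> element and that a key such as "/common/topic/alias" is reduced to its last path segment, which is what the output above suggests. The function names and the sample string are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

def local_name(key):
    """Use the last path segment of a key, so '/common/topic/alias' becomes 'alias'."""
    return key.rsplit("/", 1)[-1]

def to_xml(parent, value):
    """Append `value` under `parent`: objects become child elements, arrays become <item>s."""
    if isinstance(value, dict):
        for key, child in value.items():
            to_xml(ET.SubElement(parent, local_name(key)), child)
    elif isinstance(value, list):
        for entry in value:
            to_xml(ET.SubElement(parent, "item"), entry)
    elif value is not None:
        parent.text = str(value)

def json_to_xml(text):
    """Convert a JSON object with a single top-level key into an XML string."""
    data = json.loads(text)
    root_name, content = next(iter(data.items()))
    root = ET.Element(local_name(root_name))
    to_xml(root, content)
    return ET.tostring(root, encoding="unicode")

# Cut-down sample in the shape of the JSON snippet above.
sample = (
    '{"result": [{"name": ".45", "starring": [{"actor": '
    '[{"/common/topic/alias": [], "name": "Sarah Strange"}]}]}]}'
)
print(json_to_xml(sample))
```

The extra <item> wrapper around every array entry, and the open and close tag for every key, is where most of the growth from 325k to 1.16MB comes from.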

My aim was to compact the XML sufficiently to allow a single transformation to accept the contents of about 1800 or so such files. So here is the data pre the compacting transformation

ihe@ihe-ThinkPad-T410:~/film$ ls rawFreebase/1.xml -l
-rw-r--r-- 1 ihe ihe 1160691 Aug 29 06:46 rawFreebase/1.xml

After the compacting, which was supposed to be lossless, the data looked like this

<movie imdb="tt0259822" name=".45" mid="/m/0c2l1s" date="2006-11-30">
<actor name="Milla Jovovich">
<alias>Milla</alias>
<alias>Milica Natasha Jovovich</alias>
<alias>Milica Jovović</alias>
<alias>Milla Yovovich</alias>
<alias>Reigning Queen of Kick-Butt</alias>
<alias>Milica Nataša Jovović</alias>
</actor>
<actor name="Angus Macfadyen">
<alias>Angus McFadyen</alias>
<alias>Angus MacFadyen</alias>
</actor>
<actor name="Aisha Tyler">
<alias>Aisha N. Tyler</alias>
</actor>
<actor name="Stephen Dorff">
<alias>Stephen Dorff Jr.</alias>
<alias>Brad Matlock</alias>
</actor>
<actor name="Sarah Strange"/>
<actor name="Vincent Laresca">
<alias>Vincent LaResca</alias>
<alias>Vinnie the kid</alias>
</actor>
<actor name="Dawn Greenhalgh">
<alias>Dawn Greenhall</alias>
</actor>
<actor name="Nola Augustson">
<alias>Nola Auguston</alias>
</actor>
<actor name="Kay Hawtrey">
<alias>Katherine Mary Craven Hawtrey</alias>
<alias>Kay Hartrey</alias>
<alias>Kay Hawtry</alias>
<alias>Katherine Hawtrey</alias>
</actor>
<actor name="Shawn Campbell"/>
<director name="Gary Lennon"/>
</movie>

and the size of a file of 100 such entries....

ihe@ihe-ThinkPad-T410:~/film$ ls freebase/1.xml -l
-rw-r--r-- 1 ihe ihe 351067 Aug 29 06:53 freebase/1.xml

350k compared to 324k of JSON.
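The compacting transformation itself is not shown in this post. As a rough sketch of what such a step involves, the Python below maps one naive <item> movie element onto the compact <movie> form: scalar fields become attributes and the <item> wrappers disappear. The function names are hypothetical, and carrying over director aliases is an assumption, since the one director in the example has none.

```python
import xml.etree.ElementTree as ET

def add_person(parent, tag, person):
    """Add <tag name="..."> with one <alias> child per alias of `person`."""
    element = ET.SubElement(parent, tag, {"name": person.findtext("name", default="")})
    for alias in person.findall("alias/item"):
        ET.SubElement(element, "alias").text = alias.text
    return element

def compact_movie(item):
    """Turn one naive <item> movie element into a compact <movie> element."""
    movie = ET.Element("movie", {
        "imdb": item.findtext("key/item/value", default=""),
        "name": item.findtext("name", default=""),
        "mid": item.findtext("mid", default=""),
        "date": item.findtext("initial_release_date", default=""),
    })
    for person in item.findall("starring/item/actor/item"):
        add_person(movie, "actor", person)
    for person in item.findall("directed_by/item"):
        # Assumption: director aliases are kept the same way as actor aliases.
        add_person(movie, "director", person)
    return movie

# Cut-down naive input in the shape shown earlier in this post.
naive = (
    "<item><key><item><value>tt0259822</value>"
    "<namespace>/authority/imdb/title</namespace></item></key>"
    "<name>.45</name><starring>"
    "<item><actor><item><alias><item>Aisha N. Tyler</item></alias>"
    "<name>Aisha Tyler</name></item></actor></item>"
    "<item><actor><item><alias/><name>Sarah Strange</name></item></actor></item>"
    "</starring><directed_by><item><alias/><name>Gary Lennon</name></item></directed_by>"
    "<initial_release_date>2006-11-30</initial_release_date>"
    "<mid>/m/0c2l1s</mid></item>"
)
print(ET.tostring(compact_movie(ET.fromstring(naive)), encoding="unicode"))
```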

I then decided to see what would happen to the file sizes after they were compressed.

Here are the results.

ihe@ihe-ThinkPad-T410:~/film$ ls -l rawFreebase/*.zip
-rw-r--r-- 1 ihe ihe 83833 Sep 30 12:41 rawFreebase/1.xml.zip

This is the compacted XML
ihe@ihe-ThinkPad-T410:~/film$ ls -l freebase/*.zip
-rw-r--r-- 1 ihe ihe 69058 Sep 30 12:41 freebase/1.xml.zip

This is the compressed JSON
-rw-r--r-- 1 ihe ihe 61528 Sep 30 12:42 Downloads/films.json.zip

83K for the naive XML,
69k for the compacted XML and
61k for the JSON.

Hmmmmmmmmmmm!
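Anyone wanting to repeat the comparison on their own data could use something like the sketch below. It uses Python's zlib module, which applies the same DEFLATE algorithm that zip normally uses, but without the zip container overhead, so absolute numbers will differ slightly from the ls output above. The sample strings are hypothetical stand-ins, not the Freebase files, and because they repeat one identical record they compress far better than real data would.

```python
import json
import zlib

def deflated_size(text):
    """Return (raw bytes, DEFLATE-compressed bytes) for a string encoded as UTF-8."""
    raw = text.encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))

# Hypothetical stand-in data: 100 copies of one tiny record in each representation.
record = {"name": "Sarah Strange", "alias": []}
as_json = json.dumps([record] * 100)
naive_xml = "<result>" + "<item><alias/><name>Sarah Strange</name></item>" * 100 + "</result>"
compact_xml = "<result>" + '<actor name="Sarah Strange"/>' * 100 + "</result>"

for label, text in [("naive XML", naive_xml), ("compact XML", compact_xml), ("JSON", as_json)]:
    raw, packed = deflated_size(text)
    print(f"{label}: {raw} bytes raw, {packed} bytes deflated")
```

Repeated tag names are exactly the kind of redundancy DEFLATE removes, which is why the gap between the three formats narrows so much after compression.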

Follow-Ups:
- Re: JSON - The Fat Free Alternative - Redux
  - From: Dimitre Novatchev <dnovatchev@gmail.com>
