Re: {mark} - a new simple notation that unifies JSON and XML

  • From: Henry Luo <henry@perpetuatech.net>
  • To: Michael Kay <mike@saxonica.com>
  • Date: Fri, 26 Jan 2018 14:37:21 +0800

Just to elaborate a bit more on the issue:

Imagine the element <price currency="USD">5.00</price> goes through a transformation process that produces some output:

  • Should it be {span 'Price' ':' ' ' 'USD' ' ' '$' 5.00 null ''}, i.e. keep the items as they are, without normalization?
  • Or should it be {span 'Price: USD $5.00'}, i.e. normalized as in XML/HTML/Mark (numbers and booleans converted to strings, null values removed, adjacent strings merged, ...)?
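
To make the two outcomes concrete, here is a minimal Python sketch of the normalized form. It is only an illustration, not Mark's actual implementation: the function name `normalize_content` is hypothetical, dropping empty strings is an assumption, and `Decimal` is used only so that the literal 5.00 keeps its trailing zeros when converted to a string (a Python float would render as 5.0). A separate `' '` item follows `'USD'` so that merging yields the string quoted above.

```python
from decimal import Decimal


def normalize_content(items):
    """Illustrative only: normalize mixed content roughly as described above."""
    merged = []
    for item in items:
        if item is None:
            continue  # null values are removed
        if isinstance(item, bool):
            item = "true" if item else "false"  # booleans become strings
        elif isinstance(item, (int, float, Decimal)):
            item = str(item)  # numbers become strings
        if isinstance(item, str):
            if item == "":
                continue  # assumption: empty strings contribute nothing
            if merged and isinstance(merged[-1], str):
                merged[-1] += item  # merge with the preceding string
                continue
        merged.append(item)  # first string in a run, or a non-string item
    return merged


raw = ["Price", ":", " ", "USD", " ", "$", Decimal("5.00"), None, ""]
print(raw)                     # un-normalized: nine separate items
print(normalize_content(raw))  # ['Price: USD $5.00']
```

Without normalization a consumer sees nine items of three different types; with it, a single string.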

Content normalization is common in HTML/XML processing, and developers are already used to it, e.g. when using XSLT/XQuery.
If we throw away content normalization, developers might find the result awkward and error-prone, since they may already be assuming that content is normalized.

So the question is: should we stay with XML/HTML's content normalization process, or should we go with the more orthogonal approach?

One more possibility is to leave it to the user, i.e. take a parameter during parsing/content construction that says whether to normalize the content or not.
But again, offering such an option without enforcing a convention could lead to confusion.

Mark's current thinking is: if your use case needs a generic array, put it in a property; if your use case needs a mixed content model, put it in the object content.
It might not be ideal, but it is probably more practical.

Anyone, any further thoughts?

Regards

Henry


On Fri 26/1 1:37 PM, Henry Luo wrote:

Hi Michael,

Thanks for sharing your thoughts on the specific issue of content model for element/object.

Here are my thoughts on this specific issue.

There are two options here, each with its pros and cons.

Option 1: Orthogonal approach, which allows any value in element/object content, just as for a property value

  • Very orthogonal design, simpler in syntax and data model;
  • Keeps the door open for future 'innovative' use cases;

Option 2: Non-orthogonal approach, as currently defined in Mark (or similarly in XML, HTML)

  • Mark's current content model for element/object is primarily designed and optimized for the mixed-content use case;
  • There are a few important differences between the element content model and a generic array:
    • null values are stripped;
    • arrays cannot be nested directly (Mark flattens an array when constructing object content);
    • consecutive strings are merged into one;
    • number and boolean primitive values are not allowed;
  • So although on the surface the element content might look like an array, there are subtle differences, and the two may require very different tools/APIs to work with;
    • e.g. we are already used to using CSS selectors to query the content model, and that requires content normalization that is not usually done on array data;
  • Among the 4 differences, I think the last one is open to discussion. But if we also give up the first 3 restrictions, it could make the primary mixed-content use case awkward, and people might as well stay with their current non-orthogonal HTML or XML format;
  • Mark's current design takes a conservative approach to what is allowed in the content model.
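
As a rough illustration of how the four differences listed above separate object content from a generic array, here is a small Python sketch. It is not Mark's real API: `build_content` is a hypothetical name, nested objects are stood in for by Python dicts, and raising `TypeError` for numbers and booleans is just one possible way to express "not allowed".

```python
def build_content(items):
    """Illustrative only: apply the four content-model rules listed above."""
    content = []
    for item in items:
        if item is None:
            continue  # difference 1: null values are stripped
        if isinstance(item, list):
            # difference 2: no directly nested arrays; flatten recursively
            flattened = build_content(item)
        else:
            flattened = [item]
        for value in flattened:
            if isinstance(value, (bool, int, float)):
                # difference 4: number and boolean primitives are rejected
                raise TypeError(f"primitive {value!r} not allowed in content")
            if isinstance(value, str) and content and isinstance(content[-1], str):
                content[-1] += value  # difference 3: merge consecutive strings
            else:
                content.append(value)
    return content


# Strings on either side of a flattened array end up merged; the dict
# (standing in for a child object) is kept as a separate item.
print(build_content(["Hello, ", None, ["world", [{"b": "!"}]], "bye"]))
# ['Hello, world', {'b': '!'}, 'bye']
```

A generic array holding the same items would keep the null, the nesting and the separate strings, which is why the two may need different tooling.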

Anyone, any further thoughts?

Regards

Henry

On Fri 26/1 2:39 AM, Michael Kay wrote:

That was also a concern I had when designing Mark. Should it be generalized to allow any value in content? Do we have any solid use case for storing numbers and booleans in content? They are probably not needed for normal mixed-content usage.


I don't think the absence of a use case should ever be used to justify lack of orthogonality. If design were driven solely by use cases, orthogonality would go out of the window. The design process should aim to produce the most elegant/orthogonal design that satisfies all the use cases; it should not aim to satisfy the known use cases and nothing else.

Or to put it another way: find the space with the smallest boundary that has all the use cases within the space, where the "boundary" is the boundary between things that the system handles and things that it doesn't.

(An extreme example I often use to illustrate this principle: arithmetic expressions X+Y should allow either operand to be a literal zero, even though there is no use case for adding zero to a number, because a specification that allows literal zero is smaller than one that doesn't).

The lack of orthogonality between attribute values and element content was in fact one of the design mistakes in XML that I was trying to correct.

I know XML Schema allows primitive values as element content. But personally, I don't like that, and prefer to use attributes for those use cases.

Why should personal preferences come into it?

You might prefer <price currency="USD" amount="5.00"/> to <price currency="USD">5.00</price>, but others don't.

Michael Kay
Saxonica






