XML vs. Other Data Formats
B Tommie Usdin wrote:
>
> At 11:30 AM -0700 6/8/06, <juanrgonzaleza@c...> wrote:
> ...
>
>> For instance, we began to encode data in XML and after of some
>> experiences decided to abandon the format. Therefore in the next
>> statistics we will not be in the 40% as now ;-)
>
> That's really interesting. Can you tell us why you decided to abandon
> XML? And what data format better meets your needs? And why? (This sort
> of user experience is very valuable.)
>
> -- Tommie

We attempted to base our data and applications entirely on XML technology. We ran into many difficulties implementing the available specifications correctly: XHTML 1.1, MathML 2.0, XSL-FO, and others. We also found further difficulties with CML, STMML, and UnitsML, among others.

We then turned to our own XML-based markup language: CanonML. It was composed of several modules, e.g. CanonMath for mathematics, which would improve on MathML in some respects, CanonTexT instead of XHTML (2.0), and so on. [http://canonicalscience.blogspot.com/2006/02/choosing-notationsyntax-for-canonmath.html] It would rely on XSLT for browser-side transformations and on CSS (instead of XSL-FO, because XSL-FO is not supported in browsers).

But more problems arose! I finally decided that, regardless of how much time and money had been spent on the XML approach, it would never work the way I had in mind. In fact, it is striking that even giants such as Elsevier do not follow all the XML standards because of these problems; e.g. they use an in-house modification of W3C MathML and complement it with their own Elsevier CEP markup.

The next logical step was to look for an alternative technology, but which one? SGML? TeX/LaTeX? Liminal? LAMN? YAML? Something else? None of them satisfied all our needs, and since we were ready to break with the XML world, we could go further still and add requirements that were initially outside the Center's technological program. So I decided we would start from zero, rewriting all the layers.
I chose S(cheme)XML as a good starting point, but modified it to suit our specific needs. CanonML was thus reborn, no longer as an XML application but as a Canon(ical) Meta (formaL) Language. I am also reusing CSS.

Requirements:

1) Dick Formal Language + Keizer vectors. This lets us unify our previous physicochemical scientific approach with the humanities world.

2) Data optimization similar to CSV-like approaches. This is especially useful in large datuments: 7 GB or bigger.

3) Encoding of non-hierarchical structures in a more powerful way than liminal or GODDAG, without the added parsing difficulties associated with them. For example, the Lewis structure for HF would be represented as

[H}[F} e e {H] e e e e e e {F]

<H/<F/ e e /H> e e e e e e /F>

4) Mathematical sophistication. Whereas in theory any mathematical structure could be implemented in XML markup (e.g. OpenMath), in practice there are problems related to presentation and to human authoring. MathML is already so verbose that it cannot be authored by hand, and tools generate ugly code. For example, after 10 years of MathML, people have still been unable to encode (ds)^2. Take as an illustration Distler's Musings blog, which is claimed to be the most technologically advanced blog on the planet. I do not need the most advanced technology if it cannot fully encode something as simple as (ds)^2. The only real possibility for first-class encoding of scientific and mathematical content is human authoring of formulae, which cannot be achieved in an XML syntax. This is one reason for the popularity of TeX/LaTeX/AMSTeX in academic communities. It may also be remarked that nobody has yet achieved TeX's mathematical typesetting quality in the SGML or XML worlds.

5) No dual data format (elements plus attributes). Also, removal of any limitations on attributes (hierarchies, arbitrary content), as has been done in liminal or in ConciseXML.

6) Better internationalization and extensibility. Elimination of limitations on tag names.
It is interesting that internationalization is achieved only in text, not in markup. I can write Spanish text in an XML document, but I cannot use Spanish words to mark it up: <niño> or <cigüeña>, for example, are not permitted. The Spanish version of English <section> is <sección>, and I see some Spanish documents writing <seccion> because of XML's own limitations. All other limitations on markup names are eliminated too. For example, in XML I can write <water> or <cicloheptatrieno> but not <1,5-ciclooctadieno>. This is to be avoided.

7) Multiple markup. How many times do we all need to name something in two ways at once? Commonly one finds <B><I>A</I></B>, and years ago people proposed <BI>A</BI> (used today on some forum boards). One finds <pre><code> sequences today, and <blockcode> was then proposed for the upcoming XHTML 2.0. But that forces new tags into new specifications instead of reusing the available ones. CanonML achieves this via a multimarkup model.

8) Elimination of end tags, or at least making them optional, somewhat as in liminal, SXML, ConciseXML, and others.

9) Unification of language. In the XML world one finds XML, XPath, CSS, Relax NG, SVG... We use the same syntax for everything. Whereas XSL-FO is an "XMLized" version of CSS, XPath uses a non-XML syntax because of XML's limitations.

10) Consistency. Every module should interoperate with the others. Much of the XML technology developed by the W3C is neither consistent nor interoperable, with many groups reinventing the wheel or competing with one another, which makes life very difficult for end users. It is very shocking for outsiders that one group uses HTML links, another XLink, and a third XHTML links. It is surprising to see open disputes between CSS and XSL-FO members, with rude criticism in some cases. It is astonishing that <tag>a </tag> is treated as equal to <tag>a</tag> in some XML applications but not in the original XML specification.
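Two of the XML constraints mentioned above, the name rule behind <1,5-ciclooctadieno> (point 6) and the whitespace behaviour behind <tag>a </tag> (point 10), can be checked with any conforming parser. A minimal sketch, assuming Python 3 and its standard-library xml.etree.ElementTree (which is built on expat); the helper name `parses` is hypothetical:

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Hypothetical helper: True if the stdlib parser accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Point 6: an XML 1.0 name may not start with a digit, and a comma is
# not a legal name character, so the second element is rejected.
print(parses("<cicloheptatrieno/>"))    # True
print(parses("<1,5-ciclooctadieno/>"))  # False

# Point 10: XML 1.0 requires the processor to pass all character data,
# trailing spaces included, through to the application unchanged.
a = ET.fromstring("<tag>a </tag>").text
b = ET.fromstring("<tag>a</tag>").text
print(repr(a), repr(b), a == b)         # 'a ' 'a' False
```

At the parser level the two contents differ; whether an application then treats them as equal is left to that application, which is exactly the inconsistency point 10 describes.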
Unfortunately, the wheel keeps being reinvented. For example, if I want to change <tag>content</tag> to bold font, I would use different approaches depending on whether I follow CSS, XSL-FO, or MathML. The problem is not that we can choose our favourite; the problem is that we are forced to use whatever browsers implement for each case (sometimes all three).

11) Simplification. Elimination of all unneeded complexity and redundancy. For example, the first non-XML version of CanonML initially included tag-entity pairs: <para> My favourite greek letter is β because elegance </para> was encoded as

[::para My favourite greek letter is \beta because elegance ]

and it is now written as

[::para My favourite greek letter is ::beta because elegance ]

since, a few days ago, we adopted a LISP-TeX functional approach in CanonML, dropping the legacy and last residues of the initial XML-based CanonML. Apparently T. Bray also wants to eliminate most entities in a future XML 2.0. In fact, we recently discussed a rendering problem with one of the MathML predefined entities: ⅆ. That problem had already been solved in the original CanonMath language, which did not use the entity. I am also concerned about XML's handling of white space, and that is simplified as well. All other complexities, such as namespaces, schemas, DOCTYPE, special notation for empty tags, and others, are also to be removed.

12) But keep it powerful. E.g. eliminate the "--" restriction in comments; this is also achieved in liminal. Allow PIs inside PIs; this nesting is also allowed in some other approach, but I do not remember which right now.

13) Increased readability. Many datuments cannot be generated automatically, e.g. scientific or mathematical papers, so improved readability is welcome. Increase the visual difference between the beginning and the end of marked fragments. In XML, start and end tags differ by a single character (/), which is far from optimal. In liminal the difference is increased by varying two characters.
CanonML copies the excellent readability of SXML, TeX, and similar approaches. White space is also used to increase the readability of CanonML datuments.

14) I hope the 13 points above make a reasonably motivating set of replies to your original questions.

Juan R.
Center for CANONICAL |SCIENCE)