
JUMBO

  • From: Peter@u... (Peter Murray-Rust)
  • To: xml-dev@i...
  • Date: Sat, 22 Mar 1997 19:07:22 GMT

JUMBO is a prototype browser/editor/search/transformation tool for
XML documents.  I have now managed to bolt in both Lark and NXP 
instead of my parser (which was crude and did not support some of the
XML constructs).  The bolting-in is still rather crude and concentrates
my mind on the need for a simple API at this level.  Here are some comments
which may be useful.

NXP.
----
NXP has an interface Esis, with functions such as open_tag, close_tag,
process_instruction, etc.  [I think they would be more properly called 
start_element??].  JUMBO uses this to build up a Vector representing the
ESIS event stream, something like:
"_START_TAG" "CML"  AttributeList "_START_TAG" "MOL" ... "_END_TAG" "MOL"...
JUMBO then builds a tree out of this, adding attributes, etc.
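
The step from a flat event stream to a tree can be sketched as follows.  This
is only an illustration under stated assumptions, not JUMBO's actual code: the
stream is simplified to marker/name pairs (the AttributeList entries are left
out), and Node and EventTreeBuilder are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical node type; JUMBO's real tree classes are not shown in the post.
class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

class EventTreeBuilder {
    // Builds a tree from a flat list of (marker, name) pairs such as
    // "_START_TAG" "CML" "_START_TAG" "MOL" "_END_TAG" "MOL" "_END_TAG" "CML".
    // Assumption: attributes have already been stripped from the stream.
    static Node build(List<String> events) {
        if (events.size() % 2 != 0) {
            throw new IllegalArgumentException("expected marker/name pairs");
        }
        Deque<Node> open = new ArrayDeque<>();   // currently open elements
        Node root = null;
        for (int i = 0; i < events.size(); i += 2) {
            String marker = events.get(i);
            String name = events.get(i + 1);
            if (marker.equals("_START_TAG")) {
                Node node = new Node(name);
                if (open.isEmpty()) {
                    if (root != null) {
                        throw new IllegalStateException("more than one root element");
                    }
                    root = node;
                } else {
                    open.peek().children.add(node);
                }
                open.push(node);
            } else if (marker.equals("_END_TAG")) {
                if (open.isEmpty() || !open.peek().name.equals(name)) {
                    throw new IllegalStateException("unbalanced end tag: " + name);
                }
                open.pop();
            } else {
                throw new IllegalArgumentException("unknown marker: " + marker);
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalStateException("unclosed element: " + open.peek().name);
        }
        return root;
    }
}
```

The stack of open elements is what turns the linear stream back into nesting;
attribute handling would hang off the start-tag branch.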

NXP has a class XML which is built by JACC.  This contains inter alia
an Esis_Stdout object (implements Esis).  There are several objects in XML
which are private and therefore not easily accessed - I think they should
have accessors, but at present I have subclassed it to PMRXML, which has
the requisite accessors.

My test program then creates a PMRXML object, and extracts the event stream
which is then passed to JUMBO's existing tree object:
    NXP.PMRXML xml = new PMRXML(NXP.Streams.load_File(file, true));
    pmr.chemime.ChemTree chemTree = new ChemTree(xml.getStreamVector());
    pmr.sgml.GeneralTOC toc = chemTree.createGeneralTOC(3);

Comments:  I have still to work out what whitespace NXP creates - there seems 
to be a lot of content which is purely whitespace.  Maybe we have to address
COLLAPSE and KEEP at this stage?  Also it isn't easy to extract certain 
info - for example I had to hack XML.java to get the doctype - this isn't a good
idea and we need an accessor.  I am also still not clear how NXP does (or should)
behave with:
<!DOCTYPE CML>
and <!DOCTYPE CML SYSTEM "cml.dtd">
(the default on the latter is to try to validate, I think, even if validate
is set to false.  I'd prefer to be able to turn off validation, but I may have
missed something).
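
One way the whitespace question could be handled on the application side is
sketched below.  It is an assumption-laden illustration: WhitespaceFilter and
its Mode values are hypothetical names borrowed from the KEEP/COLLAPSE wording
above, and COLLAPSE is interpreted here only as "drop chunks of content that
consist entirely of whitespace", which is just one possible reading.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of NXP, Lark or JUMBO.
class WhitespaceFilter {
    enum Mode { KEEP, COLLAPSE }

    // True if the string contains only space, tab, CR or LF (or is empty).
    static boolean isAllWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                return false;
            }
        }
        return true;
    }

    // KEEP returns a copy of every chunk; COLLAPSE drops whitespace-only chunks.
    static List<String> filter(List<String> chunks, Mode mode) {
        if (mode == Mode.KEEP) {
            return new ArrayList<>(chunks);
        }
        List<String> kept = new ArrayList<>();
        for (String chunk : chunks) {
            if (!isAllWhitespace(chunk)) {
                kept.add(chunk);
            }
        }
        return kept;
    }
}
```

Whether such filtering belongs in the parser or in the application is exactly
the open question; the sketch only shows that it is cheap to do afterwards.
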
	In general I'd like to be able to treat NXP as a black box, and subclass
my Esis object.  That could mean passing it as an argument to XML, e.g.:
   
public class PMREsis implements Esis {
    public void open_tag(String name) {
...
    }
}

    PMREsis esis = new PMREsis();
    NXP.XML xml = new NXP.XML(esis, NXP.Streams.load_File(file, true));
    pmr.sgml.SGMLTree tree = new pmr.sgml.SGMLTree(xml);

and so on.
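
To make the black-box idea concrete, here is a rough sketch of a handler that
builds a tree directly from the callbacks, with no intermediate Vector.  The
names are assumptions: EsisLike stands in for NXP's Esis interface, only
open_tag(String) is taken from the fragment above, close_tag(String) is a
guessed signature, and TreeEsis is a hypothetical class.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Stand-in for NXP's Esis interface; the real interface has more methods.
interface EsisLike {
    void open_tag(String name);
    void close_tag(String name);   // assumed signature
}

// Hypothetical handler that grows a tree as the parser reports events.
class TreeEsis implements EsisLike {
    static class Element {
        final String name;
        final List<Element> children = new ArrayList<>();

        Element(String name) {
            this.name = name;
        }
    }

    private final Deque<Element> open = new ArrayDeque<>();
    private Element root;

    @Override
    public void open_tag(String name) {
        Element element = new Element(name);
        if (open.isEmpty()) {
            root = element;                      // first element seen is the root
        } else {
            open.peek().children.add(element);   // attach to the innermost open element
        }
        open.push(element);
    }

    @Override
    public void close_tag(String name) {
        if (open.isEmpty() || !open.peek().name.equals(name)) {
            throw new IllegalStateException("unexpected close_tag: " + name);
        }
        open.pop();
    }

    Element getRoot() {
        return root;
    }
}
```

A parser that accepted such an object in its constructor would only need to
call the two methods; the application never has to see the parser's internals.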

NXP is a validating parser, but my DTDs are still struggling with Parameter
Entities so I have no experience here.

Lark
----
	Lark creates a tree (called Lark) and provides a handler for 
the user to pick up a variety of events (e.g. doDoctype(), doPI()).  The
tree contains Elements ('Nodes') which have Attributes and a type (String).

Rather than subclassing these elements, I process Lark by iterating through
the Elements and creating a JUMBO SGMLTree (this can be delayed if required).
The tree seems complete, but I am not sure I have got all the doFOO routines
working correctly.  I have also had problems with PIs (if the ?> delimiter
is used) - these may be mine.

Lark does not validate.  However it is easy to interface and is fast.


General
-------
I do not use PIs myself though I shall start to do so.  If they are
kept in the document tree, is there a convention where they live?  (The last
opened element?  What if they occur in PCDATA?).

I intend to make JUMBO available with both Lark and NXP but it's a bit creaky
at present and the interface is a bit slow.  I have been told that the larger
the number of classes, the slower the program - any comments?  Also I don't
know whether I should be deliberately garbage-collecting at this stage.

Any general thoughts would be welcome.  I intend to bolt a crude search tool
into JUMBO along the TEI lines.  I shall also see whether I can extract the
bits of NXP that do the validating, because then we have a crude validating
editor.  

Any feedback from the current JUMBO users would be appreciated.  (I already know
it's slow, and the graphics creak in several places :-)

P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)


