JUMBO
JUMBO is a prototype browser/editor/search/transformation tool for XML documents. I have now managed to bolt in both Lark and NXP in place of my own parser (which was crude and did not support some of the XML constructs). The bolting-in is still rather crude, and it concentrates my mind on the need for a simple API at this level. Here are some comments which may be useful.

NXP
---

NXP has an interface Esis, with functions such as open_tag, close_tag, process_instruction, etc. [I think these would more properly be called start_element etc.??] JUMBO uses this to build up a Vector representing the ESIS event stream, something like:

    "_START_TAG" "CML" AttributeList
    "_START_TAG" "MOL" ...
    "_END_TAG" "MOL" ...

JUMBO then builds a tree out of this, adding attributes, etc.

NXP has a class XML which is built by JACC. This contains, inter alia, an Esis_Stdout object (which implements Esis). Several objects in XML are private and therefore not easily accessed. I think they should have accessors, but at present I have subclassed XML as PMRXML, which has the requisite accessors. My test program then creates a PMRXML object and extracts the event stream, which is then passed to JUMBO's existing tree object:

    NXP.PMRXML xml = new PMRXML(NXP.Streams.load_File(file, true));
    pmr.chemime.ChemTree chemTree = new ChemTree(xml.getStreamVector());
    pmr.sgml.GeneralTOC toc = chemTree.createGeneralTOC(3);

Comments:

I have still to work out what whitespace NXP creates - there seems to be a lot of content which is simply whitespace. Maybe we have to address COLLAPSE and KEEP at this stage?

Also it isn't easy to extract certain info - for example I had to hack XML.java to get the doctype. This isn't a good idea, and we need an accessor.

I am also still not clear how NXP does (or should) behave with:

    <!DOCTYPE CML>

and

    <!DOCTYPE CML SYSTEM "cml.dtd">

(the default on the latter is to try to validate, I think, even if validate is set to false.
I'd prefer to be able to turn off validation, but I may have missed something).

In general I'd like to be able to treat NXP as a black box, and subclass my Esis object. That could mean passing it as an argument to XML, e.g.:

    public class PMREsis implements Esis {
        public void open_tag(String name) { ... }
    }

    PMREsis esis = new PMREsis();
    NXP.XML xml = new NXP.XML(esis, NXP.Streams.load_File(file, true));
    pmr.sgml.SGMLTree tree = new pmr.sgml.SGMLTree(xml);

and so on.

NXP is a validating parser, but my DTDs are still struggling with Parameter Entities, so I have no experience here.

Lark
----

Lark creates a tree (called Lark) and provides a handler for the user to pick up a variety of events (e.g. doDoctype(), doPI()). The tree contains Elements ('Nodes') which have Attributes and a type (String). Rather than subclassing these elements, I process Lark by iterating through the Elements and creating a JUMBO SGMLTree (this can be delayed if required). The tree seems complete, but I am not sure I have got all the doFOO routines working correctly. I have also had problems with PIs (if the ?> delimiter is used) - these may be mine.

Lark does not validate. However, it is easy to interface and is fast.

General
-------

I do not use PIs myself, though I shall start to do so. If they are kept in the document tree, is there a convention for where they live? (The last opened element? What if they occur in PCDATA?)

I intend to make JUMBO available with both Lark and NXP, but it's a bit creaky at present and the interface is a bit slow. I have been told that the larger the number of classes, the slower the program - any comments? Also I don't know whether I should be deliberately garbage-collecting at this stage. Any general thoughts would be welcome.

I intend to bolt a crude search tool into JUMBO along the TEI lines. I shall also see whether I can extract the bits of NXP that do the validating, because then we have a crude validating editor.
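The event-stream-to-tree step described under NXP can be sketched roughly as follows. This is an illustration only, not JUMBO's or NXP's actual code: the names EventTreeSketch, Node and build are hypothetical, and the stream is assumed to be simplified to marker/name pairs, with attribute lists and character data left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: none of these names come from JUMBO or NXP.
class EventTreeSketch {

    /** A minimal element node: a name plus child elements (no attributes). */
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        /** Render as nested names, e.g. CML(MOL). */
        @Override
        public String toString() {
            if (children.isEmpty()) return name;
            StringBuilder sb = new StringBuilder(name).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    /** Build a tree from a flat list of (marker, element name) pairs. */
    static Node build(List<String> events) {
        if (events.size() % 2 != 0) {
            throw new IllegalArgumentException("events must be marker/name pairs");
        }
        Deque<Node> open = new ArrayDeque<>(); // stack of currently open elements
        Node root = null;
        for (int i = 0; i < events.size(); i += 2) {
            String marker = events.get(i);
            String name = events.get(i + 1);
            if (marker.equals("_START_TAG")) {
                Node node = new Node(name);
                if (open.isEmpty()) {
                    if (root != null) throw new IllegalStateException("second root: " + name);
                    root = node;
                } else {
                    open.peek().children.add(node); // attach to innermost open element
                }
                open.push(node);
            } else if (marker.equals("_END_TAG")) {
                if (open.isEmpty() || !open.peek().name.equals(name)) {
                    throw new IllegalStateException("unbalanced end tag: " + name);
                }
                open.pop();
            } else {
                throw new IllegalArgumentException("unknown marker: " + marker);
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalStateException("unclosed element: " + open.peek().name);
        }
        return root;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList(
            "_START_TAG", "CML",
            "_START_TAG", "MOL",
            "_END_TAG", "MOL",
            "_END_TAG", "CML");
        System.out.println(build(events)); // prints CML(MOL)
    }
}
```

A stack of open elements is enough here because start and end events arrive properly nested; a mismatched end tag is reported rather than silently repaired.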
Any feedback from the current JUMBOs would be appreciated. (I already know it's slow, and the graphics creak in several places :-)

	P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)