XML as a programming tool
Hi,

I'd like to give some examples of the use of SGML/XML in software development (sorry, I never did any publishing with SGML/XML and used it for software development only).

Force: flexible and adaptive software needs meta-information.

This kind of software tends to remove definitions from source code. They are put into meta-layers, repositories or - more likely, because of missing software infrastructure - into simple configuration files. These files become a big mess pretty soon: they are changed and code breaks. The overall structure is more than unclear. Parameter definitions in configuration files are complicated and have to be parsed by every client. Example:

  Token = 15 somevalue 32 anothervalue

Team development gets very hard: what IS the authoritative structure and content of the configuration files? The first approach is usually to come up with a class that maps ".ini" style configuration files. Still, you would like to have more: tokens in hierarchies, many tokens of the same name, validation. And you would like to split the information into smaller, separate parts so you can avoid copying them - that means you want entity management. All of this must be programmed by your team - or?

Solution: It takes about 2-3 weeks to integrate e.g. the SP parser/entity manager kit into a framework. Most of the work goes into wrapping SP's native classes from the parser API into appropriate framework classes and interfaces (this should get much better with a standard parser API). If you have a generic composite object machine built in, just map the parser events into your tree classes (nodes) and you have a representation of the configuration information in memory. The next step is to add some wrappers for convenience, e.g. a tiny query interface (findElementByName() etc.) so your clients can avoid hardcoding element lookups, and a value class with some conversion functions.
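The idea of hierarchical configuration plus a tiny query and value-conversion wrapper can be sketched in a few lines. This is a minimal illustration in Python using the standard library (the element names, the helper names and the sample values are invented for this sketch; the original used SP and framework classes):

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document: tokens in hierarchies,
# several tokens of the same name.
CONFIG = """
<config>
  <database>
    <host>db1.example.com</host>
    <port>5432</port>
  </database>
  <cache><port>6379</port></cache>
</config>
"""

root = ET.fromstring(CONFIG)

def find_element_by_name(tree, name):
    """Tiny query wrapper so clients avoid hardcoding element lookups."""
    return tree.find(".//" + name)

def int_value(elem):
    """Value wrapper with a conversion function."""
    return int(elem.text)

db = find_element_by_name(root, "database")
assert int_value(find_element_by_name(db, "port")) == 5432
```

Clients that go through `find_element_by_name` keep working when the surrounding document structure shifts, which is the decoupling the post is after.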
During boot your framework pulls in the configuration documents, the parser validates the content and hands it off to "PartBuilder" instances that instantiate the proper objects - and off you go. The entity manager in the SP toolkit even enables you to pull configuration information from some server without the client even noticing. This way you end up with well-defined configuration information that can still be highly specialized per customer. Its real power shows when you imagine having hundreds or thousands of installations at customer sites (some configured there by service teams) and you want to ship a new version of your software. Can you integrate the existing information during installation? Across releases and possibly extensive modifications? If you used a copy/paste/change approach to create new customer configuration information, it's now time to look for a new job...

Force: use dynamic information safely.

Statically typed languages sometimes force developers to use untyped information to avoid changes in interfaces, e.g. getValueByName(String name). In effect one is working around the static type system.

Solution: Semantic data streams or the composite message pattern are easily implemented using the basic tree model from above. You can transfer whole trees or just parts. The factory that generates these types (they ARE types because there is a DTD for them) makes sure that they are created properly. Due to their self-describing nature the structures can change without breaking existing clients. Applications for this are externalization, serialization, and event and object bus systems.

Force: error messages must be language independent and unique.

Solution: Describe your message catalogs in SGML/XML. Use the ID mechanism to name the programmatic tokens that show up in source code. The parser is going to tell you if somebody used the same token twice.
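A message catalog with unique programmatic tokens might look like the sketch below. With a validating SGML/XML parser the ID attribute type enforces token uniqueness for you; Python's ElementTree does no DTD validation, so this sketch checks uniqueness by hand (the catalog content and the `message()` helper are invented for illustration):

```python
import xml.etree.ElementTree as ET

CATALOG = """
<messages>
  <msg id="ERR_DISK_FULL">
    <text lang="en">Disk full</text>
    <text lang="de">Platte voll</text>
  </msg>
  <msg id="ERR_NO_CONN">
    <text lang="en">No connection</text>
  </msg>
</messages>
"""

root = ET.fromstring(CATALOG)

# A validating parser would reject duplicate ID attributes;
# emulate that check here since ElementTree does not validate.
ids = [m.get("id") for m in root.findall("msg")]
assert len(ids) == len(set(ids)), "duplicate message token"

def message(token, lang="en"):
    """Resolve a language-independent token to localized text."""
    for m in root.findall("msg"):
        if m.get("id") == token:
            return m.find("text[@lang='%s']" % lang).text
    raise KeyError(token)
```

Source code then only ever mentions the token (`ERR_DISK_FULL`), never the localized string.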
The same applies if you need a poor man's implementation repository with some trader functionality, e.g. to automatically load classes in factories where the client tells you what interface it wants and gives some hints about the properties the object should have. You could map these properties via introspection directly to beans, but every once in a while an indirection is necessary, e.g. if you bought some beans whose properties have to be mapped to your system's language.

Force: avoid copying of information in your system.

Many systems duplicate a lot of information in various components or layers. Let's say there is a customer type in the analysis model. This usually turns into a customer database table schema, a GUI resource description of a customer view, and some representation of customer in the "model" part, which is a C++ header or a Java class. Most of this information is just a duplicate.

Solution: Use an SGML/XML description for all these aspects and reference customer information from one place. Write generic modules that read this information at runtime.

Force: share information without coupling objects tightly.

Let's say you are doing some workflow. The workflow objects are part of a tree (built from SGML/XML information) and child and parent nodes can communicate with each other, using some fixed interfaces and some dynamic ones (semantic data streams using DOM). But every once in a while some information is created in a node that is useful for some other node that is NOT directly connected to the first node. How can this node get the information without linking both nodes?

Solution: Turn some information tree into a blackboard. Create some SGML/XML instance that models the structure of the information you want to share. The elements can be empty (there IS a use for markup without content (:-)). Load this tree into memory. Make your nodes also implement an observer-type interface. Now clients can do lookups, and if nothing is there yet, they can register for change.
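The blackboard idea - an XML tree as shared state, plus lookup-or-register-for-change - can be sketched briefly. This is a minimal Python sketch, not the original framework; the `Blackboard` class, its method names and the workflow paths are all invented for illustration:

```python
import xml.etree.ElementTree as ET

class Blackboard:
    """An XML information tree used as a blackboard with change notification."""
    def __init__(self, xml):
        self.root = ET.fromstring(xml)
        self.observers = {}            # path -> list of callbacks

    def lookup(self, path):
        node = self.root.find(path)
        return node.text if node is not None else None

    def register(self, path, callback):
        # Nothing there yet: register for change instead of polling.
        self.observers.setdefault(path, []).append(callback)

    def publish(self, path, value):
        self.root.find(path).text = value
        for cb in self.observers.pop(path, []):
            cb(value)

# Empty elements: there IS a use for markup without content.
bb = Blackboard("<workflow><credit><limit/></credit></workflow>")

seen = []
if bb.lookup("credit/limit") is None:
    bb.register("credit/limit", seen.append)
bb.publish("credit/limit", "50000")
```

Publisher and subscriber never reference each other; both only know the blackboard's document structure.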
This has three advantages:

- Publisher and subscriber are NOT directly coupled and can change any way they want without affecting the other.
- There is no need to do sequential processing. The workflow tree will settle into a correct state, but the path it takes is undetermined and decoupled from the descriptive workflow logic. (This makes some people with a strong procedural background a bit nervous.)
- The blackboard is highly structured and not chaotic. Debug routines can print human-readable snapshots.

Force: process error, trace and debug information automatically.

I guess everybody has seen that huge and unstructured mess created by error, trace or debugging messages. In mission-critical applications agents are supposed to react to those kinds of messages.

Solution: Write error, trace or debug information in SGML/XML. This can be well-formed information only. Don't allow anybody to write unstructured information anywhere: they have to go to a factory, get a special type of SGML/XML node and fill it in. Now it is easy for agents to find critical information. To get to the information they let the output go through the parser and use an SGMLApp implementation that does not build a tree but processes the parser events on the fly (assuming that in this case the information need not be represented as a tree). Using the same convenience wrappers from above, the agents are totally independent of any structural changes in the output stream (caused by different execution order etc.) and will continue working. (I have seen desperate moves to process e.g. Unix kernel and boot messages via handcoded applications...)

Force: translate from one domain language into a different one.

I suspect that about 50% of the work in business programming goes into format transformations between different COTS or other applications and databases. One can view database schemas, interfaces and protocols as little domain languages.
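Processing parser events on the fly - without building a tree, as a SAX-style SGMLApp would - looks roughly like this in Python. The log element names and the "agent" logic are invented; `iterparse` stands in for the event-driven parser interface:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical structured trace stream: every message is markup,
# never free-form text.
LOG = """<log>
  <trace level="debug">opening pool</trace>
  <error code="E42">disk full on /var</error>
  <trace level="info">retrying</trace>
</log>"""

critical = []
# An agent consumes parser events as they arrive; the whole
# stream is never materialised as a tree.
for event, elem in ET.iterparse(io.StringIO(LOG), events=("end",)):
    if elem.tag == "error":
        critical.append((elem.get("code"), elem.text))
    elem.clear()   # discard processed nodes to keep memory flat
```

Because the agent matches on element names and attributes, reordered or interleaved output (different execution order) does not break it.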
Since SGML/XML information trees have enough descriptive power to represent those, it is possible to build automatic translator sub-frameworks for "data-schlepping".

Solution (example: import server): Frequently information in a new format has to be imported into a system (e.g. DTA electronic commerce data, financial instruments data etc.). Storage objects convert these formats into an SGML/XML representation. This makes further processing independent of the different physical data formats of the new format and the existing system. But it does not solve the language problem itself: one format might call the customer "customer" and the other one "BusinessPartner". A translator framework provides wrappers that wrap the new information tree (e.g. DTA info) into the internal language. Of course the mapping process is driven by mapping information specified in SGML/XML. If more than simple name mapping is necessary, the wrappers can be dynamically configured with little action objects that can compute values etc. Of course these are again configured using SGML/XML.

Force: get information from OO analysis into the system.

In every larger framework the gap between OO analysis and implementation is huge. Direct mapping from an analysis class to code just leads to totally inflexible systems. That's why e.g. Enterprise JavaBeans treats concurrency, persistence etc. as being "orthogonal" to an object's implementation. This means that the implementation of these does not happen in the object; it is provided by containers etc. The next thing that's going to be pulled out of objects is business logic (our framework did this already and used SGML/XML to describe the workflow). But what does this mean for the analysis information if it doesn't get turned into code?

Solution: Use the analysis information to build up a meta-information layer. Use SGML/XML to describe it. Now generic objects can interpret this information and instantiate the necessary objects for processing.
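The "customer" vs. "BusinessPartner" mapping, driven by mapping information that is itself specified in XML, can be sketched like this. The element names, the mapping document format and the `translate` helper are invented for illustration; a real translator framework would add the action objects mentioned above:

```python
import xml.etree.ElementTree as ET

# The mapping lives in an XML document, not in code.
MAPPING = """
<mapping>
  <rename from="BusinessPartner" to="customer"/>
  <rename from="PartnerId" to="id"/>
</mapping>
"""

INCOMING = "<BusinessPartner><PartnerId>4711</PartnerId></BusinessPartner>"

rules = {r.get("from"): r.get("to")
         for r in ET.fromstring(MAPPING).findall("rename")}

def translate(elem):
    """Rewrite foreign element names into the internal language."""
    elem.tag = rules.get(elem.tag, elem.tag)
    for child in elem:
        translate(child)
    return elem

internal = translate(ET.fromstring(INCOMING))
```

Adding a new external format then means shipping a new mapping document, not new translator code.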
The meta-layer objects are of course the same ones we used to implement the configuration information, the trace facility etc.

Conclusion: For all these uses of SGML/XML basically the same software components were used over and over again. And the really hard ones were written by James Clark anyway (:-). This is reusable software, and it has the nice side effect that after a while programmers get familiar with the interfaces and don't have to learn new ones for new data formats all the time. I mean - what's the difference between configuration information, external data formats, blackboards etc.? Just different DTDs. But more important than reuse is the flexibility of software using SGML/XML to represent meta-information. Bringing a system to a new release no longer means: transform BLOB1 into BLOB2. It means: transform XXX.dtd into YYY.dtd - a defined and traceable process. Due to the self-describing nature of SGML/XML information, versioning becomes a defined and automated process too. Different versions can be detected and automatic translators can upgrade "legacy" objects. No longer do I have to keep old classes in the system for backward compatibility reasons only.

The bad news: past (bad) experience shows that the real problem with using SGML/XML in software development is not a technical one. Using SGML/XML only makes sense if everybody is willing to make information and assumptions EXPLICIT so they can go into DTDs and instances. This seems to be a sore point for many programmers who would rather see this hidden in code (just look at the slow progress of pre/postcondition specification or semantic interface definitions). And no, I don't have a solution for this one.

Merry Christmas and a Happy New Year,
Walter

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i...
the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@i... the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@i...)