ANN: Python implementation of Regular Fragmentations and online demo

See: http://downloads.dyomedea.com/python/regfrag/

Even if the title says it all, I'd like to propose for discussion some suggestions and extensions that I have added to my implementation as a proof of concept, and that I will eventually remove depending on the outcome of this discussion.

1) Matching error handling

These errors deserve more specification: what happens when a pattern doesn't match? When there are more matches than nodes specified to serialize them? When there are more nodes specified than matches?

My suggestion is to ignore node and match "overflow" (i.e. process only the minimum of the number of matches and the number of specified nodes). To be coherent with this rule, when there is no match, no nodes should be serialized and the fragmented node could be left empty.

2) Attribute prefixes are only hints

Namespace prefixes specified for the attributes generated by regular fragmentations cannot be used when they conflict with prefixes used in the instance document or required by other attributes in the same element. They should therefore be considered as hints rather than directives.

The algorithm used in my implementation is the following:

* The requested prefix is used for the generated attribute if it is either not defined or defined for the namespace URI of the generated attribute.
* Otherwise, if the namespace URI of the generated attribute is already associated with a prefix, this prefix is used.
* As a last resort, an index is appended to the requested prefix to generate a prefix not yet used in this element.

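To make suggestions 1 and 2 above concrete, here is a minimal sketch of both rules. It is not taken from the regfrag sources: the function names and the representation of in-scope namespace bindings as a plain prefix-to-URI dictionary are assumptions made for illustration only.

```python
def pair_matches_with_nodes(matches, nodes):
    """Pair produced nodes with matches, ignoring overflow on either side."""
    # zip() stops at the shorter input, i.e. min(len(matches), len(nodes)).
    # With no match at all the result is empty and nothing is serialized.
    return list(zip(nodes, matches))


def choose_prefix(requested, ns_uri, in_scope):
    """Pick a prefix for a generated attribute, treating `requested` as a hint.

    `in_scope` maps prefix -> namespace URI for the bindings visible on the
    element (a hypothetical representation, not the regfrag data model).
    """
    # Rule 1: the requested prefix is undefined, or already bound to ns_uri.
    if in_scope.get(requested, ns_uri) == ns_uri:
        return requested
    # Rule 2: reuse a prefix already bound to the attribute's namespace URI.
    # The empty prefix is skipped because the default namespace does not
    # apply to attributes.
    for prefix, uri in in_scope.items():
        if prefix and uri == ns_uri:
            return prefix
    # Rule 3: append an index until the prefix is unused in this element.
    index = 1
    while f"{requested}{index}" in in_scope:
        index += 1
    return f"{requested}{index}"
```

For example, `choose_prefix("type", uri, {"type": other_uri})` falls through to the last rule because "type" is bound to a different namespace and no other prefix is bound to `uri`.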
3) Generalization of the repeat attribute

An alternative way to write the example:

  <fragmentRule pattern="(\d{1})(\d{1})">
    <applyTo>
      <element nsURI="http://simonstl.com/ns/types/" localName="century" />
      <element nsURI="http://simonstl.com/ns/types/" localName="year" />
      <element nsURI="http://simonstl.com/ns/types/" localName="month" />
    </applyTo>
    <produce>
      <element nsURI="http://simonstl.com/ns/types/" localName="digit" prefix="type" />
      <element nsURI="http://simonstl.com/ns/types/" localName="digit" prefix="type" />
    </produce>
  </fragmentRule>

could be to generalize the use of the repeat attribute to match rules:

  <fragmentRule pattern="(\d{1})(\d{1})" repeat="true">
    <applyTo>
      <element nsURI="http://simonstl.com/ns/types/" localName="century" />
      <element nsURI="http://simonstl.com/ns/types/" localName="year" />
      <element nsURI="http://simonstl.com/ns/types/" localName="month" />
    </applyTo>
    <produce>
      <element nsURI="http://simonstl.com/ns/types/" localName="digit" prefix="type" />
    </produce>
  </fragmentRule>

4) skipFirst

Choosing default values is often subjective, but I think the default value for the skipFirst attribute could be "false". Also, it's not clear whether this attribute applies to all types of rules (match and split). I think that, for coherence, it should.

5) Duplicate attributes

The current rule is: "Repeating the same attribute name will leave only the last version in the final output". I find this error prone, especially when attributes are generated by fragmenting other attributes: this can lead to recursion loops, and even when it doesn't, the order in which the attributes are processed (and thus generated) is not significant, so which version ends up being "the last" is unpredictable. I would suggest raising a fragmentation-time error when an attribute is "overridden".

6) Escape recursion

A "break" attribute could be added to the fragmentRule element to specify that no further recursion should take place.

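As an illustration of suggestion 3 above, the sketch below shows one possible reading of repeat="true" on a match rule: the <produce> list is cycled until every captured group has a node to receive it. This is an assumption about the intended semantics, not a description of the specification or of regfrag; `expand_produce` and `fragment` are hypothetical names, and produced nodes are represented as plain strings.

```python
import re
from itertools import cycle, islice


def expand_produce(produce, group_count, repeat=False):
    """Return the produced node names that will receive the captured groups."""
    if repeat and produce:
        # Assumed reading of repeat="true": reuse the <produce> entries in
        # order, as many times as needed to cover every captured group.
        return list(islice(cycle(produce), group_count))
    # Without repeat, groups beyond the produce list are ignored
    # (the "overflow" rule proposed in suggestion 1).
    return produce[:group_count]


def fragment(pattern, text, produce, repeat=False):
    """Pair each captured group of `pattern` in `text` with a produced node."""
    match = re.match(pattern, text)
    if match is None:
        # No match: nothing is serialized.
        return []
    groups = match.groups()
    names = expand_produce(produce, len(groups), repeat)
    return list(zip(names, groups))
```

Under this reading, `fragment(r"(\d{1})(\d{1})", "20", ["type:digit"], repeat=True)` and `fragment(r"(\d{1})(\d{1})", "20", ["type:digit", "type:digit"])` produce the same pairs, which is the equivalence the two rules above are meant to express.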
7) Attribute fragmentation

I have implemented attribute fragmentation, trying to stay as close as possible to the original idea of using the same mechanism, even though the semantics are slightly different. The proposal is coherent, though not always deterministic.

The two major issues with fragmenting attributes are that the result of the fragmentation cannot be kept in the attribute (at least not in the general case), since attributes are not structured, and that the order of attributes is not meaningful.

Since the result cannot be kept in the attribute, it is placed in the "hosting" element. If the result is serialized as elements or characters, the relative order in which the fragmentations of two or more attributes of the same element are serialized cannot be guaranteed.

8) Other node types (not implemented)

Currently, elements and attributes can be fragmented into elements, attributes and text nodes. What about adding other types of nodes (i.e. PIs and comments) to the list?

Thanks for your feedback,

Eric
--
See you in San Diego.
http://conferences.oreillynet.com/os2002/
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
http://xsltunit.org      http://4xt.org           http://examplotron.org
------------------------------------------------------------------------