analysis of temporal XML data -- research ideas
Hello.

Since I'm currently not sure in which direction my master's thesis will go, I have some research ideas for analysing temporal XML data. What I've done so far:

- written an XMLUpdateShredder, which doesn't find the minimal-cost edit distance between two revisions, to shred updates into our revisioned database
- sorted Wikipedia (with edit history) via Hadoop and written an Importer (not yet tested on the real Wikipedia)
- implemented diffing between two revisions in the database itself (since it's based on the internal node encoding, it finds the minimum edit cost)
- developed a GUI with a simple text view, a tree view and a sunburst view, which supports explorative visualization and modification and can also visualize changes between two revisions (this might be extended to more than two revisions, but that seems to be very complex)

Since some XML files are rather huge, it would be great (besides pruning, which is implemented and can be enabled to prune nodes once some depth is reached) to have some kind of overview and maybe to cluster the data via a text clustering algorithm, for example to find similar topics. This could be done through a force-directed visualization between two revisions:

- first select nodes via an XPath expression
- cluster the nodes through some text clustering algorithm (which can be combined in further work with some structure analysis)
- through a force-based layout the selected nodes get clustered, and the color of each node encodes whether it is the same/inserted/deleted/updated
- clicking on a node, or maybe even hovering over it, is synchronized with the SunburstView to provide a detailed hierarchical overview of the changes in this subtree

Another direction would be to develop an index structure to track changes via XPath extensions (through a time axis), which has been done by some people from ETH in TimeMachine, along with further research in this area.

A third alternative would be to develop further visualizations, for which I also have some ideas.
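To make the select / cluster / color steps of the force-directed idea more concrete, here is a minimal Python sketch. It is not the existing implementation and rests on several assumptions: the two revisions are small inline XML strings, every node carries a hypothetical stable `id` attribute (standing in for the internal node encoding), node selection uses the limited XPath subset of `xml.etree.ElementTree`, and a greedy word-overlap (Jaccard) grouping stands in for a real text clustering algorithm. The force-based layout itself is left out; the sketch only produces the clusters and the change label that would drive the node colors.

```python
import xml.etree.ElementTree as ET

# Two made-up revisions; the 'id' attribute is a hypothetical stable node key.
REV1 = """<wiki>
  <page id="1">XML databases store tree data</page>
  <page id="2">Sunburst views show hierarchies</page>
  <page id="3">Hadoop sorts large dumps</page>
</wiki>"""

REV2 = """<wiki>
  <page id="1">XML databases store tree data</page>
  <page id="2">Sunburst views show deep hierarchies</page>
  <page id="4">XML databases index tree data</page>
</wiki>"""


def select(xml_text, path):
    """Select nodes with an XPath-like path and map node id -> text."""
    root = ET.fromstring(xml_text)
    return {n.get("id"): (n.text or "").strip() for n in root.findall(path)}


def classify(old, new):
    """Label every node id as same / updated / inserted / deleted."""
    labels = {}
    for key in old.keys() | new.keys():
        if key not in old:
            labels[key] = "inserted"
        elif key not in new:
            labels[key] = "deleted"
        elif old[key] == new[key]:
            labels[key] = "same"
        else:
            labels[key] = "updated"
    return labels


def jaccard(a, b):
    """Word-overlap similarity between two texts, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def cluster(texts, threshold=0.5):
    """Greedy grouping: join the first cluster holding a similar enough text."""
    clusters = []
    for key in sorted(texts):
        for group in clusters:
            if any(jaccard(texts[key], texts[o]) >= threshold for o in group):
                group.append(key)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([key])
    return clusters


old = select(REV1, ".//page")
new = select(REV2, ".//page")
labels = classify(old, new)
# Cluster over the union of both revisions so deleted nodes show up as well.
groups = cluster({**old, **new})
for group in groups:
    print([(key, labels[key]) for key in group])
```

In a real view, each cluster would become a group of mutually attracting nodes in the layout, and the label would pick the node color.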
I'm currently not sure which direction to take and what would be of value for "real world" applications. I'm open to any suggestions.

regards,
Johannes