Re: Copying text from a source, then converting to XML
Mark Novembrino (novembri) wrote:

>Hi, Daniel.
>
>There are probably many ways to do this.
>
>One way, perhaps a little crude, would be to use a text/macro editor to
>process the files in batch mode first. I've often used Vedit for these
>sorts of things. (http://www.vedit.com) The program doesn't do these
>kinds of batch things out of the box. You'd have to write a macro, but
>the macro language is easy to work with. Of course, you could also do
>the same thing in Perl or another scripting language.
>
>Once you've extracted the text you want to each file, the conversion to
>XML is another matter. That would depend on *which* XML you mean, i.e.,
>what DTD, what sort of text, what are the mapping rules you want to use
>and how do you want to tag the resulting XML output. You could continue
>to use the text editor for this sort of thing, or if you want a more
>"official" method, use XSLT to do the transform.
>
>Hope this helps.
>
>Not sure your level of programming expertise. If you need any more info
>(and nobody else on the list comes up with any better answers), I'd be
>glad to help with any small scripts/macros. I don't know Perl very well,
>but I probably have some Vedit and/or VBScripts floating around
>somewhere that could do the job.
>
>- Mark Novembrino
>
>>-----Original Message-----
>>From: Daniel Gresh [mailto:dgresh@l...]
>>Sent: Thursday, July 13, 2006 1:12 PM
>>To: xml-dev@l...
>>Subject: Copying text from a source, then converting to XML
>>
>>I have a question about this. Some of the question may not
>>pertain to XML, but if anyone knows a method, that'd be great.
>>
>>So, I basically want to automatically search a large number
>>of documents for certain keywords. When I find that keyword,
>>I want the paragraph the keyword is in, not the page, to be
>>copied and pasted somewhere. After that, I want to convert
>>the pasted text to XML.
>>
>>Does anyone know a method for doing either of these tasks?
>>Copying certain paragraphs or substrings of text that have
>>certain phrases in them, then converting to XML? Perhaps
>>there is a script of some sort? Or a free program?
>>
>>Any help would be appreciated.
>>
>>-----------------------------------------------------------------
>>The xml-dev list is sponsored by XML.org
>><http://www.xml.org>, an initiative of OASIS
>><http://www.oasis-open.org>
>>
>>The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>>To subscribe or unsubscribe from this list use the subscription
>>manager: <http://www.oasis-open.org/mlmanage/index.php>

You're going to have to forgive my lack of knowledge regarding the subject, but I am not all that familiar with XSLT. As for extracting the text, I've looked around a bit, and it does look like a script of some sort would be useful; I'll look around for an example before I try to make one from scratch.

As for what type of XML I'm converting to, I guess I should have been a little more specific. I'm not even sure if this is possible, but I'm really crossing my fingers and hoping it is, because it will make this task a whole lot easier: I want to somehow extract the text and use it with an ontology built in RDF/OWL. Is that ... possible? Even if it's not possible to convert it directly to RDF/OWL format, which I would guess is impossible, because in OWL and RDF one needs to predefine the classes and such, I figured converting to an XML format would be the first step in the right direction.

I'm sort of digressing here, and I apologize, but I simply don't know where else to ask this: is there some way to extract large amounts of text from a large number of documents, then access it in some way by applying metadata to it and using RDF/OWL? Extracting the text can be accomplished with scripts, as mentioned earlier, or by using XSLT, although I am not familiar with that method, but putting it into an ontology is a different matter. I was thinking of organizing the text according to the keywords and areas I extract, and then using something to search through it, but that's not really what I need, and I could just use XQuery for that, or something similar.

Does anyone have any thoughts? Again, I apologize for the off-topic subject; I just haven't found any other places to ask this.

Thanks for all the help,
Dan
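
For readers looking for a starting point on the extraction step discussed in this thread, here is a minimal sketch in Python, one of the "other scripting languages" alluded to above. It is not something posted by either author. It assumes the documents are plain-text files, that paragraphs are separated by blank lines, and that a keyword match is a case-insensitive substring test. The element and attribute names (`extracts`, `paragraph`, `source`, `keyword`) and the helper names are made up for illustration; a real project would pick names that fit its own DTD or ontology.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def split_paragraphs(text):
    """Split text into paragraphs, assuming blank lines separate them."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def find_matches(text, keywords):
    """Yield (keyword, paragraph) for every paragraph containing a keyword.

    Matching is a naive case-insensitive substring test, so "owl" would
    also match "bowl"; swap in a word-boundary regex if that matters.
    """
    for para in split_paragraphs(text):
        lowered = para.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                yield kw, para


def build_xml(docs, keywords):
    """Build an <extracts> element from a {document name: text} mapping.

    The element and attribute names are hypothetical placeholders.
    """
    root = ET.Element("extracts")
    for name, text in docs.items():
        for kw, para in find_matches(text, keywords):
            el = ET.SubElement(root, "paragraph", source=name, keyword=kw)
            # Collapse hard line breaks inside the paragraph into spaces.
            el.text = " ".join(para.split())
    return root


def load_docs(folder):
    """Read every *.txt file in a folder (assumed to be UTF-8 text)."""
    return {
        p.name: p.read_text(encoding="utf-8")
        for p in sorted(Path(folder).glob("*.txt"))
    }


if __name__ == "__main__":
    # Inline sample data so the sketch runs without any files on disk.
    sample = {
        "notes.txt": (
            "An unrelated opening paragraph.\n\n"
            "This paragraph talks about building\nan ontology in OWL.\n\n"
            "A closing paragraph."
        )
    }
    root = build_xml(sample, ["ontology"])
    print(ET.tostring(root, encoding="unicode"))
```

To run it over real files, replace the sample mapping with `load_docs("some_folder")`. The resulting XML keeps the source file and the matched keyword as attributes on each paragraph, which is the kind of metadata a later XSLT transform or RDF/OWL mapping step could draw on.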