
Re: Copying text from a source, then converting to XML

  • To: xml-dev@l...
  • Subject: Re: Copying text from a source, then converting to XML
  • From: Daniel Gresh <dgresh@l...>
  • Date: Fri, 14 Jul 2006 08:39:27 -0400
  • In-reply-to: <73B9C8D87DA7654D804B36A7EDF4C5A601C9B331@x...>
  • References: <73B9C8D87DA7654D804B36A7EDF4C5A601C9B331@x...>
  • User-agent: Mozilla Thunderbird 1.0.7 (Windows/20050923)

Mark Novembrino (novembri) wrote:

>Hi, Daniel.
>
>There are probably many ways to do this.
>
>One way, perhaps a little crude, would be to use a text/macro editor to
>process the files in batch mode first. I've often used Vedit for these
>sorts of things. (http://www.vedit.com) The program doesn't do these
>kinds of batch things out of the box. You'd have to write a macro, but
>the macro language is easy to work with. Of course, you could also do
>the same thing in Perl or another scripting language.
>
>Once you've extracted the text you want from each file, the conversion to
>XML is another matter. That would depend on *which* XML you mean, i.e.,
>what DTD, what sort of text, what are the mapping rules you want to use
>and how do you want to tag the resulting XML output. You could continue
>to use the text editor for this sort of thing, or if you want a more
>"official" method, use XSLT to do the transform.
>
>Hope this helps.
>
>I'm not sure of your level of programming expertise. If you need any more info
>(and nobody else on the list comes up with any better answers), I'd be
>glad to help with any small scripts/macros. I don't know Perl very well,
>but I probably have some Vedit and/or VBScripts floating around
>somewhere that could do the job.
>
>- Mark Novembrino
>
>
>  
>
>>-----Original Message-----
>>From: Daniel Gresh [mailto:dgresh@l...] 
>>Sent: Thursday, July 13, 2006 1:12 PM
>>To: xml-dev@l...
>>Subject:  Copying text from a source, then converting to XML
>>
>>I have a question about this. Some of the question may not 
>>pertain to XML, but if anyone knows a method, that'd be great.
>>
>>So, I basically want to automatically search a large number 
>>of documents for certain keywords. When I find that keyword, 
>>I want the paragraph the keyword is in, not the page, to be 
>>copied and pasted somewhere. After that, I want to convert 
>>the pasted text to XML.
>>
>>Does anyone know a method for doing either of these tasks? 
>>Copying certain paragraphs or substrings of text that have 
>>certain phrases in them, then converting to XML? Perhaps 
>>there is a script of some sort? Or a free program?
>>
>>Any help would be appreciated.
>>
>
Please forgive my lack of knowledge on the subject; I am not all that 
familiar with XSLT. As for extracting the text, I've looked around a bit, 
and a script of some sort does look useful; I'll look for an existing 
example before I try to write one from scratch.
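
A minimal sketch of such an extraction script in Python, assuming the 
documents are plain text and that paragraphs are separated by blank 
lines. The function name and the whole-word, case-insensitive matching 
rule are illustrative choices, not anything specified in this thread.

```python
import re


def paragraphs_with_keywords(text, keywords):
    """Return the paragraphs of `text` that contain any of `keywords`.

    Assumption: a paragraph is a block of text delimited by blank lines.
    Keywords are matched as whole words, ignoring case.
    """
    # Split on one or more blank (or whitespace-only) lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    patterns = [
        re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        for keyword in keywords
    ]
    return [p for p in paragraphs if any(pat.search(p) for pat in patterns)]
```

To run it over many files, one could read each file's contents and pass 
them to the function, recording the file name next to every paragraph 
that is returned.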

As for what type of XML I'm converting to, I should have been more 
specific. I'm not even sure this is possible, but I'm hoping it is, 
because it would make the task a whole lot easier: I want to extract the 
text and use it with an ontology built in RDF/OWL. Is that possible? I 
would guess that converting directly to RDF/OWL is impossible, because 
in OWL and RDF one needs to predefine the classes and such. Even so, I 
figured converting to an XML format would be a first step in the right 
direction.
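
As a sketch of that intermediate XML step, the extracted paragraphs 
could be wrapped in a simple ad hoc vocabulary using Python's standard 
xml.etree.ElementTree module. The element and attribute names here 
(excerpts, excerpt, source, keyword) are hypothetical; no DTD or schema 
has been chosen in this thread.

```python
import xml.etree.ElementTree as ET


def excerpts_to_xml(excerpts):
    """Serialize (source, keyword, paragraph) tuples as one XML string.

    The <excerpts>/<excerpt> vocabulary is made up for illustration.
    """
    root = ET.Element("excerpts")
    for source, keyword, paragraph in excerpts:
        element = ET.SubElement(
            root, "excerpt", {"source": source, "keyword": keyword}
        )
        # ElementTree escapes &, < and > when the tree is serialized.
        element.text = paragraph
    return ET.tostring(root, encoding="unicode")
```

Building the tree with a library rather than concatenating strings 
means special characters in the source text cannot break the markup.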

I'm digressing here, and I apologize, but I simply don't know where else 
to ask this: is there some way to extract large amounts of text from a 
large number of documents, apply metadata to it, and then access it 
through RDF/OWL? Extracting the text can be accomplished with scripts, as 
mentioned earlier, or by using XSLT (a method I am not familiar with), 
but putting it into an ontology is a different matter. I was thinking of 
organizing the text according to the keywords and areas I extract and 
then using something to search through it, but that's not really what I 
need, and I could just use XQuery or something similar for that. Does 
anyone have any thoughts? Again, I apologize for the off-topic subject; I 
just haven't found any other place to ask this.
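
One possible shape for the metadata step, offered only as a sketch: 
describe each extracted paragraph as an RDF resource in RDF/XML, using 
Dublin Core properties for the source file, the keyword, and the text. 
The base URI http://example.org/excerpt/ and the choice of Dublin Core 
are assumptions made for illustration; linking these resources to 
classes in an actual OWL ontology would be a separate step.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
BASE = "http://example.org/excerpt/"  # hypothetical base URI


def excerpts_to_rdf(excerpts):
    """Describe (source, keyword, paragraph) tuples as RDF/XML."""
    # Ask ElementTree to use the conventional rdf: and dc: prefixes.
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{RDF}}}RDF")
    for number, (source, keyword, paragraph) in enumerate(excerpts, start=1):
        description = ET.SubElement(
            root,
            f"{{{RDF}}}Description",
            {f"{{{RDF}}}about": f"{BASE}{number}"},
        )
        ET.SubElement(description, f"{{{DC}}}source").text = source
        ET.SubElement(description, f"{{{DC}}}subject").text = keyword
        ET.SubElement(description, f"{{{DC}}}description").text = paragraph
    return ET.tostring(root, encoding="unicode")
```

Plain RDF statements like these do not require classes to be declared 
beforehand, so the extracted text can be described first and typed 
against ontology classes later.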

Thanks for all the help,
Dan
