XML-DEV Mailing List Archive

Re: Re: Structured from/within unstructured documents
Sounds good. Thanks Marcus.

On 17/12/2007, Marcus Carr <mcarr@a...> wrote:
>
> Stephen Green wrote:
>
> > What methods are there, these days, for extracting structured data from
> > unstructured documents (such as PDF)?
>
> Maybe I'm missing something, but I didn't see anyone suggest saving the
> PDF as XML straight from Acrobat. If you have a full licence, it does a
> pretty respectable job, getting you paragraph and character tagging,
> tables and images. You can also batch process, converting entire
> directories or what have you. The results are at least as good as saving
> the PDF to something like Word first and you could be forgiven for
> expecting that they might even be better.
>
> Once you're that far, you can get on your XSLT boots...
>
>
> Marcus
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@l...
> subscribe: xml-dev-subscribe@l...
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
>

--
Stephen Green

Partner
SystML, http://www.systml.co.uk
Tel: +44 (0) 117 9541606

http://www.biblegateway.com/passage/?search=matthew+22:37 .. and voice
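
For anyone wondering what the step after the Acrobat export might look like, here is a minimal sketch. Marcus suggests XSLT for this stage; the sketch uses Python's standard xml.etree.ElementTree instead, only to show the shape of the work. The sample document and its element names (TaggedPDF-doc, P, Table, TR, TD) are assumptions made for illustration, not a statement of what Acrobat emits for any particular file, so check them against a real export before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an XML file saved from Acrobat. The element
# names below are assumed for illustration; inspect a real export first.
SAMPLE = """<TaggedPDF-doc>
  <P>Quarterly results</P>
  <Table>
    <TR><TD>Region</TD><TD>Sales</TD></TR>
    <TR><TD>North</TD><TD>120</TD></TR>
  </Table>
  <P>End of report</P>
</TaggedPDF-doc>"""


def extract(xml_text):
    """Return (paragraphs, tables) pulled from the exported XML."""
    root = ET.fromstring(xml_text)
    # itertext() gathers text from nested character-level tags as well.
    paragraphs = ["".join(p.itertext()).strip() for p in root.iter("P")]
    # Each table becomes a list of rows; each row is a list of cell strings.
    tables = [
        [["".join(cell.itertext()).strip() for cell in row.iter("TD")]
         for row in table.iter("TR")]
        for table in root.iter("Table")
    ]
    return paragraphs, tables


paragraphs, tables = extract(SAMPLE)
print(paragraphs)
print(tables)
```

For a batch-converted directory, the same function could be applied to each exported file in turn. An XSLT stylesheet matching the same elements would do the equivalent job if you would rather stay in XML tooling.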