Re: Re: table transformation
To some extent, the conversion industry has already been through one similar transformation, when OCR got good enough that there was no longer any real point in the old double-keyboarding + proofreading workflow, which was replaced by separate OCR passes + human reconciliation of discrepancies. We have a good deal of transformation of letterpress volumes to TEI-XML done in Chennai, and it probably is the case that an AI engine at the current level of sophistication could do maybe 80% of the markup correctly (based on a bit of testing with ChatGPT), to be polished off by humans.

David S.

--
David Sewell
Co-Manager of the Rotunda Imprint, Pro Tem
The University of Virginia Press

From: "Dorothy Hoskins dorothy.hoskins@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Reply-To: "xsl-list@xxxxxxxxxxxxxxxxxxxxxx" <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tuesday, June 20, 2023 at 11:14 PM
To: "xsl-list@xxxxxxxxxxxxxxxxxxxxxx" <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: table transformation

Hi, as a person who has met and worked with people overseas who rekey PDFs and add metadata and XML tags to text extracted from them (or from Word) to create XML, I am concerned about the ultimate effects of ChatGPT on the workers. In Chennai, doing the rekeying and markup was a lower middle class job with more prestige than working in a call center. The women and men I met were proud to have the technical computer skills and the knowledge of the XML DTDs. When the bulk of their work shifts to automated markup, the majority of them will not be needed for this work. What then will happen to this whole class of workers will ripple through the economies of countries that perform rekeying and tagging. There's probably no stopping this shift to automation. Many scientific publishers have "back end" service providers overseas in southeast Asia or eastern Europe, but the publishers constantly struggle to contain costs.
I anticipate social unrest will spread as unemployment rises after publishers shift their workflows to cut production expenses. If anyone knows of a business already shifting to using LLMs for XML markup, it would be interesting to know the social impacts on their previous service providers.

I have already played with using prompts and samples to get ChatGPT to generate some Schematron and XSpec unit tests. As Jon Udell notes in his post, the breakdown of large transformations into smaller tasks seems to be the best way to get good results, and QA is critical. But if newbies to LLMs can produce results, the entire XML transformation space is going to be revolutionized shortly.

By the way, the millions of scientific articles that Jon mentions as sources for extracting text from PDFs are already tagged in JATS XML with full metadata by publishing platforms like Atypon and Silverchair, which then produce web pages from the JATS. Those web pages are generally the paid subscriber content of the publishers. (Many articles are also in the common license if required by funding sources.) So it sounds like a bad practice to engineer a transform for extracted text if an article might already be fully tagged in a semantically rich content model like JATS that includes HTML table tagging.

https://jats.nlm.nih.gov/publishing/tag-library/1.0/n-pau2.html

Regards,
Dorothy

> --------- Forwarded message ----------
From: Dave Pawson <dave.pawson@xxxxxxxxx>
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date: Mon, 19 Jun 2023 07:38:31 +0100
Subject: Table transformation

Interesting use of LLM
https://blog.jonudell.net/2023/06/18/why-llm-assisted-table-transformation-is-a-big-deal/

regards
--
Dave Pawson