Re: The entity was referenced, but not declared.
Hello XSL-List,

If the source files in question contain HTML named entities, there is a fair chance they also have a DOCTYPE declaration. Depending on what that declaration is, it might be possible to use it to supply the parser with a set of entity declarations, a kind of "DTD stub", so that the parse succeeds.

But I like the idea of normalizing with an HTML parse first. Besides David Carlisle's tag soup parser written in XSLT, which Martin mentioned, there are quite a few libraries that provide tag soup parsers. Resolving the entities can be treated as a discrete preparatory step, separate from the main transformation.

Cheers, Wendell

On Tue, Jun 13, 2023 at 2:36 AM Martin Honnen martin.honnen@xxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On 6/13/2023 12:48 AM, Manuel Souto Pico terminolator@xxxxxxxxx wrote:
> >
> > I'm trying to convert a collection of XLIFF files into TMX. The files
> > contain some HTML named entities, which makes my stylesheet choke:
> >
> > My question is: is there any way I can avoid or fix this problem from
> > the XSLT stylesheet without having to modify the input XLIFF files?
> >
> > The example above is with ndash, but I believe there must be many HTML
> > named entities in the files.
> >
>
> David Carlisle wrote an HTML tag soup parser in XSLT 2
> (https://github.com/davidcarlisle/web-xslt/blob/main/htmlparse/htmlparse.xsl)
> that knows all the named entities and can also be used as an XML parser
> that knows those entities. So if you use/import his stylesheet and use its
> function instead of normal XML parsing, as in
>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>   version="3.0"
>   xmlns:xs="http://www.w3.org/2001/XMLSchema"
>   xmlns:d="data:,dpc"
>   exclude-result-prefixes="#all"
>   expand-text="yes">
>
>   <xsl:import href="https://raw.githubusercontent.com/davidcarlisle/web-xslt/main/htmlparse/htmlparse.xsl"/>
>
>   <xsl:param name="xml-uri" as="xs:string" select="'sample1.xml'"/>
>
>   <xsl:mode on-no-match="shallow-copy"/>
>
>   <xsl:template name="xsl:initial-template">
>     <xsl:apply-templates select="unparsed-text($xml-uri) => d:htmlparse('', false())"/>
>   </xsl:template>
>
> </xsl:stylesheet>
>
> the named entity references should be parsed into the corresponding
> characters (and you can process all nodes by adding any templates you
> need/have/want to transform the XML). The above assumes running e.g.
> Saxon 9.8 or later, with `-it` to start at the initial template.
>
>
--
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
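As a rough sketch of the "DTD stub" idea in the reply above: such a stub could be nothing more than a file of entity declarations. The file name html-entities-stub.dtd and the handful of entities below are hypothetical; a real stub would have to declare every named entity the XLIFF files actually use (the XHTML 1.0 entity sets xhtml-lat1.ent, xhtml-special.ent and xhtml-symbol.ent list the full HTML 4 set). If the files' DOCTYPE names a system identifier, an XML catalog or entity resolver could, under that assumption, redirect the identifier to this stub so the input files stay untouched, provided the parser is set up to read the external DTD subset (many parsers do by default, but non-validating parsers are not required to).

```xml
<!-- html-entities-stub.dtd (hypothetical name): entity declarations only,
     no element or attribute declarations -->
<!ENTITY nbsp   "&#160;">    <!-- U+00A0 no-break space -->
<!ENTITY eacute "&#233;">    <!-- U+00E9 e with acute accent -->
<!ENTITY ndash  "&#x2013;">  <!-- en dash -->
<!ENTITY mdash  "&#x2014;">  <!-- em dash -->
<!ENTITY lsquo  "&#x2018;">  <!-- left single quotation mark -->
<!ENTITY rsquo  "&#x2019;">  <!-- right single quotation mark -->
<!ENTITY ldquo  "&#x201C;">  <!-- left double quotation mark -->
<!ENTITY rdquo  "&#x201D;">  <!-- right double quotation mark -->
<!ENTITY hellip "&#x2026;">  <!-- horizontal ellipsis -->
```

With the stub in place, a reference such as &ndash; in the XLIFF would be expanded to the en dash character during parsing, so the stylesheet never sees an undeclared entity.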