Re: Form feed character () in decoded xs:base64B

Subject: Re: Form feed character () in decoded xs:base64Binary
From: "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 13 Jul 2020 19:16:50 -0000
On 13.07.2020 20:55, Martynas Jusevičius martynas@xxxxxxxxxxxxx wrote:
Ah, sorry :) I get it now.

After I make the change:

<?xml version="1.1" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
     <!ENTITY rdf    "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
     <!ENTITY rdfs   "http://www.w3.org/2000/01/rdf-schema#">
     <!ENTITY xsd    "http://www.w3.org/2001/XMLSchema#">
     <!ENTITY dct    "http://purl.org/dc/terms/">
     <!ENTITY skos   "http://www.w3.org/2004/02/skos/core#">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
...

I start getting this error:

Error on line 248 column 46 of messages2trix.xsl:
   SXXP0003  Error reported by XML parser: The entity "xsd" was
referenced, but not declared.
org.xml.sax.SAXParseException; systemId: file:/.../messages2trix.xsl;
lineNumber: 248; columnNumber: 46; The entity "xsd" was referenced,
but not declared.

Line 248 contains "&xsd;dateTime".

I don't know of any limitations of XML 1.1 parsers with respect to plain character entities declared in the internal subset. There shouldn't be any, according to my reading of the spec.


Can you extract only the single template that is responsible for base64 decoding and form feed (FF) stripping into a separate XSLT document (XML 1.1) that need not refer to any entities? Then xsl:include this document from the XML 1.0 XSLT document that contains what used to be line 248.

Gerrit
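
A minimal sketch of the split suggested above, not taken from the thread. The file name decode-strip.xsl, the local: namespace and the function name local:decode are all hypothetical, and a function stands in for the "single template". The module declares XML 1.1 and has no DOCTYPE, so it can contain the character reference &#xC; without needing any entities. Whether the processor then accepts the form feed inside the XPath string literal may depend on its configured XML version, so treat this as untested.

```xml
<?xml version="1.1" encoding="UTF-8"?>
<!-- decode-strip.xsl (hypothetical name): XML 1.1 module, no DTD, no entities -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:bin="http://expath.org/ns/binary"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="xs bin local">

  <!-- local:decode is an illustrative name, not from the original stylesheet -->
  <xsl:function name="local:decode" as="xs:string">
    <xsl:param name="encoded" as="xs:string"/>
    <!-- Decode the Base64 payload as UTF-8 text, then delete form feeds.
         The reference &#xC; is only well-formed because this file is XML 1.1. -->
    <xsl:sequence select="translate(
        bin:decode-string(xs:base64Binary($encoded), 'UTF-8'),
        '&#xC;', '')"/>
  </xsl:function>

</xsl:stylesheet>
```

The XML 1.0 stylesheet with the DOCTYPE and the &xsd; references would then pull the module in with a top-level <xsl:include href="decode-strip.xsl"/>, declare the same local: namespace, and call local:decode(...) where it previously called bin:decode-string() directly.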


On Mon, Jul 13, 2020 at 8:44 PM Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

I was suggesting that you prepend <?xml version="1.1"?> to your stylesheet document, hoping that you are then able to apply translate(., '&#xc;', '') to the decoded string.

On 13.07.2020 20:26, Martynas Jusevičius martynas@xxxxxxxxxxxxx wrote:
With xsl:output version="1.1", the form feed is not a problem - Saxon
writes the decoded xs:base64Binary string without any replacements.

However I'm getting weird parsing errors downstream in my RDF toolkit
(which works fine with XML 1.0). I'll try to see what the problem is.
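
For reference, the serialization setting described above can be sketched as follows. This is an illustrative fragment, not the original messages2trix.xsl. With version="1.1" the serializer writes an XML 1.1 document by default, including an XML declaration that says so. A consumer that only accepts XML 1.0 may reject that, which could be one explanation for downstream parsing errors (an assumption, not something confirmed in the thread).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the serializer for XML 1.1 output, where control characters
       such as U+000C may appear as character references -->
  <xsl:output method="xml" version="1.1" encoding="UTF-8"/>

  <!-- Identity behaviour for everything not matched elsewhere (illustrative) -->
  <xsl:mode on-no-match="shallow-copy"/>

</xsl:stylesheet>
```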

On Mon, Jul 13, 2020 at 8:00 PM Imsieke, Gerrit, le-tex
gerrit.imsieke@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
wrote:

What happens if you use version="1.1" in the XML declaration (of the stylesheet)?

On 13.07.2020 19:54, Martynas Jusevičius martynas@xxxxxxxxxxxxx wrote:
Hi,

I'm transforming large JSON files with some email data using XSLT 3.0.
They contain xs:base64Binary literals which I'm decoding using
bin:decode-string() and want to include the decoded values in the
output XML.

The problem is that some of the decoded string values have illegal XML
1.0 characters in them, such as Form feed (&#xc;).

I want to remove them but cannot find a way.
I can't use translate(., '&#xc;', '') because the stylesheet would not
be well-formed anymore.
I can't even use replace(., codepoints-to-string(12), '') because I
get this error (with Saxon 10.1 EE):

       codepoints-to-string(): invalid XML character [xc]. Found while
atomizing the second argument of fn:replace()

Are there any native XSLT options here?

Thanks.
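
One option that stays within an XML 1.0 stylesheet, offered here as a hedged sketch and not as an answer given in the thread: instead of naming the form feed, delete everything that is not an XML 1.0 character. The regular expression mentions only characters that are legal in XML 1.0, so the stylesheet remains well-formed. The function name local:xml10-safe and its namespace are hypothetical, and this has not been verified against Saxon 10.1 EE.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="xs local">

  <!-- local:xml10-safe is an illustrative name, not part of any library -->
  <xsl:function name="local:xml10-safe" as="xs:string">
    <xsl:param name="text" as="xs:string?"/>
    <!-- Negated character class listing the XML 1.0 Char production:
         tab, LF, CR, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF.
         Anything outside it (including U+000C) is replaced by nothing. -->
    <xsl:sequence select="replace($text,
        '[^\t\n\r&#x20;-&#xD7FF;&#xE000;-&#xFFFD;&#x10000;-&#x10FFFF;]',
        '')"/>
  </xsl:function>

</xsl:stylesheet>
```

It would be applied to the decoded value, for example local:xml10-safe(bin:decode-string($binary)), where $binary stands for the xs:base64Binary value being decoded. Unlike translate(), this removes every character that XML 1.0 forbids, not just the form feed.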
