Mario Madunic wrote:
> the character is x18 and it's a control character

Unfortunately, that says it all. Most control characters (everything below x20 except x09, x0A and x0D) are not allowed in XML 1.0, whatever the encoding, UTF-8 included.

> (making XML not well-formed) the error message I receive is

This is indeed illegal. The other day I accidentally used another control character, which is also illegal (I had mistaken it for a tab character, x09, which *is* legal).

> I've tried using ANT to clean it out but with no luck using native2ascii or

Won't help either. Escaping these characters will not help. But you are on the right track: use a filter to remove this character, or replace it with something useful. I use such a filter to pull the few useful lines out of Microsoft *.msg files, where the rest is control characters and other illegal data.

Here's what it might look like if you resort to Ruby (you can call it from Ant if you like), see www.ruby-lang.org. (Spoiler warning: this is off-topic and only marginally related to XSLT.)
    # make sure the output directory exists
    Dir.mkdir("trimmed") unless Dir.exist?("trimmed")

    Dir.entries(".").each do |fn|
      next unless fn =~ /\.yourextension\z/
      # read the complete file contents in binary mode
      data = File.binread(fn)
      # write a copy with every x18 byte removed and nothing else changed
      File.binwrite("trimmed/#{fn}.txt", data.delete("\x18"))
    end

Just replace "yourextension" with the extension of your files and "trimmed" with an output directory name of your choice. Replace ".txt" with whatever extension you would like. The script runs through the current directory and copies all matching files to the "trimmed" directory, with one change: the x18 character is removed.

Of course, you can use Perl, a DOS batch file (takes some practice), Bash, VBScript, PHP, grep, awk or any other tool you prefer.

HTH,

Cheers,
Abel Braaksma
http://abelleba.metacarpus.com

> Can this be done or do I need to ask the client to remove it from their data, which might not be an option?
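
If the client data may contain other control characters besides x18, the same filter idea can be widened. What follows is only a sketch, assuming you want to drop every character below x20 that XML 1.0 forbids while keeping tab (x09), line feed (x0A) and carriage return (x0D). The helper name strip_xml_illegal is made up for this example, and the sketch does not look at anything above x1F.

```ruby
# Hypothetical helper (not part of the original script): removes the
# control characters x00-x08, x0B, x0C and x0E-x1F from a string.
# Tab (x09), line feed (x0A) and carriage return (x0D) are kept,
# because XML 1.0 allows them.
def strip_xml_illegal(data)
  data.delete("\x00-\x08\x0B\x0C\x0E-\x1F")
end

# Possible use inside a directory loop (fn is a file name):
#   File.binwrite("trimmed/#{fn}.txt", strip_xml_illegal(File.binread(fn)))
```

Whether deleting is the right choice depends on the data; if a control character stands in for something meaningful, replacing it (for example with a space) may be the better option.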



