Smalltalk › Pharo › Pharo Smalltalk Developers

Issue 3541 in pharo: TextConverter should take into account iso


pharo

Issue 3541 in pharo: TextConverter should take into account iso

Status: Accepted
Owner: [hidden email]
Labels: Milestone-1.3

New issue 3541 by [hidden email]: TextConverter should take into
account iso
http://code.google.com/p/pharo/issues/detail?id=3541

After loading ConfigurationOfXML, try to parse an XML file that declares encoding="ISO-8859-1":

|fs|
fs := FileStream fileNamed: 'test.xml'.
XMLDOMParser parseDocumentFrom: fs.

=> gives an error: 'Invalid utf8 input detected'
=> it works if you remove the CDATA section

Looks like UTF8TextConverter is used regardless of
the encoding declared in the XML...

--------------

Torsten, what you are trying to do is correct and should work as you
expected. The reason it didn't has less to do with XMLSupport per se
and more to do with its reliance on Pharo's TextConverter system. The
problem is faulty matching of the "encoding" attribute value to the
appropriate subclass of TextConverter. The code responsible for this in
XMLSupport is:
converterClass :=
    (Smalltalk
        at: #TextConverter
        ifAbsent: [^ self])
            defaultConverterClassForEncoding: anEncodingName asLowercase.

But as you can see, the matching is actually done by TextConverter and its
class-side #defaultConverterClassForEncoding: method, which sends
#encodingNames to every subclass and tests whether the returned array
includes the requested encoding name. If you browse Latin1TextConverter,
the right class for the encoding you specified, and look at its class-side
#encodingNames method, you will see that the array it returns does
not include "ISO-8859-1":
^ #('latin-1' 'latin1') copy.

Change it to this (note the lowercase):
^ #('latin-1' 'latin1' 'iso-8859-1') copy.

and everything now works.
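The lookup described above can be modeled in a few lines to show why the unpatched array fails. This is a minimal sketch, not the actual TextConverter code: the converter classes are replaced by symbols in two hypothetical Dictionaries (`before` and `after` the fix), each mapped to the array its #encodingNames would answer (the UTF8TextConverter names are assumed for illustration). The `lookup` block lowercases the requested name, as XMLSupport does, and answers the first entry whose array includes it, or nil. It is written to evaluate in a Pharo workspace.

```smalltalk
| before after lookup |
"Hypothetical stand-ins for TextConverter subclasses and the arrays their #encodingNames answer."
before := Dictionary new.
before at: #Latin1TextConverter put: #('latin-1' 'latin1').
before at: #UTF8TextConverter put: #('utf-8' 'utf8').
"The same hypothetical registry after the fix: 'iso-8859-1' added, in lowercase."
after := Dictionary new.
after at: #Latin1TextConverter put: #('latin-1' 'latin1' 'iso-8859-1').
after at: #UTF8TextConverter put: #('utf-8' 'utf8').
"Lowercase the requested name, then answer the first converter whose names include it, or nil."
lookup := [:registry :encodingName | | key |
	key := encodingName asLowercase.
	registry keys
		detect: [:each | (registry at: each) includes: key]
		ifNone: [nil]].
Transcript showCr: (lookup value: before value: 'ISO-8859-1') printString.  "nil"
Transcript showCr: (lookup value: after value: 'ISO-8859-1') printString.   "#Latin1TextConverter"
```

Because the key is lowercased before the membership test, the entry added to #encodingNames has to be lowercase too; 'ISO-8859-1' in the array would never match.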

So this is really a bug in TextConverter and its Latin1TextConverter
subclass, not in XMLSupport. Also, the #allSubclassesDo: scan in
#defaultConverterClassForEncoding: should probably be backed by a
Dictionary cache to speed up lookups for known encoding-converter pairs.
Can someone forward this message to whoever maintains TextConverter?
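A Dictionary cache in front of the subclass scan could look roughly like this. It is a hedged sketch on a hypothetical registry rather than a patch to TextConverter: `slowLookup` plays the role of the #allSubclassesDo: scan, `cache` maps lowercased encoding names to the converter found for them, and `scans` counts how often the slow path actually runs. All of these names are illustrative, not existing Pharo API.

```smalltalk
| registry scans slowLookup cache cachedLookup |
"Hypothetical registry standing in for the TextConverter subclasses (post-fix names)."
registry := Dictionary new.
registry at: #Latin1TextConverter put: #('latin-1' 'latin1' 'iso-8859-1').
registry at: #UTF8TextConverter put: #('utf-8' 'utf8').
scans := 0.
"Uncached path: walks every entry, like the #allSubclassesDo: scan, and counts each walk."
slowLookup := [:key |
	scans := scans + 1.
	registry keys
		detect: [:each | (registry at: each) includes: key]
		ifNone: [nil]].
"Cache from lowercased encoding name to the converter found for it."
cache := Dictionary new.
cachedLookup := [:encodingName | | key |
	key := encodingName asLowercase.
	cache at: key ifAbsent: [| found |
		found := slowLookup value: key.
		"Cache only successful lookups, so a converter loaded later can still be found."
		found notNil ifTrue: [cache at: key put: found].
		found]].
cachedLookup value: 'ISO-8859-1'.
cachedLookup value: 'iso-8859-1'.
Transcript showCr: scans printString.  "1: the second call is answered from the cache"
```

Caching only hits keeps unknown encodings from being pinned to nil. A real implementation would also need to flush the cache whenever a subclass's #encodingNames changes, for example when a fix like the one in this thread is loaded.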

pharo

Re: Issue 3541 in pharo: TextConverter should take into account iso

Updates:
Status: Fixed
Cc: stephane.ducasse
Labels: -Milestone-1.3 Milestone-1.2

Comment #1 on issue 3541 by [hidden email]: TextConverter should
take into account iso
http://code.google.com/p/pharo/issues/detail?id=3541

Fixed in SLICE-Issue-3541-TextConverter-should-take-into-account-iso-tbn.1

Retagged for 1.2 because:
- this is safe to add
- XML processing does not work with ISO-8859-1 documents without this
fix in the core

pharo

Re: Issue 3541 in pharo: TextConverter should take into account iso

Comment #2 on issue 3541 by [hidden email]: TextConverter should
take into account iso
http://code.google.com/p/pharo/issues/detail?id=3541

The slice is empty. As soon as you send the code we will integrate it.

pharo

Re: Issue 3541 in pharo: TextConverter should take into account iso

Comment #3 on issue 3541 by [hidden email]: TextConverter should
take into account iso
http://code.google.com/p/pharo/issues/detail?id=3541

Here it is, as a simple changeset.

It just changes Latin1TextConverter class>>encodingNames as suggested.

Attachments:
Latin1TextConverter class-encodingNames.st 244 bytes

pharo

Re: Issue 3541 in pharo: TextConverter should take into account iso

Updates:
Status: Closed

Comment #4 on issue 3541 by [hidden email]: TextConverter should
take into account iso
http://code.google.com/p/pharo/issues/detail?id=3541

12308