Issue 3541 in pharo: TextConverter should take into account iso


pharo
Status: Accepted
Owner: [hidden email]
Labels: Milestone-1.3

New issue 3541 by [hidden email]: TextConverter should take into account iso
http://code.google.com/p/pharo/issues/detail?id=3541

After loading ConfigurationOfXML, try to parse an XML file (test.xml here):

|fs|
fs := FileStream fileNamed: 'test.xml'.
XMLDOMParser parseDocumentFrom: fs.


=> gives an error: 'Invalid utf8 input detected'
=> it works if you remove the CDATA section

It looks like UTF8TextConverter is used regardless
of the encoding declared in the XML...

--------------


Torsten, what you are trying to do is correct and should work as you
expected. The reason it didn't has less to do with XMLSupport per se
and more to do with its reliance on Pharo's TextConverter system. The
problem is faulty matching of the "encoding" attribute value to the
appropriate subclass of TextConverter. The code responsible for this in
XMLSupport is:
        converterClass :=
                (Smalltalk at: #TextConverter ifAbsent: [^ self])
                        defaultConverterClassForEncoding: anEncodingName asLowercase.

But as you can see, the matching is actually done by TextConverter and its
class-side #defaultConverterClassForEncoding: method, which works by
sending #encodingNames to all subclasses and testing whether the array
returned includes the specified encoding name. If you browse
Latin1TextConverter, the right class for the encoding you specified, and
look at its class-side #encodingNames method, you will see the array it
returns does not include "ISO-8859-1":
        ^ #('latin-1' 'latin1') copy.

Change it to this (note the lowercase):
        ^ #('latin-1' 'latin1' 'iso-8859-1') copy.

and everything now works.
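
A quick way to check the change is the workspace sketch below. It assumes the
modified #encodingNames shown above is already installed in the image, and it
uses only the class-side lookup method named earlier in this message:

```smalltalk
"Assumes 'iso-8859-1' has been added to Latin1TextConverter class>>encodingNames."
| converterClass |
converterClass := TextConverter
	defaultConverterClassForEncoding: 'ISO-8859-1' asLowercase.
"With the change applied, the lookup should answer Latin1TextConverter."
converterClass inspect.
```

The name is lowercased before the lookup, as in the XMLSupport code, because
the entries in #encodingNames are compared as plain lowercase strings.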

So this is really a bug in TextConverter and its Latin1TextConverter
subclass, not XMLSupport. Also, the #allSubclassesDo: test in
#defaultConverterClassForEncoding: should probably be augmented with a
Dictionary cache to speed up lookups for known encoding-converter pairs.
Can someone forward this message to whoever maintains TextConverter?
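
As a rough illustration of the cache idea, here is a minimal sketch. It is not
the actual Pharo code. The selector #cachedConverterClassForEncoding: and the
class-side instance variable converterCache are hypothetical names made up
for this example, and the first line is browser-style notation for a
class-side method, not part of the method source:

```smalltalk
TextConverter class >> cachedConverterClassForEncoding: encodingName
	"Hypothetical sketch: answer the converter class for encodingName and
	 remember successful lookups. Assumes a class-side instance variable
	 named converterCache has been declared on TextConverter."
	| key |
	key := encodingName asLowercase.
	converterCache ifNil: [converterCache := Dictionary new].
	^ converterCache
		at: key
		ifAbsent: [
			"Fall back to the existing subclass scan; store the result only
			 when a converter class was actually found."
			(self defaultConverterClassForEncoding: key)
				ifNotNil: [:found | converterCache at: key put: found]]
```

A cache like this would have to be flushed whenever a converter class is added
or its #encodingNames changes, otherwise it could keep answering stale results.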











Re: Issue 3541 in pharo: TextConverter should take into account iso

pharo
Updates:
        Status: Fixed
        Cc: stephane.ducasse
        Labels: -Milestone-1.3 Milestone-1.2

Comment #1 on issue 3541 by [hidden email]: TextConverter should take into account iso
http://code.google.com/p/pharo/issues/detail?id=3541

Fixed in SLICE-Issue-3541-TextConverter-should-take-into-account-iso-tbn.1

Retagged for 1.2 since
  - this is safe to add
  - and XML processing would not work in iso-8859-1 scenarios without this fix in core



Re: Issue 3541 in pharo: TextConverter should take into account iso

pharo

Comment #2 on issue 3541 by [hidden email]: TextConverter should take into account iso
http://code.google.com/p/pharo/issues/detail?id=3541

The slice is empty. As soon as you send the code we will integrate it.



Re: Issue 3541 in pharo: TextConverter should take into account iso

pharo

Comment #3 on issue 3541 by [hidden email]: TextConverter should take into account iso
http://code.google.com/p/pharo/issues/detail?id=3541

Here it is, as a simple changeset.

It just changes Latin1TextConverter class>>encodingNames as suggested.

Attachments:
        Latin1TextConverter class-encodingNames.st  244 bytes



Re: Issue 3541 in pharo: TextConverter should take into account iso

pharo
Updates:
        Status: Closed

Comment #4 on issue 3541 by [hidden email]: TextConverter should take into account iso
http://code.google.com/p/pharo/issues/detail?id=3541

12308