Status: Accepted
Owner: [hidden email]
Labels: Milestone-1.3
New issue 3541 by [hidden email]: TextConverter should take into account iso
http://code.google.com/p/pharo/issues/detail?id=3541

After loading ConfigurationOfXML try to parse it:
    | fs |
    fs := FileStream fileNamed: 'test.xml'.
    XMLDOMParser parseDocumentFrom: fs.
=> gives an error: 'Invalid utf8 input detected'
=> it works if you remove the CDATA section
Looks like UTF8TextConverter is used regardless of
the encoding declared in the XML...
--------------
Torsten, what you are trying to do is correct and should work as you
expected. The reason it didn't has less to do with XMLSupport per se and
more to do with its reliance on Pharo's TextConverter system. The problem
is the faulty matching of the "encoding" attribute value to the
appropriate subclass of TextConverter. The code responsible for this in
XMLSupport is:
    converterClass :=
        (Smalltalk
            at: #TextConverter
            ifAbsent: [^ self])
                defaultConverterClassForEncoding: anEncodingName asLowercase.
But as you can see, the matching is actually done by TextConverter and its
class-side #defaultConverterClassForEncoding: method, which works by
sending #encodingNames to all subclasses and testing the array returned to
see if it includes the specified encoding name. If you browse
Latin1TextConverter, the right class for the encoding you specified, and
look at its #encodingNames method, you will see the array it returns does
not include "ISO-8859-1":
    ^ #('latin-1' 'latin1') copy.
Change it to this (note the lowercase):
    ^ #('latin-1' 'latin1' 'iso-8859-1') copy.
and everything now works.
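For anyone who wants to see the lookup in isolation, here is a minimal workspace sketch of the matching described above. It is my reading of the behaviour, not a copy of the actual body of #defaultConverterClassForEncoding:. It assumes every TextConverter subclass answers a collection of lowercase names to #encodingNames; the temporaries wanted and match are only illustrative names.

```smalltalk
"Workspace sketch: find the converter class that claims a given encoding name.
 Assumption: each TextConverter subclass answers a collection of lowercase
 names from its class-side #encodingNames."
| wanted match |
wanted := 'ISO-8859-1' asLowercase.    "lowercased, as XMLSupport does before the lookup"
match := TextConverter allSubclasses
	detect: [:each | each encodingNames includes: wanted]
	ifNone: [nil].
match    "inspect or print this value"
```

With the extra 'iso-8859-1' entry in place, I would expect this to answer Latin1TextConverter; without it, no subclass should claim the name and the result should be nil.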
So this is really a bug in TextConverter and its Latin1TextConverter
subclass, not XMLSupport. Also, the #allSubclassesDo: scan in
#defaultConverterClassForEncoding: should probably be augmented with a
Dictionary cache to speed up lookups for known encoding-converter pairs.
Can someone forward this message to whoever maintains TextConverter?
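To make the Dictionary cache suggestion concrete, here is a rough workspace sketch of the idea. The names cache and lookup are hypothetical; a real fix would more likely keep the Dictionary in a class-side variable of TextConverter and flush it whenever a converter class is added or its #encodingNames changes. Treat it as an illustration, not a proposed patch.

```smalltalk
"Workspace sketch of a cache in front of the existing lookup.
 'cache' and 'lookup' are hypothetical names used only for illustration."
| cache lookup |
cache := Dictionary new.
lookup := [:name | | key |
	key := name asLowercase.
	cache
		at: key
		ifAbsent: [ | found |
			"Cache miss: fall back to the existing (slow) subclass scan."
			found := TextConverter defaultConverterClassForEncoding: key.
			"Only remember hits, so a converter loaded later can still be found."
			found ifNotNil: [cache at: key put: found].
			found]].
lookup value: 'ISO-8859-1'.    "first call scans the subclasses"
lookup value: 'iso-8859-1'.    "later calls for a known name are answered from the cache"
```

Misses are deliberately not cached here, which keeps the sketch simple at the cost of rescanning for unknown names each time.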