Double UTF-8 decoding of SOAP responses


Hans-Martin Mosner-3

Hi folks,
I'm receiving data from a web service via SOAP.
Due to a change in AbtXmlStreamConverter class>>#encodingFromStream:, a double UTF-8 decoding happens, which destroys umlaut characters in the result.
I've tracked it down to a change dated 2010-10-28. I don't know which VAST version it first appeared in, as we skipped a number of versions before switching to 8.6 now.
Does anybody know why this change was introduced and why the regression caused by this change has not been noticed?
For now, we've reverted to the old version of the method, as it is unclear to us which problem the change is supposed to solve.
In general, I'd claim that as soon as byte data is converted to a string, the encoding should be determined once, correctly and for good, and the resulting string should never go through another decoding step (except for strictly string-level operations such as decoding HTML entities or other escaped characters). But fixing all code related to character encoding is definitely not within our scope...
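
To make the failure mode concrete, here is a minimal Python sketch of what a second UTF-8 pass does to an umlaut. It is not the VAST code: the function names are made up for illustration, and the second pass is simulated on the assumption that the already-decoded characters get reinterpreted as single bytes before being decoded again.

```python
def decode_once(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes to text exactly once, at the byte/string boundary."""
    return raw.decode(encoding)


def decode_twice(raw: bytes) -> str:
    """Simulate the bug: decode, then decode the decoded text a second time."""
    text = raw.decode("utf-8")
    # Assumption: the second pass sees each character as one byte.
    # latin-1 maps code points 0..255 to the same byte values.
    as_bytes = text.encode("latin-1")
    # 0xFC (the code point of 'ü') is not a valid UTF-8 start byte,
    # so the character is replaced by U+FFFD instead of being preserved.
    return as_bytes.decode("utf-8", errors="replace")


wire_bytes = "Müller".encode("utf-8")  # b'M\xc3\xbcller' as sent on the wire

print(decode_once(wire_bytes))
print(decode_twice(wire_bytes))
```

Pure ASCII text survives the second pass unchanged, which may be part of why a regression like this can go unnoticed for a while.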

Cheers,
Hans-Martin


dfe

Re: Double UTF-8 decoding of SOAP responses

Hi Hans-Martin,

The change was made in version 8.0.3 to prevent a nil value from being returned for the encoding when it was not specified in the XML document, as this caused issues for some customers.

Does your XML or SOAP document have a prolog in which the encoding is specified?

For example: <?xml version="1.0" encoding="UTF-8" ?>

If it does not, the side effect you are likely seeing with the umlaut characters has been reported once before. We have case 49129 open to resolve this issue.

The correct workaround is to use the older version of the method.
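
For illustration only, the prolog check and the "never return nil" fallback could be sketched in Python as below. This is not how AbtXmlStreamConverter is implemented; `declared_encoding` and `decode_document` are hypothetical names, and the sketch assumes an ASCII-compatible encoding with no byte order mark. The UTF-8 default follows the XML 1.0 rule for documents that carry neither a BOM nor an encoding declaration.

```python
import re
from typing import Optional

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the document, e.g. <?xml version="1.0" encoding="UTF-8" ?>
_PROLOG_ENCODING = re.compile(
    rb'^\s*<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(raw: bytes) -> Optional[str]:
    """Return the encoding named in the prolog, or None if there is none."""
    match = _PROLOG_ENCODING.match(raw)
    return match.group(1).decode("ascii") if match else None


def decode_document(raw: bytes, default: str = "utf-8") -> str:
    """Decode the raw document once, falling back to a default encoding."""
    return raw.decode(declared_encoding(raw) or default)


with_prolog = b'<?xml version="1.0" encoding="UTF-8" ?><a>M\xc3\xbcller</a>'
without_prolog = b"<a>M\xc3\xbcller</a>"

print(declared_encoding(with_prolog))
print(declared_encoding(without_prolog))
print(decode_document(without_prolog))
```

The point of the design is that the fallback is applied where the bytes are decoded, so the caller receives a finished string and has no reason to decode it again.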

On Thursday, December 19, 2013 4:59:58 AM UTC-5, Hans-Martin Mosner wrote:
