Postgres 'ERROR: invalid byte sequence for encoding "UTF8": 0xe97269

Maarten Mostert-2
Hi,
 
I have a problem with a Glorp Query and Postgres.
 
The postgres database has UTF8 encoding.
 
The query is very simple: I just want to verify that I don't already have a project with the same name.
 
newName := 'Nigéria'.
 
self getGlorpSession readOneOf: Projects
        where: [:each | each proj_name = newName ]
 
I get the above-mentioned error when running the query, which is surprising because the database already contains a Project with this name, and inserting a Project where project_name = 'Nigéria' is no problem.
 
Note that this only arises with Postgres; Oracle, Access and SQL Server have no problems with it.
 
Going over the documentation (BasicLibraries 2-20, and Internationalisation page 20), I don't understand what is going on:
 
It states "EncodedStream that wraps a WriteStream on a ByteArray".
 
If I execute the following code to obtain a UTF8 encoded string, the situation gets worse:
 
'ét à moi'  stringEncoding streamEncodingType  ==> #ISO8859_1
 
('ét à moi'  asByteArrayEncoding:#ISO8859_1) ==> #[233 116 32 224 32 109 111 105]
 
('ét à moi'  asByteArrayEncoding:#utf8)  ==>  #[195 169 116 32 195 160 32 109 111 105]
 
('ét à moi'  asByteArrayEncoding:#utf8) asByteString   ==>> 'Ã©t Ã  moi'
 
 
Regards,
 
@+Maarten,

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: Postgres 'ERROR: invalid byte sequence for encoding "UTF8": 0xe97269

Georg Heeg

Maarten,

Inside Smalltalk, a string is a sequence of characters. Viewed as a black box, its encoding does not matter. When Smalltalk talks to the outside world (C function calls, files or, as in the case of Postgres, sockets), the communication is based on sequences of bytes. Only there is the encoding (in your case UTF8) of any importance.

There are, of course, conversion methods in Smalltalk that convert strings into ByteArrays and vice versa. On the string side the encoding is of no importance; on the ByteArray side the program must know the encoding, as ByteArrays are just sequences of bytes.

That is why

('ét à moi'  asByteArrayEncoding:#utf8) asByteString   ==>> 'Ã©t Ã  moi'

is crystal clear. You take a string, encode it using UTF8, and then you take the resulting ByteArray and decode it using ISO8859-1.

A similar situation seems to happen in the Postgres interface (just the other way round). Smalltalk sends 'Nigéria' over the socket to the Postgres server using the ISO8859-1 encoder, Postgres tries to decode it as UTF-8, and Postgres complains.
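
The byte values in the error message can be reproduced outside Smalltalk. The following is a minimal sketch in Python (chosen only so the bytes can be checked independently of VisualWorks; it does not use the PostgreSQL driver). It assumes, as described above, that the client encodes 'Nigéria' as ISO8859-1 while the server decodes it as UTF-8. The variable names are illustrative.

```python
# The project name from the thread, as an ordinary Unicode string.
name = "Nigéria"

# Bytes an ISO8859-1 encoder would put on the wire: 'é' is the single byte 0xE9.
latin1_bytes = name.encode("iso-8859-1")
print(latin1_bytes.hex())  # 4e6967e9726961

# Bytes a UTF8 server expects: 'é' is the two-byte sequence 0xC3 0xA9.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes.hex())  # 4e6967c3a9726961

# Decoding the ISO8859-1 bytes as UTF-8 fails at 0xE9. In UTF-8, 0xE9 announces
# a three-byte sequence, but the next two bytes ('r' and 'i') are not valid
# continuation bytes.
rejected = b""
decode_failed = False
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    decode_failed = True
    # The lead byte plus the two bytes that should have continued the sequence.
    rejected = latin1_bytes[err.start:err.start + 3]

print(decode_failed)   # True
print(rejected.hex())  # e97269
```

The three rejected bytes are 'é', 'r' and 'i' in ISO8859-1, which matches the 0xe97269 reported by Postgres in the subject line.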

 

Obviously the interface on the Smalltalk side does not know, or does not care, about the encoding the Postgres server wants.

Can you share which version of Smalltalk you are using?

Georg

Georg Heeg eK, Dortmund und Köthen, HR Dortmund A 12812

Tel. +49-3496-214328, Fax +49-3496-214712

From: [hidden email] [mailto:[hidden email]] On behalf of Maarten MOSTERT
Sent: Sunday, 21 February 2010 12:21
To: VWNC
Subject: [vwnc] Postgres 'ERROR: invalid byte sequence for encoding "UTF8": 0xe97269

Re: Postgres 'ERROR: invalid byte sequence for encoding "UTF8": 0xe97269

Bruce Badger
In reply to this post by Maarten Mostert-2
If you want to see exactly what is flowing between your image and
PostgreSQL, just open the PostgreSQL connection monitor in VisualWorks
(Tools > PostgreSQL > PostgreSQL Connection List). Use F5 to refresh
the list of connections. Double-click on the connection you want to
monitor and choose show to see all exchanges. Use single step to have
the monitor stop after each exchange.

hth

--
Make the most of your skills - with OpenSkills
http://www.openskills.org/


Re: Postgres 'ERROR: invalid byte sequence for encoding "UTF8": 0xe97269

Maarten Mostert-2
In reply to this post by Georg Heeg
Dear Georg,
 
I haven't yet solved my problem, but thanks for the explanation.

> Can you share which version of Smalltalk you are using?

I run VW7.7NC (dec01) with Glorp 7.8 0.
 
Regards,
 
@+Maarten,


Re: Postgres 'ERROR: invalid byte sequence for encoding "UTF8": 0xe97269

Joachim Geidel
In reply to this post by Maarten Mostert-2
Hi Maarten,

AFAICT, the error you got is an exception raised by PostgreSQL, not
VisualWorks. So it seems that the bytes passed to PostgreSQL are not encoded
as UTF-8. Did you set the characterEncoding of your instance of
PostgreSQLPlatform to UTF-8? Did you also set the CLIENT_ENCODING property
of the session to 'UNICODE'? I am just speculating as I am not too familiar
with the PostgreSQL EXDI and PostgreSQL, but the code in
Store.Glorp.StoreLoginFactory class>>setPostgresqlSessionToUnicode:
indicates that setting the characterEncoding of the platform might not be
enough.

> Going over the documentation (BasicLibraries 2-20, and Internationalisation
> page 20) I don't undertand what is going on:
>  
> It states "EncodedStream that wraps a WriteStream on a ByteArray".
>  
> If I execute the following code to obtain a UTF8 encoded string the situation
> gets worse:
>  
> 'ét à moi'  stringEncoding streamEncodingType  ==> #ISO8859_1

This means that VisualWorks uses ISO8859P1 internally.

> ('ét à moi'  asByteArrayEncoding:#ISO8859_1) ==> #[233 116 32 224 32 109 111 105]
>  
> ('ét à moi'  asByteArrayEncoding:#utf8)  ==>  #[195 169 116 32 195 160 32 109 111 105]

This just demonstrates that ISO8859P1 and UTF-8 are different encodings. You
stated that you wanted to encode the string as UTF-8, so the last
expression yields the correct result.

> ('ét à moi'  asByteArrayEncoding:#utf8) asByteString   ==>> 'Ã©t Ã  moi'

This last one encodes the string in UTF-8 and decodes it as ISO8859P1
(default), which of course won't return the original string. You have to use
the same encoding for decoding which you used for encoding:

('ét à moi'  asByteArrayEncoding:#utf8) asStringEncoding: #utf8

Note that asStringEncoding: was introduced in VW 7.7, it doesn't exist in
7.6.
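
The same round trip can be checked outside VisualWorks. The following is a minimal Python sketch (not Smalltalk, and not the VisualWorks API) that encodes the string from the examples above and then decodes it once with the wrong encoding and once with the matching one. The variable names are illustrative.

```python
text = "ét à moi"

# Encoding the same string two ways gives two different byte sequences.
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")
print(list(latin1_bytes))  # [233, 116, 32, 224, 32, 109, 111, 105]
print(list(utf8_bytes))    # [195, 169, 116, 32, 195, 160, 32, 109, 111, 105]

# Decoding UTF-8 bytes as ISO8859-1 never fails, because every byte value is a
# valid ISO8859-1 character, but each two-byte sequence turns into two characters.
garbled = utf8_bytes.decode("iso-8859-1")
print(len(text), len(garbled))  # 8 10

# Decoding with the encoding that was used for encoding restores the string.
restored = utf8_bytes.decode("utf-8")
print(restored == text)  # True
```

The garbled result has ten characters instead of eight: each accented letter has become 'Ã' followed by a second character, which is the effect seen with asByteString above.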

HTH,
Joachim Geidel






Re: Postgres 'ERROR: invalid byte sequence for encoding "UTF8": 0xe97269

Maarten Mostert-2
In reply to this post by Maarten Mostert-2
Thanks for the help.
 
I found a bug in my database upgrade mechanism that made this possible.
Actually I request the encoding from the database and then adjust accordingly.
But when switching and upgrading from one descriptor to another it appeared that I lost these settings.
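
The general idea of asking the database for its encoding and adjusting to it can be sketched as follows. This is a hypothetical Python illustration, not the Glorp or VisualWorks code in question: the table PG_TO_CODEC and the functions codec_for and encode_for_server are invented names, and the encoding name is assumed to be whatever the database reports (for example, the value PostgreSQL returns for SHOW server_encoding). The sketch raises an error for an unknown encoding instead of silently falling back to a default, since a silent fallback is the kind of failure that produces the error discussed in this thread.

```python
# Hypothetical lookup table: PostgreSQL encoding names -> Python codec names.
# Only a few entries are listed; a real table would need more.
PG_TO_CODEC = {
    "UTF8": "utf-8",
    "LATIN1": "iso-8859-1",
    "WIN1252": "cp1252",
}


def codec_for(server_encoding: str) -> str:
    """Return the codec matching the encoding name reported by the database."""
    try:
        return PG_TO_CODEC[server_encoding.upper()]
    except KeyError:
        # Fail loudly rather than guess: a wrong default only shows up later,
        # as an "invalid byte sequence" error on the server.
        raise ValueError(f"no codec configured for {server_encoding!r}") from None


def encode_for_server(text: str, server_encoding: str) -> bytes:
    """Encode text with the codec the server expects."""
    return text.encode(codec_for(server_encoding))


print(encode_for_server("Nigéria", "UTF8").hex())    # 4e6967c3a9726961
print(encode_for_server("Nigéria", "LATIN1").hex())  # 4e6967e9726961
```

If the reported encoding is dropped somewhere along the way (here, during a switch between descriptors), the lookup has nothing to work with, so the setting has to be carried over explicitly.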
 
Regards,
 
Maarten,


Re: Postgres 'ERROR: invalid byte sequence for encoding "UTF8": 0xe97269

Niall Ross
In reply to this post by Maarten Mostert-2
Dear Maarten,
    (I expect you are well aware of this, but just for the benefit of
any vwdev readers who need it)  Glorp 7.7 - 70 is the VW7.7 released
version and is later than Glorp 7.8.0.  There is no reason I know to use
Glorp 7.8.0 with the VW 7.7 release instead of the distro version.

(I assume the 7.8 start-fork version was the latest visible to you at
the time you installed/created your current set up.)

          Yours faithfully
             Niall Ross
