[glorp]Unicode string into SQL DB

[glorp]Unicode string into SQL DB

Milan Čermák
Hi all,
is it possible to store Unicode-encoded multilingual strings in an SQL
database? I am mainly concerned with PostgreSQL and MS SQL.

Thanks,
--
Ing. Milan Čermák
programátor, analytik

[hidden email]

................................................................
e-FRACTAL, s.r.o. => e-business driven company
nám. Míru 15, Praha 2, http://www.e-fractal.cz
tel: 222 512 000, fax: 222 515 000
................................................................



Re: [glorp]Unicode string into SQL DB

Alan Knight-2
Yes, it is. It depends on how the database is set up, and often on the connection. PostgreSQL needs a reasonably recent version (the one the public Store runs on right now, for example, is too old to support it properly). For PostgreSQL I use code like the following, which tries to get the encoding used for the connection and, if it can't, assumes the database is too old and falls back to iso-8859-1. Note that with Glorp, if you change the encoding, you need to tell both the database connection and the Glorp platform object. For other databases things may vary considerably, and you may need to do things like use an nvarchar in the schema rather than a varchar.

setPostgresqlSessionToUnicode: session
        | encoding |
        (session accessor currentLogin database class == PostgreSQLPlatform) ifTrue: [
                "Force the encoding. If the server cannot report which client
                encoding it is using, it is too old to let us set one, so fall
                back to iso-8859-1 and hope the encodings match well enough
                for the characters in use."
                encoding := session accessor executeSQLString: 'SHOW CLIENT_ENCODING'.
                encoding isEmpty
                        ifTrue: [
                                session accessor connection encoding: #'iso-8859-1'.
                                session system platform characterEncoding: #'iso-8859-1']
                        ifFalse: [
                                "Tell the connection, the server, and the Glorp platform."
                                session accessor connection encoding: #'utf-8'.
                                session accessor executeSQLString: 'SET CLIENT_ENCODING TO ''UNICODE'''.
                                session system platform characterEncoding: #'utf-8']]

At 11:06 AM 1/23/2006, Milan Čermák wrote:

>Hi all,
>is it possible to store unicode encoded multilingual string in any SQL database? My concern is in PostgreSQL and MS SQL databases.

--
Alan Knight [|], Cincom Smalltalk Development
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk

"The Static Typing Philosophy: Make it fast. Make it right. Make it run." - Niall Ross


Re: [glorp]Unicode string into SQL DB

Vladimir Pogorelenko
Alan Knight-2 wrote:
> setPostgresqlSessionToUnicode: session
>         session accessor connection encoding: #'utf-8'.
>         session accessor executeSQLString: 'SET CLIENT_ENCODING TO ''UNICODE'''.
>         session system platform characterEncoding: #'utf-8'.

Very helpful for me. Thanks, Alan.

I've downloaded the latest Glorp from the Cincom Public Repository, but I still need the code above to work with PostgreSQL in utf-8.

Having to write such explicit code is unfortunate. It would be great if GLORP did this by default, so that PostgreSQL could be used without any additional coding.

I'm interested in how I can integrate this code with the Glorp-Seaside bundle and WAGlorpSession/GlorpConfiguration.

Do I need to set up the encoding explicitly for each connection?

RE: [glorp]Unicode string into SQL DB

Boris Popov, DeepCove Labs (SNN)
Speaking of Unicode, here's something to keep in mind,

http://leftshore.wordpress.com/2007/11/08/word-of-caution-when-enabling-unicode-odbc/

... or for those whose email client is as bad or worse than my Outlook,

http://tinyurl.com/256cos

Cheers!

-Boris

--
+1.604.689.0322
DeepCove Labs Ltd.
4th floor 595 Howe Street
Vancouver, Canada V6C 2T5
http://tinyurl.com/r7uw4

[hidden email]




Re: [glorp]Unicode string into SQL DB

Alan Knight-2
In reply to this post by Vladimir Pogorelenko
At 10:59 AM 12/12/2007, Vladimir Pogorelenko wrote:

> Alan Knight-2 wrote:
> > setPostgresqlSessionToUnicode: session
> >         session accessor connection encoding: #'utf-8'.
> >         session accessor executeSQLString: 'SET CLIENT_ENCODING TO ''UNICODE'''.
> >         session system platform characterEncoding: #'utf-8'.
>
> Very helpful for me. Thanks, Alan.
>
> I've downloaded the latest Glorp from the Cincom Public Repository, but I
> still need the code above to work with PostgreSQL in utf-8.
>
> Having to write such explicit code is unfortunate. It would be great if
> GLORP did this by default, so that PostgreSQL could be used without any
> additional coding.
>
> I'm interested in how I can integrate this code with the Glorp-Seaside
> bundle and WAGlorpSession/GlorpConfiguration.
>
> Do I need to set up the encoding explicitly for each connection?

That would probably be a good idea. You'd still want to set the characterEncoding on the platform appropriately, but you can do that when you're defining your login. As Boris points out, it's not necessarily the case that you always want to use Unicode. And the way to set the encoding varies by database, and also by dialect. I've made a note to look at that, but I'm not going to be able to get to it immediately.

If you set up your PostgreSQL database to default to utf-8 encoding, then you don't need the middle line (the SET CLIENT_ENCODING statement). That leaves only the first line to run during setup for each connection.
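
A minimal sketch of that reduced setup, assuming the PostgreSQL database already defaults to utf-8. The selector name is hypothetical; the two message sends are copied from the method earlier in this thread, and nothing here has been checked against a current Glorp release:

```smalltalk
setSessionEncodingToUtf8: session
	"Hypothetical selector. Assumes the database default client encoding is
	already utf-8, so no SET CLIENT_ENCODING statement is sent."

	"Needed for each new connection."
	session accessor connection encoding: #'utf-8'.

	"Could instead be done once, when the login is defined."
	session system platform characterEncoding: #'utf-8'
```

If the server default is anything other than utf-8, the SET CLIENT_ENCODING statement from the full method is still needed.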

To integrate this with Seaside-Glorp (note that it was renamed), it does seem that you would need to subclass or modify the connection pool at this point. Hooks for initializing a new connection would be a good idea.
--
Alan Knight [|], Cincom Smalltalk Development

Re: [glorp]Unicode string into SQL DB

tblanchard
In reply to this post by Boris Popov, DeepCove Labs (SNN)
Yep - setting the encoding on the client connector tells it what you
want to consume. If the encoding in the database is different, it
will perform conversions on the fly. This (as you've seen) is
expensive.

What you really want to do is make the database's internal encoding
and the client encoding match. If you need Unicode, then you're
looking at a full-on migration (often doable by taking the db offline,
dumping the whole thing in the new encoding, then rebuilding it, again
with the new encoding).

-Todd Blanchard

On Dec 12, 2007, at 9:09 AM, Boris Popov wrote:

> Speaking of Unicode, here's something to keep in mind,
>
> http://leftshore.wordpress.com/2007/11/08/word-of-caution-when-enabling-unicode-odbc/
>
> ... or for those whose email client is as bad or worse than my Outlook,
>
> http://tinyurl.com/256cos