OracleEXDI has no support for AL32UTF8

Runar Jordahl
We use VisualWorks 7.7.1 and have a customer who uses Oracle 10
with 'NLS_CHARACTERSET' set to 'AL32UTF8'. When connecting with
OracleEXDI, the connection fails with 'Unhandled exception: Column
encoding not yet recognized'.

Looking into the system, I can see that OracleConnection
class>>initializeEncoderMap does not include AL32UTF8.

Here is what Oracle writes about AL32UTF8:

"The 8-bit encoding of Unicode. It is a variable-width encoding. One
Unicode character can be 1 byte, 2 bytes, 3 bytes, or 4 bytes in UTF-8
encoding. Characters from the European scripts are represented in
either 1 or 2 bytes. Characters from most Asian scripts are
represented in 3 bytes. Supplementary characters are represented in 4
bytes. The Oracle character set that supports UTF-8 is AL32UTF8."
(http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/glossary.htm)
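
The byte widths in the glossary entry can be checked with a small sketch. Python is used here only for illustration; it is not part of VisualWorks or OracleEXDI, and the sample characters are arbitrary choices. The sketch encodes one character from each range with standard UTF-8 and counts the bytes.

```python
# Sample characters (written as escapes) picked for illustration only.
samples = {
    "A": "ASCII letter",
    "\u00e5": "European script (Latin small a with ring)",
    "\u65e5": "Asian script (CJK ideograph in the BMP)",
    "\U0001D11E": "supplementary character (musical G clef)",
}


def utf8_width(ch: str) -> int:
    """Return the number of bytes needed to encode one character as UTF-8."""
    return len(ch.encode("utf-8"))


# Map each sample character to its encoded width in bytes.
widths = {ch: utf8_width(ch) for ch in samples}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {widths[ch]} byte(s)")
```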

As far as I can understand, AL32UTF8 is Oracle's proper implementation
of UTF8: http://www.mail-archive.com/perl-unicode@.../msg02239.html

I therefore wonder whether OracleConnection
class>>initializeEncoderMap should have the following entry added:

at: 'AL32UTF8' put: #utf_8


Kind regards
Runar
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: OracleEXDI has no support for AL32UTF8

Henrik Sperre Johansen
On Mar 17, 2011, at 1:28:58 PM, Runar Jordahl wrote:

> I therefore wonder whether OracleConnection
> class>>initializeEncoderMap should have the following entry added:
>
> at: 'AL32UTF8' put: #utf_8

I've run into the same problem; the encoder map is really outdated.
It also contains incorrect mappings, such as using a standard UTF8Encoder for Oracle's "UTF8" character set (although that is correct in *most* cases).
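
To make the "correct in *most* cases" caveat concrete: as far as I know, Oracle documents its older UTF8 character set as CESU-8 compliant, so a supplementary character is stored as a surrogate pair of two 3-byte sequences rather than one 4-byte sequence. That detail is an assumption added here, not something stated in this thread. A minimal Python sketch of where the two encodings agree and differ (Python only for illustration; `cesu8_encode` is a hypothetical helper, not a VisualWorks or Oracle API):

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8.

    BMP characters are encoded exactly as in UTF-8. Supplementary
    characters are first split into a UTF-16 surrogate pair, and each
    surrogate is then encoded as its own 3-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            out += ch.encode("utf-8")
        else:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)    # high (lead) surrogate
            low = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
            for surrogate in (high, low):
                out += chr(surrogate).encode("utf-8", "surrogatepass")
    return bytes(out)


bmp_text = "bl\u00e5b\u00e6r"   # BMP-only text: both encodings agree
clef = "\U0001D11E"             # supplementary character: they differ

same_for_bmp = cesu8_encode(bmp_text) == bmp_text.encode("utf-8")
utf8_len = len(clef.encode("utf-8"))
cesu8_len = len(cesu8_encode(clef))
print(same_for_bmp, utf8_len, cesu8_len)
```

Under that assumption, a standard UTF-8 encoder only goes wrong for characters outside the Basic Multilingual Plane, which would explain why the existing mapping works most of the time.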

IMHO, the whole mess of working out where the client reads NLS_LANG from, just to find the correct encoding to send the data in, could and should be replaced by setting the encoding of the connection directly with OCIEnvNlsCreate().
It has been available since 9.x, and 8.x went out of extended support from Oracle more than 5 years ago...

Not that it solves all problems for us Norwegians: even when you directly set the character set you send data in to something sensible, you still need to convert decimal points to the one used by the country specified in NLS_LANG (which VW currently does not do at all, btw)...
You can read it for an environment through OCINlsGetInfo, passing OCI_NLS_DECIMAL as the item, but I haven't found a way to tell it which one to use.
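
The conversion step itself is small once the decimal character is known. A minimal Python sketch, assuming the separator has already been obtained somehow (for example from OCINlsGetInfo as described above); the helper names are hypothetical, and group (thousands) separators are deliberately not handled:

```python
from decimal import Decimal


def format_for_session(value: Decimal, decimal_sep: str) -> str:
    """Render a number using the session's decimal character."""
    # format(..., "f") always uses "." and never emits group separators.
    return format(value, "f").replace(".", decimal_sep)


def parse_from_session(text: str, decimal_sep: str) -> Decimal:
    """Parse a number that uses the session's decimal character."""
    return Decimal(text.replace(decimal_sep, "."))


# A Norwegian territory setting would typically mean "," as decimal character.
sent = format_for_session(Decimal("1234.5"), ",")
received = parse_from_session("1234,5", ",")
print(sent, received)
```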

TL;DR: Oracle i18n support in VisualWorks is a real mess, and could be simplified and improved significantly.

Cheers,
Henry

PS.
AFAIK, NLS_CHARACTERSET is internal to the server and has no effect from a client's point of view (except being what the client converts to for certain column types).
To find the expected encoding of strings you give to the client (if not set explicitly as described above), you use NLS_LANG.
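
For reference, NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET, for example NORWEGIAN_NORWAY.AL32UTF8. A rough Python sketch of deriving the client-side encoding from it follows. The helper names are hypothetical, and the charset table is a tiny illustrative subset, not the contents of initializeEncoderMap; an unknown name raises an error, much like the 'Column encoding not yet recognized' failure reported at the start of the thread.

```python
from typing import NamedTuple, Optional


class NlsLang(NamedTuple):
    language: Optional[str]
    territory: Optional[str]
    charset: Optional[str]


def parse_nls_lang(value: str) -> NlsLang:
    """Split 'LANGUAGE_TERRITORY.CHARSET'; any part may be missing."""
    locale_part, _, charset = value.partition(".")
    language, _, territory = locale_part.partition("_")
    return NlsLang(language or None, territory or None, charset or None)


# Illustrative subset of Oracle charset names mapped to Python codec names.
ORACLE_TO_CODEC = {
    "AL32UTF8": "utf-8",
    "WE8ISO8859P1": "iso-8859-1",
    "WE8MSWIN1252": "cp1252",
}


def client_codec(nls_lang: str) -> str:
    """Return the codec for the charset part, or raise if it is unknown."""
    charset = parse_nls_lang(nls_lang).charset
    if charset is None or charset.upper() not in ORACLE_TO_CODEC:
        raise LookupError(f"encoding not recognized: {charset!r}")
    return ORACLE_TO_CODEC[charset.upper()]


parsed = parse_nls_lang("NORWEGIAN_NORWAY.AL32UTF8")
codec = client_codec("NORWEGIAN_NORWAY.AL32UTF8")
print(parsed, codec)
```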



Re: OracleEXDI has no support for AL32UTF8

Kooyman, Les
FWIW, the thousands separator and decimal point can be accessed for the CLDR locales from the associated NumberPrintPolicy or currency policy.
 
There is no guarantee that the operating system's and the Unicode Consortium's values agree for any particular operating system, however. (In fact, I'd be interested in hearing of cases where they disagree.)
 
Les



Re: OracleEXDI has no support for AL32UTF8

Henrik Sperre Johansen
On 17.03.2011 17:15, Kooyman, Les wrote:
> FWIW, the thousands separator and decimal point can be accessed for the CLDR locales from the associated NumberPrintPolicy or currency policy.
>
> There is no guarantee that the operating system's and the Unicode Consortium's values agree for any particular operating system, however. (In fact, I'd be interested in hearing of cases where they disagree.)
>
> Les

Haha.
These aren't Unicode or OS locale values we are talking about; these are Oracle's.
Just like with their encoding names, they felt no need whatsoever to follow any kind of standard.
Heck, they even have special Java classes just for mapping between them:
http://www.stanford.edu/dept/itss/docs/oracle/10g/appdev.101/b10971/oracle/i18n/util/LocaleMapper.html

Sorry if I sound harsh; I'm still a bit agitated from trying to remember the details required for writing the last post, and from remembering the pain it brought me when trying to understand how it worked.

Cheers,
Henry
