Smalltalk › Pharo › Pharo Smalltalk Users

NVARCHAR and Glorp

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

5 messages Options

eftomi

NVARCHAR and Glorp

Dear all,

I'm working on ADO database driver for Glorp by using Pablo's PharoCOM. The
structure of the driver is similar to Sven's P3DatabaseDriver and P3Client
(thanks, Sven for very clear architecture :-) ) and the work is progressing
quite nicely. I can connect to SQL Server, PostgreSQL and create tables,
insert and read data directly with ADO Connection and Recordset objects, and
with Glorp, too. In next days I'll do some more testing, including Oracle,
and publish the prototype on GitHub. The goal is not PostgreSQL, of course,
but to finally reach SQL Server, Oracle and possibly MS Access, at least
from Windows systems.

One thing is bothering me though - SQL Server supports Unicode with NCHAR
and NVARCHAR types, and when using INSERT INTO with literal strings these
should be prefixed with N, like in

N'This is a Unicode string'

Does somebody remember whether Glorp can prefix the strings in that way when
constructing SQL statements? I checked all the constants and string
conversions under DatabasePlatform but haven't found anything related to
this kind of prefixing.

Thanks and best wishes,
Tomaz

--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html

Sven Van Caekenberghe-2

Re: NVARCHAR and Glorp

Hi Tomaz,

> On 11 Oct 2019, at 18:15, eftomi <[hidden email]> wrote:
>
> Dear all,
>
> I'm working on ADO database driver for Glorp by using Pablo's PharoCOM. The
> structure of the driver is similar to Sven's P3DatabaseDriver and P3Client
> (thanks, Sven for very clear architecture :-) ) and the work is progressing
> quite nicely. I can connect to SQL Server, PostgreSQL and create tables,
> insert and read data directly with ADO Connection and Recordset objects, and
> with Glorp, too. In next days I'll do some more testing, including Oracle,
> and publish the prototype on GitHub. The goal is not PostgreSQL, of course,
> but to finally reach SQL Server, Oracle and possibly MS Access, at least
> from Windows systems.

Great to hear this, please keep us all posted of your progress.

> One thing is bothering me though - SQL Server supports Unicode with NCHAR
> and NVARCHAR types, and when using INSERT INTO with literal strings these
> should be prefixed with N, like in
>
> N'This is a Unicode string'
>
> Does somebody remember whether Glorp can prefix the strings in that way when
> constructing SQL statements? I checked all the constants and string
> conversions under DatabasePlatform but haven't found anything related to
> this kind of prefixing.

I don't know for sure.

Could you not add the N at the moment the string quotes are written ?

What I do know is this: the interface between P3DatabaseDriver and P3Client is essentially an SQL string to execute. That SQL string was built by the Glorp machinery, so it must know about certain specifics (like string quoting and escaping, etc). Also, in Glorp's Login #database specification, a platform is set, like PostgreSQLPlatform. So I am assuming some platform specific delegation is happening (or could happen).

I don't have a Glorp image ready, but there must be a way to find this out.

That being said, I do foresee a problem for you. You are building a more general driver that can access several different platforms, you will need to make that explicit, else you won't be able to do something specific for SQL Server.

HTH,

Sven

> Thanks and best wishes,
> Tomaz
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>

eftomi

Re: NVARCHAR and Glorp

Thanks, Sven. I found the "ultimate" place where strings are being prepared
for INSERT INTO clause - this happens in String>>#glorpPrintSQLOn:. Not so
easy to parametrize at a first glance. Maybe with
WideString>>#glorpPrintSQLOn: but I'm not sure how other platforms would be
happy. I decided to work on other issues right now so that I can publish a
relatively decent package ASAP.

An interesting observation about UTF-8 and SQL Server 2019 is here:
https://sqlquantumleap.com/2018/09/28/native-utf-8-support-in-sql-server-2019-savior-false-prophet-or-both/.

Best wishes,
Tomaz

--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html

Solomon.Rutzky

Re: NVARCHAR and Glorp

eftomi wrote

I found the "ultimate" place where strings are being prepared for INSERT INTO clause - this happens in String>>#glorpPrintSQLOn:. Not so easy to parametrize at a first glance. Maybe with WideString>>#glorpPrintSQLOn: but I'm not sure how other platforms would be happy. An interesting observation about UTF-8 and SQL Server 2019 is here: https://sqlquantumleap.com/2018/09/28/native-utf-8-support-in-sql-server-2019-savior-false-prophet-or-both/.

Hello Tomaz. I am the author of that UTF-8 in SQL Server 2019 post you linked to, and I think I can provide some context here.

UTF-8 in SQL Server is not something you need to worry about, nor is it really an option at this point. SQL Server 2019 is still in beta, and even when officially released it will be some time before a lot of folks are using it. And for some it will take years considering that some places are still using SQL Server 2000, or 2005.
If you are generating INSERT and UPDATE statements, and those include string literals, then you need to use the encoding that is expected by SQL Server. When string literals are not prefixed with an upper-case "N" they are interpreted in the 8-bit code page associated with the default collation of the database that you are currently using (likely the database that you connected to, if specified in the connection string, or the default database of the login that you connected as). If someone is using SQL Server 2019 (or newer, someday) then that might be UTF-8, but otherwise it is some Extended ASCII character set. If you are sending over Unicode data, then you can't rely on the target database using a "_UTF8" collation (as well as using it for the columns). In which case, unfortunately, you really do need to prefix all string literals with an upper-case "N". However, fortunately, for those systems that are using a "_UTF8" collation for the columns, passing in the literals prefixed with an upper-case "N" is compatible, so you don't need to have your code distinguish between the two.
NVARCHAR data (i.e. string literals prefixed with an upper-case "N", and NVARCHAR / NCHAR / NTEXT columns and variables), and the T-SQL query batch itself, are encoded as UTF-16 Little Endian.
Don't forget to escape embedded single-quotes ( ' ) as two single-quotes ( '' ) :-)

Take care, Solomon...

_____________________________________________________________

SQL# — https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )

Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )

Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )

Info sites — Collations • Module Signing • SQLCLR

_____________________________________________________________

Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.

_____________________________________________________________

SQL# — https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )

Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )

Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )

Info sites — Collations • Module Signing • SQLCLR

_____________________________________________________________

eftomi

Re: NVARCHAR and Glorp

Solomon, thank you for your additional insights. Of course we'll have to wait
for SQL Server 2019 to be adopted - tactful devops wait for SP2 of v4 to
change from v2 to v3 :-)

Indeed I solved the problem in NVARCHAR / NCHAR / NTEXT by adding the method
WideString>>#glorpPrintSQLOn: which is almost the same as
String>>#glorpPrintSQLOn: except that it checks whether the database
platform understands a message #prefixForUnicode. Then I subclassed
SQLServerPlatform as #SQLServerPlatformN with this constant which returns
$N.

To use this, one has to:
- define the field type in DescriptorSystem as 'platform nvarchar width: xx'
(or nchar, ntext)
- set the platform to SQLServerPlatformN (N for 'National Language Character
Set').

For now, I subclassed SQLServerPlatform if there will be any other additions
to SQL Server Glorp platform. In the future the constant should probably
become just an "ordinary" extension to Glorp's SQLServerPlatform.

These changes are now available at https://github.com/eftomi/pharo-ado.

Best wishes,
Tomaz

--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html