FFI Plugin | Auto-conversion of char* return value into String considered harmful

marcel.taeumel
 
Hi all!

The FFI plugin automatically converts "char*" return values into a Smalltalk string when returning from the FFI call.

I would rather leave this conversion to the image side because you have to do it anyway when interpreting external structures. See ExternalData >> #fromCString. And because it can be dangerous. Note that I do like automatic String-to-char* conversion when making an FFI call. Just not the other way around.

See (Threaded)FFIPlugin >> #ffiReturnCStringFrom:.

What are your thoughts on this matter?

Best,
Marcel

Re: FFI Plugin | Auto-conversion of char* return value into String considered harmful

marcel.taeumel
 
Hi all,

I just found out that you can avoid this automatic interpretation as a C string if you use a type alias for char*. This could at least help if an external library forgets the terminating NUL character for a returned char*, which would otherwise make your VM crash. :-)
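
To illustrate the risk: converting a char* without a known length means scanning for a NUL byte, so a missing terminator leads to reads past the end of the buffer. The following C sketch is not the plugin's code. It assumes the caller knows an upper bound for the buffer, and the helper name copy_c_string_bounded is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the FFI plugin: copy at most max_len
   bytes of a possibly unterminated C string into a fresh buffer that is
   always NUL-terminated. Returns NULL if src is NULL or allocation fails.
   The caller owns the result and must free() it. */
char *copy_c_string_bounded(const char *src, size_t max_len, size_t *out_len)
{
    if (src == NULL)
        return NULL;

    /* memchr looks at no more than max_len bytes, unlike strlen,
       which keeps scanning until it happens to find a zero byte. */
    const char *terminator = (const char *)memchr(src, '\0', max_len);
    size_t len = (terminator != NULL) ? (size_t)(terminator - src) : max_len;
    assert(len <= max_len);

    char *copy = (char *)malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, src, len);
    copy[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return copy;
}
```

Without such a bound, which a plain char* return value does not carry, the only safe option is to hand the raw pointer to the image and let the caller decide how to interpret it.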

Best,
Marcel


Re: FFI Plugin | Auto-conversion of char* return value into String considered harmful

Nicolas Cellier
 
Maybe there is also the option of declaring it byte *?
The convention would be char* <=> Null terminated string, byte* <=> uninterpreted bytes


Re: FFI Plugin | Auto-conversion of char* return value into String considered harmful

Tobias Pape
 

> On 10.06.2020, at 17:41, Nicolas Cellier <[hidden email]> wrote:
>
> Maybe there is also the option of declaring it byte *?
> The convention would be char* <=> Null terminated string, byte* <=> uninterpreted bytes

uint8_t?


Re: FFI Plugin | Auto-conversion of char* return value into String considered harmful

marcel.taeumel
In reply to this post by Nicolas Cellier
 
Hi Nicolas.

> The convention would be char* <=> Null terminated string, byte* <=> uninterpreted bytes

Ah! Good to know. So, for byte* there would be an additional field (in a struct) for the size?
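
A minimal sketch of what such a pairing could look like on the C side, assuming the library reports the size next to the pointer. The struct and function names (byte_buffer, byte_buffer_copy) are hypothetical and do not come from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct: uninterpreted bytes plus an explicit size field. */
typedef struct {
    uint8_t *data;
    size_t   size;
} byte_buffer;

/* Copy exactly buf->size bytes. Zero bytes inside the data are ordinary
   content, not terminators. Returns NULL for an empty buffer or if the
   allocation fails. The caller owns the result and must free() it. */
uint8_t *byte_buffer_copy(const byte_buffer *buf)
{
    assert(buf != NULL);
    if (buf->data == NULL || buf->size == 0)
        return NULL;

    uint8_t *copy = (uint8_t *)malloc(buf->size);
    if (copy == NULL)
        return NULL;

    memcpy(copy, buf->data, buf->size);
    return copy;
}
```

For the bytes 1, 0, 2, 0 a string-style scan would stop after the first byte, while the size field keeps all four.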

Best,
Marcel


Re: FFI Plugin | Auto-conversion of char* return value into String considered harmful

Eliot Miranda-2
In reply to this post by marcel.taeumel
 
Hi Marcel,

On Wed, Jun 10, 2020 at 3:37 AM Marcel Taeumel <[hidden email]> wrote:
 
> What are your thoughts on this matter?

Agreed.  One issue is how to make the behaviour optional to keep backwards compatibility. Another is efficiency.  If it turns out that the set of useful conversions is small, we could parameterise the plugin with those conversions and still have it do the relevant conversion.

For example, the class of the container for the result could be somehow encoded in the ExternalLibraryFunction's flags inst var.  That might also give us backwards compatibility because very few bits of flags are used.  The flags var simply defines the relevant call type, cdecl: (0) or apicall: (1), in the least significant bit and whether the call is threaded or not in bit 8 (256).  So we could use, say, bits 16,17 or bits 6,7 to encode the string class if the function returns a string: 0 -> ByteString, 1 -> DoubleByteString (unused or undefined?), 2 -> WideString, 3 -> return pointer.
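
A rough C sketch of the two-bit variant using bits 16 and 17. All identifiers are hypothetical; only the bit positions and the 0..3 mapping are taken from the paragraph above. Because existing functions have these bits cleared, they decode to 0 (ByteString), which is today's behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values follow the mapping proposed above. */
enum return_kind {
    RETURN_BYTE_STRING        = 0,  /* default, matches existing behaviour */
    RETURN_DOUBLE_BYTE_STRING = 1,
    RETURN_WIDE_STRING        = 2,
    RETURN_POINTER            = 3
};

#define CALL_TYPE_API      (1u << 0)                  /* apicall: instead of cdecl: */
#define CALL_THREADED      (1u << 8)                  /* 256 */
#define RETURN_KIND_SHIFT  16
#define RETURN_KIND_MASK   (3u << RETURN_KIND_SHIFT)  /* bits 16 and 17 */

/* Extract the two-bit return kind from a flags word. */
enum return_kind return_kind_of(uint32_t flags)
{
    return (enum return_kind)((flags & RETURN_KIND_MASK) >> RETURN_KIND_SHIFT);
}

/* Return flags with the return kind replaced and all other bits unchanged. */
uint32_t with_return_kind(uint32_t flags, enum return_kind kind)
{
    assert((unsigned)kind <= 3u);
    return (flags & ~RETURN_KIND_MASK) | ((uint32_t)kind << RETURN_KIND_SHIFT);
}
```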



--
_,,,^..^,,,_
best, Eliot

Re: FFI Plugin | Auto-conversion of char* return value into String considered harmful

marcel.taeumel
 
Hi Eliot.

> For example, the class of the container for the result could be somehow
> encoded in the ExternalLibraryFunction's flags inst var.

Since FFICallTypesMask is 2r11111111, I suggest bits > 8. Bits 16 and 17 sound good to leave room for other call flags. Or is the encoding of the return format part of the "call type"? I wouldn't think so...

Let's say...

FFICallTypesMask := 2r11111111.
FFICallTypeCDecl := 0.
FFICallTypeApi := 1.

FFICallFlagMask := 2r11111111 << 8.
FFICallFlagThreaded := 1 << 8.

FFICallReturnTypeMask := 2r11111111 << 16.
FFICallReturnByteString := 0.
FFICallReturnDoubleByteString := 1 << 16.
FFICallReturnWideString := 2 << 16.
FFICallReturnExternalData := 3 << 16. "Even if return type is not char* ... for later override?"

And maybe this, too:

FFICallReturnExternalDataArray := 7 << 16. "Assume char** or int** etc."

What do you think? I plan to add the in-image part for ExternalDataArray soon. I found a simple way to manage pointer-to-pointer types with little adjustment in FFI-Kernel. :-)

Hmmm... not sure about FFICallReturnExternalData, though. Seems to be the most generic case where interpretation is completely in the image. Even a possible lift to ExternalDataArray for pointer arrays. Maybe...

FFICallReturnExternalData := 255 << 16. "= FFICallReturnTypeMask"
FFICallReturnExternalDataArray := 128 << 16. "= highest bit in mask"

That would leave room for a "UTF-64" encoding :-D as 3 << 16.
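
For readers who think in C, here is a sketch of the three-field layout above: call type in bits 0..7, call flags in bits 8..15, return type in bits 16..23. The macro names mirror the Smalltalk constants but are otherwise made up, and decode_flags is a hypothetical helper rather than anything the plugin provides.

```c
#include <assert.h>
#include <stdint.h>

/* C mirrors of the constants proposed above (names are illustrative). */
#define FFI_CALL_TYPES_MASK                  0xFFu
#define FFI_CALL_TYPE_CDECL                  0u
#define FFI_CALL_TYPE_API                    1u

#define FFI_CALL_FLAG_MASK                   (0xFFu << 8)
#define FFI_CALL_FLAG_THREADED               (1u << 8)

#define FFI_CALL_RETURN_TYPE_MASK            (0xFFu << 16)
#define FFI_CALL_RETURN_BYTE_STRING          0u
#define FFI_CALL_RETURN_DOUBLE_BYTE_STRING   (1u << 16)
#define FFI_CALL_RETURN_WIDE_STRING          (2u << 16)
#define FFI_CALL_RETURN_EXTERNAL_DATA        (255u << 16)
#define FFI_CALL_RETURN_EXTERNAL_DATA_ARRAY  (128u << 16)

typedef struct {
    unsigned call_type;    /* 0 = cdecl, 1 = apicall */
    int      threaded;     /* non-zero if the threaded flag is set */
    unsigned return_type;  /* value of the 8-bit return type field */
} decoded_flags;

/* Split a flags word into its three fields. */
decoded_flags decode_flags(uint32_t flags)
{
    decoded_flags d;
    d.call_type   = flags & FFI_CALL_TYPES_MASK;
    d.threaded    = (flags & FFI_CALL_FLAG_THREADED) != 0;
    d.return_type = (flags & FFI_CALL_RETURN_TYPE_MASK) >> 16;
    assert(d.return_type <= 0xFFu);
    return d;
}
```

One thing the sketch makes visible: 255 << 16 also contains the 128 bit, so code that wants to tell ExternalData from ExternalDataArray would have to compare the whole field rather than test a single bit.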

Best,
Marcel


Re: FFI Plugin | Auto-conversion of char* return value into String considered harmful

Jakob Reschke-2
 
On Fri, 12 June 2020 at 18:47, Marcel Taeumel <[hidden email]> wrote:
>
> That would leave either room for "UTF-64" encoding :-D as 3 << 16.
>

Luckily we won't need that any time soon because Unicode code points only go up to
16r10FFFF, many of which are still unused.

Re: FFI Plugin | Auto-conversion of char* return value into String considered harmful

marcel.taeumel
 
> FFICallReturnExternalDataArray

Update: I don't think anymore that such a specialization of ExternalData is necessary. See:



Best,
Marcel
