Hi Esteban, Guille and Everyone,
I haven't looked at using FFI much, however it is easy to imagine that different file encoding rules on different platforms will make writing FFI calls more difficult, i.e. some of the different formats are:

- OSX uses Mac specific decomposed UTF8 encoding
- Windows uses Wide Strings (16 bit Unicode characters)
- Linux allows pretty much anything, but precomposed UTF8 is common

Believe it or not, I'm still working on getting the FileAttributesPlugin working (file name encoding on Windows being the latest issue - the tests in Pharo need to be extended).

Would it be useful for future FFI work to have primitives available which convert file names to and from the various platform specific formats? (Linux is basically a no-op, and Windows could be written in-image, but OSX requires the platform routines to be called).

Cheers,
Alistair
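To make the three conventions listed above concrete, here is a minimal Python sketch (not Pharo code, and not what the VM does internally) that encodes one file name three ways. It assumes that "decomposed" corresponds to Unicode NFD, that "precomposed" corresponds to NFC, and that Windows wide strings are UTF-16 little-endian; the form actually used by macOS differs from strict NFD in some corner cases. The helper name `platform_bytes` and the platform labels are made up for illustration.

```python
import unicodedata


def platform_bytes(name: str, platform: str) -> bytes:
    """Encode a file name roughly the way each platform's API expects it.

    Hypothetical helper: the platform labels and the NFD/NFC/UTF-16
    mapping are simplifying assumptions, not an exact description of any OS.
    """
    if platform == "osx":
        # Decomposed form: U+00E9 becomes 'e' + U+0301 (combining acute).
        return unicodedata.normalize("NFD", name).encode("utf-8")
    if platform == "windows":
        # Wide string: 16-bit code units, little-endian, no byte order mark.
        return unicodedata.normalize("NFC", name).encode("utf-16-le")
    if platform == "linux":
        # Bytes are passed through; precomposed UTF-8 is the common case.
        return unicodedata.normalize("NFC", name).encode("utf-8")
    raise ValueError(f"unknown platform: {platform}")


# The same logical name yields three different byte sequences.
for p in ("osx", "windows", "linux"):
    print(p, platform_bytes("\u00e9.txt", p).hex(" "))
```

The point of the sketch is only that one logical name maps to different bytes per platform, which is why a foreign call that takes "a string" needs to know which form the callee expects.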
On Mon, Sep 17, 2018 at 6:52 PM Alistair Grant <[hidden email]> wrote:

> Hi Esteban, Guille and Everyone,

Well not really (from my point of view :))
From the point of view of the FFI call an encoded string is just a bunch of bytes. FFI does not do any interpretation of them.

> i.e. some of the different formats are:

At the image side, we could have a strategy that, depending on the OS, could encode in one encoding or another, or even not encode at all.

> Believe it or not, I'm still working on getting the

I believe you, don't worry ^^.

> Would it be useful for future FFI work to have primitives available

Maybe... Are the OSX routines exposed as C functions (that we can call through FFI) or are they Objective-C methods/functions (that are more complicated to map)?

Thanks Alistair!
Guillermo Polito wrote
> On Mon, Sep 17, 2018 at 6:52 PM Alistair Grant <[hidden email]> wrote:
>
>> Hi Esteban, Guille and Everyone,
>>
>> I haven't looked at using FFI much, however it is easy to imagine that
>> different file encoding rules on different platforms will make writing
>> FFI calls more difficult,
>
> Well not really (from my point of view :))
> From the point of view of the FFI call an encoded string is just a bunch of
> bytes. FFI does not do any interpretation of them.

It *would* be pretty handy for adding some auto-conversion into the marshaller based on parameter encoding options though... (other than filename, it could be done in Smalltalk using existing encoders)

    self
        ffiCall: #(bool saveContentsToFile(String fileName, String contents))
        options: #(+stringEncodings( fileName return , platformAPI contents)

(And yes, I've probably badly mangled the options syntax)

That is much less verbose than having to manually convert Strings to the proper platform Unicode encodings before calling.
It depends a bit on whether the primitive argument is Byte/WideStrings (latin1/utf32), or if it accepts only utf8 bytes and one has to convert first anyway.

It's not like this isn't a pain point; there are plenty of currently used APIs that are broken if you try to use non-ASCII.

Cheers,
Henry

--
Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
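The idea of declaring an encoding per parameter and letting the marshaller convert can be sketched outside Pharo. The Python below is only an analogy under stated assumptions: `string_encodings` is a hypothetical decorator, not an existing uFFI option, and `save_contents_to_file` is a plain function standing in for the foreign call.

```python
import functools
import inspect


def string_encodings(**encodings):
    """Encode the named str arguments before the wrapped call runs.

    Hypothetical stand-in for a marshaller option; not an existing uFFI API.
    """
    def decorate(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments to parameter names.
            bound = signature.bind(*args, **kwargs)
            for name, encoding in encodings.items():
                value = bound.arguments[name]
                if isinstance(value, str):
                    bound.arguments[name] = value.encode(encoding)
            return func(*bound.args, **bound.kwargs)
        return wrapper
    return decorate


@string_encodings(file_name="utf-8", contents="utf-16-le")
def save_contents_to_file(file_name, contents):
    # Stand-in for the foreign function: it only ever sees bytes.
    return file_name, contents


print(save_contents_to_file("na\u00efve.txt", "h\u00e9"))
```

The caller keeps passing ordinary strings; the declared encodings live next to the signature, which is the reduction in verbosity argued for above.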
In reply to this post by Guillermo Polito
Guillermo Polito wrote
> On Mon, Sep 17, 2018 at 6:52 PM Alistair Grant <[hidden email]> wrote:
>
>> Would it be useful for future FFI work to have primitives available
>> which convert file names to and from the various platform specific
>> formats? (Linux is basically a no-op, and Windows could be written
>> in-image, but OSX requires the platform routines to be called).
>
> Maybe... Are the OSX routines exposed as C functions (that we can call
> through FFI) or they are objective-C methods/functions (that are more
> complicated to map)?
>
> Thanks Alistair!

+1. From the image point of view, the non-standard normal form used on OSX is the biggest issue.
If it's available through FFI, the platform-specific String encoding options I mentioned previously could be implemented entirely in the image.
If there are extra hoops to jump through, like having to provide utf8 to said FFI function, it might still be worth it for the reduced performance overhead.

Cheers,
Henry
In reply to this post by Henrik Sperre Johansen
On Tue, Sep 18, 2018 at 10:43 AM Henrik Sperre Johansen <[hidden email]> wrote:

> Guillermo Polito wrote

Well, I like this idea.

> (And yes, I've probably badly mangled the options syntax)

Yes, but I think this may be because in general people tend to not know how encodings work... (even myself I don't feel I know enough :))
But this makes me think that we should make encoding explicit?

Maybe we should force people to specify an encoding if they specify a callout using a string.

And then, either they specify it at the level of the callout, or at the level of the library (like setting a default encoding for all strings).

Because this also raises the question of what the default encoding is. And I'd say that there is no satisfactory default encoding...
In reply to this post by Henrik Sperre Johansen
> On 18 Sep 2018, at 10:42, Henrik Sperre Johansen <[hidden email]> wrote:
>
> Guillermo Polito wrote
>> On Mon, Sep 17, 2018 at 6:52 PM Alistair Grant <[hidden email]> wrote:
>>
>>> Hi Esteban, Guille and Everyone,
>>>
>>> I haven't looked at using FFI much, however it is easy to imagine that
>>> different file encoding rules on different platforms will make writing
>>> FFI calls more difficult,
>>
>> Well not really (from my point of view :))
>> From the point of view of the FFI call an encoded string is just a bunch of
>> bytes. FFI does not do any interpretation of them.
>
> It *would* be pretty handy for adding some auto-conversion into the
> marshaller based on parameter encoding options though... (other than
> filename, could be done in smalltalk using existing encoders)
>
>     self
>         ffiCall: #(bool saveContentsToFile(String fileName, String contents))
>         options: #(+stringEncodings( fileName return , platformAPI contents)

This is cool.
What I do not like is to rely on primitives to do that encoding.
This should be in image… using FFI if needed (this is all because we want to rely less and less on plugins :P)

Esteban

> (And yes, I've probably badly mangled the options syntax)
>
> Is much less verbose than having to manually convert Strings to the proper
> platform Unicode encodings before calling.
> Depends a bit on whether the primitive argument is
> Byte/Widestrings(latin1/utf32), or if it accepts only utf8 bytes and one has
> to convert first anyways.
>
> It's not like this isn't a pain point, there are plenty of currently used
> API's that are broken if you try to use non-ascii.
>
> Cheers,
> Henry
In reply to this post by Guillermo Polito
Yup, explicit please. Nothing hidden behind the carpet :)

You can have some global FFI settings (I was thinking of adding some global option settings for FFI in general, btw) and even “library based settings”, to simplify.

Esteban
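One way to read "explicit, with settings at several levels" is a lookup that prefers the most specific setting and refuses to guess when nothing was configured. The Python below is a sketch of that reading only: the function name, the exception, and the lookup order (callout, then library, then global) are assumptions, since the thread does not fix an order or an API.

```python
from typing import Optional


class EncodingNotSpecified(Exception):
    """Raised when no level supplies an encoding for a string argument."""


def resolve_encoding(callout: Optional[str] = None,
                     library: Optional[str] = None,
                     global_default: Optional[str] = None) -> str:
    """Return the most specific encoding that was configured.

    Hypothetical lookup order: callout, then library, then global setting.
    """
    for candidate in (callout, library, global_default):
        if candidate is not None:
            return candidate
    # Nothing configured anywhere: fail loudly instead of picking a default.
    raise EncodingNotSpecified(
        "string argument used without an explicit encoding")


print(resolve_encoding(library="utf-8"))
print(resolve_encoding(callout="utf-16-le", library="utf-8"))
```

Raising when every level is empty is the "nothing hidden" part: a callout that passes a string cannot silently fall back to an encoding nobody chose.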
Hi Guille, Esteban and Henry,
Thanks for your replies.

On Tue, Sep 18, 2018 at 10:09:02AM +0200, Guillermo Polito wrote:
> On Mon, Sep 17, 2018 at 6:52 PM Alistair Grant <[hidden email]> wrote:
>
>> Hi Esteban, Guille and Everyone,
>>
>> I haven't looked at using FFI much, however it is easy to imagine that
>> different file encoding rules on different platforms will make writing
>> FFI calls more difficult,
>
> Well not really (from my point of view :))
> From the point of view of the FFI call an encoded string is just a bunch of
> bytes. FFI does not do any interpretation of them.

Right, but getting the appropriately encoded bunch of bytes is the issue. :-)

>> i.e. some of the different formats are:
>>
>> - OSX uses Mac specific decomposed UTF8 encoding
>> - Windows uses Wide Strings (16 bit Unicode characters)
>> - Linux allows pretty much anything, but precomposed UTF8 is common
>
> At the image side, we could have an strategy that, depending on the OS, could
> encode in one encoding or another, or even not encode at all.
>
>> Believe it or not, I'm still working on getting the
>> FileAttributesPlugin working (file name encoding on Windows being the
>> latest issue - the tests in Pharo need to be extended).
>
> I believe you, don't worry ^^.
>
>> Would it be useful for future FFI work to have primitives available
>> which convert file names to and from the various platform specific
>> formats? (Linux is basically a no-op, and Windows could be written
>> in-image, but OSX requires the platform routines to be called).
>
> Maybe... Are the OSX routines exposed as C functions (that we can call through
> FFI) or they are objective-C methods/functions (that are more complicated to
> map)?

The OSX routines are exposed as C functions (and available as Objective-C methods), see convertChars() in platforms/unix/vm/sqUnixCharConv.c.
On Tue, Sep 18, 2018 at 11:21:41AM +0200, Esteban Lorenzano wrote:
>>     self
>>         ffiCall: #(bool saveContentsToFile(String fileName, String contents))
>>         options: #(+stringEncodings( fileName return , platformAPI contents)
>
> This is cool.
> What I do not like is to rely on primitives to do that encoding.
> This should be in image… using FFI if needed (this is all because we
> want to rely less and less on plugins :P)

I realise of course that this could all be done in FFI, and I agree with all Esteban's arguments in favour of FFI. My main motivation was that the code is already in the VM, and to avoid code duplication, with the obvious benefit that if a bug is fixed it will apply everywhere.

On Tue, Sep 18, 2018 at 11:23:56AM +0200, Esteban Lorenzano wrote:
>> On 18 Sep 2018, at 11:04, Guillermo Polito <[hidden email]> wrote:
>>
>>> On Tue, Sep 18, 2018 at 10:43 AM Henrik Sperre Johansen <[hidden email]> wrote:
>>>
>>> It *would* be pretty handy for adding some auto-conversion into the
>>> marshaller based on parameter encoding options though... (other than
>>> filename, could be done in smalltalk using existing encoders)
>>>
>>>     self
>>>         ffiCall: #(bool saveContentsToFile(String fileName, String contents))
>>>         options: #(+stringEncodings( fileName return , platformAPI contents)
>>
>> Well, I like this idea.
>>
>>> (And yes, I've probably badly mangled the options syntax)
>>>
>>> Is much less verbose than having to manually convert Strings to the proper
>>> platform Unicode encodings before calling.
>>> Depends a bit on whether the primitive argument is
>>> Byte/Widestrings(latin1/utf32), or if it accepts only utf8 bytes and one has
>>> to convert first anyways.
>>>
>>> It's not like this isn't a pain point, there are plenty of currently used
>>> API's that are broken if you try to use non-ascii.
>>
>> Yes, but I think this may be because in general people tend to not know how
>> encodings work... (even myself I don't feel I know enough :))
>> But this makes me think that we should make encoding explicit?
>
> Yup, explicit please. Nothing hidden behind the carpet :)
>
>> Maybe we should force people to specify an encoding if they specify a
>> callout using a string.
>>
>> And then, either they specify it at the level of the callout, or at the
>> level of the library (like setting a default encoding for all strings).
>
> You can have some global FFI settings (I was thinking on adding some global
> options settings for FFI in general, btw) and even “library based settings”, to
> simplify.
>
> Esteban
>
>> Because this raises also the question of what is the default encoding?
>> And I'd say that there is no satisfactory default encoding...

I'll defer to Sven every time when it comes to character encoding, but my understanding is that the only platform that has consistent encoding rules is OSX, which uses the platform specific decomposed UTF8. Both Windows and Linux use precomposed UTF8, but other character encodings are possible (particularly for older files).

So we certainly shouldn't make the encoding hard-coded. UTF8 as the default encoding I think does make sense (this is what FilePlugin currently uses).

Cheers,
Alistair
On Tue, Sep 18, 2018 at 4:40 PM Alistair Grant <[hidden email]> wrote:

> I haven't looked at using FFI much, however it is easy to imagine that

Yes, the thing is that this would require some new extensions in uFFI to support encodings.
The good point is that it would have a positive impact on **ALL** FFI bindings using strings (by making explicit to people that they should care about encodings :)).

> Maybe... Are the OSX routines exposed as C functions (that we can call through

Nice!

> On Tue, Sep 18, 2018 at 11:21:41AM +0200, Esteban Lorenzano wrote:

Yeh. At the end it's a matter of debugging cycles.
Imagine making the "compile-restart" steps that you're facing while changing the plugin almost negligible in the "change-compile-restart-test" loop :).
On Wed, 19 Sep 2018 at 10:26, Guillermo Polito <[hidden email]> wrote:
>
> On Tue, Sep 18, 2018 at 4:40 PM Alistair Grant <[hidden email]> wrote:
>>
>> I realise of course that this could all be done in FFI, and I agree with
>> all Esteban's arguments in favour of FFI, my main motivation was that
>> the code is already in the VM, and to avoid code duplication with the
>> obvious benefit that if a bug is fixed it will apply everywhere.
>
> Yeh. At the end it's a matter of debugging cycles.
> Imagine making the "compile-restart" steps that you're facing while changing the plugin almost negligible in the "change-compile-restart-test" loop :).

This is true if the code only resides in the image, but in this case the code won't be going away from the VM any time soon. Anyway, for whoever does implement the code for FFI the option is always there.

Thanks to everyone for their replies!

Cheers,
Alistair
In reply to this post by Henrik Sperre Johansen
Hi Henry,

On Tue, Sep 18, 2018 at 1:43 AM Henrik Sperre Johansen <[hidden email]> wrote:

> Guillermo Polito wrote

Why not go for some generic escape sequence that can inject Smalltalk code into the marshaling? Right now e.g.

    primExport: aName value: aValue
        ^ self ffiCall: #(void moz_preferences_set_bool (short* aName, bool aValue))

is compiled as

    primExport: arg1 value: arg2
        | tmp1 tmp2 |
        '<an unprintable nonliteral value>'
            invokeWithArguments: {(tmp2 := arg1 packToArity: 1). arg2}

where '<an unprintable nonliteral value>' is the ExternalFunction object (it could usefully print itself as a literal and then decompilation would be meaningful; there is already code in the Squeak FFI repository).

Let's say one added {}'s as characters that can't ever appear in C parameter lists (of course and alas []'s can because of arrays). Then you could perhaps write

    primExport: aName value: aValue
        ^ self ffiCall: #(void moz_preferences_set_bool ( { short* aName } asUTF8String, bool aValue))

and have that generate a send of asUTF8String to arg1 or tmp2. One could surround the whole thing to apply a coercion to the return value, but there's no need because one can write e.g.

    primExport: aName value: aValue
        ^(self ffiCall: #(void moz_preferences_set_bool ( { short* aName } asUTF8String, bool aValue))) fromUTF8String

So then there would be a generic mechanism for injecting Smalltalk code into the marshaling and one could develop the string encoding support independently from the FFI. The options syntax however requires parsing support, more documentation, and constant extension to support new facilities, etc.

> Is much less verbose than having to manually convert Strings to the proper

_,,,^..^,,,_
best, Eliot
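The brace escape described above needs very little parsing, which can be illustrated with a short Python sketch. This is an illustration of the proposed syntax only, not how the uFFI parser works: `parse_arguments` and both regular expressions are hypothetical, and the sketch assumes unary selectors and a `type name` layout with the `*` attached to the type, as in the examples above.

```python
import re

# '{ <type> <name> } <selector>' marks an argument that is coerced by sending
# <selector> to it; a plain '<type> <name>' is passed through unchanged.
ESCAPED = re.compile(
    r"^\{\s*(?P<type>.+?)\s+(?P<name>\w+)\s*\}\s*(?P<selector>\w+)$")
PLAIN = re.compile(r"^(?P<type>.+?)\s+(?P<name>\w+)$")


def parse_arguments(arglist: str):
    """Split a C-style argument list and pick out the optional coercions.

    Hypothetical sketch of the brace syntax; it splits on commas, so it
    does not handle function-pointer parameters or keyword selectors.
    """
    parsed = []
    for raw in arglist.split(","):
        text = raw.strip()
        match = ESCAPED.match(text)
        if match:
            parsed.append((match["type"], match["name"], match["selector"]))
            continue
        match = PLAIN.match(text)
        if match is None:
            raise ValueError(f"cannot parse argument: {text!r}")
        # No braces: nothing is sent to this argument before marshaling.
        parsed.append((match["type"], match["name"], None))
    return parsed


print(parse_arguments("{ short* aName } asUTF8String, bool aValue"))
```

Each tuple carries the C type, the argument name, and the selector to send before marshaling (or `None`), which is all a code generator would need in order to emit the extra message send.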