A problem with FFI is that if a callout segfaults, all of memory, including that of the Image, is suspect, and execution of the Image terminates.

Occasionally I hunt around hoping to find technology to mitigate that problem. Maybe this time I found something... Memory Protection Keys [1]. Perhaps these could keep Image memory safe when an FFI callout segfaults.

IIUC the main problem with protecting Image memory on every FFI callout is the time it would take to update the flags on every page of Image memory. Would being able to change the protection of a massive number of pages with one syscall make it feasible to wrap them around FFI callouts?

This may be useful at least where the FFI use is more about reuse of existing functionality than about performance. Or at least useful while someone is learning/experimenting with FFI for the first time, or while becoming familiar with some external library. Further info at [2] & [3].

cheers -ben

[1] https://lwn.net/Articles/643797/
[2] http://man7.org/linux/man-pages/man7/pkeys.7.html
[3] https://lwn.net/Articles/689395/
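For readers unfamiliar with the mechanism in [1]-[3], here is a minimal sketch, assuming Linux with glibc 2.27 or later (which provides the pkey_alloc, pkey_mprotect, pkey_get and pkey_set wrappers). The helper names tagImageMemory, calloutWithImageWriteProtected and reportRights are hypothetical. Pages are tagged with a key once, up front; after that, disabling writes to all of them is a per-thread change to the key's access rights. On x86 that change is a single unprivileged WRPKRU instruction rather than a syscall, which speaks directly to the per-page cost raised above. Since foreign code can execute WRPKRU too, this guards against accidental corruption, not hostile code. Where the CPU or kernel lacks pkey support, tagImageMemory simply returns -1.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: tag [base, base + len) with a freshly allocated
   protection key. Returns the key, or -1 if pkeys are unsupported. */
static int tagImageMemory(void *base, size_t len) {
    int pkey = pkey_alloc(0, 0);  /* initial rights for this thread: full access */
    if (pkey < 0)
        return -1;
    if (pkey_mprotect(base, len, PROT_READ | PROT_WRITE, pkey) != 0) {
        pkey_free(pkey);
        return -1;
    }
    return pkey;
}

/* Hypothetical wrapper: run a callout with writes disabled (for this thread
   only) on every page tagged with pkey, then restore the previous rights.
   On x86, pkey_set writes the PKRU register from user space: no syscall. */
static int calloutWithImageWriteProtected(int pkey, int (*fn)(void *), void *arg) {
    int previous = pkey_get(pkey);
    pkey_set(pkey, PKEY_DISABLE_WRITE);
    int result = fn(arg);
    pkey_set(pkey, (unsigned int)previous);
    return result;
}

/* Example "foreign function": reports the rights it runs under. */
static int reportRights(void *pkeyPtr) {
    return pkey_get(*(int *)pkeyPtr);
}
```

A stray write into a tagged page during the callout would then raise SIGSEGV (with si_code SEGV_PKUERR) instead of silently corrupting the heap. Turning that signal into something the Image can handle is still a separate problem.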
Hi Ben,

> On Aug 4, 2018, at 8:40 AM, Ben Coman <[hidden email]> wrote:
>
> A problem with FFI is that if a callout segfaults, all of memory
> including that of the Image is suspect, and execution of the Image terminates.
> [...]
> Would being able to change the protection of a massive number of pages
> with one syscall make it feasible to wrap them around FFI callouts?
> [...]

I think there’s a much simpler improvement that doesn’t go this far. I implemented it in VisualWorks and it’s been in production for more than a decade. It should be easy to add to Cog.

The idea is simply to add a flag that tracks whether the VM is in an FFI call or not, and to test this flag in the VM’s exception handlers for SIGBUS, SIGILL, SIGSEGV and their equivalents on Windows. When in an FFI call, the exception handlers respond by failing the FFI call primitive, answering a primitive fail code that includes the exception information. Recently we extended Cog’s failure codes to allow a structured object (I don’t have the details handy; I’ll check soon). In this case we need a pc and/or address and an exception code.

Would this approach satisfy you?
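A minimal sketch of the flag-plus-handler idea, assuming a POSIX system with sigaction and SA_SIGINFO. All names (inFFICall, callOut, fatalSignalHandler, makeGuardPage and so on) are hypothetical, not Cog's. Eliot describes the VM discarding the C stack and failing the primitive; here sigsetjmp/siglongjmp stand in for that stack switching, and the "failure code" is just a struct carrying the signal number and the fault address. Reading the pc from the handler's third argument is platform-specific and omitted.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-ins for VM state; not Cog's actual variables. */
static volatile sig_atomic_t inFFICall = 0;
static sigjmp_buf ffiRecoveryPoint;
static volatile sig_atomic_t ffiSignal = 0;
static void *volatile ffiFaultAddress = NULL;

typedef struct {
    int failed;          /* 1 if the callout trapped, i.e. "primitive failed" */
    int signalNumber;    /* SIGSEGV, SIGBUS or SIGILL */
    void *faultAddress;  /* address reported by the kernel in si_addr */
} FFICallResult;

static void fatalSignalHandler(int sig, siginfo_t *info, void *uap) {
    (void)uap;  /* the pc could be read from here; the ucontext_t layout is platform-specific */
    if (!inFFICall) {
        /* Fault in the VM itself: restore the default action so it still crashes. */
        signal(sig, SIG_DFL);
        raise(sig);
        return;
    }
    inFFICall = 0;
    ffiSignal = sig;
    ffiFaultAddress = info->si_addr;
    siglongjmp(ffiRecoveryPoint, 1);  /* abandon the foreign C frames */
}

static void installFatalSignalHandlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fatalSignalHandler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);
    sigaction(SIGILL, &sa, NULL);
}

/* Call fn(arg) with the flag set; a fatal signal inside it becomes a failure result. */
static FFICallResult callOut(void (*fn)(void *), void *arg) {
    FFICallResult result = {0, 0, NULL};
    if (sigsetjmp(ffiRecoveryPoint, 1) == 0) {  /* 1: restore the signal mask on jump */
        inFFICall = 1;
        fn(arg);
        inFFICall = 0;
    } else {
        result.failed = 1;
        result.signalNumber = ffiSignal;
        result.faultAddress = ffiFaultAddress;
    }
    return result;
}

/* Example "foreign function" that writes through whatever pointer it is given. */
static void storeFortyTwo(void *p) {
    *(volatile int *)p = 42;
}

/* A page with no access rights, used to provoke a genuine fault. */
static void *makeGuardPage(void) {
    void *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Jumping out of the handler is only reasonable because the foreign frames are being abandoned wholesale; any locks or heap state the library held at the moment of the fault remain exactly as suspect as Ben's first message says.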
On 5 August 2018 at 23:10, Eliot Miranda <[hidden email]> wrote:
> [...]
> The idea is simply to add a flag that tracks if the VM is in an FFI call or not and to test this flag in the VM’s exception handlers for SIGBUS, SIGILL, SIGSEGV and their equivalents on Windows. [...]
>
> Would this approach satisfy you?

That sounds good. Although the argument I've seen is that a memory access error means you "can't recover because you don't know what may have been corrupted", I think it's worthwhile to be optimistic that the Image may last a bit longer, long enough to get more information about which call from the Image invoked the FFI failure. And if you've been notified (e.g. via a Growl message), you can still take steps to move to a new Image if the current one is suspect.

I guess you'd want to be able to turn it off for native-level debugging, and for critical production applications where it's judged better to crash than continue.

Also, the approach you suggest would be a prerequisite for what I suggested anyway, and would make it easier to experiment with MPKs later.

Let me know what I can do to help (I'm probably more capable on the testing side).

cheers -ben
Surely the really safe way to do FFI type stuff is to have a separate memory space? The trick is how one achieves that.

I can think of a few ways to do that, some of which might even work...
- remapping (temporarily) the MMU entries to make only the directly involved FFI data area(s) visible
- similar, but perhaps just making everywhere else read-only
- actually having a separate memory space that is shared with the VM
- a completely separate process
- hell, a completely separate computer!

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful Latin Phrases:- Fac me cocleario vomere! = Gag me with a spoon!
Hi Tim,

> On Aug 6, 2018, at 10:32 AM, tim Rowledge <[hidden email]> wrote:
>
> Surely the really safe way to do FFI type stuff is to have a separate memory space? The trick is how one achieves that.
> [...]

While nice in theory, this approach is useless in practice. Many APIs we want to use are fundamentally about resources shared within some context (process, thread, API instance, etc.) and involve sharing between the Smalltalk system and the external library. Given that the relevant context typically cannot be shared across different memory spaces, the approach is not generally useful.
Hi Ben,

> On Aug 5, 2018, at 6:34 PM, Ben Coman <[hidden email]> wrote:
> [...]
> That sounds good. Although the argument I've seen is that a memory
> access error
> means you "cant recover because you don't know what may have been corrupted"
> I think its worthwhile to be optimistic that the Image may last a bit
> longer to get more information about what call from the Image invoked
> the FFI failure.
> And if you've been notified (e.g. via Growl message) you can still
> take steps to move to a new Image if the current one is suspect.

While it’s possible that an FFI call could damage the Smalltalk heap and VM state, it’s often not the case, so one wants to be able to at least reap the error code and hence identify where the error occurred, inspect the arguments to the call, etc. It’s hence about having enough functionality to gather what information is available from outside the call, not about being able to continue for a long time after.

> I guess you'd want to be able to turn it off for native level debugging,
> and for critical production applications where its judged better to
> crash than continue.

Perhaps, but debuggers like gdb often give one the control one needs to stop at an exception before it is delivered.

> Also, the approach you suggest would be a pre-requisite for what I
> suggested anyway,
> and make it easier to later experiment with MPKs.

Cool.

> Let me know what I can do to help (probably more capable on the testing side).

Will do. If you’re happy with C programming and the simulator, part of it is adding a global flag variable, setting and unsetting it in FFI calls and callbacks, and then responding to the flag in the exception handler. The tricky bit is arranging failure. That I can work on when time allows.
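Eliot mentions setting and unsetting the flag in both FFI calls and callbacks. One reading (an assumption, not confirmed in the thread) is that a callback into Smalltalk should clear the flag, since a fault there is in VM code rather than foreign code, and restore it on return, so that a nested callout made from inside a callback also leaves the flag correct when it unwinds. The sketch below saves and restores the outer value at each transition; all names are hypothetical.

```c
#include <signal.h>

/* Hypothetical flag: nonzero while foreign code is running. */
static volatile sig_atomic_t inFFICall = 0;

/* Each transition returns the outer value so callouts and callbacks can nest. */
static sig_atomic_t enterCallout(void)  { sig_atomic_t outer = inFFICall; inFFICall = 1; return outer; }
static sig_atomic_t enterCallback(void) { sig_atomic_t outer = inFFICall; inFFICall = 0; return outer; }
static void restoreFFIFlag(sig_atomic_t outer) { inFFICall = outer; }

/* Simulates Smalltalk -> C library -> callback into Smalltalk -> nested
   callout -> unwind, recording the flag value observed at each step. */
static void simulateNestedCallback(int trace[5]) {
    sig_atomic_t outerCallout = enterCallout();
    trace[0] = inFFICall;                  /* inside the C library */
    sig_atomic_t outerCallback = enterCallback();
    trace[1] = inFFICall;                  /* back in Smalltalk: faults here are VM bugs */
    sig_atomic_t nested = enterCallout();
    trace[2] = inFFICall;                  /* nested callout made from the callback */
    restoreFFIFlag(nested);
    trace[3] = inFFICall;                  /* still in the callback */
    restoreFFIFlag(outerCallback);
    trace[4] = inFFICall;                  /* returned to the C library */
    restoreFFIFlag(outerCallout);          /* back in Smalltalk, flag clear */
}
```

Saving the outer value, rather than unconditionally writing 0 or 1, is what keeps the flag right when callbacks and callouts interleave to arbitrary depth.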
> On 06-08-2018, at 10:42 AM, Eliot Miranda <[hidden email]> wrote:
>> On Aug 6, 2018, at 10:32 AM, tim Rowledge <[hidden email]> wrote:
>> [...]
>
> While nice in theory this approach is useless in practice. Many APIs we want to use are fundamentally about resources shared on some context (process, thread, API instance, etc) and involve sharing between the Smalltalk system and the external library. Given that the relevant context is typically not shareable across different memory spaces then the approach is not generally useful.

I get the problems. The interesting thing is that the problem is not unique to us, not by a long way. That means there's a chance it will get sorted out (for certain definitions) as part of making OSs more secure.

The biggest cost I see is the likelihood of it involving copying data to 'secured' locations where the FFI stuff can see it and the OS can manage it. But then we have the pinning stuff that can move an object to a particular space (IIRC), so maybe that wouldn't be too terrible a cost? And how about Alien? Is there no way of having the Alien data be 'secured' (for certain definitions etc. etc.)?

Sure, anything that requires copying data around is annoying and time-consuming. But it's likely less annoying than having your image eaten by monsters from another dimension (aka buffer overflow bugs et al.), and certainly less time-consuming than crashing, restarting and rebuilding.

For stuff that is really internal to the context of the running VM, sure; pain, not likely to work, stupid idea. And for anything really short-lived, way too slow.

Ah, if only MMUs could be told 'lock everything except this range, and this range, and that one' and do it adequately fast. ARM actually tried that for the 610 for the Newton, many years ago. I don't think it would fly nowadays.

tim
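Tim's "make everywhere else read-only" variant, combined with copying data to a separate location the foreign code may write, can be sketched with plain mprotect. This is a hypothetical helper (callWithReadOnlyHeap and upcaseInPlace are illustrative names), assuming Linux/POSIX and a page-aligned heap region; it is not how any Squeak VM works. It also makes the cost Ben raised concrete: mprotect is one syscall per region, but the kernel still updates the page-table entry for every page in the range, so the cost grows with heap size, and Eliot's objection about shared contexts applies unchanged.

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: run fn on a private copy of len bytes from `data`
   while the page-aligned region [heap, heap + heapLen) is read-only.
   Returns 0 on success, -1 if a mapping or protection change failed. */
static int callWithReadOnlyHeap(void *heap, size_t heapLen, void *data, size_t len,
                                void (*fn)(unsigned char *buf, size_t len)) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t stagingLen = len == 0 ? page : (len + page - 1) / page * page;
    unsigned char *staging = mmap(NULL, stagingLen, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (staging == MAP_FAILED)
        return -1;
    memcpy(staging, data, len);                      /* copy in */
    if (mprotect(heap, heapLen, PROT_READ) != 0) {   /* everywhere else read-only */
        munmap(staging, stagingLen);
        return -1;
    }
    fn(staging, len);                                /* a stray heap write would now fault */
    int rc = mprotect(heap, heapLen, PROT_READ | PROT_WRITE);
    if (rc == 0)
        memcpy(data, staging, len);                  /* copy out */
    munmap(staging, stagingLen);
    return rc == 0 ? 0 : -1;
}

/* Example "foreign function": upper-cases ASCII in place. */
static void upcaseInPlace(unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)toupper(buf[i]);
}
```

The copy in and copy out are exactly the "annoying and time-consuming" part Tim mentions; pinned objects would avoid the copy only if the pinned space could be left writable while the rest of the heap is not.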
Hi Eliot,

On Tue, 7 Aug 2018 at 03:32, Eliot Miranda <[hidden email]> wrote:

I believe you did some work on this catching of segfaults in FFI callouts to return a primitive failure. Where did it get up to? Can you point me at the code that sets/tests this flag and sets up the primitive failure?

cheers -ben
Hi Ben,

On Thu, Jun 27, 2019 at 10:31 PM Ben Coman <[hidden email]> wrote:

Indeed I did. Here's the status. It works on Unix and MacOS but fails on Windows due to a failure in structured exception handling (stack walking) with the MinGW toolchain. I may revisit the code in the context of MSVC Community Edition 2017, which I'm using for the Terf VM. Thanks for the reminder.

Here's the structure of the code:

In SmalltalkImage>>#recreateSpecialObjectsArray the error table is extended to include a prototype instance of ExceptionInFFICallError. When wanting to deliver such an error, the VM creates a shallow copy of this object, fills it in, and supplies it as the errorCode in an FFI primitive method. This change was introduced in System-eem.1041.

    newArray at: 52 put: #(nil "nil => generic error" #'bad receiver'
            #'bad argument' #'bad index' #'bad number of arguments'
            #'inappropriate operation' #'unsupported operation'
            #'no modification' #'insufficient object memory'
            #'insufficient C memory' #'not found' #'bad method'
            #'internal error in named primitive machinery'
            #'object may move' #'resource limit exceeded'
            #'object is pinned' #'primitive write beyond end of object'
            #'object moved' #'object not pinned' #'callback error'),
        {PrimitiveError new errorName: #'operating system error'; yourself.
         ExceptionInFFICallError new errorName: #'exception in FFI call'; yourself}.

    ExceptionInFFICallError allInstVarNames
        #('errorName' 'errorCode' 'pc')

So errorName is #'exception in FFI call', errorCode will be either the second argument to the signal handler on Unix or the Win32 exception code on Win32, and pc is the pc at which the exception took place.

The error code will only be delivered if the method declares a primitive error code variable. There is a flag in the VM to override this behavior, but as yet there is no primitive to access this flag. See the two implementors of primitiveFailForFFIException:at: in the VMMaker source code.
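On Unix, a handler installed with SA_SIGINFO receives a pointer to a ucontext_t as its third argument, and the interrupted pc lives in its machine-context part. The layout is platform-specific, so the sketch below, with hypothetical names faultingPC and probePC, only covers Linux/glibc on x86-64, i386 and AArch64 and returns 0 elsewhere; macOS and Windows (the CONTEXT record in EXCEPTION_POINTERS) are not shown. To keep the demonstration harmless it raises SIGUSR1, so the reported pc lies inside raise() rather than at a faulting instruction.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

#if defined(__linux__) && (defined(__x86_64__) || defined(__i386__) || defined(__aarch64__))
#define PC_SUPPORTED 1
#else
#define PC_SUPPORTED 0
#endif

/* Hypothetical helper: the interrupted pc from a handler's third argument,
   or 0 where this sketch does not know the mcontext layout. */
static uintptr_t faultingPC(void *uap) {
    const ucontext_t *uc = (const ucontext_t *)uap;
#if defined(__linux__) && defined(__x86_64__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__linux__) && defined(__i386__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_EIP];
#elif defined(__linux__) && defined(__aarch64__)
    return (uintptr_t)uc->uc_mcontext.pc;
#else
    (void)uc;
    return 0;
#endif
}

static volatile uintptr_t probedPC = 0;  /* demo only; written from the handler */

static void pcProbeHandler(int sig, siginfo_t *info, void *uap) {
    (void)sig;
    (void)info;
    probedPC = faultingPC(uap);
}

/* Demonstration: raise a harmless signal and report the pc the handler saw. */
static uintptr_t probePC(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = pcProbeHandler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return 0;
    raise(SIGUSR1);
    return probedPC;
}
```

In a real fatal-signal handler, faultingPC(uap) would supply the pc slot of the failure record, alongside whatever error code the platform reports.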
Within the VM, the fatal exception handlers (sigsegv in the Unix & MacOS VMs; squeakExceptionHandler in the win32 VM) always call primitiveFailForFFIExceptionat. primitiveFailForFFIExceptionat checks whether the VM is in an FFI call and, if not, simply returns. If so, it performs the relevant stack switching to discard the C stack and fails the primitive with the supplied error code & pc.

For reasons unknown, the exception handler squeakExceptionHandler seems not to be reached if the VM is compiled with clang and/or gcc on win32. [I have to confirm this; it's been ten months.]

N.B. I had to add an error code variable to ExternalFunction>>#invokeWithArguments:. Pharo should ensure it also has one.

HTH
_,,,^..^,,,_
best, Eliot