[ANN] FFICHeaderExtractor first milestone (for early code reviewers) [WAS] Can OSProcess functionality be implemented using FFI instead of plugin?
OK, I have a first working version and so I wanted to share it with you.
I have not yet had the time to start writing the doc since I just finished the first pass on the code. Tomorrow I will start with the doc. But I thought some of you may be interested in taking a look even without a formal "doc" (and some feedback/iteration may avoid re-writing docs...).
If you have no clue what I am talking about, then this summary is for you:
When we use FFI to call a certain library, it's quite common that we need to pass certain constants as arguments (for example, SIGKILL to kill()). These constants are defined in C header files and can even change their values on different platforms.
These constants are also sometimes defined by the C preprocessor, so there is no way to get those values from FFI. If you don't have the values of those constants, you cannot make the FFI call.
I have tested the tool in OSX and CentOS using latest Pharo 5.0. It won't work in Windows right now. As usual, all classes and methods have comments and there are enough tests.
In the end, I decided the C program will output a very naive Smalltalk literal array kind of thingy. The tool then parses that output and directly creates an init method (which is compiled into the SharedPool class) for that platform, which is then called automatically at startup (only if initialization is needed).
As for real examples, I started to write constants for libc: signal.h (to use kill()), wait.h (to use the wait() family), fcntl.h (to use ... xxx()), and errno.h. You can take a look at the package 'FFICHeaderExtractor-LibC'.
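To make the "C program outputs a literal array" step concrete, here is a minimal sketch in C. It is not the code the tool actually generates: the names named_const, CONST_ENTRY, and format_literal_array are hypothetical, and the exact output notation (#(#NAME value ...)) is only an assumption about what a "naive literal array" could look like.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* One extracted constant: its C name and the value the compiler sees. */
struct named_const {
    const char *name;
    long value;
};

/* Expands to { "SIGKILL", (long)(SIGKILL) }: the name is stringified as
   written, while the value is filled in by the preprocessor for the
   platform the program is compiled on. */
#define CONST_ENTRY(c) { #c, (long)(c) }

/* Hypothetical selection of constants from signal.h. */
static const struct named_const signal_consts[] = {
    CONST_ENTRY(SIGKILL),
    CONST_ENTRY(SIGTERM),
    CONST_ENTRY(SIGINT),
};

/* Writes "#(#NAME value #NAME value ...)" into buf.
   Returns 0 on success, -1 if buf is too small. */
int format_literal_array(char *buf, size_t size,
                         const struct named_const *consts, size_t count)
{
    assert(buf != NULL);
    int n = snprintf(buf, size, "#(");
    if (n < 0 || (size_t)n >= size) return -1;
    size_t used = (size_t)n;
    for (size_t i = 0; i < count; i++) {
        n = snprintf(buf + used, size - used, "%s#%s %ld",
                     i == 0 ? "" : " ", consts[i].name, consts[i].value);
        if (n < 0 || (size_t)n >= size - used) return -1;
        used += (size_t)n;
    }
    n = snprintf(buf + used, size - used, ")");
    if (n < 0 || (size_t)n >= size - used) return -1;
    return 0;
}
```

A small main() that calls format_literal_array on signal_consts and prints the buffer would be compiled and run once per platform; the Smalltalk side then only has to parse one line of text.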
Note that for running the tests you need 'cc' findable by path in OSX and 'gcc' in Unix.
Let's measure this. Let's say we have 8 platforms (that's an underestimate, because different Linux distributions may have different values for certain constants), but 8, which is 4 basic platforms times 32- & 64-bits. We have Mac x86 32-bit, Mac x64 64-bit, Windows x86 32-bit, Windows x64 64-bit, Linux x86 32-bit, Linux ARM 32-bit, Linux x64 64-bit, and soon enough there will be more. Further, there may be different versions over time.
So each of those initialization methods has
- 1 slot for the global variable to be assigned
- 1 slot for the literal value to assign to it
- 3 bytes of bytecode per initialization for small methods, 4 for large methods. Let's say 4.
So the overhead in 32-bits is 12 bytes per constant (2 slots * 4 bytes + 4 bytes of bytecode), and in 64-bits is 20 bytes (2 slots * 8 bytes + 4). So the overhead for all 8 platforms is 96 bytes per constant in 32-bits and 160 bytes per constant in 64-bits. A full system with sockets, files, a database connexion etc. could easily exceed 100 constants. I think it would be nearer 1000. So the overheads are in the 10- to 100-KB range (100 KB ~= 0.5% of the image) on 32-bits. That's low but it's also pure overhead. Every GC has to visit them. Every senders and implementors query has to visit them, but they offer nothing of value. Whereas the small parser for whatever notation is used to store the constants externally (if they are needed in a given deployment) has a small constant overhead; it's simple code.
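The estimate above can be rechecked with a few lines of C. This is only a sketch of the arithmetic under the stated assumptions (two slots per constant, 4 bytes of bytecode, 8 platforms); the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions taken from the estimate above: two slots per constant
   (the variable and the literal) and 4 bytes of bytecode per assignment. */
enum { SLOTS_PER_CONSTANT = 2, BYTECODE_BYTES = 4 };

/* Bytes of overhead for one constant in one platform's init method.
   word_bytes is 4 for a 32-bit image and 8 for a 64-bit image. */
size_t overhead_per_constant(size_t word_bytes)
{
    assert(word_bytes == 4 || word_bytes == 8);
    return SLOTS_PER_CONSTANT * word_bytes + BYTECODE_BYTES;
}

/* Bytes of overhead when every platform's init method lives in the image. */
size_t total_overhead(size_t word_bytes, size_t platforms, size_t constants)
{
    return overhead_per_constant(word_bytes) * platforms * constants;
}
```

With 8 platforms, total_overhead(4, 8, 100) gives 9600 bytes and total_overhead(4, 8, 1000) gives 96000 bytes, which is where the 10- to 100-KB range comes from.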
Further, you still need the machinery to export the constants to be able to generate these initialization methods. If you've got the machinery and you don't need the methods why bother to generate the methods?
As the Scots say, many a mickle makes a muckle.
Thanks, Eliot, for such a detailed explanation. It makes sense.
But personally I prefer the Smalltalk solution, although Smalltalk itself is pure overhead compared to C.
I can see the draw of pure Smalltalk. Simplicity and browsability. But imagine a tiny headless image deployed on containers, say 2 MB. Now 100 KB of initialization code doesn't look so good :-). Anyway I'm beating a dead horse. Mariano is generating initialization methods.
My question was prompted by Mariano's idea to save STON files in methods. I think it can reduce the problems you described.
But then literal array syntax may be more suitable than STON.
I just want to be clear, I'm neutral about the notation used to export info from the C file. Literal array syntax, chunk source format, STON, XML. It doesn't matter as long as it's convenient for expressing an attribute dictionary from names to attributes such as value, size, offset. Don't get hung up on the specific notation. If one were to go with the external file, the only real requirements are that it be reasonably compact and quick to parse. That might kill XML but leave plenty of other candidates.
Nice. I was writing little C programs to tell me various
constants... it sounds like this automates that and keeps the
interaction in Smalltalk.
Yeah, exactly. It automates that, and the result can be stored as "init" methods (one per platform) directly in the shared pools.
Also, it will take care of initializing them (searching the correct init method for the current platform) at startup.
Note that I called the project FFICHeaderExtractor and not FFICConstantsExtractor. If time allows, I would also like to get info from structs: sizeof and how they are defined internally. That would be yet another feature that lets more things be done via FFI. See https://github.com/marianopeck/FFICHeaderExtractor/issues/1 But that would require some more effort!
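As a rough idea of what extracting struct info could look like on the C side, here is a sketch that uses sizeof and offsetof on struct timeval. Nothing here exists in the project yet: field_info, FIELD_ENTRY, format_struct_info, and the nested-array output notation are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/time.h>

/* Layout of one struct member as the compiler sees it. */
struct field_info {
    const char *name;
    size_t offset;
    size_t size;
};

/* Builds a field_info entry from a struct type and a member name. */
#define FIELD_ENTRY(type, member) \
    { #member, offsetof(type, member), sizeof(((type *)0)->member) }

/* Hypothetical example: the two POSIX members of struct timeval. */
static const struct field_info timeval_fields[] = {
    FIELD_ENTRY(struct timeval, tv_sec),
    FIELD_ENTRY(struct timeval, tv_usec),
};

/* Writes "#(#name size #(#field offset size) ...)" into buf.
   Returns 0 on success, -1 if buf is too small. */
int format_struct_info(char *buf, size_t size,
                       const char *struct_name, size_t struct_size,
                       const struct field_info *fields, size_t count)
{
    assert(buf != NULL);
    int n = snprintf(buf, size, "#(#%s %zu", struct_name, struct_size);
    if (n < 0 || (size_t)n >= size) return -1;
    size_t used = (size_t)n;
    for (size_t i = 0; i < count; i++) {
        n = snprintf(buf + used, size - used, " #(#%s %zu %zu)",
                     fields[i].name, fields[i].offset, fields[i].size);
        if (n < 0 || (size_t)n >= size - used) return -1;
        used += (size_t)n;
    }
    n = snprintf(buf + used, size - used, ")");
    if (n < 0 || (size_t)n >= size - used) return -1;
    return 0;
}
```

Calling format_struct_info with "timeval", sizeof(struct timeval), and timeval_fields would give the Smalltalk side the size and field offsets for the platform the program was compiled on, in the same spirit as the constants.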