On Tue, Jan 8, 2019 at 1:51 PM Nicolas Cellier <
[hidden email]> wrote:
Hi all,
particularly Clement and Eliot,
One of the most annoying limits of the bytecode set is the number of arguments (<16 in V3). This is not so annoying for pure Smalltalk, but it certainly is for FFI (FORTRAN 77 lacks structures, so existing code bases often have functions with many arguments).
For scientific Smalltalk, some of those old FORTRAN libraries are still around nowadays (LAPACK is an example).
Agreed. There are VW users out there with autogenerated code that requires more than 15 arguments. Clément and I already have a design in mind, which is much more elegant than using the extra bit below. However, it does require that we change the maximum Context stack size, which is one reason (the other being lack of time) why we haven't implemented this so far.
In 2008 my closure design introduced indirection vectors for closed-over arguments, and among the five bytecodes added to implement it was the Create Array bytecode, which can do one of two things:
V3PlusClosures:
138 10001010 jkkkkkkk Push (Array new: kkkkkkk) (j = 0)
or Pop kkkkkkk elements into: (Array new: kkkkkkk) (j = 1)
SistaV1:
231 11100111 jkkkkkkk Push (Array new: kkkkkkk) (j = 0)
or Pop kkkkkkk elements into: (Array new: kkkkkkk) (j = 1)
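As a rough illustration of the operand layout above, here is a small Python sketch that splits the operand byte into the j flag and the seven-bit size. The function name is hypothetical and is not taken from any VM source; only the jkkkkkkk layout comes from the tables above.

```python
def decode_create_array(operand):
    """Split the jkkkkkkk operand byte into (pop_into, size).

    pop_into is True when j = 1 (pop `size` elements into the new Array)
    and False when j = 0 (push an empty Array of `size` slots).
    """
    if not 0 <= operand <= 0xFF:
        raise ValueError("operand must fit in one byte")
    pop_into = bool(operand & 0x80)  # j is the top bit
    size = operand & 0x7F            # kkkkkkk is the low seven bits
    return pop_into, size


# The operand 0x88 is 1000 1000 in binary: j = 1, kkkkkkk = 8.
print(decode_create_array(0x88))
```

Because kkkkkkk is seven bits wide, the largest array either form can create is 127 elements.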
This bytecode is used to create indirection vectors, and to create tuples of size <= 8. e.g. { thisContext method symbolic. 2. 3. 4. 5. 6. 7. 8 }
89 <52> pushThisContext:
90 <81> send: method
91 <80> send: symbolic
92 <E8 02> pushConstant: 2
94 <E8 03> pushConstant: 3
96 <E8 04> pushConstant: 4
98 <E8 05> pushConstant: 5
100 <E8 06> pushConstant: 6
102 <E8 07> pushConstant: 7
104 <E8 08> pushConstant: 8
106 <E7 88> pop 8 into (Array new: 8)
108 <5C> returnTop
We call this version of the bytecode the cons array bytecode. The other form, used to create indirection vectors, is the create array bytecode.
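To make the difference between the two forms concrete, here is a minimal Python sketch of their effect on an operand stack, modelled as a plain list. Both function names are hypothetical, and a real VM works on frames rather than Python lists; this only mirrors the stack effect described above.

```python
def create_array(stack, size):
    """j = 0 form: push a fresh array of `size` empty slots."""
    stack.append([None] * size)


def cons_array(stack, size):
    """j = 1 form: pop `size` elements and push one array holding them.

    The elements keep the order in which they were pushed.
    """
    if size > len(stack):
        raise ValueError("not enough elements on the stack")
    split = len(stack) - size
    elements = stack[split:]  # the top `size` elements, oldest first
    del stack[split:]         # pop them off the stack
    stack.append(elements)    # push the new array in their place


# Mirrors the listing above: eight values pushed, then "pop 8 into (Array new: 8)".
stack = ["<result of symbolic>", 2, 3, 4, 5, 6, 7, 8]
cons_array(stack, 8)
print(stack)  # a single element: the eight-element array
```

The cons form needs all of its elements on the stack at once, which is why it costs stack depth in exchange for compact code.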
(c.f. { thisContext method symbolic. 2. 3. 4. 5. 6. 7. 8. 9 } which produces much more code but requires only 2 elements of stack depth).
There are also three bytecodes used to access indirection vectors:
V3PlusClosures:
140 10001100 kkkkkkkk jjjjjjjj Push Temp At kkkkkkkk In Temp Vector At: jjjjjjjj
141 10001101 kkkkkkkk jjjjjjjj Store Temp At kkkkkkkk In Temp Vector At: jjjjjjjj
142 10001110 kkkkkkkk jjjjjjjj Pop and Store Temp At kkkkkkkk In Temp Vector At: jjjjjjjj
SistaV1:
251 11111011 kkkkkkkk sjjjjjjj Push Temp At kkkkkkkk In Temp Vector At: jjjjjjj, s = 1 implies remote inst var access instead of remote temp vector access
252 11111100 kkkkkkkk sjjjjjjj Store Temp At kkkkkkkk In Temp Vector At: jjjjjjj, s = 1 implies remote inst var access instead of remote temp vector access
253 11111101 kkkkkkkk sjjjjjjj Pop and Store Temp At kkkkkkkk In Temp Vector At: jjjjjjj, s = 1 implies remote inst var access instead of remote temp vector access
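As a sketch of how the two SistaV1 operand bytes could be read, the following Python decodes kkkkkkkk and sjjjjjjj and performs a remote temp push against a frame modelled as a list of temps. The function names and the list-based frame are assumptions made for illustration; only the bit layout comes from the tables above.

```python
def decode_sista_remote_temp(first, second):
    """Decode the kkkkkkkk and sjjjjjjj operand bytes.

    Returns (index_in_vector, vector_temp_index, remote_inst_var), where
    remote_inst_var is True when s = 1.
    """
    index_in_vector = first & 0xFF      # kkkkkkkk
    remote_inst_var = bool(second & 0x80)  # s is the top bit
    vector_temp_index = second & 0x7F   # jjjjjjj is the low seven bits
    return index_in_vector, vector_temp_index, remote_inst_var


def push_remote_temp(stack, temps, first, second):
    """Model of bytecode 251 for the s = 0 case only."""
    k, j, remote_inst_var = decode_sista_remote_temp(first, second)
    if remote_inst_var:
        raise NotImplementedError("s = 1 (remote inst var) is not modelled here")
    stack.append(temps[j][k])  # temp k of the vector held in frame temp j


# Frame temp 2 holds an indirection vector; fetch its element 1.
temps = ["a", "b", ["x", "y", "z"]]
stack = []
push_remote_temp(stack, temps, 1, 2)
print(stack)
```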
So the insight is that if we pass arguments beyond 14 in an indirection vector we can have up to 15 + 127 = 142 arguments without needing any extra bits in a CompiledMethod header or range in a bytecode. We simply pop arguments beyond the 14th into an indirection vector, using the cons array bytecode. Yes, this is slow compared to "native" support, but such methods are extremely rare, and supporting them this way means we have less waste elsewhere. It will require some sophistication in the Decompiler, but otherwise seems quite simple.
With this design, as far as the VM is concerned the maximum argument count is still 15. Only the image need bother with how to record the argument count for a method that has 15 or more arguments, and indeed a method with 15 arguments can still use all 15 arguments without having to create an indirection vector. This isolates the effects to the compiler (arguments beyond the 14th in methods with more than 15 arguments must be accessed using the indirection vector bytecodes above), and the changes are otherwise quite localized: indirection vector creation occurs immediately after normal argument marshaling and immediately before the send bytecode.
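The marshaling rule just described can be sketched in Python as follows. This is only a model of the idea, not compiler code: the function and constant names are hypothetical, argument lists stand in for the stack, and the 127 limit is taken from the seven-bit size field of the cons array bytecode.

```python
MAX_VM_ARGS = 15       # argument count the VM sees, per the text
MAX_VECTOR_SIZE = 127  # kkkkkkk in the cons array bytecode is seven bits


def marshal_arguments(args):
    """Return the argument list the VM would see for a send."""
    args = list(args)
    if len(args) <= MAX_VM_ARGS:
        return args                      # up to 15 arguments: no vector needed
    direct = args[:MAX_VM_ARGS - 1]      # the first 14 are passed normally
    overflow = args[MAX_VM_ARGS - 1:]    # everything beyond the 14th
    if len(overflow) > MAX_VECTOR_SIZE:
        raise ValueError("too many arguments for one indirection vector")
    return direct + [overflow]           # the vector travels as the 15th argument


def fetch_argument(marshaled, declared_count, index):
    """Read argument `index` (0-based) as the callee would."""
    if declared_count <= MAX_VM_ARGS or index < MAX_VM_ARGS - 1:
        return marshaled[index]          # ordinary argument access
    vector = marshaled[MAX_VM_ARGS - 1]  # the indirection vector
    return vector[index - (MAX_VM_ARGS - 1)]


marshaled = marshal_arguments(range(20))
print(len(marshaled))                    # the VM still sees 15 arguments
print(fetch_argument(marshaled, 20, 17))
```

Note that the callee needs the declared argument count to know whether the 15th slot is an ordinary argument or a vector, which is the piece of bookkeeping left to the image.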
Does this design appeal to you? If it does, then we should discuss when and how it should be implemented. One thing would be to make the maximum size of a Context, defined at the image level by CompiledCode's LargeFrame class variable, but hard coded into the VM, some kind of VM parameter, e.g. stored in the image header and read at start-up. It would be quite easy to add this. If we did so we should also ensure the stack page size calculation allows for a stack page big enough for one or two huge frames. Note that the design also means that a large stack is needed only to *marshal* arguments, not to activate a method with many arguments, since the excess arguments are stored in an indirection vector.
P.S. Indeed we could use the scheme used for arbitrary-sized tuples to marshal extra arguments, but this would affect code generation much more. Different code would have to be used to marshal each argument beyond 15; whereas using the cons array bytecode