On Sun, Jul 10, 2011 at 5:06 AM, Levente Uzonyi wrote:
I read the JIT code (#genSmallIntegerComparison:) and found that it doesn't implement the 4-byte LargePositiveInteger checks. So the code behaves differently when it's interpreted and when it's jitted:
(1 to: 2) collect: [ :each | 0 = (LargePositiveInteger new: 4) ].
gives
#(true false).
But this is not a real problem IMHO, because the argument is not normalized. How hard would it be to implement the support for LargePositiveIntegers in the JIT-ed code?
Quite easy. The failure path would be modified to invoke the interpreter primitives. I had not done so on minimalism grounds, in that I wanted to get things working fast for the common case asap. I have a slight antipathy to the normal Squeak code, which only copes with integers up to 64 bits. But Cog includes the LargeIntegersPlugin by Stephan Rudlof, and this is probably the best code to fall back on.
Regarding implementation, arguably fast failure is important (since inequality can be important to establish too). So, since LargeNegativeInteger, LargePositiveInteger and Float all have contiguous compact class indices, the code could be something of the flavour of
SmallInteger = (assume the receiver is a SmallInteger)
    arg isSmallInteger ifTrue:
        [....inline code for SmallInteger comparison].
    compactClassIndex := arg compactClassIndex.
    (compactClassIndex >= ClassLargeNegativeIntegerCompactIndex
     and: [compactClassIndex <= ClassFloatCompactIndex]) ifTrue:
        [compactClassIndex <= ClassLargePositiveIntegerCompactIndex ifTrue:
            [call corresponding primitive in the LargeIntegersPlugin].
         inline code for SmallInteger x Float comparison]
    fail
and of course this pattern could apply to comparison operations and arithmetic. For little extra cost the code could also include a test for the receiver being a SmallInteger, but IMO it's better to have primitives that are specific to the receiver (for performance), and to have the VM implement primitives for SmallInteger, Large{Posi,Nega}tiveInteger & Float, than to have one size fits all.
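For readers following along, the range test in the pseudocode above can be modelled at the image level as a small workspace sketch. It is only an illustration of the idea, not Cog's JIT code: the index values 4, 5 and 6 and the result symbols are hypothetical placeholders, and the only property relied on is that the three indices are contiguous and ordered LargeNegativeInteger, LargePositiveInteger, Float.

```smalltalk
| negIdx posIdx floatIdx classify |
"Hypothetical compact class indices, chosen for illustration only.
 What matters is that they are contiguous and in this order."
negIdx := 4.    "stands in for ClassLargeNegativeIntegerCompactIndex"
posIdx := 5.    "stands in for ClassLargePositiveIntegerCompactIndex"
floatIdx := 6.  "stands in for ClassFloatCompactIndex"

"One range test rejects every other class quickly (fast failure);
 a single further comparison separates large integers from Float."
classify := [:idx |
    (idx >= negIdx and: [idx <= floatIdx])
        ifTrue: [idx <= posIdx
            ifTrue: [#largeIntegerPrimitive]
            ifFalse: [#inlineFloatCompare]]
        ifFalse: [#fail]].

classify value: posIdx
```

Because the three indices are adjacent, any argument outside the range costs two comparisons before the primitive fails, which is the fast-failure property argued for above.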
2¢
Levente
P.S.: Sorry for still nagging about this.
On Sun, 10 Jul 2011, Levente Uzonyi wrote:
I fired up QVMProfiler (hacked it a bit to make it work again under Windows). Integer >> #= is invoked during execution, even though it can't be seen from the debugger. Here's the VM report for the old version:
gc prior. clear prior.
2.393 seconds; sampling frequency 998 hz
2389 samples in the VM (2389 samples in the entire program) 100.0% of total
799 samples in generated vm code 33.44% of entire vm (33.44% of total)
1590 samples in vanilla vm code 66.56% of entire vm (66.56% of total)
% of generated vm code (% of total) (samples) (cumulative)
29.91% (10.00%) SmallInteger>>= (239) (29.91%)
18.40% ( 6.15%) Integer>>= (147) (48.31%)
10.64% ( 3.56%) UndefinedObject>>DoIt (85) (58.95%)
8.76% ( 2.93%) Integer>>digitCompare: (70) (67.71%)
8.01% ( 2.68%) Number>>negative (64) (75.72%)
7.63% ( 2.55%) cePrimReturnEnterCogCode (61) (83.35%)
3.38% ( 1.13%) PIC isInteger (27) (86.73%)
3.00% ( 1.00%) SmallInteger>>< (24) (89.74%)
2.75% ( 0.92%) PIC digitCompare: (22) (92.49%)
2.63% ( 0.88%) PIC negative (21) (95.12%)
2.50% ( 0.84%) PIC isNumber (20) (97.62%)
1.75% ( 0.59%) PIC negative (14) (99.37%)
0.38% ( 0.13%) Integer>>isInteger (3) (99.75%)
0.13% ( 0.04%) LargePositiveInteger>>negative (1) (99.87%)
0.13% ( 0.04%) Number>>isNumber (1) (100.0%)
% of vanilla vm code (% of total) (samples) (cumulative)
31.26% (20.80%) _classNameOfIs (497) (31.26%)
13.84% ( 9.21%) _stSizeOf (220) (45.09%)
11.07% ( 7.37%) _lengthOf (176) (56.16%)
9.69% ( 6.45%) _isWordsOrBytesNonInt (154) (65.85%)
8.18% ( 5.44%) _isKindOf (130) (74.03%)
7.04% ( 4.69%) _arrayValueOf (112) (81.07%)
6.35% ( 4.23%) _primDigitCompare (101) (87.42%)
6.23% ( 4.14%) _stackValue (99) (93.65%)
3.14% ( 2.09%) _popthenPush (50) (96.79%)
1.51% ( 1.00%) _failed (24) (98.30%)
1.32% ( 0.88%) _success (21) (99.62%)
0.38% ( 0.25%) _integerObjectOf (6) (100.0%)
And here's the same report with the new version of Integer >> #=:
gc prior. clear prior.
0.169 seconds; sampling frequency 994 hz
168 samples in the VM (168 samples in the entire program) 100.0% of total
168 samples in generated vm code 100.0% of entire vm (100.0% of total)
0 samples in vanilla vm code 0.00% of entire vm ( 0.00% of total)
% of generated vm code (% of total) (samples) (cumulative)
45.24% (45.24%) Integer>>= (76) (45.24%)
25.00% (25.00%) SmallInteger>>= (42) (70.24%)
18.45% (18.45%) PIC isInteger (31) (88.69%)
8.93% ( 8.93%) UndefinedObject>>DoIt (15) (97.62%)
2.38% ( 2.38%) Integer>>isInteger (4) (100.0%)
The new version is faster, because it avoids Integer >> #digitCompare: as I expected. But why is Integer >> #= invoked at all? Why can't it be seen from the debugger?
Levente
On Sat, 9 Jul 2011, Levente Uzonyi wrote:
Hi Eliot,
I found that Cog is really slow when I compare SmallIntegers with 4-byte LargePositiveIntegers. In theory this kind of comparison should only be slightly slower than SmallInteger-SmallInteger comparisons, but that's not the case.
With Cog (r2434 on Windows) I get:
evaluator := [ :aBlock |
    (((1 to: 5) collect: [ :run |
        aBlock timeToRun ]) sort copyFrom: 2 to: 4) average asFloat ].
evaluator value: [ 1 to: 1000000 do: [ :i | 0 = 1 ] ]. "3.0"
evaluator value: [ 1 to: 1000000 do: [ :i | 0 = 16r40000000 ] ]. "244.0"
The same code with the Interpreter VM gives 22.0 and 74.0.
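As a reading aid, the evaluator above times the block five times, sorts the timings, discards the fastest and the slowest, and averages the remaining three. The sketch below walks through that arithmetic on made-up timings; the five numbers are hypothetical and are not measurements from this thread.

```smalltalk
| timings trimmed |
"Hypothetical timeToRun results in milliseconds (not real measurements)."
timings := #(250 3 240 244 248) copy.
"Sort ascending, then keep elements 2..4, dropping the fastest and slowest run."
trimmed := timings sort copyFrom: 2 to: 4.
"Average the middle three runs and convert the result to a Float."
trimmed average asFloat
```

With these inputs the middle three values are 240, 244 and 248, so the expression evaluates to 244.0. Dropping the extremes keeps a single outlier, such as a run interrupted by a garbage collection, from skewing the reported figure.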
I tried debugging the 0 = 16r40000000 expression with Cog, but the debugger doesn't touch any Smalltalk code during the execution of #=.
If I replace the implementation of Integer >> #= with:
= aNumber
    aNumber isInteger ifTrue: [
        aNumber class == self class ifFalse: [ ^false ].
        ^(self digitCompare: aNumber) = 0 ].
    aNumber isNumber ifFalse: [ ^false ].
    ^aNumber adaptToInteger: self andCompare: #=
then the second number decreases to 18.0 with Cog. Changing this method has no effect on the interpreter VM's performance.
Do you have any idea what causes the slowdown, and why the implementation of Integer >> #= matters for Cog?
Cheers,
Levente
--
best,
Eliot