Hi, Any idea how much slower it is? I mean, any measure/estimation/something around? cheers, Esteban |
On 22-06-2015, at 5:13 AM, Esteban Lorenzano <[hidden email]> wrote: > > Hi, > > Any idea how slower is? I mean, any measure/estimation/something around? After the initial searching/loading/linking it should be pretty much identical. It’s just a jump to a pointed-at program location. tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim Useful random insult:- Ready to check in at the HaHa Hilton. |
That sounds right to me too. But it would be a worthwhile experiment to set up a test to confirm it. Maybe take one or more methods that call numbered primitives, and recode them to call the primitives by name. Then measure and see if anything got slower. Dave > > On 22-06-2015, at 5:13 AM, Esteban Lorenzano <[hidden email]> wrote: > >> >> Hi, >> >> Any idea how slower is? I mean, any measure/estimation/something around? > > After the initial searching/loading/linking it should be pretty much > identical. Its just a jump to a pointed-at program location. > > > tim > -- > tim Rowledge; [hidden email]; http://www.rowledge.org/tim > Useful random insult:- Ready to check in at the HaHa Hilton. > > |
Well, it also depends on whether the primitive is generated by the JIT. If you rewrite SmallInteger>>#+ from primitive 1 to a named primitive, the overhead will be larger than just the searching/loading/linking, because the JIT won't compile it to n-code (native machine code) anymore. So make a test with a primitive not compiled by the JIT. 2015-06-22 18:35 GMT+02:00 David T. Lewis <[hidden email]>:
|
I’m just trying to understand the cost difference between prim 120 and #primitiveCalloutWithArgs, so it should be easy to set up a test :) but… judging by #primitiveDoNamedPrimitiveWithArgs… I suppose something like Athens will need a numbered primitive in order to keep it performant… but well, after I have numbers I will send a proposal (if needed) to remap #primitiveCalloutWithArgs onto a number… :P Esteban
|
In reply to this post by EstebanLM
Hi Esteban, you can set up a test using the LargeInteger comparison primitives. They're both named and numbered; e.g. primitive 23 is primitiveLessThanLargeIntegers. So you can write, e.g.,

LargePositiveInteger>>#< anInteger
	"Primitive. Compare the receiver with the argument and answer true if the receiver is less than the argument. Otherwise answer false. Fail if the argument is not a SmallInteger or a LargePositiveInteger less than 2-to-the-30th (1073741824). Optional. See Object documentation whatIsAPrimitive."
	<primitive: 23>
	^super < anInteger

as

numberedLessThan: anInteger
	"Primitive. Compare the receiver with the argument and answer true if the receiver is less than the argument. Otherwise answer false. Fail if the argument is not a SmallInteger or a LargePositiveInteger less than 2-to-the-30th (1073741824). Optional. See Object documentation whatIsAPrimitive."
	<primitive: 23>
	^super < anInteger

namedLessThan: anInteger
	"Primitive. Compare the receiver with the argument and answer true if the receiver is less than the argument. Otherwise answer false. Fail if the argument is not a SmallInteger or a LargePositiveInteger less than 2-to-the-30th (1073741824). Optional. See Object documentation whatIsAPrimitive."
	<primitive: 'primitiveLessThanLargeIntegers'>
	^super < anInteger

and test it with two suitable large integers. Will you report back? I'd like to know the answer. Named primitive invocation should be slightly slower. As Clément says, a different return address is written to the stack, overwriting the primitive code, but that return path is essentially the same as for numbered primitives. So I expect that there will be almost no measurable difference. This is all to do with callbacks. Numbered primitives are assumed never to call back (so far that's a valid assumption).
But named primitives (such as an FFI call) may indeed call back and hence, by the time the primitive finally returns, the code zone may have been compacted and the original method containing the callout may have moved. So the VM can't simply return to a primitive that may have called back, and then have that primitive's code return from the primitive, because that code may have moved. The solution is to provide a piece of code at a fixed address that returns from a named primitive call, and have the return sequence run that code. On Mon, Jun 22, 2015 at 5:13 AM, Esteban Lorenzano <[hidden email]> wrote:
best,
Eliot |
In reply to this post by EstebanLM
On 22-06-2015, at 10:32 AM, Esteban Lorenzano <[hidden email]> wrote: > I’m just trying to understand the cost difference between prim 120 and #primitiveCalloutWithArgs so it should be easy to set up a test :) The calling of a prim (at the lowest level) is done via a pointer to the prim code. That pointer is generally cached somewhere useful (depends on the exact vm version) and works the same whether it is a pointer found from the primitive table (‘numbered prims’) or via the initial lookup of a named prim. The basics of this were work I did nearly 12 years ago and I’d be really surprised if anyone has radically broken it. > but… judging #primitiveDoNamedPrimitiveWithArgs… I suppose something like Athens will need a numbered primitive, in order to keep it performant… I really doubt it. Even if the named-prim call cost substantially more I really doubt that it would make any real difference to the total time for a graphics library call. A couple of instructions against thousands, even millions, to do some draw call? > > but well, after I have numbers I will send a proposition (if needed), to remap #primitiveCalloutWithArgs into a number… :P Completely not needed. The cost of primitiveCallout… is in the marshalling of the arguments, not the basic call to that primitive. If you want to speed up calls to a specific library, write a plugin for that specific library and optimise your code & API. I.e., don’t have calls to set or get a single variable; make a call to pass a load of info in one go. tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim "Bother!" said Pooh, searching for the $10m winning lottery ticket. |
In reply to this post by Eliot Miranda-2
On Mon, Jun 22, 2015 at 10:40 AM, Eliot Miranda <[hidden email]> wrote:
I should have said that numbered primitives other than 117 (primitiveExternalCall) & 120 (primitiveCalloutToFFI) are assumed never to call back. In fact, the VM code in primitivePropertyFlagsForSpur: & primitivePropertyFlagsForV3: won't set the required flags to tell the Cogit to substitute the return address if you use primitiveCalloutWithArgs as a named primitive instead of 120 as a numbered primitive. So please use 120. Anyway, the test should demonstrate that there's no difference. If you /do/ want to use primitiveCalloutWithArgs instead of 120 then primitivePropertyFlagsForSpur: & primitivePropertyFlagsForV3: are going to get more complicated. The system is currently set up for the FFI plugin to be unloaded, and hence the system made secure, by not shipping the FFI plugin. But including a reference to primitiveCalloutWithArgs in the main VM needs to be done carefully to avoid having to link the FFI plugin into the VM. So this should be done with e.g. a self cppIf: PharoVM ifTrue: ... idiom.
best,
Eliot |
In reply to this post by EstebanLM
Hi Esteban,
On Mon, Jun 22, 2015 at 10:32 AM, Esteban Lorenzano <[hidden email]> wrote:
What Tim said. Both the Interpreter and the Cogit go to some lengths to cache named primitive calls so that the complex linking machinery is used only on the first call. In particular primitiveDoNamedPrimitiveWithArgs is used only in the debugger to call named primitives. It *isn't* used at all in a normal named primitive call. Only primitiveExternalCall is used on the *first* invocation of a named primitive. Subsequent invocations likely invoke the primitive directly. If the named primitive gets dropped from the method cache (or for generated code) but the plugin module has not been unloaded, then the function pointer will still be in the named primitive's first literal and will be fetched from there, and the method lookup cache updated to point to the function pointer, avoiding any need to look up the name in the plugin module.
Like I said earlier in the thread, the issue is likely not to do with numbers. It is to do with providing the right flags as the result of primitivePropertyFlagsForSpur: & primitivePropertyFlagsForV3: so that the Cogit can generate code that is robust in the face of callbacks. See SimpleStackBasedCogit>>#compileInterpreterPrimitive: HTH P.S. Great that you're working on the FFI!!
best,
Eliot |
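The first-call linking Eliot describes above can be sketched schematically like this (a hypothetical Python sketch with invented names, not the actual VM source — the real machinery caches a C function pointer in the method lookup cache and the method's first literal):

```python
# Hypothetical sketch of named-primitive linking: the first call pays for a
# lookup by name in a plugin's export table; the resolved function is then
# cached at the call site so later calls are a plain indirect call.
# (All names invented for illustration; not the actual VM source.)

def prim_less_than(a, b):
    return a < b

# Toy "plugin export table", searched by name on the first call only.
EXPORTS = {"primitiveLessThanLargeIntegers": prim_less_than}

class CallSite:
    """Per-call-site cache, playing the role of the method's first literal."""
    def __init__(self, name):
        self.name = name
        self.cached = None          # unlinked until the first invocation

    def call(self, a, b):
        if self.cached is None:     # slow path: first invocation only
            self.cached = EXPORTS[self.name]
        return self.cached(a, b)    # fast path: call through the cached pointer

site = CallSite("primitiveLessThanLargeIntegers")
print(site.call(3, 5))  # True  (lookup happens on this call)
print(site.call(7, 2))  # False (cached, no lookup)
```

Unloading the plugin corresponds to clearing `cached`, after which the next call pays for the lookup again.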
In reply to this post by Eliot Miranda-2
I will not change anything until I actually find a reason to do it… and if I find a reason, I will discuss it here, so do not worry about it. Right now I’m just wondering, because I’m writing the NB-to-FFI backend and I’m thinking of better ways to do it… and since this is all while we wait for the new FFI implementation, most probably everything can wait… but it would be nice to actually know the numbers, instead of just guessing… :) Esteban
|
In reply to this post by Eliot Miranda-2
On Mon, Jun 22, 2015 at 9:35 AM, David T. Lewis <[hidden email]> wrote:
On Mon, Jun 22, 2015 at 10:40 AM, Eliot Miranda <[hidden email]> wrote:
Wow, it is indeed a significant difference. Substituting the return address must incur all sorts of cost on an x86 CPU. Here are 2 x 6 runs:

| i | i := SmallInteger maxVal + 1.
(1 to: 6) collect: [:j| {[1 to: 10000000 do: [:k| i numberedLessThan: i]] timeToRun. [1 to: 10000000 do: [:k| i namedLessThan: i]] timeToRun}]

#(#(191 283) #(211 375) #(281 405) #(300 411) #(281 421) #(296 409))
#(#(186 267) #(201 273) #(210 364) #(294 410) #(313 400) #(292 405))

So the overhead is of the order of (100 ms / 10,000,000) per call, i.e. around 10 ns per named primitive call. Interesting :-)
best,
Eliot |
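Eliot's per-call estimate can be checked mechanically (Python used here purely for the arithmetic; the timings are the first run set quoted in his message):

```python
# Per-call overhead of the named primitive, from the quoted timings:
# each pair is (numbered, named) milliseconds for 10,000,000 calls.
runs = [(191, 283), (211, 375), (281, 405), (300, 411), (281, 421), (296, 409)]
calls = 10_000_000

mean_extra_ms = sum(named - numbered for numbered, named in runs) / len(runs)
extra_ns_per_call = mean_extra_ms * 1e6 / calls   # ms -> ns, then per call

print(f"{mean_extra_ms:.0f} ms per {calls:,} calls")      # 124 ms per 10,000,000 calls
print(f"{extra_ns_per_call:.1f} ns per named-prim call")  # 12.4 ns per named-prim call
```

So the mean gap in that run set is ~12 ns per call, consistent with Eliot's "around 10 ns" order-of-magnitude figure.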
On 22-06-2015, at 11:05 AM, Eliot Miranda <[hidden email]> wrote: > | i | i := SmallInteger maxVal + 1. > (1 to: 6) collect: [:j| {[1 to: 10000000 do: [:k| i numberedLessThan: i]] timeToRun. [1 to: 10000000 do: [:k| i namedLessThan: i]] timeToRun}] > > #(#(191 283) #(211 375) #(281 405) #(300 411) #(281 421) #(296 409)) #(#(186 267) #(201 273) #(210 364) #(294 410) #(313 400) #(292 405)) And on a Pi2/Cog we get typical results of 2250 2650 - so 400 ms per 10^7 calls, implying about 40 ns per call. Which is about 35 instructions on a Pi2’s CPU. So really, not going to actually make any interesting contribution to the total time for an FFI call and the library routine it calls. It will be interesting to work out exactly why there is any difference at all. Just don’t hold your breath. tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim Strange OpCodes: RSC: Rewind System Clock |
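Tim's Pi 2 arithmetic works out as follows (a quick check; the ~900 MHz clock and roughly one instruction per cycle are assumptions about the Pi 2, not figures from the thread):

```python
# Pi2/Cog: ~2250 ms (numbered) vs ~2650 ms (named) for 10,000,000 calls.
extra_s = (2650 - 2250) / 1000   # 400 ms extra, in seconds
calls = 10_000_000

ns_per_call = extra_s / calls * 1e9
print(ns_per_call)               # 40.0

# At an assumed ~900 MHz clock (0.9 cycles per ns), that is roughly:
cycles_per_call = ns_per_call * 0.9
print(round(cycles_per_call))    # 36 -- i.e. "about 35 instructions"
```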
In reply to this post by Eliot Miranda-2
On Mon, Jun 22, 2015 at 11:05:30AM -0700, Eliot Miranda wrote: > On Mon, Jun 22, 2015 at 9:35 AM, David T. Lewis <[hidden email]> wrote: > > > > > That sounds right to me too. But it would be a worthwhile experiment to > > set up a test to confirm it. Maybe take one or more methods that call > > numbered primitives, and recode them to call the primitives by name. Then > > measure and see if anything got slower. > > > > Dave > > > > On Mon, Jun 22, 2015 at 10:40 AM, Eliot Miranda <[hidden email]> > wrote: > > > Hi Esteban, > > > > you can set up a test using the LargeInteger comparison primitives. > > They're both named and numberd. e.g. > > > > 23 primitiveLessThanLargeIntegers > > > > So you can write e.g. > > > > LargePositiveInteger>>#< anInteger > > "Primitive. Compare the receiver with the argument and answer true if > > the receiver is less than the argument. Otherwise answer false. Fail if the > > argument is not a SmallInteger or a LargePositiveInteger less than > > 2-to-the-30th (1073741824). > > Optional. See Object documentation whatIsAPrimitive." > > > > <primitive: 23> > > ^super < anInteger > > > > as > > > > numberedLessThan: anInteger > > "Primitive. Compare the receiver with the argument and answer true if > > the receiver is less than the argument. Otherwise answer false. Fail if the > > argument is not a SmallInteger or a LargePositiveInteger less than > > 2-to-the-30th (1073741824). > > Optional. See Object documentation whatIsAPrimitive." > > > > <primitive: 23> > > ^super < anInteger > > > > > > namedLessThan: anInteger > > "Primitive. Compare the receiver with the argument and answer true if > > the receiver is less than the argument. Otherwise answer false. Fail if the > > argument is not a SmallInteger or a LargePositiveInteger less than > > 2-to-the-30th (1073741824). > > Optional. See Object documentation whatIsAPrimitive." 
> > <primitive: 'primitiveLessThanLargeIntegers'> > > ^super < anInteger > > > > and test it with two suitable large integers. Will you report back? I'd > > like to know the answer. Named primitive invocation should be slightly > > slower. As Clément says, a return different address is written to the > > stack, overwriting the primitive code, but that return path is essentially > > the same as for numbered primtiives. So I expect that there will be almost > > no measurable difference. > > > > Wow, it is indeed a significant difference. Substituting the return > address must invoke all sorts of cost in an x86 cpu. Here are 2 x 5 runs > > > | i | i := SmallInteger maxVal + 1. > (1 to: 6) collect: [:j| {[1 to: 10000000 do: [:k| i numberedLessThan: i]] > timeToRun. [1 to: 10000000 do: [:k| i namedLessThan: i]] timeToRun}] > > #(#(191 283) #(211 375) #(281 405) #(300 411) #(281 421) #(296 409)) > #(#(186 267) #(201 273) #(210 364) #(294 410) #(313 400) #(292 405)) > > So the overhead is of the order of (100ms / 10,000,000) per call. e.g. > around 10ns per named primitive call. Interesting :-) On an interpreter VM, the results are as Tim and I initially expected: | i | i := SmallInteger maxVal + 1. (1 to: 6) collect: [:j| {[1 to: 10000000 do: [:k| i numberedLessThan: i]] timeToRun. [1 to: 10000000 do: [:k| i namedLessThan: i]] timeToRun}] ==> #(#(791 789) #(793 794) #(793 790) #(791 791) #(790 794) #(795 789)) With a Cog VM, the numbered primitives are significantly faster: | i | i := SmallInteger maxVal + 1. (1 to: 6) collect: [:j| {[1 to: 10000000 do: [:k| i numberedLessThan: i]] timeToRun. [1 to: 10000000 do: [:k| i namedLessThan: i]] timeToRun}] ==> #(#(542 670) #(542 668) #(544 678) #(546 680) #(540 666) #(540 680)) On Mon, Jun 22, 2015 at 07:16:43PM +0200, Clément Bera wrote: > > Well it also depends if the primitive is generated by the JIT.
If you rewrite SmallInteger>>#+ from primitive 1 to a named primitive the overhead will be more important than just the searching/loading/linking because the JIT won't compile it to n-code anymore. So maybe this is the reason for the difference. Note: this is with an old Cog VM, because I lost my primary PC and can't restore it right now. But I think the results are relevant WRT this discussion. /usr/local/lib/squeak/4.0-2776/squeak Croquet Closure Cog VM [CoInterpreter VMMaker.oscog-eem.331] Dave |
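The named-minus-numbered gaps in Dave's two run sets can be summarized as follows (Python, doing arithmetic only on the quoted numbers):

```python
# Mean named-minus-numbered gap per run set, converted to ns per call.
# Pairs are (numbered, named) milliseconds for 10,000,000 calls.
interp = [(791, 789), (793, 794), (793, 790), (791, 791), (790, 794), (795, 789)]
cog    = [(542, 670), (542, 668), (544, 678), (546, 680), (540, 666), (540, 680)]

def gap_ns(runs, calls=10_000_000):
    mean_ms = sum(named - numbered for numbered, named in runs) / len(runs)
    return mean_ms * 1e6 / calls       # ms -> ns, then per call

print(f"interpreter: {gap_ns(interp):+.2f} ns/call")  # -0.10: no measurable gap
print(f"cog:         {gap_ns(cog):+.2f} ns/call")     # +13.13: real, but tiny
```

This matches Dave's reading: no difference under the interpreter, a measurable but small per-call gap under Cog.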
On 23.06.2015, at 02:58, David T. Lewis <[hidden email]> wrote: > On an interpreter VM, the results are as Tim and I initially expected: > > | i | i := SmallInteger maxVal + 1. > (1 to: 6) collect: [:j| {[1 to: 10000000 do: [:k| i numberedLessThan: i]] > timeToRun. [1 to: 10000000 do: [:k| i namedLessThan: i]] timeToRun}] > > ==> #(#(791 789) #(793 794) #(793 790) #(791 791) #(790 794) #(795 789)) > > With a Cog VM, the numbered primitives are significantly faster: > > | i | i := SmallInteger maxVal + 1. > (1 to: 6) collect: [:j| {[1 to: 10000000 do: [:k| i numberedLessThan: i]] > timeToRun. [1 to: 10000000 do: [:k| i namedLessThan: i]] timeToRun}] > > ==> #(#(542 670) #(542 668) #(544 678) #(546 680) #(540 666) #(540 680)) Looks like SqueakJS may need some caching for named prim lookup: ((54 535 ) (42 541 ) (42 536 ) (49 542 ) (44 527 ) (44 530 ) ) - Bert - |
Hi Bert, On Jun 23, 2015, at 3:38 AM, Bert Freudenberg <[hidden email]> wrote: > On 23.06.2015, at 02:58, David T. Lewis <[hidden email]> wrote: >> On an interpreter VM, the results are as Tim and I initially expected: >> >> | i | i := SmallInteger maxVal + 1. >> (1 to: 6) collect: [:j| {[1 to: 10000000 do: [:k| i numberedLessThan: i]] >> timeToRun. [1 to: 10000000 do: [:k| i namedLessThan: i]] timeToRun}] >> >> ==> #(#(791 789) #(793 794) #(793 790) #(791 791) #(790 794) #(795 789)) >> >> With a Cog VM, the numbered primitives are significantly faster: >> >> | i | i := SmallInteger maxVal + 1. >> (1 to: 6) collect: [:j| {[1 to: 10000000 do: [:k| i numberedLessThan: i]] >> timeToRun. [1 to: 10000000 do: [:k| i namedLessThan: i]] timeToRun}] >> >> ==> #(#(542 670) #(542 668) #(544 678) #(546 680) #(540 666) #(540 680)) > > Looks like SqueakJS may need some caching for named prim lookup: > > ((54 535 ) (42 541 ) (42 536 ) (49 542 ) (44 527 ) (44 530 ) ) I think this is an inlined apples vs oranges comparison :). To be that fast surely the JS VM has optimized away the numbered primitive send entirely, leaving only the named primitive send, so this isn't the difference between the two at all. > - Bert - > |
On 23.06.2015, at 18:23, Eliot Miranda <[hidden email]> wrote: > > Hi Bert, > > On Jun 23, 2015, at 3:38 AM, Bert Freudenberg <[hidden email]> wrote: > >> On 23.06.2015, at 02:58, David T. Lewis <[hidden email]> wrote: >>> On an interpreter VM, the results are as Tim and I initially expected: >>> >>> | i | i := SmallInteger maxVal + 1. >>> (1 to: 6) collect: [:j| {[1 to: 10000000 do: [:k| i numberedLessThan: i]] >>> timeToRun. [1 to: 10000000 do: [:k| i namedLessThan: i]] timeToRun}] >>> >>> ==> #(#(791 789) #(793 794) #(793 790) #(791 791) #(790 794) #(795 789)) >>> >>> With a Cog VM, the numbered primitives are significantly faster: >>> >>> | i | i := SmallInteger maxVal + 1. >>> (1 to: 6) collect: [:j| {[1 to: 10000000 do: [:k| i numberedLessThan: i]] >>> timeToRun. [1 to: 10000000 do: [:k| i namedLessThan: i]] timeToRun}] >>> >>> ==> #(#(542 670) #(542 668) #(544 678) #(546 680) #(540 666) #(540 680)) >> >> Looks like SqueakJS may need some caching for named prim lookup: >> >> ((54 535 ) (42 541 ) (42 536 ) (49 542 ) (44 527 ) (44 530 ) ) > > > I think this is an inlined apples vs oranges comparison :). To be that fast surely the JS VM has optimized away the numbered primitive send entirely, leaving only the named primitive send, so this isn't the difference between the two at all. Haha, no. I removed two zeroes from the iteration count ;) - Bert - |
On Tue, Jun 23, 2015 at 12:33 PM, Bert Freudenberg <[hidden email]> wrote: --
ROTFL :) best,
Eliot |
In reply to this post by Bert Freudenberg
On Tue, Jun 23, 2015 at 09:33:44PM +0200, Bert Freudenberg wrote: > > On 23.06.2015, at 18:23, Eliot Miranda <[hidden email]> wrote: > > > > I think this is an inlined apples vs oranges comparison :). To be that fast surely the JS VM has optimized away the numbered primitive send entirely, leaving only the named primitive send, so this isn't the difference between the two at all. > > Haha, no. I removed two zeroes from the iteration count ;) Aha, so this performance optimization stuff is not so hard after all. All we have to do is get rid of the extraneous zeros and everything just goes faster. I never really understood why we needed all those zeros in the first place. If we just keep getting rid of them, there is no limit to the potential performance improvements! Dave |
On 23-06-2015, at 4:43 PM, David T. Lewis <[hidden email]> wrote: > I never really understood why we needed all those zeros in the first place. > If we just keep getting rid of them, there is no limit to the potential > performance improvements! Oh but there is - as soon as we hit infinite performance everything crashes instantly. tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim Klingon Code Warrior:- 9) "A TRUE Klingon warrior does not comment his code!" |
On Tue, Jun 23, 2015 at 04:45:42PM -0700, tim Rowledge wrote: > > On 23-06-2015, at 4:43 PM, David T. Lewis <[hidden email]> wrote: > > I never really understood why we needed all those zeros in the first place. > > If we just keep getting rid of them, there is no limit to the potential > > performance improvements! > > Oh but there is - as soon as we hit infinite performance everything crashes instantly. > I'm sure you realize that I was baiting you with that remark ;-) I don't think that a crash would be inevitable, but I will resist the temptation of speculating in the realm of physics and astronomy regarding the likely outcome of a binary star with the zeros removed. Dave |