Chris Muller uploaded a new version of Kernel to project The Inbox:
http://source.squeak.org/inbox/Kernel-cmm.1198.mcz

==================== Summary ====================

Name: Kernel-cmm.1198
Author: cmm
Time: 23 November 2018, 11:12:47.414703 pm
UUID: fe228ca8-2ec7-4432-b3d9-76da98be4475
Ancestors: Kernel-eem.1197

- Suggestion that #basicClass should be inlined while #class should be a message send, so that Proxy's can be supported.
- If so, then #xxxClass can be banished.
- With #xxxClass banished, the Squeak code that called it can be written normally, simply as "class".

=============== Diff against Kernel-eem.1197 ===============

Item was added:
+ ----- Method: Object>>basicClass (in category 'class membership') -----
+ basicClass
+ 	"Primitive. Answer the object which is the receiver's class. Essential. See
+ 	Object documentation whatIsAPrimitive."
+
+ 	<primitive: 111>
+ 	self primitiveFailed!

Item was changed:
  ----- Method: Object>>class (in category 'class membership') -----
  class
+ 	"Answer the object which is the receiver's class. Essential."
- 	"Primitive. Answer the object which is the receiver's class. Essential. See
- 	Object documentation whatIsAPrimitive."

+ 	^ self basicClass!
- 	<primitive: 111>
- 	self primitiveFailed!

Item was changed:
  ----- Method: Object>>storeDataOn: (in category 'objects from disk') -----
  storeDataOn: aDataStream
  	"Store myself on a DataStream. Answer self. This is a low-level DataStream/ReferenceStream method. See also objectToStoreOnDataStream. NOTE: This method must send 'aDataStream beginInstance:size:' and then (nextPut:/nextPutWeak:) its subobjects. readDataFrom:size: reads back what we write here."
  	| cntInstVars cntIndexedVars |

  	cntInstVars := self class instSize.
  	cntIndexedVars := self basicSize.
  	aDataStream
+ 		beginInstance: self class
- 		beginInstance: self xxxClass
  		size: cntInstVars + cntIndexedVars.
  	1 to: cntInstVars do:
  		[:i | aDataStream nextPut: (self instVarAt: i)].

  	"Write fields of a variable length object. When writing to a dummy
  	stream, don't bother to write the bytes"
  	((aDataStream byteStream class == DummyStream) and: [self class isBits]) ifFalse: [
  		1 to: cntIndexedVars do:
  			[:i | aDataStream nextPut: (self basicAt: i)]].
  !

Item was removed:
- ----- Method: Object>>xxxClass (in category 'class membership') -----
- xxxClass
- 	"For subclasses of nil, such as ObjectOut"
- 	^ self class!
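For readers following the proposal, here is a minimal sketch of the kind of proxy it is meant to enable. The class and variable names (ExampleProxy, target) are hypothetical, not part of Kernel-cmm.1198, and the code is untested; it only behaves as intended once #class is an ordinary, overridable message send rather than an inlined bytecode.

"A hypothetical forwarding proxy (sketch)."
ProtoObject subclass: #ExampleProxy
	instanceVariableNames: 'target'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Example-Proxies'.

ExampleProxy >> class
	"Masquerade as the proxied object's class; only effective if senders
	of #class are compiled as real sends."
	^ target class

ExampleProxy >> doesNotUnderstand: aMessage
	"Forward everything else to the proxied object."
	^ aMessage sendTo: target

With the current inlined bytecode, a sender compiled as "anObject class" sees ExampleProxy itself, no matter what the method above answers.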
On Sat, 24 Nov 2018, [hidden email] wrote:
> Chris Muller uploaded a new version of Kernel to project The Inbox:
> http://source.squeak.org/inbox/Kernel-cmm.1198.mcz
>
> ==================== Summary ====================
>
> Name: Kernel-cmm.1198
> Author: cmm
> Time: 23 November 2018, 11:12:47.414703 pm
> UUID: fe228ca8-2ec7-4432-b3d9-76da98be4475
> Ancestors: Kernel-eem.1197
>
> - Suggestion that #basicClass should be inlined while #class should be a message send, so that Proxy's can be supported.

It won't work while the special bytecode for #class is compiled. And even
after that, you have to recompile all senders of #class to make it use
the primitive and the new method instead of optimizing it away.

> - If so, then #xxxClass can be banished.
> - With #xxxClass banished, the Squeak code that called it can be written normally, simply as "class".

That won't work either for the same reason. And we do not want to remove
the bytecode, do we?

Levente
> > Chris Muller uploaded a new version of Kernel to project The Inbox:
> > http://source.squeak.org/inbox/Kernel-cmm.1198.mcz
> >
> > ==================== Summary ====================
> >
> > Name: Kernel-cmm.1198
> > Author: cmm
> > Time: 23 November 2018, 11:12:47.414703 pm
> > UUID: fe228ca8-2ec7-4432-b3d9-76da98be4475
> > Ancestors: Kernel-eem.1197
> >
> > - Suggestion that #basicClass should be inlined while #class should be a message send, so that Proxy's can be supported.
>
> It won't work while the special bytecode for #class is compiled. And even
> after that, you have to recompile all senders of #class to make it use
> the primitive and the new method instead of optimizing it away.

Right.  Assuming we can achieve consensus with Eliot, and the next
Squeak will have a new VM, then that would be called from an MC post
script.

But what do you mean make all senders of #class use the primitive?
Just as you suggested the use of #ensureNonProxiedReceiver from the
other thread, the intention here is that #basicClass would better
document those performance-critical places, while leaving the majority
(of non-critical ones) sending #class, so it can be overridable.

Do you think the system would be noticeably slower if all the sends to
#class became a message send?  I'm skeptical that it would, but I have
no idea.  I am surprised to see we have so many senders of #class in
trunk, but I have a feeling most are rarely ever called.

Removing those bytecodes from my CompiledMethods is above my knowledge
level, but if you could help me come up with a script, I'd be
interested in testing and playing around to learn more.

> > - If so, then #xxxClass can be banished.
> > - With #xxxClass banished, the Squeak code that called it can be written normally, simply as "class".
>
> That won't work either for the same reason. And we do not want to remove
> the bytecode, do we?

Not remove it, redirect it to #basicClass.

This is a reasonable and familiar pattern, right?  It provides users
full control and WYSIWYG between source and bytecodes due to a crystal
clear selector name.  No magic.

- Chris
On Sat, 24 Nov 2018, Chris Muller wrote:
>>> Chris Muller uploaded a new version of Kernel to project The Inbox:
>>> http://source.squeak.org/inbox/Kernel-cmm.1198.mcz
>>>
>>> ==================== Summary ====================
>>>
>>> Name: Kernel-cmm.1198
>>> Author: cmm
>>> Time: 23 November 2018, 11:12:47.414703 pm
>>> UUID: fe228ca8-2ec7-4432-b3d9-76da98be4475
>>> Ancestors: Kernel-eem.1197
>>>
>>> - Suggestion that #basicClass should be inlined while #class should be a message send, so that Proxy's can be supported.
>>
>> It won't work while the special bytecode for #class is compiled. And even
>> after that, you have to recompile all senders of #class to make it use
>> the primitive and the new method instead of optimizing it away.
>
> Right.  Assuming we can achieve consensus with Eliot, and the next
> Squeak will have a new VM, then that would be called from an MC post
> script.

I don't see what kind of VM changes are necessary here. Care to elaborate?

> But what do you mean make all senders of #class use the primitive?

Currently, when you compile a method containing a send of #class, the
compiler will generate a special bytecode for it (199).
When the interpreter/jit sees this bytecode, it will not perform a send
nor a primitive; it'll just look up the class of the receiver and place it
on top of the stack.
You can see this in action by removing the sole implementor of #class
from your image without any effects. That method is only there for
consistency; it is never executed.
So, while the bytecode is in use, it doesn't matter what you do with the
#class method, because it will never be sent.

> Just as you suggested the use of #ensureNonProxiedReceiver from the
> other thread, the intention here is that #basicClass would better
> document those performance-critical places, while leaving the majority
> (of non-critical ones) sending #class, so it can be overridable.

See above.

> Do you think the system would be noticeably slower if all the sends to
> #class became a message send?  I'm skeptical that it would, but I have

Yes, the bytecode is way quicker than the primitive or a primitive + a
send which is exactly what you suggested.
Also, removing the bytecode will make #class lose its atomicity. Any code
that relies on that behavior will silently break. This pretty much applies
to all special selectors (see SmalltalkImage >> #specialSelectors).

> no idea.  I am surprised to see we have so many senders of #class in
> trunk, but I have a feeling most are rarely ever called.

I doubt that. People don't sprinkle #class sends for no reason, do they?

> Removing those bytecodes from my CompiledMethods is above my knowledge
> level, but if you could help me come up with a script, I'd be
> interested in testing and playing around to learn more.

VariableNode has a class variable named StdSelectors. It contains the
selectors for which custom bytecodes are generated. Removing #class from
there should be enough.

>>> - If so, then #xxxClass can be banished.
>>> - With #xxxClass banished, the Squeak code that called it can be written normally, simply as "class".
>>
>> That won't work either for the same reason. And we do not want to remove
>> the bytecode, do we?
>
> Not remove it, redirect it to #basicClass.

Right, but while the bytecode is in effect, you just can't redirect it.

Levente
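Levente's point about the special bytecode can be checked with standard image tools; a sketch (the exact symbolic output depends on the image's bytecode set):

"Object>>isKindOf: sends 'self class'; its symbolic listing shows the
special class bytecode rather than an ordinary send."
(Object >> #isKindOf:) symbolic.

"For comparison, ordinary sends show up as literal send entries."
(Object >> #printString) symbolic.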
Hi Levente,
>> But what do you mean make all senders of #class use the primitive?
>
> Currently, when you compile a method containing a send of #class, the
> compiler will generate a special bytecode for it (199).
> When the interpreter/jit sees this bytecode, it will not perform a send
> nor a primitive; it'll just look up the class of the receiver and place it
> on top of the stack.

Great!  Does that mean this can be accomplished solely in the image by
making the compiler generate 199 when #basicClass is sent, and just
the normal "send" bytecode for sends to #class?

>> Do you think the system would be noticeably slower if all the sends to
>> #class became a message send?  I'm skeptical that it would, but I have
>
> Yes, the bytecode is way quicker than the primitive or a primitive + a
> send which is exactly what you suggested.

It saves one send.  One.  That's only infinitesimally quicker:
_________
{ [ 1 xxxClass ] bench.
  [ 1 class ] bench. }

----> #('99,000,000 per second. 10.1 nanoseconds per run.'
'126,000,000 per second. 7.93 nanoseconds per run.')
________

2 nanoseconds per send faster.  Inconsequential in any real-world
sense.  Furthermore, as soon as the message sent to the class does
*any work* whatsoever, that good-sounding 27% improvement is quickly
wiped out.  Look how much of the gain is lost doing as little as
creating one single Rectangle from another one:

___________
"Compare creating a single Rectangle with inlined #class vs. a
(proposed) message-send of #class."
| someRectangle |
someRectangle := 100@50 corner: 320@200.
{ [ someRectangle xxxClass origin: someRectangle topLeft corner: someRectangle bottomRight ] bench.
  [ someRectangle class origin: someRectangle topLeft corner: someRectangle bottomRight ] bench. }

---> #('37,200,000 per second. 26.9 nanoseconds per run.'
'38,000,000 per second. 26.3 nanoseconds per run.')
____________

Real-world gain by the inlined send was reduced to... whew!  I just
had to go learn about "picosecond" because nanoseconds aren't even
small enough to measure the improvement.

So, amplify.  Crank it up to 100K:
__________
"Compare creating 100,000 Rectangles with inlined #class vs. a
message-send of #class."
| someRectangle |
someRectangle := 100@50 corner: 320@200.
{ [ 100000 timesRepeat: [ someRectangle xxxClass origin: someRectangle topLeft corner: someRectangle bottomRight ] ] bench.
  [ 100000 timesRepeat: [ someRectangle class origin: someRectangle topLeft corner: someRectangle bottomRight ] ] bench. }

---> #('364 per second. 2.75 milliseconds per run.'
'369 per second. 2.71 milliseconds per run.')
_________

Nothing times 100K is still nothing.

> Also, removing the bytecode will make #class lose its atomicity. Any code
> that relies on that behavior will silently break.

If THAT exists it needs a more intention-revealing selector than
#class that would let his peers know atomicity mattered there.
#basicClass is his friend.

>> ... I am surprised to see we have so many senders of #class in
>> trunk, but I have a feeling most are rarely ever called.
>
> I doubt that. People don't sprinkle #class sends for no reason, do they?

Sorry, I should not have said "ever".  I was trying to say the system
probably spends most of its time sending to instance-side methods rather
than class-side methods.

>> Not remove it, redirect it to #basicClass.
>
> Right, but while the bytecode is in effect, you just can't redirect
> it.

I'm racking my brain trying to understand this -- sorry...  By
"redirect" I just meant change the Compiler to generate bytecode 199
for sends to #basicClass, and just the regular "send" bytecode for
sends to #class.  Then, recompile all methods.  Would that work?

>> This is a reasonable and familiar pattern, right?  It provides users
>> full control and WYSIWYG between source and bytecodes due to a crystal
>> clear selector name.  No magic.

So, if
  performance is not really hurt, and
  we can keep sending #class if so insisted, and
  we still have #basicClass, just in case, together
  delineating an elegant seam between system-level vs. user-level access
  in a classic Smalltalky way that even *I* can understand and use,
  and give Squeak better Proxy support that helps Magma
then
  would you let me have this?

You have a skill for making performance considerations to degrees
that I never even would have fathomed, and this has resulted in
immense performance benefits for Squeak.  I do wish you liked Magma,
because I'm sure you could _obliterate_ many inefficiencies in the
code and design.  But if not, I hope you can at least appreciate that
the value proposition of this proposal is worth it.

- Chris
Hi Chris,
On Sun, 25 Nov 2018, Chris Muller wrote:

> Hi Levente,
>
>>> But what do you mean make all senders of #class use the primitive?
>>
>> Currently, when you compile a method containing a send of #class, the
>> compiler will generate a special bytecode for it (199).
>> When the interpreter/jit sees this bytecode, it will not perform a send
>> nor a primitive; it'll just look up the class of the receiver and place it
>> on top of the stack.
>
> Great!  Does that mean this can be accomplished solely in the image by
> making the compiler generate 199 when #basicClass is sent, and just
> the normal "send" bytecode for sends to #class?
>
>>> Do you think the system would be noticeably slower if all the sends to
>>> #class became a message send?  I'm skeptical that it would, but I have
>>
>> Yes, the bytecode is way quicker than the primitive or a primitive + a
>> send which is exactly what you suggested.
>
> It saves one send.  One.  That's only infinitesimally quicker:
> _________
> { [ 1 xxxClass ] bench.
>   [ 1 class ] bench. }
>
> ----> #('99,000,000 per second. 10.1 nanoseconds per run.'
> '126,000,000 per second. 7.93 nanoseconds per run.')
> ________
>
> 2 nanoseconds per send faster.  Inconsequential in any real-world
> sense.  Furthermore, as soon as the message sent to the class does
> *any work* whatsoever, that good-sounding 27% improvement is quickly
> wiped out.  Look how much of the gain is lost doing as little as
> creating one single Rectangle from another one:
>
> ___________
> "Compare creating a single Rectangle with inlined #class vs. a
> (proposed) message-send of #class."
> | someRectangle |
> someRectangle := 100@50 corner: 320@200.
> { [ someRectangle xxxClass origin: someRectangle topLeft corner: someRectangle bottomRight ] bench.
>   [ someRectangle class origin: someRectangle topLeft corner: someRectangle bottomRight ] bench. }
>
> ---> #('37,200,000 per second. 26.9 nanoseconds per run.'
> '38,000,000 per second. 26.3 nanoseconds per run.')
> ____________
>
> Real-world gain by the inlined send was reduced to... whew!  I just
> had to go learn about "picosecond" because nanoseconds aren't even
> small enough to measure the improvement.
>
> So, amplify.  Crank it up to 100K:
> __________
> "Compare creating 100,000 Rectangles with inlined #class vs. a
> message-send of #class."
> | someRectangle |
> someRectangle := 100@50 corner: 320@200.
> { [ 100000 timesRepeat: [ someRectangle xxxClass origin: someRectangle topLeft corner: someRectangle bottomRight ] ] bench.
>   [ 100000 timesRepeat: [ someRectangle class origin: someRectangle topLeft corner: someRectangle bottomRight ] ] bench. }
>
> ---> #('364 per second. 2.75 milliseconds per run.'
> '369 per second. 2.71 milliseconds per run.')
> _________
>
> Nothing times 100K is still nothing.

That's not the right way to measure things that are so quick, because the
overhead of block activation is comparable to the runtime of the code
inside the block. Also, #timesRepeat: is not a good choice for
measurements for the very same reason: block creation + lots of block
activation.
Also, the nearby bytecodes affect what the JIT does. When more things can
be executed without performing a send, the overall performance gains
will be higher.

>> Also, removing the bytecode will make #class lose its atomicity. Any code
>> that relies on that behavior will silently break.
>
> If THAT exists it needs a more intention-revealing selector than
> #class that would let his peers know atomicity mattered there.
> #basicClass is his friend.

All special selectors do the same, e.g. #==, #ifNil:, #ifTrue:. Do you
think all of those need #basicXXX methods?

>>> ... I am surprised to see we have so many senders of #class in
>>> trunk, but I have a feeling most are rarely ever called.
>>
>> I doubt that. People don't sprinkle #class sends for no reason, do they?
>
> Sorry, I should not have said "ever".  I was trying to say the system
> probably spends most of its time sending to instance-side methods rather
> than class-side methods.

It's a common pattern to have instance-independent code on the class side.
Quick access to that is always a good thing.

>>> Not remove it, redirect it to #basicClass.
>>
>> Right, but while the bytecode is in effect, you just can't redirect
>> it.
>
> I'm racking my brain trying to understand this -- sorry...  By
> "redirect" I just meant change the Compiler to generate bytecode 199
> for sends to #basicClass, and just the regular "send" bytecode for
> sends to #class.  Then, recompile all methods.  Would that work?

It might work, but you would need to identify and rewrite senders of
#class which rely on the presence of the bytecode. In my image there are
2174 senders, which is simply too much to review in my opinion.

I did some measurements and found that the JIT makes the numbered
primitive almost as quick as the bytecode. The slowdown is only about 10%.
Your suggestion, which is send + bytecode, is about 85% slower and loses
the atomicity of the message. So, you'd better leave the implementation of
#class as it is right now, because that would be quicker and would
preserve the atomicity as long as nothing overrides it.

>>> This is a reasonable and familiar pattern, right?  It provides users
>>> full control and WYSIWYG between source and bytecodes due to a crystal
>>> clear selector name.  No magic.
>
> So, if
>   performance is not really hurt, and
>   we can keep sending #class if so insisted, and
>   we still have #basicClass, just in case, together
>   delineating an elegant seam between system-level vs. user-level access
>   in a classic Smalltalky way that even *I* can understand and use,
>   and give Squeak better Proxy support that helps Magma
> then
>   would you let me have this?

As I wrote a few emails earlier, I'd rather have a "switch" for this
than force it on everyone who doesn't use proxies at all (I presume that's
the current majority of Squeak users).

Levente
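For anyone who wants to reproduce the senders count Levente mentions, a sketch using Squeak's standard navigation tools should be enough (the number will vary by image and loaded packages):

"Count the methods in the image that send #class."
(SystemNavigation default allCallsOn: #class) size.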
Hi Levente,
Just a reminder, the original question I asked was:

>>>> Do you think the system would be noticeably slower if all the sends to
>>>> #class became a message send?  ...

and your response:

>>> Yes, the bytecode is way quicker than the primitive or a primitive + a
>>> send which is exactly what you suggested.

So even though you answered a different question, I was still curious
about your claim, and remembered that you're one who has liked to
communicate with benchmarks.  That's why I ran and presented them to you,
but I'm not sure if we're interpreting the results relative to my question
or some other question...

>> It saves one send.  One.  That's only infinitesimally quicker:
>> _________
>> { [ 1 xxxClass ] bench.
>>   [ 1 class ] bench. }
>>
>> ----> #('99,000,000 per second. 10.1 nanoseconds per run.'
>> '126,000,000 per second. 7.93 nanoseconds per run.')
>> ________
>>
>> 2 nanoseconds per send faster.  Inconsequential in any real-world
>> sense.  Furthermore, as soon as the message sent to the class does
>> *any work* whatsoever, that good-sounding 27% improvement is quickly
>> wiped out.  Look how much of the gain is lost doing as little as
>> creating one single Rectangle from another one:
>>
>> ___________
>> "Compare creating a single Rectangle with inlined #class vs. a
>> (proposed) message-send of #class."
>> | someRectangle |
>> someRectangle := 100@50 corner: 320@200.
>> { [ someRectangle xxxClass origin: someRectangle topLeft corner: someRectangle bottomRight ] bench.
>>   [ someRectangle class origin: someRectangle topLeft corner: someRectangle bottomRight ] bench. }
>>
>> ---> #('37,200,000 per second. 26.9 nanoseconds per run.'
>> '38,000,000 per second. 26.3 nanoseconds per run.')
>> ____________
>>
>> Real-world gain by the inlined send was reduced to... whew!  I just
>> had to go learn about "picosecond" because nanoseconds aren't even
>> small enough to measure the improvement.
>>
>> So, amplify.  Crank it up to 100K:
>> __________
>> "Compare creating 100,000 Rectangles with inlined #class vs. a
>> message-send of #class."
>> | someRectangle |
>> someRectangle := 100@50 corner: 320@200.
>> { [ 100000 timesRepeat: [ someRectangle xxxClass origin: someRectangle topLeft corner: someRectangle bottomRight ] ] bench.
>>   [ 100000 timesRepeat: [ someRectangle class origin: someRectangle topLeft corner: someRectangle bottomRight ] ] bench. }
>>
>> ---> #('364 per second. 2.75 milliseconds per run.'
>> '369 per second. 2.71 milliseconds per run.')
>> _________
>>
>> Nothing times 100K is still nothing.
>
> That's not the right way to measure things that are so quick, because the
> overhead of block activation is comparable to the runtime of the code
> inside the block. Also, #timesRepeat: is not a good choice for
> measurements for the very same reason: block creation + lots of block
> activation.
> Also, the nearby bytecodes affect what the JIT does. When more things can
> be executed without performing a send, the overall performance gains
> will be higher.

There are three benchmarks, did you notice the first two?

- The first one measures the single-unit cost of #xxxClass over
#class.  This captures your theoretical maximum benefit of 27%, which
is terrible, because it can't come close to that in real code.

- The second demonstrates how 90% of that 27% benefit is wiped out
with no more than a single simple allocation -- what the vast majority
of class methods are responsible for.

- The third one measures "real world impact", and shows that this
particular in-line doesn't help the system in any way that helps any
human anywhere.

>>> Also, removing the bytecode will make #class lose its atomicity. Any code
>>> that relies on that behavior will silently break.
>>
>> If THAT exists it needs a more intention-revealing selector than
>> #class that would let his peers know atomicity mattered there.
>> #basicClass is his friend.
>
> All special selectors do the same, e.g. #==, #ifNil:, #ifTrue:. Do you
> think all of those need #basicXXX methods?

No, just #class.  An identity-check should be an identity-check, even
against a Proxy.  And does that example help illustrate how using #==
when you DON'T need an identity-check is a breakage of encapsulation?
It makes false assumptions and enforces type-conformance in a system
that wants to be empowered by messaging.

>>>> ... I am surprised to see we have so many senders of #class in
>>>> trunk, but I have a feeling most are rarely ever called.
>>>
>>> I doubt that. People don't sprinkle #class sends for no reason, do they?
>>
>> Sorry, I should not have said "ever".  I was trying to say the system
>> probably spends most of its time sending to instance-side methods rather
>> than class-side methods.
>
> It's a common pattern to have instance-independent code on the class side.
> Quick access to that is always a good thing.

It's still quick!  Levente, I challenge you to back up your claim by
identifying any one single method in the image which reports even only
a meaningfully better *bench* performance (much less real-world) by
calling it via #class instead of #xxxClass.

Anything whose performance matters at the level of one send is going to
use #basicClass anyway, just like we may have a few places where we send
#basicNew instead of #new.

>>>> Not remove it, redirect it to #basicClass.
>>>
>>> Right, but while the bytecode is in effect, you just can't redirect
>>> it.
>>
>> I'm racking my brain trying to understand this -- sorry...  By
>> "redirect" I just meant change the Compiler to generate bytecode 199
>> for sends to #basicClass, and just the regular "send" bytecode for
>> sends to #class.  Then, recompile all methods.  Would that work?
>
> It might work, but you would need to identify and rewrite senders of
> #class which rely on the presence of the bytecode. In my image there are
> 2174 senders, which is simply too much to review in my opinion.

I repeat my challenge above!

> I did some measurements and found that the JIT makes the numbered
> primitive almost as quick as the bytecode. The slowdown is only about 10%.
> Your suggestion, which is send + bytecode, is about 85% slower and loses
> the atomicity of the message. So, you'd better leave the implementation of
> #class as it is right now, because that would be quicker and would
> preserve the atomicity as long as nothing overrides it.

Huh?  No, you're only 27% faster in the *benchmark*, but near zero in
anything real-world.

My challenge above stands.  I would love to be wrong, so I could shed
my suspicion of whether this is about something else not mentioned...
:(

>>>> This is a reasonable and familiar pattern, right?  It provides users
>>>> full control and WYSIWYG between source and bytecodes due to a crystal
>>>> clear selector name.  No magic.
>>
>> So, if
>>   performance is not really hurt, and
>>   we can keep sending #class if so insisted, and
>>   we still have #basicClass, just in case, together
>>   delineating an elegant seam between system-level vs. user-level access
>>   in a classic Smalltalky way that even *I* can understand and use,
>>   and give Squeak better Proxy support that helps Magma
>> then
>>   would you let me have this?
>
> As I wrote a few emails earlier, I'd rather have a "switch" for this
> than force it on everyone who doesn't use proxies at all (I presume that's
> the current majority of Squeak users).

Whoa, hold on there.  You only ever made one argument -- "performance"
-- which was obliterated by the benchmarks.  Squeezing 27% more out of
a microbench of something called 0.0001% of the time results in no
benefit to anyone anywhere.

I see MY position as the pro-user position, and yours as the... pro
fastest-lab-result position, which hurts this Squeak user.  I'm sad that
that alone isn't enough to support this.  :(
_______
Do you remember when Behavior>>#new didn't always make a call to
#initialize?  But at a time when Squeak was 10X slower than it is now,
the people then had the wisdom to understand that the computer and
software exist to eventually serve _users_, and that not spiting users
just to save one single send, even when that send was a much greater
percentage of impact back then, was still way worth it.
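The precedent Chris appeals to has the same shape as the proposed split: the overridable message is an ordinary send built on a primitive "basic" one. Paraphrased from current Squeak, so treat the exact wording as approximate:

"Existing pattern (paraphrased): #new is overridable, #basicNew is the primitive."
Behavior >> new
	^ self basicNew initialize

"Proposed pattern from Kernel-cmm.1198:"
Object >> class
	^ self basicClass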
In reply to this post by Levente Uzonyi
> That's not the right way to measure things that are so quick, because the
> overhead of block activation is comparable to the runtime of the code
> inside the block.

I get you, but that it's so hard to even write such a test indicates
that real-world code also needs to do a lot of block activations, and so
this quickly dilutes the density of calls to #class.  The only way I
could think of was to just cut-and-paste the block innards 100 times and
measure the degradation from the baseline (single):

{ [ 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass.
    1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass.
    1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass.
    1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass.
    1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass.
    1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass.
    1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass.
    1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass.
    1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass.
    1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. 1 xxxClass. ] bench.
  [ 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class.
    1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class.
    1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class.
    1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class.
    1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class.
    1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class.
    1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class.
    1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class.
    1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class.
    1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. 1 class. ] bench. }

#('2,780,000 per second. 360 nanoseconds per run.'
'5,590,000 per second. 179 nanoseconds per run.')

So 100X more density of calls to #xxxClass degraded the performance from
27% slower to 50% slower.  So the real question is how dense the calls to
#class are, and whether they are mostly from only a few senders which
could retain the optimization by #basicClass.  It would be an interesting
experiment.  Pointless, though, if there's no chance of swaying you.
In reply to this post by Chris Muller-4
Hi Chris,
This conversation is getting off track, so let's take a step back and try
something different.

I had suggested a solution to you: the "switch", but you never mentioned
how it worked for you. Perhaps my explanation wasn't clear. Let me just
give you a snippet which does exactly what I suggested. Please try it in
your image (one without Kernel-cmm.1198 loaded) and let me know whether it
solved your problem or not:

(ParseNode classPool at: #StdSelectors) removeKey: #class.
Compiler recompileAll.

Levente

P.S.: Here's the benchmark I used to get my numbers:

runs := (1 to: 5) collect: [ :e |
	{
		[ 1 to: 50000000 do: [ :i | i class class class class class class class class class class ] ] timeToRun.
		[ 1 to: 50000000 do: [ :i | i classPrimitive classPrimitive classPrimitive classPrimitive classPrimitive classPrimitive classPrimitive classPrimitive classPrimitive classPrimitive ] ] timeToRun.
		[ 1 to: 50000000 do: [ :i | i classSend classSend classSend classSend classSend classSend classSend classSend classSend classSend ] ] timeToRun.
		[ 1 to: 50000000 do: [ :i | i ] ] timeToRun } ].
cleanRuns := runs collect: [ :e | (e - e last) allButLast ].
primitiveVsByteCode := (cleanRuns collect: [ :e | e second / e first ]) average printShowingMaxDecimalPlaces: 2.
sendVsByteCode := (cleanRuns collect: [ :e | e third / e first ]) average printShowingMaxDecimalPlaces: 2.

Where Object >> #classPrimitive is

classPrimitive
	"Primitive. Answer the object which is the receiver's class. Essential. See
	Object documentation whatIsAPrimitive."

	<primitive: 111>
	self primitiveFailed

And Object >> #classSend is

classSend
	"Primitive. Answer the object which is the receiver's class. Essential. See
	Object documentation whatIsAPrimitive."

	^self class