Hi,
Nile is a complete reimplementation of the Squeak stream hierarchy. It is based on traits.

You can find it on Universe.

New features include:

- Constructors on collections, such as #nsReadStream or #streamContents:.
- Better performance than Squeak for #next, #next:, #nextPut: and #nextPutAll: with Strings (you can run the benchmarks yourself using NSBenchmarks>>#launchBenchmarks). Thanks to Klaus D. Witzel, Mathieu Suen, Roel Wuyts and Andrew P. Black, who helped me a lot.
- Many libraries built on the Nile core: generic buffers, byte reading/writing methods, a thread-safe transcript (Stéphane Ducasse wrote it), file-based streams...
- No more dependencies: Nile is self-contained (the previous dependencies were only for tests and were not needed).
- NSMetrics computes some metrics on Nile and on the Squeak stream hierarchy so that the two can be compared.
- 322 tests

--
Damien Cassou
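For readers who have not used stream constructors, here is a minimal workspace sketch of the idea, using only stock Squeak selectors. Nile's own #nsReadStream is named in the announcement but is not used here, and the variable names are made up for the illustration.

```smalltalk
"Stock Squeak only; Nile's #nsReadStream is not used in this sketch."
| written reader firstThree following |

"Build a String by writing to a stream that exists only inside the block."
written := String streamContents: [:stream |
	stream nextPutAll: 'Nile'; nextPut: $!].

"Read it back with two of the operations the benchmarks measure."
reader := ReadStream on: written.
firstThree := reader next: 3.	"the next three elements, as a String"
following := reader next.	"the single element after them, a Character"
```

After evaluation, written holds 'Nile!', firstThree holds 'Nil' and following holds $e.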
Thank you Damien :)
I just unloaded the older Nile version with MC, then loaded Nile-All from SqueakSource. During the load it references STranscript, which is left undeclared?

/Klaus

P.S. use the squeak-dev 120 image for the above.

On Sat, 09 Jun 2007 12:28:37 +0200, Damien Cassou wrote:
> Hi,
>
> Nile is a complete reimplementation of the squeak stream hierarchy.
> It's based on traits.
> [...]
In reply to this post by Damien Cassou-3
On 9 June 07, at 12:28, Damien Cassou wrote:

> - Better performances than Squeak for #next #next: #nextPut: and
> #nextPutAll: with Strings (you can run the benchmarks yourself using
> NSBenchmarks>>#launchBenchmarks). Thanks to Klaus D. Witzel, Mathieu
> Suen, Roel Wuyts and Andrew P. Black who helped me a lot.

Can you tell us more? How much?

> - Adds a lot of libraries based on Nile core. there are generic
> buffers, byte reading/writing methods, a thread-safe transcript
> (Stéphane Ducasse wrote it), file-based streams...

Did you change the history class to wrap or not? :)

> - No more dependencies: Nile is self contained (previous dependencies
> was only for tests and not needed).
> - NSMetrics computes some metrics on Nile and the Squeak stream
> hierarchy to compare.
> - 322 tests

Good.
In reply to this post by Klaus D. Witzel
2007/6/9, Klaus D. Witzel <[hidden email]>:
> During load it references STranscript which is left undeclared?

This is a global variable (the only one) that mimics the Transcript variable. It is normal for it to be undeclared when you load Nile; it will be declared when necessary.

Thank you for the report.

--
Damien Cassou
In reply to this post by stephane ducasse
2007/6/9, stephane ducasse <[hidden email]>:
> > - Better performances than Squeak for #next #next: #nextPut: and
> > #nextPutAll: with Strings (you can run the benchmarks yourself using
> > NSBenchmarks>>#launchBenchmarks). Thanks to Klaus D. Witzel, Mathieu
> > Suen, Roel Wuyts and Andrew P. Black who helped me a lot.
>
> Can you tell us more? How much?

You could have run them yourself. See NSBenchmarks>>launchBenchmarks. Here are my results:

next
Squeak: 41.5 operations/sec
Nile: 45
8% faster

next:
Squeak: 98.2
Nile: 158.2
38% faster

nextPut:
Squeak: 24.9
Nile: 42.4
41% faster

nextPutAll:
Squeak: 115.9
Nile: 120.2
4% faster

> > - Adds a lot of libraries based on Nile core. there are generic
> > buffers, byte reading/writing methods, a thread-safe transcript
> > (Stéphane Ducasse wrote it), file-based streams...
>
> did you change the history class to wrap or not :)

ooops :-)

--
Damien Cassou
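A worked check of the percentages above may help when comparing them with other runs. The sketch below assumes that "N% faster" means the gain divided by Nile's rate; that reading is inferred from the four posted figures, not taken from the NSBenchmarks source. The variable names are made up for the illustration.

```smalltalk
"Assumption: 'N% faster' = (Nile - Squeak) / Nile. Inferred from the posted
 numbers only; NSBenchmarks may compute it differently."
| squeak nile relativeToNile relativeToSqueak |
squeak := 98.2.	"#next: rate posted for Squeak, operations/sec"
nile := 158.2.	"#next: rate posted for Nile, operations/sec"

relativeToNile := ((nile - squeak) / nile * 100) rounded.	"38, the posted figure"
relativeToSqueak := ((nile - squeak) / squeak * 100) rounded.	"61 with Squeak as the baseline"
```

Under that assumption the same formula also gives 8, 41 and 4 for the other three rows. With Squeak's rate as the baseline, the #next: gain would read as 61% rather than 38%, so it is worth stating the baseline whenever such figures are quoted.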
Hi Damien -
Interesting benchmarks. I think there is something to be learned from running micro benchmarks so here are a few comments:

Damien Cassou wrote:
> next
> Squeak: 41.5 operations/sec
> Nile: 45
> 8% faster

On my machine:

Squeak result: 39.1
Nile result: 32.2
Comparison: -18%

The real "lesson" of this benchmark is to only trust your micro benchmarks as far as you can throw them ;-) Given that both implementations use primitiveNext and given that the code is designed to guarantee a hit in the VM's at-cache there shouldn't be any difference whatsoever.

The difference we're seeing here (which made me investigate the matter more closely) should also be an indicator that there may be something wrong with the benchmarking process. Running it repeatedly gives:

Squeak: 38.9 35.6 30.5 39.2 31.8
Nile:   36.2 33.9 37.9 40.8 31.3
Delta:   -7%  -5%  19%   4%  -1%

meaning there is a 20% difference within five runs which seems extremely high for a microbenchmark that's just a primitive call. I think you'll have to fix the benchmarks to give more consistent results if you want to make any claims about relative speed improvements.

> next:
> Squeak: 98.2
> Nile: 158.2
> 38% faster

On my machine:

Squeak result: 90.3
Nile result: 130.5
Comparison: 31%

The lesson here is (at least for me) that after trying to wrap my brain around the code in NSStringReader>>next: I'm willing to give up the speed. A nice example of what can be done if you understand byte code execution but by no means production code (pity the bugger who at some point will need to understand that the "0-position" is required since position cannot occur on the right-hand side of that expression ;-)

Oh, and of course "no comments == not helpful" in particular when it comes to that level of optimization. And while having tests is great, having 73 out of 87 classes without a single line of explanation (class comment) is pretty pathetic.

> nextPut:
> Squeak: 24.9
> Nile: 42.4
> 41% faster

On my machine:

Squeak result: 44.8
Nile result: 71.4
Comparison: 37%

Oddly, this benchmark scores the same with or without the primitive in WriteStream>>nextPut: ... which is pretty strange if you ask me. I have a suspicion that the primitiveNextPut hasn't been used in a long time and may need to be rewhacked to perform properly.

> nextPutAll:
> Squeak: 115.9
> Nile: 120.2
> 4% faster

On my machine:

Squeak result: 117.8
Nile result: 114.8
Comparison: -3%

Not really much of a lesson here other than if both implementations take "reasonable care" they'll likely end up with similar speed.

So all in all, interesting benchmarks but you need to fix the variation issue. With the variations we're seeing, all but the nextPut: benchmark (which I suspect suffers from a broken nextPut: primitive) could fall either way so there isn't really much of a claim to be made here.

Cheers,
- Andreas
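The run-to-run variation in the five repeated #next runs can be put into numbers. The sketch below defines the spread of a series as (best - worst) / best; that definition is an assumption chosen for the illustration, not one given in the message.

```smalltalk
| squeakRuns nileRuns spread squeakSpread nileSpread |
"The five repeated #next results posted above, in operations/sec."
squeakRuns := #(38.9 35.6 30.5 39.2 31.8).
nileRuns := #(36.2 33.9 37.9 40.8 31.3).

"Spread of one series: (best - worst) / best, as a rounded percentage."
spread := [:runs | ((runs max - runs min) / runs max * 100) rounded].

squeakSpread := spread value: squeakRuns.	"22"
nileSpread := spread value: nileRuns.	"23"
```

Each implementation varies by more than 20% against itself, which is larger than three of the four Squeak-versus-Nile differences reported for the same benchmarks.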
Hi Andreas,
thank you very much for your comments. Please help me a bit more by answering these questions:

2007/6/9, Andreas Raab <[hidden email]>:
> The real "lesson" of this benchmark is to only trust your micro
> benchmarks as far as you can throw them ;-) Given that both
> implementations use primitiveNext and given that the code is designed to
> guarantee a hit in the VM's at-cache there shouldn't be any difference
> whatsoever.
>
> The difference we're seeing here (which made me investigate the matter
> more closely) should also be an indicator that there may be something
> wrong with the benchmarking process. Running it repeatedly gives:
>
> Squeak: 38.9 35.6 30.5 39.2 31.8
> Nile:   36.2 33.9 37.9 40.8 31.3
> Delta:   -7%  -5%  19%   4%  -1%
>
> meaning there is a 20% difference within five runs which seems extremely
> high for a microbenchmark that's just a primitive call. I think you'll
> have to fix the benchmarks to give more consistent results if you want
> to make any claims about relative speed improvements.

I noticed these changes too but I didn't understand why they happened. Can you help me investigate please?

> > next:
> > Squeak: 98.2
> > Nile: 158.2
> > 38% faster
>
> On my machine:
>
> Squeak result: 90.3
> Nile result: 130.5
> Comparison: 31%
>
> The lesson here is (at least for me) that after trying to wrap my brain
> around the code in NSStringReader>>next: I'm willing to give up the
> speed. A nice example of what can be done if you understand byte code
> execution but by no means production code (pity the bugger who at some
> point will need to understand that the "0-position" is required since
> position cannot occur on the right-hand side of that expression ;-)

I do not understand this paragraph.

> Oh, and of course "no comments == not helpful" in particular when it
> comes to that level of optimization. And while having tests is great,
> having 73 out of 87 classes without a single line of explanation (class
> comment) is pretty pathetic.

I'm really in favor of a lot of comments. If you look at the main traits, they should be heavily commented. Unfortunately, I had a tight deadline and all the comments I wrote are in an article and not in the code. This will be corrected soon.

> > nextPut:
> > Squeak: 24.9
> > Nile: 42.4
> > 41% faster
>
> On my machine:
>
> Squeak result: 44.8
> Nile result: 71.4
> Comparison: 37%
>
> Oddly, this benchmark scores the same with or without the primitive in
> WriteStream>>nextPut: ... which is pretty strange if you ask me. I have
> a suspicion that the primitiveNextPut hasn't been used in a long time
> and may need to be rewhacked to perform properly.

I noticed that too, and Andrew P. Black wrote me a mail about it:

"
[...] A lot of time was going into WriteStream>>nextPut: , which is not unexpected. But most of that time was being spent in isOctetCharacter, which WAS unexpected. [...]

It seems to me that this means that the primitive is failing. I tried inserting

    PutCount := PutCount + 1.

immediately after the primitive pragma, and then tried printIt on:

    s := String streamContents: [ :str |
        PutCount := 0.
        10000 timesRepeat: [ str nextPut: $q ]] .
    PutCount ==> 10000

So, the primitive is always failing? Why? String new:100 now creates a ByteString. Is it the case that the primitive is still checking for an instance of * String * in the Stream's collection?
"

So I think there is a real problem here. Correcting it will probably make Squeak noticeably faster.

> So all in all, interesting benchmarks but you need to fix the variation
> issue. With the variations we're seeing, all but the nextPut: benchmark
> (which I suspect suffers from a broken nextPut: primitive) could fall
> either way so there isn't really much of a claim to be made here.

In fact, these benchmarks were done to verify that Nile was at least as fast as Squeak. I didn't want people to complain that Nile was slower and that this was due to traits... Now I can say that Nile is as fast as Squeak even with the better design.

I would really appreciate help to improve Nile. Don't forget you can always commit directly if you want to; http://www.squeaksource.com/Nile/ is writeable by anybody.

Bye

--
Damien Cassou
> I noticed these changes too but I didn't understand why they happened.
> Can you help me investigate please?

I guess the main problem is taking wall-clock time for the measurement. This becomes inaccurate on a multi-tasking system if there is other activity in the system. The tests are designed to run for 1s (IIUC); if some fraction of that second was spent in other activity, Squeak would achieve less than it can in a different second.

To some degree, the same holds for activities within Squeak as well; those you could rule out by running the test at a high priority.

The other problem might be garbage collection. Even though you run GC before starting the test, you won't really know whether GC also happened during the test.

The best way to measure in this kind of scenario is to measure instructions. If the VM supported counting the byte code instructions executed, ideally on a per-process basis, you would get a more precise measurement. Failing that, if the processor supports counting cycles (as the x86 processors do with the TSC), and if the system supports accounting of such cycles on a per-thread basis (as Microsoft Vista does), you could use such accounting as a better basis than time passed.

Regards,
Martin
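One inexpensive way to see how much wall-clock noise a measurement carries is to repeat it and compare the runs. The sketch below is an illustration only: the workload block and the choice of five runs are assumptions, not part of NSBenchmarks. A full garbage collection before each run reduces, but does not remove, the chance of a collection during the run.

```smalltalk
| workload timings fastest slowest |
"Hypothetical workload: write 100000 characters to a fresh stream."
workload := [String streamContents: [:stream |
	100000 timesRepeat: [stream nextPut: $q]]].

"Collect garbage before every run, then time the run with the wall clock."
timings := (1 to: 5) collect: [:run |
	Smalltalk garbageCollect.
	Time millisecondsToRun: workload].

"The gap between these two is a rough measure of the noise."
fastest := timings min.
slowest := timings max.
```

If fastest and slowest differ by a large fraction, differences of a few percent between two implementations cannot be read from a single run.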
> The best way to measure in this kind of scenario is to measure
> instructions. If the VM would support counting byte code
> instructions executed, ideally on a per-process basis, you would
> get a more precise measurement.

ContextPart class>>tallyInstructions: aBlock
	"This method uses the simulator to count the number of occurrences of
	each of the Smalltalk instructions executed during evaluation of aBlock.
	Results appear in order of the byteCode set."

--
Lukas Renggli
http://www.lukas-renggli.ch
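A possible way to use this method from a workspace is sketched below. It assumes that the method answers a collection of bytecode -> count associations, which should be checked against the method source in the image at hand; the summing step and the variable names are additions for the illustration.

```smalltalk
| tallies total |
"Assumption: the answer is a collection of (bytecode -> count) associations."
tallies := ContextPart tallyInstructions: [3.14159 printString].

"Total number of bytecodes the simulator executed for the block."
total := tallies inject: 0 into: [:sum :each | sum + each value].
```

Because the block runs in the simulator, the count does not depend on other activity on the machine, but the evaluation itself is much slower than normal execution.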
>> The best way to measure in this kind of scenario is to measure
>> instructions. If the VM would support counting byte code
>> instructions executed, ideally on a per-process basis, you would
>> get a more precise measurement.
>
> ContextPart class>>tallyInstructions: aBlock
> "This method uses the simulator to count the number of occurrences of
> each of the Smalltalk instructions executed during evaluation of
> aBlock.
> Results appear in order of the byteCode set."

Very interesting. It shouldn't be necessary to run each block more than once under this method; ideally, two subsequent runs should give the same numbers.

Interpreting the numbers might be a challenge if one competitor has more instructions of one kind, but fewer instructions of another kind, than the other competitor.

Regards,
Martin
In reply to this post by Damien Cassou-3
Hi -
Damien Cassou wrote:
>> meaning there is a 20% difference within five runs which seems extremely
>> high for a microbenchmark that's just a primitive call. I think you'll
>> have to fix the benchmarks to give more consistent results if you want
>> to make any claims about relative speed improvements.
>
> I noticed these changes too but I didn't understand why they happened.
> Can you help me investigate please?

I'm not entirely sure what's causing this but a couple of things to try are: Make it run for a longer period of time. Just running for ten seconds instead of one should reduce the variation quite a bit. Another thing to try is to increase the priority of the benchmark process to run at "Processor timingPriority - 1". If neither of those helps, have a look at the GC statistics (do Utilities vmStatisticsReportString before and after and look at the delta); if they are wildly different it's a sign that something is wrong.

>> The lesson here is (at least for me) that after trying to wrap my brain
>> around the code in NSStringReader>>next: I'm willing to give up the
>> speed. A nice example for what can be done if you understand byte code
>> execution but by no means production code (pity the bugger who at some
>> point will need to understand that the "0-position" is required since
>> position cannot occur on the right-hand side of that expression ;-)
>
> I do not understand this paragraph.

I just mean that I prefer code like this:

    prior := position.
    position := capacity min: position + amount.
    size := position - prior.

over code like this:

    size := 0 - position + (position := capacity min: position + amount)

(and yes, it's equivalent ;-) The latter is a real brain-teaser if you don't have the former to compare it to, in particular that "0 - position + X" is equivalent to "X - position(before X executed)".

And although the latter will be faster, I strongly prefer the former.

>> > nextPut:
> So, the primitive is always failing? Why? String new:100 now
> creates a ByteString. Is it the case that the primitive is still
> checking for an instance of * String * in the Stream's collection?

I don't know (but it's a great little exercise for someone who wants to learn about the Squeak VM and a bit about optimization).

> In fact, these benchmarks were done to verify that Nile was at least
> as fast as Squeak. I didn't want people to complain because Nile was
> slower and this was due to traits... Now, I can say that Nile is as
> fast as Squeak even with the better design.

Yes, I think given the benchmarks it's completely fair to say that Nile is definitely on par with what we currently have. To be honest, anything else would have come as a big surprise to me, which is why I looked at these benchmarks more carefully - the numbers you posted seemed to indicate that there is a *major* speed advantage in the microbenchmarks (>40%) and I was simply surprised to see such large numbers.

> I would really appreciate help to improve Nile. Don't forget you can
> always commit directly if you want to;
> http://www.squeaksource.com/Nile/ is writeable by anybody.

Thanks, but I was really only interested in the benchmarks. I suspected that there is something to learn from these numbers and indeed there is (like primNextPut: failing).

Cheers,
- Andreas
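The two suggestions above, running at "Processor timingPriority - 1" and comparing the VM statistics before and after, can be combined in a workspace roughly as follows. This is a sketch under assumptions: the workload and its size are made up, and NSBenchmarks may organize its runs differently.

```smalltalk
| statsBefore statsAfter elapsed done |
done := Semaphore new.
statsBefore := Utilities vmStatisticsReportString.

"Run the hypothetical workload in its own process, just below timing priority,
 so that ordinary processes inside the image cannot preempt it."
[elapsed := Time millisecondsToRun: [
		String streamContents: [:stream |
			1000000 timesRepeat: [stream nextPut: $q]]].
	done signal] forkAt: Processor timingPriority - 1.

"Block until the benchmark process has finished."
done wait.
statsAfter := Utilities vmStatisticsReportString.
```

Comparing statsBefore with statsAfter shows how many garbage collections took place during the run. Note that the raised priority only shields the run from other Squeak processes, not from other programs on the machine.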
Hi Andreas,
on Sun, 10 Jun 2007 22:18:53 +0200, you wrote:
> Damien Cassou wrote:
...
>>> The lesson here is (at least for me) that after trying to wrap my brain
>>> around the code in NSStringReader>>next: I'm willing to give up the
>>> speed. A nice example for what can be done if you understand byte code
>>> execution but by no means production code (pity the bugger who at some
>>> point will need to understand that the "0-position" is required since
>>> position cannot occur on the right-hand side of that expression ;-)
>>
>> I do not understand this paragraph.
>
> I just mean that I prefer code like this:
>
>     prior := position.
>     position := capacity min: position + amount.
>     size := position - prior.
>
> over code like this:
>
>     size := 0 - position + (position := capacity min: position + amount)
>
> (and yes, it's equivalent ;-) The latter is a real brain-teaser if you
> don't have the former to compare it to, in particular that "0 - position
> + X" is equivalent to "X - position(before X executed)".
>
> And although the latter will be faster, I strongly prefer the former.

FWIW this was all discussed (and summarized on the NewCompiler list) during Nile development, and Stef suggested that this is a job for a future optimizing compiler, so that code can be entered and maintained in the "needs-absolutely-no-further-comment-to-be-understandable" form that you gave above [translation of the quoted text by me].

OTOH I would have a problem if an employee (a professional developer) came to me and claimed that the [partly] optimized form above is not understandable or not maintainable. There are just very, very basic math equations involved, and doing subexpression elimination back and forth with any piece of code is the minimum I expect from a professional developer (no offense, Squeak => doIt *and* have fun :) especially with a language as simple as Smalltalk :)

/Klaus
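The equivalence of the two forms quoted above can be checked in a workspace. The values below are arbitrary and chosen so that the capacity limit takes effect; the variable names sizeReadable and sizeOptimized are made up for the comparison.

```smalltalk
| capacity amount position prior sizeReadable sizeOptimized |
capacity := 10.
amount := 4.

"Readable form: remember the old position, advance, subtract."
position := 8.
prior := position.
position := capacity min: position + amount.
sizeReadable := position - prior.

"Hand-optimized form: '0 - position' reads the old value first, because
 the receiver is evaluated before the parenthesized argument."
position := 8.
sizeOptimized := 0 - position + (position := capacity min: position + amount).
```

Both variables end up holding 2: the position advances from 8 to the capacity of 10 rather than to 12. Writing the assignment on the left, as in "(position := ...) - position", would read the new value twice and always give 0, which is why the "0 - position" prefix is needed.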
Klaus D. Witzel wrote:
> OTOH I would have a problem when an employee (professional developer)
> would came to me and would claim that the [partly] optimized form above
> is not understandable or not maintainable. There are just very, very
> basic math equations involved and doing subexpression elimination back
> and forth with any piece of code is the minimum I do expect from a
> professional developer (no offense, Squeak => doIt *and* have fun :)
> expecially with a language so simple as Smalltalk :)

It isn't so much the code that I would reject, but rather the fact that there isn't a comment explaining the what and why. In other words, there is nothing wrong with the code per se; what's wrong is that there isn't a comment along the lines of:

"For speed, hand-optimize the following expression:
...
(and yes, the compiler should take care of it but it doesn't
and since the XYZ-benchmark has shown this to be a critical
piece of our application we really need to do something about it)"

So, as your (theoretical) employer I would indeed come to you and claim that the optimized form is not understandable and not maintainable; not because of the code but because of the lack of any comment whatsoever.

Cheers,
- Andreas
On Mon, 11 Jun 2007 00:53:44 +0200, Andreas Raab wrote:
> [...]
> So, as your (theoretical) employer I would indeed come to you and claim
> that the optimized form is not understandable and not maintainable; not
> because of the code but because of the lack of any comment whatsoever.

If you'd ever do that here then I'd immediately promote you to the Chief Quality Officer For Squeak Projects :)

/Klaus
Klaus D. Witzel wrote:
>> So, as your (theoretical) employer I would indeed come to you and
>> claim that the optimized form is not understandable and not
>> maintainable; not because of the code but because of the lack of any
>> comment whatsoever.
>
> If you'd ever do that here then I'd immediately promote you to the Chief
> Quality Officer For Squeak Projects :)

Thank you, I accept.

Cheers,
- Andreas
In reply to this post by Damien Cassou-3
Hi Damien,
I noticed a little mistake on the SqueakMap page of Nile at:
http://map.squeak.org/package/c84b40f9-1e03-4a4c-8005-ec8e33e326e1

In the description paragraph, the link to completehierarchy.pdf is broken (an underscore is missing). It should be:
http://damien.cassou.free.fr/documents/internship_2007/nile/completehierarchy.pdf

On squeaksource, the links are correct. I am telling you about this because I noticed that a search for 'squeak' + 'a project name' on google.com gives you the link to map.squeak.org first, then some links to mailing-list archives (nabble.com, squeakfoundation.org, ASPN...) and only rarely squeaksource.com.

Cheers,

--
Martial

Damien Cassou wrote:
| Hi,
|
| Nile is a complete reimplementation of the squeak stream hierarchy.
| It's based on traits.
| [...]
Hi Martial,
2007/6/21, Martial Boniou <[hidden email]>:
> In the description paragraph, you mention a bad link to the
> completehierarchy.pdf. It should be (underscore missing):
> http://damien.cassou.free.fr/documents/internship_2007/nile/completehierarchy.pdf

IIUC, the underscore is present in the description of the package, but it makes the text bold. Thus, underscores are not displayed. I will ask Goran about this.

> On squeaksource, the links are corrects. I inform you about this because
> I noticed the request 'squeak' + 'a project name' on google.com give you
> first the link to map.squeak.org then some links to mailing-lists
> reports (nabble.com, squeakfoundation.org, ASPN...) and rarely
> squeaksource.com.

Thank you

--
Damien Cassou
A quick solution: a symlink internship2007 -> internship_2007

--
Martial

Damien Cassou wrote:
| IIUC, the underscore is present in the description of the package, but
| it makes the text bold. Thus, underscores are not displayed. I ask
| Goran about this.
In reply to this post by Damien Cassou-3
> Date: Sun, 10 Jun 2007 13:18:53 -0700
> From: [hidden email]
> To: [hidden email]
> Subject: Re: [ANN] Nile 0.9.0
>
> Yes, I think given the benchmarks it's completely fair to say that Nile
> is definitely on par with what we currently have. To be honest,
> everything else would have come as a big surprise to me which is why I
> looked at these benchmarks more carefully - the numbers you posted
> seemed to indicate that there is a *major* speed advantage in the
> microbenchmarks (>40%) and I was simply surprised to see such large numbers.

I wouldn't expect much of a speed-up either (unless the previous implementation was really inefficient), and I didn't think that was the point. I thought the point was simply to clean up collections by using traits. I would just expect it to reduce the number of lines of code to maintain (and hopefully get rid of any #shouldNotImplement messages).
In reply to this post by Damien Cassou-3
> Date: Sun, 10 Jun 2007 15:53:44 -0700
> From: [hidden email]
> To: [hidden email]
> Subject: Re: [ANN] Nile 0.9.0
>
> "For speed, hand-optimize the following expression:
> ...
> (and yes, the compiler should take care of it but it doesn't
> and since the XYZ-benchmark has shown this to be a critical
> piece of our application we really need to do something about it)"

Perfect comment. It explains why, so that years later we can tell whether the reason is still valid.