Just read that announcement (https://pharoweekly.wordpress.com/2020/01/29/ann-large-image-generator/) on Pharo Weekly.

Does anyone know more about the precise goal of that project? What is the exact purpose of those images, and what are we trying to test? Garbage collection? The VM's behavior under stress? Limitations of Pharo on a specific OS? A reference against which future versions will be benchmarked for performance? Iceberg and code management performance? How base classes react when they have to deal with millions of objects (e.g. a Bag with 15 million objects, forking 15000 processes, creating 20000 semaphores, etc.)?

--
-----------------
Benoît St-Jean
Yahoo! Messenger: bstjean
Twitter: @BenLeChialeux
Pinterest: benoitstjean
Instagram: Chef_Benito
IRC: lamneth
GitHub: bstjean
Blogue: endormitoire.wordpress.com
"A standpoint is an intellectual horizon of radius zero". (A. Einstein)
Hi Benoit,
the main idea, as always, is to have better tests. We have seen, and many users have reported, that Pharo does not behave correctly in some cases when handling large images. When I talk about large images, I mean images with a lot of code and/or a lot of data (yes, for us code is data, but I am drawing the distinction because if we are testing senders/implementors we don't care about having a 10GB ByteArray).

We have also seen that to test these scenarios correctly we need good images, because depending on its characteristics an image stresses different parts of the system. As you correctly said, we need to improve the tools (Calypso, System Navigation, Spotter, Iceberg, etc.) and the infrastructure (GC, Compiler, VM in general). So we need to generate a lot of different images with different characteristics.

Also, as you correctly mentioned, we need to generate images with static behavior and with dynamic behavior (e.g., lots of concurrent processes).

The main goal of the project is to start a collection of existing solutions and to add new ones for generating these synthetic images. We also want to generate images that reproduce the nature of real images. For example, it is not enough to generate random method selectors if we are testing how they are indexed. The performance of a good index varies with the nature of the text. We need to generate random methods following rules a developer would use, for example using a given word more often or using real words.

So basically, we started by collecting the algorithms that are easy to implement, and we will add more. Of course, the project is open to contributions from anyone and to different usage scenarios. The two I have implemented are the ones we are using this week to solve three issues: (1) improving the startup time, (2) improving the detection of deprecated methods, (3) improving the analysis of big literal methods. I have finished (1); we are still working on the other two.
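A minimal sketch of the "real words, skewed frequencies" idea described above, written in Python only to keep it self-contained. It is not the project's generator: the vocabulary, the weights, and the function names are all made up for illustration, and a real run would presumably draw its words from actual Pharo selectors.

```python
import random

# Hypothetical vocabulary: real words a developer might use, with made-up
# relative frequencies (more common words get larger weights).
VOCABULARY = {
    "get": 30, "set": 25, "add": 20, "remove": 15, "update": 12,
    "item": 18, "name": 16, "value": 14, "index": 10, "size": 9,
    "node": 6, "buffer": 4, "visitor": 3, "morph": 2, "semaphore": 1,
}


def random_selector(rng, min_words=1, max_words=4):
    """Build one camelCase unary selector from weighted real words."""
    words = list(VOCABULARY)
    weights = list(VOCABULARY.values())
    count = rng.randint(min_words, max_words)
    chosen = rng.choices(words, weights=weights, k=count)
    # The first word stays lowercase; the following words are capitalized.
    return chosen[0] + "".join(word.capitalize() for word in chosen[1:])


def generate_selectors(n, seed=42):
    """Return n distinct selectors, sorted; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    selectors = set()
    attempts = 0
    while len(selectors) < n:
        attempts += 1
        if attempts > 100 * max(n, 1):
            # The small vocabulary only allows a limited number of distinct names.
            raise ValueError("vocabulary too small for the requested count")
        selectors.add(random_selector(rng))
    return sorted(selectors)


if __name__ == "__main__":
    for selector in generate_selectors(10):
        print(selector)
```

Because frequent words such as "get" and "item" dominate, many generated selectors share prefixes, which is closer to what an index over real code has to cope with than uniformly random strings.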
Also, I would like to add an implementation of the work of Clement and Sophie to generate images that stress the GC.

If you have ideas or things to add, let's share them!!

Cheers,
Pablo

On Wed, Jan 29, 2020 at 6:48 PM Benoit St-Jean via Pharo-users <[hidden email]> wrote:
> Just read that announcement [...]

--
Pablo Tesone.
[hidden email]
Thanks for your quick & detailed answer Pablo!
I have a big list of ideas; all I need to do now is type up my notes! I wasn't sure whether you wanted to test the performance of a specific part/aspect of Pharo/VM, hence my question to clarify all that!

Is there a preferred image version (8.x or 9.x) you're targeting? Do you need/want tests that might only apply to P8, or is your starting point P9? Are you considering tests that use Morph/UI objects as well?

P.S. I just installed the latest Pharo Launcher and it rocks (long story short, don't ask: I had tons of problems previously because of the accented "i" in my first name, stuff that hadn't been removed when I uninstalled, etc., and I'm on Sh*tdows 10!), so I'll start by moving all my stuff into ONE place (thanks, Pharo Launcher!!!!) and after that experiment a little bit with Iceberg! In the meantime, would you consider contributions in the form of a fileOut/mcz file?

On 2020-01-29 13:30, [hidden email] wrote:
> Hi Benoit,
> the main idea as always is to have better tests. [...]

--
Benoît St-Jean
In reply to this post by tesonep@gmail.com
Pablo,
Here's my quick list of ideas for generating gigantic images...

The easiest way to fill the image quickly is to load huge packages that can also generate tons of data. The first ones that come to mind are Moose, Roassal, Seaside, BioSmalltalk, PolyMath, Marea, Magma (not sure if it is still supported, though) and Dr. Geo.

Another possibility is to use tools that can import tons of data into the image: there are plenty of gigantic stress-test XML schemas out there that we could load/read with one of our XML readers.

Another good candidate to torture the GC would be the FHCP challenge: the participants had to solve gigantic graphs and, luckily for us, the winners in 2016 were from Inria, so someone there must have the data to load those graphs (and perhaps the algorithms!). DeepTraverser could probably do the job here.

Other obvious tests involve pushing the language/VM to its limits: how does Pharo react if we flood the image with 4 million symbols? Or create a hierarchy of 30000 classes? Or have a package with 2000 tags? Is there any limit to the size of a method (say, we create a method with 5000 literals)? Or create a gazillion classes, each with the maximum possible number of instance variables? What if we create a class that has a reference to every class in the image and see how the dependency checker copes with that?

Also, there are a lot of objects that are treated "specially" in the image. How does the VM/GC react when there's a gazillion of them? Semaphores, blocks, symbols, points, processes, weak objects, etc.

We could also generate a humongous graph/collection/whatever that could be loaded quickly with Fuel instead of having to create those objects from scratch every time.

P.S. I have found quite a few discussions on the Squeak & Pharo mailing lists regarding problems with large images, and I've collected the references to those threads. I'd be more than happy to share those with you in private if you're interested!
I've also found lots of references to GC benchmarks/torture tests used by other garbage-collected languages, and I think many of them could also apply to our memory model...

On 2020-01-29 13:30, [hidden email] wrote:
> The main goal of the project is to start a collection of existing
> solutions and to add new ones for generating these synthetic images. We
> also want to generate images that reproduce the nature of real images. For
> example, it is not enough to generate random method selectors if we
> are testing how they are indexed. The performance of a good index varies
> with the nature of the text. We need to generate random
> methods following rules a developer would use, for example using
> a given word more often or using real words.

--
Benoît St-Jean
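One of the ideas listed in the message above, a hierarchy of 30000 classes, can be planned outside the image before anything is compiled. The sketch below is only an illustration, written in Python: the `LIGStress` prefix, the `Generated-Stress` package name, and both function names are hypothetical. It lays the classes out as a tree (a fan-out of 1 gives a single deep chain) and renders each one as a Pharo 8-style class-definition expression that could be filed in. How the tools and the VM cope with the result is exactly what such an image would be used to find out.

```python
def hierarchy_plan(class_count, fanout=3, prefix="LIGStress"):
    """Return (class_name, superclass_name) pairs forming a fanout-ary tree.

    Class 0 inherits from Object; class i (i > 0) inherits from class
    (i - 1) // fanout, so every parent appears before its children.
    """
    if class_count < 0 or fanout < 1:
        raise ValueError("class_count must be >= 0 and fanout >= 1")
    # Zero-pad the numeric suffix so that names sort in creation order.
    width = len(str(max(class_count - 1, 0)))
    names = [f"{prefix}{i:0{width}d}" for i in range(class_count)]
    plan = []
    for i, name in enumerate(names):
        parent = "Object" if i == 0 else names[(i - 1) // fanout]
        plan.append((name, parent))
    return plan


def class_definition(name, superclass, package="Generated-Stress"):
    """Render one pair as the source text of a Pharo class definition."""
    return (
        f"{superclass} subclass: #{name}\n"
        f"    instanceVariableNames: ''\n"
        f"    classVariableNames: ''\n"
        f"    package: '{package}'"
    )


if __name__ == "__main__":
    # Print the first few definitions of a 30000-class plan.
    for name, parent in hierarchy_plan(30000)[:3]:
        print(class_definition(name, parent))
        print()
```

Keeping the plan as plain data makes it easy to vary the shape (wide and shallow versus narrow and deep) without touching the code that creates the classes.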