Large Image Generator

Pharo Smalltalk Users mailing list
Just read that announcement
(https://pharoweekly.wordpress.com/2020/01/29/ann-large-image-generator/)
on Pharo Weekly.

Does anyone know more about the precise goal of that project?  What is the
exact purpose of those images and what are we trying to test?  Garbage
collection?  The VM behavior under stress? Limitations of Pharo on a
specific OS? A reference against which future versions will be
benchmarked for performance? Iceberg and code management performance?
How base classes react when they have to deal with millions of objects
(e.g. a Bag with 15 million objects, forking 15000 processes, creating
20000 semaphores, etc) ?

--
-----------------
Benoît St-Jean
Yahoo! Messenger: bstjean
Twitter: @BenLeChialeux
Pinterest: benoitstjean
Instagram: Chef_Benito
IRC: lamneth
GitHub: bstjean
Blogue: endormitoire.wordpress.com
"A standpoint is an intellectual horizon of radius zero".  (A. Einstein)



Re: Large Image Generator

tesonep@gmail.com
Hi Benoit,
 the main idea, as always, is to have better tests.
We have seen, and many users have reported, that Pharo does not
behave correctly in some cases when handling large images. When I
talk about large images, I mean images with a lot of code
and/or a lot of data (yes, for us code is data... but I am
making the distinction because if we are testing senders / implementors we
don't care about having a 10GB ByteArray).

We have also seen that, to test these scenarios correctly, we need
good images, because depending on its characteristics an image
stresses different parts of the system. As you correctly said, we need to
improve the tools (Calypso, System Navigation, Spotter, Iceberg, etc.)
and the infrastructure (GC, Compiler, VM in general). So we need to
generate a lot of different images with different characteristics.

Also, as you correctly mentioned we need to generate images with
static behavior and with dynamic behavior (e.g., lots of concurrent
processes).

The main goal of the project is to start a collection of existing
solutions and to add new ones to generate these synthetic images. We
also want to generate images that reproduce the nature of real images. For
example, it is not enough to generate random method selectors if we
are testing how they are indexed. The performance of a good index varies
depending on the nature of the text. We need to generate random
methods following the rules a developer would use, for example using
a given word more often or using real words.
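
To make the point about realistic selectors concrete, here is a minimal
sketch, written in Python for brevity. It is an illustration only, not
the project's implementation. The word list, the 1/rank weighting and the
function names are all assumptions made for the example. It builds
camelCase selectors from real words, drawing common words far more often
than rare ones.

```python
import random
from itertools import accumulate

# Hypothetical vocabulary, ordered from most to least frequent.
# A real generator would mine words and frequencies from existing code.
WORDS = ["value", "name", "add", "remove", "at", "put", "do", "size",
         "index", "collect", "select", "print", "node", "item", "key"]

# Zipf-like weights (an assumption): the word at rank r has weight 1/r,
# so a few words dominate, as they tend to in hand-written code.
CUM_WEIGHTS = list(accumulate(1.0 / rank for rank in range(1, len(WORDS) + 1)))


def make_selector(rng, max_words=3):
    """Build one camelCase unary selector from 1..max_words weighted words."""
    count = rng.randint(1, max_words)
    words = rng.choices(WORDS, cum_weights=CUM_WEIGHTS, k=count)
    return words[0] + "".join(w.capitalize() for w in words[1:])


def make_selectors(n, seed=42):
    """Return n selectors; the same seed always gives the same list."""
    rng = random.Random(seed)
    return [make_selector(rng) for _ in range(n)]
```

Because the generator is seeded, the same set of selectors can be
regenerated for every benchmark run, which keeps index measurements
comparable between images.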

So basically, we started by collecting the algorithms that are easy to
implement and we will add more. Of course, it is open to contributions from
anyone and to different usage scenarios. The two I have implemented
are the ones we are using this week to solve three issues: (1)
improving the startup time, (2) improving the detection of deprecated
methods, (3) improving the analysis of big literal methods. I have
finished 1; the other two are in progress.

Also, I would like to add an implementation of the work of Clement and
Sophie to generate images to stress the GC.

If you have ideas or things to add let's share them!!

Cheers,
Pablo


--
Pablo Tesone.
[hidden email]


Re: Large Image Generator

Pharo Smalltalk Users mailing list
Thanks for your quick & detailed answer Pablo!

I have a big list of ideas, all I need to do now is to type my notes! I
wasn't sure if you wanted to test performance of a specific part/aspect
of Pharo/VM hence my question to clarify all that!

Is there any preferred image (8.x or 9.x) you're looking for? Do you
need/want tests that might only apply to P8, or is your starting point
P9? Are you considering tests that use Morph/UI objects as well?

P.S. I just installed the latest Pharo Launcher and it rocks (long story
short, don't ask, I had tons of problems previously because of the
accented "i" in my first name, stuff that hadn't been removed when I
uninstalled, etc. and I'm on Sh*tdows 10 !) so I'll first start by
moving all my stuff into ONE place (Thanks Pharo Launcher!!!!) and after
that experiment a little bit with Iceberg! In the meantime, would you
consider contributions in the form of a fileOut/mcz file ?

--
-----------------
Benoît St-Jean



Re: Large Image Generator

Pharo Smalltalk Users mailing list
In reply to this post by tesonep@gmail.com
Pablo,

Here's my quick list of ideas to generate gigantic images...

The easiest way to fill the image quickly is to load huge packages that
can also generate tons of data. The first ones that come to mind are Moose,
Roassal, Seaside, BioSmalltalk, PolyMath, Marea, Magma (not sure if it
is still supported though) and Dr. Geo. Another possibility is to use
tools that can import tons of data into the image: there are tons of
gigantic stress-test XML schemas out there that we could load/read with
one of our XML readers.

Another good candidate to torture the GC would be the FHCP challenge:
they had to solve gigantic graphs and, luckily for us, the winners in
2016 were from Inria so someone there must have the data to load those
graphs (and perhaps the algorithms!). DeepTraverser could probably do
the job here.

Other obvious tests involve pushing the language/VM to its limits: how
does Pharo react if we flood the image with 4 million symbols??? Or
create a hierarchy of 30000 classes? Or have a package with 2000 tags?
Is there any limit to the size of a method (say, we create a method
with 5000 literals)? Or create a gazillion classes, each with the
maximum number of instance variables possible? What if we create a class
that has a reference to every class in the image and see how the
dependency checker copes with that?
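
As a rough sketch of how the "hierarchy of 30000 classes" case could be
scripted, the Python below emits class definitions as text that could
then be evaluated or filed into an image. It is an illustration under
stated assumptions, not part of the announced project. The class prefix
and package name are hypothetical. It also assumes the target image
accepts the classic `subclass:instanceVariableNames:classVariableNames:package:`
definition message, and it reads "hierarchy" as a single inheritance
chain, the worst case for depth.

```python
def class_definitions(depth, package="LargeImage-Generated", prefix="LIGNode"):
    """Yield one class-definition expression per class, as a single chain.

    Each class subclasses the previous one, so the last class sits
    `depth` levels below Object. `package` and `prefix` are hypothetical.
    """
    superclass = "Object"
    for level in range(1, depth + 1):
        name = f"{prefix}{level}"
        yield (f"{superclass} subclass: #{name}\n"
               f"\tinstanceVariableNames: ''\n"
               f"\tclassVariableNames: ''\n"
               f"\tpackage: '{package}'.")
        superclass = name


def write_definitions(path, depth):
    """Write all definitions to a text file, one expression per paragraph."""
    with open(path, "w", encoding="utf-8") as out:
        for definition in class_definitions(depth):
            out.write(definition + "\n\n")
```

Emitting text keeps the generator independent of any particular image
version. The same loop could be adapted to produce wide hierarchies or
classes with many instance variables.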

Also, there are a lot of objects that are treated "specially" in the
image. How does the VM/GC react when there are a gazillion of them?
Semaphores, blocks, symbols, points, processes, weak objects, etc.

We could also generate a humongous graph/collection/whatever that could
be loaded fast with Fuel instead of having to create those objects from
scratch every time.

P.S. I have found quite a few discussions on the Squeak & Pharo mailing
lists regarding problems with large images and I've collected the
references to those threads. I'd be more than happy to share those with
you in private if you're interested! I've also found lots of references
to GC benchmarks/torture-tests used by other languages with GC and I
think lots of them could also apply to our memory model...


On 2020-01-29 13:30, [hidden email] wrote:
> The main goal of the project is to start a collection of existing
> solutions and to add new ones to generate these synthetic images. We
> also want to generate images that reproduce the nature of real images. For
> example, it is not enough to generate random method selectors if we
> are testing how they are indexed. The performance of a good index varies
> depending on the nature of the text. We need to generate random
> methods following the rules a developer would use, for example using
> a given word more often or using real words.

--
-----------------
Benoît St-Jean