[ANN] PharoLambda: a demo of Pharo running on AWS Lambda

Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Ben Coman


On Mon, Aug 14, 2017 at 5:55 PM, Guillermo Polito <[hidden email]> wrote:
In a full image (just bootstrapped) we have:

 7.7 MB of arrays (probably in collections, we should check usages)
 6.3 MB of methods
 5.3 MB of ByteArrays
 3.3 MB of ByteStrings

What size do you get when all those ByteStrings are written to a text file and zipped up?
(I'd try myself but I don't have access to Pharo from where I am right now)

Perhaps on image save, all ByteStrings can be converted to ZippedByteStrings and lazily converted back as needed.
Depending on the measured performance hit, "as needed" could be first access, or only when the string needs to be updated, or not at all.
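Here is a minimal sketch of the per-string check such a scheme would need, for a Playground. It assumes the image includes String>>zipped and String>>unzipped, which are gzip-based helpers from Pharo's Compression package. These stand in for the hypothetical ZippedByteString, and the variable names are illustrative. Gzip framing differs from ZipArchive entries, so the break-even size this finds will not exactly match a zip-based measurement.

```smalltalk
"Sketch: how much would compressing only the strings that actually shrink save?
 ZippedByteString is hypothetical; String>>zipped / #unzipped (gzip) stand in for it."
shrinkCount := 0.
bytesSaved := 0.
ByteString allInstances do: [ :each |
	| delta |
	delta := each size - each zipped size.
	delta > 0 ifTrue: [
		shrinkCount := shrinkCount + 1.
		bytesSaved := bytesSaved + delta ] ].
shrinkCount -> (bytesSaved // 1024).  "number of strings that shrink -> KB saved"
```

Converting back lazily relies on (aString zipped unzipped = aString). Wrapping that round trip in a block and sending it #bench gives a rough idea of its cost.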

I tried to get a rough estimate of the space saved like this...

stringsToZip := ByteString allInstances.  "do this line once only, for repeatability"
zipStrings := OrderedCollection new.
entryPrefix := 'a'.   "or... 'bbbbbbbbbbbbb'"
i := 0.
zip := ZipArchive new.
stringsToZip do: [ :bs |
	zipStrings add: (zip addDeflateString: bs as: entryPrefix , (i := i + 1) printString) ].
zip writeToFileNamed: 'ByteStrings.zip'.

uncompressedSize := compressedSize := 0.
zipStrings do: [ :zs |
	uncompressedSize := uncompressedSize + zs uncompressedSize.
	compressedSize := compressedSize + zs compressedSize ].
(uncompressedSize // 1024) -> (compressedSize // 1024).  "KB before -> KB after"

"2975->1365 <== entryPrefix='bbbbbbbbbbbbb'"
"2975->1365 <== entryPrefix='a'"

We'd need to observe which strings are converted back, and when, to determine whether it's a real in-operation space saving, and what impact it has on performance. Has anyone done something similar before?

-----------------

Now before posting the above, I went back to chart the effect of compression...
strSizes := Dictionary new.
zipSizes := Dictionary new.
zipStrings do: [ :zs |
strSizes at: zs uncompressedSize accumulate: zs uncompressedSize.
zipSizes at: zs uncompressedSize accumulate: zs compressedSize ].
strSizes keys sorted do: [ :strSize | 
Transcript cr;
show: ( strSize printStringLength: 10); 
show: ((strSizes at: strSize) printStringLength: 10);
show: ((zipSizes at: strSize) printStringLength: 10)] 

and loaded that output into Excel to produce the attached graph. It shows compression is detrimental for strings below 100 bytes, with limited benefit above that until the very last data point. For comparison, here are the last two data points (the largest strings):
    string size   total bytes   zipped bytes
          74498         74498          13781
        1621978       1621978         234281

and the content of that largest string starts like this...

    0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
    0001;<control>;Cc;0;BN;;;;;N;START OF HEADING;;;;
    0002;<control>;Cc;0;BN;;;;;N;START OF TEXT;;;;
    0003;<control>;Cc;0;BN;;;;;N;END OF TEXT;;;;
    0004;<control>;Cc;0;BN;;;;;N;END OF TRANSMISSION;;;;
    0005;<control>;Cc;0;BN;;;;;N;ENQUIRY;;;;
    0006;<control>;Cc;0;BN;;;;;N;ACKNOWLEDGE;;;;
    0007;<control>;Cc;0;BN;;;;;N;BELL;;;;
    0008;<control>;Cc;0;BN;;;;;N;BACKSPACE;;;;
    0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;
    000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
    000B;<control>;Cc;0;S;;;;;N;LINE TABULATION;;;;
    000C;<control>;Cc;0;WS;;;;;N;FORM FEED (FF);;;;
    000D;<control>;Cc;0;B;;;;;N;CARRIAGE RETURN (CR);;;;

Now #pointersTo shows that (apart from the dozens of GT objects grabbing at it)
it is held by the "receiver" ivar of OpalCompiler.  What does it do there?

cheers -ben


 
 2.7 MB of Bitmaps
 1.8 MB of ByteSymbols
 
That already sums up to ~27 MB
 




Attachment: PharoStringsCompression.png (110K)

Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Tim Mackinnon
In reply to this post by Guillermo Polito
Hi Guille/Ben - I got a quick moment to try SpaceTally. (Aside: it seems very convoluted to load a single package into the image; I was trying to avoid having to create a BaselineOf for something so simple.) I ended up with:

repo := MCFileTreeRepository new directory: './bootstrap' asFileReference.
version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'.
version load.

Anyway - in my minimal image, as in the fat image, there seems to be a surprising amount of ByteStrings (4MB worth?). I think that might need some digging into; it seems like a lot somehow. Ben’s neat experiment of zipping strings does show that compressing them is not a real route, though.

In a deployed minimal image, maybe I can get rid of some other things like MethodChangeRecords or MCMethodDefinitions (they are smaller wins, but noticeable).

Class               code space  # instances  inst space  percent  inst average size
ByteString                2640        37365     4823848    21.50             129.10
Array                     3742        53002     3961944    17.60              74.75
CompiledMethod           19159        30481     2912968    13.00              95.57
Association               1148        58348     1867136     8.30              32.00
MethodChangeRecord         431        34312     1097984     4.90              32.00
ByteArray                 4605          290      908728     4.00            3133.54
ByteSymbol                1698        22689      840168     3.70              37.03
IdentitySet                408        19076      610432     2.70              32.00
MethodDictionary          3310         3520      608688     2.70             172.92
WeakArray                 1758         3024      597824     2.70             197.69
MCMethodDefinition        4318         6659      426176     1.90              64.00
Protocol                  1679         8382      268224     1.20              32.00
OrderedCollection         6555         5509      220360     1.00              40.00
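Here is a minimal sketch for digging into the ByteString row, for a Playground. It buckets ByteString instances by size (the smallest power of two that is at least the string's size) and totals their characters per bucket. The name sizeBuckets is just an illustrative workspace variable. This counts string contents only. SpaceTally's "inst space" likely also counts object headers and padding, so these totals should come out somewhat lower.

```smalltalk
"Sketch: where do the ByteString bytes go? Bucket each string by the smallest
 power of two >= its size and total the characters in each bucket."
sizeBuckets := [ :strings |
	| buckets |
	buckets := Dictionary new.
	strings do: [ :each |
		| bucket |
		bucket := 1.
		[ bucket < each size ] whileTrue: [ bucket := bucket * 2 ].
		buckets at: bucket put: (buckets at: bucket ifAbsent: [ 0 ]) + each size ].
	buckets ].
(sizeBuckets value: ByteString allInstances) associations
	sorted: [ :a :b | a key < b key ]
```

Each association in the result reads "bucket -> total characters". A few very large buckets would point at individual big strings, while a fat middle would point at many small ones.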

As an aside - my GitLab project is public; the scripts that load things up are in ./scripts (build.sh, minimal.st and loadlocal.st).

Tim

On 15 Aug 2017, at 08:02, Guillermo Polito <[hidden email]> wrote:



On Mon, Aug 14, 2017 at 4:42 PM, Tim Mackinnon <[hidden email]> wrote:
Hi Guille - just running SpaceTally on my dev image to get a feel for it. It turns out that in the minimal images you’ve been creating, it’s not loaded (makes sense).

Yup, it's loaded afterwards.

All packages are loaded through Metacello baselines. We should start refactoring and making standalone projects, each one with its own baseline describing its own dependencies.

I was checking your GitLab and I probably have no access: how are you finally loading packages in the bootstrap image? Can you share that with us as text? I'd like to improve that situation.
 
I’m wondering if there is an easy way to import it (I guess that package should be in the Pharo git tree I cloned to get Fuel loaded, right? Or is there a separate standalone source?).

Yes it is; you can get the package name programmatically by doing:

SpaceTally package name

And furthermore, you can get the baseline that currently loads it by doing:

package := SpaceTally package name.
BaselineOf subclasses select: [ :e | 
e project version packages anySatisfy: [ :p | p name = package ]].
 

Thanks for all the support, and your email about why the contexts stack up is very well received (I will comment over there).

By the way - it looks like Martin Fowler picked up on this announcement - so maybe we might get some interest from his mass of followers.

Tim

On 14 Aug 2017, at 10:49, Guillermo Polito <[hidden email]> wrote:

Hi Tim,

On Mon, Aug 14, 2017 at 11:41 AM, Tim Mackinnon <[hidden email]> wrote:
Hey guys, thanks for your enthusiasm around this - and I cannot stress enough how this was only possible because of the work that has gone into making Pharo (in particular the 64bit image, as well as having a minimal image, and some great blog posts on serialising contexts) as well as the patience from everyone in answering questions and helping me get it all working.

I’m still quite keen to get my execution time back down under 800ms and I’d like to actually get back to writing a few skills to automate a few things around my house.

To answer Denis’ question -

My final footprint is 30.4MB. That’s composed of a 22MB image (with a simple example that pulls in Fuel, ZTimestamp and the S3 library, which depends on XMLParser) and then the VM (from which I removed obvious DLLs).

In my original experiments with a 6.0 minimal image - I did manage to get to a 13.4mb image (which started out as 12mb original size, and then loaded in STON and had only a simple clock example). I think the sweet spot is around 20mb total footprint as that seems to get me into the 450ms-900ms range.

The 7.0 min image now starts out at 15mb and then I’m not sure why loading Fuel, S3 and XMLParser takes 7mb (it seems big to me - but I’ve not dug into that).

You can do further space analysis using the following expression:

SpaceTally new printSpaceAnalysis

You can do that in an eval and check what's taking space. With measures we can iterate and improve :).
 
I’ve also found (on the back of unserialising the context in my example) that the way we build images leaves 15+ saved stack sessions stacked on top of each other. I don’t yet know the size/speed implications of these - but we need a better way of folding executions when we snapshot headless images. I’m also not clear whether there are any other startup tasks that take precious time (this also has implications for our fat development images, as they take much longer to appear than they really should).

I'm working on this as I'm writing this mail ;)


I'll write down the implications further in a different thread.


I’ll be exploring some of these size/speed trade-offs in follow-on messages.

But once again, a big thanks - I’ve not enjoyed programming like this for ages.

Tim

On 12 Aug 2017, at 16:26, Ben Coman <[hidden email]> wrote:

hi Tim,  

That is.....      AWESOME!

Very nice delivery - it flowed well with great narration. 

I loved @2:17 "this is the interesting piece, because PharoLambda has serialized the execution context of its application and saved it into [my S3 bucket] ... [then on the local machine] rematerializes a debugger [on that context]."

There is a clarity in your video presentation that really may intrigue outsiders. As a community we should push this on the usual hacker forums - ycombinator could be a good starting point (but I'm locked out of my account there).  
An enticing title could be...
"Debugging Lambdas by re-materializing saved execution contexts on your local machine."

cheers -ben

On Fri, Aug 11, 2017 at 3:37 PM, Denis Kudriashov <[hidden email]> wrote:
This is cool Tim.

So what image size did you deploy in the end?

2017-08-10 15:47 GMT+02:00 Tim Mackinnon <[hidden email]>:
I just wanted to thank everyone for their help in getting my pet project further along, so that now I can announce that PharoLambda is now working with the V7 minimal image and also supports post mortem debugging by saving a zipped fuel context onto S3.

This latter item is particularly satisfying because, at a recent serverless conference (JeffConf), a panel highlighted poor development tools on serverless platforms as a real problem.

In our community we’ve had these kinds of tools at our fingertips for ages - but I don’t think the wider development community has really noticed. Debugging something short lived like a Lambda execution is quite startling, as the current answer is “add more logging”, and we all know that sucks. To this end, I’ve created a little screencast showing this in action - and it was pretty cool because it was a real example I encountered when I got everything working and was trying my test application out.

I’ve also put a bit of work into tuning the excellent GitLab CI tools, so that I can cache many of the artefacts used between different build runs (this might also be of interest to others using CI systems).

The Gitlab project is on: https://gitlab.com/macta/PharoLambda

Tim


On 15 Jul 2017, at 00:39, Tim Mackinnon <[hidden email]> wrote:

Hi - I’ve been playing around with getting Pharo to run well on AWS Lambda. It’s early days, but I thought it might be interesting to share what I’ve learned so far.

Usage examples and code at https://gitlab.com/macta/PharoLambda

With help from many of the folks here, I’ve been able to get a simple example to run in 500ms-1200ms with a minimal Pharo 6 image. You can easily try it out yourself. This seems slightly better than what the GoLang folks have been able to do.

Tim







-- 
   
Guille Polito

Research Engineer
French National Center for Scientific Research - http://www.cnrs.fr


Phone: +33 06 52 70 66 13






Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Tim Mackinnon
A weird observation - is it possible that source code is being stored in the image as strings somehow? When I do

./pharo PharoLambda.image eval "ByteString allInstances inject: (OrderedCollection new) into: [:r :i | i size > 500 ifTrue: [r add: i]. r]"

I seem to see reams of what looks like method source - but I thought source code was stored in the .sources file and the .changes file (and I haven’t been bundling those with my deployed image).

I’m trying to figure out how to find references to a string object, to chase down what is pointing to these strings. Maybe there is a quick 4MB win from simply nil’ing out some obvious things. (This of course doesn’t help with a default minimal image - but it might give a few tricks for packaging and deploying something.)
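One way to chase this down is sketched below. It assumes Object>>pointersTo, the method Ben used earlier in this thread, is present in the deployed image. It samples some of the large strings and tallies the classes of whatever references them. Each #pointersTo call scans the whole heap, so only a small sample is checked. The workspace variable names are illustrative.

```smalltalk
"Sketch: which kinds of objects reference the large ByteStrings?
 #pointersTo scans the heap on every call, so only a sample is checked."
bigStrings := ByteString allInstances select: [ :each | each size > 500 ].
sample := bigStrings first: (20 min: bigStrings size).
holderClasses := Bag new.
sample do: [ :each |
	"skip our own temporary collections, which also point at the strings"
	(each pointersTo reject: [ :holder | holder == bigStrings or: [ holder == sample ] ])
		do: [ :holder | holderClasses add: holder class ] ].
holderClasses sortedCounts  "count -> holder class, most frequent first"
```

If one holder class dominates, calling #pointersTo again on one of those holders can show what keeps it alive in turn.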

Tim

On 15 Aug 2017, at 22:26, Tim Mackinnon <[hidden email]> wrote:


Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Guillermo Polito
Just a hunch: could you inspect your MethodChangeRecord instances?
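Following that hunch, here is a minimal sketch for a Playground (variable names are illustrative). It summarises the MethodChangeRecords by change type and lists the classes referencing one of them. It assumes MethodChangeRecord answers #changeType and that Object>>pointersTo is available, as Ben used earlier in the thread.

```smalltalk
"Sketch: what are the MethodChangeRecords, and what holds them?
 Assumes the #changeType accessor on MethodChangeRecord."
records := MethodChangeRecord allInstances.
typeCounts := (records collect: [ :each | each changeType ]) asBag sortedCounts.
holderClasses := records isEmpty
	ifTrue: [ #() ]
	ifFalse: [ (records first pointersTo reject: [ :each | each == records ])
		collect: [ :each | each class ] ].
{ records size. typeCounts. holderClasses. ChangeSet allInstances size }
```

The last element counts ChangeSets, since change records would normally be expected to hang off a change set. Comparing it with the holder classes shows whether that is the case in this image.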

On Tue, 15 Aug 2017 at 23:55, Tim Mackinnon <[hidden email]> wrote:
A weird observation - is it possible that source code is being stored in the image as strings somehow? When I do

./pharo PharoLambda.image eval "ByteString allInstances inject: (OrderedCollection new) into: [:r :i | i size > 500 ifTrue: [r add: i]. r]"

I see to see reams of what looks like method source - but I thought source code was stored in the .sources file and the .changes file (and I haven’t been bundling those in my deployed image).

I’m trying to figure out how you find references to a string object, to chase down what is pointing to these strings as maybe there is a quick 4mb win by simply nil’ing out some obvious things. (This doesn’t of course help with a default minimal image - but maybe a few tricks for packaging and deploying something).

Tim

On 15 Aug 2017, at 22:26, Tim Mackinnon <[hidden email]> wrote:

Hi Guille/Ben - I got a quick moment to try the SpaceTally (aside: it seems very convoluted to load a single package into the image, I was trying to avoid having to create a baselineOf for something so simple - I ended up with:

repo := MCFileTreeRepository new directory: './bootstrap' asFileReference.
version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'.
version load.

Anyway - in my minimal image, like in the fat image there seems to be a surprising amount of bytestrings (4mb worth?). I think that might need some digging into? It seems like a lot somehow. Although Ben’s neat experiment of zipping strings shows that’s not a real route.

In a deployed minimal image - maybe I can get rid of some other things like MethodChangeRecords or MCMethodDefiniion’s (but they are smaller wins - but noticeable)

Class                                          code space # instances  inst space     percent   inst average size
ByteString                                           2640       37365       4823848       21.50              129.10
Array                                                3742       53002       3961944       17.60               74.75
CompiledMethod                                      19159       30481       2912968       13.00               95.57
Association                                          1148       58348       1867136        8.30               32.00
MethodChangeRecord                                    431       34312       1097984        4.90               32.00
ByteArray                                            4605         290        908728        4.00             3133.54
ByteSymbol                                           1698       22689        840168        3.70               37.03
IdentitySet                                           408       19076        610432        2.70               32.00
MethodDictionary                                     3310        3520        608688        2.70              172.92
WeakArray                                            1758        3024        597824        2.70              197.69
MCMethodDefinition                                   4318        6659        426176        1.90               64.00
Protocol                                             1679        8382        268224        1.20               32.00
OrderedCollection                                    6555        5509        220360        1.00               40.00 

As an aside - my Gitlab project is public, the scripts that load things up are in ./scripts (build.sh, and minimal.st and loadlocal.st)

Tim

On 15 Aug 2017, at 08:02, Guillermo Polito <[hidden email]> wrote:



On Mon, Aug 14, 2017 at 4:42 PM, Tim Mackinnon <[hidden email]> wrote:
Hi Guille - just running SpaceTally on my dev image to get a feel for it. It turns out that in the minimal images you’ve been creating, its not loaded (makes sense).

Yup, it's loaded afterwards.

All packages are loaded through metacello baselines. We should start refactoring and making standalone projects, each one with a baseline for himself, and his own dependencies described.

I was checking on your gitlab and I have probably no access: how are you finally loading packages in the bootstrap image? Can you share that with us in text? I'd like to improve that situation.
 
I’m wondering if there is an easy way to import it in (I guess that package should be in the Pharo git tree I cloned to get Fuel loaded right? Or is there a separate standalone source?).

Yes it is, you can get the package programatically doing 

SpaceTally package name

And furthermore, get the baseline that currently is loading by doing

package := SpaceTally package name.
BaselineOf subclasses select: [ :e | 
e project version packages anySatisfy: [ :p | p name = package ]].
 

Thanks for all the support, and your email about why the contexts stack up is very well received (I will comment over there).

By the way - it looks like Martin Fowler picked up on this announcement - so maybe we might get some interest from his mass of followers.

Tim

On 14 Aug 2017, at 10:49, Guillermo Polito <[hidden email]> wrote:

Hi Tim,

On Mon, Aug 14, 2017 at 11:41 AM, Tim Mackinnon <[hidden email]> wrote:
Hey guys, thanks for your enthusiasm around this - and I cannot stress enough how this was only possible because of the work that has gone into making Pharo (in particular the 64bit image, as well as having a minimal image, and some great blog posts on serialising contexts) as well as the patience from everyone in answering questions and helping me get it all working.

I’m still quite keen to get my execution time back down under 800ms and I’d like to actually get back to writing a few skills to automate a few things around my house.

To Answer Denis’ question - 

My final footprint is 30.4mb - thats composed of a 22mb image (with a simple example that pulls in Fuel, ZTimestamp and the S3 Library which depends on XMLParser) and then the VM (from which I removed obvious dll’s).

In my original experiments with a 6.0 minimal image - I did manage to get to a 13.4mb image (which started out as 12mb original size, and then loaded in STON and had only a simple clock example). I think the sweet spot is around 20mb total footprint as that seems to get me into the 450ms-900ms range.

The 7.0 min image now starts out at 15mb and then I’m not sure why loading Fuel, S3 and XMLParser takes 7mb (it seems big to me - but I’ve not dug into that).

You can do further space analysis using the following expression

SpaceTally  new printSpaceAnalysis

You can do that in an eval and check what's taking space. With measures we can iterate and improve :).
 
I’ve also found (and this on the back of unserialising the context in my example) that the way we build images has 15+ saved stack sessions that have saved on top of each other from the way we build up the images. I don’t yet know the implications of size/speed of these - but we need a better way of folding executions when we snapshot headless images. I’m also not clear if there are any other startup tasks that take precious time (this also has implications for our fat development images as they take much longer to appear than they really should).

I'm working on this as I'm writing this mail ;)


I'll write down the implications further in a different thread.


I’ll be exploring some of these size/speed tradeoff’s in follow on messages.

But once again, a big thanks - I’ve not enjoyed programming like this for ages.

Tim

On 12 Aug 2017, at 16:26, Ben Coman <[hidden email]> wrote:

hi Tim,  

That is.....      AWESOME!

Very nice delivery - it flowed well with great narration. 

I loved @2:17 "this is the interesting piece, because PharoLambda has serialized the execution context of its application and saved it into [my S3 bucket] ... [then on the local machine] rematerializes a debugger [on that context]."

There is a clarity in your video presentation that may really intrigue outsiders. As a community we should push this on the usual hacker forums; ycombinator could be a good starting point (but I'm locked out of my account there).
An enticing title could be...
"Debugging Lambdas by re-materializing saved execution contexts on your local machine."

cheers -ben

On Fri, Aug 11, 2017 at 3:37 PM, Denis Kudriashov <[hidden email]> wrote:
This is cool Tim.

So what image size did you deploy in the end?

2017-08-10 15:47 GMT+02:00 Tim Mackinnon <[hidden email]>:
I just wanted to thank everyone for their help in getting my pet project further along, so that I can now announce that PharoLambda works with the V7 minimal image and also supports post-mortem debugging by saving a zipped Fuel context to S3.

This latter item is particularly satisfying because at a recent serverless conference (JeffConf), a panel highlighted poor development tools on serverless platforms as a real problem.

In our community we’ve had these kinds of tools at our fingertips for ages, but I don’t think the wider development community has really noticed. Being able to debug something as short-lived as a Lambda execution is quite startling, as the current answer is “add more logging”, and we all know that sucks. To this end, I’ve created a little screencast showing this in action, and it was pretty cool because it was a real problem I encountered when I got everything working and was trying out my test application.
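The general idea of post-mortem capture can be sketched with Fuel's class-side file helpers: on an error, serialize a copy of the failing stack, then materialize it later in an image with the same code loaded. This is a hedged sketch, not PharoLambda's actual code: the file name is illustrative, and the zipping and S3 upload that PharoLambda performs are deliberately left out.

```smalltalk
"Sketch of post-mortem capture with Fuel. The file name is illustrative;
 zipping the file and uploading it to S3 are not shown."
| fileName restored |
fileName := 'failure-context.fuel'.
[ 1 / 0 ]
	on: ZeroDivide
	do: [ :error |
		"copyStack detaches a copy of the whole sender chain for serialization"
		FLSerializer
			serialize: error signalerContext copyStack
			toFileNamed: fileName ].
"Later, typically on a developer machine with the same code loaded:"
restored := FLMaterializer materializeFromFileNamed: fileName.
restored
```

`restored` is the top context of the copied stack; opening a debugger on it is a separate step that depends on the Pharo version's debugger API.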

I’ve also put a bit of work into tuning the excellent GitLab CI tools, so that I can cache many of the artefacts used between different build runs (this might also be of interest to others using CI systems).

The GitLab project is at: https://gitlab.com/macta/PharoLambda

Tim


On 15 Jul 2017, at 00:39, Tim Mackinnon <[hidden email]> wrote:

Hi - I’ve been playing around with getting Pharo to run well on AWS Lambda. It’s early days, but I thought it might be interesting to share what I’ve learned so far.

Usage examples and code at https://gitlab.com/macta/PharoLambda

With help from many of the folks here, I’ve been able to get a simple example to run in 500ms-1200ms with a minimal Pharo 6 image. You can easily try it out yourself. This seems slightly better than what the GoLang folks have been able to do.

Tim







-- 
Guille Polito

Research Engineer
French National Center for Scientific Research - http://www.cnrs.fr

Web: http://guillep.github.io
Phone: +33 06 52 70 66 13

Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Guillermo Polito
In reply to this post by Tim Mackinnon


On Tue, Aug 15, 2017 at 11:26 PM, Tim Mackinnon <[hidden email]> wrote:
Hi Guille/Ben - I got a quick moment to try SpaceTally (aside: it seems very convoluted to load a single package into the image; I was trying to avoid creating a BaselineOf for something so simple). I ended up with:

I know, I also believe we have to simplify this. In any case, baselines are healthy because they also let you express dependencies; otherwise you end up loading dependencies by hand. We'll fix this soon, I hope.
 

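"Load a single package straight from a local FileTree checkout, without a Metacello baseline (so its dependencies are not resolved)"
repo := MCFileTreeRepository new directory: './bootstrap' asFileReference.
version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'.
version load.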

Anyway - in my minimal image, as in the fat image, there seems to be a surprising amount of ByteStrings (4mb worth?). It seems like a lot somehow, and I think it needs some digging into, although Ben’s neat experiment with zipping strings shows that zipping them isn’t a real route.

In a deployed minimal image, maybe I can get rid of some other things like MethodChangeRecords or MCMethodDefinitions (smaller wins, but noticeable).

Class                                          code space # instances  inst space     percent   inst average size
ByteString                                           2640       37365       4823848       21.50              129.10
Array                                                3742       53002       3961944       17.60               74.75
CompiledMethod                                      19159       30481       2912968       13.00               95.57
Association                                          1148       58348       1867136        8.30               32.00
MethodChangeRecord                                    431       34312       1097984        4.90               32.00
ByteArray                                            4605         290        908728        4.00             3133.54
ByteSymbol                                           1698       22689        840168        3.70               37.03
IdentitySet                                           408       19076        610432        2.70               32.00
MethodDictionary                                     3310        3520        608688        2.70              172.92
WeakArray                                            1758        3024        597824        2.70              197.69
MCMethodDefinition                                   4318        6659        426176        1.90               64.00
Protocol                                             1679        8382        268224        1.20               32.00
OrderedCollection                                    6555        5509        220360        1.00               40.00 

As an aside - my GitLab project is public; the scripts that load things up are in ./scripts (build.sh, minimal.st and loadlocal.st).

Tim

On 15 Aug 2017, at 08:02, Guillermo Polito <[hidden email]> wrote:



On Mon, Aug 14, 2017 at 4:42 PM, Tim Mackinnon <[hidden email]> wrote:
Hi Guille - just running SpaceTally on my dev image to get a feel for it. It turns out that in the minimal images you’ve been creating, it’s not loaded (makes sense).

Yup, it's loaded afterwards.

All packages are loaded through Metacello baselines. We should start refactoring them into standalone projects, each with its own baseline describing its own dependencies.

I was checking your GitLab project, but I probably don't have access: how are you finally loading packages into the bootstrap image? Can you share that with us as text? I'd like to improve that situation.
 
I’m wondering if there is an easy way to import it (I guess that package should be in the Pharo git tree I cloned to get Fuel loaded, right? Or is there a separate standalone source?).

Yes, it is. You can get the package name programmatically with:

SpaceTally package name

And you can find the baseline that currently loads it with:

package := SpaceTally package name.
BaselineOf subclasses select: [ :e |
	e project version packages anySatisfy: [ :p | p name = package ] ].
 

Thanks for all the support, and your email about why the contexts stack up is very well received (I will comment over there).

By the way, it looks like Martin Fowler picked up on this announcement, so we might get some interest from his mass of followers.

Tim

On 14 Aug 2017, at 10:49, Guillermo Polito <[hidden email]> wrote:

Hi Tim,

On Mon, Aug 14, 2017 at 11:41 AM, Tim Mackinnon <[hidden email]> wrote:

Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Guillermo Polito
Actually, it turns out first that Monticello is "nicely" coupled with the changeset system and logs all the source code it loads in change sets :D :/ ¬¬. Also, the two biggest strings are related to Unicode tables (we should put them in files instead of in the image and load them on demand), and so are the two biggest arrays. I just tried the following in a clean bootstrapped "minimal" (Metacello) image:

"Careful: after this, #isLetter, #isUppercase, #isLowercase, #toLowercase and #toUppercase only work on ASCII"
Character characterSet: nil.
Unicode classPool at: #GeneralCategory put: nil.
Unicode classPool at: #DecimalProperty put: nil.

"Remove the UnicodeDefinitions class from the image"
UnicodeDefinitions removeFromSystem.
"Monticello logs loaded code into change sets: drop them all and start a fresh, empty one"
ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ].
ChangeSet resetCurrentToNewUnnamedChangeSet.
"Flush Monticello's cached definitions, prune unreferenced Undeclared entries, then collect garbage"
MCDefinition clearInstances.
Undeclared removeUnreferencedKeys.
Smalltalk garbageCollect.

I ran it like this:

./vm/pharo Pharo7.0-metacello-32bit-fa236b7.image eval --save "Character characterSet: nil. Unicode classPool at: #GeneralCategory put: nil. Unicode classPool at: #DecimalProperty put: nil. UnicodeDefinitions removeFromSystem. ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ]. ChangeSet resetCurrentToNewUnnamedChangeSet. MCDefinition clearInstances. Undeclared removeUnreferencedKeys. Smalltalk garbageCollect."

and my image went down from 11MB to 6.6MB (7.0MB if I skip the first three lines and keep full Unicode support instead of falling back to ASCII).

Then I tried a tally:

./vm/pharo Pharo7.0-metacello-32bit-fa236b7.image save spacetally

./vm/pharo spacetally.image eval --save "repo := MCFileTreeRepository new directory: '../src' asFileReference. version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'. version load."

Then I cleaned up again, since I had loaded some packages:

./vm/pharo spacetally.image eval --save "ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ]. ChangeSet resetCurrentToNewUnnamedChangeSet. MCDefinition clearInstances. Undeclared removeUnreferencedKeys. Smalltalk garbageCollect."

This image is now 6.6MB (7.1MB with the large Unicode arrays), with strings taking 4.1% (274k), which seems reasonable. The remaining big strings are Pharo's licence, the buffer of the changes file, and some class comments (shouldn't those be fetched from disk like any other method source code?).
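To see which strings are the big ones, a short sketch can sort all ByteStrings by size and show a prefix of the largest few. The count of five and the 60-character prefix are arbitrary choices for illustration.

```smalltalk
"List the five largest ByteStrings as size -> short prefix, to identify
 them (licence text, changes-file buffer, class comments...)."
| largest |
largest := (ByteString allInstances
	asSortedCollection: [ :a :b | a size >= b size ]) first: 5.
largest := largest collect: [ :s | s size -> (s first: (60 min: s size)) ].
largest
```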

Another tally shows that ~30% of the space is taken by Arrays and 21.9% by compiled methods. But, BUT! :) I have ~30k arrays, and also lots of collections, each of which keeps its contents in an internal Array:

"MethodDictionary"       2872 +
"IdentitySet"           12781 +
"OrderedCollection"      4398 +
"Set"                    2959 +
"Dictionary"             1997 +
"IdentityDictionary"      454
-----------------------------
                        25461

So there are ~5k arrays that are used outside collections.
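The estimate above can be re-derived in a live image. This sketch assumes each instance of the listed classes owns exactly one internal Array (HashedCollection and OrderedCollection keep their elements in an `array` slot); `allInstances` counts exact instances only, matching the per-class tally. The result is approximate, since the `allInstances` calls themselves allocate arrays.

```smalltalk
"Arrays not accounted for by the listed collection classes.
 Assumes one internal Array per collection instance."
| collectionClasses backed arrays |
collectionClasses := { MethodDictionary. IdentitySet. OrderedCollection.
	Set. Dictionary. IdentityDictionary }.
backed := collectionClasses
	inject: 0
	into: [ :sum :cls | sum + cls allInstances size ].
arrays := Array allInstances size.
arrays - backed
```

With the numbers quoted above, ~30k arrays minus 25461 collection-owned arrays leaves roughly 4.5k, which is the "~5k" figure.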

Worth exploring a bit more I think.

On Wed, Aug 16, 2017 at 1:23 AM, Guillermo Polito <[hidden email]> wrote:



Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Tim Mackinnon
In reply to this post by Guillermo Polito
Yes, you were onto something there (and at the same time, by poking around with #pointersTo, I noticed some chains of objects too). So I ran the following script (partially borrowed from ImageCleaner), and it got me down to a 14mb image (instance sizes listed below, which look much healthier, and those MethodChangeRecords are gone too)!!!

I suspect there are more Monticello/Metacello things still lurking around.

I also wonder whether I need some of the character sorting strings.

Tim

"CmdLine script to clean up the initial minimal image"

| logger repo version |

logger := FileStream stderr.
logger cr; nextPutAll: 'Starting Minimal Cleanup Script...'.

logger cr; nextPutAll: '>Resetting Class Comments'.
Smalltalk allClasses do: [ :c | c classComment: '' stamp: '' ].

logger cr; nextPutAll: '>Removing MC holders'.
MCMethodDefinition allInstances do: [:each | each become: String new ].
MCClassDefinition allInstances do: [:each | each become: String new ].
MCVersionInfo allInstances do: [:each | each become: String new ].

logger cr; nextPutAll: '>ImageCleaner release routines'.
Smalltalk organization removeEmptyCategories.
Smalltalk
	allClassesAndTraitsDo: [ :class |
		[ :each |
			each
				removeEmptyCategories;
				sortCategories ]
			value: class organization;
			value: class class organization ].

(RPackageOrganizer default packages select: #isEmpty)
	do: #unregister.

Smalltalk organization sortCategories.
Smalltalk garbageCollect.
Smalltalk cleanOutUndeclared.
Smalltalk fixObsoleteReferences.
Smalltalk cleanUp: true except: #() confirming: false.

logger cr; nextPutAll: '>GC'.
3 timesRepeat: [
	Smalltalk garbageCollect.
	Smalltalk cleanOutUndeclared.
	Smalltalk fixObsoleteReferences ].

logger cr; nextPutAll: 'Finished Script.'; cr; cr.

My top instances are now:

Class                                          code space # instances  inst space     percent   inst average size
CompiledMethod                                      19159       30481       2912968       21.60               95.57
Array                                                3742       36495       2852448       21.10               78.16
ByteString                                           2640       24018       2517168       18.60              104.80
ByteSymbol                                           1698       20722        759208        5.60               36.64
Association                                          1148       19786        633152        4.70               32.00
IdentitySet                                           408       15452        494464        3.70               32.00
MethodDictionary                                     3310        3520        350192        2.60               99.49
Protocol                                             1679        8382        268224        2.00               32.00
WeakArray                                            1758         265        232304        1.70              876.62
OrderedCollection                                    6555        5043        201720        1.50               40.00
ClassOrganization                                    5281        3520        168960        1.30               48.00
Metaclass                                            7184        1748        153824        1.10               88.00




On 15 Aug 2017, at 23:00, Guillermo Polito <[hidden email]> wrote:

Just a hunch: could you inspect your MethodChangeRecord instances?

On Tue, 15 Aug 2017 at 23:55, Tim Mackinnon <[hidden email]> wrote:
A weird observation - is it possible that source code is being stored in the image as strings somehow? When I do

./pharo PharoLambda.image eval "ByteString allInstances inject: (OrderedCollection new) into: [:r :i | i size > 500 ifTrue: [r add: i]. r]"

I seem to see reams of what looks like method source, but I thought source code was stored in the .sources and .changes files (and I haven’t been bundling those with my deployed image).

I’m trying to figure out how to find references to a string object, so I can chase down what is pointing to these strings; maybe there is a quick 4mb win from simply nil’ing out some obvious things. (Of course, this doesn’t help a default minimal image, but it might yield a few tricks for packaging and deploying something.)
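A hedged sketch of chasing those references with `#pointersTo` (the selector mentioned later in the thread): sample a few large ByteStrings and tally the classes of the objects that point at them. `pointersTo` scans the whole heap on every call, so only a small sample is checked. The 500-character threshold and the sample size of 20 are arbitrary, and the sampling Array and any contexts are filtered out because the script itself references the strings.

```smalltalk
"Tally the classes of objects referencing a sample of large ByteStrings.
 Threshold (500) and sample size (20) are arbitrary illustrative values."
| large holders |
large := (ByteString allInstances select: [ :s | s size > 500 ]) asArray.
"Collect the temporary allInstances array so it is not reported as a holder"
Smalltalk garbageCollect.
holders := Bag new.
1 to: (20 min: large size) do: [ :i |
	((large at: i) pointersTo
		reject: [ :p | p == large or: [ p isContext ] ])
			do: [ :p | holders add: p class name ] ].
"Answer count -> class name pairs, most frequent first"
holders sortedCounts
```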

Tim

On 15 Aug 2017, at 22:26, Tim Mackinnon <[hidden email]> wrote:

Hi Guille/Ben - I got a quick moment to try the SpaceTally (aside: it seems very convoluted to load a single package into the image, I was trying to avoid having to create a baselineOf for something so simple - I ended up with:

repo := MCFileTreeRepository new directory: './bootstrap' asFileReference.
version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'.
version load.

Anyway, in my minimal image, as in the fat image, there seems to be a surprising amount of ByteString data (roughly 4.8 MB, per the ByteString row below). It seems like a lot, so I think it needs some digging into, although Ben’s neat experiment of zipping strings shows that’s not a real route.

In a deployed minimal image, maybe I can also get rid of some other things like MethodChangeRecords or MCMethodDefinitions (smaller wins, but noticeable).

Class                                          code space # instances  inst space     percent   inst average size
ByteString                                           2640       37365       4823848       21.50              129.10
Array                                                3742       53002       3961944       17.60               74.75
CompiledMethod                                      19159       30481       2912968       13.00               95.57
Association                                          1148       58348       1867136        8.30               32.00
MethodChangeRecord                                    431       34312       1097984        4.90               32.00
ByteArray                                            4605         290        908728        4.00             3133.54
ByteSymbol                                           1698       22689        840168        3.70               37.03
IdentitySet                                           408       19076        610432        2.70               32.00
MethodDictionary                                     3310        3520        608688        2.70              172.92
WeakArray                                            1758        3024        597824        2.70              197.69
MCMethodDefinition                                   4318        6659        426176        1.90               64.00
Protocol                                             1679        8382        268224        1.20               32.00
OrderedCollection                                    6555        5509        220360        1.00               40.00 
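For a rough sense of those "smaller wins", the two candidate classes can be summed straight from the inst space column of this table (bytes as reported by SpaceTally). This ignores anything those objects keep alive indirectly, so the real saving could well be larger.

```smalltalk
"Inst space (bytes) copied from the SpaceTally table above"
| candidates total |
candidates := Dictionary new.
candidates
    at: #MethodChangeRecord put: 1097984;
    at: #MCMethodDefinition put: 426176.
"Dictionary>>inject:into: iterates over the values"
total := candidates inject: 0 into: [ :sum :bytes | sum + bytes ].
"total = 1524160 bytes, i.e. roughly 1.5 MB (4.90% + 1.90% of the tally)"
total
```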

As an aside, my GitLab project is public; the scripts that load things up are in ./scripts (build.sh, minimal.st and loadlocal.st).

Tim

On 15 Aug 2017, at 08:02, Guillermo Polito <[hidden email]> wrote:



On Mon, Aug 14, 2017 at 4:42 PM, Tim Mackinnon <[hidden email]> wrote:
Hi Guille, I’m just running SpaceTally on my dev image to get a feel for it. It turns out that it’s not loaded in the minimal images you’ve been creating (makes sense).

Yup, it's loaded afterwards.

All packages are loaded through Metacello baselines. We should start refactoring them into standalone projects, each with its own baseline describing its own dependencies.

I was checking your GitLab and I probably have no access: how are you loading packages into the bootstrap image in the end? Can you share that with us as text? I'd like to improve that situation.

I’m wondering if there is an easy way to import it (I guess that package should be in the Pharo git tree I cloned to get Fuel loaded, right? Or is there a separate standalone source?).

Yes it is. You can get the package name programmatically with

SpaceTally package name

And, furthermore, you can find the baseline that currently loads it with

package := SpaceTally package name.
BaselineOf subclasses select: [ :e | 
e project version packages anySatisfy: [ :p | p name = package ]].
 

Thanks for all the support, and your email about why the contexts stack up is very well received (I will comment over there).

By the way, it looks like Martin Fowler picked up on this announcement, so maybe we’ll get some interest from his many followers.

Tim

On 14 Aug 2017, at 10:49, Guillermo Polito <[hidden email]> wrote:

Hi Tim,

On Mon, Aug 14, 2017 at 11:41 AM, Tim Mackinnon <[hidden email]> wrote:
Hey guys, thanks for your enthusiasm around this - and I cannot stress enough how this was only possible because of the work that has gone into making Pharo (in particular the 64bit image, as well as having a minimal image, and some great blog posts on serialising contexts) as well as the patience from everyone in answering questions and helping me get it all working.

I’m still quite keen to get my execution time back down under 800ms and I’d like to actually get back to writing a few skills to automate a few things around my house.

To answer Denis’ question:

My final footprint is 30.4mb: a 22mb image (with a simple example that pulls in Fuel, ZTimestamp and the S3 library, which depends on XMLParser) plus the VM (from which I removed the obvious DLLs).

In my original experiments with a 6.0 minimal image - I did manage to get to a 13.4mb image (which started out as 12mb original size, and then loaded in STON and had only a simple clock example). I think the sweet spot is around 20mb total footprint as that seems to get me into the 450ms-900ms range.

The 7.0 minimal image now starts out at 15mb, and I’m not sure why loading Fuel, S3 and XMLParser then adds 7mb (it seems big to me, but I’ve not dug into that).

You can do further space analysis using the following expression

SpaceTally new printSpaceAnalysis

You can do that in an eval and check what's taking space. With measures we can iterate and improve :).
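SpaceTally isn't part of the minimal image, so a rougher check that needs nothing extra loaded is to count live instances of the classes under suspicion. A sketch only: the class list is just the ones discussed in this thread, and instance counts say nothing about bytes.

```smalltalk
"Count live instances of a few classes; names missing from the image are skipped"
| names counts |
names := #(#ByteString #MethodChangeRecord #MCMethodDefinition).
counts := OrderedCollection new.
names do: [ :name |
    (Smalltalk at: name ifAbsent: [ nil ]) ifNotNil: [ :cls |
        counts add: name -> cls allInstances size ] ].
"An OrderedCollection of className -> instanceCount associations"
counts
```

Running it before and after a cleanup step is a cheap way to confirm the step actually released what it targeted.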
 
I’ve also found (on the back of deserialising the context in my example) that the way we build images leaves 15+ saved stack sessions piled on top of each other. I don’t yet know what these cost in size/speed, but we need a better way of folding executions when we snapshot headless images. I’m also not clear whether any other startup tasks take precious time (this also has implications for our fat development images, as they take much longer to appear than they really should).

I'm working on this as I'm writing this mail ;)


I'll write down the implications further in a different thread.


I’ll be exploring some of these size/speed trade-offs in follow-on messages.

But once again, a big thanks - I’ve not enjoyed programming like this for ages.

Tim

On 12 Aug 2017, at 16:26, Ben Coman <[hidden email]> wrote:

hi Tim,  

That is.....      AWESOME!

Very nice delivery - it flowed well with great narration. 

I loved @2:17 "this is the interesting piece, because PharoLambda has serialized the execution context of its application and saved it into [my S3 bucket] ... [then on the local machine] rematerializes a debugger [on that context]."

There is a clarity in your video presentation that really may intrigue outsiders. As a community we should push this on the usual hacker forums - ycombinator could be a good starting point (but I'm locked out of my account there).  
An enticing title could be...
"Debugging Lambdas by re-materializing saved execution contexts on your local machine."

cheers -ben

On Fri, Aug 11, 2017 at 3:37 PM, Denis Kudriashov <[hidden email]> wrote:
This is cool Tim.

So what image size did you deploy in the end?

2017-08-10 15:47 GMT+02:00 Tim Mackinnon <[hidden email]>:
I just wanted to thank everyone for their help in getting my pet project further along, so that I can now announce that PharoLambda works with the V7 minimal image and also supports post-mortem debugging by saving a zipped Fuel context onto S3.

This latter item is particularly satisfying, as at a recent serverless conference (JeffConf) there was a panel where poor development tools on serverless platforms were highlighted as a real problem.

In our community we’ve had these kinds of tools at our fingertips for ages, but I don’t think the wider development community has really noticed. Debugging something short-lived like a Lambda execution is quite startling, as the current answer is “add more logging”, and we all know that sucks. To this end, I’ve created a little screencast showing this in action; it was pretty cool because it was a real example I encountered when I got everything working and was trying my test application out.

I’ve also put a bit of work into tuning the excellent GitLab CI tools, so that I can cache many of the artefacts used between different build runs (this might also be of interest to others using CI systems).

The GitLab project is at: https://gitlab.com/macta/PharoLambda

Tim


On 15 Jul 2017, at 00:39, Tim Mackinnon <[hidden email]> wrote:

Hi, I’ve been playing around with getting Pharo to run well on AWS Lambda. It’s early days, but I thought it might be interesting to share what I’ve learned so far.

Usage examples and code at https://gitlab.com/macta/PharoLambda

With help from many of the folks here, I’ve been able to get a simple example to run in 500ms-1200ms with a minimal Pharo 6 image. You can easily try it out yourself. This seems slightly better than what the GoLang folks have been able to do.

Tim







-- 
   
Guille Polito

Research Engineer
French National Center for Scientific Research - http://www.cnrs.fr


Phone: +33 06 52 70 66 13





Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Guillermo Polito
This means it would be healthy to do a cleanup (at least the non-aggressive one: ChangeSets and MC stuff) on each of the images we produce, and not just the latest one.

On Wed, Aug 16, 2017 at 8:35 AM, Tim Mackinnon <[hidden email]> wrote:
Yes, you were onto something there (and at the same time, by poking around with #pointersTo, I noticed some chains of objects too). So I ran the following script (partially borrowed from ImageCleaner), and it has got me down to a 14mb image (instance sizes are listed below, which look much healthier, and those MethodChangeRecords are gone too)!!!

I suspect there are more Monticello/Metacello things still lurking around.

I also wonder whether I need some of the character sorting strings, too.

Tim

"CmdLine script to debug the initial minimal image"

| logger |

logger := FileStream stderr.
logger cr; nextPutAll: 'Starting Minimal Cleanup Script...'.

"Drop class comments; they are only needed when browsing code"
logger cr; nextPutAll: '>Resetting Class Comments'.
Smalltalk allClasses do: [ :c | c classComment: '' stamp: '' ].

"Swap each Monticello holder with an empty String; the holder ends up referenced
 only by the discarded String new temporary, so it becomes garbage"
logger cr; nextPutAll: '>Removing MC holders'.
MCMethodDefinition allInstances do: [ :each | each become: String new ].
MCClassDefinition allInstances do: [ :each | each become: String new ].
MCVersionInfo allInstances do: [ :each | each become: String new ].

"Housekeeping borrowed from ImageCleaner"
logger cr; nextPutAll: '>ImageCleaner release routines'.
Smalltalk organization removeEmptyCategories.
Smalltalk allClassesAndTraitsDo: [ :class |
    "Apply the same tidy-up to both the instance-side and class-side organization"
    [ :each |
        each
            removeEmptyCategories;
            sortCategories ]
        value: class organization;
        value: class class organization ].

(RPackageOrganizer default packages select: #isEmpty)
    do: #unregister.

Smalltalk organization sortCategories.
Smalltalk garbageCollect.
Smalltalk cleanOutUndeclared.
Smalltalk fixObsoleteReferences.
Smalltalk cleanUp: true except: #() confirming: false.

"Several passes, since one pass can free objects that the next one then reclaims"
logger cr; nextPutAll: '>GC'.
3 timesRepeat: [
    Smalltalk garbageCollect.
    Smalltalk cleanOutUndeclared.
    Smalltalk fixObsoleteReferences ].

logger cr; nextPutAll: 'Finished Script.'; cr; cr.

My top instances are now:

Class                                          code space # instances  inst space     percent   inst average size
CompiledMethod                                      19159       30481       2912968       21.60               95.57
Array                                                3742       36495       2852448       21.10               78.16
ByteString                                           2640       24018       2517168       18.60              104.80
ByteSymbol                                           1698       20722        759208        5.60               36.64
Association                                          1148       19786        633152        4.70               32.00
IdentitySet                                           408       15452        494464        3.70               32.00
MethodDictionary                                     3310        3520        350192        2.60               99.49
Protocol                                             1679        8382        268224        2.00               32.00
WeakArray                                            1758         265        232304        1.70              876.62
OrderedCollection                                    6555        5043        201720        1.50               40.00
ClassOrganization                                    5281        3520        168960        1.30               48.00
Metaclass                                            7184        1748        153824        1.10               88.00
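Comparing this table with Tim's earlier SpaceTally run (the one that still listed MethodChangeRecord) shows where most of the space came back from. A small worked comparison for the three classes that shrank the most, with the inst space figures (bytes) copied from the two tables:

```smalltalk
"Inst space (bytes) before and after the cleanup script, copied from the two SpaceTally tables"
| before after savings total |
before := Dictionary new.
before
    at: #ByteString put: 4823848;
    at: #Array put: 3961944;
    at: #Association put: 1867136.
after := Dictionary new.
after
    at: #ByteString put: 2517168;
    at: #Array put: 2852448;
    at: #Association put: 633152.
savings := Dictionary new.
before keysAndValuesDo: [ :cls :bytes |
    savings at: cls put: bytes - (after at: cls) ].
total := savings inject: 0 into: [ :sum :each | sum + each ].
"ByteString 2306680, Association 1233984, Array 1109496; total 4650160 bytes"
savings
```

CompiledMethod and Protocol are identical in both runs, which fits the script touching only metadata and not the code itself.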







--

   

Guille Polito


Research Engineer

French National Center for Scientific Research - http://www.cnrs.fr



Web: http://guillep.github.io

Phone: +33 06 52 70 66 13


Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Tim Mackinnon
This is very encouraging, and also very instructive (your post on snapshotting is in this area too; I have a queue of things to try for you on Friday).

I'll try adding your ideas to my script and see if it squeezes some more.

And then I guess we need to decide which parts go into your minimum build steps and which go into an external script (or possibly a HeadlessImageCleaner class we keep loaded, so it's easier to maintain?).
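If the cleanup did end up in a class kept loaded in the image, it might look roughly like the sketch below. HeadlessImageCleaner is only the name floated in this message, not an existing Pharo class, and the selectors cleanUpMonticello and cleanUpForDeployment are hypothetical; the method bodies simply reuse steps from the cleanup script earlier in this thread.

```smalltalk
"Hypothetical sketch: HeadlessImageCleaner does not exist in Pharo."
| cleaner |
"Keep the returned class so the script never references an undeclared global"
cleaner := Object subclass: #HeadlessImageCleaner
    instanceVariableNames: ''
    classVariableNames: ''
    package: 'HeadlessImageCleaner'.

"Non-aggressive part: drop the Monticello holders kept in memory"
cleaner class compile: 'cleanUpMonticello
    MCMethodDefinition allInstances do: [ :each | each become: String new ].
    MCClassDefinition allInstances do: [ :each | each become: String new ].
    MCVersionInfo allInstances do: [ :each | each become: String new ]'.

"Entry point for a build step: MC cleanup, standard cleanUp, then repeated GC passes"
cleaner class compile: 'cleanUpForDeployment
    self cleanUpMonticello.
    Smalltalk cleanUp: true except: #() confirming: false.
    3 timesRepeat: [
        Smalltalk garbageCollect.
        Smalltalk cleanOutUndeclared.
        Smalltalk fixObsoleteReferences ]'.
```

With something like this loaded, each build stage could evaluate HeadlessImageCleaner cleanUpForDeployment before saving its image, in line with Guillermo's suggestion to clean every image the build produces rather than just the last one.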

Tim

On 16 Aug 2017, at 08:52, Guillermo Polito <[hidden email]> wrote:

This means it would be healthy to do a cleanup (at least the non-aggressive one: ChangeSets and MC stuff) on each of the images we produce, and not just the latest one.

On Wed, Aug 16, 2017 at 8:35 AM, Tim Mackinnon <[hidden email]> wrote:
Yes you were on to something there (and at the same time, by poking around with #pointersTo I noticed some chains of objects too). So I ran the following script (partially borrowed from ImageCleaner) and this has got me down to a 14mb image (instance sizes listed below, which is looking much healthier - and those MethodChangeRecords are gone too) !!! 

I suspect there are more monti/metacello things that are still lurking around.

I also wonder if I need some the character sorting strings too.

Tim

"CmdLine script to debug the initial minimal image"

| logger repo version |

logger := FileStream stderr.
logger cr; nextPutAll: 'Starting Minimal Cleanup Script...'.

logger cr; nextPutAll: '>Resetting Class Comments'.
Smalltalk allClasses do: [ :c | c classComment: '' stamp: '' ].

logger cr; nextPutAll: '>Removing MC holders'.
MCMethodDefinition allInstances do: [:each | each become: String new ].
MCClassDefinition allInstances do: [:each | each become: String new ].
MCVersionInfo allInstances do: [:each | each become: String new ].

logger cr; nextPutAll: '>ImageCleaner release routines'.
Smalltalk organization removeEmptyCategories.
Smalltalk
allClassesAndTraitsDo: [ :class |
[ :each |
each
removeEmptyCategories;
sortCategories ]
value: class organization;
value: class class organization ].

(RPackageOrganizer default packages select: #isEmpty)
do: #unregister.

Smalltalk organization sortCategories.
Smalltalk garbageCollect.
Smalltalk cleanOutUndeclared.
Smalltalk fixObsoleteReferences.
Smalltalk cleanUp: true except: #() confirming: false.

logger cr; nextPutAll: '>GC'.
3 timesRepeat: [
Smalltalk garbageCollect.
Smalltalk cleanOutUndeclared.
Smalltalk fixObsoleteReferences].

logger cr; nextPutAll: 'Finished Script.'; cr; cr.

My top instances are now:

Class                                          code space # instances  inst space     percent   inst average size
CompiledMethod                                      19159       30481       2912968       21.60               95.57
Array                                                3742       36495       2852448       21.10               78.16
ByteString                                           2640       24018       2517168       18.60              104.80
ByteSymbol                                           1698       20722        759208        5.60               36.64
Association                                          1148       19786        633152        4.70               32.00
IdentitySet                                           408       15452        494464        3.70               32.00
MethodDictionary                                     3310        3520        350192        2.60               99.49
Protocol                                             1679        8382        268224        2.00               32.00
WeakArray                                            1758         265        232304        1.70              876.62
OrderedCollection                                    6555        5043        201720        1.50               40.00
ClassOrganization                                    5281        3520        168960        1.30               48.00
Metaclass                                            7184        1748        153824        1.10               88.00




On 15 Aug 2017, at 23:00, Guillermo Polito <[hidden email]> wrote:

Just a hunch: could you inspect ur MethodChangeRecord instances ?

Le mar. 15 août 2017 à 23:55, Tim Mackinnon <[hidden email]> a écrit :
A weird observation - is it possible that source code is being stored in the image as strings somehow? When I do

./pharo PharoLambda.image eval "ByteString allInstances inject: (OrderedCollection new) into: [:r :i | i size > 500 ifTrue: [r add: i]. r]"

I see to see reams of what looks like method source - but I thought source code was stored in the .sources file and the .changes file (and I haven’t been bundling those in my deployed image).

I’m trying to figure out how you find references to a string object, to chase down what is pointing to these strings as maybe there is a quick 4mb win by simply nil’ing out some obvious things. (This doesn’t of course help with a default minimal image - but maybe a few tricks for packaging and deploying something).

Tim

On 15 Aug 2017, at 22:26, Tim Mackinnon <[hidden email]> wrote:

Hi Guille/Ben - I got a quick moment to try the SpaceTally (aside: it seems very convoluted to load a single package into the image, I was trying to avoid having to create a baselineOf for something so simple - I ended up with:

repo := MCFileTreeRepository new directory: './bootstrap' asFileReference.
version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'.
version load.

Anyway - in my minimal image, like in the fat image there seems to be a surprising amount of bytestrings (4mb worth?). I think that might need some digging into? It seems like a lot somehow. Although Ben’s neat experiment of zipping strings shows that’s not a real route.

In a deployed minimal image - maybe I can get rid of some other things like MethodChangeRecords or MCMethodDefiniion’s (but they are smaller wins - but noticeable)

Class                                          code space # instances  inst space     percent   inst average size
ByteString                                           2640       37365       4823848       21.50              129.10
Array                                                3742       53002       3961944       17.60               74.75
CompiledMethod                                      19159       30481       2912968       13.00               95.57
Association                                          1148       58348       1867136        8.30               32.00
MethodChangeRecord                                    431       34312       1097984        4.90               32.00
ByteArray                                            4605         290        908728        4.00             3133.54
ByteSymbol                                           1698       22689        840168        3.70               37.03
IdentitySet                                           408       19076        610432        2.70               32.00
MethodDictionary                                     3310        3520        608688        2.70              172.92
WeakArray                                            1758        3024        597824        2.70              197.69
MCMethodDefinition                                   4318        6659        426176        1.90               64.00
Protocol                                             1679        8382        268224        1.20               32.00
OrderedCollection                                    6555        5509        220360        1.00               40.00 

As an aside - my Gitlab project is public, the scripts that load things up are in ./scripts (build.sh, and minimal.st and loadlocal.st)

Tim

On 15 Aug 2017, at 08:02, Guillermo Polito <[hidden email]> wrote:



On Mon, Aug 14, 2017 at 4:42 PM, Tim Mackinnon <[hidden email]> wrote:
Hi Guille - just running SpaceTally on my dev image to get a feel for it. It turns out that in the minimal images you’ve been creating, it’s not loaded (makes sense).

Yup, it's loaded afterwards.

All packages are loaded through Metacello baselines. We should start refactoring them into standalone projects, each with its own baseline describing its own dependencies.

I was checking your GitLab project and I probably don't have access: how are you finally loading packages into the bootstrap image? Can you share that with us as text? I'd like to improve that situation.
 
I’m wondering if there is an easy way to import it (I guess that package should be in the Pharo git tree I cloned to get Fuel loaded, right? Or is there a separate standalone source?).

Yes it is; you can get the package name programmatically with:

SpaceTally package name

And furthermore, you can find the baseline that currently loads it with:

| package |
package := SpaceTally package name.
BaselineOf subclasses select: [ :e |
    e project version packages anySatisfy: [ :p | p name = package ] ].
 

Thanks for all the support; your email about why the contexts stack up was very well received (I will comment over there).

By the way, it looks like Martin Fowler picked up on this announcement, so we might get some interest from his many followers.

Tim

On 14 Aug 2017, at 10:49, Guillermo Polito <[hidden email]> wrote:

Hi Tim,

On Mon, Aug 14, 2017 at 11:41 AM, Tim Mackinnon <[hidden email]> wrote:
Hey guys, thanks for your enthusiasm around this - and I cannot stress enough how this was only possible because of the work that has gone into making Pharo (in particular the 64bit image, as well as having a minimal image, and some great blog posts on serialising contexts) as well as the patience from everyone in answering questions and helping me get it all working.

I’m still quite keen to get my execution time back down under 800ms and I’d like to actually get back to writing a few skills to automate a few things around my house.

To answer Denis’ question:

My final footprint is 30.4 MB: a 22 MB image (with a simple example that pulls in Fuel, ZTimestamp and the S3 library, which depends on XMLParser) plus the VM (from which I removed the obvious DLLs).

In my original experiments with a 6.0 minimal image, I did manage to get down to a 13.4 MB image (it started out at 12 MB, and then I loaded STON and only a simple clock example). I think the sweet spot is a total footprint of around 20 MB, as that seems to get me into the 450ms-900ms range.

The 7.0 minimal image now starts out at 15 MB, and I’m not sure why loading Fuel, S3 and XMLParser adds 7 MB (it seems big to me, but I’ve not dug into that).

You can do further space analysis using the following expression:

SpaceTally new printSpaceAnalysis

You can do that in an eval and check what's taking space. With measures we can iterate and improve :).
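For a quick look at a handful of classes without the full report, a rough per-class total can also be computed in an eval. This is a sketch, not how SpaceTally itself works: it assumes `sizeInMemory` (bytes per object, including its header) is available in your image, and the class list is only an example.

```smalltalk
"Rough per-class memory totals, largest first (a sketch, not SpaceTally itself).
 Assumes #sizeInMemory is available; the class list is just an example.
 Note: #allInstances only counts exact instances, not instances of subclasses."
| classes totals |
Smalltalk garbageCollect.
classes := { ByteString. Array. CompiledMethod. ByteSymbol }.
totals := classes collect: [ :cls |
    cls name -> (cls allInstances
        inject: 0
        into: [ :sum :each | sum + each sizeInMemory ]) ].
(totals asSortedCollection: [ :a :b | a value >= b value ]) asArray
```

The result is an Array of `className -> bytes` associations, which is handy for comparing an image before and after a cleanup step.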
 
I’ve also found (on the back of unserialising the context in my example) that, because of the way we build up images, there are 15+ saved stack sessions piled on top of each other. I don’t yet know the size/speed implications of these, but we need a better way of folding executions when we snapshot headless images. I’m also not clear whether there are other startup tasks that take precious time (this also has implications for our fat development images, as they take much longer to appear than they really should).

I'm working on this as I'm writing this mail ;)


I'll write down the implications further in a different thread.


I’ll be exploring some of these size/speed trade-offs in follow-on messages.

But once again, a big thanks - I’ve not enjoyed programming like this for ages.

Tim

On 12 Aug 2017, at 16:26, Ben Coman <[hidden email]> wrote:

hi Tim,  

That is.....      AWESOME!

Very nice delivery - it flowed well with great narration. 

I loved @2:17 "this is the interesting piece, because PharoLambda has serialized the execution context of its application and saved it into [my S3 bucket] ... [then on the local machine] rematerializes a debugger [on that context]."

There is a clarity in your video presentation that may really intrigue outsiders. As a community we should push this on the usual hacker forums; ycombinator could be a good starting point (but I'm locked out of my account there).
An enticing title could be...
"Debugging Lambdas by re-materializing saved execution contexts on your local machine."

cheers -ben

On Fri, Aug 11, 2017 at 3:37 PM, Denis Kudriashov <[hidden email]> wrote:
This is cool Tim.

So what image size did you deploy in the end?

2017-08-10 15:47 GMT+02:00 Tim Mackinnon <[hidden email]>:
I just wanted to thank everyone for their help in getting my pet project further along, so that I can now announce that PharoLambda works with the V7 minimal image and also supports post-mortem debugging by saving a zipped Fuel context onto S3.

This latter item is particularly satisfying because at a recent serverless conference (JeffConf) there was a panel where poor development tools on serverless platforms were highlighted as a real problem.

In our community we’ve had these kinds of tools at our fingertips for ages, but I don’t think the wider development community has really noticed. Debugging something as short-lived as a Lambda execution is quite startling, as the current answer is “add more logging”, and we all know that sucks. To this end, I’ve created a little screencast showing this in action, and it was pretty cool because it was a real example I encountered when I got everything working and was trying my test application out.

I’ve also put a bit of work into tuning the excellent GitLab CI tools, so that I can cache many of the artefacts used between different build runs (this might also be of interest to others using CI systems).

The GitLab project is at: https://gitlab.com/macta/PharoLambda

Tim


On 15 Jul 2017, at 00:39, Tim Mackinnon <[hidden email]> wrote:

Hi - I’ve been playing around with getting Pharo to run well on AWS Lambda. It’s early days, but I thought it might be interesting to share what I’ve learned so far.

Usage examples and code at https://gitlab.com/macta/PharoLambda

With help from many of the folks here, I’ve been able to get a simple example to run in 500ms-1200ms with a minimal Pharo 6 image. You can easily try it out yourself. This seems slightly better than what the GoLang folks have been able to do.

Tim







-- 
   
Guille Polito

Research Engineer
French National Center for Scientific Research - http://www.cnrs.fr


Phone: +33 06 52 70 66 13




--

   

Guille Polito


Research Engineer

French National Center for Scientific Research - http://www.cnrs.fr



Web: http://guillep.github.io

Phone: +33 06 52 70 66 13


Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Guillermo Polito
Be careful, because some of those aggressive cleanups may make parts of your image unstable. For example:

This is dangerous:
MCMethodDefinition allInstances do: [:each | each become: String new ].
MCClassDefinition allInstances do: [:each | each become: String new ].
MCVersionInfo allInstances do: [:each | each become: String new ].

And this may break some code if you're using non-ASCII characters:
Unicode classPool at: #GeneralCategory put: nil.
Unicode classPool at: #DecimalProperty put: nil.
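Before running the `become:` step, it may be worth seeing how many objects it would touch. A minimal read-only sketch follows; it only counts, and does nothing to make the aggressive cleanup any safer.

```smalltalk
"Read-only: count the Monticello objects the aggressive cleanup would replace.
 Nothing in the image is modified."
| counts |
counts := { MCMethodDefinition. MCClassDefinition. MCVersionInfo }
    collect: [ :cls | cls name -> cls allInstances size ].
counts
```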



On Wed, Aug 16, 2017 at 10:07 AM, Tim Mackinnon <[hidden email]> wrote:
This is very encouraging, and also very instructive (your post on snapshotting is in this area too; I have a queue of things to try for you on Friday).

I'll try adding your ideas to my script and see if it squeezes some more.

And then I guess we need to decide which parts go into your minimal build steps and which go into an external script (or possibly a HeadlessImageCleaner class we keep loaded so it's easier to maintain?)

Tim


On 16 Aug 2017, at 08:52, Guillermo Polito <[hidden email]> wrote:

This means it would be healthy to do a cleanup (at least the non-aggressive one: ChangeSets and MC stuff) on each of the images we produce, and not just the latest one.

On Wed, Aug 16, 2017 at 8:35 AM, Tim Mackinnon <[hidden email]> wrote:
Yes, you were onto something there (and at the same time, by poking around with #pointersTo, I noticed some chains of objects too). So I ran the following script (partially borrowed from ImageCleaner), and it has got me down to a 14 MB image (instance sizes listed below, which look much healthier, and those MethodChangeRecords are gone too)!

I suspect there are more Monticello/Metacello things still lurking around.

I also wonder whether I need some of the character-sorting strings.

Tim

"CmdLine script to debug the initial minimal image"

| logger |

logger := FileStream stderr.
logger cr; nextPutAll: 'Starting Minimal Cleanup Script...'.

logger cr; nextPutAll: '>Resetting Class Comments'.
Smalltalk allClasses do: [ :c | c classComment: '' stamp: '' ].

logger cr; nextPutAll: '>Removing MC holders'.
MCMethodDefinition allInstances do: [ :each | each become: String new ].
MCClassDefinition allInstances do: [ :each | each become: String new ].
MCVersionInfo allInstances do: [ :each | each become: String new ].

logger cr; nextPutAll: '>ImageCleaner release routines'.
Smalltalk organization removeEmptyCategories.
Smalltalk allClassesAndTraitsDo: [ :class |
    [ :each |
        each
            removeEmptyCategories;
            sortCategories ]
        value: class organization;
        value: class class organization ].

(RPackageOrganizer default packages select: #isEmpty)
    do: #unregister.

Smalltalk organization sortCategories.
Smalltalk garbageCollect.
Smalltalk cleanOutUndeclared.
Smalltalk fixObsoleteReferences.
Smalltalk cleanUp: true except: #() confirming: false.

logger cr; nextPutAll: '>GC'.
3 timesRepeat: [
    Smalltalk garbageCollect.
    Smalltalk cleanOutUndeclared.
    Smalltalk fixObsoleteReferences ].

logger cr; nextPutAll: 'Finished Script.'; cr; cr.

My top instances are now:

Class                 code space  # instances  inst space  percent  inst average size
CompiledMethod             19159        30481     2912968    21.60              95.57
Array                       3742        36495     2852448    21.10              78.16
ByteString                  2640        24018     2517168    18.60             104.80
ByteSymbol                  1698        20722      759208     5.60              36.64
Association                 1148        19786      633152     4.70              32.00
IdentitySet                  408        15452      494464     3.70              32.00
MethodDictionary            3310         3520      350192     2.60              99.49
Protocol                    1679         8382      268224     2.00              32.00
WeakArray                   1758          265      232304     1.70             876.62
OrderedCollection           6555         5043      201720     1.50              40.00
ClassOrganization           5281         3520      168960     1.30              48.00
Metaclass                   7184         1748      153824     1.10              88.00
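To put a number on "much healthier", the per-class savings can be read off the inst-space column of the earlier 15 Aug SpaceTally table (before the cleanup) and the table above (after it). The sketch below uses only figures quoted in this thread and covers the classes listed in both tables; MethodChangeRecord is entered as 0 afterwards on the assumption, based on Tim's report, that those instances are gone. Classes listed in only one table (such as MCMethodDefinition or ByteArray) are left out because their "after" size isn't shown.

```smalltalk
"Bytes saved per class, from the inst-space column of the two SpaceTally tables.
 Each row is: class name, bytes before cleanup, bytes after cleanup.
 MethodChangeRecord is entered as 0 afterwards because it no longer appears."
| rows savings total |
rows := #(
    #(#ByteString 4823848 2517168)
    #(#Array 3961944 2852448)
    #(#CompiledMethod 2912968 2912968)
    #(#Association 1867136 633152)
    #(#MethodChangeRecord 1097984 0)
    #(#ByteSymbol 840168 759208)
    #(#IdentitySet 610432 494464)
    #(#MethodDictionary 608688 350192)
    #(#WeakArray 597824 232304)
    #(#Protocol 268224 268224)
    #(#OrderedCollection 220360 201720) ).
savings := rows collect: [ :row | (row at: 1) -> ((row at: 2) - (row at: 3)) ].
total := savings inject: 0 into: [ :sum :each | sum + each value ].
total
```

With these figures `total` comes to 6587728 bytes (about 6.6 MB), and ByteString (2306680 bytes) is the largest single saving.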




On 15 Aug 2017, at 23:00, Guillermo Polito <[hidden email]> wrote:

Just a hunch: could you inspect your MethodChangeRecord instances?

On Tue, 15 Aug 2017 at 23:55, Tim Mackinnon <[hidden email]> wrote:
A weird observation - is it possible that source code is being stored in the image as strings somehow? When I do

./pharo PharoLambda.image eval "ByteString allInstances inject: (OrderedCollection new) into: [:r :i | i size > 500 ifTrue: [r add: i]. r]"

I seem to see reams of what looks like method source, but I thought source code was stored in the .sources and .changes files (and I haven’t been bundling those with my deployed image).

I’m trying to figure out how to find references to a string object, so I can chase down what is pointing to these strings; maybe there is a quick 4 MB win from simply nil’ing out some obvious things. (Of course, this doesn’t help with a default minimal image, but it might give a few tricks for packaging and deploying something.)
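A minimal sketch of chasing those referrers with `#pointersTo` (the approach mentioned elsewhere in this thread) is below. It assumes `pointersTo`, `detectMax:` and `Bag>>sortedCounts` are available in your image, and that at least one string over the 500-character threshold exists. The holders will include transient objects created by the expression itself (the `big` Array and the running context, for instance), so treat the output as leads rather than a definitive answer.

```smalltalk
"Group the classes of the objects that reference the biggest ByteString.
 Assumes at least one string is longer than 500 characters (the threshold
 used in the eval above). The 'big' Array and the executing context will
 also show up as holders; ignore those."
| big biggest holderClasses |
Smalltalk garbageCollect.
big := ByteString allInstances select: [ :s | s size > 500 ].
biggest := big detectMax: [ :s | s size ].
holderClasses := (biggest pointersTo collect: [ :each | each class ]) asBag.
holderClasses sortedCounts
```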

Tim

On 15 Aug 2017, at 22:26, Tim Mackinnon <[hidden email]> wrote:

Hi Guille/Ben - I got a quick moment to try SpaceTally. (Aside: it seems very convoluted to load a single package into the image; I was trying to avoid having to create a BaselineOf for something so simple.) I ended up with:

repo := MCFileTreeRepository new directory: './bootstrap' asFileReference.
version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'.
version load.


Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Tim Mackinnon
Yes, that is a fair warning. I guess I’m looking for ways to reset Monticello/Metacello artefacts and caching (maybe I should ask Dale for some tips, as I find the Monticello/Metacello packages quite big and difficult to follow, and there are no class comments to give you any pointers). I’m going on the assumption that regular code/classes shouldn’t need any of this MC metadata or these objects?

Once my application is loaded, instantiated and saved in the image, I don’t need to update it in the image (I would run another CI build and generate a new image instead). That is a different pattern from more mainstream projects, but since Lambda is about short-lived executions, execution time and memory size are the key concerns, so you would take the hit at build time.

Tim

On 16 Aug 2017, at 09:11, Guillermo Polito <[hidden email]> wrote:

Be careful because some of those aggressive cleanups may turn some parts of your image unstable. For example:

This is dangerous:
MCMethodDefinition allInstances do: [:each | each become: String new ].
MCClassDefinition allInstances do: [:each | each become: String new ].
MCVersionInfo allInstances do: [:each | each become: String new ].

And this may break some code if you're using non-ascii characters:
Unicode classPool at: #GeneralCategory put: nil.
Unicode classPool at: #DecimalProperty put: nil.



On Wed, Aug 16, 2017 at 10:07 AM, Tim Mackinnon <[hidden email]> wrote:
This is very encouraging, and also very instructive (your post on snapshot is also one in this area too. I have a queue of things to try for you on Friday).

I'll try adding your ideas to my script and see if it squeezes some more.

And then I guess we need to to decide which parts go into your minimum build steps and which are an external script (or possibly a HeadlessImageCleaner class we keep loaded so it's easier to maintain?)

Tim

Sent from my iPhone

On 16 Aug 2017, at 08:52, Guillermo Polito <[hidden email]> wrote:

This means it would be healthy to do a cleanup (at least the non aggressive one, ChangeSets and MC stuff) on each of the images we produce and not just the latest one.

On Wed, Aug 16, 2017 at 8:35 AM, Tim Mackinnon <[hidden email]> wrote:
Yes you were on to something there (and at the same time, by poking around with #pointersTo I noticed some chains of objects too). So I ran the following script (partially borrowed from ImageCleaner) and this has got me down to a 14mb image (instance sizes listed below, which is looking much healthier - and those MethodChangeRecords are gone too) !!! 

I suspect there are more monti/metacello things that are still lurking around.

I also wonder if I need some the character sorting strings too.

Tim

"CmdLine script to debug the initial minimal image"

| logger repo version |

logger := FileStream stderr.
logger cr; nextPutAll: 'Starting Minimal Cleanup Script...'.

logger cr; nextPutAll: '>Resetting Class Comments'.
Smalltalk allClasses do: [ :c | c classComment: '' stamp: '' ].

logger cr; nextPutAll: '>Removing MC holders'.
MCMethodDefinition allInstances do: [:each | each become: String new ].
MCClassDefinition allInstances do: [:each | each become: String new ].
MCVersionInfo allInstances do: [:each | each become: String new ].

logger cr; nextPutAll: '>ImageCleaner release routines'.
Smalltalk organization removeEmptyCategories.
Smalltalk
allClassesAndTraitsDo: [ :class |
[ :each |
each
removeEmptyCategories;
sortCategories ]
value: class organization;
value: class class organization ].

(RPackageOrganizer default packages select: #isEmpty)
do: #unregister.

Smalltalk organization sortCategories.
Smalltalk garbageCollect.
Smalltalk cleanOutUndeclared.
Smalltalk fixObsoleteReferences.
Smalltalk cleanUp: true except: #() confirming: false.

logger cr; nextPutAll: '>GC'.
3 timesRepeat: [
Smalltalk garbageCollect.
Smalltalk cleanOutUndeclared.
Smalltalk fixObsoleteReferences].

logger cr; nextPutAll: 'Finished Script.'; cr; cr.

My top instances are now:

Class                                          code space # instances  inst space     percent   inst average size
CompiledMethod                                      19159       30481       2912968       21.60               95.57
Array                                                3742       36495       2852448       21.10               78.16
ByteString                                           2640       24018       2517168       18.60              104.80
ByteSymbol                                           1698       20722        759208        5.60               36.64
Association                                          1148       19786        633152        4.70               32.00
IdentitySet                                           408       15452        494464        3.70               32.00
MethodDictionary                                     3310        3520        350192        2.60               99.49
Protocol                                             1679        8382        268224        2.00               32.00
WeakArray                                            1758         265        232304        1.70              876.62
OrderedCollection                                    6555        5043        201720        1.50               40.00
ClassOrganization                                    5281        3520        168960        1.30               48.00
Metaclass                                            7184        1748        153824        1.10               88.00




On 15 Aug 2017, at 23:00, Guillermo Polito <[hidden email]> wrote:

Just a hunch: could you inspect ur MethodChangeRecord instances ?

Le mar. 15 août 2017 à 23:55, Tim Mackinnon <[hidden email]> a écrit :
A weird observation - is it possible that source code is being stored in the image as strings somehow? When I do

./pharo PharoLambda.image eval "ByteString allInstances inject: (OrderedCollection new) into: [:r :i | i size > 500 ifTrue: [r add: i]. r]"

I see to see reams of what looks like method source - but I thought source code was stored in the .sources file and the .changes file (and I haven’t been bundling those in my deployed image).

I’m trying to figure out how you find references to a string object, to chase down what is pointing to these strings as maybe there is a quick 4mb win by simply nil’ing out some obvious things. (This doesn’t of course help with a default minimal image - but maybe a few tricks for packaging and deploying something).

Tim

On 15 Aug 2017, at 22:26, Tim Mackinnon <[hidden email]> wrote:

Hi Guille/Ben - I got a quick moment to try the SpaceTally (aside: it seems very convoluted to load a single package into the image, I was trying to avoid having to create a baselineOf for something so simple - I ended up with:

repo := MCFileTreeRepository new directory: './bootstrap' asFileReference.
version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'.
version load.

Anyway - in my minimal image, like in the fat image there seems to be a surprising amount of bytestrings (4mb worth?). I think that might need some digging into? It seems like a lot somehow. Although Ben’s neat experiment of zipping strings shows that’s not a real route.

In a deployed minimal image - maybe I can get rid of some other things like MethodChangeRecords or MCMethodDefiniion’s (but they are smaller wins - but noticeable)

Class                                          code space # instances  inst space     percent   inst average size
ByteString                                           2640       37365       4823848       21.50              129.10
Array                                                3742       53002       3961944       17.60               74.75
CompiledMethod                                      19159       30481       2912968       13.00               95.57
Association                                          1148       58348       1867136        8.30               32.00
MethodChangeRecord                                    431       34312       1097984        4.90               32.00
ByteArray                                            4605         290        908728        4.00             3133.54
ByteSymbol                                           1698       22689        840168        3.70               37.03
IdentitySet                                           408       19076        610432        2.70               32.00
MethodDictionary                                     3310        3520        608688        2.70              172.92
WeakArray                                            1758        3024        597824        2.70              197.69
MCMethodDefinition                                   4318        6659        426176        1.90               64.00
Protocol                                             1679        8382        268224        1.20               32.00
OrderedCollection                                    6555        5509        220360        1.00               40.00 
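As a quick cross-check on how to read the table, here is a small sketch. It is written in Python rather than Pharo, and the helper names are hypothetical, not part of SpaceTally. It parses the rows above and recomputes the last column as inst space divided by # instances. It assumes the space columns are in bytes and that percent is each class's share of the total instance space. Under that reading, ByteString, Array and CompiledMethod together account for roughly half of it.

```python
from typing import List, NamedTuple

# Rows copied from the SpaceTally output above (header omitted):
# class, code space, # instances, inst space, percent, inst average size
TALLY_TEXT = """
ByteString          2640  37365  4823848  21.50   129.10
Array               3742  53002  3961944  17.60    74.75
CompiledMethod     19159  30481  2912968  13.00    95.57
Association         1148  58348  1867136   8.30    32.00
MethodChangeRecord   431  34312  1097984   4.90    32.00
ByteArray           4605    290   908728   4.00  3133.54
ByteSymbol          1698  22689   840168   3.70    37.03
IdentitySet          408  19076   610432   2.70    32.00
MethodDictionary    3310   3520   608688   2.70   172.92
WeakArray           1758   3024   597824   2.70   197.69
MCMethodDefinition  4318   6659   426176   1.90    64.00
Protocol            1679   8382   268224   1.20    32.00
OrderedCollection   6555   5509   220360   1.00    40.00
"""


class TallyRow(NamedTuple):
    name: str
    code_space: int
    instances: int
    inst_space: int
    percent: float
    avg_size: float


def parse_tally(text: str) -> List[TallyRow]:
    """Split each non-empty line on whitespace into a TallyRow."""
    rows = []
    for line in text.strip().splitlines():
        name, code, count, space, pct, avg = line.split()
        rows.append(TallyRow(name, int(code), int(count), int(space), float(pct), float(avg)))
    return rows


def recomputed_average(row: TallyRow) -> float:
    """Average instance size: inst space / # instances, rounded like the table."""
    return round(row.inst_space / row.instances, 2)


def top_share(rows: List[TallyRow], n: int) -> float:
    """Sum of the percent column for the n classes with the most inst space."""
    biggest = sorted(rows, key=lambda r: r.inst_space, reverse=True)[:n]
    return round(sum(r.percent for r in biggest), 2)


if __name__ == "__main__":
    rows = parse_tally(TALLY_TEXT)
    for row in rows:
        print(f"{row.name:20} table {row.avg_size:8.2f}  recomputed {recomputed_average(row):8.2f}")
    print(f"Top 3 classes: {top_share(rows, 3)}% of instance space")
```

Each recomputed average matches the printed column, which suggests that the last column is simply a per-instance average and is not measured separately.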

As an aside, my GitLab project is public; the scripts that load things up are in ./scripts (build.sh, minimal.st and loadlocal.st).

Tim

On 15 Aug 2017, at 08:02, Guillermo Polito <[hidden email]> wrote:



On Mon, Aug 14, 2017 at 4:42 PM, Tim Mackinnon <[hidden email]> wrote:
Hi Guille - I'm just running SpaceTally on my dev image to get a feel for it. It turns out that it's not loaded in the minimal images you’ve been creating (makes sense).

Yup, it's loaded afterwards.

All packages are loaded through Metacello baselines. We should start refactoring them into standalone projects, each with its own baseline describing its own dependencies.

I was checking your GitLab project and I probably don't have access: how are you ultimately loading packages into the bootstrap image? Can you share that with us as text? I'd like to improve that situation.
 
I’m wondering if there is an easy way to import it (I guess that package should be in the Pharo git tree I cloned to get Fuel loaded, right? Or is there a separate standalone source?).

Yes, it is. You can get its package name programmatically with:

SpaceTally package name

And furthermore, you can find the baseline that currently loads it with:

package := SpaceTally package name.
BaselineOf subclasses select: [ :e |
	e project version packages anySatisfy: [ :p | p name = package ] ].
 

Thanks for all the support; your email about why the contexts stack up was very well received (I will comment over there).

By the way, it looks like Martin Fowler picked up on this announcement, so maybe we will get some interest from his many followers.

Tim

On 14 Aug 2017, at 10:49, Guillermo Polito <[hidden email]> wrote:

Hi Tim,

On Mon, Aug 14, 2017 at 11:41 AM, Tim Mackinnon <[hidden email]> wrote:
Hey guys, thanks for your enthusiasm around this - and I cannot stress enough that this was only possible because of the work that has gone into Pharo (in particular the 64-bit image, the minimal image, and some great blog posts on serialising contexts), as well as everyone’s patience in answering questions and helping me get it all working.

I’m still quite keen to get my execution time back down under 800ms and I’d like to actually get back to writing a few skills to automate a few things around my house.

To answer Denis’ question:

My final footprint is 30.4MB - that’s a 22MB image (with a simple example that pulls in Fuel, ZTimestamp and the S3 library, which depends on XMLParser) plus the VM (from which I removed the obvious DLLs).

In my original experiments with a 6.0 minimal image, I managed to get down to a 13.4MB image (starting from the 12MB original, then loading STON and only a simple clock example). I think the sweet spot is around 20MB total footprint, as that seems to get me into the 450ms-900ms range.

The 7.0 minimal image now starts out at 15MB, and I’m not sure why loading Fuel, S3 and XMLParser adds 7MB (it seems big to me, but I’ve not dug into it).

You can do further space analysis using the following expression:

SpaceTally new printSpaceAnalysis

You can do that in an eval and check what's taking space. With measures we can iterate and improve :).
 
I’ve also found (on the back of unserialising the context in my example) that, because of the way we build up images, they contain 15+ saved stack sessions piled on top of each other. I don’t yet know the size/speed implications of these, but we need a better way of folding executions when we snapshot headless images. I’m also not clear whether there are other startup tasks that take precious time (this also affects our fat development images, which take much longer to appear than they really should).

I'm working on this as I'm writing this mail ;)


I'll write down the implications further in a different thread.


I’ll be exploring some of these size/speed tradeoffs in follow-on messages.

But once again, a big thanks - I’ve not enjoyed programming like this for ages.

Tim

On 12 Aug 2017, at 16:26, Ben Coman <[hidden email]> wrote:

hi Tim,  

That is.....      AWESOME!

Very nice delivery - it flowed well with great narration. 

I loved @2:17 "this is the interesting piece, because PharoLambda has serialized the execution context of its application and saved it into [my S3 bucket] ... [then on the local machine] rematerializes a debugger [on that context]."

There is a clarity in your video presentation that really may intrigue outsiders. As a community we should push this on the usual hacker forums - ycombinator could be a good starting point (but I'm locked out of my account there).  
An enticing title could be...
"Debugging Lambdas by re-materializing saved execution contexts on your local machine."

cheers -ben

On Fri, Aug 11, 2017 at 3:37 PM, Denis Kudriashov <[hidden email]> wrote:
This is cool Tim.

So what image size did you deploy in the end?

2017-08-10 15:47 GMT+02:00 Tim Mackinnon <[hidden email]>:
I just wanted to thank everyone for their help in getting my pet project further along, so that I can now announce that PharoLambda works with the V7 minimal image and also supports post-mortem debugging by saving a zipped Fuel context to S3.

This latter item is particularly satisfying, as at a recent serverless conference (JeffConf) there was a panel where poor development tools on serverless platforms were highlighted as a real problem.

In our community we’ve had these kinds of tools at our fingertips for ages, but I don’t think the wider development community has really noticed. Debugging something short-lived like a Lambda execution is quite startling, as the current answer is “add more logging”, and we all know that sucks. To this end, I’ve created a little screencast showing this in action - and it was pretty cool because it was a real example I encountered when I got everything working and was trying my test application out.

I’ve also put a bit of work into tuning the excellent GitLab CI tools, so that I can cache many of the artefacts used between different build runs (this might also be of interest to others using CI systems).

The Gitlab project is on: https://gitlab.com/macta/PharoLambda

Tim


On 15 Jul 2017, at 00:39, Tim Mackinnon <[hidden email]> wrote:

Hi - I’ve been playing around with getting Pharo to run well on AWS Lambda. It’s early days, but I thought it might be interesting to share what I’ve learned so far.

Usage examples and code at https://gitlab.com/macta/PharoLambda

With help from many of the folks here, I’ve been able to get a simple example to run in 500ms-1200ms with a minimal Pharo 6 image. You can easily try it out yourself. This seems slightly better than what the GoLang folks have been able to do.

Tim







-- 
   
Guille Polito

Research Engineer
French National Center for Scientific Research - http://www.cnrs.fr


Phone: +33 06 52 70 66 13






Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Tim Mackinnon
In reply to this post by Guillermo Polito
Hi, tracing through your changes - it looks like:

Smalltalk cleanUp: true except: #() confirming: false.
Takes care of all the non-Unicode changes you proposed (and it seems to be a known cleanup protocol). I wonder whether the Unicode change is worth the risk, as many web-based services I might connect to with Zinc do use Unicode, so maybe I should keep that one in. (I will for now, and might verify how much of a difference it really makes.)

I think my next port of call is cleanUp for Monticello/Metacello as I see a fair amount of that stuff floating around in my image (after I’ve used it to bootstrap my code).

Tim

On 16 Aug 2017, at 02:32, Guillermo Polito <[hidden email]> wrote:

Actually, it turns out that, first, Monticello is "nicely" coupled with the ChangeSet system and logs all the loaded source code in change sets :D :/ ¬¬. Also, the two largest strings are Unicode tables (we should put them in files instead of in the image and load them on demand), and so are the two biggest arrays. I just tried the following in a clean bootstrapped "minimal" image (metacello):

"Careful: this makes #isLetter, #isUppercase, #isLowercase, #toLowercase and #toUppercase work only on ASCII"
Character characterSet: nil.
Unicode classPool at: #GeneralCategory put: nil.
Unicode classPool at: #DecimalProperty put: nil.

UnicodeDefinition removeFromSystem.
ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ].
ChangeSet resetCurrentToNewUnnamedChangeSet.
MCDefinition clearInstances.
Undeclared removeUnreferencedKeys.
Smalltalk garbageCollect.

I ran it from the command line like this:

./vm/pharo Pharo7.0-metacello-32bit-fa236b7.image eval --save "Character characterSet: nil. Unicode classPool at: #GeneralCategory put: nil. Unicode classPool at: #DecimalProperty put: nil. UnicodeDefinitions removeFromSystem. ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ]. ChangeSet resetCurrentToNewUnnamedChangeSet. MCDefinition clearInstances. Undeclared removeUnreferencedKeys. Smalltalk garbageCollect."

and my image went down from 11MB to 6.6MB (7.0MB if I skip the first three lines and keep the Unicode tables instead of restricting to ASCII).

Then I tried a tally:

./vm/pharo Pharo7.0-metacello-32bit-fa236b7.image save spacetally

./vm/pharo spacetally.image eval --save "repo := MCFileTreeRepository new directory: '../src' asFileReference. version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'. version load."

Then I re-cleaned, since I had loaded some packages:

./vm/pharo spacetally.image eval --save "ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ]. ChangeSet resetCurrentToNewUnnamedChangeSet. MCDefinition clearInstances. Undeclared removeUnreferencedKeys. Smalltalk garbageCollect."

This image is now 6.6MB (7.1MB with the large Unicode arrays), with strings at 4.1% (274k), which seems reasonable. The remaining big strings are Pharo's licence, the buffer of the changes file and some class comments (shouldn't those be fetched from disk like any other method source code?).

Another tally shows that ~30% of the space is taken by Arrays and 21.9% by compiled methods. But, BUT! :) I have ~30k arrays and also lots of collections, each of which keeps its elements in an internal array:

"MethodDictionary"    2872 +
"IdentitySet"        12781 +
"OrderedCollection"   4398 +
"Set"                 2959 +
"Dictionary"          1997 +
"IdentityDictionary"   454
--------------------------
                     25461

So roughly 30k - 25.5k, i.e. ~5k arrays, are used outside collections.
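The subtraction relies on an assumption the message leaves implicit. Each instance of these collection classes keeps its elements in one internal Array, so each collection should account for exactly one Array in the tally. Below is a minimal Python sketch of that back-of-the-envelope estimate. The 30,000 total is the approximate "~30k" figure from the message, not an exact count, and the function names are illustrative only.

```python
from typing import Dict

# Collection instance counts from the tally quoted above.
COLLECTION_COUNTS: Dict[str, int] = {
    "MethodDictionary": 2872,
    "IdentitySet": 12781,
    "OrderedCollection": 4398,
    "Set": 2959,
    "Dictionary": 1997,
    "IdentityDictionary": 454,
}

# Approximate total from the message ("~30k arrays"), not an exact figure.
APPROX_TOTAL_ARRAYS = 30_000


def arrays_inside_collections(counts: Dict[str, int]) -> int:
    """Assumes each collection instance owns exactly one internal Array."""
    return sum(counts.values())


def arrays_outside_collections(total_arrays: int, counts: Dict[str, int]) -> int:
    """Arrays left over once one Array per listed collection is subtracted."""
    return total_arrays - arrays_inside_collections(counts)


if __name__ == "__main__":
    inside = arrays_inside_collections(COLLECTION_COUNTS)
    outside = arrays_outside_collections(APPROX_TOTAL_ARRAYS, COLLECTION_COUNTS)
    print(f"inside collections: {inside}, outside (approx.): {outside}")
```

Because the total is only approximate, the result of about 4.5k (which the message rounds to ~5k) is a rough lead for where to look, not a measurement.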

Worth exploring a bit more I think.

On Wed, Aug 16, 2017 at 1:23 AM, Guillermo Polito <[hidden email]> wrote:


On Tue, Aug 15, 2017 at 11:26 PM, Tim Mackinnon <[hidden email]> wrote:
Hi Guille/Ben - I got a quick moment to try SpaceTally (aside: it seems very convoluted to load a single package into the image; I was trying to avoid having to create a BaselineOf for something so simple). I ended up with:

I know, I also believe we have to simplify this. In any case, baselines are healthy, as they also let you express dependencies; otherwise you'll end up loading dependencies by hand. I hope we'll fix this soon.
 

repo := MCFileTreeRepository new directory: './bootstrap' asFileReference.
version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'.
version load.



Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Guillermo Polito


On Wed, Aug 16, 2017 at 11:46 AM, Tim Mackinnon <[hidden email]> wrote:
Hi, tracing through your changes - it looks like:

Smalltalk cleanUp: true except: #() confirming: false.
Takes care of all the non-Unicode changes you proposed (and it seems to be a known cleanup protocol).

I based my script on #cleanupForRelease ^^. But I did not just blindly execute it as is because I wanted to understand the implications of each line.
 
I wonder whether the Unicode change is worth the risk, as many web-based services I might connect to with Zinc do use Unicode, so maybe I should keep that one in. (I will for now, and might verify how much of a difference it really makes.)

No, it should not break any encoding/decoding. The changes I proposed will just nil out two things:

 - the Unicode uppercase/lowercase mapping tables, which say for each code point whether it is uppercase or lowercase and allow conversion between the two. This means these may no longer work as expected on non-ASCII characters:

         aChar asLowercase
         aChar asUppercase
         aChar toLowercase
         aChar toUppercase

 - the Unicode classification table, which says whether a character is a letter, a digit, and so on. This means these may no longer work as expected on non-ASCII characters:
     
        aChar isLetter
        aChar isDigit
        aChar isAlphaNumeric
 

On 14 Aug 2017, at 10:49, Guillermo Polito <[hidden email]> wrote:

Hi Tim,

On Mon, Aug 14, 2017 at 11:41 AM, Tim Mackinnon <[hidden email]> wrote:
Hey guys, thanks for your enthusiasm around this. I cannot stress enough that this was only possible because of the work that has gone into Pharo (in particular the 64-bit image, the minimal image, and some great blog posts on serialising contexts), as well as everyone’s patience in answering questions and helping me get it all working.

I’m still quite keen to get my execution time back down under 800ms and I’d like to actually get back to writing a few skills to automate a few things around my house.

To answer Denis’ question:

My final footprint is 30.4MB: a 22MB image (with a simple example that pulls in Fuel, ZTimestamp and the S3 library, which depends on XMLParser) plus the VM (from which I removed the obvious DLLs).

In my original experiments with a 6.0 minimal image, I did manage to get a 13.4MB image (starting from a 12MB original, then loading STON and only a simple clock example). I think the sweet spot is a total footprint of around 20MB, as that seems to get me into the 450ms-900ms range.

The 7.0 minimal image now starts out at 15MB, and I’m not sure why loading Fuel, S3 and XMLParser adds 7MB (it seems big to me, but I’ve not dug into that).
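For reference, the footprint figures in this message fit together as follows. This is simple arithmetic on the numbers given in the message; the VM size is inferred by subtraction (an assumption), not measured separately:

```latex
\begin{aligned}
\text{image} &\approx \underbrace{15\ \text{MB}}_{\text{7.0 minimal}} + \underbrace{7\ \text{MB}}_{\text{Fuel, S3, XMLParser}} = 22\ \text{MB} \\
\text{VM (inferred)} &\approx 30.4\ \text{MB} - 22\ \text{MB} = 8.4\ \text{MB}
\end{aligned}
```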

You can do further space analysis using the following expression:

SpaceTally new printSpaceAnalysis

You can do that in an eval and check what's taking space. With measures we can iterate and improve :).
 
I’ve also found (off the back of deserialising the context in my example) that, because of the way we build up images, there are 15+ saved stack sessions piled on top of each other. I don’t yet know their size/speed implications, but we need a better way of folding executions when we snapshot headless images. I’m also not clear whether any other startup tasks take precious time (this also affects our fat development images, which take much longer to appear than they should).

I'm working on this as I'm writing this mail ;)


I'll write down the implications further in a different thread.


I’ll be exploring some of these size/speed trade-offs in follow-on messages.

But once again, a big thanks - I’ve not enjoyed programming like this for ages.

Tim

On 12 Aug 2017, at 16:26, Ben Coman <[hidden email]> wrote:

hi Tim,  

That is.....      AWESOME!

Very nice delivery - it flowed well with great narration. 

I loved @2:17 "this is the interesting piece, because PharoLambda has serialized the execution context of its application and saved it into [my S3 bucket] ... [then on the local machine] rematerializes a debugger [on that context]."

There is a clarity in your video presentation that may really intrigue outsiders. As a community we should push this on the usual hacker forums; ycombinator could be a good starting point (but I'm locked out of my account there).  
An enticing title could be...
"Debugging Lambdas by re-materializing saved execution contexts on your local machine."

cheers -ben

On Fri, Aug 11, 2017 at 3:37 PM, Denis Kudriashov <[hidden email]> wrote:
This is cool Tim.

So what image size did you deploy in the end?

2017-08-10 15:47 GMT+02:00 Tim Mackinnon <[hidden email]>:
I just wanted to thank everyone for their help in getting my pet project further along, so that I can now announce that PharoLambda works with the V7 minimal image and also supports post-mortem debugging by saving a zipped Fuel context to S3.

This latter item is particularly satisfying as, at a recent serverless conference (JeffConf), there was a panel where poor development tools on serverless platforms were highlighted as a real problem.

In our community we’ve had these kinds of tools at our fingertips for ages, but I don’t think the wider development community has really noticed. Debugging something short-lived like a Lambda execution is quite startling, as the current answer is “add more logging”, and we all know that sucks. To this end, I’ve created a little screencast showing this in action; it was pretty cool because it was a real example I encountered when I got everything working and was trying my test application out.

I’ve also put a bit of work into tuning the excellent GitLab CI tools, so that I can cache many of the artefacts used between different build runs (this might also be of interest to others using CI systems).

The GitLab project is at: https://gitlab.com/macta/PharoLambda

Tim


On 15 Jul 2017, at 00:39, Tim Mackinnon <[hidden email]> wrote:

Hi - I’ve been playing around with getting Pharo to run well on AWS Lambda. It’s early days, but I thought it might be interesting to share what I’ve learned so far.

Usage examples and code at https://gitlab.com/macta/PharoLambda

With help from many of the folks here, I’ve been able to get a simple example to run in 500ms-1200ms with a minimal Pharo 6 image. You can easily try it out yourself. This seems slightly better than what the GoLang folks have been able to do.

Tim







-- 
   
Guille Polito

Research Engineer
French National Center for Scientific Research - http://www.cnrs.fr


Phone: +33 06 52 70 66 13




--

   

Guille Polito


Research Engineer

French National Center for Scientific Research - http://www.cnrs.fr



Web: http://guillep.github.io

Phone: +33 06 52 70 66 13


Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Tim Mackinnon
Just thought I would report back a bit more on this - 

The Unicode change doesn’t work in my case (and possibly not for command-line Pharo in general), as I get an error where OS filenames need Unicode support (actually I think this is where it’s trying to write to stdout, but I didn’t dig further):

Error: Instances of UndefinedObject are not indexable
UndefinedObject(Object)>>error:
UndefinedObject(Object)>>errorNotIndexable
UndefinedObject(Object)>>size
Unicode class>>isLetter:
Character>>isLetter
Path class>>isAbsoluteWindowsPath:
Path class>>from:delimiter:
MacStore(FileSystemStore)>>pathFromString:
FileSystem>>pathFromString:
ByteString(String)>>asPathWith:
FileSystem>>pathFromObject:
FileSystem>>referenceTo:
ByteString(String)>>asFileReference
FileStream class>>fullName:
FileStream class>>fileNamed:
SmalltalkImage>>openLog
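Before saving an image with the Unicode tables removed, a build script could check whether #isLetter still answers instead of failing as in the trace above. A minimal sketch, assuming the class-pool keys (#GeneralCategory, #DecimalProperty) taken from Guille's cleanup script in this thread:

```smalltalk
"Sketch only: the pool keys come from the cleanup script in this thread.
 Answers each key associated with whether its Unicode table is currently nil,
 plus the outcome of a simple #isLetter call (a Boolean, or the error's text)."
| pool tableStatus letterCheck |
pool := Unicode classPool.
tableStatus := #(#GeneralCategory #DecimalProperty)
	collect: [ :key | key -> (pool at: key) isNil ].
letterCheck := [ $a isLetter ]
	on: Error
	do: [ :err | err messageText ].
tableStatus -> letterCheck
```

Run through `eval` before `--save`, a result containing error text rather than true would flag the breakage before the image is deployed.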

Building on Guille’s warning about how to safely clean up Monticello/Metacello (without using become: String new), I was able to do it with the following (I actually think Metacello should provide a #cleanUp method, so I raised a PR for consideration):
logger cr; nextPutAll: '>Removing Clearing MC Registry'.
MetacelloProjectRegistration resetRegistry.
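The cleanup calls quoted in this thread could be folded into one pre-snapshot step of a build script. A minimal sketch, not the PharoLambda script itself: Transcript is an assumed stand-in for the script's own logger, and the Unicode-table lines are deliberately left out because nil-ing them broke #isLetter in the trace above:

```smalltalk
"Sketch of a pre-snapshot cleanup step built only from calls quoted in this thread.
 Transcript is an assumed stand-in for the build script's own logger."
Transcript show: '>Running standard cleanUp'; cr.
"Per the thread, this covers the change-set and MCDefinition clearing,
 but not the Unicode tables (which are left alone here)."
Smalltalk cleanUp: true except: #() confirming: false.
Transcript show: '>Clearing MC registry'; cr.
MetacelloProjectRegistration resetRegistry.
Smalltalk garbageCollect.
```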

I’m then able to get my image down from 22MB to 13.8MB (which is pretty good).

As a further experiment, I noticed that a fair amount of space is trapped in Protocols and ClassOrganization, so I tried clearing those out (as they are lazily cached) with:
Smalltalk allClassesAndTraits do: [:c | c basicOrganization: nil ].
This seems to give me a further 1MB back. I haven’t run performance tests on this, but my naive assumption is that a running system that isn’t adding or manipulating code doesn’t use Protocols (is that right?). So I’m now at 21MB.
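If "21MB" here refers to the total footprint (image plus VM) rather than the image alone, which is an assumption, the numbers are consistent with the earlier 30.4MB footprint made of a 22MB image and a VM of roughly 8.4MB:

```latex
\underbrace{(13.8 - 1)\ \text{MB}}_{\text{image}} + \underbrace{(30.4 - 22)\ \text{MB}}_{\text{VM}} = 12.8\ \text{MB} + 8.4\ \text{MB} = 21.2\ \text{MB} \approx 21\ \text{MB}
```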

Tim

On 16 Aug 2017, at 10:53, Guillermo Polito <[hidden email]> wrote:



On Wed, Aug 16, 2017 at 11:46 AM, Tim Mackinnon <[hidden email]> wrote:
Hi, tracing through your changes - it looks like:

Smalltalk cleanUp: true except: #() confirming: false.
takes care of all the non-Unicode changes you proposed (and it seems to be a known cleanup protocol).

I based my script on #cleanupForRelease ^^, but I did not just blindly execute it as is, because I wanted to understand the implications of each line.
 
I wonder whether the Unicode change is worth the risk, as many web-based services I might connect to with Zinc do support Unicode, so maybe I should keep that one in (I will for now, and might verify how much of a difference it really makes).

No, it should not break any encoding/decoding. The changes I proposed will just nil out two things:

 - the Unicode uppercase/lowercase mapping tables, which say for each code point whether it is uppercase or lowercase and allow conversion between the two. This means that these may not work as expected:

         aChar asLowercase
         aChar asUppercase
         aChar toLowercase
         aChar toUppercase

 - the Unicode classification table, which says whether a character is a letter, a digit, and so on. This means that these may not work as expected:
     
        aChar isLetter
        aChar isDigit
        aChar isAlphaNumeric
 
I think my next port of call is cleaning up Monticello/Metacello, as I see a fair amount of that stuff floating around in my image (after I’ve used it to bootstrap my code).

Tim

On 16 Aug 2017, at 02:32, Guillermo Polito <[hidden email]> wrote:

Actually, it first turns out that Monticello is "nicely" coupled with the change set system and logs all the source code it loads into change sets :D :/ ¬¬. Also, the two largest strings are related to Unicode tables (we should put them in files instead of in the image and load them on demand), and so are the two biggest arrays. I just tried the following in a clean bootstrapped "minimal" (Metacello) image:

"Careful: this will make #isLetter, #isUppercase, #isLowercase, #toLowercase and #toUppercase work only on ASCII"
Character characterSet: nil.
Unicode classPool at: #GeneralCategory put: nil.
Unicode classPool at: #DecimalProperty put: nil.

UnicodeDefinition removeFromSystem.
ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ].
ChangeSet resetCurrentToNewUnnamedChangeSet.
MCDefinition clearInstances.
Undeclared removeUnreferencedKeys.
Smalltalk garbageCollect.

like this:

./vm/pharo Pharo7.0-metacello-32bit-fa236b7.image eval --save "Character characterSet: nil. Unicode classPool at: #GeneralCategory put: nil. Unicode classPool at: #DecimalProperty put: nil. UnicodeDefinitions removeFromSystem. ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ]. ChangeSet resetCurrentToNewUnnamedChangeSet. MCDefinition clearInstances. Undeclared removeUnreferencedKeys. Smalltalk garbageCollect."

and my image went down from 11MB to 6.6MB (7.0MB if I skip the first three lines, which switch back to ASCII-only).
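A rough split of that saving, using only the two sizes given here, suggests the Unicode tables are a small part of it and that most of the gain comes from the remaining lines of the script:

```latex
\begin{aligned}
\text{total saving} &= 11 - 6.6 = 4.4\ \text{MB}\ (= 40\%) \\
\text{Unicode tables (first three lines)} &\approx 7.0 - 6.6 = 0.4\ \text{MB} \\
\text{rest of the script} &\approx 4.4 - 0.4 = 4.0\ \text{MB}
\end{aligned}
```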

Then I tried a tally:

./vm/pharo Pharo7.0-metacello-32bit-fa236b7.image save spacetally

./vm/pharo spacetally.image eval --save "repo := MCFileTreeRepository new directory: '../src' asFileReference. version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'. version load."

Then re-clean, since I loaded some packages:

./vm/pharo spacetally.image eval --save "ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ]. ChangeSet resetCurrentToNewUnnamedChangeSet. MCDefinition clearInstances. Undeclared removeUnreferencedKeys. Smalltalk garbageCollect."

This image is now 6.6MB (7.1MB with the large Unicode arrays), with strings taking 4.1% (274KB), which seems reasonable. The remaining big strings are Pharo's licence, the buffer of the changes file and some class comments (shouldn't those be fetched from disk like any other method source code?).

Another tally shows that ~30% of the space is taken by Arrays and 21.9% by compiled methods. But, BUT! :) I have ~30k arrays, and also lots of collections:

"MethodDictionary"     2872 +
"IdentitySet"         12781 +
"OrderedCollection"    4398 +
"Set"                  2959 +
"Dictionary"           1997 +
"IdentityDictionary"    454
-----------------------------
                      25461

Since each of these collections keeps its elements in an internal Array, that leaves ~5k arrays that are used outside collections.

Worth exploring a bit more I think.
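The estimate can be re-run in a live image. A minimal sketch, under the same assumption the subtraction makes (each instance of these collection classes owns exactly one internal Array); the result is approximate, since the snippet's own temporary Arrays are counted too:

```smalltalk
"Sketch: reproduce the estimate of Arrays living outside collections.
 Assumption: each instance of these classes owns exactly one internal Array.
 #allInstances answers direct instances only, so Set does not also count
 IdentitySets, and Dictionary does not also count IdentityDictionaries."
| collectionClasses inCollections allArrays |
Smalltalk garbageCollect.
collectionClasses := { MethodDictionary. IdentitySet. OrderedCollection. Set. Dictionary. IdentityDictionary }.
inCollections := collectionClasses
	inject: 0
	into: [ :sum :cls | sum + cls allInstances size ].
allArrays := Array allInstances size.
"Answers total Arrays -> Arrays not accounted for by the collections above"
allArrays -> (allArrays - inCollections)
```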

On Wed, Aug 16, 2017 at 1:23 AM, Guillermo Polito <[hidden email]> wrote:


On Tue, Aug 15, 2017 at 11:26 PM, Tim Mackinnon <[hidden email]> wrote:
Hi Guille/Ben - I got a quick moment to try SpaceTally. (Aside: it seems very convoluted to load a single package into the image; I was trying to avoid having to create a BaselineOf for something so simple.) I ended up with:

I know, and I also believe we have to simplify this. In any case, baselines are healthy as they also let you express dependencies; otherwise you end up loading dependencies by hand. We'll fix this soon, I hope.
 

repo := MCFileTreeRepository new directory: './bootstrap' asFileReference.
version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'.
version load.

Anyway, in my minimal image, as in the fat image, there seems to be a surprising amount of ByteStrings (4MB worth?). I think that might need some digging into; it seems like a lot somehow. That said, Ben’s neat experiment of zipping strings shows that compression isn’t a real route.
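One way to start that digging is to list the largest ByteStrings directly. A minimal sketch; the count of ten and the 40-character preview are arbitrary choices, not anything from the thread:

```smalltalk
"Sketch: answer the ten largest ByteStrings as size -> preview pairs,
 to see what is actually holding the space (licence text, buffers, comments, etc.)."
| largest |
largest := (ByteString allInstances sorted: [ :a :b | a size >= b size ]) first: 10.
largest collect: [ :each | each size -> (each first: (each size min: 40)) ]
```

Guille's reply in this thread names Pharo's licence, the changes-file buffer and class comments as the remaining big strings, which is the kind of result a listing like this would surface.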

In a deployed minimal image, maybe I can also get rid of other things like MethodChangeRecords or MCMethodDefinitions (smaller wins, but noticeable).

Class                 code space  # instances  inst space  percent  inst avg size
ByteString                  2640        37365     4823848    21.50         129.10
Array                       3742        53002     3961944    17.60          74.75
CompiledMethod             19159        30481     2912968    13.00          95.57
Association                 1148        58348     1867136     8.30          32.00
MethodChangeRecord           431        34312     1097984     4.90          32.00
ByteArray                   4605          290      908728     4.00        3133.54
ByteSymbol                  1698        22689      840168     3.70          37.03
IdentitySet                  408        19076      610432     2.70          32.00
MethodDictionary            3310         3520      608688     2.70         172.92
WeakArray                   1758         3024      597824     2.70         197.69
MCMethodDefinition          4318         6659      426176     1.90          64.00
Protocol                    1679         8382      268224     1.20          32.00
OrderedCollection           6555         5509      220360     1.00          40.00
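Reading the table, sizes appear to be in bytes, "percent" each class's share of the total instance space, and the last column inst space divided by # instances. Under that reading (an interpretation, not stated in the output itself), the ByteString row is self-consistent and implies a total of roughly 22MB, in line with the 22MB deployed image mentioned in this thread:

```latex
\begin{aligned}
\frac{4\,823\,848\ \text{bytes}}{37\,365\ \text{instances}} &\approx 129.10\ \text{bytes per ByteString} \\
\frac{4\,823\,848\ \text{bytes}}{0.215} &\approx 22.4 \times 10^{6}\ \text{bytes in total}
\end{aligned}
```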

As an aside - my Gitlab project is public, the scripts that load things up are in ./scripts (build.sh, and minimal.st and loadlocal.st)

Tim



Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Tim Mackinnon
Actually, it looks like that extra 1.5MB is not worth having: something does end up using the class organisation, so it must be recomputed when the image is launched.

Tim

On 17 Aug 2017, at 13:07, Tim Mackinnon <[hidden email]> wrote:

Just thought I would report back a bit more on this - 

The Unicode change doesn’t work in my case (possibly not for command line Pharo as well) as I get an error where OS filename’s need unicode support (actually I think this is where its trying to write to stdout, but I didn’t dig more into this):

Error: Instances of UndefinedObject are not indexable
UndefinedObject(Object)>>error:
UndefinedObject(Object)>>errorNotIndexable
UndefinedObject(Object)>>size
Unicode class>>isLetter:
Character>>isLetter
Path class>>isAbsoluteWindowsPath:
Path class>>from:delimiter:
MacStore(FileSystemStore)>>pathFromString:
FileSystem>>pathFromString:
ByteString(String)>>asPathWith:
FileSystem>>pathFromObject:
FileSystem>>referenceTo:
ByteString(String)>>asFileReference
FileStream class>>fullName:
FileStream class>>fileNamed:
SmalltalkImage>>openLog

I was able to improve on Guille’s warning about how to safely clear up monticello/metacello (and not use become: String new) with the following (I actually think Metacello should provide a #cleanUp method, so I raised a pr for consideration)
logger cr; nextPutAll: '>Removing Clearing MC Registry'.
MetacelloProjectRegistration resetRegistry.

I’m then able to be by image down from 22mb to 13.8 (which is pretty good).

As a further experiment I also noticed that there is a fair amount of space trapped in Protocols and ClassOrganisation - so I tried clearing those out (as they are lazily cached) with:
Smalltalk allClassesAndTraits do: [:c | c basicOrganization: nil ].
This seems to give me a further 1mb back (but I have’t tried performance tests on this, but my naive assumption is that in a running system that isn’t adding/manipulating code - that I don’t think Protocols are used?). So I’m now at 21mb.

Tim

On 16 Aug 2017, at 10:53, Guillermo Polito <[hidden email]> wrote:



On Wed, Aug 16, 2017 at 11:46 AM, Tim Mackinnon <[hidden email]> wrote:
Hi, tracing through your changes - it looks like:

Smalltalk cleanUp: true except: #() confirming: false.
Takes care of all the non-unicode changes you proposed (and it seems like its a known cleanup protocol).

I based my script on #cleanupForRelease ^^. But I did not just blindly execute it as is because I wanted to understand the implications of each line.
 
I wonder if the Unicode change is worth it/risky as many web based services I might connect to with Zinc do support Unicode so maybe I should keep that one in. (I will for now - might verify how much of a difference it really makes)

No, it should not break any encoding/decoding. The changes I proposed will just nil out two things:

 - the uppercase/lowercase mapping unicode tables that says for each codepoint if the codepoint is uppercase/lowercase and allows transformations from/to uppercase/lowercase. This means that these may not work as expected:

         aChar asLowercase
         aChar asUppercase
         aChar toLowercase
         aChar toUppercase

 - the unicode classification table that says if a character is letter or digit, and so on. This means that these may not work as expected:
     
        aChar isLetter
        aChar isDigit
        aChar isAlphaNumeric
 
I think my next port of call is cleanUp for Monticello/Metacello as I see a fair amount of that stuff floating around in my image (after I’ve used it to bootstrap my code).

Tim

On 16 Aug 2017, at 02:32, Guillermo Polito <[hidden email]> wrote:

Actually it happens first that monticello is "nicely" coupled with the changeset system and logs all the source code loaded in change sets :D :/ ¬¬. Also, the first two strings in terms of size are related to unicode tables (we should put them in files instead of in the image and load them on demand), and the two biggest arrays also to unicode. I just tried the following in a clean bootstrapped "minimal" image (metacello):

"Careful, this will make that #isLetter, #isUppercase #isLowercase, #toLowercase and #toUppercase only work on ascii"
Character characterSet: nil.
Unicode classPool at: #GeneralCategory put: nil.
Unicode classPool at: #DecimalProperty put: nil.

UnicodeDefinition removeFromSystem.
ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ].
ChangeSet resetCurrentToNewUnnamedChangeSet.
MCDefinition clearInstances.
Undeclared removeUnreferencedKeys.
Smalltalk garbageCollect.

like this:

./vm/pharo Pharo7.0-metacello-32bit-fa236b7.image eval --save "Character characterSet: nil. Unicode classPool at: #GeneralCategory put: nil. Unicode classPool at: #DecimalProperty put: nil. UnicodeDefinitions removeFromSystem. ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ]. ChangeSet resetCurrentToNewUnnamedChangeSet. MCDefinition clearInstances. Undeclared removeUnreferencedKeys. Smalltalk garbageCollect."

and my image went down from 11MB to 6.6MB (7.0 MB if I don't change back to ascii with the first three lines)

Then I tried a tally:

./vm/pharo Pharo7.0-metacello-32bit-fa236b7.image save spacetally

./vm/pharo spacetally.image eval --save "repo := MCFileTreeRepository new directory: '../src' asFileReference. version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'. version load."

re-clean since i loaded some packages

./vm/pharo spacetally.image eval --save "ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ]. ChangeSet resetCurrentToNewUnnamedChangeSet. MCDefinition clearInstances. Undeclared removeUnreferencedKeys. Smalltalk garbageCollect."

This image is now 6.6MB (7.1MB with the unicode large arrays), 4.1% of strings (274k) what seems reasonable. Remaining big strings are Pharo's licence, the buffer of the changes file and then some class comments (shouldn't they be fetched from disk as any other method source code?).

Making again a tally shows that ~30% of the space is taken by Arrays and 21.9% by compiled methods. But, BUT! :) I have ~30k arrays and lots of collections also:

"MethodDictionary"              2872 +
"IdentitySet"                         12781 + 
"OrderedCollection"             4398 + 
"Set"                                     2959 +
"Dictionary"                          1997 +
"IdentityDictionary"               454
-----------------------------------------------
25461

So there are ~5k arrays that are used outside collections.

Worth exploring a bit more I think.

On Wed, Aug 16, 2017 at 1:23 AM, Guillermo Polito <[hidden email]> wrote:


On Tue, Aug 15, 2017 at 11:26 PM, Tim Mackinnon <[hidden email]> wrote:
Hi Guille/Ben - I got a quick moment to try the SpaceTally (aside: it seems very convoluted to load a single package into the image, I was trying to avoid having to create a baselineOf for something so simple - I ended up with:

I know, I also believe we have to simplify this. In any case, baselines are healthy as they allow to also express dependencies. Otherwise you'll end up loading dependencies by hand. We'll fix this soon I hope.
 

repo := MCFileTreeRepository new directory: './bootstrap' asFileReference.
version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'.
version load.

Anyway - in my minimal image, like in the fat image there seems to be a surprising amount of bytestrings (4mb worth?). I think that might need some digging into? It seems like a lot somehow. Although Ben’s neat experiment of zipping strings shows that’s not a real route.

In a deployed minimal image - maybe I can get rid of some other things like MethodChangeRecords or MCMethodDefiniion’s (but they are smaller wins - but noticeable)

Class               code space  # instances  inst space  percent  avg inst size
ByteString                2640        37365     4823848    21.50         129.10
Array                     3742        53002     3961944    17.60          74.75
CompiledMethod           19159        30481     2912968    13.00          95.57
Association               1148        58348     1867136     8.30          32.00
MethodChangeRecord         431        34312     1097984     4.90          32.00
ByteArray                 4605          290      908728     4.00        3133.54
ByteSymbol                1698        22689      840168     3.70          37.03
IdentitySet                408        19076      610432     2.70          32.00
MethodDictionary          3310         3520      608688     2.70         172.92
WeakArray                 1758         3024      597824     2.70         197.69
MCMethodDefinition        4318         6659      426176     1.90          64.00
Protocol                  1679         8382      268224     1.20          32.00
OrderedCollection         6555         5509      220360     1.00          40.00
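One possible starting point for digging into the ByteString row above is a short playground snippet like the following. It is only a sketch built on standard collection protocol (allInstances, inject:into:, asSortedCollection:, first:); the cut-off of 20 strings and the 60-character preview are arbitrary choices, and because it counts characters only (no object headers or padding) its total will come out lower than SpaceTally's inst space figure.

```smalltalk
"Sketch: summarise direct ByteString instances and list the largest ones.
 allInstances returns direct instances only, so ByteSymbols are not included."
| strings totalChars largest |
strings := ByteString allInstances.
"Payload characters only; object headers and padding are not counted"
totalChars := strings inject: 0 into: [ :sum :each | sum + each size ].
"Sort largest first, then keep at most 20 of them"
largest := (strings asSortedCollection: [ :a :b | a size >= b size ]) asArray.
largest := largest first: (20 min: largest size).
Transcript
	show: strings size printString , ' strings, ' , totalChars printString , ' characters';
	cr.
largest do: [ :each |
	Transcript
		show: each size printString , ' ' , (each first: (60 min: each size)) printString;
		cr ]
```

Looking at what the biggest strings actually are (source, comments, caches) is usually more telling than the total.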

As an aside, my GitLab project is public; the scripts that load things up are in ./scripts (build.sh, minimal.st and loadlocal.st).

Tim

On 15 Aug 2017, at 08:02, Guillermo Polito <[hidden email]> wrote:



On Mon, Aug 14, 2017 at 4:42 PM, Tim Mackinnon <[hidden email]> wrote:
Hi Guille - just running SpaceTally on my dev image to get a feel for it. It turns out that in the minimal images you’ve been creating, it’s not loaded (makes sense).

Yup, it's loaded afterwards.

All packages are loaded through Metacello baselines. We should start refactoring them into standalone projects, each one with its own baseline describing its own dependencies.

I was checking your GitLab project and I probably have no access: how are you loading packages into the bootstrap image in the end? Can you share that with us as text? I'd like to improve that situation.
 
I’m wondering if there is an easy way to load it (I guess that package should be in the Pharo git tree I cloned to get Fuel loaded, right? Or is there a separate standalone source?).

Yes it is. You can get the package name programmatically by evaluating

SpaceTally package name

and you can also find the baseline that currently loads it with

"Find the BaselineOf subclasses whose package list includes SpaceTally's package"
package := SpaceTally package name.
BaselineOf subclasses select: [ :e |
	e project version packages anySatisfy: [ :p | p name = package ] ].
 

Thanks for all the support, and your email about why the contexts stack up is very well received (I will comment over there).

By the way, it looks like Martin Fowler picked up on this announcement, so maybe we’ll get some interest from his many followers.

Tim

On 14 Aug 2017, at 10:49, Guillermo Polito <[hidden email]> wrote:

Hi Tim,

On Mon, Aug 14, 2017 at 11:41 AM, Tim Mackinnon <[hidden email]> wrote:
Hey guys, thanks for your enthusiasm around this - and I cannot stress enough how this was only possible because of the work that has gone into making Pharo (in particular the 64bit image, as well as having a minimal image, and some great blog posts on serialising contexts) as well as the patience from everyone in answering questions and helping me get it all working.

I’m still quite keen to get my execution time back down under 800ms and I’d like to actually get back to writing a few skills to automate a few things around my house.

To answer Denis’ question:

My final footprint is 30.4 MB: a 22 MB image (with a simple example that pulls in Fuel, ZTimestamp and the S3 library, which depends on XMLParser) plus the VM (from which I removed the obvious DLLs).

In my original experiments with a 6.0 minimal image, I did manage to get to a 13.4 MB image (it started out at 12 MB, and I then loaded STON plus a simple clock example). I think the sweet spot is around a 20 MB total footprint, as that seems to get me into the 450-900 ms range.

The 7.0 minimal image now starts out at 15 MB, and I’m not sure why loading Fuel, S3 and XMLParser then adds 7 MB (it seems big to me, but I’ve not dug into that).

You can do further space analysis with the following expression:

SpaceTally new printSpaceAnalysis

You can run that in an eval and check what's taking space. With measurements we can iterate and improve :).
 
I’ve also found (from unserialising the context in my example) that, because of the way we build up images, there are 15+ saved stack sessions piled on top of each other. I don’t yet know their size/speed implications, but we need a better way of folding executions when we snapshot headless images. I’m also not sure whether other startup tasks take precious time (this also matters for our fat development images, which take much longer to appear than they should).

I'm working on this as I'm writing this mail ;)


I'll write down the implications further in a different thread.
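As a rough way to see the stacked-up sessions described above without opening a debugger, one could walk the sender chain from code that runs once the image has started and count the frames. This is a generic sketch relying only on thisContext and Context>>sender; whether the old sessions actually sit underneath a given entry point depends on how the image was built and saved, so treat the output as a hint rather than a measurement.

```smalltalk
"Sketch: print every frame from the current one down to the bottom of the
 stack, then the total. Run it from code that executes after startup
 (for example a Lambda handler)."
| ctx depth |
depth := 0.
ctx := thisContext.
[ ctx notNil ] whileTrue: [
	Transcript show: depth printString , ': ' , ctx printString; cr.
	depth := depth + 1.
	ctx := ctx sender ].
Transcript show: 'Total frames: ' , depth printString; cr
```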


I’ll be exploring some of these size/speed tradeoffs in follow-on messages.

But once again, a big thanks - I’ve not enjoyed programming like this for ages.

Tim

On 12 Aug 2017, at 16:26, Ben Coman <[hidden email]> wrote:

hi Tim,  

That is.....      AWESOME!

Very nice delivery - it flowed well with great narration. 

I loved @2:17 "this is the interesting piece, because PharoLambda has serialized the execution context of its application and saved it into [my S3 bucket] ... [then on the local machine] rematerializes a debugger [on that context]."

There is a clarity in your video presentation that may really intrigue outsiders. As a community we should push this on the usual hacker forums - ycombinator could be a good starting point (but I'm locked out of my account there).
An enticing title could be...
"Debugging Lambdas by re-materializing saved execution contexts on your local machine."

cheers -ben

On Fri, Aug 11, 2017 at 3:37 PM, Denis Kudriashov <[hidden email]> wrote:
This is cool Tim.

So what image size did you deploy in the end?

2017-08-10 15:47 GMT+02:00 Tim Mackinnon <[hidden email]>:
I just wanted to thank everyone for their help in getting my pet project further along, so that I can now announce that PharoLambda works with the V7 minimal image and also supports post-mortem debugging by saving a zipped Fuel context to S3.

The latter is particularly satisfying because at a recent serverless conference (JeffConf) a panel highlighted poor development tools on serverless platforms as a real problem.

In our community we’ve had these kinds of tools at our fingertips for ages, but I don’t think the wider development community has really noticed. Being able to debug something as short-lived as a Lambda execution is quite startling, as the current answer is “add more logging”, and we all know that sucks. To this end, I’ve created a little screencast showing this in action; it was pretty cool because it was a real problem I ran into once I had everything working and was trying out my test application.

I’ve also put a bit of work into tuning the excellent GitLab CI tools, so that I can cache many of the artefacts used between different build runs (this might also be of interest to others using CI systems).

The Gitlab project is on: https://gitlab.com/macta/PharoLambda

Tim


On 15 Jul 2017, at 00:39, Tim Mackinnon <[hidden email]> wrote:

Hi - I’ve been playing around with getting Pharo to run well on AWS Lambda. It’s early days, but I thought it might be interesting to share what I’ve learned so far.

Usage examples and code at https://gitlab.com/macta/PharoLambda

With help from many of the folks here, I’ve been able to get a simple example to run in 500ms-1200ms with a minimal Pharo 6 image. You can easily try it out yourself. This seems slightly better than what the GoLang folks have been able to do.

Tim







-- 
   
Guille Polito

Research Engineer
French National Center for Scientific Research - http://www.cnrs.fr


Phone: +33 06 52 70 66 13





Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

alistairgrant
In reply to this post by Tim Mackinnon
Hi Tim,

On Thu, Aug 17, 2017 at 01:07:06PM +0100, Tim Mackinnon wrote:

> Just thought I would report back a bit more on this -
>
> The Unicode change doesn’t work in my case (possibly not for command-line Pharo
> either) as I get an error where OS filenames need Unicode support (actually I
> think this is where it's trying to write to stdout, but I didn’t dig more into
> this):
>
> Error: Instances of UndefinedObject are not indexable
> UndefinedObject(Object)>>error:
> UndefinedObject(Object)>>errorNotIndexable
> UndefinedObject(Object)>>size
> Unicode class>>isLetter:
> Character>>isLetter
> Path class>>isAbsoluteWindowsPath:
> Path class>>from:delimiter:
> MacStore(FileSystemStore)>>pathFromString:
> FileSystem>>pathFromString:
> ByteString(String)>>asPathWith:
> FileSystem>>pathFromObject:
> FileSystem>>referenceTo:
> ByteString(String)>>asFileReference
> FileStream class>>fullName:
> FileStream class>>fileNamed:
> SmalltalkImage>>openLog

In #isLetter:, the only receiver of #size is Unicode's class variable
GeneralCategory, so that is what must be nil here.

Are you sure that while preparing your image for deployment you didn't
clear GeneralCategory accidentally?  (in my image GeneralCategory is an
instance of a SparseLargeTable)
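A minimal pre-deployment check along these lines could catch this before the image is uploaded. It is a sketch that assumes, as in the trace above, that Unicode keeps its category table in the class variable GeneralCategory and that Character>>isLetter ends up in Unicode class>>isLetter:; it only reports the problem and does not try to rebuild the table.

```smalltalk
"Sketch: fail early if Unicode's GeneralCategory table has been cleared,
 since isLetter (and path parsing built on it) reads that table."
| category |
category := Unicode classPool at: #GeneralCategory ifAbsent: [ nil ].
category
	ifNil: [ self error: 'Unicode GeneralCategory is nil; Character>>isLetter will fail' ]
	ifNotNil: [ Transcript show: 'GeneralCategory is a ' , category class name; cr ]
```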

HTH,
Alistair


Re: [ANN] PharoLambda 1.5 - Pharo running on AWS Lambda now with saved Debug sessions via S3

Guillermo Polito


On Thu, Aug 17, 2017 at 8:51 PM, Alistair Grant <[hidden email]> wrote:
Hi Tim,

In #isLetter:, the only receiver of #size is Unicode's class variable
GeneralCategory, so that is what must be nil here.

Are you sure that while preparing your image for deployment you didn't
clear GeneralCategory accidentally?  (in my image GeneralCategory is an
instance of a SparseLargeTable)

Indeed, he did it on purpose, trying to save 1.5 MB of memory :P





--

   

Guille Polito


Research Engineer

French National Center for Scientific Research - http://www.cnrs.fr



Web: http://guillep.github.io

Phone: +33 06 52 70 66 13
