PR: Compressed sources inside the object memory


Pavel Krivanek-3
Hi,

When the Pharo 7 image is built on top of the bootstrapped image, all loaded code is placed in the changes file, so the process ends with an empty sources file, a huge changes file and the image. We condense the sources file for every build, so we distribute Pharo 7 as three files: the image, a huge sources file (over 30 MB) and an almost empty changes file.
As a consequence, every build has its own sources file, which can be shared only between images based on the same build. When we release Pharo 7, the sources file can be shared as it was for previous releases, but as soon as a bug-fix version of Pharo 7 is released, it will require a new, specific sources file.

As one possible solution, I created a pull request that resurrects old code written for Squeak 3.7, which introduced the option of compressed sources files. Such files consist of segments that are stored in compressed form, so the result is not as small as one single zip file but still good enough: from the original uncompressed file of 30.8 MB it generates a 7.1 MB file, while a single zip is 6.1 MB.

Of course, with the compressed sources stored inside it, the image file is bigger, but:
- the number of new objects is low, so it should not put much more stress on the GC
- we used to have big images anyway because the condenser was not working, and it was not a big issue
- the number of distributed files will be the same as for Pharo 6 (image + changes), which means less trouble
- the sources file can easily be restored, deleted from the object memory and used the same way as before
- we are not using the nice concept of the image well enough: all data that is not really shared between images, or that serves as backup information, should be stored inside the image

Please review this PR. As a side effect, it makes the sources file mechanism (but not the changes file) independent of the deprecated FileStream classes.

PR: https://github.com/pharo-project/pharo/pull/631

The pre-built image:
 https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-631/lastSuccessfulBuild/artifact/bootstrap-cache/Pharo7.0-32bit-57ddc75.zip

Cheers,
-- Pavel

 

Re: PR: Compressed sources inside the object memory

Denis Kudriashov
Nice.

2017-12-28 15:50 GMT+01:00 Pavel Krivanek <[hidden email]>:
> - the number of new objects is low, so it should not put much more stress on the GC

And how many objects does it add? (How many sections are compressed?)

> - the number of distributed files will be the same as for Pharo 6 (image + changes), which means less trouble

Can we avoid the empty changes file? We can create it on demand (for example, when the user modifies code).


Re: PR: Compressed sources inside the object memory

Pavel Krivanek-3


2017-12-28 16:00 GMT+01:00 Denis Kudriashov <[hidden email]>:
> And how many objects does it add? (How many sections are compressed?)

It currently uses 1619 segments of 20000 bytes each, but that does not mean that this many objects are needed: in the memory file system all the (compressed) data is in a single ByteArray, and the stream class uses a very simple segment table, one big array of 1619 integers that hold indexes into the ByteArray.
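
To illustrate the layout described above (one blob of compressed data plus a table of segment start indexes), here is a minimal sketch. It is written in Python rather than Pharo, it uses zlib as a stand-in for whatever compression the PR actually uses, and the names `build_store` and `read_range` are made up for this example. It is not the code from the PR.

```python
import zlib

SEGMENT_SIZE = 20000  # uncompressed bytes per segment, the figure quoted in the thread


def build_store(data: bytes, segment_size: int = SEGMENT_SIZE):
    """Compress data segment by segment into one blob plus an offset table."""
    blob = bytearray()
    offsets = []  # offsets[i] = index in blob where compressed segment i starts
    for start in range(0, len(data), segment_size):
        offsets.append(len(blob))
        blob += zlib.compress(data[start:start + segment_size])
    offsets.append(len(blob))  # sentinel marking the end of the last segment
    return bytes(blob), offsets


def read_range(blob, offsets, position, length, segment_size=SEGMENT_SIZE):
    """Return up to `length` uncompressed bytes starting at `position`."""
    out = bytearray()
    end = position + length
    seg = position // segment_size  # segment that contains `position`
    while position < end and seg < len(offsets) - 1:
        # Only the segments that overlap the requested range are decompressed.
        chunk = zlib.decompress(blob[offsets[seg]:offsets[seg + 1]])
        local = position - seg * segment_size
        take = chunk[local:local + (end - position)]
        if not take:
            break
        out += take
        position += len(take)
        seg += 1
    return bytes(out)


# Hypothetical stand-in for a sources file.
data = b"".join(b"method%d ^ %d\n" % (i, i) for i in range(10000))
blob, offsets = build_store(data)
snippet = read_range(blob, offsets, 19990, 50)  # crosses a segment boundary
```

Only two objects are needed regardless of the number of segments: the blob and the offset table. With 20000-byte segments, 1619 segments cover at most 1619 × 20000 = 32,380,000 bytes, which is in the same range as the 30.8 MB file mentioned in the first post.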

 
> Can we avoid the empty changes file? We can create it on demand (for example, when the user modifies code).

I think we can, because it contains only do-its. We should be able to simply delete it and only modify the image start-up so that it does not require it.

-- Pavel
 
 


Re: PR: Compressed sources inside the object memory

HilaireFernandes
In reply to this post by Pavel Krivanek-3
I did a successful build of DrGeo with this image. However, the Athens
canvas is broken again (see screenshot). Is it because of a missing image?
There was no complaint during the image build, though. I still don't know
how to get the trace from the error log.

The built image is here [1] in case it is useful.

Hilaire

[1] https://www.dropbox.com/s/wc18e21p371z28f/DrGeo.app-18.01a.zip?dl=0


On 28/12/2017 at 15:50, Pavel Krivanek wrote:
>
> Please review this PR. As a side effect it makes the sources file
> mechanism (but not changes file) independent on the FileStream classes
> that are deprecated.
>

--
Dr. Geo
http://drgeo.eu


Attachment: Dr. Geo -- 2017-12-28.jpeg (180K)

Re: PR: Compressed sources inside the object memory

Pavel Krivanek-3
I tried this DrGeo image on a Linux VM and I have no problems with Athens, but I do with FreeType. The image does not start and reports the following error:

FT2Error: Freetype2 primitive failed [error 1][cannot open resource]
FreeTypeFace(FT2Handle)>>primitiveFailed:
FreeTypeFace(FT2Handle)>>primitiveFailed
FreeTypeFace(FT2Face)>>primNewFaceFromFile:index:
FreeTypeFace(FT2Face)>>newFaceFromFile:index:
FreeTypeFace>>create
FreeTypeFace>>validate
FreeTypeFont>>face

To be able to run it, I had to remove the FreeType plugin from the VM, but then I get no Athens error. Try to build this image without opening the main DrGeo window.

-- Pavel 



Re: PR: Compressed sources inside the object memory

Denis Kudriashov
In reply to this post by Pavel Krivanek-3
2017-12-28 16:21 GMT+01:00 Pavel Krivanek <[hidden email]>:
> I think we can, because it contains only do-its. We should be able to simply delete it and only modify the image start-up so that it does not require it.

It is unbelievable that we can have a single image file. I think it is a very attractive goal for Pharo 7, and it is so close.


Re: PR: Compressed sources inside the object memory

Denis Kudriashov
In reply to this post by Pavel Krivanek-3
It would be interesting to compare the speed of a search operation such as "find string in sources".


Re: PR: Compressed sources inside the object memory

Denis Kudriashov
2017-12-28 17:07 GMT+01:00 Denis Kudriashov <[hidden email]>:
> It would be interesting to compare the speed of a search operation such as "find string in sources".

With Calypso you can measure it like this:

[(ClyMethodSources withString: 'example' from: ClyNavigationEnvironment currentImageScope ) execute] timeToRun
 


Re: PR: Compressed sources inside the object memory

HilaireFernandes
In reply to this post by Pavel Krivanek-3
It is funny, because I have the reverse problem: FreeType renders the
font nicely, but Athens does not.

The DrGeo main window instance is opened at start-up; will this be an
issue? I guessed not, as the window is opened from the user start-up
list, but when I close the window opened at start-up and then open a
new one, the Athens canvas is right. Gosh... Now there is an issue with
the start-up that was absent from the previous P7 image; the issue then
was the missing sources file.


On 28/12/2017 at 16:55, Pavel Krivanek wrote:

> I tried this DrGeo image on a Linux VM and I have no problems with
> Athens, but I do with FreeType. The image does not start and reports
> the following error:
> [...]
> To be able to run it, I had to remove the FreeType plugin from the VM,
> but then I get no Athens error. Try to build this image without
> opening the main DrGeo window.

--
Dr. Geo
http://drgeo.eu




Re: PR: Compressed sources inside the object memory

Pavel Krivanek-3
In reply to this post by Denis Kudriashov
For string searching in the source code it is 1.5 s (external sources) vs 3.7 s (compressed, in image); recompilation of the whole image took 51 s vs 56 s for me.

-- Pavel


Re: PR: Compressed sources inside the object memory

Denis Kudriashov


2017-12-28 20:02 GMT+01:00 Pavel Krivanek <[hidden email]>:
> For string searching in the source code it is 1.5 s (external sources) vs 3.7 s (compressed, in image); recompilation of the whole image took 51 s vs 56 s for me.

I expected much better performance than with an external file. Can it be optimized?
I guess that every "aMethod sourceCode" call decompresses a full section, and that can easily be cached.
To measure the maximum speed, can you check a version with uncompressed in-memory sources?
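
The caching idea can be sketched as follows. This is only an illustration in Python, not Pharo code and not part of the PR: it assumes that each lookup decompresses one whole segment, and it keeps the most recently used decompressed segments in a small LRU cache. The class name `CachedSegmentReader`, its `capacity` parameter and the `decompressions` counter are hypothetical.

```python
import zlib
from collections import OrderedDict


class CachedSegmentReader:
    """Hypothetical reader that keeps recently used decompressed segments."""

    def __init__(self, segments, capacity=4):
        self.segments = segments    # list of zlib-compressed byte strings
        self.capacity = capacity    # maximum number of decompressed segments kept
        self.cache = OrderedDict()  # segment index -> decompressed bytes
        self.decompressions = 0     # counter kept only for illustration

    def segment(self, index):
        if index in self.cache:
            self.cache.move_to_end(index)   # mark as most recently used
            return self.cache[index]
        data = zlib.decompress(self.segments[index])
        self.decompressions += 1
        self.cache[index] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return data


# Eight dummy segments of 20000 bytes each ("AAAA...", "BBBB...", ...).
segments = [zlib.compress(bytes([65 + i]) * 20000) for i in range(8)]
reader = CachedSegmentReader(segments, capacity=2)

# Neighbouring methods tend to live in the same segment, so repeated
# lookups mostly hit the cache instead of decompressing again.
for index in [0, 0, 0, 1, 1, 0, 2, 0]:
    reader.segment(index)
```

Whether this helps in practice depends on the access pattern: a search that walks methods in an order unrelated to their position in the sources file would still decompress most segments many times.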
 


Re: PR: Compressed sources inside the object memory

Pavel Krivanek-3


2017-12-28 22:29 GMT+01:00 Denis Kudriashov <[hidden email]>:
> I expected much better performance than with an external file. Can it be optimized?
> I guess that every "aMethod sourceCode" call decompresses a full section, and that can easily be cached.
> To measure the maximum speed, can you check a version with uncompressed in-memory sources?

A raw (uncompressed) in-memory file takes 2.1 s to search for a string in the image, BUT if the file is stored on disk the time is 16.5 s, so about 10 times slower than the current version. I suppose that the original fast mechanism used MultiByteFileStream, while the new mechanism uses ZnCharacterReadStream. So, under the same conditions, the compressed memory stream is almost 5 times faster than disk access. We should try to find some optimization, but I do not think it is necessary to do it immediately.
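
For reference, the ratios behind these statements can be recomputed from the timings quoted in the thread. The small Python sketch below assumes that "current version" refers to the 1.5 s measured earlier with the external sources file; the dictionary keys are just labels chosen for this example.

```python
# Search timings in seconds, copied from the messages in this thread.
timings = {
    "external sources, current version": 1.5,
    "compressed, in image": 3.7,
    "raw, in memory": 2.1,
    "raw, on disk, new stream mechanism": 16.5,
}


def ratio(slower, faster):
    """How many times longer `slower` takes than `faster`."""
    return timings[slower] / timings[faster]


disk_vs_current = ratio("raw, on disk, new stream mechanism",
                        "external sources, current version")
disk_vs_compressed = ratio("raw, on disk, new stream mechanism",
                           "compressed, in image")
compressed_vs_raw_memory = ratio("compressed, in image", "raw, in memory")

print(f"disk vs current version:  {disk_vs_current:.1f}x")
print(f"disk vs compressed image: {disk_vs_compressed:.1f}x")
print(f"compressed vs raw memory: {compressed_vs_raw_memory:.1f}x")
```

Under that assumption, the disk figure is about 11 times the current version and about 4.5 times the compressed in-image figure, which matches the "about 10 times" and "almost 5 times" above.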

-- Pavel
 
 

-- Pavel

2017-12-28 17:10 GMT+01:00 Denis Kudriashov <[hidden email]>:
2017-12-28 17:07 GMT+01:00 Denis Kudriashov <[hidden email]>:
Interesting to compare speed of search operation like "find string in sources"

With Calypso you can do it by:

[(ClyMethodSources withString: 'example' from: ClyNavigationEnvironment currentImageScope ) execute] timeToRun
 

2017-12-28 16:21 GMT+01:00 Pavel Krivanek <[hidden email]>:


2017-12-28 16:00 GMT+01:00 Denis Kudriashov <[hidden email]>:
Nice.

2017-12-28 15:50 GMT+01:00 Pavel Krivanek <[hidden email]>:
Hi,

when the Pharo 7 image is being built on top of the bootstrapped image, all loaded code is placed in the changes file so this process ends with an empty sources file, huge changes file and the image. We condense the sources file for every build so we distribute the Pharo 7 as three files - image, huge sources (over 30 MB) and almost empty changes file.
As consequence every build has own sources file that can be shared between images based on the same build - so when we will release the Pharo 7, the sources file can be shared as it was for previous releases, but as soon as some fixing version of Pharo 7 will be released, it will require a new specific sources file.

As one possible solution I tried to create a pull request that resurrects from oblivion an old code made for Squeak 3.7 that introduced option of compressed sources files. Such files use segments that are stored in compressed form so it is not so small as one single zip file but still good enough. From original uncompressed file of size 30.8 MB it generates 7.1 MB file while single zip has 6.1 MB.

Of course the image file is bigger but:
- the amount of new objects is low so it should not make much more stress on GC

And how many objects does it add? (How many sections are compressed?)

It currently uses 1619 segments of 20000 bytes each, but that does not mean that so many objects are needed: in the memory filesystem all the (compressed) data are in a single ByteArray, and the stream class uses a very simple segment table, one big array of 1619 integers that contain indexes into the ByteArray.

 
- we already used to have big images because the condenser was not working, and it was not a big issue
- the number of distributed files will be the same as for Pharo 6 (image + changes), so it means less trouble
- the sources file can easily be restored, deleted from the object memory, and used the same way as before
- we are not using the nice concept of the image well enough. All data that is not really shared between images, or that serves as backup information, should be stored inside the image
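
The segment layout described in the reply above (all compressed data in one byte array, plus one array of integer indexes into it) can be sketched in a few lines. This is only an illustration in Python, not the code from the PR: the names `build_store` and `read_range` are hypothetical, and zlib stands in for whatever compression the Squeak code actually uses.

```python
import zlib

SEGMENT_SIZE = 20000  # uncompressed bytes per segment, the figure quoted in the thread


def build_store(data: bytes, segment_size: int = SEGMENT_SIZE):
    """Compress data segment by segment into one blob plus an offset table."""
    blob = bytearray()
    offsets = []  # offsets[i] = start of compressed segment i inside blob
    for start in range(0, len(data), segment_size):
        offsets.append(len(blob))
        blob += zlib.compress(data[start:start + segment_size])
    offsets.append(len(blob))  # sentinel marking the end of the last segment
    return bytes(blob), offsets


def read_range(blob, offsets, position, length, segment_size=SEGMENT_SIZE):
    """Return up to `length` uncompressed bytes starting at `position`."""
    out = bytearray()
    end = position + length
    segment = position // segment_size
    while position < end and segment < len(offsets) - 1:
        # Only the segments that overlap the requested range are decompressed.
        raw = zlib.decompress(blob[offsets[segment]:offsets[segment + 1]])
        local = position - segment * segment_size
        piece = raw[local:local + (end - position)]
        if not piece:
            break  # the request starts past the end of the data
        out += piece
        position += len(piece)
        segment += 1
    return bytes(out)
```

Whatever the number of segments, the store is two objects (the blob and the table), which is the point made about GC pressure. The price is that compressing each segment separately gives a somewhat worse ratio than compressing the whole file at once.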

Can we avoid the empty changes file? We could create it on demand (for example, when the user modifies code).

I think we can, because it contains only do-its. We should be able to simply delete it and only modify the image start-up so that it does not require it.

-- Pavel
 
 

Please review this PR. As a side effect, it makes the sources file mechanism (but not the changes file) independent of the deprecated FileStream classes.

PR: https://github.com/pharo-project/pharo/pull/631

The pre-built image:
 https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-631/lastSuccessfulBuild/artifact/bootstrap-cache/Pharo7.0-32bit-57ddc75.zip

Cheers,
-- Pavel

 








Re: PR: Compressed sources inside the object memory

Denis Kudriashov


2017-12-29 10:07 GMT+01:00 Pavel Krivanek <[hidden email]>:


2017-12-28 22:29 GMT+01:00 Denis Kudriashov <[hidden email]>:


2017-12-28 20:02 GMT+01:00 Pavel Krivanek <[hidden email]>:
For string searching in the source code it is 1.5s (external sources) vs 3.7s (compressed, in image); recompilation of the image took 51s vs 56s for me.

I expected much better performance than with an external file. Can it be optimized?
I guess that every "aMethod sourceCode" decompresses a full section. That could easily be cached.
To measure the maximum speed, can you check a version with uncompressed in-memory sources?
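
The caching idea suggested above can be illustrated with a small sketch. It assumes, as the guess does, that each source lookup decompresses a whole segment; whether the PR really behaves that way is not confirmed in this thread. The class `CachedSegments` and its methods are hypothetical, written in Python with zlib as a stand-in compressor.

```python
import zlib
from functools import lru_cache

SEGMENT_SIZE = 20000  # uncompressed bytes per segment, as quoted in the thread


class CachedSegments:
    """Keeps the most recently used decompressed segments in memory."""

    def __init__(self, compressed_segments, cache_size=8):
        self._segments = compressed_segments
        self.decompress_calls = 0  # counts real decompressions, for measurement
        self._load = lru_cache(maxsize=cache_size)(self._decompress)

    def _decompress(self, index):
        self.decompress_calls += 1
        return zlib.decompress(self._segments[index])

    def read(self, position, length):
        """Return up to `length` uncompressed bytes starting at `position`."""
        out = bytearray()
        end = position + length
        while position < end:
            index, local = divmod(position, SEGMENT_SIZE)
            if index >= len(self._segments):
                break  # past the last segment
            piece = self._load(index)[local:local + (end - position)]
            if not piece:
                break  # past the end of the data
            out += piece
            position += len(piece)
        return bytes(out)
```

With such a cache, many consecutive lookups that fall into the same segment cost a single decompression, which matters for operations like a full-text search that walk methods stored next to each other.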

With the raw (uncompressed) in-memory file, searching for a string in the image takes 2.1s, BUT if the same file is stored on disk, the time is 16.5s, so about 10 times slower than the current version. I suppose that the original fast mechanism used MultiByteFileStream while the new mechanism uses ZnCharacterReadStream.

For file sources we should use ZnBufferedReadStream over ZnCharacterReadStream. It should give performance similar to the old MultiByteFileStream.
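
Why buffering helps here can be shown with a generic sketch. It is a Python analogy and not the Zinc API: `CountingRaw` and `read_bytewise` are hypothetical names, and the counter stands in for the system calls an unbuffered file stream would make.

```python
import io


class CountingRaw(io.RawIOBase):
    """Unbuffered byte source that counts how often it is asked for data."""

    def __init__(self, data: bytes):
        super().__init__()
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, buffer):
        # Each call models one request to the underlying file.
        self.calls += 1
        chunk = self._data[self._pos:self._pos + len(buffer)]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def read_bytewise(stream):
    """Consume a stream one byte at a time, as a naive character reader would."""
    count = 0
    while stream.read(1):
        count += 1
    return count


data = b"x" * 100_000

plain = CountingRaw(data)
read_bytewise(plain)  # every single byte triggers a request

backing = CountingRaw(data)
read_bytewise(io.BufferedReader(backing, buffer_size=8192))  # requests arrive in 8 KB blocks

print(plain.calls, backing.calls)
```

The unbuffered reader issues roughly one request per byte, while the buffered one issues roughly one per 8 KB block. A character decoder that pulls bytes one at a time from a file benefits from a buffer in the same way.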
 
So, under the same conditions, the compressed memory stream is almost 5 times faster than disk access. We should try to find some optimization, but I think it is not necessary to do it immediately.


Re: PR: Compressed sources inside the object memory

Stephane Ducasse-3
In reply to this post by Denis Kudriashov
Not for Pharo 7.0. There are too many issues on other fronts: Ring2, Calypso, Iceberg...


On Thu, Dec 28, 2017 at 5:05 PM, Denis Kudriashov <[hidden email]> wrote:

> It is unbelievable that we could have a single image file. I think it is a very
> attractive goal for Pharo 7. And it is so close.