Breaking the 4GB barrier with Pharo 6 64-bit


Re: Breaking the 4GB barrier with Pharo 6 64-bit

Sven Van Caekenberghe-2

> On 10 Nov 2016, at 10:25, Sven Van Caekenberghe <[hidden email]> wrote:
>
>
>> On 10 Nov 2016, at 10:10, Denis Kudriashov <[hidden email]> wrote:
>>
>>
>> 2016-11-09 23:30 GMT+01:00 Nicolas Cellier <[hidden email]>:
>> uptime                  0h0m0s
>> memory                  70,918,144 bytes
>>        old                     61,966,112 bytes (87.4%)
>>        young           2,781,608 bytes (3.9000000000000004%)
>> I see yet another bad usage of round:/roundTo: --------------^
>>
>> It just printed a float :). I don't think anybody rounds the values in these statistics.
>
> Nothing should be rounded. Just compute the percentage and then use #printShowingDecimalPlaces: or #printOn:showingDecimalPlaces:

For example,

'Status OK - Clock {1} - Allocated {2} bytes - {3} % free.' format: {
   DateAndTime now.
   self memoryTotal asStringWithCommas.
   (self memoryFree / self memoryTotal * 100.0) printShowingDecimalPlaces: 2 }

Prints

Status OK - Clock 2016-11-10T09:47:18.367242+00:00 - Allocated 217,852,528 bytes - 2.36 % free.

(This is part of NeoConsole, a REPL package).
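As an aside, the (3.9000000000000004%) above is most likely #roundTo: at work; a minimal illustration, assuming only the standard Float printing protocols:

3.92 roundTo: 0.1.                    "3.9000000000000004, because 39 * 0.1 is not exact"
3.92 printShowingDecimalPlaces: 1.    "'3.9', formatting only, the value itself is untouched"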



Re: Breaking the 4GB barrier with Pharo 6 64-bit

philippeback
In reply to this post by Denis Kudriashov


On Thu, Nov 10, 2016 at 10:31 AM, Denis Kudriashov <[hidden email]> wrote:

2016-11-10 9:49 GMT+01:00 [hidden email] <[hidden email]>:
Ah, but then it may be more interesting to have a data image (maybe a lot of these) and a front end image.

Isn't Seamless something that could help us here? No need to bring the data back, just manipulate it through proxies.

The problem is that the server image will perform GC anyway, and if the server image is big, that will be slow and stop the whole world.

What if we asked it to not do any GC at all? If we have tons of RAM, why bother? Especially if what it is used for is to keep datasets: load them, save the image to disk, and when needed trash the loaded stuff and reload from zero.
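A rough sketch of that workflow with plain image snapshots (the dataset loader is hypothetical, everything else is standard Pharo):

| dataset |
dataset := MyDatasetLoader loadAll.           "hypothetical bulk load"
Smalltalk snapshot: true andQuit: false.      "persist the loaded state in the .image file"
"... later, to start from zero again:"
dataset := nil.
Smalltalk garbageCollect.                     "reclaim the dropped data explicitly"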

Basically that is what happens with Spark.


And Tachyon/Alluxio solves this kind of issue (it might be nice to have it interacting with the Pharo image): http://www.alluxio.org/ It basically keeps data in memory so it can be reused between workload runs.

Or have an object memory for work and one for datasets (first one gets GC'd, the other one isn't).

Phil

 


Re: Breaking the 4GB barrier with Pharo 6 64-bit

kilon.alios
In reply to this post by Tudor Girba-2


On Thu, Nov 10, 2016 at 11:43 AM Tudor Girba <[hidden email]> wrote:
Hi Igor,

I am happy to see you getting active again. The next step is to commit code at the rate you reply emails. I’d be even happier :).


Ouch, that was not very nice...

I agree with Igor and Phil: there is no genuine interest in the community for optimising Pharo for big data. Which makes sense, because coders who care a lot about performance stick with C/C++. Not that I blame them; you can't have your cake and eat it too.

I have no idea why you would want to add 128+ GB of RAM to a computer; it's not as if CPUs are powerful enough to deal with such a massive amount of data, even if you do your coding at C level.

I know because I am working daily with 3d graphics.

Foremost, CPUs have lost the war: GPUs have dominated for almost a decade now, especially in the area of massive parallelism. It's quite easy for a cheap GPU nowadays to outperform a CPU by 10 times, and some expensive ones can be even 100 times faster than the fastest CPU. But that is for doing the same calculation over a very large data set.

If you go down that path you need OpenCL or CUDA support in Pharo, assuming you want to do it all in Pharo. Modern GPUs are so generic in functionality that they are used in many areas that have nothing to do with graphics, and they are especially popular for physical simulations, where data sets can easily reach TBs or even PBs.

A solution that I am implementing with CPPBridge would also make sense here: a shared memory area that lives outside the VM's object memory, so it cannot be garbage collected, but still inside the Pharo process, so Pharo has direct access to it with no compromise on performance. Being shared also means that multiple Pharo instances can access it directly, giving you true parallelism.

If you want the comforts of Pharo, including GC, then you move a portion of the data into the VM by copying it from the shared memory into Pharo objects, and of course erase or overwrite the data on the shared memory side so you don't waste RAM.
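A tiny sketch of that idea using only what is already in Pharo's UFFI (assuming FFIExternalArray and its malloc-based #externalNewType:size: allocator; CPPBridge itself and the cross-image sharing are not shown):

| samples |
samples := FFIExternalArray externalNewType: 'double' size: 1000000.   "storage lives outside the object memory"
1 to: 1000000 do: [ :i | samples at: i put: i sqrt ].                  "the GC never scans these doubles"
Transcript crShow: (samples at: 42) printString.
samples free.                                                          "manual release, as with CPPBridge"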

You can also delegate which Pharo instance deals with which portion of the shared memory, so you can optimise the use of multiple cores; data processing that benefits from GPU parallelism should be moved to the GPU with the appropriate Pharo library.

The memory-mapped file backing the shared memory will strip any metadata and store the data in its most compact format, while data that needs to be more flexible and higher level can be stored inside a Pharo image.

If 10 Pharo instances execute at the same time, one of them can perform the role of manager, streaming data from the hard drive into shared memory in the background without affecting the performance of the other instances. This gives you the ability to deal with TBs of data and to take advantage of old computers with little memory.

Out of all that, I will be materializing the shared memory part, the protocol, and the memory-mapped file that persists the shared memory, because I don't need the rest.

Of course, here comes the debate of why do it in Pharo at all, instead of using a C/C++ library or C support for CUDA/OpenCL and letting Pharo just sit in the driving seat in the role of manager.

This is how Python is used by modern scientists: C++ libraries driven by Python scripting. Pharo can do the same.

I don't believe optimising the GC is the ideal solution. It is not even necessary.

Re: Breaking the 4GB barrier with Pharo 6 64-bit

NorbertHartl
In reply to this post by Tudor Girba-2

> On 10 Nov 2016 at 10:42, Tudor Girba <[hidden email]> wrote:
>
> Hi Igor,
>
> I am happy to see you getting active again. The next step is to commit code at the rate you reply emails. I’d be even happier :).
>
+1

> To address your point, of course it certainly would be great to have more people work on automated support for swapping data in and out of the image. That was the original idea behind the Fuel work. I have seen a couple of cases on the mailing lists where people are actually using Fuel for caching purposes. I have done this a couple of times, too. But, at this point these are dedicated solutions and would be interesting to see it expand further.
>
And still it would be too general. The only thing you can say is that swapping in and out makes things slower, so you usually don't want swapping at all. It is comparable to swap space in OSes: in many scenarios, needing swap at all is an architectural design failure. So before resources actually get scarce, there are good reasons not to care too much. And if you do want to do it, there is no general solution. How do you swap out a partial graph with Fuel? How can you load back a small part of the graph you swapped out? Do we need to reify object references into objects in order to make that smart?
It is understandable from a developer's perspective: you have a real problem you should solve, but then you make up all sorts of technical problems that you think you need to solve instead of the original one. That is one prominent way projects fail.
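For reference, the basic Fuel round-trip those caching scenarios build on (a minimal sketch; the partial-graph questions above are exactly what this API does not answer by itself, and 'cache.fuel' is just an example file name):

| graph |
graph := Dictionary new.
graph at: #payload put: (1 to: 1000000) asOrderedCollection.
FLSerializer serialize: graph toFileNamed: 'cache.fuel'.          "swap out: write the graph to disk"
graph := nil.                                                     "let the GC reclaim the in-image copy"
graph := FLMaterializer materializeFromFileNamed: 'cache.fuel'.   "swap back in when needed"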

> However, your assumption is that the best design is one that deals with small chunks of data at a time. This made a lot of sense when memory was expensive and small. But, these days the cost is going down very rapidly, and sizes of 128+ GB of RAM is nowadays quite cheap, and there are strong signs of super large non-volatile memories become increasingly accessible. The software design should take advantage of what hardware offers, so it is not unreasonable to want to have a GC that can deal with large size.
>
Be it small chunks of data or not, a statement that general is most likely to be wrong, so the best way might be to ignore it. Indeed, you are right that hardware got cheap. Even more important is the fact that hardware is almost always cheaper than personnel costs. Solving all those technical problems instead of real ones, and not trying to act in an economical way, ruins a lot of companies out there. You can ignore economic facts (or any others), but that doesn't make you really smart!

my 2 cents,

Norbert


> We should always challenge the assumptions behind our designs, because the world keeps changing and we risk becoming irrelevant, a syndrome that is not foreign to Smalltalk aficionados.
>
> Cheers,
> Doru
>
>
>> On Nov 10, 2016, at 9:12 AM, Igor Stasenko <[hidden email]> wrote:
>>
>>
>> On 10 November 2016 at 07:27, Tudor Girba <[hidden email]> wrote:
>> Hi Igor,
>>
>> Please refrain from speaking down on people.
>>
>>
>> Hi, Doru!
>> I just wanted to hear you :)
>>
>> If you have a concrete solution for how to do things, please feel free to share it with us. We would be happy to learn from it.
>>
>>
>> Well, there's so many solutions, that i even don't know what to offer, and given the potential of smalltalk, i wonder why
>> you are not employing any. But in overall it is a question of storing most of your data on disk, and only a small portion of it
>> in image (in most optimal cases - only the portion that user sees/operates with).
>> As i said to you before, you will hit this wall inevitably, no matter how much memory is available.
>> So, what stops you from digging in that direction?
>> Because even if you can fit all data in memory, consider how much time it takes for GC to scan 4+ Gb of memory, comparing to
>> 100 MB or less.
>> I don't think you'll find it convenient to work in environment where you'll have 2-3 seconds pauses between mouse clicks.
>> So, of course, my tone is not acceptable, but its pain to see how people remain helpless without even thinking about
>> doing what they need. We have Fuel for how many years now?
>> So it can't be as easy as it is, just serialize the data and purge it from image, till it will be required again.
>> Sure it will require some effort, but it is nothing comparing to day to day pain that you have to tolerate because of lack of solution.
>>
>> Cheers,
>> Tudor
>>
>>
>>> On Nov 10, 2016, at 4:11 AM, Igor Stasenko <[hidden email]> wrote:
>>>
>>> Nice progress, indeed.
>>> Now i hope at the end of the day, the guys who doing data mining/statistical analysis will finally shut up and happily be able
>>> to work with more bloat without need of learning a ways to properly manage memory & resources, and implement them finally.
>>> But i guess, that won't be long silence, before they again start screaming in despair: please help, my bloat doesn't fit into memory... :)
>>>
>>> On 9 November 2016 at 12:06, Sven Van Caekenberghe <[hidden email]> wrote:
>>> OK, I am quite excited about the future possibilities of 64-bit Pharo. So I played a bit more with the current test version [1], trying to push the limits. In the past, it was only possible to safely allocate about 1.5GB of memory even though a 32-bit process' limit is theoretically 4GB (the OS and the VM need space too).
>>>
>>> Allocating a couple of 1GB ByteArrays is one way to push memory use, but it feels a bit silly. So I loaded a bunch of projects (including Seaside) to push the class/method counts (7K classes, 100K methods) and wrote a script [2] that basically copies part of the class/method metadata including 2 copies of each's methods source code as well as its AST (bypassing the cache of course). This feels more like a real object graph.
>>>
>>> I had to create no less than 7 (SEVEN) copies (each kept open in an inspector) to break through the mythical 4GB limit (real allocated & used memory).
>>>
>>> <Screen Shot 2016-11-09 at 11.25.28.png>
>>>
>>> I also have the impression that the image shrinking problem is gone (closing everything frees memory, saving the image has it return to its original size, 100MB in this case).
>>>
>>> Great work, thank you. Bright future again.
>>>
>>> Sven
>>>
>>> PS: Yes, GC is slower; No, I did not yet try to save such a large image.
>>>
>>> [1]
>>>
>>> VM here: http://bintray.com/estebanlm/pharo-vm/build#files/
>>> Image here: http://files.pharo.org/get-files/60/pharo-64.zip
>>>
>>> [2]
>>>
>>> | meta |
>>> ASTCache reset.
>>> meta := Dictionary new.
>>> Smalltalk allClassesAndTraits do: [ :each | | classMeta methods |
>>>  (classMeta := Dictionary new)
>>>    at: #name put: each name asSymbol;
>>>    at: #comment put: each comment;
>>>    at: #definition put: each definition;
>>>    at: #object put: each.
>>>  methods := Dictionary new.
>>>  classMeta at: #methods put: methods.
>>>  each methodsDo: [ :method | | methodMeta |
>>>    (methodMeta := Dictionary new)
>>>      at: #name put: method selector;
>>>      at: #source put: method sourceCode;
>>>      at: #ast put: method ast;
>>>      at: #args put: method argumentNames asArray;
>>>      at: #formatted put: method ast formattedCode;
>>>      at: #comment put: (method comment ifNotNil: [ :str | str withoutQuoting ]);
>>>      at: #object put: method.
>>>    methods at: method selector put: methodMeta ].
>>>  meta at: each name asSymbol put: classMeta ].
>>> meta.
>>>
>>>
>>>
>>> --
>>> Sven Van Caekenberghe
>>> Proudly supporting Pharo
>>> http://pharo.org
>>> http://association.pharo.org
>>> http://consortium.pharo.org
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Igor Stasenko.
>>
>> --
>> www.tudorgirba.com
>> www.feenk.com
>>
>> "We can create beautiful models in a vacuum.
>> But, to get them effective we have to deal with the inconvenience of reality."
>>
>>
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko.
>
> --
> www.tudorgirba.com
> www.feenk.com
>
> "Not knowing how to do something is not an argument for how it cannot be done."
>
>



Re: Breaking the 4GB barrier with Pharo 6 64-bit

Thierry Goubier


2016-11-10 12:18 GMT+01:00 Norbert Hartl <[hidden email]>:
[ ...]

Be it small chunks of data or not, a statement that general is most likely to be wrong, so the best way might be to ignore it. Indeed, you are right that hardware got cheap. Even more important is the fact that hardware is almost always cheaper than personnel costs. Solving all those technical problems instead of real ones, and not trying to act in an economical way, ruins a lot of companies out there. You can ignore economic facts (or any others), but that doesn't make you really smart!

I disagree with that. In some areas (HPC, Exascale, HPDA), whatever the physical limit is, we will reach and go larger than that.

Now, about that memory aspect, there is an entire field dedicated to algorithmic solutions that never require the entire data set in memory. You just have to look and implement the right underlying abstractions to allow those algorithms to be implemented and run efficiently.

(my best example for that: satellite imagery viewers have always been able to handle images larger than the computer's RAM size; you just need a buffered streaming interface to the file).
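A minimal sketch of such a buffered streaming interface in plain Pharo (the file name is an arbitrary example; real viewers add tiling and caching on top):

| buffer |
buffer := ByteArray new: 1024 * 1024.                              "1 MB window"
'huge-image.raw' asFileReference binaryReadStreamDo: [ :in |
    [ in atEnd ] whileFalse: [ | count |
        count := in readInto: buffer startingAt: 1 count: buffer size.
        "process the first count bytes of the window here" ] ].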

Thierry
 

[...]




Re: Breaking the 4GB barrier with Pharo 6 64-bit

NorbertHartl

On 10 Nov 2016 at 12:27, Thierry Goubier <[hidden email]> wrote:



2016-11-10 12:18 GMT+01:00 Norbert Hartl <[hidden email]>:
[ ...]

Be it small chunks of data or not, a statement that general is most likely to be wrong, so the best way might be to ignore it. Indeed, you are right that hardware got cheap. Even more important is the fact that hardware is almost always cheaper than personnel costs. Solving all those technical problems instead of real ones, and not trying to act in an economical way, ruins a lot of companies out there. You can ignore economic facts (or any others), but that doesn't make you really smart!

I disagree with that. In some areas (HPC, Exascale, HPDA), whatever the physical limit is, we will reach and go larger than that.

What do you disagree with? I didn't say you never need it. In your case you have concrete examples where it is necessary; in a lot of other cases it is counterproductive. Isn't that an agreement that you cannot state it in too general a way?

Now, about that memory aspect, there is an entire field dedicated to algorithmic solutions that never require the entire data set in memory. You just have to look and implement the right underlying abstractions to allow those algorithms to be implemented and run efficiently.

And that is good. I think you got me wrong: I find it important to be able to handle partial graphs in memory. But should everyone doing some statistical research be the one implementing that? The complaint made that sound like the most important part, and it is not.

Norbert


[...]





Re: Breaking the 4GB barrier with Pharo 6 64-bit

Thierry Goubier


2016-11-10 12:38 GMT+01:00 Norbert Hartl <[hidden email]>:

On 10 Nov 2016 at 12:27, Thierry Goubier <[hidden email]> wrote:



2016-11-10 12:18 GMT+01:00 Norbert Hartl <[hidden email]>:
[ ...]

Be it small chunks of data or not, a statement that general is most likely to be wrong, so the best way might be to ignore it. Indeed, you are right that hardware got cheap. Even more important is the fact that hardware is almost always cheaper than personnel costs. Solving all those technical problems instead of real ones, and not trying to act in an economical way, ruins a lot of companies out there. You can ignore economic facts (or any others), but that doesn't make you really smart!

I disagree with that. In some areas (HPC, Exascale, HPDA), whatever the physical limit is, we will reach and go larger than that.

What do you disagree with? I didn't say you never need it. In your case you have concrete examples where it is necessary; in a lot of other cases it is counterproductive. Isn't that an agreement that you cannot state it in too general a way?

It is hard to disagree with something so general :) But, what we strive for is still to be general, otherwise we wouldn't have general purpose programming languages... mostly because a domain specific, dedicated solution is a costly proposition.
 

Now, about that memory aspect, there is an entire field dedicated to algorithmic solutions that never require the entire data set in memory. You just have to look and implement the right underlying abstractions to allow those algorithms to be implemented and run efficiently.

And that is good. I think you got me wrong: I find it important to be able to handle partial graphs in memory. But should everyone doing some statistical research be the one implementing that? The complaint made that sound like the most important part, and it is not.

Well, take again my "larger than memory" image example. Optimizing for that case makes the image viewer more efficient in the general case, so, yes, it can be argued that everybody should write statistical research in an "out-of-memory" system: it costs almost nothing in efficiency on datasets that are small enough, and it allows the system to scale. Otherwise you end up with the R situation, where it runs your stats nice and fine until you reach a size limit unknown to you, where it crashes or seems to run forever (if not worse).

Thierry
 

[...]






Re: Breaking the 4GB barrier with Pharo 6 64-bit

Igor Stasenko
In reply to this post by Tudor Girba-2


On 10 November 2016 at 11:42, Tudor Girba <[hidden email]> wrote:
Hi Igor,

I am happy to see you getting active again. The next step is to commit code at the rate you reply emails. I’d be even happier :).

To address your point, of course it certainly would be great to have more people work on automated support for swapping data in and out of the image. That was the original idea behind the Fuel work. I have seen a couple of cases on the mailing lists where people are actually using Fuel for caching purposes. I have done this a couple of times, too. But, at this point these are dedicated solutions and would be interesting to see it expand further.

However, your assumption is that the best design is one that deals with small chunks of data at a time. This made a lot of sense when memory was expensive and small. But, these days the cost is going down very rapidly, and sizes of 128+ GB of RAM is nowadays quite cheap, and there are strong signs of super large non-volatile memories become increasingly accessible. The software design should take advantage of what hardware offers, so it is not unreasonable to want to have a GC that can deal with large size.

The speed of GC will always be in linear dependency from the size of governed memory. Yes, yes, super fast and super clever, made by some wizard, but still the same dependency.
So it will always be in your interest to keep the memory footprint as small as possible. PERIOD.
 
We should always challenge the assumptions behind our designs, because the world keeps changing and we risk becoming irrelevant, a syndrome that is not foreign to Smalltalk aficionados.


What you are saying is just: okay, we have a problem here, we hit a wall... but we don't look for solutions! Instead, let us sit and wait until someone else is generous enough to help with it.
WOW, what a brilliant strategy!!
So you are putting the fate of your project(s) into the hands of a third party, which
a) maybe, only maybe, will work on solving your problem in the next 10 years
b) may decide it is not worth the effort right now (or ever) and focus on something else, because they have their own priorities after all

Are you serious?
"Our furniture doesn't fit in modern trucks, so let us wait until industry invents bigger trucks and builds larger roads, and then we will move." Hilarious!

In that case, the problem you are raising is not that mission-critical to you, and thus making constant noise about your problem(s) is just what it is: noise.
Which returns us to my original mail and its offensive tone.


Cheers,
Doru



--
www.tudorgirba.com
www.feenk.com

"Not knowing how to do something is not an argument for how it cannot be done."





--
Best regards,
Igor Stasenko.

Re: Breaking the 4GB barrier with Pharo 6 64-bit

Aliaksei Syrel

> The speed of GC will always be in linear dependency from the size of governed memory.

Asymptotic complexity of GC is O(N), where N is heap size - amount of objects, not memory size.

I agree, however, that it's not good to create a lot of short-lived objects. That is why there are many practices for overcoming this problem; an object pool is one nice example.

Nevertheless, I can imagine many use cases where breaking the 4GB limit is useful, for example double buffering during the rendering process: one pixel takes 32 bits of memory, so an 8K image (near-future displays) would take about 126 MB. Double buffering would be useful for Roassal (huge zoomed-out visualizations).

Storing a 126 MB array object takes a lot of memory but does not influence GC performance, since it is just one object on the heap.
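A quick way to see that from the image side (a rough sketch; the absolute numbers will of course vary per machine and VM):

| bigBuffer manySmall |
bigBuffer := ByteArray new: 126 * 1024 * 1024.                     "one ~126 MB object"
Transcript crShow: 'Full GC, one big object: ',
    [ Smalltalk garbageCollect ] timeToRun asString.
manySmall := (1 to: 1000000) collect: [ :i | Array new: 4 ].       "a million small objects"
Transcript crShow: 'Full GC, a million small objects: ',
    [ Smalltalk garbageCollect ] timeToRun asString.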

Cheers
Alex


[...]

Re: Breaking the 4GB barrier with Pharo 6 64-bit

Sven Van Caekenberghe-2

> On 10 Nov 2016, at 17:35, Aliaksei Syrel <[hidden email]> wrote:
>
> > The speed of GC will always be in linear dependency from the size of governed memory.
>
> Asymptotic complexity of GC is O(N), where N is heap size - amount of objects, not memory size.

Even that is not necessarily true, Generational Garbage collection and other tricks can avoid a full heap GC for a long time, even (or especially) under memory allocation stress.

Apart from that, of course we have to write the most resource-efficient code that we can!
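For instance, a small sketch with the standard image protocol (exactly what each call traverses depends on the VM, but the scavenge only walks new space):

| scavenge full |
scavenge := [ Smalltalk garbageCollectMost ] timeToRun.   "new-space scavenge only"
full := [ Smalltalk garbageCollect ] timeToRun.           "full mark/sweep of the whole heap"
Transcript crShow: 'Scavenge: ', scavenge asString, ' vs full GC: ', full asString.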

[...]



Re: Breaking the 4GB barrier with Pharo 6 64-bit

Tudor Girba-2
In reply to this post by Igor Stasenko
Hi Igor,

I see you are still having fun :). I am not sure what you are arguing about, but it does not seem to be much related to what I said.

And again, I would be very happy to work with you on something concrete. Just let me know if this is of interest and perhaps we can channel the energy on solutions rather than on discussions like this.

Cheers,
Doru


[...]

--
www.tudorgirba.com
www.feenk.com

"From an abstract enough point of view, any two things are similar."






Re: Breaking the 4GB barrier with Pharo 6 64-bit

Aliaksei Syrel
In reply to this post by Sven Van Caekenberghe-2

On 10 November 2016 at 17:41, Sven Van Caekenberghe <[hidden email]> wrote:
Even that is not necessarily true, Generational Garbage collection and other tricks can avoid a full heap GC for a long time, even (or especially) under memory allocation stress.

That is why it is Big O notation (upper bound / worst case) ;) 

Re: Breaking the 4GB barrier with Pharo 6 64-bit

Igor Stasenko
In reply to this post by Sven Van Caekenberghe-2


On 10 November 2016 at 18:41, Sven Van Caekenberghe <[hidden email]> wrote:

> On 10 Nov 2016, at 17:35, Aliaksei Syrel <[hidden email]> wrote:
>
> > The speed of GC will always be in linear dependency from the size of governed memory.
>
> Asymptotic complexity of GC is O(N), where N is heap size - amount of objects, not memory size.

Even that is not necessarily true, Generational Garbage collection and other tricks can avoid a full heap GC for a long time, even (or especially) under memory allocation stress.

That's why it is asymptotic... Still, more objects => more memory, O(N) => O(K), so my statement holds true.
And all of those tricks are puny attempts to work around that: generational, multi-generational, permanent space, etc., etc. It does help, of course, but it does not solve the problem, since you can always invent a real-world scenario that brings it to its knees, so that it goes from 'asymptotic' to quite 'symptotic'. So all those elaborations do not dismiss my argument, especially when we're talking about large data.

When it comes to BIG data, manual data/resource management is the way to go. The rest is handwaving and self-delusion :)
 

[...]





--
Best regards,
Igor Stasenko.

Re: Breaking the 4GB barrier with Pharo 6 64-bit

Igor Stasenko
In reply to this post by Tudor Girba-2


On 10 November 2016 at 18:57, Tudor Girba <[hidden email]> wrote:
Hi Igor,

I see you are still having fun :). I am not sure what you are arguing about, but it does not seem to be much related to what I said.

It is not fun seeing that, years after we discussed this problem and I shared my view on it, nothing has changed.
I really wish that problem were lifted from your sight. But your rhetoric tells me that you prefer to sit and wait instead of solving it.
Feel free to tell me if I am wrong.
 
And again, I would be very happy to work with you on something concrete. Just let me know if this is of interest and perhaps we can channel the energy into solutions rather than into discussions like this.

Why bother? Let's wait till we have desktops with 1TB of RAM :)
Ohh, sorry.
Yeah, unfortunately I don't have much free time right now to dedicate to Pharo. But who knows, that may change.
As you can see, I keep coming back, because Smalltalk is not something you can forget once you have learned it :)
 
Please don't take my tone too personally. It's my frustration taking offensive forms. My frustration comes from assuming that you could help yourself, because your problem is not that hard to solve.
But instead, you prefer to rely on somebody else's effort(s). Arrhgghhh!! :)


Cheers,
Doru


>
> --
> Best regards,
> Igor Stasenko.

--
www.tudorgirba.com
www.feenk.com

"From an abstract enough point of view, any two things are similar."





--
Best regards,
Igor Stasenko.
Reply | Threaded
Open this post in threaded view
|

Re: Breaking the 4GB barrier with Pharo 6 64-bit

Stephan Eggermont-3
In reply to this post by Sven Van Caekenberghe-2
Igor wrote:
>Now I hope that, at the end of the day, the guys doing data mining /
>statistical analysis will finally shut up and happily be able to work
>with more bloat, without the need to learn ways to properly manage
>memory & resources and finally implement them.

The actual problem is of course having to work with all that data before you understand the structure. Or highly interconnected structures with unpredictable access patterns. Partial graphs are nice, once you understand how to partition. Needing to understand how to partition first is a dependency I'd rather avoid.

>Because even if you can fit all the data in memory, consider how much
>time it takes for the GC to scan 4+ GB of memory,

That's often not what is happening. The large data is mostly static, so it gets moved out of new space very quickly. Otherwise working with large data quickly becomes annoying indeed. I fully agree with you on that.
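A Playground sketch makes the difference visible (rough and untested; it assumes Smalltalk garbageCollectMost triggers a new-space scavenge and Smalltalk garbageCollect a full GC, as in current Pharo; numbers will differ per machine):

| static |
"Build a large, static data set and let it get tenured into old space."
static := (1 to: 2000000) collect: [ :i | i -> i squared ].
Smalltalk garbageCollect.

"Frequent scavenges stay cheap, because they mostly look at new space..."
Transcript show: '100 scavenges: ',
   (Time millisecondsToRun: [ 100 timesRepeat: [ Smalltalk garbageCollectMost ] ]) printString, ' ms'; cr.

"...while a full GC still has to traverse the whole tenured data set."
Transcript show: '1 full GC: ',
   (Time millisecondsToRun: [ Smalltalk garbageCollect ]) printString, ' ms'; cr.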

Stephan

Reply | Threaded
Open this post in threaded view
|

Re: Breaking the 4GB barrier with Pharo 6 64-bit

Igor Stasenko


On 10 November 2016 at 19:58, Stephan Eggermont <[hidden email]> wrote:
Igor wrote:
>Now I hope that, at the end of the day, the guys doing data mining /
>statistical analysis will finally shut up and happily be able to work
>with more bloat, without the need to learn ways to properly manage
>memory & resources and finally implement them.

The actual problem is of course having to work with all that data before you understand the structure. Or highly interconnected structures with unpredictable access patterns. Partial graphs are nice, once you understand how to partition. Needing to understand how to partition first is a dependency I'd rather avoid.


No, no, no! This is simply not true.
It is you who writes the code that generates all that statistical/analysis data, and its output is fairly predictable... else you are not collecting any data, just random noise, aren't you?
Those graphs are far from being unpredictable, because they are the product of software you wrote.
It is not unpredictable, unless you claim that the code you write is unpredictable, and then I wonder what you are doing in the field of data analysis if you admit that your data is nothing but a dice roll.
If you cannot tame & reason about the complexity of your own code, then maybe it is better to change occupation and go work in a casino? :)

I mean, Doru is light years ahead of me and many others in the field of data analysis... so what can I advise him about on his own playground?
You are absolutely right that the hardest part, as you identified, is finding a way to dissect the graph data into smaller chunks. Storing such a dissected graph in chunks on a hard drive outside of the image, and loading them back when needed, is nothing compared to that first part.
And if Doru can't handle this, then who else can? Me? I have nothing compared to his experience in that field. I have had very little and only occasional experience with that kind of domain in my career. Come on...
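To be concrete, that second (easy) half is roughly this. An untested sketch, assuming the FLSerializer / FLMaterializer convenience API, with a trivial modulo standing in for a real partition function and a plain interval standing in for the real data:

| partitionOf chunks |
partitionOf := [ :each | each \\ 16 ]. "stand-in: any function that assigns an element to a chunk"
chunks := Dictionary new.
(1 to: 1000000) do: [ :each | "stand-in for the real data set"
   (chunks at: (partitionOf value: each) ifAbsentPut: [ OrderedCollection new ]) add: each ].

"Park every chunk on disk, then drop the in-image copies."
chunks keysAndValuesDo: [ :key :chunk |
   FLSerializer serialize: chunk toFileNamed: 'chunk-', key printString, '.fuel' ].
chunks := nil.
Smalltalk garbageCollect.

"Later, bring back only the chunk you actually need."
FLMaterializer materializeFromFileNamed: 'chunk-3.fuel'.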
 
>Because even if you can fit all the data in memory, consider how much
>time it takes for the GC to scan 4+ GB of memory,

That's often not what is happening. The large data is mostly static, so it gets moved out of new space very quickly. Otherwise working with large data quickly becomes annoying indeed. I fully agree with you on that.

Stephan




--
Best regards,
Igor Stasenko.
Reply | Threaded
Open this post in threaded view
|

Re: Breaking the 4GB barrier with Pharo 6 64-bit

Stephan Eggermont-3
On 10/11/16 21:35, Igor Stasenko wrote:
> No, no, no! This is simply not true.
> It is you, who writes the code that generates a lot of statistical
> data/analysis data, and its output is fairly predictable.. else you are
> not collecting any data, but just a random noise, isn't?

That would be green field development. In brown field development, I
only get in when people start noticing there is a problem (why do we
need more than 4GBytes for this?). At that point I want to be able to
load everything they can give me in an image so I can start analyzing
and structuring it.

> I mean, Doru is light years ahead of me and many others in field of data
> analysis.. so what i can advise to him on his playground?

Well, the current FAMIX model implementation is clearly not well
structured for analyzing large code bases. And it is difficult to
partition because of unpredictable access patterns and high
interconnection.

Stephan


Reply | Threaded
Open this post in threaded view
|

Re: Breaking the 4GB barrier with Pharo 6 64-bit

Tudor Girba-2
Hi,

The discussion on this thread had nothing to do with FAMIX or Moose. It had to do with people's ability to load larger pieces of data into the image without the VM imposing a low limit on it. There are clear scenarios where this is desirable, and I do not understand why this is even a topic of conversation these days. The move to 64-bit is a significant advantage in this area and we should applaud it.

Doru


> On Nov 11, 2016, at 11:29 AM, Stephan Eggermont <[hidden email]> wrote:
>
> On 10/11/16 21:35, Igor Stasenko wrote:
>> No, no, no! This is simply not true.
>> It is you, who writes the code that generates a lot of statistical
>> data/analysis data, and its output is fairly predictable.. else you are
>> not collecting any data, but just a random noise, isn't?
>
> That would be green field development. In brown field development, I only get in when people start noticing there is a problem (why do we need more than 4GBytes for this?). At that point I want to be able to load everything they can give me in an image so I can start analyzing and structuring it.
>
>> I mean, Doru is light years ahead of me and many others in field of data
>> analysis.. so what i can advise to him on his playground?
>
> Well, the current FAMIX model implementation is clearly not well structured for analyzing large code bases. And it is difficult to partition because of unpredictable access patterns and high interconnection.
>
> Stephan
>
>

--
www.tudorgirba.com
www.feenk.com

"Some battles are better lost than fought."





Reply | Threaded
Open this post in threaded view
|

Re: Breaking the 4GB barrier with Pharo 6 64-bit

Thierry Goubier
In reply to this post by Stephan Eggermont-3
Le 11/11/2016 à 11:29, Stephan Eggermont a écrit :

> On 10/11/16 21:35, Igor Stasenko wrote:
>> No, no, no! This is simply not true.
>> It is you, who writes the code that generates a lot of statistical
>> data/analysis data, and its output is fairly predictable.. else you are
>> not collecting any data, but just a random noise, isn't?
>
> That would be green field development. In brown field development, I
> only get in when people start noticing there is a problem (why do we
> need more than 4GBytes for this?). At that point I want to be able to
> load everything they can give me in an image so I can start analyzing
> and structuring it.
>
>> I mean, Doru is light years ahead of me and many others in field of data
>> analysis.. so what i can advise to him on his playground?
>
> Well, the current FAMIX model implementation is clearly not well
> structured for analyzing large code bases. And it is difficult to
> partition because of unpredictable access patterns and high
> interconnection.

This is why you look for a general-purpose, efficient off-loading
scheme, trying to optimize the general case and get reasonable performance
out of it (a.k.a. Fuel, but designed for partial unloading / loading:
allow dangling references in a unit of load, focus on per-page units to
match the underlying storage layer or network).
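The core mechanism is not much more than a materializing proxy per page. A rough, untested sketch of the idea (PageProxy is a hypothetical class, Fuel is used as the page store; a real implementation would rather subclass ProtoObject, or use a proxy framework, so that more messages get intercepted):

Object subclass: #PageProxy
   instanceVariableNames: 'pageId'
   classVariableNames: ''
   category: 'OffloadingSketch'

PageProxy >> pageId: anInteger
   pageId := anInteger

PageProxy >> doesNotUnderstand: aMessage
   "First touch: materialize the page this proxy stands for, swap ourselves for the
    real object, then forward the message. Dangling references inside a materialized
    page are simply more PageProxy instances, resolved on their own first touch."
   | real |
   real := FLMaterializer materializeFromFileNamed: 'page-', pageId printString, '.fuel'.
   self becomeForward: real.
   ^ aMessage sendTo: real

The serializer's job is then to replace outgoing references with such proxies when it writes a page; that is where the per-page unit and the working-set refinement come in.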

I wrote one such layer for VW a long time ago, but didn't have time to
experiment with / qualify some of the techniques in it. There was an
interesting attempt (IMHO... it wasn't qualified) at combining paging and
automatic refinement of the application working set, based on previous
experience implementing a hierarchical 2D object access scheme for large
datasets on a slow medium (it decreased access time from 30 minutes to
a few seconds).

The other approach I would look at is to take some of the support code from
such an automatic layer and use it to unload parts of my model; and I'm
pretty sure that, if I don't benchmark intensively, I'll get the
partitioning wrong :(

Overall, an interesting subject, totally not valid from a scientific
point of view (the database guys have already solved everything). Only
valid as a hobby, or if a company is ready to pay for a solution.

Thierry

Reply | Threaded
Open this post in threaded view
|

Re: Breaking the 4GB barrier with Pharo 6 64-bit

stepharo
Hi Thierry,


Do you happen to have a tech report or any description of your work?


Stef


Le 11/11/16 à 11:44, Thierry Goubier a écrit :

> Le 11/11/2016 à 11:29, Stephan Eggermont a écrit :
>> On 10/11/16 21:35, Igor Stasenko wrote:
>>> No, no, no! This is simply not true.
>>> It is you, who writes the code that generates a lot of statistical
>>> data/analysis data, and its output is fairly predictable.. else you are
>>> not collecting any data, but just a random noise, isn't?
>>
>> That would be green field development. In brown field development, I
>> only get in when people start noticing there is a problem (why do we
>> need more than 4GBytes for this?). At that point I want to be able to
>> load everything they can give me in an image so I can start analyzing
>> and structuring it.
>>
>>> I mean, Doru is light years ahead of me and many others in field of
>>> data
>>> analysis.. so what i can advise to him on his playground?
>>
>> Well, the current FAMIX model implementation is clearly not well
>> structured for analyzing large code bases. And it is difficult to
>> partition because of unpredictable access patterns and high
>> interconnection.
>
> This is why you look for a general purpose, efficient off-loading
> scheme, trying to optimize a general case and get reasonable
> performance out of it (a.k.a fuel, but designed for partial unloading
> / loading: allow dangling references in a unit of load, focus on
> per-page units to match the underlying storage layer or network).
>
> I wrote one such layer for VW a long time ago, but didn't had time to
> experiment / qualify some of the techniques in it. There was an
> interesting attempt (IMHO ... wasn't qualified) at combining paging
> and automatic refinement of application working set, based on previous
> experience implementing a hierarchical 2D object access scheme for
> large datasets on slow medium (decreased access time from 30 minutes
> to about a few seconds).
>
> The other approach I would look is take some of the support code for
> such an automatic layer and use it to unload parts of my model;, and
> I'm pretty sure that, if I don't bench intensively, I'll get the
> partitioning wrong :(
>
> Overall, an interesting subject, totally not valid from a scientific
> point of view (the database guys have already solved everything). Only
> valid as a hobby, or if a company is ready to pay for a solution.
>
> Thierry
>
>


123