Hi Richard,
I'm sorry it took so long to get back to this post; I wasn't ignoring you. I think it is fair to say that most of us on this forum have some blend of engineer and scientist in us. I'm guessing that your ratio of engineer to scientist is larger than mine (not that mine is all that high in the scientist area). As engineers we are problem solvers. We want a problem to solve. We like the idea of necessity being the mother of invention. But there are of course people (scientists) in the world who work on things, learn about things, and think about things without necessarily knowing where that effort will lead. That is what this discussion is about. I am asking those who wish to participate to take off our engineer caps and put on our scientist hats (do scientists wear hats?). So, with that in mind, I hope you will participate too. And by participating, you should feel free to continue to say that multiple images running in one VM aren't worth the effort.

Lou

On Monday, March 26, 2018 at 12:27:37 PM UTC-4, Richard Sargent wrote:
You received this message because you are subscribed to the Google Groups "VA Smalltalk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email]. To post to this group, send email to [hidden email]. Visit this group at https://groups.google.com/group/va-smalltalk. For more options, visit https://groups.google.com/d/optout. |
In reply to this post by Louis LaBrunda
On Friday, March 30, 2018 at 8:03:49 AM UTC-7, Louis LaBrunda wrote:
Now, this is a great question. But tease out the multiple constraints. When there are multiple threads, they all share the same memory spaces. In my opinion, this greatly complicates things. However, the idea of sharing a specific object graph is interesting and exciting.

I suspect it would be relatively easy for VA Smalltalk to be able to access segments in shared memory. Not trivial, but not necessarily all that difficult. There are a number of technical issues related to the life cycle of such a shared memory space and moving or creating the object graph in it. I'm going to wave my hands in the air and claim that those issues aren't all that difficult. (Seth, feel free to rip me a new one if that is wildly incorrect!)

By explicitly segregating the object graph into a shared memory space, you eliminate the myriad challenges that I believe multiple images in a single VM would entail, and you correctly model the problem you wish to solve. Such a solution requires shared memory segments, the ability to gate access to them (typically OS semaphores), the ability for VA Smalltalk to explicitly control creation of such an object graph in that memory space, and the ability for VA Smalltalk to have modifications to the object graph (e.g. new instance creation) be restricted to that address space. In my somewhat uneducated opinion, I don't think these are rocket-science-hard problems.
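The OS-level building blocks Richard lists (a shared memory segment plus a semaphore to gate access) can be sketched directly. A minimal illustration in Python rather than Smalltalk, using the standard `multiprocessing.shared_memory` module; the segment size and payload are invented for the example and stand in for a serialized object graph:

```python
# Sketch only: a named shared memory segment, gated by a semaphore.
# This is NOT a VA Smalltalk API -- just the OS primitives involved.
from multiprocessing import Semaphore, shared_memory

SEGMENT_SIZE = 4096      # illustrative size for the shared object area
gate = Semaphore(1)      # one party mutates the segment at a time

# "Creator" side: allocate the segment and publish the object graph
# (here just raw bytes standing in for serialized objects).
seg = shared_memory.SharedMemory(create=True, size=SEGMENT_SIZE)
payload = b"shared-object-graph"
with gate:
    seg.buf[:len(payload)] = payload

# "Consumer" side: another process would attach by name and read.
reader = shared_memory.SharedMemory(name=seg.name)
with gate:
    data = bytes(reader.buf[:len(payload)])

print(data)

# Life-cycle management -- one of the issues Richard hand-waves over:
# someone must decide when the segment is unlinked.
reader.close()
seg.close()
seg.unlink()
```

In a real system the interesting part is exactly what Richard flags: deciding who owns the segment's life cycle and keeping all mutation of the graph confined to that address space.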
In reply to this post by Louis LaBrunda
On Friday, March 30, 2018 at 8:54:28 AM UTC-7, Louis LaBrunda wrote:
No problem, Lou. I'm quite content to state an opinion and wait to see if it's valid when challenged.
I don't know what my ratio really is, either. I studied computer science at the University of Waterloo, receiving a B.Math degree in the end. There is a fair degree of scientist in that kind of education, although I admit that I am more of an "in practice" kind of person. And that degree was quite a number of years ago!
Absolutely! I'm finding this discussion fascinating.
I am waiting with bated breath to learn that my opinion is wrong. So far, I don't think it is, but I am willing to be proven wrong. (See my response to your other message on this topic.) Let me assure you that I do find thought experiments intriguing and often fun. So, let's experiment with this topic and figure out what the solution space really should look like.
In reply to this post by Richard Sargent
Hi All,
I'm very glad to have Richard in the conversation; see his last two posts. I have two other questions. The first is an implementation question that doesn't have much to do with this thread's topic, and the second is way off topic, but I'm curious.

Does shared memory have to be a separate memory area? I think there are at least two memory areas in VA Smalltalk now, "old" and "new" memory. Would it be better for shared memory to be another area, or would it be better to have a flag in the object header (not sure that's the right name)?

The second question is memory related. I understand the need for garbage collection. I think the end phase is usually to move objects around in memory to, I guess, defragment it. Is this step worth the cost, or could it be done less often? Memory is inexpensive and readily available. Now, I don't have any evidence that moving the objects around is a problem. It seems to me that it would take time to move the objects and fix anything that points to them. I have read that it can be the cause of temporarily poor responsiveness when it happens. Could the defragmentation be put off, or does that make the delay longer when it happens? How much memory is "wasted" when defragmentation is delayed?

Lou

On Saturday, March 31, 2018 at 7:42:25 PM UTC-4, Richard Sargent wrote:
Hi All,
"Does shared memory have to be a separate memory area?"
- If you mean shared between different processes, then yes. That takes special operating system support, since in most flavors of OS, process memory is isolated from other processes.
- For example, ICs can work exactly like this, sharing code between many processes, primarily for lowering the memory footprint. It's interesting to see what an advantage this can become when running in the cloud, based on how consumers are charged (i.e. 0.9cents / unit of memory / month).
- I was watching the IBM J9 folks talking about this very advantage, though they call the IC concept "shared classes" in the Java VM.

"Would it be better for shared memory to be another area or would it be better to have a flag in the object header?"
- Based on the answer above, it would theoretically seem to make sense to have it be a separate area.
- If you marked an object as shared but it was in standard non-shared memory (i.e. what the "new" and "old" logical Smalltalk segments would get mapped onto), then it will be a very lonely kind of sharing :)
- Now let's consider this in terms of an "object graph". What would it mean to have a "shared" collection of "non-shared" objects? What would it mean for an object with 5 instance variables to contain 3 shared objects and 2 non-shared objects? It would seem to lead to nothing good. So using this shared memory approach, if you select one object to be shared, you are really "logically" sharing the whole object graph for which that object is the root. And now you have to solve that transitive object sharing issue, which demands more complexity like proxies and other such things.
- I imagine we are now more into Richard's area of expertise and some of the issues that GemStone set out to solve.

"I think the end phase is usually to move objects around in memory to, I guess, defragment it. Is this step worth the cost or could it be done less often?"
- Compaction. Yes, it works very similarly to "disk defrag".
- Compaction only runs after a global garbage collection, not after quick scavenges (which compact implicitly).
- Yes, it can be an expensive step, and yes, there are "compaction avoidance" techniques.
- With compaction, there are no holes in the heap. You can very quickly allocate objects by incrementing a bump pointer. This is both simple and fast.
- Without compaction, you are left with lots of holes in the heap. You will need a new object allocator that can "very quickly" find an appropriate hole given the size of the object you requested, and more complexity to maintain that bookkeeping.
- Without compaction, you will begin to lose cache locality benefits over time and can experience more CPU cache misses. For example, objects accessing related objects or their instance variables may need to hop across memory, as opposed to a quick access that would have been in one of the CPU caches.
- It's hard to answer the question "is it worth it?". I've always found GCs hard to benchmark, because the "cost" is not just the number of instructions. There is CPU caching behavior to consider, and many other factors like your specific application profile. It's kind of like asking if some memory management change in the kernel of the operating system VAST is running on is worth it. *Shrug*... it depends :)
- I often run the GC using tools like valgrind/cachegrind, where I can simulate CPUs with different cache sizes. I find it interesting sometimes when I experiment by changing one line of code in what looks like an obvious performance improvement in the GC. Then I profile it and discover I have caused major perturbations in caching behavior and slowed the whole system down considerably.

"It seems to me that it would take time to move the objects and fix anything that points to them."
- I went over this already, I guess, but the tradeoff is: compaction means spending some time to keep object allocation super fast with nice caching behavior, versus not compacting, which means you need more bookkeeping and good algorithms to efficiently track holes in the heap so that a fragmented allocator can still allocate objects quickly, while being willing to lose some potentially nicer caching characteristics. There is complexity either way. Although, I do like the concept of non-moving GC for things like passing objects out to C or, in my case, debugging the VM, since moving objects makes it harder to track down bugs.

- Seth

On Monday, April 2, 2018 at 9:55:09 AM UTC-4, Louis LaBrunda wrote:
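The bump-pointer vs. hole-tracking tradeoff Seth describes can be made concrete with a toy model. This is a Python sketch, not the actual VAST allocator; the sizes and the first-fit policy are illustrative:

```python
# Toy contrast between allocation in a compacted heap (bump pointer)
# and in a fragmented heap (free list of holes).

class BumpAllocator:
    """Compacted heap: all free space is one contiguous run at the top."""
    def __init__(self, size):
        self.top = 0
        self.size = size

    def allocate(self, nbytes):
        if self.top + nbytes > self.size:
            return None           # a real VM would trigger a GC here
        addr = self.top
        self.top += nbytes        # the whole allocator is one addition
        return addr

class FreeListAllocator:
    """Uncompacted heap: free space is a list of holes left by dead objects."""
    def __init__(self, holes):
        self.holes = list(holes)  # (address, length) pairs after a sweep

    def allocate(self, nbytes):
        # First-fit search. Real collectors use segregated free lists or
        # trees to keep this fast -- the extra bookkeeping Seth mentions.
        for i, (addr, length) in enumerate(self.holes):
            if length >= nbytes:
                if length == nbytes:
                    del self.holes[i]
                else:
                    self.holes[i] = (addr + nbytes, length - nbytes)
                return addr
        return None               # fragmented: no single hole big enough

bump = BumpAllocator(1024)
a1 = bump.allocate(100)   # -> 0
a2 = bump.allocate(50)    # -> 100, adjacent: nice cache locality

fl = FreeListAllocator([(0, 64), (200, 32), (512, 256)])
b1 = fl.allocate(100)     # skips two too-small holes, lands at 512
```

Note how the bump allocator also gives the locality benefit Seth mentions for free: consecutive allocations are adjacent in memory, while the free-list version scatters them wherever a hole happens to fit.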
Hi Seth,
Thank you very much for this wonderfully thorough answer. Everything you say makes sense. Some of it I had guessed, but I didn't want to say too much so as not to influence anyone's answers.

Would it make any sense, at all, to divide the area of memory that gets compacted into two sections, so that one could be getting compacted while the other was used to allocate new objects in the quick manner you described?

Lou

On Monday, April 2, 2018 at 12:06:07 PM UTC-4, Seth Berman wrote:
Hi Lou,
You could have certain "spaces" that you don't compact. This is typically for large objects (i.e. an 8MB ByteArray) that you don't want to move around all the time, because that would be expensive. So these are separated out and managed differently. There have been many GC papers written on this, and it exists in many virtual machines, often referred to as the Large Object Area, Large Object Space, or Large Object Heap. A quick Google search shows implementations in .NET, IBM J9, the V8 JavaScript engine... and many others, I'm sure. Here is a sample article on .NET's Large Object Heap that describes all the stuff we have been talking about concerning compaction and fragmentation.

- Seth

On Monday, April 2, 2018 at 1:46:31 PM UTC-4, Louis LaBrunda wrote:
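The routing Seth describes amounts to a size check at allocation time: big allocations go to a separate, non-moving space so compaction never has to copy them. A toy Python sketch; the 4KB threshold here is invented (each VM tunes its own cutoff, e.g. .NET uses 85,000 bytes):

```python
# Toy model of a Large Object Area: allocations at or above a threshold
# land in a space the compactor never moves.

LARGE_OBJECT_THRESHOLD = 4096   # illustrative cutoff, not VAST's

small_space = []   # compacted/moved by the GC
large_space = []   # never moved; reclaimed in place

def allocate(nbytes):
    """Route an allocation of `nbytes` to the appropriate space."""
    target = large_space if nbytes >= LARGE_OBJECT_THRESHOLD else small_space
    target.append(nbytes)
    return target

allocate(64)               # a small object: goes to the movable space
allocate(8 * 1024 * 1024)  # Seth's 8MB ByteArray: goes to the large space
```

The cost of this split is the fragmentation problem reappearing inside the large space, which is exactly what the .NET article Seth mentions goes on to discuss.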
Hi Seth,
Thanks for the link, I read it with interest. This statement struck me as being, if not crazy/stupid, at least odd:

"Now that we know about the pitfalls of large object heap fragmentation, let's take a quick tour of the best practices that we can adopt to avoid it. A recommended strategy is to identify the large objects in your application and then split them into smaller objects – perhaps using some wrapper class. You can also redesign your application to ensure that you avoid using large objects. Another approach is to recycle the application pool periodically."

Why would anyone think it is a good idea to write a program in a way that tries to take advantage of the hidden inner workings of the underlying system? A system that could change at any time, possibly resulting in the opposite of the intended effect.

So, not defragging the big stuff makes sense. When looking at my running VA Smalltalk systems, I see that they can grow and shrink in their memory usage. Does this mean that the memory areas VA keeps, like "new", are not contiguous? If so, that takes me back to my question about there being different areas of memory that get defragged separately, allowing new objects to be created in one while another is being defragged.

Lou

On Monday, April 2, 2018 at 2:24:59 PM UTC-4, Seth Berman wrote:
Hi guys,

Sorry for joining the party late, but this is an area I really like, so better late than never :) While it's not related to the "multi-core" point of view or to "running multiple images with the same running VM", my PhD did touch a few of the topics discussed in this thread. In addition, you can also see all the papers I refer to in the thesis, and you will likely find several that are very interesting! Of course, my project was just a prototype and not something mature enough or production ready, but still one more thought in this area.

So... for my PhD [1] [2], one of the things I did (let's say the closest to the PhD topic) was called Marea [3] (I can also send the PhD defense video on YouTube if you want). In Marea, what I did was modify the Pharo VM to implement a very basic (with lots of limitations) object usage tracking: I simply flagged when an object was used and cleared the flag every once in a while. Then, at image side, I would find graphs of unused objects and replace the boundary objects with proxies, using Ghost proxies (note that an unused object is not the same as an unreferenced object... the GC does nothing here). The graphs were then serialized with Fuel. Finally, if those graphs happened to be needed, the proxy would intercept the message, materialize the graph from Fuel, and plug the original graph back in.

Anyway... all of that was to say that at some point I did an experiment. I was already able to proxify and serialize classes, methods, etc. So I took the whole image and swapped out all classes and their instances (each class with its instances in a different graph), leaving only a really small core. This image was a Seaside image running the DBX Pier website. So I swapped out everything, and then I lazily started to navigate the DBX website, causing the needed graphs to be swapped in. I was able to navigate the whole webapp and use it perfectly. I even saved the image. And the point of all this: such an image was 3MB.

Below I paste the abstract of the Marea paper, which explains better what it is:

----
During the execution of object-oriented applications, several millions of objects are created, used and then collected if they are not referenced. Problems appear when objects are unused but cannot be garbage-collected because they are still referenced from other objects. This is an issue because those objects waste primary memory and applications use more primary memory than they actually need. We claim that relying on the operating system's (OS) virtual memory is not always enough since it cannot take into account the domain and structure of applications. At the same time, applications have no easy way to parametrize nor cooperate with memory management. In this paper, we present Marea, an efficient application-level object graph swapper for object-oriented programming languages. Its main goal is to offer the programmer a novel solution to handle application-level memory. Developers can instruct our system to release primary memory by swapping out unused yet referenced objects to secondary memory. Our approach has been qualitatively and quantitatively validated. Our experiments and benchmarks on real-world applications show that Marea can reduce the memory footprint between 23% and 36%.
----

[1] https://www.slideshare.net/
[2] http://rmod.lille.inria.fr
[3] http://rmod.lille.inria.fr

Cheers,

On Tue, Apr 3, 2018 at 11:06 AM, Louis LaBrunda <[hidden email]> wrote:
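The swap-out/swap-in cycle Mariano describes can be sketched in Python rather than Pharo, with `pickle` standing in for Fuel and a `__getattr__`-based proxy standing in for a Ghost proxy. All class names here are mine, invented for the illustration:

```python
# Sketch of Marea-style lazy swap-in: a proxy holds the serialized
# graph; the first "message send" materializes it and forwards.
import pickle

class GraphProxy:
    """Stands in for a Ghost proxy guarding a swapped-out object graph."""
    def __init__(self, serialized):
        self._serialized = serialized  # Fuel bytes in Marea; pickle here
        self._target = None            # filled in on first use

    def __getattr__(self, name):
        # Python calls __getattr__ only for attributes not found by
        # normal lookup -- i.e. the first message to the swapped graph.
        if self._target is None:
            self._target = pickle.loads(self._serialized)  # swap in
        return getattr(self._target, name)                 # forward

class Customer:
    """A stand-in for some boundary object in the swapped-out graph."""
    def __init__(self, name):
        self.name = name
    def greeting(self):
        return "Hello, " + self.name

# Swap out: serialize the graph and keep only the lightweight proxy.
swapped = GraphProxy(pickle.dumps(Customer("Lou")))

# Swap in happens lazily, on the first use.
print(swapped.greeting())
```

As Mariano notes, the hard part his VM changes addressed is deciding *which* referenced-but-unused graphs to swap out in the first place; the proxy mechanics above are the easy half.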
Hi All,

This is very cool research!

@Lou I would agree that the strategies in that article are on the extreme end. I suppose if you had a deployed production system that was suffering from extreme fragmentation, and these tips fixed the issue, I could see making use of them... at least temporarily. I don't think there is anything wrong with using GC tuning parameters to optimize for your application's allocation/reclamation profile, though.

"Does this mean that the memory areas VA keeps, like 'new', are not contiguous? If so, that takes me back to my question about there being different areas of memory that get defragged separately, allowing new objects to be created in one while another is being defragged."
- Hmm. Not sure I follow completely.
- The logical "New" space is comprised of many segments: (EsMemorySegment allSegments select: [:s | s isNewSpace])
- Two of them are special: those are the semi-space halves that the Scavenger operates on. One is inactive while the other is active; a scavenge copies the live objects from active to inactive, and then they switch roles.
- The other segments that are "New" have various purposes. For example, Swapper might create temporary new spaces if it's a small enough segment size. Once done, an attempt is made to merge these back into one of the semi-spaces so you don't tenure temporary stuff.
- I don't understand "defragged separately". Are you making the case for a concurrent GC that allocates while it compacts? If so, then a lot of thread coordination is required to make this happen... this is like a different garbage collector.

It sounds like you are intuitively able to ask appropriate questions regarding this topic. I would highly recommend reading "The Garbage Collection Handbook: The Art of Automatic Memory Management" (get the 2012 one, not the 1996 one). It walks through most of the topics we have touched on relating to GC in thorough detail, and describes them far better than I could. After that, if you want to explore some more specific areas, I have a ton of research papers I can recommend.

- Seth

On Tuesday, April 3, 2018 at 10:40:07 AM UTC-4, marianopeck wrote:
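The semi-space scavenge Seth outlines (copy live objects from the active half to the inactive half, then swap roles) can be sketched with a toy heap. A Python illustration, with dicts standing in for objects and an invented object graph; a real scavenger works on raw memory and forwarding pointers, not Python ids:

```python
# Toy copying scavenge: everything reachable from the roots is copied
# into a fresh "inactive" half; unreachable objects are simply never
# copied -- that omission IS the collection.

def scavenge(roots):
    inactive = []    # the half we are copying into
    forwarded = {}   # id(old object) -> its copy ("forwarding pointers")

    def copy(obj):
        if id(obj) not in forwarded:
            new = {"name": obj["name"], "refs": []}
            forwarded[id(obj)] = new
            inactive.append(new)
            # Rewrite references so they point at the forwarded copies,
            # not back into the old half.
            new["refs"] = [copy(r) for r in obj["refs"]]
        return forwarded[id(obj)]

    new_roots = [copy(r) for r in roots]
    return inactive, new_roots   # `inactive` now becomes the active half

live = {"name": "a", "refs": []}
dead = {"name": "garbage", "refs": []}   # allocated, but unreachable
root = {"name": "root", "refs": [live]}

new_half, new_roots = scavenge([root])
names = sorted(o["name"] for o in new_half)   # "garbage" is not copied
```

Copying into a fresh half also explains the scavenger's "implicit compaction" Seth mentioned earlier: the survivors end up packed together, so allocation stays a bump pointer.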
Hi Mariano and Seth,
On Tuesday, April 3, 2018 at 12:58:16 PM UTC-4, Seth Berman wrote:
+1. I have seen your posts in one (or more) of the Squeak forums but hadn't realized you had joined Instantiations. Glad to have you there.
I agree, I was going to say as much but must have gotten distracted.
I agree.
I am thinking of a GC that can allocate while it compacts, but NOT with two threads, just two memory areas. I'm wondering whether everything stops while the compaction takes place, meaning the compaction has to finish before any new objects can be created, or whether a little compaction can get done, then some new objects get created (in the other area), and then back to compacting.
I should. But as I look to my right there is a 4 inch high stack of Scientific American magazines and others, and a one inch thick book that I haven't gotten to yet. I have always been a slow reader. I had gotten to a point where I was reading a lot (still slow but not bad), but now my father is 102 and wakes me up at night. Last night it was at least twice. Once I checked on him and he was lying in bed, I think asleep, calling my name. Not in any kind of panic or anything, just calling Louie, Louie. It leaves me very tired much of the day, and if I try to read much, I just fall asleep. So, I apologize to everyone for taking my own post off topic and will try to not let the discussion get too far afield in the future. I hope I am not taking up too much of people's valuable time.

Lou
Hi Lou,
No worries, I didn't think anything was off topic.

Concerning the GC, this is a well-studied field with a huge body of knowledge that dates back to Lisp systems in the early 60s. So the book I recommended is good for foundational knowledge, as well as for classifying the different types of algorithms involved and how they are related. For example, what you described in your last post is, I would think, a form of incremental compaction, which has been classified and studied, and I've seen a few implementations described in research papers. So good for you for arriving at it through intuition. That's why I thought you might like the book :)

- Seth

On Tuesday, April 3, 2018 at 5:13:38 PM UTC-4, Louis LaBrunda wrote:
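The scheme Lou described and Seth names incremental compaction can be modeled with a generator: slide a few live objects per step, yielding at points where the mutator could run and allocate. A toy Python sketch only; real incremental compactors need read/write barriers and careful pointer fix-up to stay correct while the mutator runs:

```python
# Toy incremental compactor: slide live objects left, a few per step,
# pausing (yield) between steps instead of one long stop-the-world pause.

def incremental_compactor(heap, step=2):
    """Compact `heap` in place; None entries are holes (dead objects)."""
    write = 0
    moved = 0
    for read in range(len(heap)):
        if heap[read] is not None:        # a live object
            heap[write] = heap[read]
            if write != read:
                heap[read] = None         # old slot becomes a hole
            write += 1
            moved += 1
            if moved % step == 0:
                yield                     # pause point: mutator may run here
    del heap[write:]                      # trim the now-free tail

heap = ["a", None, "b", None, "c", "d", None]
pauses = 0
for _ in incremental_compactor(heap):
    pauses += 1   # allocation in "the other area" would interleave here
```

This is exactly the tradeoff Seth's book recommendation covers: shorter individual pauses in exchange for the coordination machinery that keeps the mutator from seeing half-moved objects.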
Hi Louis,

I just wanted to add a small comment on your sentence "Now, I don't have any evidence that moving the objects around is a problem."

On Wed, Apr 4, 2018 at 10:41 AM, Seth Berman <[hidden email]> wrote:
In reply to this post by Seth Berman
Hi Seth,
I think I probably would like the book, maybe when I catch up on my reading.

Lou

On Wednesday, April 4, 2018 at 9:41:32 AM UTC-4, Seth Berman wrote:
In reply to this post by Mariano Martinez Peck-2
Hi Mariano,
I know about FFI being a problem and pinning being the solution. I was just trying to leave the door open for potential problems in regular Smalltalk. I don't think there should be any, and I don't expect that any regular object would get hold of a pointer that could change. I do expect that collection classes have pointers to objects that get moved, and that the GC must fix them. I expect that is part of why using multiple threads is hard.

Lou

On Wednesday, April 4, 2018 at 9:50:44 AM UTC-4, marianopeck wrote: