VM crash with message 'could not grow remembered set'


VM crash with message 'could not grow remembered set'

Phil B
 
Is this effectively an out-of-memory error, or am I hitting some other internal VM limit? (I.e., can the limit be increased, or is it a hard limit?) I'm running into this when using the reference finder tool in a Cuis image. (It's a moderately large image at ~500 MB.)

Re: VM crash with message 'could not grow remembered set'

Clément Bera-4
 
Hi,

This is a different limit from the out-of-memory error: there are too many references from old objects to young objects. The limit cannot be changed directly from the image. However, if you are using Spur, you can try changing the young space size, which also changes the remembered table size and might fix your problem. To do so, evaluate:
Smalltalk vm parameterAt: 45 put: (Smalltalk vm parameterAt: 44) * 4.
and then restart the image.
Check here, section TUNING NEW SPACE SIZE, for more info about that.
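
A slightly expanded sketch of the same tuning, meant for a workspace. It assumes a Spur VM where parameter 44 reports the current eden size in bytes and parameter 45 holds the desired eden size; the factor of 4 is only the example value from above, and the final save-and-quit is an assumption about how to make the new value survive a restart. The thread also uses the spelling Smalltalk vmParameterAt:put:, so which accessor applies depends on the dialect.

```smalltalk
| currentEden desiredEden |
"Parameter 44: current eden size in bytes (assumed to be read-only)."
currentEden := Smalltalk vm parameterAt: 44.
"Ask for an eden four times larger; 4 is just the example factor."
desiredEden := currentEden * 4.
"Parameter 45: desired eden size, picked up when the image next starts."
Smalltalk vm parameterAt: 45 put: desiredEden.
"Assumption: the setting is stored with the image, so save before restarting."
Smalltalk snapshot: true andQuit: true.
```

Reading parameter 44 again after the restart is a quick way to check whether the new size was actually applied.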

If you are using Spur, could you send us the image? (If it is 500 MB, could you put it up for download on Dropbox or something like that?) That way we can reproduce the problem and see what is possible. Normally in Spur, if the remembered table grows too big, a tenure happens to shrink it, so that error should not happen. Eliot is currently moving to another place, so he might be busy. If he is available to answer, I guess he will have a look; if he is not, I can have a look today or Thursday. However, I am not interested in fixing pre-Spur VMs.

Regards,





Re: VM crash with message 'could not grow remembered set'

Eliot Miranda-2
 
Hi Both,


I should be able to look at this next week. It is an important bug that I want to look at ASAP.




Re: VM crash with message 'could not grow remembered set'

Phil B
In reply to this post by Clément Bera-4
 
Clément,

Thanks for the info.  This is a Spur image.  Unfortunately it has some sensitive information so I'll have to see if I can reproduce the issue in one I can share.

On a related note, I seem to be running into more of these kinds of VM issues as I attempt to scale up my image sizes (I can only imagine the fun I'll be having with multi-GB images). I think it would be helpful if the VM had the ability (even if it requires some sort of debug build) to raise an exception in the image when a fixed resource exceeds X% of its maximum value. Has a capability along those lines been considered?

Thanks,
Phil


Re: VM crash with message 'could not grow remembered set'

Clément Bera-4
 
Hi,

Without a way to reproduce, it is difficult to deal with the problem.

I also had reports from the Pharo community about problems when scaling up, but it seems most of that noise is gone now that Spur has a new compactor. There are still some issues, such as GC pauses, that we are trying to deal with. I wrote this post here to help people dealing with larger images (a couple of GB). There are things that you can change from the image, through the VM parameters, that are recommended for larger images. For example, in Java, for a heap of a couple of GB the VM automatically scales the young space up to 200 MB; in our case the default is 4 MB and you need to use the VM parameters to change it.

The thing with an image-side exception is that raising it executes additional code and allocates new objects. For that to work, we first need to deal with the problem itself. For example, we could do a scavenge to try to decrease the number of RT (remembered table) entries, but maybe the overflow happened in a specific execution state where a scavenge is not possible. The command-line error messages don't have this kind of problem. We could try to do something along those lines, but it is not that simple.

Regards,


--
Clément Béra
Pharo consortium engineer
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq

Re: VM crash with message 'could not grow remembered set'

Phil B
 
Clément,

On Oct 11, 2017 4:09 AM, "Clément Bera" <[hidden email]> wrote:

> Hi,
>
> Without a way to reproduce, it is difficult to deal with the problem.

I managed to get a reproducible example... Will post details shortly.

> I had reports also from the Pharo community with some problems when scaling up, but it seems most of that noise is gone since Spur has a new compactor. There are still some issues, such as GC pauses, that we are trying to deal with. I wrote this post here to help people dealing with larger images (couple Gbs). There are things that you can change from the image, with the vm parameters, that are recommended for larger images. For example in Java for a couple Gb heap the VM scales up automatically young space size to 200Mb, in our case the default is set to 4Mb and you need to use vm parameters to set it up.

Funny you mention that post as it might have some bearing on the issue 😀

> The thing with an image-side exception is that it will execute additional code and allocate new objects. To do that, we need first to deal with the problem. For example we could do a scavenge to try to decrease the number of RT entries, but maybe the overflow happened in a specific execution state where a scavenge is not possible. The command line error messages don't have this kind of problems. We could try to do something along those lines but it is not that simple.

What I had in mind was something along the lines of an optional check *before* hitting an absolute resource limit that would raise an exception.

For example, let's say we had:
Smalltalk vmParameterAt: X put: 95. "A VM parameter which doesn't currently exist, accepting a small integer in the range 0-99, where 0 means don't warn and any other value is the warning threshold as a percentage. Just to keep things simple, use a single setting as the warning threshold for all scarce resources that we know the upper bounds for."

So at some point you run some code that crosses the threshold for some scarce/fixed resource... maybe it's stack pages, maybe semaphores... whatever. Then the VM could raise a VMThresholdExceeded exception (or whatever it made sense to call it) in the process that triggered it, with a message string indicating which limit got hit. This would most likely need to be a resettable one-shot trigger to be useful (i.e., to ensure that it doesn't trigger a cascade of exceptions). That would be a much nicer troubleshooting starting point than a stack trace at the command line.
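
To make the one-shot idea concrete, here is a small image-side simulation for a workspace. It is only a sketch: no such VM parameter exists, the standard Warning class stands in for the proposed VMThresholdExceeded exception, and limit, used and check are made-up names that model some fixed resource being consumed.

```smalltalk
"Image-side simulation only; the VM offers no such threshold today."
| limit thresholdPercent armed used check log |
limit := 1000.            "hypothetical hard limit of some fixed resource"
thresholdPercent := 95.   "warn at 95% of the limit; 0 would mean never warn"
armed := true.            "one-shot flag, cleared when the warning fires"
used := 0.
log := OrderedCollection new.
check := [
	(armed and: [ thresholdPercent > 0 and: [ used * 100 >= (limit * thresholdPercent) ] ])
		ifTrue: [
			armed := false.   "disarm first so later checks stay silent"
			Warning signal: 'resource at ', (used * 100 // limit) printString, '% of its limit' ] ].
"Consume the resource step by step, checking after each step."
[ 1 to: limit do: [ :i | used := i. check value ] ]
	on: Warning
	do: [ :w | log add: w messageText. w resume: nil ].
log size   "1: the warning fired once, when used reached 950"
```

Setting armed back to true is the "resettable" part: after handling the warning, the image could re-arm the trigger once usage has dropped again.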


Re: VM crash with message 'could not grow remembered set'

Phil B
In reply to this post by Clément Bera-4
 
Clément,

On Oct 11, 2017 4:09 AM, "Clément Bera" <[hidden email]> wrote:

> Hi,
>
> Without a way to reproduce, it is difficult to deal with the problem.

Hopefully, this will allow you to do so: https://github.com/pbella/VmIssueCouldNotGrow

This turned out to be tricky to provide a repro case for, since I'm not sure exactly what is triggering it. So I reproduced the type of work I'm throwing at the VM (it's a bulk parser/loader): lots of continuous allocation, with the occasional saving of a result, which generates lots of garbage. This should run in 5-10 minutes depending on the speed of your system.

The main caveat is that I'm only able to get this example to reproduce reliably with the included VM and with the commented VM parameters applied. So I'm not sure if this is an issue only with this particular VM/parameter combination or if it's just generally a difficult-to-reproduce issue.

Thanks,
Phil

Re: VM crash with message 'could not grow remembered set'

Stephan Eggermont-3
In reply to this post by Clément Bera-4
 


On 11 Oct 2017, at 10:08, Clément Bera <[hidden email]> wrote:
>
> Hi,
>
> Without a way to reproduce, it is difficult to deal with the problem.
>
> I had reports also from the Pharo community with some problems when scaling up, but it seems most of that noise is gone since Spur has a new compactor.

Well, I just stopped trying to scale up.

Stephan

Re: VM crash with message 'could not grow remembered set'

Clément Bera-4
 


On Fri, Oct 13, 2017 at 12:43 AM, Stephan Eggermont <[hidden email]> wrote:

> On 11 Oct 2017, at 10:08, Clément Bera <[hidden email]> wrote:
> >
> > Hi,
> >
> > Without a way to reproduce, it is difficult to deal with the problem.
> >
> > I had reports also from the Pharo community with some problems when scaling up, but it seems most of that noise is gone since Spur has a new compactor.
>
> Well, I just stopped trying to scale up.

We're tracking down, one by one, each problem that prevents scaling up efficiently, so it is becoming more feasible every day. Eliot wrote the new compactor. Right now I am working with Sophie on loading large images. In the short term the GC will be incremental, and I plan to try to improve mmap segment allocation. So try again next year.




Re: VM crash with message 'could not grow remembered set'

Clément Bera-4
In reply to this post by Phil B
 



Ok.

Today I am very busy.

I will try to have a look tomorrow; otherwise Eliot said he could have a look next week. A 5-10 minute run means that if I want to run it in the simulator, I will most likely need to start the simulation tonight and debug tomorrow morning.
 


Re: VM crash with message 'could not grow remembered set'

Ryan Macnak
In reply to this post by Clément Bera-4
 
Why is filling the remembered set fatal? It's possible to empty the remembered set by promoting everything in new space.


Re: VM crash with message 'could not grow remembered set'

Clément Bera-4
 


On Sun, Oct 15, 2017 at 8:35 AM, Ryan Macnak <[hidden email]> wrote:

> Why is filling the remembered set fatal? It's possible to empty the remembered set by promoting everything in new space.

Usually we do a tenure to shrink the remembered table. Normally it should not be fatal. It's a bug.
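
For readers new to the term, here is a toy, image-level model of the idea; it is not VM code. Sets stand in for new space, old space and the remembered table, the store block plays the role of a write barrier, and every name in it is made up for illustration.

```smalltalk
"Toy model: an old object is remembered when it is made to point at a young one."
| young old remembered store tenureAll holders sizeBefore |
young := IdentitySet new.
old := IdentitySet new.
remembered := IdentitySet new.
"Write barrier: record old holders that receive a reference to a young object."
store := [ :holder :value |
	((old includes: holder) and: [ young includes: value ])
		ifTrue: [ remembered add: holder ] ].
"Promoting everything in new space leaves no old-to-young references to track."
tenureAll := [
	old addAll: young.
	young := IdentitySet new.
	remembered := IdentitySet new ].
holders := (1 to: 1000) collect: [ :i | Object new ].
old addAll: holders.
holders do: [ :h | | fresh |
	fresh := Object new.
	young add: fresh.
	store value: h value: fresh ].
sizeBefore := remembered size.
tenureAll value.
{ sizeBefore. remembered size }   "#(1000 0)"
```

The model ignores everything that makes the real case hard, in particular that tenuring itself needs free old space.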
 

Re: VM crash with message 'could not grow remembered set'

Phil B
In reply to this post by Clément Bera-4
 
Clément,

I was curious as to whether you or Eliot were able to get anything useful from this or not.

Thanks,
Phil



Re: VM crash with message 'could not grow remembered set'

Clément Bera-4
 
Hi,

Thanks for your example; I could reproduce the crash.

It seems that when the VM tries to grow the remembered set while there is not enough free space in old space to grow it, it raises that error.

In your case, the remembered set grows in the middle of a GC, and during GC there is not enough free space in old space to allocate a larger remembered set. The full GC includes a scavenge; the scavenge tenures objects, which makes the remembered table grow, and since old space has not been reclaimed yet (that happens later in the full GC), there is not enough free space for it. I don't think we can do a scavenge to shrink the remembered table at this point (we're already in the middle of a scavenge, which is part of the full GC). Hence I think the best bet is to allocate a new old space memory segment; even though that operation can fail, it's still better than crashing. There are other solutions I can think of, but I don't like any of them.

In SpurGenerationScavenger>>growRememberedSet, we have:

...
newObj := manager allocatePinnedSlots: numSlots * 2.
newObj ifNil:
    [newObj := manager allocatePinnedSlots: numSlots + 1024.
     newObj ifNil:
        [self error: 'could not grow remembered set']].
...

If I replace:

self error: 'could not grow remembered set'

by:

(manager growOldSpaceByAtLeast: numSlots + 1024) ifNil: [self error: 'could not grow remembered set'].
newObj := manager allocatePinnedSlots: numSlots + 1024. "cannot fail"

Then your example works (in 5min45sec on my machine).
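
The order of attempts described above can be modeled in a workspace. This is a toy sketch, not VM code: freeSlots, the allocate and growOldSpace blocks and all the numbers are invented for illustration; only the sequence of attempts and the error string come from the method quoted above.

```smalltalk
"Toy model: freeSlots stands in for the free old space available mid-GC."
| freeSlots numSlots allocate growOldSpace newObj |
freeSlots := 100.    "made-up amount of free old space"
numSlots := 4096.    "made-up current remembered set capacity"
"Answer a new array if it fits in the free space, nil otherwise."
allocate := [ :n |
	n <= freeSlots
		ifTrue: [ freeSlots := freeSlots - n. Array new: n ]
		ifFalse: [ nil ] ].
"Models adding a segment; here it always succeeds, the real operation may answer nil."
growOldSpace := [ :n | freeSlots := freeSlots + n. n ].
newObj := allocate value: numSlots * 2.
newObj ifNil: [
	newObj := allocate value: numSlots + 1024.
	newObj ifNil: [
		(growOldSpace value: numSlots + 1024)
			ifNil: [ self error: 'could not grow remembered set' ].
		newObj := allocate value: numSlots + 1024 ] ].
newObj size   "5120"
```

In this model both direct allocations fail, the grow step adds room, and the retry succeeds, which mirrors why the error is reached only when the grow itself fails.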

I would like to have Eliot's opinion before integrating this, as I am not sure whether growing old space in the middle of a scavenge performed during a full GC is a good idea; there might be some strange, uncommon interactions with the rest of the GC logic that I don't see right now.

Eliot, what do you think?

