2013/3/12 Eliot Miranda <[hidden email]>:
> On Sun, Mar 10, 2013 at 12:10 PM, Nicolas Cellier
> <[hidden email]> wrote:
>> OK, see the VM thread, I now think that problems does not come from
>> COG, but from ClassBuilder which in some cases fail to clean a cache
>> (primitive 116).
>> The problem does not show up in interpreter VM thanks to primitive 119
>> (this primitives does not unlink send in cogit).
>
> it does unlink sends, but only for that selector. But is it really
> the case that it is a missing cache flush or is it a bug in Cog with
> its cache flushing? I realised the way to test this is to try the
> Stack VM and see if it crashes or not. I just tried that but now
> neither Cog nor the Stack VM crash although both fail the load with an
> MNU of #do: to UndefinedObject in Environment>>bindingOf:ifAbsent:.
> So how do I get the system back to a state where I can reproduce the
> Cog crash to compare the Stack and Cog VMs with each other?
>
> (Apologies for being unresponsive; I've just moved into a new
> apartment and only got my internet connection yesterday afternoon; at
> least its fast (for the states) :) ).
>

Well, primitive 119 does indeed seem to clean the cache.
I was confused because there are two primitive 119 implementations:
primitiveFlushCacheSelective (Interpreter)
primitiveFlushCacheBySelector (StackInterpreter)
It's really a drag to carry all that dead code when you want to analyze quickly :(

So my first correction (avoid using MethodDictionary new in ClassBuilder) was probably useless.
What happened is that while recompiling all the new Parser methods, the old Parser compiled methods were still in use, and thus re-added to the cache.
So my second attempt (clean the cache again just before mutation in ClassBuilder) did the trick.

As for going back in the update process: taking an updated trunk, browsing the update configurations, and loading them in backward order seems to work.
Or the other way around: from an older 4.4 image, apply all updates up to nice.221; I think Bert posted a script to automate that.

Nicolas

>> I have attempted a ClassBuilder fix and posted new updates from
>> nice-222 to cwp-227.
>>
>> Can I please ask our testers contribution once again?
>>
>> Nicolas
>>
>> 2013/3/8 Nicolas Cellier <[hidden email]>:
>>> 2013/3/8 Bert Freudenberg <[hidden email]>:
>>>>
>>>> On 2013-03-08, at 10:55, Frank Shearar <[hidden email]> wrote:
>>>>
>>>>> On 7 March 2013 23:25, Frank Shearar <[hidden email]> wrote:
>>>>>> On 7 March 2013 23:11, Bert Freudenberg <[hidden email]> wrote:
>>>>>>> On 2013-03-07, at 23:42, Frank Shearar <[hidden email]> wrote:
>>>>>>>
>>>>>>>>>> On 6 March 2013 15:59, Ken G. Brown <[hidden email]> wrote:
>>>>>>>>>>> Running on COG 2397, and after updating fresh Squeak4.4-12327 Release to
>>>>>>>>>>> 12332, updating to Trunk fails at first attempt in the same place, then by
>>>>>>>>>>> abandoning and trying the update again, it apparently completes to 12511.
>>>>>>>>>>>
>>>>>>>>>>> Ken G. Brown
>>>>>>>>>>>
>>>>>>>>>>>> With COG 2678, pretty well the same. First attempt it timed out during
>>>>>>>>>>>> the same update-nice-223, then trying again from what had already been
>>>>>>>>>>>> loaded, got the following during the same update, during compiling
>>>>>>>>>>>> SMLoader-fbs-78 as before:
>>>>>>>>
>>>>>>>> What I find strange about all this is that we take a 4.4-12327 image
>>>>>>>> and whatever the latest Cog is and update it all the way without any
>>>>>>>> problems quite a few times a day on the CI server.
>>>>>>>>
>>>>>>>> frank
>>>>>>>
>>>>>>> Looks like it's an intermittent problem, unfortunately:
>>>>>>>
>>>>>>> I just updated the new all-in-one-cog to latest trunk, no problem. This is a 4.4-12327 image with Cog VM 2697.
>>>>>>>
>>>>>>> I then tried what Ken described: update the fresh image first from the squeak44 stream, then switch to trunk, then update again.
>>>>>>>
>>>>>>> BOOM. Cog crash. Didn't save the log unfortunately.
>>>>>>>
>>>>>>> Tried again. Update, switch to trunk, update again. No crash. What?!
>>>>>>>
>>>>>>> Once more. Update, switch to trunk, update. Crash! See below.
>>>>>>>
>>>>>>> Tried yet again, with switching to trunk immediately in a fresh image. Crashes, too, same place.
>>>>>>>
>>>>>>> So it does crash, just not always. But it's been more than 50% in my case.
>>>>>>
>>>>>> Ah, interesting. The CI jobs, naturally, don't update from squeak44;
>>>>>> they switch to trunk and update just like that. Which I would have
>>>>>> thought would make no difference...
>>>>>
>>>>> Actually, I lie. Here's an example of the CI jobs hitting the same
>>>>> issue: http://build.squeak.org/job/SqueakTrunk/204/console And further
>>>>> if you look at http://build.squeak.org/job/SqueakTrunk/ and choose to
>>>>> see the failing tests you'll see times (say around build #184) where
>>>>> the test failure count is unusually low. And
>>>>> http://build.squeak.org/job/SqueakTrunk/buildTimeTrend shows grey
>>>>> streaks where builds die.
>>>>
>>>> Curious that it still runs the tests at all if the update failed ...
>>>>
>>>> So Cog crashes, but has someone tried to replicate this on an interpreter?
>>>>
>>>> - Bert -
>>>>
>>>
>>> I think that the problem comes from COG which tries to use an obsolete
>>> method sent AFTER the recompilation of Parser which is not the
>>> expected behavior.
>>> I have triggered such kind of strange behavior that does not happen on
>>> an Interpreter VM, see the thread opened by Jeff Gonis '[Vm-dev] Cog
>>> VM Crash on Windows'
>>> For me, it must be related to a cache that is not cleaned-up, I don't know why.
>>>
>>> Nicolas
>>
>
>
>
> --
> best,
> Eliot
>
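The cache behaviour described in the message above (old Parser methods being re-added to the cache while the new ones are compiled, and a second flush just before the mutation fixing it) can be illustrated with a toy model. This is only a sketch under loud assumptions: `ToyClass`, `ToyVM`, `reshape` and `run` are hypothetical names invented for illustration, and nothing here mirrors the actual Squeak VM primitives or the real ClassBuilder code.

```python
# Toy model only: all names are hypothetical and do not mirror
# the Squeak VM or ClassBuilder implementation.

class ToyClass:
    def __init__(self, name, methods):
        self.name = name
        self.methods = dict(methods)  # selector -> "compiled method" (a label here)


class ToyVM:
    def __init__(self):
        self.cache = {}  # (class name, selector) -> cached method

    def send(self, cls, selector):
        key = (cls.name, selector)
        if key not in self.cache:      # cache miss: full lookup, then remember it
            self.cache[key] = cls.methods[selector]
        return self.cache[key]

    def flush(self):
        self.cache.clear()


def reshape(vm, cls, new_methods, flush_before_swap):
    vm.flush()                         # first flush, before recompiling
    # The old method is still in use during recompilation, so sending it
    # puts it straight back into the cache.
    vm.send(cls, "parse")
    if flush_before_swap:
        vm.flush()                     # second flush, just before the mutation
    cls.methods = dict(new_methods)    # the mutation: new methods take over
    return vm.send(cls, "parse")


def run(flush_before_swap):
    vm = ToyVM()
    parser = ToyClass("Parser", {"parse": "old parse"})
    return reshape(vm, parser, {"parse": "new parse"}, flush_before_swap)
```

In this model, `run(False)` still answers the old method after the swap because the stale cache entry survives, while `run(True)` answers the new one. That is the only point the sketch is meant to make; the real caches are keyed and flushed differently.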
In reply to this post by Eliot Miranda-2
On 11 March 2013 23:12, Eliot Miranda <[hidden email]> wrote:
> On Sun, Mar 10, 2013 at 12:10 PM, Nicolas Cellier
> <[hidden email]> wrote:
>> OK, see the VM thread, I now think that problems does not come from
>> COG, but from ClassBuilder which in some cases fail to clean a cache
>> (primitive 116).
>> The problem does not show up in interpreter VM thanks to primitive 119
>> (this primitives does not unlink send in cogit).
>
> it does unlink sends, but only for that selector. But is it really
> the case that it is a missing cache flush or is it a bug in Cog with
> its cache flushing? I realised the way to test this is to try the
> Stack VM and see if it crashes or not. I just tried that but now
> neither Cog nor the Stack VM crash although both fail the load with an
> MNU of #do: to UndefinedObject in Environment>>bindingOf:ifAbsent:.
> So how do I get the system back to a state where I can reproduce the
> Cog crash to compare the Stack and Cog VMs with each other?
>
> (Apologies for being unresponsive; I've just moved into a new
> apartment and only got my internet connection yesterday afternoon; at
> least its fast (for the states) :) ).

I'm reasonably sure that this guy -
http://build.squeak.org/job/SqueakTrunk/208/ - has images that are
pre-latest-Environments code. That's #12519 at any rate.

frank
2013/3/12 Frank Shearar <[hidden email]>:
> I'm reasonably sure that this guy -
> http://build.squeak.org/job/SqueakTrunk/208/ - has images that are
> pre-latest-Environments code. That's #12519 at any rate.
>
> frank

You make me feel like I'm coming directly from the stone age ;)

Nicolas
In reply to this post by Frank Shearar-3
On Mon, Mar 11, 2013 at 4:26 PM, Frank Shearar <[hidden email]> wrote:
> I'm reasonably sure that this guy -
> http://build.squeak.org/job/SqueakTrunk/208/ - has images that are
> pre-latest-Environments code. That's #12519 at any rate.

That's not the problem. I have an image that crashed last week. I
need a way of not applying all the updates. Bert's version worked to
exclude some updates but others are creeping in. Time is limited so I
hoped for a quick fix :(

--
best,
Eliot
On 11 March 2013 23:30, Eliot Miranda <[hidden email]> wrote:
> On Mon, Mar 11, 2013 at 4:26 PM, Frank Shearar <[hidden email]> wrote:
>> I'm reasonably sure that this guy -
>> http://build.squeak.org/job/SqueakTrunk/208/ - has images that are
>> pre-latest-Environments code. That's #12519 at any rate.
>
> That's not the problem. I have an image that crashed last week. I
> need a way of not applying all the updates. Bert's version worked to
> exclude some updates but others are creeping in. Time is limited so I
> hoped for a quick fix :(

Sorry, Eliot. Didn't mean to add to the noise. It's just that we found
a second breakage in the updates (or a second serious bug that recently
appeared in an update, rather), so I assumed your mail was related to
this _new_ thing and not Nicolas' Parser thing.

frank
In reply to this post by Eliot Miranda-2
2013/3/12 Eliot Miranda <[hidden email]>:
> That's not the problem. I have an image that crashed last week. I
> need a way of not applying all the updates. Bert's version worked to
> exclude some updates but others are creeping in. Time is limited so I
> hoped for a quick fix :(

We could craft a special mixture of packages in an update map and put
it in the inbox, for example.
But which packages exactly?
What do you want to test?

Nicolas
On Mon, Mar 11, 2013 at 4:40 PM, Nicolas Cellier <[hidden email]> wrote:
> 2013/3/12 Eliot Miranda <[hidden email]>:
>> On Mon, Mar 11, 2013 at 4:26 PM, Frank Shearar <[hidden email]> wrote:
>>> On 11 March 2013 23:12, Eliot Miranda <[hidden email]> wrote:
>>>> On Sun, Mar 10, 2013 at 12:10 PM, Nicolas Cellier
>>>> <[hidden email]> wrote:
>>>>> OK, see the VM thread, I now think that problems does not come from
>>>>> COG, but from ClassBuilder which in some cases fail to clean a cache
>>>>> (primitive 116).
>>>>> The problem does not show up in interpreter VM thanks to primitive 119
>>>>> (this primitives does not unlink send in cogit).
>>>>
>>>> it does unlink sends, but only for that selector. But is it really
>>>> the case that it is a missing cache flush or is it a bug in Cog with
>>>> its cache flushing? I realised the way to test this is to try the
>>>> Stack VM and see if it crashes or not. I just tried that but now
>>>> neither Cog nor the Stack VM crash although both fail the load with an
>>>> MNU of #do: to UndefinedObject in Environment>>bindingOf:ifAbsent:.
>>>> So how do I get the system back to a state where I can reproduce the
>>>> Cog crash to compare the Stack and Cog VMs with each other?
>>>>
>>>> (Apologies for being unresponsive; I've just moved into a new
>>>> apartment and only got my internet connection yesterday afternoon; at
>>>> least its fast (for the states) :) ).
>>>
>>> I'm reasonably sure that this guy -
>>> http://build.squeak.org/job/SqueakTrunk/208/ - has images that are
>>> pre-latest-Environments code. That's #12519 at any rate.
>>
>> That's not the problem. I have an image that crashed last week. I
>> need a way of not applying all the updates. Bert's version worked to
>> exclude some updates but others are creeping in. Time is limited so I
>> hoped for a quick fix :(
>
> We could craft a special mixture of package in an update map and put
> it in inbox for example.
> But which packages exactly ?
> What do you want to test ?

The state that causes Cog to hard crash by reading off the end of the Parser instance because the old version of Parser>>parse:cue:noPattern:ifFail: on the stack uses the pre-reshape inst var offsets. See Bert's message: Filtering out updates > 222 used to ensure the crash.

On Tue, Feb 26, 2013 at 11:44 AM, Bert Freudenberg <[hidden email]> wrote:
>
> Nicolas
>
>>>
>>> frank
>>>

--
best,
Eliot
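For readers following the thread, the failure mode discussed above (a method compiled against the old class layout still being reachable after the class is reshaped) can be modelled with a small toy in Python. This is only a sketch under stated assumptions: `ToyClass`, `ToyInstance`, `send`, `run`, and the inst var names are hypothetical and are not Squeak or Cog code. It models the stale method-cache path only; a method already active on the stack would use the same frozen offsets without any lookup at all. Python raises `IndexError` where a VM doing an unchecked slot read would touch memory past the end of the object.

```python
# Toy model: compiled methods freeze inst var offsets at compile time,
# and a global method cache can keep handing out the old method after a reshape.

method_cache = {}  # (class name, selector) -> compiled method


class ToyClass:
    def __init__(self, name, inst_vars):
        self.name = name
        self.inst_vars = list(inst_vars)
        self.sources = {}   # selector -> inst var name it reads ("source code")
        self.methods = {}   # selector -> compiled method

    def _compile(self, selector, var_name):
        offset = self.inst_vars.index(var_name)  # offset is fixed here

        def method(instance):
            return instance.slots[offset]        # raw slot access by offset

        self.methods[selector] = method

    def define_reader(self, selector, var_name):
        self.sources[selector] = var_name
        self._compile(selector, var_name)

    def reshape(self, new_inst_vars):
        """Change the layout and recompile every method against it."""
        self.inst_vars = list(new_inst_vars)
        for selector, var_name in self.sources.items():
            self._compile(selector, var_name)


class ToyInstance:
    def __init__(self, cls):
        self.cls = cls
        self.slots = [None] * len(cls.inst_vars)


def send(instance, selector):
    """Message send that consults the cache before the method dictionary."""
    key = (instance.cls.name, selector)
    if key not in method_cache:
        method_cache[key] = instance.cls.methods[selector]
    return method_cache[key](instance)


def run(flush_after_reshape):
    method_cache.clear()
    parser = ToyClass("Parser", ["a", "b", "requestor"])  # hypothetical layout
    parser.define_reader("requestor", "requestor")        # compiled with offset 2

    old = ToyInstance(parser)
    old.slots[2] = "editor"
    send(old, "requestor")             # warms the cache with the old method

    parser.reshape(["requestor"])      # layout shrinks; new method uses offset 0
    if flush_after_reshape:
        method_cache.clear()           # the step the fix makes sure happens

    new = ToyInstance(parser)
    new.slots[0] = "editor"
    try:
        return send(new, "requestor")
    except IndexError:
        return "read past end of instance"


print(run(flush_after_reshape=False))
print(run(flush_after_reshape=True))
```

Without the flush, the cached pre-reshape method indexes slot 2 of a one-slot instance; with the flush, lookup finds the recompiled method and the read succeeds.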