Re: Release image format (was: "Future Directions" welcome workspace)

Frank Shearar-3
On 9 December 2012 00:00, David T. Lewis <[hidden email]> wrote:

> On Sat, Dec 08, 2012 at 10:48:16PM +0000, Frank Shearar wrote:
>> On 8 December 2012 21:56, David T. Lewis <[hidden email]> wrote:
>> > On Sat, Dec 08, 2012 at 09:50:21PM +0000, Frank Shearar wrote:
>> >> On 8 December 2012 21:24, David T. Lewis <[hidden email]> wrote:
>> >> >
>> >> > We may have tripped an intermittent bug, and it might or might not be
>> >> > aggravated by the image having been saved in the 6504 format. The actual
>> >> > failure was in a LargeIntegersPlugin primitive, so I'm not quite sure
>> >> > what to make of it.
>> >>
>> >> Indeed. #55 passed, and so has #56.
>> >>
>> >
>> > I see three successful runs now. It wouldn't hurt to run it a couple more
>> > times and see if it stays healthy. But I need to leave for an hour or two,
>> > so I won't do anything further right now.
>>
>> #57 - 60 have all passed.
>>
>
> Good. Whatever it was, it's not an issue to get in the way of the release.

It just manifested again, for the first time in ages:
http://build.squeak.org/job/SqueakTrunk/383/console

frank


> Dave
>

Re: Release image format (was: "Future Directions" welcome workspace)

Frank Shearar-3
On 16 June 2013 10:13, Frank Shearar <[hidden email]> wrote:

> It just manifested again, for the first time in ages:
> http://build.squeak.org/job/SqueakTrunk/383/console

And here's the crash dump.

Attachment: crash.dmp (34K)

Re: Release image format (was: "Future Directions" welcome workspace)

Nicolas Cellier
Interesting, but the primitive is digitDiv:neg: , not \\\
Why wouldn't the primitive be reported on the Smalltalk stack?

By the way, \\\ is not really useful anymore.
The main difference is that \\\ avoids normalizing the result. Please note that you should not use it with a negative receiver or argument.
Also note that \\ should be faster for SmallIntegers and for LargeIntegers up to 64 bits (we now have a primitive for that).
For LargeIntegers > 64 bits, \\ will be a bit slower because we pass through a primitiveFail before calling the same primitive as \\\, and then normalize the quotient and remainder.
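
A minimal workspace sketch of the difference between the two operators, assuming an image in which Integer>>\\\ is still present. The operand values are arbitrary and were chosen only so that both are positive and wider than 64 bits; the variable names are illustrative and do not come from the image.

```smalltalk
| a n fast plain |
a := (2 raisedTo: 200) + 12345.   "positive large receiver (illustrative value)"
n := (2 raisedTo: 100) + 7.       "positive large argument (illustrative value)"
fast := a \\\ n.                  "DSA helper; its result may not be normalized"
plain := a \\ n.                  "general modulo; its result is normalized"
fast normalize = plain            "print it to compare the two remainders"
```

The explicit #normalize on the \\\ result is the step that \\ would otherwise perform; comparing only after normalizing avoids relying on how an un-normalized LargePositiveInteger behaves under #=.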

Nicolas


Re: Release image format (was: "Future Directions" welcome workspace)

Levente Uzonyi-2
On Sun, 16 Jun 2013, Nicolas Cellier wrote:

> Interesting, but the primitive is digitDiv:neg: , not \\\
> Why wouldn't the primitive be reported on the Smalltalk stack?

I think that's the normal behavior. Why should a primitive call "pollute"
the Smalltalk stack if it doesn't fail? The recently used primitives are
listed separately.

Most recent primitives
digitCompare:
digitDiv:neg:
>=
digitCompare:
digitDiv:neg:
>=
...

>
> By the way, \\\ is not really useful anymore.
> The main difference is that \\\ avoids normalizing the result. Please note that you should not use it with a negative receiver or argument.
> Also note that \\ should be faster for SmallIntegers and for LargeIntegers up to 64 bits (we now have a primitive for that).
> For LargeIntegers > 64 bits, \\ will be a bit slower because we pass through a primitiveFail before calling the same primitive as \\\, and then normalize the quotient and
> remainder.

All senders of #\\\ in my image have your initials. It's a bit suspicious
that Integer >> #reciprocalModulo: doesn't normalize v after the loop, but
it's possible that #- will do it anyway. In the other user - Integer >>
#slidingLeftRightRaisedTo:modulo: - the result is normalized via
#normalize.


Levente


Re: Release image format (was: "Future Directions" welcome workspace)

timrowledge

On 16-06-2013, at 8:08 AM, Levente Uzonyi <[hidden email]> wrote:

> On Sun, 16 Jun 2013, Nicolas Cellier wrote:
>
>> Interesting, but the primitive is digitDiv:neg: , not \\\
>> Why wouldn't the primitive be reported on the Smalltalk stack?

It won't be on the stack since you don't (in general - maybe someone has made some exceptions) go through primitives to some other method; you go into a prim and back out and send another message etc


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Last one out, turn off the computer!




Re: Release image format (was: "Future Directions" welcome workspace)

Colin Putney-3



On Sun, Jun 16, 2013 at 10:29 AM, tim Rowledge <[hidden email]> wrote:
 
> It won't be on the stack since you don't (in general - maybe someone has made some exceptions) go through primitives to some other method; you go into a prim and back out and send another message etc

And the exceptions tend to be… exceptions. Failed primitives, or registration of exception handlers via #on:do:. 

Colin 



Re: Release image format (was: "Future Directions" welcome workspace)

David T. Lewis
In reply to this post by Frank Shearar-3
On Sun, Jun 16, 2013 at 10:45:23AM +0100, Frank Shearar wrote:

> On 16 June 2013 10:13, Frank Shearar <[hidden email]> wrote:
>
> And here's the crash dump.
>

It looks like this is failing intermittently in the SqueakTrunk jobs while
running LargePositiveIntegerTest>>testReciprocalModulo. I don't see any particular
pattern behind it; the crashes just seem to occur occasionally while running
the SqueakTrunk job.

I downloaded a copy of this Squeak4.5 image from build.squeak.org:
  http://build.squeak.org/job/SqueakTrunk/ws/Squeak4.4.image
  http://build.squeak.org/job/SqueakTrunk/ws/Squeak4.4.changes

I am now running the following in a workspace under both Cog and Interpreter
VMs. This runs testReciprocalModulo continuously and allocates memory to move
things around:

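    "Write one line to stdout so progress is visible on the console."
    log := [:s | FileStream stdout nextPutAll: s; lf].
    "Label the VM; if versionLabel raises a Warning, fall back to interpreterClass."
    vm := ['Interpreter VM ', Smalltalk vm versionLabel]
        on: Warning
        do: [:e | Smalltalk vm interpreterClass].
    "Hold on to large strings so the object memory keeps changing between runs."
    wasteSomeSpace := OrderedCollection new.
    1 to: 100000 do: [:i |
        "Drop the accumulated strings once there are more than 3000 of them."
        (wasteSomeSpace size > 3000) ifTrue: [
            wasteSomeSpace := OrderedCollection new].
        wasteSomeSpace add: (String new: 100000).
        log value: vm.
        log value: SmalltalkImage current vmStatisticsReportString.
        log value: 'testReciprocalModulo test run ', i asString.
        (LargePositiveIntegerTest selector: #testReciprocalModulo) run].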

I have run over 50,000 iterations of the loop in the last few hours, running on
both a Cog VM and on an interpreter VM. So far no crashes with either VM.

I'm going to try the same thing with the SqueakTrunk image from the same workspace:
  http://build.squeak.org/job/SqueakTrunk/lastSuccessfulBuild/artifact/target/TrunkImage.image
  http://build.squeak.org/job/SqueakTrunk/lastSuccessfulBuild/artifact/target/TrunkImage.changes

I'll run the same tests on Cog and interpreter VMs, and see if anything happens
with that image.

Dave



Re: Release image format (was: "Future Directions" welcome workspace)

Nicolas Cellier
In reply to this post by Levente Uzonyi-2
You said "if it doesn't fail"; that's precisely the point: we got a crash.dmp, so something presumably failed.
I understand that we should distinguish whether the failure occurred inside the primitive or in the fallback code.
But even if the primitive was called recently, what tells me we were inside it when the VM crashed?
If you are right, then the dump is not informative enough.
And you seem to be right, from what I read in CoInterpreter>>#executeNewMethod, but it's a bit involved.

Concerning \\\, there used to be a noticeable advantage versus \\ for large integers.
The method comment says
    "a modulo method for use in DSA. Be careful if you try to use this elsewhere"
Since reciprocalModulo: and raisedTo:modulo: are typically used in cryptography with rather large ints, it's possible that I tried not to spoil efficiency.
Now that the advantage has been greatly reduced by the introduction of the new primitive and more efficient fallback code, I think that it's time to update.



Re: Release image format (was: "Future Directions" welcome workspace)

David T. Lewis
On Sun, Jun 16, 2013 at 10:30:22PM +0200, Nicolas Cellier wrote:

>
> 2013/6/16 Levente Uzonyi <[hidden email]>
> >
> > I think that's the normal behavior. Why should a primitive call "pollute"
> > the Smalltalk stack if it doesn't fail? The recently used primitives are
> > listed separately.
> >
> You said "if it doesn't fail"; that's precisely the point: we got a
> crash.dmp, so something presumably failed.
> I understand that we should distinguish whether the failure occurred inside
> the primitive or in the fallback code.
> But even if the primitive was called recently, what tells me we were inside it
> when the VM crashed?

The first thing you see in the dump is this:

  Segmentation fault Wed May  8 14:03:37 2013

That is a C pointer dereferencing problem that almost certainly happened
within a primitive. The only other possibility would be if it happened during
execution of the interpreter, but we know that this error is associated with
large integer functions so it presumably is happening while executing a primitive.

> If you are right, then the dump is not informative enough.

The dump cannot really detect why a C pointer was invalid. All we can say
is that a primitive was treating some data as a C pointer, and it was assuming
that it was safe to do so. For some reason, that data did not represent a
valid address, and dereferencing it as a pointer caused a segmentation fault.

> >
> > All senders of #\\\ in my image have your initials. It's a bit suspicious
> > that Integer >> #reciprocalModulo: doesn't normalize v after the loop, but
> > it's possible that #- will do it anyway. In the other user - Integer >>
> > #slidingLeftRightRaisedTo:modulo: - the result is normalized via
> > #normalize.
> >

Given that we have intermittent failures that seem to be associated with
data that a primitive is assuming to be safe to treat as a C pointer, it
might be worth adding #normalize to reciprocalModulo: to see if this makes
the problem go away.
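
A minimal workspace sketch of how such a change could be sanity-checked, assuming only the documented contract of reciprocalModulo: (the answer x satisfies (self * x) \\ n = 1, with a positive receiver smaller than n). The modulus, the range of receivers, and the variable names are illustrative choices and are not taken from the existing test.

```smalltalk
| n failures |
n := (2 raisedTo: 127) - 1.       "a Mersenne prime, so every 0 < a < n has an inverse"
failures := OrderedCollection new.
1 to: 1000 do: [:i |
    | a x |
    a := (2 raisedTo: 100) + i.   "large positive receiver, smaller than n"
    x := a reciprocalModulo: n.
    ((a * x) \\ n = 1) ifFalse: [failures add: a]].
failures isEmpty                  "print it; any receivers collected in failures broke the contract"
```

This only checks that the answers stay correct. It does not by itself show whether an un-normalized intermediate value is what upsets the primitive.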

Dave



Re: Release image format (was: "Future Directions" welcome workspace)

timrowledge
In reply to this post by Colin Putney-3

On 16-06-2013, at 11:53 AM, Colin Putney <[hidden email]> wrote:

> On Sun, Jun 16, 2013 at 10:29 AM, tim Rowledge <[hidden email]> wrote:
>
>> It won't be on the stack since you don't (in general - maybe someone has made some exceptions) go through primitives to some other method; you go into a prim and back out and send another message etc
>
> And the exceptions tend to be… exceptions. Failed primitives, or registration of exception handlers via #on:do:.

Well caught ;-)


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: CLOUT: Call Long-distance On Unused Telephone




Re: Release image format (was: "Future Directions" welcome workspace)

David T. Lewis
In reply to this post by David T. Lewis
On Sun, Jun 16, 2013 at 03:09:17PM -0400, David T. Lewis wrote:

> I have run over 50,000 iterations of the loop in the last few hours, running on
> both a Cog VM and on an interpreter VM. So far no crashes with either VM.
>
> I'm going to try the same thing with the SqueakTrunk image from the same workspace:
>   http://build.squeak.org/job/SqueakTrunk/lastSuccessfulBuild/artifact/target/TrunkImage.image
>   http://build.squeak.org/job/SqueakTrunk/lastSuccessfulBuild/artifact/target/TrunkImage.changes
>
> I'll run the same tests on Cog and interpreter VMs, and see if anything happens
> with that image.

For the record, I have now run over 50,000 iterations of the loop using the
TrunkImage, using both Cog and interpreter VMs. No crashes.

I cannot reproduce the crash on my Linux PC by simply running testReciprocalModulo.
I've tried different VMs and different images, and I did memory allocation between
loops in hopes of moving the problem around in the object memory. No joy, it does
not crash.

The source of the problem must be elsewhere.

Dave