Debugging a bug that wasn't.

Bryce Kampjes

I'm testing and debugging before the 0.14 release. The major benefits
are faster compile times and better generated code. The macro
benchmarks show around a 10% gain or loss depending on what's
compiled.

The bug was triggered by the code below. Originally it appeared while
compiling "^[42] value"; the set-up shown below generates the fault
without needing to compile the code at all. The fault was odd because
the image wasn't crashing and wasn't stuck in an infinite loop, yet it
had locked up.

I investigated by interrupting execution with gdb and looking at both
the Smalltalk stack, using "p printCallStack()", and the C stack,
using "where". Normally, if compiled code were corrupting the object
memory, it would crash fairly quickly. That it kept executing, which
could be seen because the Smalltalk stack kept changing, was weird. If
the GC gets a corrupt view of the object memory, it will normally end
up corrupting it badly when it decides to interpret the middle of an
object as an object header.
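
Concretely, the gdb session looks something like this (a sketch; the
process name and the pgrep invocation are assumptions, but
printCallStack() is the interpreter's own debugging function):

    $ gdb -p `pgrep squeak`      # attach to the running VM
    (gdb) where                  # the C stack
    (gdb) p printCallStack()     # the Smalltalk stack
    (gdb) continue               # let the image carry on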

The "Error signal." at the front of the test is there because running
the test locks up the image; failing early allows the rest of the
suite to be run.

    testInterruptCausesCrashes
        "Crashing is a failure"
        | processes |
        Error signal.
        self compileExpression: 'self createPoint'.
        processes := (1 to: 10) collect:
            [:each| [[(ExuperyProfiler
               spyOn: [(Delay forMilliseconds: 500) wait])
                 queueCompilationCommandsOn: SharedQueue2 new] repeat] fork].
        "Force a garbage collect on every allocation"
        SmalltalkImage current vmParameterAt: 5 put: 1.
        10000 timesRepeat: [self example].
        Exupery log: 'after running example'.
        "Reset garbage collects to a sensible value"
        SmalltalkImage current vmParameterAt: 5 put: 4000.
        processes do: [:each| each terminate].
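
For reference, VM parameter 5 is the number of allocations between
garbage collects, which is why putting 1 there forces a collection on
every allocation. The current setting can be read back the same way:

    SmalltalkImage current vmParameterAt: 5
        "normally answers 4000, the sensible value the test restores"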

    createPoint
        "^ 10 @ 20"

What I think is happening is that the 10 profiling processes end up
consuming all the CPU time and starving the user-level process, so it
never completes and never resets the garbage collector's collection
rate to a sane value. Having reproduced the lockup twice without any
compiled code, I'm reasonably confident that this is the case.
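
The starvation mechanism is easy to demonstrate on its own. A minimal
sketch, independent of Exupery (the two-second busy loop is just for
illustration):

    | counter hog |
    counter := 0.
    "A compute-bound process one priority level up preempts this one
    immediately and runs to completion before this one is scheduled again"
    hog := [ | end |
        end := Time millisecondClockValue + 2000.
        [Time millisecondClockValue < end] whileTrue ]
            forkAt: Processor activePriority + 1.
    [hog isTerminated] whileFalse: [counter := counter + 1].
    "counter is still 0: the lower priority loop never got any time"
    Transcript show: counter printString; cr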

Bryce

Re: Debugging a bug that wasn't.

Igor Stasenko
2008/4/20  <[hidden email]>:

>  What I think is happening is that the 10 profiling processes end up
>  consuming all the CPU time and starving the user-level process, so it
>  never completes and never resets the garbage collector's collection
>  rate to a sane value. Having reproduced the lockup twice without any
>  compiled code, I'm reasonably confident that this is the case.
>

The GC signals a semaphore after finishing (I don't remember whether
the weak finalization process uses it or a different one). I'm not
sure whether it can somehow interfere with this case.

If the weak finalization loop creates new objects, that can lead to
exclusive looping in it, because user processes run at a lower
priority.

finalizationProcess

        [true] whileTrue:
                [FinalizationSemaphore wait.
                FinalizationLock critical:
                        [FinalizationDependents do:
                                [:weakDependent |
                                weakDependent ifNotNil:
                                        [weakDependent finalizeValues.
                                        "***Following statement is required to keep weakDependent
                                        from holding onto its value as garbage.***"
                                        weakDependent := nil]]]
                        ifError:
                        [:msg :rcvr | rcvr error: msg].
                ].

See, if FinalizationSemaphore has excess signals after executing the
cycle once, it continues with execution. I think it would be better to
place a FinalizationSemaphore initSignals at the end of the weak
finalization, so it wouldn't be re-triggered by garbage collections
that may happen inside the weak finalization routines.
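
A sketch of that change, untested, just moving the reset to the end of
the loop body:

    finalizationProcess
        [true] whileTrue:
            [FinalizationSemaphore wait.
            FinalizationLock critical:
                [FinalizationDependents do:
                    [:weakDependent |
                    weakDependent ifNotNil:
                        [weakDependent finalizeValues.
                        weakDependent := nil]]]
                ifError: [:msg :rcvr | rcvr error: msg].
            "Discard signals that accumulated while finalizing, so GCs
            happening inside the finalization routines don't immediately
            re-run the loop"
            FinalizationSemaphore initSignals]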

--
Best regards,
Igor Stasenko AKA sig.

Re: Debugging a bug that wasn't.

Bryce Kampjes
Igor Stasenko writes:
 > The GC signals a semaphore after finishing (I don't remember whether
 > the weak finalization process uses it or a different one). I'm not
 > sure whether it can somehow interfere with this case.
 >
 > If the weak finalization loop creates new objects, that can lead to
 > exclusive looping in it, because user processes run at a lower
 > priority.
 >

The problem has nothing to do with finalisation. The issue is that
running a GC for every allocation slows down execution a lot: enough
that there's a race to get through the test code before all 10
profiling processes start profiling. If they all start profiling,
they can consume all the available CPU.

Bryce

Re: Debugging a bug that wasn't.

Igor Stasenko
2008/4/20  <[hidden email]>:

>  The problem has nothing to do with finalisation. The issue is that
>  running a GC for every allocation slows down execution a lot: enough
>  that there's a race to get through the test code before all 10
>  profiling processes start profiling. If they all start profiling,
>  they can consume all the available CPU.
>

Okay. So the problem is that forcing a garbage collect after every
single allocation is too stressful for your tests.
Then why not divide the test into parts: one for testing the GC,
another for testing the interrupts that cause active process switching?
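
A sketch of that split, with hypothetical selector names, reusing the
pieces of the original test:

    testGarbageCollectStress
        "GC pressure only: no profiling processes"
        SmalltalkImage current vmParameterAt: 5 put: 1.
        [10000 timesRepeat: [self example]]
            ensure: [SmalltalkImage current vmParameterAt: 5 put: 4000]

    testProfilerInterrupts
        "Interrupt-driven process switching only, at the default GC rate"
        | processes |
        processes := (1 to: 10) collect:
            [:each| [[(ExuperyProfiler
               spyOn: [(Delay forMilliseconds: 500) wait])
                 queueCompilationCommandsOn: SharedQueue2 new] repeat] fork].
        [10000 timesRepeat: [self example]]
            ensure: [processes do: [:each| each terminate]]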


--
Best regards,
Igor Stasenko AKA sig.

Re: Debugging a bug that wasn't.

Bryce Kampjes
Igor Stasenko writes:
 > Okay. So the problem is that forcing a garbage collect after every
 > single allocation is too stressful for your tests.
 > Then why not divide the test into parts: one for testing the GC,
 > another for testing the interrupts that cause active process switching?

There's no bug, hence the title of this thread. For a week I thought
there was a bug because in one instance it was appearing and
disappearing depending on whether I compiled the code or not. That was
just a consequence of a very close race.

Bryce