Timing out Trait tests


Timing out Trait tests

Frank Shearar-3
A bunch of the Trait tests fail, despite no one having worked on the
Traits code in quite a while. The symptom is that the tests time out
[1]. I've tried to debug some, only to find that they pass. For
instance, <timeout: 60> sometimes "fixes" the tests.

But the tests shouldn't be failing. Any takers?

frank

[1] http://build.squeak.org/job/SqueakTrunk/756/testReport/TraitsTests.Kernel/TraitCompositionTest/testAliasCompositions/
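[Editor's note: the `<timeout: 60>` mentioned above is a method pragma on the test itself. A minimal sketch of the declaration follows; the method body here is purely illustrative, not the real testAliasCompositions.]

```smalltalk
"Sketch: raising a single SUnit test's timeout via the <timeout:> pragma.
The assertion below is a hypothetical placeholder for the real test body."
TraitCompositionTest >> testAliasCompositions
	<timeout: 60>	"allow up to 60 seconds before the runner fails the test"
	self assert: true
```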


Re: Timing out Trait tests

timrowledge

On 11-01-2014, at 6:29 AM, Frank Shearar <[hidden email]> wrote:

> A bunch of the Trait tests fail, despite no one having worked on the
> Traits code in quite a while. The symptom is that the tests time out
> [1]. I've tried to debug some, only to find that they pass. For
> instance, <timeout: 60> sometimes "fixes" the tests.
>
> But the tests shouldn't be failing. Any takers?

I noticed similar issues when running tests on a Pi; being a relatively slow machine, it takes longer to do some things and needs a larger timeout value.

If tests that used to run within the time are now taking longer but still pass if given enough time, that implies to me that either
a) the tests are being run on a slower machine due to some system configuration change (are other jobs also running and taking time?), or
b) something has changed in the execution path to slow down the tested code. Which, obviously, ought not to be the case.


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Can't program his way out of a for-loop.




Re: Timing out Trait tests

Frank Shearar-3
On 11 January 2014 19:19, tim Rowledge <[hidden email]> wrote:

>
> On 11-01-2014, at 6:29 AM, Frank Shearar <[hidden email]> wrote:
>> [original message snipped]
>
> I noticed similar issues when running tests on a Pi; being a relatively slow machine, it takes longer to do some things and needs a larger timeout value.
>
> If tests that used to run within the time are now taking longer but still pass if given enough time, that implies to me that either
> a) the tests are being run on a slower machine due to some system configuration change (are other jobs also running and taking time?), or
> b) something has changed in the execution path to slow down the tested code. Which, obviously, ought not to be the case.

They time out on my beefy work laptop: a 2.7 GHz machine. That's
probably a higher-spec machine than the build server.

If I was a betting man I'd be inclined to put my money on (b).

frank


Re: Timing out Trait tests

Frank Shearar-3
On 11 January 2014 22:55, Frank Shearar <[hidden email]> wrote:

> On 11 January 2014 19:19, tim Rowledge <[hidden email]> wrote:
>> [tim's message snipped; see above]
>
> They time out on my beefy work laptop: a 2.7 GHz machine. That's
> probably a higher spec machine than the build server.
>
> If I was a betting man I'd be inclined to put my money on (b).

Attached is the output of MessageTally spyOn: [[TraitCompositionTest
debug: #testAliasCompositions] on: TestFailure do: [#fail]], after
adding <timeout: 60> to the test.

The leaves are very interesting:

**Leaves**
43.1% {28650ms} CompiledMethod>>originalTraitMethod:
36.9% {24505ms} Text>>size
9.7% {6425ms} Delay>>wait
3.4% {2239ms} SecurityManager class>>default
1.1% {749ms} CompiledMethod>>checkOKToAdd:at:

_24 seconds_ in Text >> #size?! Also,
CompiledMethod>>originalTraitMethod: hasn't changed in four years. In
fact, the only things in the code path that have changed, as far as I
can see, are Chris Muller's fix to TestCase >> #debug and Nicolas
Cellier's cleanup of the Compiler API. Nothing looks even remotely
capable of slowing things down this dramatically.

frank
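[Editor's note: the profiling expression above, reformatted as a do-it. It is written here with an explicit exception argument in the handler block, a slight variation on the zero-argument block in the original; the handler swallows the TestFailure that #debug: would otherwise raise, so the tally completes.]

```smalltalk
"Profile one timing-out test under MessageTally; the tally (including the
**Leaves** section) is reported when the block finishes."
MessageTally spyOn: [
	[TraitCompositionTest debug: #testAliasCompositions]
		on: TestFailure
		do: [:ex | #fail]]
```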



[Attachment: traitcompositiontest.txt (30K)]

Re: Timing out Trait tests

Nicolas Cellier

2014/1/13 Frank Shearar <[hidden email]>
> [Frank's profiling message snipped; see above]

I don't get the same figures here:

1st run:
**Leaves**
50.5% {3145ms} CompiledMethod>>originalTraitMethod:
18.1% {1125ms} Delay>>wait
11.8% {731ms} MultiByteFileStream>>nextPut:
5.1% {316ms} Trait(ClassDescription)>>organization
3.3% {206ms} MultiByteFileStream(StandardFileStream)>>open:forWrite:
1.7% {108ms} CompiledMethodWithNode>>selector
1.1% {67ms} ByteSymbol(Symbol)>>=

2nd run:

**Leaves**
50.5% {1991ms} CompiledMethod>>originalTraitMethod:
15.7% {617ms} MultiByteFileStream>>nextPut:
8.8% {346ms} Delay>>wait
4.6% {180ms} MultiByteFileStream(StandardFileStream)>>open:forWrite:
3.9% {153ms} Trait(ClassDescription)>>organization
2.3% {92ms} ByteSymbol(Symbol)>>=
1.5% {59ms} WeakArray class>>finalizationProcess
1.4% {54ms} CompiledMethodWithNode>>selector
1.1% {42ms} Array(SequenceableCollection)>>indexOf:startingAt:ifAbsent:

Hmm, great variation between the two runs?
I also note the presence of AnObsoleteC1/2 among the large consumers... Not that healthy.

With an AndreasSystemProfiler, here are the leaves:

1st run:
**Leaves**
57.1 (2,518)  Array  elementsForwardIdentityTo:
16.4 (722)  StandardFileStream  primWrite:from:startingAt:count:
6.5 (288)  Symbol  flushCache
5.7 (252)  StandardFileStream  primOpen:writable:

2nd run:
**Leaves**
64.8 (4,436)  Array  elementsForwardIdentityTo:
17.3 (1,183)  StandardFileStream  primWrite:from:startingAt:count:
4.6 (318)  Symbol  flushCache
4.1 (282)  StandardFileStream  primOpen:writable:

Though I failed to find the direct call to elementsForwardIdentityTo: in the send chain (probably through becomeForward:)...




Re: Timing out Trait tests

Karl Ramberg
I get a primitive-failed error on a MultiByteFileStream on Windows 8 when running the Traits tests.

Cheers,
Karl


On Mon, Jan 13, 2014 at 10:15 PM, Nicolas Cellier <[hidden email]> wrote:

> [earlier messages snipped]

[Attachment: SqueakDebug.log (16K)]

Re: Timing out Trait tests

Karl Ramberg
TraitsFileOutTest is failing

Correct log attached this time

Cheers,
Karl


On Mon, Jan 13, 2014 at 11:53 PM, karl ramberg <[hidden email]> wrote:
> I get a primitive-failed error on a MultiByteFileStream on Windows 8 when running the Traits tests.
> [earlier messages snipped]

[Attachment: SqueakDebug.log (14K)]

Re: Timing out Trait tests

Nicolas Cellier

2014/1/14 karl ramberg <[hidden email]>:
> TraitsFileOutTest is failing
>
> Correct log attached this time
>
> Cheers,
> Karl

id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
means the file was closed...

I've observed an increasing failure rate related to a randomly closed change log over the last few months (particularly when loading/merging with MC); it would be good to identify the root cause...

 



Re: Timing out Trait tests

David T. Lewis
On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:

> 2014/1/14 karl ramberg <[hidden email]>
>
> > TraitsFileOutTest is failing
> >
> > Correct log attached this time
> >
> > Cheers,
> > Karl
> >
> > id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> means the file was closed...
>
> I observed increasing failure rate related to randomly closed change log
> the last few months (particularly when loading/merging with MC), it would
> be good to identify the root cause...
>

That sounds like exactly the problem that Frank was describing in another
thread. He was ending up with a corrupt changes file IIRC, and the problems
he described seemed to be related to primSize checks on a changes file that
had been closed (this would presumably be a process trying to append to the
changes file when some other process had closed the file stream).

Dave



Re: Timing out Trait tests

Frank Shearar-3
On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:

> On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
>> [Karl's message snipped]
>>
>> > id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>> means the file was closed...
>>
>> I observed increasing failure rate related to randomly closed change log
>> the last few months (particularly when loading/merging with MC), it would
>> be good to identify the root cause...
>>
>
> That sounds like exactly the problem that Frank was describing in another
> thread. He was ending up with a corrupt changes file IIRC, and the problems
> he described seemed to be related to primSize checks on a changes file that
> had been closed (this would presumably be a process trying to append to the
> changes file when some other process had closed the file stream).

I'm relieved it's not just me, but sad that it's not just me
experiencing this. So my understanding is that changes are stored by
accessing SourceFiles at: 2. If you want a read-only copy of the
sources or changes, you use CurrentReadOnlySourceFiles at: 2. I
suppose the first step is verifying that nothing referencing these
_directly_ does things like closing files (I'd be tempted to look at
CROSF first.)

frank
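[Editor's note: for readers following along, a rough sketch of the access pattern Frank describes. Index 1 is conventionally the .sources file and index 2 the .changes file; the comments are the editor's gloss, not part of the original message.]

```smalltalk
"The shared, writable changes stream; closing this breaks every writer:"
| sharedChanges readOnlyChanges |
sharedChanges := SourceFiles at: 2.

"For reading, use the read-only view instead, so nothing you do can
disturb the shared writable stream:"
readOnlyChanges := CurrentReadOnlySourceFiles at: 2.
"... read source chunks from readOnlyChanges here ..."
```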

> Dave


Re: Timing out Trait tests

Karl Ramberg


On Tue, Jan 14, 2014 at 8:20 AM, Frank Shearar <[hidden email]> wrote:
> [earlier messages snipped]
> I'm relieved it's not just me, but sad that it's not just me
> experiencing this.
 
I'm also using Windows, so it may look like the problem is related to that somehow.
 
Karl
 





Re: Timing out Trait tests

Eliot Miranda-2
In reply to this post by Frank Shearar-3
Hi Frank,


On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar <[hidden email]> wrote:
> [earlier messages snipped]
> I suppose the first
> step is verifying that nothing referencing these _directly_ does
> things like closing files (I'd be tempted to look at CROSF first.)

Forgive me for not replying earlier, but I needed help to track this down. We suffered a very similar symptom a while ago at Cadence. It turns out that in our case the issue was the system running out of file handles because it left lots of files open. The cause was in Monticello package loading, and the fix was to use CurrentReadOnlySourceFiles more. I'm attaching a changeset in the hope that this is the same issue and that it can point you towards a fix.

HTH
--
best,
Eliot



[Attachment: filehandlelimits.1.cs (5K)]

Re: Timing out Trait tests

Frank Shearar-3
On 14 January 2014 19:10, Eliot Miranda <[hidden email]> wrote:

> Hi Frank,
>
>
> [earlier messages snipped]
>
> forgive me for not replying earlier but I needed help to track this down.
> We suffered a very similar symptom a while ago at Cadence.  It turns out
> that in our case the issue was the system running out of file handles
> because it left lots of files open.  The cause was with Monticello package
> loading and the fix was to use CurrentReadOnlySourceFiles more.  I'm
> attaching a changeset in the hope that this is the same issue and that our
> changeset can point you towards a fix.

Thanks, Eliot. From what I can make out, the MCPackage >> #snapshot in
this filein differs from trunk's only in the "reject: [:ea | [ea
methodClass language isNewspeakLanguage3]ifError:[false]" guard. So
the significant part of the changeset is PackageInfo >>
#changeRecordForOverriddenMethod:, which I've submitted to the Inbox
as an mcz so we can all have a look.

Thanks!

frank

> HTH
> --
> best,
> Eliot
>
>
>


Re: Timing out Trait tests

Nicolas Cellier



2014/1/15 Frank Shearar <[hidden email]>
On 14 January 2014 19:10, Eliot Miranda <[hidden email]> wrote:
> Hi Frank,
>
>
> On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar <[hidden email]>
> wrote:
>>
>> On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:
>> > On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
>> >> 2014/1/14 karl ramberg <[hidden email]>
>> >>
>> >> > TraitsFileOutTest is failing
>> >> >
>> >> > Correct log attached this time
>> >> >
>> >> > Cheers,
>> >> > Karl
>> >> >
>> >> > id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>> >> means the file was closed...
>> >>
>> >> I observed increasing failure rate related to randomly closed change
>> >> log
>> >> the last few months (particularly when loading/merging with MC), it
>> >> would
>> >> be good to identify the root cause...
>> >>
>> >
>> > That sounds like exactly the problem that Frank was describing in
>> > another
>> > thread. He was ending up with a corrupt changes file IIRC, and the
>> > problems
>> > he described seemed to be related to primSize checks on a changes file
>> > that
>> > had been closed (this would presumably be a process trying to append to
>> > the
>> > changes file when some other process had closed the file stream).
>>
>> I'm relieved it's not just me, but sad that it's not just me
>> experiencing this. So my understanding is that changes are stored by
>> accessing SourceFiles at: 2. If you want a readonly copy of sources or
>> changes, you use CurrentReadOnlySourceFiles at: 2. I suppose the first
>> step is verifying that nothing referencing these _directly_ does
>> things like closing files (I'd be tempted to look at CROSF first.)
>
>
> forgive me for not replying earlier but I needed help to track this down.
> We suffered a very similar symptom a while ago at Cadence.  It turns out
> that in our case the issue was the system running out of file handles
> because it left lots of files open.  The cause was with Monticello package
> loading and the fix was to use CurrentReadOnlySourceFiles more.  I'm
> attaching a changeset in the hope that this is the same issue and that our
> changeset can point you towards a fix.


Yeah, that's the problem I most often encounter with changes files, and have for a long time:
'no such file or directory' when opening the changes file fails.
I agree, most probably a simple problem of limited resources (for example, too many readOnlyCopy files opened but not yet closed/reclaimed).
Eliot, isn't it your fault? The faster the VM, the sooner we hit the resource limits.

But I also managed to get a corrupt change log recently and could not trace it...
I suspect that closing the original changes file is a bit different from starving for new readOnlyCopy files, but who knows...
 
Thanks, Eliot. From what I can make out, the MCPackage >> #snapshot in
this filein differs from trunk's only in the "reject: [:ea | [ea
methodClass language isNewspeakLanguage3]ifError:[false]" guard. So
the significant part of the changeset is PackageInfo >>
#changeRecordForOverriddenMethod:, which I've submitted to the Inbox
as an mcz so we can all have a look.

Thanks!

frank

> HTH
> --
> best,
> Eliot
>
>
>
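The two access paths Frank describes above can be sketched in a workspace (selector names as used in this thread; a sketch of the intent, not a tested recipe — exact protocol may vary by image version):

```smalltalk
"SourceFiles at: 2 answers the writable changes stream; every method
 compile appends its source here, so another process closing it breaks
 source logging mid-append."
| writable readOnly |
writable := SourceFiles at: 2.
writable setToEnd.  "position for appending a new change record"

"CurrentReadOnlySourceFiles at: 2 answers a read-only copy, safe to
 read from while the writable stream is being appended to."
readOnly := CurrentReadOnlySourceFiles at: 2.
readOnly position: 0
```

The point of the split is that readers never touch the shared writable stream, so a reader closing its copy cannot invalidate the stream the change logger is appending to.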





Re: Timing out Trait tests

Karl Ramberg
In reply to this post by Eliot Miranda-2
I don't think the issue is file handles.
I tested a new trunk image and got the prim failed error.
I tested another image I have, with literally hundreds (578) of subinstances of streams, and got the same error.

It's just the Traits tests that show the error.

Cheers,
Karl


On Tue, Jan 14, 2014 at 8:10 PM, Eliot Miranda <[hidden email]> wrote:
Hi Frank,


On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar <[hidden email]> wrote:
On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:
> On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
>> 2014/1/14 karl ramberg <[hidden email]>
>>
>> > TraitsFileOutTest is failing
>> >
>> > Correct log attached this time
>> >
>> > Cheers,
>> > Karl
>> >
>> > id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>> means the file was closed...
>>
>> I observed increasing failure rate related to randomly closed change log
>> the last few months (particularly when loading/merging with MC), it would
>> be good to identify the root cause...
>>
>
> That sounds like exactly the problem that Frank was describing in another
> thread. He was ending up with a corrupt changes file IIRC, and the problems
> he described seemed to be related to primSize checks on a changes file that
> had been closed (this would presumably be a process trying to append to the
> changes file when some other process had closed the file stream).

I'm relieved it's not just me, but sad that it's not just me
experiencing this. So my understanding is that changes are stored by
accessing SourceFiles at: 2. If you want a readonly copy of sources or
changes, you use CurrentReadOnlySourceFiles at: 2. I suppose the first
step is verifying that nothing referencing these _directly_ does
things like closing files (I'd be tempted to look at CROSF first.)

forgive me for not replying earlier but I needed help to track this down.  We suffered a very similar symptom a while ago at Cadence.  It turns out that in our case the issue was the system running out of file handles because it left lots of files open.  The cause was with Monticello package loading and the fix was to use CurrentReadOnlySourceFiles more.  I'm attaching a changeset in the hope that this is the same issue and that our changeset can point you towards a fix.

HTH
--
best,
Eliot
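The handle-exhaustion theory Karl argues against can be probed directly from a workspace; a rough sketch (assuming the Squeak class names below — a number that grows across test runs would support the leak hypothesis):

```smalltalk
"Count file streams in the image that still hold an open OS handle."
| open |
open := StandardFileStream allSubInstances
	select: [:each | each closed not].
Transcript showln: 'open file streams: ', open size printString
```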







Re: Timing out Trait tests

Karl Ramberg
I think I found a clue.
I do not get a prim failed error when I revert the method Environment>>forgetClass:logged: to a version earlier than the 12/31/2013 one.


Cheers,
Karl



On Wed, Jan 15, 2014 at 9:18 PM, karl ramberg <[hidden email]> wrote:
I don't think the issue is file handles.
I tested a new trunk image and got the prim failed error.
I tested another image I have, with literally hundreds (578) of subinstances of streams, and got the same error.

It's just the Traits tests that show the error.

Cheers,
Karl


On Tue, Jan 14, 2014 at 8:10 PM, Eliot Miranda <[hidden email]> wrote:
Hi Frank,


On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar <[hidden email]> wrote:
On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:
> On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
>> 2014/1/14 karl ramberg <[hidden email]>
>>
>> > TraitsFileOutTest is failing
>> >
>> > Correct log attached this time
>> >
>> > Cheers,
>> > Karl
>> >
>> > id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>> means the file was closed...
>>
>> I observed increasing failure rate related to randomly closed change log
>> the last few months (particularly when loading/merging with MC), it would
>> be good to identify the root cause...
>>
>
> That sounds like exactly the problem that Frank was describing in another
> thread. He was ending up with a corrupt changes file IIRC, and the problems
> he described seemed to be related to primSize checks on a changes file that
> had been closed (this would presumably be a process trying to append to the
> changes file when some other process had closed the file stream).

I'm relieved it's not just me, but sad that it's not just me
experiencing this. So my understanding is that changes are stored by
accessing SourceFiles at: 2. If you want a readonly copy of sources or
changes, you use CurrentReadOnlySourceFiles at: 2. I suppose the first
step is verifying that nothing referencing these _directly_ does
things like closing files (I'd be tempted to look at CROSF first.)

forgive me for not replying earlier but I needed help to track this down.  We suffered a very similar symptom a while ago at Cadence.  It turns out that in our case the issue was the system running out of file handles because it left lots of files open.  The cause was with Monticello package loading and the fix was to use CurrentReadOnlySourceFiles more.  I'm attaching a changeset in the hope that this is the same issue and that our changeset can point you towards a fix.

HTH
--
best,
Eliot








Re: Timing out Trait tests

Karl Ramberg
I found the source of the MultiByteFileStream(Object)>>primitiveFailed: problem on my system.
An antivirus program was checking the file while it was in use by Squeak.

I googled a little and found this clue:

All Traits tests pass now, and much quicker than before :-)

Cheers,
Karl
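For anyone hitting the same symptom, the closed-file diagnosis Nicolas gives earlier in the thread (an all-zero file id in the error log) can be checked from a workspace; a sketch, assuming the selectors below behave as described in this thread:

```smalltalk
"An all-zero id means the underlying OS handle is gone; #closed on the
 changes stream answers true in that state."
(SourceFiles at: 2) closed
	ifTrue: [Transcript showln: 'changes file closed underneath the image']
```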


On Wed, Jan 15, 2014 at 10:24 PM, karl ramberg <[hidden email]> wrote:
I think I found a clue.
I do not get a prim failed error when I revert the method Environment>>forgetClass:logged: to a version earlier than the 12/31/2013 one.


Cheers,
Karl



On Wed, Jan 15, 2014 at 9:18 PM, karl ramberg <[hidden email]> wrote:
I don't think the issue is file handles.
I tested a new trunk image and got the prim failed error.
I tested another image I have, with literally hundreds (578) of subinstances of streams, and got the same error.

It's just the Traits tests that show the error.

Cheers,
Karl


On Tue, Jan 14, 2014 at 8:10 PM, Eliot Miranda <[hidden email]> wrote:
Hi Frank,


On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar <[hidden email]> wrote:
On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:
> On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
>> 2014/1/14 karl ramberg <[hidden email]>
>>
>> > TraitsFileOutTest is failing
>> >
>> > Correct log attached this time
>> >
>> > Cheers,
>> > Karl
>> >
>> > id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>> means the file was closed...
>>
>> I observed increasing failure rate related to randomly closed change log
>> the last few months (particularly when loading/merging with MC), it would
>> be good to identify the root cause...
>>
>
> That sounds like exactly the problem that Frank was describing in another
> thread. He was ending up with a corrupt changes file IIRC, and the problems
> he described seemed to be related to primSize checks on a changes file that
> had been closed (this would presumably be a process trying to append to the
> changes file when some other process had closed the file stream).

I'm relieved it's not just me, but sad that it's not just me
experiencing this. So my understanding is that changes are stored by
accessing SourceFiles at: 2. If you want a readonly copy of sources or
changes, you use CurrentReadOnlySourceFiles at: 2. I suppose the first
step is verifying that nothing referencing these _directly_ does
things like closing files (I'd be tempted to look at CROSF first.)

forgive me for not replying earlier but I needed help to track this down.  We suffered a very similar symptom a while ago at Cadence.  It turns out that in our case the issue was the system running out of file handles because it left lots of files open.  The cause was with Monticello package loading and the fix was to use CurrentReadOnlySourceFiles more.  I'm attaching a changeset in the hope that this is the same issue and that our changeset can point you towards a fix.

HTH
--
best,
Eliot