Timing out Trait tests


Timing out Trait tests

Frank Shearar-3
A bunch of the Trait tests fail, despite no one having worked on the
Traits code in quite a while. The symptom is that the tests time out
[1]. I've tried to debug some, only to find that they pass. For
instance, <timeout: 60> sometimes "fixes" the tests.

But the tests shouldn't be failing. Any takers?

frank

[1] http://build.squeak.org/job/SqueakTrunk/756/testReport/TraitsTests.Kernel/TraitCompositionTest/testAliasCompositions/
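[Editor's note: the `<timeout: 60>` mentioned above is a method pragma on the test itself. A minimal sketch of the declaration follows; the method body here is purely illustrative, not the real testAliasCompositions.]

```smalltalk
"Sketch: raising a single SUnit test's timeout via the <timeout:> pragma.
The assertion below is a hypothetical placeholder for the real test body."
TraitCompositionTest >> testAliasCompositions
	<timeout: 60>	"allow up to 60 seconds before the runner fails the test"
	self assert: true
```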


Re: Timing out Trait tests

timrowledge

On 11-01-2014, at 6:29 AM, Frank Shearar <[hidden email]> wrote:

> A bunch of the Trait tests fail, despite no one having worked on the
> Traits code in quite a while. The symptom is that the tests time out
> [1]. I've tried to debug some, only to find that they pass. For
> instance, <timeout: 60> sometimes "fixes" the tests.
>
> But the tests shouldn't be failing. Any takers?

I noticed similar issues when running tests on a Pi; being a relatively slow machine, it takes longer to do some things and needs a larger timeout value.

If tests that used to run within the time are now taking longer but still pass if given enough time, that implies to me that either
a) the tests are being run on a slower machine due to some system configuration change (are other jobs also running and taking time?), or
b) something has changed in the execution path to slow down the tested code. Which, obviously, ought not to be the case.


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Can't program his way out of a for-loop.




Re: Timing out Trait tests

Frank Shearar-3
On 11 January 2014 19:19, tim Rowledge <[hidden email]> wrote:

>
> On 11-01-2014, at 6:29 AM, Frank Shearar <[hidden email]> wrote:
>> [original message snipped]
>
> I noticed similar issues when running tests on a Pi; being a relatively slow machine, it takes longer to do some things and needs a larger timeout value.
>
> If tests that used to run within the time are now taking longer but still pass if given enough time, that implies to me that either
> a) the tests are being run on a slower machine due to some system configuration change (are other jobs also running and taking time?), or
> b) something has changed in the execution path to slow down the tested code. Which, obviously, ought not to be the case.

They time out on my beefy work laptop: a 2.7 GHz machine. That's
probably a higher-spec machine than the build server.

If I was a betting man I'd be inclined to put my money on (b).

frank


Re: Timing out Trait tests

Frank Shearar-3
On 11 January 2014 22:55, Frank Shearar <[hidden email]> wrote:

> On 11 January 2014 19:19, tim Rowledge <[hidden email]> wrote:
>> [tim's message snipped; see above]
>
> They time out on my beefy work laptop: a 2.7 GHz machine. That's
> probably a higher spec machine than the build server.
>
> If I was a betting man I'd be inclined to put my money on (b).

Attached is the output of MessageTally spyOn: [[TraitCompositionTest
debug: #testAliasCompositions] on: TestFailure do: [#fail]], after
adding <timeout: 60> to the test.

The leaves are very interesting:

**Leaves**
43.1% {28650ms} CompiledMethod>>originalTraitMethod:
36.9% {24505ms} Text>>size
9.7% {6425ms} Delay>>wait
3.4% {2239ms} SecurityManager class>>default
1.1% {749ms} CompiledMethod>>checkOKToAdd:at:

_24 seconds_ in Text >> #size?! Also,
CompiledMethod>>originalTraitMethod: hasn't changed in four years. In
fact, the only things in the code path that have changed, as far as I
can see, are Chris Muller's fix to TestCase >> #debug and Nicolas
Cellier's cleanup of the Compiler API. Nothing looks even remotely
capable of slowing things down this dramatically.

frank
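[Editor's note: the profiling expression above, reformatted as a do-it. It is written here with an explicit exception argument in the handler block, a slight variation on the zero-argument block in the original; the handler swallows the TestFailure that #debug: would otherwise raise, so the tally completes.]

```smalltalk
"Profile one timing-out test under MessageTally; the tally (including the
**Leaves** section) is reported when the block finishes."
MessageTally spyOn: [
	[TraitCompositionTest debug: #testAliasCompositions]
		on: TestFailure
		do: [:ex | #fail]]
```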



[Attachment: traitcompositiontest.txt (30K)]

Re: Timing out Trait tests

Nicolas Cellier

2014/1/13 Frank Shearar <[hidden email]>
> [Frank's profiling message snipped; see above]

I don't get the same figures here:

1st run:
**Leaves**
50.5% {3145ms} CompiledMethod>>originalTraitMethod:
18.1% {1125ms} Delay>>wait
11.8% {731ms} MultiByteFileStream>>nextPut:
5.1% {316ms} Trait(ClassDescription)>>organization
3.3% {206ms} MultiByteFileStream(StandardFileStream)>>open:forWrite:
1.7% {108ms} CompiledMethodWithNode>>selector
1.1% {67ms} ByteSymbol(Symbol)>>=

2nd run:

**Leaves**
50.5% {1991ms} CompiledMethod>>originalTraitMethod:
15.7% {617ms} MultiByteFileStream>>nextPut:
8.8% {346ms} Delay>>wait
4.6% {180ms} MultiByteFileStream(StandardFileStream)>>open:forWrite:
3.9% {153ms} Trait(ClassDescription)>>organization
2.3% {92ms} ByteSymbol(Symbol)>>=
1.5% {59ms} WeakArray class>>finalizationProcess
1.4% {54ms} CompiledMethodWithNode>>selector
1.1% {42ms} Array(SequenceableCollection)>>indexOf:startingAt:ifAbsent:

Hmm, great variation between the two runs?
I also note the presence of AnObsoleteC1/2 among the large consumers... Not that healthy.

With an AndreasSystemProfiler, here are the leaves:

1st run:
**Leaves**
57.1 (2,518)  Array  elementsForwardIdentityTo:
16.4 (722)  StandardFileStream  primWrite:from:startingAt:count:
6.5 (288)  Symbol  flushCache
5.7 (252)  StandardFileStream  primOpen:writable:

2nd run:
**Leaves**
64.8 (4,436)  Array  elementsForwardIdentityTo:
17.3 (1,183)  StandardFileStream  primWrite:from:startingAt:count:
4.6 (318)  Symbol  flushCache
4.1 (282)  StandardFileStream  primOpen:writable:

Though I failed to find the direct call to elementsForwardIdentityTo: in the send chain (probably through becomeForward:)...




Re: Timing out Trait tests

Karl Ramberg
I get a primitive-failed error on a MultiByteFileStream on Windows 8 when running the Traits tests.

Cheers,
Karl


On Mon, Jan 13, 2014 at 10:15 PM, Nicolas Cellier <[hidden email]> wrote:

> [earlier messages snipped]

[Attachment: SqueakDebug.log (16K)]

Re: Timing out Trait tests

Karl Ramberg
TraitsFileOutTest is failing

Correct log attached this time

Cheers,
Karl


On Mon, Jan 13, 2014 at 11:53 PM, karl ramberg <[hidden email]> wrote:
> I get a primitive-failed error on a MultiByteFileStream on Windows 8 when running the Traits tests.
> [earlier messages snipped]

[Attachment: SqueakDebug.log (14K)]

Re: Timing out Trait tests

Nicolas Cellier

2014/1/14 karl ramberg <[hidden email]>:
> TraitsFileOutTest is failing
>
> Correct log attached this time
>
> Cheers,
> Karl

id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
means the file was closed...

I've observed an increasing failure rate related to a randomly closed change log over the last few months (particularly when loading/merging with MC); it would be good to identify the root cause...

 



Re: Timing out Trait tests

David T. Lewis
On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:

> 2014/1/14 karl ramberg <[hidden email]>
>
> > TraitsFileOutTest is failing
> >
> > Correct log attached this time
> >
> > Cheers,
> > Karl
> >
> > id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> means the file was closed...
>
> I observed increasing failure rate related to randomly closed change log
> the last few months (particularly when loading/merging with MC), it would
> be good to identify the root cause...
>

That sounds like exactly the problem that Frank was describing in another
thread. He was ending up with a corrupt changes file IIRC, and the problems
he described seemed to be related to primSize checks on a changes file that
had been closed (this would presumably be a process trying to append to the
changes file when some other process had closed the file stream).

Dave



Re: Timing out Trait tests

Frank Shearar-3
On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:

> On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
>> [Karl's message snipped]
>>
>> > id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>> means the file was closed...
>>
>> I observed increasing failure rate related to randomly closed change log
>> the last few months (particularly when loading/merging with MC), it would
>> be good to identify the root cause...
>>
>
> That sounds like exactly the problem that Frank was describing in another
> thread. He was ending up with a corrupt changes file IIRC, and the problems
> he described seemed to be related to primSize checks on a changes file that
> had been closed (this would presumably be a process trying to append to the
> changes file when some other process had closed the file stream).

I'm relieved it's not just me, but sad that it's not just me
experiencing this. So my understanding is that changes are stored by
accessing SourceFiles at: 2. If you want a read-only copy of the
sources or changes, you use CurrentReadOnlySourceFiles at: 2. I
suppose the first step is verifying that nothing referencing these
_directly_ does things like closing files (I'd be tempted to look at
CROSF first.)

frank
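[Editor's note: for readers following along, a rough sketch of the access pattern Frank describes. Index 1 is conventionally the .sources file and index 2 the .changes file; the comments are the editor's gloss, not part of the original message.]

```smalltalk
"The shared, writable changes stream; closing this breaks every writer:"
| sharedChanges readOnlyChanges |
sharedChanges := SourceFiles at: 2.

"For reading, use the read-only view instead, so nothing you do can
disturb the shared writable stream:"
readOnlyChanges := CurrentReadOnlySourceFiles at: 2.
"... read source chunks from readOnlyChanges here ..."
```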

> Dave


Re: Timing out Trait tests

Karl Ramberg


On Tue, Jan 14, 2014 at 8:20 AM, Frank Shearar <[hidden email]> wrote:
> [earlier messages snipped]
> I'm relieved it's not just me, but sad that it's not just me
> experiencing this.
 
I'm also using Windows, so it may look like the problem is related to that somehow.
 
Karl
 





Re: Timing out Trait tests

Eliot Miranda-2
In reply to this post by Frank Shearar-3
Hi Frank,


On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar <[hidden email]> wrote:
> [earlier messages snipped]
> I suppose the first
> step is verifying that nothing referencing these _directly_ does
> things like closing files (I'd be tempted to look at CROSF first.)

Forgive me for not replying earlier, but I needed help to track this down. We suffered a very similar symptom a while ago at Cadence. It turns out that in our case the issue was the system running out of file handles because it left lots of files open. The cause was in Monticello package loading, and the fix was to use CurrentReadOnlySourceFiles more. I'm attaching a changeset in the hope that this is the same issue and that it can point you towards a fix.

HTH
--
best,
Eliot



[Attachment: filehandlelimits.1.cs (5K)]

Re: Timing out Trait tests

Frank Shearar-3
On 14 January 2014 19:10, Eliot Miranda <[hidden email]> wrote:

> Hi Frank,
>
>
> [earlier messages snipped]
>
> forgive me for not replying earlier but I needed help to track this down.
> We suffered a very similar symptom a while ago at Cadence.  It turns out
> that in our case the issue was the system running out of file handles
> because it left lots of files open.  The cause was with Monticello package
> loading and the fix was to use CurrentReadOnlySourceFiles more.  I'm
> attaching a changeset in the hope that this is the same issue and that our
> changeset can point you towards a fix.

Thanks, Eliot. From what I can make out, the MCPackage >> #snapshot in
this filein differs from trunk's only in the "reject: [:ea | [ea
methodClass language isNewspeakLanguage3]ifError:[false]" guard. So
the significant part of the changeset is PackageInfo >>
#changeRecordForOverriddenMethod:, which I've submitted to the Inbox
as an mcz so we can all have a look.

Thanks!

frank

> HTH
> --
> best,
> Eliot
>
>
>


Re: Timing out Trait tests

Nicolas Cellier



2014/1/15 Frank Shearar <[hidden email]>
On 14 January 2014 19:10, Eliot Miranda <[hidden email]> wrote:
> Hi Frank,
>
>
> On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar <[hidden email]>
> wrote:
>>
>> On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:
>> > On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
>> >> 2014/1/14 karl ramberg <[hidden email]>
>> >>
>> >> > TraitsFileOutTest is failing
>> >> >
>> >> > Correct log attached this time
>> >> >
>> >> > Cheers,
>> >> > Karl
>> >> >
>> >> > id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>> >> means the file was closed...
>> >>
>> >> I observed increasing failure rate related to randomly closed change
>> >> log
>> >> the last few months (particularly when loading/merging with MC), it
>> >> would
>> >> be good to identify the root cause...
>> >>
>> >
>> > That sounds like exactly the problem that Frank was describing in
>> > another
>> > thread. He was ending up with a corrupt changes file IIRC, and the
>> > problems
>> > he described seemed to be related to primSize checks on a changes file
>> > that
>> > had been closed (this would presumably be a process trying to append to
>> > the
>> > changes file when some other process had closed the file stream).
>>
>> I'm relieved it's not just me, but sad that it's not just me
>> experiencing this. So my understanding is that changes are stored by
>> accessing SourceFiles at: 2. If you want a readonly copy of sources or
>> changes, you use CurrentReadOnlySourceFiles at: 2. I suppose the first
>> step is verifying that nothing referencing these _directly_ does
>> things like closing files (I'd be tempted to look at CROSF first.)
>
>
> forgive me for not replying earlier but I needed help to track this down.
> We suffered a very similar symptom a while ago at Cadence.  It turns out
> that in our case the issue was the system running out of file handles
> because it left lots of files open.  The cause was with Monticello package
> loading and the fix was to use CurrentReadOnlySourceFiles more.  I'm
> attaching a changeset in the hope that this is the same issue and that our
> changeset can point you towards a fix.


Yeah, that's the problem I most often encounter with changes files, and have for a long time:
'no such file or directory' when opening the changes file fails.
I agree, most probably a simple problem of limited resources (for example, too many readOnlyCopy files opened but not yet closed/reclaimed).
Eliot, isn't it your fault? The faster the VM, the sooner we hit the resource limits.

But I also managed to get a corrupt change log recently and could not trace it...
I suspect that closing the original changes file is a bit different from starving for new readOnlyCopy files, but who knows...
 
Thanks, Eliot. From what I can make out, the MCPackage >> #snapshot in
this filein differs from trunk's only in the "reject: [:ea | [ea
methodClass language isNewspeakLanguage3]ifError:[false]" guard. So
the significant part of the changeset is PackageInfo >>
#changeRecordForOverriddenMethod:, which I've submitted to the Inbox
as an mcz so we can all have a look.

Thanks!

frank

> HTH
> --
> best,
> Eliot
>
>
>
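The two access paths Frank describes above can be sketched in a workspace (selector names as used in this thread; a sketch of the intent, not a tested recipe — exact protocol may vary by image version):

```smalltalk
"SourceFiles at: 2 answers the writable changes stream; every method
 compile appends its source here, so another process closing it breaks
 source logging mid-append."
| writable readOnly |
writable := SourceFiles at: 2.
writable setToEnd.  "position for appending a new change record"

"CurrentReadOnlySourceFiles at: 2 answers a read-only copy, safe to
 read from while the writable stream is being appended to."
readOnly := CurrentReadOnlySourceFiles at: 2.
readOnly position: 0
```

The point of the split is that readers never touch the shared writable stream, so a reader closing its copy cannot invalidate the stream the change logger is appending to.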





Re: Timing out Trait tests

Karl Ramberg
In reply to this post by Eliot Miranda-2
I don't think the issue is file handles.
I tested a new trunk image and got the prim failed error.
I tested another image I have, with literally hundreds (578) of subinstances of streams, and got the same error.

It's just the Traits tests that show the error.

Cheers,
Karl


On Tue, Jan 14, 2014 at 8:10 PM, Eliot Miranda <[hidden email]> wrote:
Hi Frank,


On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar <[hidden email]> wrote:
On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:
> On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
>> 2014/1/14 karl ramberg <[hidden email]>
>>
>> > TraitsFileOutTest is failing
>> >
>> > Correct log attached this time
>> >
>> > Cheers,
>> > Karl
>> >
>> > id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>> means the file was closed...
>>
>> I observed increasing failure rate related to randomly closed change log
>> the last few months (particularly when loading/merging with MC), it would
>> be good to identify the root cause...
>>
>
> That sounds like exactly the problem that Frank was describing in another
> thread. He was ending up with a corrupt changes file IIRC, and the problems
> he described seemed to be related to primSize checks on a changes file that
> had been closed (this would presumably be a process trying to append to the
> changes file when some other process had closed the file stream).

I'm relieved it's not just me, but sad that it's not just me
experiencing this. So my understanding is that changes are stored by
accessing SourceFiles at: 2. If you want a readonly copy of sources or
changes, you use CurrentReadOnlySourceFiles at: 2. I suppose the first
step is verifying that nothing referencing these _directly_ does
things like closing files (I'd be tempted to look at CROSF first.)

forgive me for not replying earlier but I needed help to track this down.  We suffered a very similar symptom a while ago at Cadence.  It turns out that in our case the issue was the system running out of file handles because it left lots of files open.  The cause was with Monticello package loading and the fix was to use CurrentReadOnlySourceFiles more.  I'm attaching a changeset in the hope that this is the same issue and that our changeset can point you towards a fix.

HTH
--
best,
Eliot
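The handle-exhaustion theory Karl argues against can be probed directly from a workspace; a rough sketch (assuming the Squeak class names below — a number that grows across test runs would support the leak hypothesis):

```smalltalk
"Count file streams in the image that still hold an open OS handle."
| open |
open := StandardFileStream allSubInstances
	select: [:each | each closed not].
Transcript showln: 'open file streams: ', open size printString
```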







Re: Timing out Trait tests

Karl Ramberg
I think I found a clue.
I do not get a prim failed error when I revert the method Environment>>forgetClass:logged: to a version earlier than the 12/31/2013 one.


Cheers,
Karl



On Wed, Jan 15, 2014 at 9:18 PM, karl ramberg <[hidden email]> wrote:
I don't think the issue is file handles.
I tested a new trunk image and got the prim failed error.
I tested another image I have, with literally hundreds (578) of subinstances of streams, and got the same error.

It's just the Traits tests that show the error.

Cheers,
Karl


On Tue, Jan 14, 2014 at 8:10 PM, Eliot Miranda <[hidden email]> wrote:
Hi Frank,


On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar <[hidden email]> wrote:
On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:
> On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
>> 2014/1/14 karl ramberg <[hidden email]>
>>
>> > TraitsFileOutTest is failing
>> >
>> > Correct log attached this time
>> >
>> > Cheers,
>> > Karl
>> >
>> > id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>> means the file was closed...
>>
>> I observed increasing failure rate related to randomly closed change log
>> the last few months (particularly when loading/merging with MC), it would
>> be good to identify the root cause...
>>
>
> That sounds like exactly the problem that Frank was describing in another
> thread. He was ending up with a corrupt changes file IIRC, and the problems
> he described seemed to be related to primSize checks on a changes file that
> had been closed (this would presumably be a process trying to append to the
> changes file when some other process had closed the file stream).

I'm relieved it's not just me, but sad that it's not just me
experiencing this. So my understanding is that changes are stored by
accessing SourceFiles at: 2. If you want a readonly copy of sources or
changes, you use CurrentReadOnlySourceFiles at: 2. I suppose the first
step is verifying that nothing referencing these _directly_ does
things like closing files (I'd be tempted to look at CROSF first.)

forgive me for not replying earlier but I needed help to track this down.  We suffered a very similar symptom a while ago at Cadence.  It turns out that in our case the issue was the system running out of file handles because it left lots of files open.  The cause was with Monticello package loading and the fix was to use CurrentReadOnlySourceFiles more.  I'm attaching a changeset in the hope that this is the same issue and that our changeset can point you towards a fix.

HTH
--
best,
Eliot








Re: Timing out Trait tests

Karl Ramberg
I found the source of the MultiByteFileStream(Object)>>primitiveFailed: problem on my system.
An antivirus program was checking the file while it was in use by Squeak.

I googled a little and found this clue:

All Traits tests pass now, and much quicker than before :-)

Cheers,
Karl
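For anyone hitting the same symptom, the closed-file diagnosis Nicolas gives earlier in the thread (an all-zero file id in the error log) can be checked from a workspace; a sketch, assuming the selectors below behave as described in this thread:

```smalltalk
"An all-zero id means the underlying OS handle is gone; #closed on the
 changes stream answers true in that state."
(SourceFiles at: 2) closed
	ifTrue: [Transcript showln: 'changes file closed underneath the image']
```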


On Wed, Jan 15, 2014 at 10:24 PM, karl ramberg <[hidden email]> wrote:
I think I found a clue.
I do not get a prim failed error when I revert the method Environment>>forgetClass:logged: to a version earlier than the 12/31/2013 one.


Cheers,
Karl



On Wed, Jan 15, 2014 at 9:18 PM, karl ramberg <[hidden email]> wrote:
I don't think the issue is file handles.
I tested a new trunk image and got the prim failed error.
I tested another image I have, with literally hundreds (578) of subinstances of streams, and got the same error.

It's just the Traits tests that show the error.

Cheers,
Karl


On Tue, Jan 14, 2014 at 8:10 PM, Eliot Miranda <[hidden email]> wrote:
Hi Frank,


On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar <[hidden email]> wrote:
On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:
> On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
>> 2014/1/14 karl ramberg <[hidden email]>
>>
>> > TraitsFileOutTest is failing
>> >
>> > Correct log attached this time
>> >
>> > Cheers,
>> > Karl
>> >
>> > id:     #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>> means the file was closed...
>>
>> I observed increasing failure rate related to randomly closed change log
>> the last few months (particularly when loading/merging with MC), it would
>> be good to identify the root cause...
>>
>
> That sounds like exactly the problem that Frank was describing in another
> thread. He was ending up with a corrupt changes file IIRC, and the problems
> he described seemed to be related to primSize checks on a changes file that
> had been closed (this would presumably be a process trying to append to the
> changes file when some other process had closed the file stream).

I'm relieved it's not just me, but sad that it's not just me
experiencing this. So my understanding is that changes are stored by
accessing SourceFiles at: 2. If you want a readonly copy of sources or
changes, you use CurrentReadOnlySourceFiles at: 2. I suppose the first
step is verifying that nothing referencing these _directly_ does
things like closing files (I'd be tempted to look at CROSF first.)

forgive me for not replying earlier but I needed help to track this down.  We suffered a very similar symptom a while ago at Cadence.  It turns out that in our case the issue was the system running out of file handles because it left lots of files open.  The cause was with Monticello package loading and the fix was to use CurrentReadOnlySourceFiles more.  I'm attaching a changeset in the hope that this is the same issue and that our changeset can point you towards a fix.

HTH
--
best,
Eliot