A bunch of the Trait tests fail, despite no one having worked on the
Traits code in quite a while. The symptom is that the tests time out
[1]. I've tried to debug some, only to find that they pass. For
instance, <timeout: 60> sometimes "fixes" the tests.

But the tests shouldn't be failing. Any takers?

frank

[1] http://build.squeak.org/job/SqueakTrunk/756/testReport/TraitsTests.Kernel/TraitCompositionTest/testAliasCompositions/
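[Editor's note: for readers unfamiliar with the pragma mentioned above, SUnit reads a per-method `<timeout:>` pragma to decide how long a test may run before it is failed with a timeout. A sketch of its use; the method body here is hypothetical, only the pragma matters:]

```smalltalk
"Allow this single test method 60 seconds instead of the default
 timeout before SUnit aborts it with a TestFailure."
testAliasCompositions
	<timeout: 60>
	"hypothetical body -- the real test lives in TraitCompositionTest"
	self assert: self createdClassesAndTraits notEmpty
```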
On 11-01-2014, at 6:29 AM, Frank Shearar <[hidden email]> wrote:

> A bunch of the Trait tests fail, despite no one having worked on the
> Traits code in quite a while. The symptom is that the tests time out
> [1]. I've tried to debug some, only to find that they pass. For
> instance, <timeout: 60> sometimes "fixes" the tests.
>
> But the tests shouldn't be failing. Any takers?

I noticed similar issues when running tests on a Pi; being a relatively slow machine it takes longer to do some things and needed a larger timeout value.

If tests that used to run within the time are now taking longer but still pass if given enough time, that implies to me that either
a) the tests are being run on a slower machine due to some system configuration change (are other jobs also running and taking time?)
b) something has changed in the execution path to slow down the tested code. Which, obviously, ought not be the case.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Can't program his way out of a for-loop.
On 11 January 2014 19:19, tim Rowledge <[hidden email]> wrote:
> On 11-01-2014, at 6:29 AM, Frank Shearar <[hidden email]> wrote:
>
>> A bunch of the Trait tests fail, despite no one having worked on the
>> Traits code in quite a while. The symptom is that the tests time out
>> [1]. I've tried to debug some, only to find that they pass. For
>> instance, <timeout: 60> sometimes "fixes" the tests.
>>
>> But the tests shouldn't be failing. Any takers?
>
> I noticed similar issues when running test on a Pi; being a relatively slow machine it takes longer to do some things and needed a larger timeout value.
>
> If tests that used to run within the time are now taking longer but still pass if given enough time, that implies to me that either
> a) the tests are being run on a slower machine due to some system configuration change (are other jobs also running and taking time?)
> b) something has changed in the execution path to slow down the tested code. Which obviously, ought not be the case.

They time out on my beefy work laptop: a 2.7 GHz machine. That's probably a higher spec machine than the build server.

If I was a betting man I'd be inclined to put my money on (b).

frank
On 11 January 2014 22:55, Frank Shearar <[hidden email]> wrote:
> On 11 January 2014 19:19, tim Rowledge <[hidden email]> wrote:
>>
>> On 11-01-2014, at 6:29 AM, Frank Shearar <[hidden email]> wrote:
>>
>>> A bunch of the Trait tests fail, despite no one having worked on the
>>> Traits code in quite a while. The symptom is that the tests time out
>>> [1]. I've tried to debug some, only to find that they pass. For
>>> instance, <timeout: 60> sometimes "fixes" the tests.
>>>
>>> But the tests shouldn't be failing. Any takers?
>>
>> I noticed similar issues when running test on a Pi; being a relatively slow machine it takes longer to do some things and needed a larger timeout value.
>>
>> If tests that used to run within the time are now taking longer but still pass if given enough time, that implies to me that either
>> a) the tests are being run on a slower machine due to some system configuration change (are other jobs also running and taking time?)
>> b) something has changed in the execution path to slow down the tested code. Which obviously, ought not be the case.
>
> They time out on my beefy work laptop: a 2.7 GHz machine. That's
> probably a higher spec machine than the build server.
>
> If I was a betting man I'd be inclined to put my money on (b).

debug: #testAliasCompositions] on: TestFailure do: [#fail]], after adding <timeout: 60> to the test. The leaves are very interesting:

**Leaves**
43.1% {28650ms} CompiledMethod>>originalTraitMethod:
36.9% {24505ms} Text>>size
9.7% {6425ms} Delay>>wait
3.4% {2239ms} SecurityManager class>>default
1.1% {749ms} CompiledMethod>>checkOKToAdd:at:

_24 seconds_ in Text >> #size ?! Also, CompiledMethod>>originalTraitMethod: hasn't changed in 4 years. In fact the only things that have changed, that I can see, in the code path are Chris Muller's fixing of TestCase >> #debug and Nicolas Cellier's cleanup of the Compiler API. Nothing looks even remotely capable of slowing stuff down this dramatically.

frank

traitcompositiontest.txt (30K)
2014/1/13 Frank Shearar <[hidden email]>

> On 11 January 2014 22:55, Frank Shearar <[hidden email]> wrote:

I don't get the same figures here:

1st run:
**Leaves**
50.5% {3145ms} CompiledMethod>>originalTraitMethod:
18.1% {1125ms} Delay>>wait
11.8% {731ms} MultiByteFileStream>>nextPut:
5.1% {316ms} Trait(ClassDescription)>>organization
3.3% {206ms} MultiByteFileStream(StandardFileStream)>>open:forWrite:
1.7% {108ms} CompiledMethodWithNode>>selector
1.1% {67ms} ByteSymbol(Symbol)>>=

2nd run:
**Leaves**
50.5% {1991ms} CompiledMethod>>originalTraitMethod:
15.7% {617ms} MultiByteFileStream>>nextPut:
8.8% {346ms} Delay>>wait
4.6% {180ms} MultiByteFileStream(StandardFileStream)>>open:forWrite:
3.9% {153ms} Trait(ClassDescription)>>organization
2.3% {92ms} ByteSymbol(Symbol)>>=
1.5% {59ms} WeakArray class>>finalizationProcess
1.4% {54ms} CompiledMethodWithNode>>selector
1.1% {42ms} Array(SequenceableCollection)>>indexOf:startingAt:ifAbsent:

Hmm, great variations between the two runs? I also note the presence of AnObsoleteC1/2 as large consumers... Not that healthy.

With an AndreasSystemProfiler, here are the leaves:

1st run:
**Leaves**
57.1 (2,518) Array elementsForwardIdentityTo:
16.4 (722) StandardFileStream primWrite:from:startingAt:count:
6.5 (288) Symbol flushCache
5.7 (252) StandardFileStream primOpen:writable:

2nd run:
**Leaves**
64.8 (4,436) Array elementsForwardIdentityTo:
17.3 (1,183) StandardFileStream primWrite:from:startingAt:count:
4.6 (318) Symbol flushCache
4.1 (282) StandardFileStream primOpen:writable:

Though I failed to find the direct call to elementsForwardIdentityTo: in the send chain (probably thru becomeForward:)...
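[Editor's note: both kinds of profile above can be reproduced in-image. A sketch of how they were presumably invoked, assuming the profiled block is simply one run of the failing test:]

```smalltalk
"Bytecode-level profile: produces the percentage/millisecond
 **Leaves** listing quoted in this thread."
MessageTally spyOn:
	[(TraitCompositionTest selector: #testAliasCompositions) run].

"VM-supported profile: also attributes time spent inside
 primitives such as elementsForwardIdentityTo:."
AndreasSystemProfiler spyOn:
	[(TraitCompositionTest selector: #testAliasCompositions) run].
```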
I get a prim failed error on a MultiByteFileStream on Windows 8 running the Traits tests.

Cheers,
Karl

On Mon, Jan 13, 2014 at 10:15 PM, Nicolas Cellier <[hidden email]> wrote:
SqueakDebug.log (16K)
TraitsFileOutTest is failing.

Correct log attached this time.

Cheers,
Karl

On Mon, Jan 13, 2014 at 11:53 PM, karl ramberg <[hidden email]> wrote:
SqueakDebug.log (14K)
2014/1/14 karl ramberg <[hidden email]>
id: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
means the file was closed...

I have observed an increasing failure rate related to a randomly closed change log over the last few months (particularly when loading/merging with MC); it would be good to identify the root cause...
On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
> 2014/1/14 karl ramberg <[hidden email]>
>
> > TraitsFileOutTest is failing
> >
> > Correct log attached this time
> >
> > Cheers,
> > Karl
>
> id: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> means the file was closed...
>
> I observed increasing failure rate related to randomly closed change log
> the last few months (particularly when loading/merging with MC), it would
> be good to identify the root cause...

That sounds like exactly the problem that Frank was describing in another thread. He was ending up with a corrupt changes file IIRC, and the problems he described seemed to be related to primSize checks on a changes file that had been closed (this would presumably be a process trying to append to the changes file when some other process had closed the file stream).

Dave
On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:
> On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
>> 2014/1/14 karl ramberg <[hidden email]>
>>
>> > TraitsFileOutTest is failing
>> >
>> > Correct log attached this time
>> >
>> > Cheers,
>> > Karl
>>
>> id: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>> means the file was closed...
>>
>> I observed increasing failure rate related to randomly closed change log
>> the last few months (particularly when loading/merging with MC), it would
>> be good to identify the root cause...
>
> That sounds like exactly the problem that Frank was describing in another
> thread. He was ending up with a corrupt changes file IIRC, and the problems
> he described seemed to be related to primSize checks on a changes file that
> had been closed (this would presumably be a process trying to append to the
> changes file when some other process had closed the file stream).

I'm relieved it's not just me, but sad that it's not just me experiencing this. So my understanding is that changes are stored by accessing SourceFiles at: 2. If you want a readonly copy of sources or changes, you use CurrentReadOnlySourceFiles at: 2. I suppose the first step is verifying that nothing referencing these _directly_ does things like closing files (I'd be tempted to look at CROSF first.)

frank

> Dave
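[Editor's note: Frank's two access paths can be summarised in a short sketch. Variable names are the editor's; evaluate with care, since the first stream is the live, writable changes file shared by the whole image:]

```smalltalk
"The shared, writable stream onto the .changes file: everything
 that appends source code writes through this one."
changes := SourceFiles at: 2.

"A read-only view of the same file, suitable for reading source.
 Closing a read-only copy must never close the shared stream."
readOnlyChanges := CurrentReadOnlySourceFiles at: 2.
```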
On Tue, Jan 14, 2014 at 8:20 AM, Frank Shearar <[hidden email]> wrote:
I'm also using Windows, so it looks like the problem may be related to that somehow.
Karl
Hi Frank,
On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar <[hidden email]> wrote:

> On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:

forgive me for not replying earlier but I needed help to track this down. We suffered a very similar symptom a while ago at Cadence. It turns out that in our case the issue was the system running out of file handles because it left lots of files open. The cause was with Monticello package loading and the fix was to use CurrentReadOnlySourceFiles more. I'm attaching a changeset in the hope that this is the same issue and that our changeset can point you towards a fix.
HTH
--
best,
Eliot
filehandlelimits.1.cs (5K)
On 14 January 2014 19:10, Eliot Miranda <[hidden email]> wrote:
> Hi Frank,
>
> On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar <[hidden email]> wrote:
>>
>> On 13 January 2014 23:42, David T. Lewis <[hidden email]> wrote:
>> > On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
>> >> 2014/1/14 karl ramberg <[hidden email]>
>> >>
>> >> > TraitsFileOutTest is failing
>> >> >
>> >> > Correct log attached this time
>> >> >
>> >> > Cheers,
>> >> > Karl
>> >>
>> >> id: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>> >> means the file was closed...
>> >>
>> >> I observed increasing failure rate related to randomly closed change log
>> >> the last few months (particularly when loading/merging with MC), it would
>> >> be good to identify the root cause...
>> >
>> > That sounds like exactly the problem that Frank was describing in another
>> > thread. He was ending up with a corrupt changes file IIRC, and the problems
>> > he described seemed to be related to primSize checks on a changes file that
>> > had been closed (this would presumably be a process trying to append to the
>> > changes file when some other process had closed the file stream).
>>
>> I'm relieved it's not just me, but sad that it's not just me
>> experiencing this. So my understanding is that changes are stored by
>> accessing SourceFiles at: 2. If you want a readonly copy of sources or
>> changes, you use CurrentReadOnlySourceFiles at: 2. I suppose the first
>> step is verifying that nothing referencing these _directly_ does
>> things like closing files (I'd be tempted to look at CROSF first.)
>
> forgive me for not replying earlier but I needed help to track this down.
> We suffered a very similar symptom a while ago at Cadence. It turns out
> that in our case the issue was the system running out of file handles
> because it left lots of fils open. The cause was with Monticello package
> loading and the fix was to use CurrentReadOnlySourceFiles more. I'm
> attaching a changeset in the hope that this is the same issue and that our
> changeset can point you towards a fix.

Thanks, Eliot. From what I can make out, the MCPackage >> #snapshot in this filein differs from trunk's only in the "reject: [:ea | [ea methodClass language isNewspeakLanguage3]ifError:[false]" guard. So the significant part of the changeset is PackageInfo >> #changeRecordForOverriddenMethod:, which I've submitted to the Inbox as an mcz so we can all have a look. Thanks!

frank

> HTH
> --
> best,
> Eliot
2014/1/15 Frank Shearar <[hidden email]>
Yeah, those are the problems I most often encounter with changes files, and have for a long time: 'no such file or directory' when opening the changes failed.

I agree, most probably a simple problem of limited resources (like too many readOnlyCopy files opened but not yet closed/reclaimed). Eliot, isn't it your fault? The faster the VM, the sooner we hit the resource limits.

But I also managed to get a corrupt change log recently, and could not trace it... I suspect that closing the original changes is a bit different from starving of new readOnlyCopy, but who knows...
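[Editor's note: if handle starvation is indeed the culprit, one defensive pattern is to close each extra read-only copy deterministically with #ensure:, rather than waiting for finalization to reclaim it. A sketch; the read inside the block is illustrative only:]

```smalltalk
"Scope a readOnlyCopy of the changes file so its OS file handle
 is released immediately, not at some later finalization cycle."
| readOnly |
readOnly := (SourceFiles at: 2) readOnlyCopy.
[readOnly position: 0.
 readOnly next: 80] ensure: [readOnly close]
```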
I don't think the issue is file handles. I tested a new trunk image and got the prim failed. I tested another image I have, with literally hundreds (578) of sub-instances of streams, and got the error.

It's just the Traits tests that show the error.

Cheers,
Karl

On Tue, Jan 14, 2014 at 8:10 PM, Eliot Miranda <[hidden email]> wrote:
I think I found a clue. I do not get a prim failed error when I revert the method Environment>>forgetClass:logged: to a version earlier than 12/31/2013.

Cheers,
Karl

On Wed, Jan 15, 2014 at 9:18 PM, karl ramberg <[hidden email]> wrote:
I found the source of the MultiByteFileStream(Object)>>primitiveFailed: problem on my system. An antivirus program was checking the file while it was in use by Squeak. I googled a little and found this clue:
All Traits tests pass now, and much quicker than before :-)
Cheers,
Karl

On Wed, Jan 15, 2014 at 10:24 PM, karl ramberg <[hidden email]> wrote: