CI status


Re: CI status

Tobias Pape
 

> On 28.10.2020, at 17:23, Eliot Miranda <[hidden email]> wrote:
>
>
> Hi Tobi, Hi Levente,
>
>> On Oct 28, 2020, at 7:06 AM, Tobias Pape <[hidden email]> wrote:
>>
>> 
>> Hi
>>
>>> Hi Tim,
>>> what is the result on that box with threaded heartbeat VM of:
>>> [SUnitToolBuilderTests new testHandlingNotification] timeToRun.
>>> If it is more than 5000 ms, then you would confirm the problem that I encounter.
>>> This testHandlingNotification repeats a loop 10 times, each iteration waiting on a 200 ms delay.
>>> The problem I've got with the threaded heartbeat VM is that the first 5 delays run normally, but the next 5 last for 1 second instead of 200 ms...
>>> This is probably what happens on the CI server too (the test times out and CI fails).
>>
>> Also note that the CI test builds both VMs but uses the last one built (because it overwrites the first), and that happens to be the itimer one, not the threaded one.
>>
>> The code above runs in about 2000 ms on my machine (~2015 vintage, running Ubuntu 18.04).
>>
>> Things that happened:
>> - I just ran the test suite in the DEBUG itimer headful and headless variant and it passes.
>> - I just ran the test suite in the DEBUG threaded headful and headless variant and it passes.
>> - I ran the RELEASE itimer headful and headless variant and it passes
>> - I ran the RELEASE threaded headless variant and it FAILED as on the CI
>> - I ran the RELEASE threaded headful variant and it FAILED LESS
>>    I mean: testHandlingNotification passed, and so did testValueWithinTimingBasic and testValueWithinTimingNestedInner,
>>    but testValueWithinTimingNestedOuter and testValueWithinTimingRepeat still fail!
>>
>> So there are discrepancies between debug and release, and between headful and headless (at least for the threaded release).
>>
>> TL;DR: The Linux x86_32 Cog V3 threaded release VM has a timing problem ...
>>
>> Does that help anyone?
>
If you add code to extract the number of ioProcessEvents calls etc. (see the VM parameters tab in About Squeak for the relevant info) does that tell us more?  IIRC one available vm parameter is the number of heartbeats.  So we should be able to see if it is the heartbeat itself that is failing or if it is further upstream.
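For reference, a minimal sketch of sampling such counters from the image; the parameter indices 57-59 and their labels are taken from the instrumented test later in this thread, so treat them as assumptions rather than a definitive list:

| counters |
counters := { 57 -> '# ioProcessEvents'. 58 -> '# forceInterruptCheck'. 59 -> '# check event calls' }.
counters do: [:each |
    Transcript cr; nextPutAll: each value; nextPutAll: ': '; print: (Smalltalk vmParameterAt: each key)].
Transcript endEntry.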

ping me tomorrow, my linux machine is already switched off…
-t

>
>>
>> Best regards
>> -Tobias
>>
>> BTW: Eliot, the VM spits out an "aioDisable: epoll_ctl: Bad file descriptor". Is that expected?
>
> It’s not expected, but may be harmless.  Levente, is this noise?  Or is it that the input file descriptor is being shut down?  Presumably this is just noise to do with running headless.



Re: CI status

Levente Uzonyi
In reply to this post by Tobias Pape
 
Hi Tobias,

On Wed, 28 Oct 2020, Tobias Pape wrote:

> BTW: Eliot, the VM spits out a "aioDisable: epoll_ctl: Bad file descriptor". Is that expected?

Yes, it is expected. See "Known issue" at
https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/528 .


Levente

Re: CI status

Tobias Pape
 

> On 28.10.2020, at 21:40, Levente Uzonyi <[hidden email]> wrote:
>
> Hi Tobias,
>
> On Wed, 28 Oct 2020, Tobias Pape wrote:
>
>> BTW: Eliot, the VM spits out a "aioDisable: epoll_ctl: Bad file descriptor". Is that expected?
>
> Yes, it is expected. See "Known issue" at
> https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/528 .
>

My bad, I didn't read properly :D
-t

Re: CI status

Nicolas Cellier
In reply to this post by Eliot Miranda-2
 


On Wed, Oct 28, 2020 at 5:23 PM, Eliot Miranda <[hidden email]> wrote:
 
Hi Tobi, Hi Levente,

If you add code to extract the number of ioProcessEvents calls etc. (see the VM parameters tab in About Squeak for the relevant info) does that tell us more?  IIRC one available vm parameter is the number of heartbeats.  So we should be able to see if it is the heartbeat itself that is failing or if it is further upstream.


Hi Eliot,
I instrumented the code like that:

testHandlingNotification
    | receivedSignal resumed |
    receivedSignal := resumed := false.
    [ | count |
    "client-code puts up progress, and signals some notications"
    Transcript cr; nextPutAll: 'test begins:'; print: Time utcMicrosecondClock \\ 3600000000.
    count := 0.
    'doing something'
        displayProgressFrom: 0
        to: 10
        during:
            [ : bar | 10 timesRepeat:
                [ bar value: (count := count + 1).
                Transcript cr; nextPutAll: 'count:'; print: count.
                Transcript cr; nextPutAll: 'before wait:'; print: Time utcMicrosecondClock \\ 3600000000.
                (Delay forMilliseconds: 200) wait.
                Transcript cr; nextPutAll: 'after wait:'; print: Time utcMicrosecondClock \\ 3600000000.
                Notification signal: 'message'.
                Transcript cr; nextPutAll: 'after notified:'; print: Time utcMicrosecondClock \\ 3600000000.
                Transcript cr; nextPutAll: '# ioProcessEvents: '; print: (Smalltalk vmParameterAt: 57).
                Transcript cr; nextPutAll: '# forceInterruptCheck: '; print: (Smalltalk vmParameterAt: 58).
                Transcript cr; nextPutAll: '# check event calls: '; print: (Smalltalk vmParameterAt: 59).
                resumed := true ] ] ]
        on: Notification
        do:
            [ : noti | receivedSignal := true.
            Transcript cr; nextPutAll: 'notified at:'; print: Time utcMicrosecondClock \\ 3600000000.
            noti resume ].
    Transcript endEntry.
    self
         assert: receivedSignal ;
         assert: resumed

The result I obtain with an updated trunk image and a 64-bit threaded VM:
build.linux64x64/squeak.cog.spur/build$ ./squeak ../../../image/trunk6-64.image

test begins:80631205
notified at:80631251
count:1
before wait:80633324
after wait:80834614
after notified:80834670
# ioProcessEvents: 94
# forceInterruptCheck: 3368
# check event calls: 815
count:2
before wait:80835008
after wait:81036398
after notified:81036454
# ioProcessEvents: 103
# forceInterruptCheck: 3459
# check event calls: 906
count:3
before wait:81036755
after wait:81238355
after notified:81238374
# ioProcessEvents: 113
# forceInterruptCheck: 3549
# check event calls: 996
count:4
before wait:81238633
after wait:81440889
after notified:81440909
# ioProcessEvents: 123
# forceInterruptCheck: 3639
# check event calls: 1086
count:5
before wait:81441151
after wait:82609021
after notified:82609053
# ioProcessEvents: 132
# forceInterruptCheck: 4119
# check event calls: 1161
count:6
before wait:82609520
after wait:83611465
after notified:83611497
# ioProcessEvents: 133
# forceInterruptCheck: 4536
# check event calls: 1162
count:7
before wait:83611974
after wait:84614429
after notified:84614462
# ioProcessEvents: 134
# forceInterruptCheck: 4961
# check event calls: 1165
count:8
before wait:84614926
after wait:85617377
after notified:85617404
# ioProcessEvents: 135
# forceInterruptCheck: 5368
# check event calls: 1166
count:9
before wait:85617840
after wait:86618396
after notified:86618428
# ioProcessEvents: 136
# forceInterruptCheck: 5776
# check event calls: 1167
count:10
before wait:86618896
after wait:87620232
after notified:87620291
# ioProcessEvents: 137
# forceInterruptCheck: 6196
# check event calls: 1168

Notice roughly 10 ioProcessEvents calls and a ~200 ms wait per loop on loops 1-5.
Then only a single ioProcessEvents call and a ~1000 ms wait per loop on loops 6-10 (and only a single check-event call too).
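A variant of the instrumentation that logs per-loop deltas of the same counters, which is the comparison being made by hand above. This is only a sketch, reusing parameter indices 57-59 from the test; the standalone loop below is not part of the original test:

| previous |
previous := nil.
10 timesRepeat: [
    | current |
    (Delay forMilliseconds: 200) wait.
    current := (57 to: 59) collect: [:i | Smalltalk vmParameterAt: i].
    previous ifNotNil: [
        Transcript cr; nextPutAll: 'deltas (ioProcessEvents, forceInterruptCheck, check events): ';
            print: (current with: previous collect: [:a :b | a - b])].
    previous := current].
Transcript endEntry.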

Re: CI status

Nicolas Cellier
 


On the itimer heartbeat VM, by contrast, each loop takes about 200 ms, with roughly 10 ioProcessEvents calls, 100 forceInterruptChecks and 100 check event calls per loop.
This is consistent with the 2 ms heartbeat (vmParameter #26).
linux64x64/squeak.cog.spur/build.itimerheartbeat$ ./squeak ../../../image/trunk6-64.image

test begins:1013906366
notified at:1013906407
count:1
before wait:1013910362
after wait:1014112028
after notified:1014112085
# ioProcessEvents: 208
# forceInterruptCheck: 2299
# check event calls: 2236
count:2
before wait:1014112443
after wait:1014314011
after notified:1014314062
# ioProcessEvents: 218
# forceInterruptCheck: 2400
# check event calls: 2337
count:3
before wait:1014314405
after wait:1014516017
after notified:1014516052
# ioProcessEvents: 228
# forceInterruptCheck: 2501
# check event calls: 2438
count:4
before wait:1014516326
after wait:1014718011
after notified:1014718039
# ioProcessEvents: 237
# forceInterruptCheck: 2602
# check event calls: 2539
count:5
before wait:1014718296
after wait:1014920490
after notified:1014920521
# ioProcessEvents: 247
# forceInterruptCheck: 2703
# check event calls: 2640
count:6
before wait:1014920967
after wait:1015122184
after notified:1015122215
# ioProcessEvents: 256
# forceInterruptCheck: 2804
# check event calls: 2741
count:7
before wait:1015122608
after wait:1015324166
after notified:1015324198
# ioProcessEvents: 266
# forceInterruptCheck: 2905
# check event calls: 2841
count:8
before wait:1015324595
after wait:1015526278
after notified:1015526299
# ioProcessEvents: 275
# forceInterruptCheck: 3006
# check event calls: 2942
count:9
before wait:1015526556
after wait:1015728488
after notified:1015728518
# ioProcessEvents: 285
# forceInterruptCheck: 3107
# check event calls: 3042
count:10
before wait:1015728947
after wait:1015930499
after notified:1015930530
# ioProcessEvents: 295
# forceInterruptCheck: 3208
# check event calls: 3143
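A back-of-the-envelope check of that consistency claim, as a sketch: 2299 and 2400 are the forceInterruptCheck counts from loops 1 and 2 of the itimer run above, and index 26 is the heartbeat-period parameter mentioned elsewhere in this thread.

| checksPerLoop delayMs |
checksPerLoop := 2400 - 2299.  "forceInterruptCheck delta between loops 1 and 2 above"
delayMs := 200.  "the Delay used by the test"
Transcript cr; nextPutAll: 'ms per interrupt check: '; print: (delayMs / checksPerLoop) asFloat.
Transcript cr; nextPutAll: 'heartbeat period (ms): '; print: (Smalltalk vmParameterAt: 26).
Transcript endEntry.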

Re: CI status

timrowledge
In reply to this post by Nicolas Cellier
 


> On 2020-10-27, at 3:39 PM, Nicolas Cellier <[hidden email]> wrote:
>
>  [SUnitToolBuilderTests new testHandlingNotification] timeToRun.


Hmm. I consistently get 2020 ms, despite the previously mentioned problem with the stupid RTPRIO stuff.

2030 ms on a Pi 4, with no rtprio problems.

Do any of the VM Parameters in the 'About Squeak' window tell us anything useful here? The x64 box says
#26  2 heartbeat period
for example.
Again, on the Pi:
#26 2 heartbeat period
...so no difference there.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Foolproof operation:  All parameters are hard coded.



Re: CI status

Eliot Miranda-2
In reply to this post by Nicolas Cellier
 
Hi Nicolas,

On Wed, Oct 28, 2020 at 2:21 PM Nicolas Cellier <[hidden email]> wrote:
 



Notice roughly 10 ioProcessEvents calls and a ~200 ms wait per loop on loops 1-5, then only a single ioProcessEvents call and a ~1000 ms wait per loop on loops 6-10 (and only a single check-event call too).

On the itimer heartbeat VM, by contrast, each loop takes about 200 ms, with roughly 10 ioProcessEvents calls, 100 forceInterruptChecks and 100 check event calls per loop.
This is consistent with the 2 ms heartbeat (vmParameter #26).

Thanks, that's great.  So the problem is nothing to do with the heartbeat.  It seems to be working perfectly OK, right?
_,,,^..^,,,_
best, Eliot

Re: CI status

Nicolas Cellier
 
Hi Eliot,

On Wed, Oct 28, 2020 at 11:56 PM, Eliot Miranda <[hidden email]> wrote:
 
Hi Nicolas,


Thanks, that's great.  So the problem is nothing to do with the heartbeat.  It seems to be working perfectly OK, right?
_,,,^..^,,,_
best, Eliot
Well, the itimer flavour works fine, but the threaded one not so much. Why is there a 1000 ms wait per loop instead of 200 ms, and only a single check-for-event and ioProcessEvents call per loop instead of 10, starting at loop #6?
Or do you mean that the forceInterruptCheck count remains correct? (About 90 interrupts per 200 ms, then 420 interrupts per 1000 ms, compared to the more regular 100 interrupts per 200 ms in the itimer flavour.)

Re: CI status

David T. Lewis
In reply to this post by Levente Uzonyi
 
Hi Levente,

Sorry for my delay in replying, these tips are very helpful.


On Tue, Oct 27, 2020 at 12:59:55AM +0100, Levente Uzonyi wrote:

>
> Hi David
>
> On Mon, 26 Oct 2020, David T. Lewis wrote:
>
> >I do not know if it is related to the CI issues, but here is one clue:
> >
> >I am maintaining a V3 "trunk" image that attempts to keep in sync with
> >trunk except for changes that pertain to Spur object format (immediate
> >characters, etc). As of Monticello-ul.728 I get failures in SSL connection
> >to e.g. source.squeak.org. The failures happen with both Cog and the
> >interpreter VM (although the specific error symptoms are different).
> >
> >I can bypass the errors by reverting MCHttpRepository>>httpGet:arguments:
> >to its prior version (stamped ul 9/20/2019).
>
> That version adds some rewrite rules, so some http urls are converted to https.
> If that doesn't work, then there is a problem with the SqueakSSL plugin in your VM. Does the following print true?
>
> | response |
> response := WebClient httpGet: 'https://source.squeak.org'.
> response code = 200
>

The plugin is out of date in the interpreter VM, and I get a certificate
mismatch that is almost certainly related to the plugin.

With a Cog VM, I get a primitive failure in SqueakSSL>>primitiveSSLCreate.

So in both cases it is presumably a plugin issue.

>
> Instead of reverting #httpGet:arguments: you can remove the individual
> http->https rewrite rules with the following snippet:
>
> MCHttpRepository urlRewriteRules in: [ :rules |
>     rules size // 3 timesRepeat: [
>         | set |
>         set := rules removeFirst: 3.
>         (set second beginsWith: 'https') ifFalse: [
>             rules addAll: set ] ] ]

Thanks for explaining, I understand it now.

>
> But it would be better to use https for those repositories.
>

Yes for sure.

Thank you!

Dave

>
> Levente
>
> >
> >I have not had time to dig into the underlying cause of the problem,
> >but it seems likely that any CI test for Cog that requires the use
> >of secure sockets will fail as of Monticello-ul.728 or later. This
> >would include any test that needs to access the source.squeak.org
> >repository.
> >
> >Dave
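For anyone curious which rules Levente's snippet operates on, a small sketch to list them before removal; it assumes, as that snippet implies, that the rules are stored as a flat sequence of triplets:

| rules |
rules := MCHttpRepository urlRewriteRules.
1 to: rules size by: 3 do: [:i |
    Transcript cr; print: (rules copyFrom: i to: i + 2)].
Transcript endEntry.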

Re: CI status

David T. Lewis
 
Hi Levente,

One final follow-up: the reason my Cog VM failed on primitiveSSLCreate was simply that I did not have 32-bit libssl libraries installed on my system.

Sorry for the noise,

Dave

