Image crashing on startup, apparently during GC


Image crashing on startup, apparently during GC

Damien Pollet
 
Hi, I have a Pharo image that crashes the VM on startup. The crash report below seems to implicate the GC. Should I make it available somewhere online? What's most convenient?

Process:               Pharo [64892]
Path:                  /Users/USER/*/Pharo.app/Contents/MacOS/Pharo
Identifier:            org.pharo.Pharo
Version:               5.0.201708271955 (5.0.201708271955)
Code Type:             X86-64 (Native)
Parent Process:        ??? [64888]
Responsible:           Pharo [64892]
User ID:               501

Date/Time:             2018-03-19 20:27:03.906 +0100
OS Version:            Mac OS X 10.13.3 (17D102)
Report Version:        12
Anonymous UUID:        6D022236-78DD-6676-117F-EADA56D5D1BE

Sleep/Wake UUID:       AC9E4E55-3CDB-4B4F-A6B4-51ACB1177154

Time Awake Since Boot: 28000 seconds
Time Since Wake:       6500 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGABRT)
Exception Codes:       KERN_INVALID_ADDRESS at 0x000000012b67c0b0
Exception Note:        EXC_CORPSE_NOTIFY

VM Regions Near 0x12b67c0b0:
   VM_ALLOCATE            000000011adfc000-00000001259fc000 [172.0M] rw-/rwx SM=PRV  
-->
   STACK GUARD            0000700005613000-0000700005614000 [    4K] ---/rwx SM=NUL  stack guard for thread 1

Application Specific Information:
abort() called

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib         0x00007fff53fbde3e __pthread_kill + 10
1   libsystem_pthread.dylib        0x00007fff540fc150 pthread_kill + 333
2   libsystem_c.dylib              0x00007fff53f1a312 abort + 127
3   org.pharo.Pharo                0x0000000104ed8997 sigsegv + 190
4   libsystem_platform.dylib       0x00007fff540eff5a _sigtramp + 26
5   ???                            000000000000000000 0 + 0
6   org.pharo.Pharo                0x0000000104e73558 markObjects + 464
7   org.pharo.Pharo                0x0000000104e72d40 fullGC + 72
8   org.pharo.Pharo                0x0000000104e92dea primitiveFullGC + 45
9   org.pharo.Pharo                0x0000000104e52425 interpret + 26715
10  org.pharo.Pharo                0x0000000104e5c7f6 enterSmalltalkExecutiveImplementation + 152
11  org.pharo.Pharo                0x0000000104e4be6c interpret + 674
12  org.pharo.Pharo                0x0000000104ed9cc1 -[sqSqueakMainApplication runSqueak] + 394
13  com.apple.Foundation           0x00007fff2e6d696c __NSFirePerformWithOrder + 360
14  com.apple.CoreFoundation       0x00007fff2c579127 __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 23
15  com.apple.CoreFoundation       0x00007fff2c57904f __CFRunLoopDoObservers + 527
16  com.apple.CoreFoundation       0x00007fff2c55b6a8 __CFRunLoopRun + 1240
17  com.apple.CoreFoundation       0x00007fff2c55af43 CFRunLoopRunSpecific + 483
18  com.apple.HIToolbox            0x00007fff2b872e26 RunCurrentEventLoopInMode + 286
19  com.apple.HIToolbox            0x00007fff2b872a9f ReceiveNextEventCommon + 366
20  com.apple.HIToolbox            0x00007fff2b872914 _BlockUntilNextEventMatchingListInModeWithFilter + 64
21  com.apple.AppKit               0x00007fff29b3df5f _DPSNextEvent + 2085
22  com.apple.AppKit               0x00007fff2a2d3b4c -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 3044
23  com.apple.AppKit               0x00007fff29b32d6d -[NSApplication run] + 764
24  com.apple.AppKit               0x00007fff29b01f1a NSApplicationMain + 804

--
Damien Pollet
type less, do more [ | ] http://people.untyped.org/damien.pollet
Reply | Threaded
Open this post in threaded view
|

Re: Image crashing on startup, apparently during GC

Eliot Miranda-2
 
Hi Damien,

On Tue, Mar 20, 2018 at 3:12 AM, Damien Pollet <[hidden email]> wrote:
 
Hi, I have a Pharo image that crashes the VM on startup. The crash report below seems to incriminate GC. Should I make it available somewhere online? What's most convenient?

Don't care.  Anywhere it can be downloaded from.  Also, try running with -leakcheck 15, preferably in an assert VM and see if that gets you additional information.
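As a concrete sketch of that suggestion: the leak-check flag goes on the VM command line before the image name. The bundle path and image name below are assumptions for illustration, and the double-dash spelling follows Damien's later usage in this thread.

```shell
# Sketch only: the VM path and image name are assumptions, not from this thread.
VM="Pharo.app/Contents/MacOS/Pharo"
IMAGE="my.image"
if [ -x "$VM" ]; then
  # Run with leak checking enabled, as suggested above.
  "$VM" --leakcheck 15 "$IMAGE"
else
  msg="VM not found at $VM"
  echo "$msg"
fi
```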
 

[quoted crash report and signature snipped]




--
_,,,^..^,,,_
best, Eliot

Re: Image crashing on startup, apparently during GC

Damien Pollet
 
Here are the files (image and various outputs). Running with --leakcheck does report a few object leaks (see output.txt); I'm not sure where to get, or how to build, an assert VM.
I also realized the VM I had was from last summer (the one that comes with a Pharo 7.0 image via zeroconf). The output files I include were produced by the VM from get.pharo.org/64/vmLatest70


On 20 March 2018 at 18:17, Eliot Miranda <[hidden email]> wrote:
 
Hi Damien,

On Tue, Mar 20, 2018 at 3:12 AM, Damien Pollet <[hidden email]> wrote:
 
Hi, I have a Pharo image that crashes the VM on startup. The crash report below seems to incriminate GC. Should I make it available somewhere online? What's most convenient?

Don't care.  Anywhere it can be downloaded from.  Also, try running with -leakcheck 15, preferably in an assert VM and see if that gets you additional information.
 

[quoted crash report and signatures snipped]




--
Damien Pollet
type less, do more [ | ] http://people.untyped.org/damien.pollet

Re: Image crashing on startup, apparently during GC

Eliot Miranda-2
 
Hi Damien,

On Fri, Mar 23, 2018 at 12:38 PM, Damien Pollet <[hidden email]> wrote:
 
Here are the files (image and various outputs). Running with --leakcheck does mention a few object leaks (see output.txt);

Indeed the image is corrupt at start-up.  See below.
 
I'm not sure where to get or how to build an assert VM.

When you build a Pharo VM under build.macos64x64/pharo.cog.spur using the mvm script (mvm -A), you produce an assert VM in PharoAssert.app.
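A minimal sketch of those build steps, assuming an opensmalltalk-vm checkout already present in ./oscogvm (the checkout location is an assumption; the directory and script names come from the message above):

```shell
# Sketch: assumes the opensmalltalk-vm sources are already checked out in ./oscogvm.
BUILD_DIR="oscogvm/build.macos64x64/pharo.cog.spur"
if [ -d "$BUILD_DIR" ]; then
  # -A builds the assert VM, producing PharoAssert.app in this directory.
  (cd "$BUILD_DIR" && ./mvm -A)
else
  msg="build directory not found: $BUILD_DIR"
  echo "$msg"
fi
```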
 
I also realized the VM I had was from this summer (the one that comes with a 70 image with zeroconf). The output files I include were produced by the VM at get.pharo.org/64/vmLatest70


On 20 March 2018 at 18:17, Eliot Miranda <[hidden email]> wrote:
 
Hi Damien,

On Tue, Mar 20, 2018 at 3:12 AM, Damien Pollet <[hidden email]> wrote:
 
Hi, I have a Pharo image that crashes the VM on startup. The crash report below seems to incriminate GC. Should I make it available somewhere online? What's most convenient?

Don't care.  Anywhere it can be downloaded from.  Also, try running with -leakcheck 15, preferably in an assert VM and see if that gets you additional information.
 

Process:               Pharo [64892]
Path:                  /Users/USER/*/Pharo.app/Contents/MacOS/Pharo
Identifier:            org.pharo.Pharo
Version:               5.0.201708271955 (5.0.201708271955)

Right.  This VM is prior to the bug fixes in VMMaker.oscog-eem.2320:

Spur:
Fix a bad bug in SpurPlanningCompactor.  unmarkObjectsFromFirstFreeObject, used when the compactor requires more than one pass due to insufficient savedFirstFieldsSpace, expects the corpse of a moved object to be unmarked, but copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.  Unmarking the corpse before the copy unmarks both.  This fixes a crash with ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring: creates lots of files, enough to push the system into the multi-pass regime.


Pharo urgently needs to upgrade the VM to one more recent than 2017-08-27 (in fact, more up-to-date than opensmalltalk/vm commit 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018 -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image corruption in large images, and can occur (as it has here) at start-up, causing one's work to be irretrievably lost.
 
[rest of quoted crash report and signatures snipped]




--
_,,,^..^,,,_
best, Eliot

Re: Image crashing on startup, apparently during GC

Eliot Miranda-2
 
Hi Damien,

On Fri, Mar 23, 2018 at 1:52 PM, Eliot Miranda <[hidden email]> wrote:
Hi Damien,

On Fri, Mar 23, 2018 at 12:38 PM, Damien Pollet <[hidden email]> wrote:
 
Here are the files (image and various outputs). Running with --leakcheck does mention a few object leaks (see output.txt);

Indeed the image is corrupt at start-up.  See below.
 
I'm not sure where to get or how to build an assert VM.

When you build a Pharo VM under build.macos64x64/pharo.cog.spur using the mvm script (mvm -A), you produce an assert VM in PharoAssert.app.

Note that if you build an assert VM, you will be able to manually patch the image in lldb so that you can rescue it. It looks like this:

$ lldb PharoAssert.app/Contents/MacOS/Pharo

(lldb) target create "/Users/eliot/oscogvm/build.macos64x64/pharo.cog.spur/PharoAssert.app/Contents/MacOS/Pharo"
Current executable set to '/Users/eliot/oscogvm/build.macos64x64/pharo.cog.spur/PharoAssert.app/Contents/MacOS/Pharo' (x86_64).
(lldb) settings set -- target.run-args  "clap_broken.d9e5daa.image"
(lldb) b warning
Breakpoint 1: 3 locations.
(lldb) run --leakcheck 31 clap_broken.d9e5daa.image
Process 31569 launched: '/Users/eliot/oscogvm/build.macos64x64/pharo.cog.spur/PharoAssert.app/Contents/MacOS/Pharo' (x86_64)
object leak in        0x10f919658 @ 0 =        0x122216538
object leak in        0x10fbb3448 @ 0 =        0x122216760
object leak in        0x10fbb3480 @ 0 =        0x1222166a8
object leak in        0x10ff384f0 @ 0 =        0x122d480b0
object leak in        0x10ff38518 @ 0 =        0x122d480b0
object leak in        0x10ff385d0 @ 0 =        0x122d480b0
Process 31569 stopped
* thread #1: tid = 0x5b6d56, 0x0000000100001a83 Pharo`warning(s="checkHeapIntegrityclassIndicesShouldBeValid(0, 1) 57196") + 19 at gcc3x-cointerp.c:44, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100001a83 Pharo`warning(s="checkHeapIntegrityclassIndicesShouldBeValid(0, 1) 57196") + 19 at gcc3x-cointerp.c:44
   41   sqInt warnpid, erroronwarn;
   42   void
   43   warning(char *s) { /* Print an error message but don't necessarily exit. */
-> 44   if (erroronwarn) error(s);
   45   if (warnpid)
   46   printf("\n%s pid %ld\n", s, (long)warnpid);
   47   else
(lldb) call storePointerUncheckedofObjectwithValue(0,0x10f919658,nilObj)
(sqInt) $0 = 4478138592
(lldb) call storePointerUncheckedofObjectwithValue(0,0x10fbb3448,nilObj)
(sqInt) $1 = 4478138592
(lldb) call storePointerUncheckedofObjectwithValue(0,0x10fbb3480,nilObj)
(sqInt) $2 = 4478138592
(lldb) call storePointerUncheckedofObjectwithValue(0,0x10ff384f0,nilObj)
(sqInt) $3 = 4478138592
(lldb) call storePointerUncheckedofObjectwithValue(0,0x10ff38518,nilObj)
(sqInt) $4 = 4478138592
(lldb) call storePointerUncheckedofObjectwithValue(0,0x10ff385d0,nilObj)
(sqInt) $5 = 4478138592
(lldb) expr checkForLeaks = 0
(sqInt) $0 = 0
(lldb) c
 

and then save the image.
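The interactive session above could be started non-interactively along these lines. This is a sketch: the breakpoint name and --leakcheck value come from the transcript, the paths are assumptions, and the storePointerUncheckedofObjectwithValue patch-up calls still have to be issued by hand once the breakpoint fires.

```shell
# Sketch: launch the assert VM under lldb with the warning breakpoint pre-set.
VM="PharoAssert.app/Contents/MacOS/Pharo"
IMAGE="clap_broken.d9e5daa.image"
if [ -x "$VM" ]; then
  # -o runs one lldb command per flag, in order, before handing over control.
  lldb "$VM" \
    -o 'breakpoint set --name warning' \
    -o "run --leakcheck 31 $IMAGE"
else
  msg="assert VM not found at $VM; build it first with mvm -A"
  echo "$msg"
fi
```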

 
[quoted earlier messages and crash report snipped]



--
_,,,^..^,,,_
best, Eliot

Re: [Pharo-dev] Image crashing on startup, apparently during GC

EstebanLM
In reply to this post by Eliot Miranda-2
 
hi,

> On 24 Mar 2018, at 09:50, Cyril Ferlicot D. <[hidden email]> wrote:
>
> On 23/03/2018 at 21:52, Eliot Miranda wrote:
>> Hi Damien,
>>
>> Indeed the image is corrupt at start-up.  See below.
>>  
>>
>> Right.  This VM is prior to the bug fixes in VMMaker.oscog-eem.2320:
>>
>> Spur:
>> Fix a bad bug in SpurPlnningCompactor.
>>  unmarkObjectsFromFirstFreeObject, used when the compactor requires more
>> than one pass due to insufficient savedFirstFieldsSpace, expects the
>> corpse of a moved object to be unmarked, but
>> copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.
>> Unmarking the corpse before the copy unmarks both.  This fixes a crash
>> with ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring:
>> creates lots of files, enough to push the system into the multi-pass regime.
>>
>>
>> Pharo urgently needs to upgrade the VM to one more up to date than 2017
>> 08 27 (in fact more up-to-date than opensmalltalk/vm commit
>> 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018
>> -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image
>> corruption in large images, and can occur (as it has here) at start-up,
>> causing one's work to be irretrievably lost.
>>  
>
> Hi Eliot,
>
> I think that there is a lot of people who would like to get a newer
> stable vm for Pharo 6.1 and 7. The problem is that it is hard to know
> which VM are stable enough to be promoted as stable.
>
> Some weeks ago Esteban tried to promote a VM as stable and he had to
> revert it the same day because a regression occurred in the VM.
>
> If you're able to tell us which vms are stable in those present at
> http://files.pharo.org/vm/pharo-spur32/ and
> http://files.pharo.org/vm/pharo-spur64/ it would be a great help.
>
> Even better would be for the pharo community to have a way to know which
> vms are stable or not without having to ask you.

there is no “stable” branch in Cog, and that's a problem.
The “released” versions (the ones you can find as stable) are not working for Pharo :(

I tried to promote the versions from the end of February, and they crashed.

Next week I will try again; maybe by now they are stable enough. One thing is true: the versions that we consider stable (from October 2017) have problems that are already solved in the latest builds.

Esteban




Re: [Pharo-dev] Image crashing on startup, apparently during GC

Eliot Miranda-2
In reply to this post by Eliot Miranda-2
 
Hi Cyril,

On Sat, Mar 24, 2018 at 1:50 AM, Cyril Ferlicot D. <[hidden email]> wrote:
[quoted earlier message snipped]
Hi Eliot,

I think there are a lot of people who would like to get a newer
stable VM for Pharo 6.1 and 7. The problem is that it is hard to know
which VMs are stable enough to be promoted as stable.

Some weeks ago Esteban tried to promote a VM as stable and he had to
revert it the same day because a regression occurred in the VM.

If you're able to tell us which of the VMs present at
http://files.pharo.org/vm/pharo-spur32/ and
http://files.pharo.org/vm/pharo-spur64/ are stable, it would be a great help.

Even better would be for the Pharo community to have a way to know which
VMs are stable or not without having to ask you.

The way to do that is to run tests on the CI infrastructure and mark VMs that pass the tests as stable. That's what happens on Travis with the Cog VMs built there.
 




--
_,,,^..^,,,_
best, Eliot

Re: [Pharo-dev] Image crashing on startup, apparently during GC

Nicolas Cellier
In reply to this post by EstebanLM
 


2018-03-24 11:11 GMT+01:00 Esteban Lorenzano <[hidden email]>:

hi,

> On 24 Mar 2018, at 09:50, Cyril Ferlicot D. <[hidden email]> wrote:
>
> Le 23/03/2018 à 21:52, Eliot Miranda a écrit :
>> Hi Damien,
>>
>> Indeed the image is corrupt at start-up.  See below.
>>
>>
>> Right.  This VM is prior to the bug fixes in VMMaker.oscog-eem.2320:
>>
>> Spur:
>> Fix a bad bug in SpurPlnningCompactor.
>>  unmarkObjectsFromFirstFreeObject, used when the compactor requires more
>> than one pass due to insufficient savedFirstFieldsSpace, expects the
>> corpse of a moved object to be unmarked, but
>> copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.
>> Unmarking the corpse before the copy unmarks both.  This fixes a crash
>> with ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring:
>> creates lots of files, enough to push the system into the multi-pass regime.
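To make the failure mode described above concrete, here is a hedged Python model of the unmark-before-copy fix. This is not the actual VMMaker code (SpurPlanningCompactor is written in Slang/Smalltalk); objects are modeled as dicts and all names here are illustrative:

```python
# Illustrative model only: the real compactor works on raw object headers.
MARK_BIT = 0x1

def copy_and_unmark_buggy(heap, corpse, target):
    """Pre-fix behaviour: copy first, then unmark only the target."""
    heap[target] = dict(heap[corpse])
    heap[target]["header"] &= ~MARK_BIT  # corpse stays marked -- the bug

def copy_and_unmark_fixed(heap, corpse, target):
    """Post-fix behaviour: unmark the corpse before the copy, so the
    copied header already carries the cleared bit and both end up unmarked."""
    heap[corpse]["header"] &= ~MARK_BIT
    heap[target] = dict(heap[corpse])

if __name__ == "__main__":
    heap = {0: {"header": MARK_BIT, "field": 42}, 1: {}}
    copy_and_unmark_buggy(heap, 0, 1)
    assert heap[0]["header"] & MARK_BIT          # corpse left marked: the bug
    heap = {0: {"header": MARK_BIT, "field": 42}, 1: {}}
    copy_and_unmark_fixed(heap, 0, 1)
    assert not (heap[0]["header"] & MARK_BIT)    # corpse unmarked
    assert not (heap[1]["header"] & MARK_BIT)    # target unmarked too
```

In the multi-pass regime, a later pass that expects corpses to be unmarked then misinterprets the stale mark bit, which is how the corruption arises.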
>>
>>
>> Pharo urgently needs to upgrade the VM to one more up to date than 2017
>> 08 27 (in fact more up-to-date than opensmalltalk/vm commit
>> 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018
>> -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image
>> corruption in large images, and can occur (as it has here) at start-up,
>> causing one's work to be irretrievably lost.
>>
>
> Hi Eliot,
>
> I think that there is a lot of people who would like to get a newer
> stable vm for Pharo 6.1 and 7. The problem is that it is hard to know
> which VM are stable enough to be promoted as stable.
>
> Some weeks ago Esteban tried to promote a VM as stable and he had to
> revert it the same day because a regression occurred in the VM.
>
> If you're able to tell us which vms are stable in those present at
> http://files.pharo.org/vm/pharo-spur32/ and
> http://files.pharo.org/vm/pharo-spur64/ it would be a great help.
>
> Even better would be for the pharo community to have a way to know which
> vms are stable or not without having to ask you.

there is no “stable” branch in Cog, and that’s a problem.
The “released” versions (the ones you find marked as stable) are not working for Pharo :(

I tried to promote versions from the end of February, and they crashed.

Next week I will try again; maybe by now they are stable enough… One thing is true: the versions we currently consider stable (from Oct 2017) have problems that are already solved in the latest builds.

Esteban


>
> Have a nice day.
>
>>
>> --
>> _,,,^..^,,,_
>> best, Eliot
> --
> Cyril Ferlicot
> https://ferlicot.fr
>


Hi,
Several problems are mixed here; let's try to decouple them:
- 1) there is ongoing development in the core of the VM that may introduce some instability
- 2) there is ongoing development in some plugins as well
- 3) there are infrastructure problems preventing the production of artifacts, whatever the intrinsic stability of the VM

For 1), development happens in VMMaker, and we have to rely on experts. Today those are Eliot and Clement.
We all want a 64-bit VM, improved GC, improved become:, a write barrier, ephemerons, threaded FFI calls, and adaptive optimization.
Pharo relies on this progress; it is vital.
IMO, we are reaching a good level of confidence, and I hope to see some VMMaker version blessed as stable pretty soon.

Instead of whining, the best thing we can do to reach this state is to help them by providing accurate bug reports and, even better, reproducible cases.
Thanks to all who are working in this direction.

For 2), we had a few problems, but again this is about improving important features (SSL...).
Much of the development already happens in feature branches.
But since we are targeting so many platforms, and don't yet have automated tests that scale, we still need beta testers.
We can discuss how such beta features are introduced with respect to release cycles; that would be a good thing.
Ideally we should tend toward continuous integration and very short cycles, but we're not there yet.

For 3), we had a lot of problems: stale links, invalid credentials, changing tool versions at the automated build site, etc.

If we don't build the artifacts, we don't even have a chance to test the stability of 1) and 2).
We have to understand that 3) is absolutely vital.

May I remind you that for a very long period last year, the builds were broken due to lack of work on the Pharo side.
Fortunately, this changed in 2018.
Fabio has been working REALLY hard to improve 3), and without Esteban's help I don't think he could have reached the holy green build status.
We will never thank them enough for that. It also shows that cooperation pays off.

But this is still very fragile.
If we want to make progress, we should ask why that is.
We could analyze the regressions and decide whether the complexity is sustainable, or eventually drop some drag.
We are chasing many hares by building the VM for Newspeak/Pharo/Squeak, i386/x86_64/ARM, Spur/Stack/V3, Sista/Lowcode, Linux/macOS/Windows...
If a fix vital for Pharo/Squeak happens to break the Newspeak tests, it slows down progress...
Maybe we should decouple the problems there a bit more too (they may come from some image-side weakness).

Over the last two years I've also observed some work done exclusively in the Pharo fork of the opensmalltalk VM.
This was counterproductive. Work must be produced upstream, or it's wasted.
I once thought that the Pharo fork could be the place for the Pharo team to manage official stable versions.
But I agree that this means too much duplicated work, and I would be very happy to see that work happen upstream too.

If you have constructive ideas that would help decouple all these problems, we are all ears.

PS: I initially did not post this answer, to avoid a sterile discussion, but since Phil asked...


Re: [Pharo-dev] Image crashing on startup, apparently during GC

Cyril Ferlicot D
 
On 30/03/2018 23:56, Nicolas Cellier wrote:

>
> Hi,
> Several problems are mixed here, let's try and decouple:
> - 1) there are ongoing development in the core of VM that may introduce
> some instability
> - 2) there are ongoing development in some plugins also
> - 3) there are infrastructure problems preventing to produce artifacts
> whatever the intrinsic stability of the VM
>
> For 1) development happens in VMMaker, and we have to be relying on
> experts. Today that is Eliot and Clement.
> We all want 64bits VM, improved GC, improved become:, write barrier,
> ephemerons, threaded FFI calls and adaptive optimization.
> Pharo is relying on these progress, they are vital.
> IMO, we are reaching a good level of confidence, and I hope to see some
> VMMaker version blessed as stable pretty soon.
>
> Instead of whining, the best we can do for reaching this state is help
> them by providing accurate bug reports and even better reproducible cases.
> Thanks to all who are working in this direction.
>
> For 2) we had a few problems, but again this is for improving important
> features (SSL...)
> Much of the development happens in feature branches already.
> But since we are targetting so many platforms, and don't have automated
> tests that scale yet, we still need beta testers.
> We can discuss about the introduction of such beta features wrt release
> cycles, that will be a good thing.
> Ideally we should tend toward continuous integration and have very short
> cycles, but we're not yet there.
>
> For 3)  we had a lot of problems, like staled links, invalid
> credentials, evolution of the version of tools at automated build site,
> etc...
>
> If we don't build the artifacts, then we can't even have a chance to
> test the stability of 1) and 2)
> We have to understand that 3) is absolutely vital.
>
> May I remind that for a very long period last year, the build were
> broken due to lack of work at Pharo side.
> Fortunately, this has changed in 2018.
> Fabio has been working REALLY hard to improve 3), and without the help
> of Esteban,I don't think he could have reached the holy green build status.
> We will never thank them enough for that. This also shows that
> cooperation may pay.
>
> But this is still very fragile.
> If we want to make progress, we should ask why it is so.
> We could analyze the regressions, and decide if the complexity is
> sustainable, or eventually drop some drag.
> We are chasing many hares by building the VM for Newspeak/Pharo/Squeak
> i386/x86_64/ARM Spur/Stack/V3 Sista/lowcode linux/Macosx/Windows ...
> If it happens that a fix vital for Pharo/Squeak does break Newspeak
> tests, then it slows down the progress...
> Maybe we would want to decouple a bit more the problems there too (they
> may come from some image side weakness).
>
> Last two years I've also observed some work exclusively done in the
> Pharo fork of the opensmalltalk VM.
> This was counter productive. Work must be produced upstream, or it's wasted.
> I once thought that the Pharo fork could be the place for the pharo team
> to manage official stable versions.
> But I agree that this is too much duplicated work and would be very
> happy to see the work happen upstream too.
>
> If you have constructive ideas that will help decoupling all these
> problems, we are all ear.
>
> PS: i did not post this answer for avoiding sterile discussion, but
> since Phil asked...
>

Hi Nicolas,

Thanks for this explanation. It's helpful for people like me who work
only on the image side of Pharo and are only beginning to read about
the work done on the VM side. It's not always obvious what the real
problems are when we don't know the exact situation, and that can
sometimes lead to misunderstandings.

If the community has ideas on how to improve the current
infrastructure but doesn't currently have the time to implement them,
maybe it would be good to list them on the GitHub issue tracker with an
"Infrastructure" tag? That way, if someone complains and is ready to
help a little, you can just send them the link to the issues tagged
"Infrastructure".

--
Cyril Ferlicot
https://ferlicot.fr


Re: Image crashing on startup, apparently during GC

Alistair Grant
In reply to this post by Eliot Miranda-2
 
On 23 March 2018 at 21:52, Eliot Miranda <[hidden email]> wrote:
>
> Pharo urgently needs to upgrade the VM

I couldn't agree more, and I know Esteban wants to release a new VM.
I did quite a bit of testing on a VM from 15 March that I thought
would make a good candidate before realising that the Mac VMs aren't
available.

> to one more up to date than 2017 08 27 (in fact more up-to-date than opensmalltalk/vm commit 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018 -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image corruption in large images, and can occur (as it has here) at start-up, causing one's work to be irretrievably lost.

Most, if not all, of the VMs between 1 Jan and 15 Mar have bugs that are
triggered either by the automated test suite or by the bootstrap process.


The blockers I can see at the moment are:

- Multiple builds have failed with an internal compiler error on the
sista builds.
-- The earliest occurrence I could find was commit 1f0a7da, but it may
have been earlier.
- Even if the Mac builds show success on Travis, they aren't making it
onto files.pharo.org.
-- I haven't ever worked with this code.

Not directly related, but:
- Bintray hasn't been updated since 8 March 2018.


I think it would also be useful for files.pharo.org to have release-candidate
links available, which would help people focus their testing on
a particular VM.  They would need to be manually maintained, but I
think the benefits would be worthwhile.
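Maintaining such an RC pointer could be as simple as a symlink that whoever nominates a build refreshes on the file server. A minimal sketch (the directory layout and the `release-candidate.zip` name are hypothetical, not anything files.pharo.org actually provides):

```python
import os

def point_rc_at(vm_dir, build_zip, link_name="release-candidate.zip"):
    """Refresh a stable-named symlink so the RC download URL never changes.

    vm_dir/link_name -> build_zip. All names here are hypothetical."""
    link = os.path.join(vm_dir, link_name)
    if os.path.lexists(link):   # drop any previous RC pointer
        os.remove(link)
    os.symlink(build_zip, link)
    return os.readlink(link)
```

Downloaders would then always fetch the same RC URL, and it would simply track the stable build whenever no candidate is under test.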

Cheers,
Alistair

Re: Image crashing on startup, apparently during GC

EstebanLM
 
Hi,

On 31 Mar 2018, at 11:45, Alistair Grant <[hidden email]> wrote:


On 23 March 2018 at 21:52, Eliot Miranda <[hidden email]> wrote:

Pharo urgently needs to upgrade the VM

I couldn't agree more, and I know Esteban wants to release a new VM.
I did quite a bit of testing on a VM from 15 March that I thought
would make a good candidate before realising that the Mac VMs aren't
available.

I will then try to promote the one from 15 March. We’ll see next week.
But then, this is part of my observation: we cannot know which VMs are stable, and that’s because the *process* of making them stable is very “human dependent”: we consider a version stable when it builds on CI and Eliot says it is stable. But since Eliot does not use Pharo (not a criticism, a reality), that may not be true for Pharo. And that’s actually what happens: Pharo crashes.
I tried to mitigate this problem with our fork and nightly builds that run the Pharo tests (to learn about problems as early as possible). But to be honest, I haven’t had the time (or the will) to work on it recently, so the Pharo fork is in practice stalled. I will revive it eventually… but only when I find the time and the spirit to do it.

to one more up to date than 2017 08 27 (in fact more up-to-date than opensmalltalk/vm commit 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018 -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image corruption in large images, and can occur (as it has here) at start-up, causing one's work to be irretrievably lost.

Most, if not all, the VMs between 1 Jan and 15 Mar have bugs that are
triggered either by the automated test suite or the bootstrap process.


The blocks I can see at the moment are:

- Multiple builds have failed with an internal compiler error on the
sista builds.
-- The earliest occurrence I could find was commit 1f0a7da, but it may
have been earlier.
- Even if the Mac builds show success in travis, they aren't making it
on to files.pharo.org.

The latest VM copied to files.pharo.org is from 16/03.
We need to see what’s happening there.

-- I haven't ever worked with this code.

Not directly related, but:
- Bintray hasn't been updated since 8 March 2018.


I think it could also be useful for files.pharo.org to have release
candidate links available, which would help people to focus testing on
a particular VM.  They would need to be manually maintained, but I
think the benefits would be worthwhile.

all VMs are available to test; they are just not available *directly* to general users.
Now… I could add a 70rc link in the vm subdirectory. But since I cannot know which VM is the RC, I find it pointless at the moment.

cheers, 
Esteban



Cheers,
Alistair


Re: [Pharo-dev] Image crashing on startup, apparently during GC

EstebanLM
In reply to this post by Nicolas Cellier
 


On 30 Mar 2018, at 23:56, Nicolas Cellier <[hidden email]> wrote:



2018-03-24 11:11 GMT+01:00 Esteban Lorenzano <[hidden email]>:

hi,

> On 24 Mar 2018, at 09:50, Cyril Ferlicot D. <[hidden email]> wrote:
>
> Le 23/03/2018 à 21:52, Eliot Miranda a écrit :
>> Hi Damien,
>>
>> Indeed the image is corrupt at start-up.  See below.
>>
>>
>> Right.  This VM is prior to the bug fixes in VMMaker.oscog-eem.2320:
>>
>> Spur:
>> Fix a bad bug in SpurPlanningCompactor.
>>  unmarkObjectsFromFirstFreeObject, used when the compactor requires more
>> than one pass due to insufficient savedFirstFieldsSpace, expects the
>> corpse of a moved object to be unmarked, but
>> copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.
>> Unmarking the corpse before the copy unmarks both.  This fixes a crash
>> with ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring:
>> creates lots of files, enough to push the system into the multi-pass regime.
>>
>>
>> Pharo urgently needs to upgrade the VM to one more up to date than 2017
>> 08 27 (in fact more up-to-date than opensmalltalk/vm commit
>> 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018
>> -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image
>> corruption in large images, and can occur (as it has here) at start-up,
>> causing one's work to be irretrievably lost.
>>
>
> Hi Eliot,
>
> I think that there is a lot of people who would like to get a newer
> stable vm for Pharo 6.1 and 7. The problem is that it is hard to know
> which VM are stable enough to be promoted as stable.
>
> Some weeks ago Esteban tried to promote a VM as stable and he had to
> revert it the same day because a regression occurred in the VM.
>
> If you're able to tell us which vms are stable in those present at
> http://files.pharo.org/vm/pharo-spur32/ and
> http://files.pharo.org/vm/pharo-spur64/ it would be a great help.
>
> Even better would be for the pharo community to have a way to know which
> vms are stable or not without having to ask you.

there is no “stable” branch in Cog, and that’s a problem.
“released” versions (the version you can find as stable) are not working for Pharo :(

I tried to promote versions from end feb and that crashed.

next week I will try again, maybe now they are stable enough… one thing is true: the versions that we consider stable (from oct/17) present problems that are already solved on latest.

Esteban


>
> Have a nice day.
>
>>
>> --
>> _,,,^..^,,,_
>> best, Eliot
> --
> Cyril Ferlicot
> https://ferlicot.fr
>


Hi,
Several problems are mixed here, let's try and decouple:
- 1) there are ongoing development in the core of VM that may introduce some instability
- 2) there are ongoing development in some plugins also
- 3) there are infrastructure problems preventing to produce artifacts whatever the intrinsic stability of the VM

you are right on all points, but for me this is a problem of process.

- we have no defined milestones, so nobody knows if they can jump in to help.
- plugin development happens “on its own”, and nobody knows what happens, why it happens, or how it happens.
- the infrastructure is not bad, and a lot of effort has been made to make it work. But the code sources are scattered around the world, and the only thing that unites them is the hand of the one who generates the C sources.

IMHO, it is this “disconnection” that causes most of the problems.

cheers,
Esteban


For 1) development happens in VMMaker, and we have to be relying on experts. Today that is Eliot and Clement.
We all want 64bits VM, improved GC, improved become:, write barrier, ephemerons, threaded FFI calls and adaptive optimization.
Pharo is relying on these progress, they are vital.
IMO, we are reaching a good level of confidence, and I hope to see some VMMaker version blessed as stable pretty soon.

Instead of whining, the best we can do for reaching this state is help them by providing accurate bug reports and even better reproducible cases.
Thanks to all who are working in this direction.

For 2) we had a few problems, but again this is for improving important features (SSL...)
Much of the development happens in feature branches already.
But since we are targetting so many platforms, and don't have automated tests that scale yet, we still need beta testers.
We can discuss about the introduction of such beta features wrt release cycles, that will be a good thing.
Ideally we should tend toward continuous integration and have very short cycles, but we're not yet there.

For 3)  we had a lot of problems, like staled links, invalid credentials, evolution of the version of tools at automated build site, etc...

If we don't build the artifacts, then we can't even have a chance to test the stability of 1) and 2)
We have to understand that 3) is absolutely vital.

May I remind that for a very long period last year, the build were broken due to lack of work at Pharo side.
Fortunately, this has changed in 2018.
Fabio has been working REALLY hard to improve 3), and without the help of Esteban,I don't think he could have reached the holy green build status.
We will never thank them enough for that. This also shows that cooperation may pay.

But this is still very fragile.
If we want to make progress, we should ask why it is so.
We could analyze the regressions, and decide if the complexity is sustainable, or eventually drop some drag.
We are chasing many hares by building the VM for Newspeak/Pharo/Squeak i386/x86_64/ARM Spur/Stack/V3 Sista/lowcode linux/Macosx/Windows ...
If it happens that a fix vital for Pharo/Squeak does break Newspeak tests, then it slows down the progress...
Maybe we would want to decouple a bit more the problems there too (they may come from some image side weakness).

Last two years I've also observed some work exclusively done in the Pharo fork of the opensmalltalk VM.
This was counter productive. Work must be produced upstream, or it's wasted.

This happened just once or twice. And it was because people were unaware of the “joint” repository, so they continued contributing as before (and they were pointed to the right place when we had the opportunity).
And I disagree that it was counterproductive, because I took on the effort of merging the changes into osvm. This worked fine until I stopped doing that job; but well… just one PR got stalled there for months, and Alistair integrated it recently.

What *did happen*, and what I’m still not ready to let go of, is that a lot of the small changes we presented were rejected (or ignored) without further consideration. But well, let’s keep it positive and not enter into sterile discussions; I just think you are wrong on this point.

cheers,
Esteban

I once thought that the Pharo fork could be the place for the pharo team to manage official stable versions.
But I agree that this is too much duplicated work and would be very happy to see the work happen upstream too.

If you have constructive ideas that will help decoupling all these problems, we are all ear.

PS: i did not post this answer for avoiding sterile discussion, but since Phil asked...



Re: [Pharo-dev] Image crashing on startup, apparently during GC

tesonep@gmail.com
 
Hi all,
Esteban,
   one thing we can do is add a job to the Pharo CI that performs the bootstrap and then tests the result using the latest VM.
This of course will not replace manual testing, but it gives us the result of running a quite complex process on the latest VM.
Even more, the last stability problems were detected when trying to execute a complete bootstrap.

Today I will make a pull request (never to be integrated) that uses the latest VM instead of the stable one.
Later we can migrate it to an independent job that is always there to validate each new latest VM.
Moreover, if the VM is published using some kind of semantic versioning, we can refer to a specific VM in the bootstrap process.
This would allow us to check new VMs with just a PR (taking advantage of all the automation that is now working) and also to reproduce a build using the exact VM that was used to generate an image. This is a weak point we have today: we are always using Stable, not a defined version.
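Pinning an exact VM in such a CI job could then amount to deriving its download URL from the version components. A sketch; the file-name scheme (`pharo-<os>-<arch><flavor>-<timestamp>-<commit>.zip`) is inferred from the URLs quoted elsewhere in this thread, not from any official specification:

```python
# Build a pinned-VM download URL from its version components.
# Naming scheme inferred from files.pharo.org URLs seen in this thread.
BASE = "http://files.pharo.org/vm"

def vm_url(os_name, arch, stamp, commit, flavor="threaded"):
    # 64-bit builds live under pharo-spur64, everything else under pharo-spur32
    family = "pharo-spur64" if "64" in arch else "pharo-spur32"
    name = f"pharo-{os_name}-{arch}{flavor}-{stamp}-{commit}.zip"
    return f"{BASE}/{family}/{os_name}/{name}"

print(vm_url("linux", "x86_64", "201803160215", "43a2f5c"))
```

A CI job could store only `(stamp, commit)` next to the image, making every build reproducible against the exact VM that produced it.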


Cheers,

On Sat, Mar 31, 2018 at 3:03 PM, Esteban Lorenzano <[hidden email]> wrote:
 


On 30 Mar 2018, at 23:56, Nicolas Cellier <[hidden email]> wrote:



2018-03-24 11:11 GMT+01:00 Esteban Lorenzano <[hidden email]>:

hi,

> On 24 Mar 2018, at 09:50, Cyril Ferlicot D. <[hidden email]> wrote:
>
> Le 23/03/2018 à 21:52, Eliot Miranda a écrit :
>> Hi Damien,
>>
>> Indeed the image is corrupt at start-up.  See below.
>>
>>
>> Right.  This VM is prior to the bug fixes in VMMaker.oscog-eem.2320:
>>
>> Spur:
>> Fix a bad bug in SpurPlanningCompactor.
>>  unmarkObjectsFromFirstFreeObject, used when the compactor requires more
>> than one pass due to insufficient savedFirstFieldsSpace, expects the
>> corpse of a moved object to be unmarked, but
>> copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.
>> Unmarking the corpse before the copy unmarks both.  This fixes a crash
>> with ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring:
>> creates lots of files, enough to push the system into the multi-pass regime.
>>
>>
>> Pharo urgently needs to upgrade the VM to one more up to date than 2017
>> 08 27 (in fact more up-to-date than opensmalltalk/vm commit
>> 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018
>> -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image
>> corruption in large images, and can occur (as it has here) at start-up,
>> causing one's work to be irretrievably lost.
>>
>
> Hi Eliot,
>
> I think that there is a lot of people who would like to get a newer
> stable vm for Pharo 6.1 and 7. The problem is that it is hard to know
> which VM are stable enough to be promoted as stable.
>
> Some weeks ago Esteban tried to promote a VM as stable and he had to
> revert it the same day because a regression occurred in the VM.
>
> If you're able to tell us which vms are stable in those present at
> http://files.pharo.org/vm/pharo-spur32/ and
> http://files.pharo.org/vm/pharo-spur64/ it would be a great help.
>
> Even better would be for the pharo community to have a way to know which
> vms are stable or not without having to ask you.

there is no “stable” branch in Cog, and that’s a problem.
“released” versions (the version you can find as stable) are not working for Pharo :(

I tried to promote versions from end feb and that crashed.

next week I will try again, maybe now they are stable enough… one thing is true: the versions that we consider stable (from oct/17) present problems that are already solved on latest.

Esteban


>
> Have a nice day.
>
>>
>> --
>> _,,,^..^,,,_
>> best, Eliot
> --
> Cyril Ferlicot
> https://ferlicot.fr
>


Hi,
Several problems are mixed here, let's try and decouple:
- 1) there are ongoing development in the core of VM that may introduce some instability
- 2) there are ongoing development in some plugins also
- 3) there are infrastructure problems preventing to produce artifacts whatever the intrinsic stability of the VM

you are right in all points, but for me this is a problem of process. 

- we have no defined milestones so nobody knows if they can jump to help.
- plugin development happens “by his own” and nobody knows what happens, why happens and how it happens.
- infrastructure is not bad and a lot of efforts has been made to make it work. But code sources are scattered around the world and the only thing that reunites them is the hand of the one who generates the C sources.

IMHO, is this “disconnection” what causes most of the problems. 

cheers,
Esteban


For 1) development happens in VMMaker, and we have to be relying on experts. Today that is Eliot and Clement.
We all want 64bits VM, improved GC, improved become:, write barrier, ephemerons, threaded FFI calls and adaptive optimization.
Pharo is relying on these progress, they are vital.
IMO, we are reaching a good level of confidence, and I hope to see some VMMaker version blessed as stable pretty soon.

Instead of whining, the best we can do for reaching this state is help them by providing accurate bug reports and even better reproducible cases.
Thanks to all who are working in this direction.

For 2) we had a few problems, but again this is for improving important features (SSL...)
Much of the development happens in feature branches already.
But since we are targetting so many platforms, and don't have automated tests that scale yet, we still need beta testers.
We can discuss about the introduction of such beta features wrt release cycles, that will be a good thing.
Ideally we should tend toward continuous integration and have very short cycles, but we're not yet there.

For 3)  we had a lot of problems, like staled links, invalid credentials, evolution of the version of tools at automated build site, etc...

If we don't build the artifacts, then we can't even have a chance to test the stability of 1) and 2)
We have to understand that 3) is absolutely vital.

May I remind that for a very long period last year, the build were broken due to lack of work at Pharo side.
Fortunately, this has changed in 2018.
Fabio has been working REALLY hard to improve 3), and without the help of Esteban,I don't think he could have reached the holy green build status.
We will never thank them enough for that. This also shows that cooperation may pay.

But this is still very fragile.
If we want to make progress, we should ask why it is so.
We could analyze the regressions, and decide if the complexity is sustainable, or eventually drop some drag.
We are chasing many hares by building the VM for Newspeak/Pharo/Squeak i386/x86_64/ARM Spur/Stack/V3 Sista/lowcode linux/Macosx/Windows ...
If it happens that a fix vital for Pharo/Squeak does break Newspeak tests, then it slows down the progress...
Maybe we would want to decouple a bit more the problems there too (they may come from some image side weakness).

Last two years I've also observed some work exclusively done in the Pharo fork of the opensmalltalk VM.
This was counter productive. Work must be produced upstream, or it's wasted.

This happened just once or twice. And it was because people were ignorant of “joint" so they continued contributing as before (and people were pointed to right place when we had the opportunity).
And I disagree this was counterproductive because I took the effort to merge the changes into osvm. This worked fine until I stopped to do that job, but well… just one PR got stalled there for months and Alistair integrated it recently. 

What *did happen* and I’m still not ready to let it go is a lot of the small changes that we presented to be rejected (or ignored) without further consideration. But well, let’s keep it positive and not enter to sterile discussions, I just think you are wrong with this argument.

cheers,
Esteban

I once thought that the Pharo fork could be the place for the pharo team to manage official stable versions.
But I agree that this is too much duplicated work and would be very happy to see the work happen upstream too.

If you have constructive ideas that will help decoupling all these problems, we are all ear.

PS: i did not post this answer for avoiding sterile discussion, but since Phil asked...






--
Pablo Tesone.
[hidden email]

Re: Image crashing on startup, apparently during GC

Alistair Grant
In reply to this post by EstebanLM
 
On Sat, Mar 31, 2018 at 02:42:33PM +0200, Esteban Lorenzano wrote:
>  

> Hi,
>
>
>     On 31 Mar 2018, at 11:45, Alistair Grant <[hidden email]> wrote:
>
>
>     On 23 March 2018 at 21:52, Eliot Miranda <[hidden email]> wrote:
>
>
>         Pharo urgently needs to upgrade the VM
>
>
>     I couldn't agree more, and I know Esteban wants to release a new VM.
>     I did quite a bit of testing on a VM from 15 March that I thought
>     would make a good candidate before realising that the Mac VMs aren't
>     available.
>
>
> I will try to promote then the one of 15 march. We?ll see next week.

There were a few builds on the 15th; the VMs I tested were:

http://files.pharo.org/vm/pharo-spur64/linux/pharo-linux-x86_64threaded-201803160215-43a2f5c.zip
http://files.pharo.org/vm/pharo-spur32/linux/pharo-linux-i386threaded-201803160215-43a2f5c.zip



> but then, this is part of my observation: We cannot know which VMs are stable,
> and that?s because the *process* to make them stable is very ?human dependent?:
> We consider a version stable when it builds on CI and Eliot says is stable. But
> since Eliot does not use Pharo (not a critic, a reality), that may be not true
> for Pharo. And that?s actually what happens, Pharo crashes.
> I tried to avoid a bit this problem with our fork and nightly builds that runs
> the pharo tests (to knew about problems as early as possible). But to be honest
> I didn?t have the time (and the will) to work on it recently, then pharo fork
> is in practice stalled. I will revive that eventually? but just when I find the
> time and the spirit to do it.
>
>
>
>         to one more up to date than 2017 08 27 (in fact more up-to-date than
>         opensmalltalk/vm commit 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri
>         Jan 19 13:17:57 2018 -0800).  The bug that VMMaker.oscog-eem.2320 fixes
>         can result in image corruption in large images, and can occur (as it
>         has here) at start-up, causing one's work to be irretrievably lost.
>
>
>     Most, if not all, the VMs between 1 Jan and 15 Mar have bugs that are
>     triggered either by the automated test suite or the bootstrap process.
>
>
>     The blocks I can see at the moment are:
>
>     - Multiple builds have failed with an internal compiler error on the
>     sista builds.
>     -- The earliest occurrence I could find was commit 1f0a7da, but it may
>     have been earlier.
>     - Even if the Mac builds show success in travis, they aren't making it
>     on to files.pharo.org.
>
>
> latest VM copied into files.pharo.org is 16/03.
> we need to see what’s happening there.
>
>
>     -- I haven't ever worked with this code.
>
>     Not directly related, but:
>     - Bintray hasn't been updated since 8 March 2018.
>
>
>     I think it could also be useful for files.pharo.org to have release
>     candidate links available, which would help people to focus testing on
>     a particular VM.  They would need to be manually maintained, but I
>     think the benefits would be worthwhile.
>
>
> all VMs are available to test.
> just… not available *directly* to general users.
> now… I could have a 70rc link in the vm subdir. But since I cannot know which VM is
> the RC, I find it pointless at this moment.

I didn't mean that we would always be looking to release a new version,
but there have been multiple occasions in the past when you've asked
people to try a particular VM to see if it solves a problem.  In those
instances, marking it as RC would make it simpler for others to
contribute to the testing and give you more confidence to promote it to
stable.

Actually, if the link existed all the time, but most of the time it was
the same as the stable version, I'd probably just use it as my download
link, which would mean that I'm testing the RC whenever it comes out,
and stay with it while it is stable and there isn't a new RC.


Cheers,
Alistair


Re: [Pharo-dev] Image crashing on startup, apparently during GC

Nicolas Cellier
In reply to this post by EstebanLM
 


2018-03-31 15:03 GMT+02:00 Esteban Lorenzano <[hidden email]>:
 


On 30 Mar 2018, at 23:56, Nicolas Cellier <[hidden email]> wrote:



2018-03-24 11:11 GMT+01:00 Esteban Lorenzano <[hidden email]>:

hi,

> On 24 Mar 2018, at 09:50, Cyril Ferlicot D. <[hidden email]> wrote:
>
> On 23/03/2018 at 21:52, Eliot Miranda wrote:
>> Hi Damien,
>>
>> Indeed the image is corrupt at start-up.  See below.
>>
>>
>> Right.  This VM is prior to the bug fixes in VMMaker.oscog-eem.2320:
>>
>> Spur:
>> Fix a bad bug in SpurPlanningCompactor.
>>  unmarkObjectsFromFirstFreeObject, used when the compactor requires more
>> than one pass due to insufficient savedFirstFieldsSpace, expects the
>> corpse of a moved object to be unmarked, but
>> copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.
>> Unmarking the corpse before the copy unmarks both.  This fixes a crash
>> with ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring:
>> creates lots of files, enough to push the system into the multi-pass regime.
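To make the ordering issue concrete, here is a minimal toy model of the fix described in the quoted commit message. It is a sketch in Python rather than VMMaker's Slang/Smalltalk, and the class and function names are illustrative assumptions, not VMMaker's: the point is only that unmarking the corpse *before* the copy clears the mark bit on both copies, which is what the multi-pass compactor expects.

```python
# Toy model (hypothetical names, not VMMaker code) of the
# SpurPlanningCompactor fix: the multi-pass regime expects the corpse
# (the old copy of a moved object) to be unmarked after the move.

class Obj:
    def __init__(self):
        self.marked = True  # live objects are marked before compaction

def copy_and_unmark_buggy(corpse, target):
    """Old behavior: only the target is unmarked."""
    target.marked = corpse.marked  # stands in for copying header + fields
    target.marked = False          # unmark the target only
    # corpse keeps its mark bit -> confuses the second compaction pass

def copy_and_unmark_fixed(corpse, target):
    """Fixed behavior: unmark the corpse before the copy."""
    corpse.marked = False          # unmark the corpse first...
    target.marked = corpse.marked  # ...so the copy propagates the cleared bit
    # both corpse and target are now unmarked, as the second pass expects

corpse, target = Obj(), Obj()
copy_and_unmark_fixed(corpse, target)
assert not corpse.marked and not target.marked  # both cleared

corpse2, target2 = Obj(), Obj()
copy_and_unmark_buggy(corpse2, target2)
assert corpse2.marked  # stale mark bit on the corpse: the bug
```

In the real compactor the stale mark bit on a corpse would then mislead `unmarkObjectsFromFirstFreeObject` on the next pass; the model above only captures the bit-ordering, not the heap walk.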
>>
>>
>> Pharo urgently needs to upgrade the VM to one more up to date than 2017
>> 08 27 (in fact more up-to-date than opensmalltalk/vm commit
>> 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018
>> -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image
>> corruption in large images, and can occur (as it has here) at start-up,
>> causing one's work to be irretrievably lost.
>>
>
> Hi Eliot,
>
> I think that there are a lot of people who would like to get a newer
> stable VM for Pharo 6.1 and 7. The problem is that it is hard to know
> which VMs are stable enough to be promoted to stable.
>
> Some weeks ago Esteban tried to promote a VM as stable and he had to
> revert it the same day because a regression occurred in the VM.
>
> If you're able to tell us which VMs are stable among those present at
> http://files.pharo.org/vm/pharo-spur32/ and
> http://files.pharo.org/vm/pharo-spur64/ it would be a great help.
>
> Even better would be for the Pharo community to have a way to know which
> VMs are stable or not without having to ask you.

there is no “stable” branch in Cog, and that’s a problem.
“released” versions (the versions you can find as stable) are not working for Pharo :(

I tried to promote versions from the end of February and they crashed.

next week I will try again; maybe now they are stable enough… one thing is true: the versions that we consider stable (from Oct 2017) have problems that are already solved in the latest.

Esteban


>
> Have a nice day.
>
>>
>> --
>> _,,,^..^,,,_
>> best, Eliot
> --
> Cyril Ferlicot
> https://ferlicot.fr
>


Hi,
Several problems are mixed up here; let's try to decouple them:
- 1) there is ongoing development in the core of the VM that may introduce some instability
- 2) there is ongoing development in some plugins as well
- 3) there are infrastructure problems preventing the production of artifacts, whatever the intrinsic stability of the VM

you are right on all points, but for me this is a problem of process.

- we have no defined milestones, so nobody knows if they can jump in to help.
- plugin development happens “on its own” and nobody knows what happens, why it happens and how it happens.
- the infrastructure is not bad and a lot of effort has been made to make it work. But the code sources are scattered around the world and the only thing that unites them is the hand of the one who generates the C sources.

IMHO, it is this “disconnection” that causes most of the problems.

cheers,
Esteban


Hi Esteban,
I see nothing inevitable here, and GitHub also provides tools for that.
Look, here is the project page on GitHub:
https://github.com/OpenSmalltalk/opensmalltalk-vm/projects/1

Maybe the Pharo team is willing to collaborate and take an active part in the definition of milestones?
 

For 1), development happens in VMMaker, and we have to rely on experts. Today that is Eliot and Clement.
We all want a 64-bit VM, improved GC, improved become:, write barriers, ephemerons, threaded FFI calls and adaptive optimization.
Pharo is relying on this progress; it is vital.
IMO, we are reaching a good level of confidence, and I hope to see some VMMaker version blessed as stable pretty soon.

Instead of whining, the best we can do to reach this state is to help them by providing accurate bug reports and, even better, reproducible cases.
Thanks to all who are working in this direction.

For 2), we had a few problems, but again this is about improving important features (SSL...).
Much of the development already happens in feature branches.
But since we are targeting so many platforms, and don't yet have automated tests that scale, we still need beta testers.
We can discuss the introduction of such beta features with respect to release cycles; that would be a good thing.
Ideally we should tend toward continuous integration and very short cycles, but we're not there yet.

For 3), we had a lot of problems: stale links, invalid credentials, changing tool versions at the automated build site, etc.

If we don't build the artifacts, then we don't even have a chance to test the stability of 1) and 2).
We have to understand that 3) is absolutely vital.

May I remind you that for a very long period last year, the builds were broken due to a lack of work on the Pharo side.
Fortunately, this has changed in 2018.
Fabio has been working REALLY hard to improve 3), and without the help of Esteban, I don't think he could have reached the holy green build status.
We will never thank them enough for that. It also shows that cooperation can pay off.

But this is still very fragile.
If we want to make progress, we should ask why that is.
We could analyze the regressions and decide whether the complexity is sustainable, or eventually drop some drag.
We are chasing many hares by building the VM for Newspeak/Pharo/Squeak, i386/x86_64/ARM, Spur/Stack/V3, Sista/lowcode, Linux/macOS/Windows...
If a fix vital for Pharo/Squeak breaks the Newspeak tests, that slows down progress...
Maybe we would want to decouple the problems a bit more there too (they may come from some image-side weakness).

Over the last two years I have also observed some work done exclusively in the Pharo fork of the opensmalltalk VM.
This was counterproductive. Work must be produced upstream, or it's wasted.

This happened just once or twice. And it was because people were unaware of the “joint” repository, so they continued contributing as before (and people were pointed to the right place when we had the opportunity).
And I disagree that this was counterproductive, because I took the effort to merge the changes into osvm. This worked fine until I stopped doing that job, but well… just one PR got stalled there for months and Alistair integrated it recently.

What *did happen*, and I’m still not ready to let it go, is that a lot of the small changes we presented were rejected (or ignored) without further consideration. But well, let’s keep it positive and not enter into sterile discussions; I just think you are wrong with this argument.

No, it's important: we (the opensmalltalk-vm team) can't let such bad feelings and frustration creep in.
Every contribution counts; that does not mean that every PR will be accepted, but we owe an explanation if not.
Some are accepted instantly, some are accepted after modification requests, some are rejected (I hope with some rationale).
What is problematic is that some were ignored for too long. I regret the situation, but there is no deliberate intention to ignore them, just a lack of manpower IMO.
For example, the recent work of Alistair shows that there is nothing inevitable here; it's just that someone has to do the hard work (kudos!).

Also, the question is coupled with stability: having a red status does not help. For almost every PR that I accepted, I had to dig into the Travis console reports and compare with the status of the previous build in order to know whether it was a regression or just a long-failing case... This does not scale!

I'm all for more distributed power, and that should come with responsibilities, first of all a cooperative "you break it, you fix it" attitude.

Or maybe do you want a clarified decision process?
For now, people who feel interested in a PR raise their voice.
I don't know if we need something more formal.
For important design decisions there is the vm-dev mailing list to discuss them.

cheers
Nicolas
 
cheers,
Esteban

I once thought that the Pharo fork could be the place for the Pharo team to manage official stable versions.
But I agree that this is too much duplicated work, and I would be very happy to see the work happen upstream too.

If you have constructive ideas that will help decouple all these problems, we are all ears.

PS: I had not posted this answer, to avoid sterile discussion, but since Phil asked...





Re: [Pharo-dev] Image crashing on startup, apparently during GC

EstebanLM
 
hi,

On 31 Mar 2018, at 17:34, Nicolas Cellier <[hidden email]> wrote:



2018-03-31 15:03 GMT+02:00 Esteban Lorenzano <[hidden email]>:
 


On 30 Mar 2018, at 23:56, Nicolas Cellier <[hidden email]> wrote:



2018-03-24 11:11 GMT+01:00 Esteban Lorenzano <[hidden email]>:

hi,

> On 24 Mar 2018, at 09:50, Cyril Ferlicot D. <[hidden email]> wrote:
>
> On 23/03/2018 at 21:52, Eliot Miranda wrote:
>> Hi Damien,
>>
>> Indeed the image is corrupt at start-up.  See below.
>>
>>
>> Right.  This VM is prior to the bug fixes in VMMaker.oscog-eem.2320:
>>
>> Spur:
>> Fix a bad bug in SpurPlanningCompactor.
>>  unmarkObjectsFromFirstFreeObject, used when the compactor requires more
>> than one pass due to insufficient savedFirstFieldsSpace, expects the
>> corpse of a moved object to be unmarked, but
>> copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.
>> Unmarking the corpse before the copy unmarks both.  This fixes a crash
>> with ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring:
>> creates lots of files, enough to push the system into the multi-pass regime.
>>
>>
>> Pharo urgently needs to upgrade the VM to one more up to date than 2017
>> 08 27 (in fact more up-to-date than opensmalltalk/vm commit
>> 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018
>> -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image
>> corruption in large images, and can occur (as it has here) at start-up,
>> causing one's work to be irretrievably lost.
>>
>
> Hi Eliot,
>
> I think that there are a lot of people who would like to get a newer
> stable VM for Pharo 6.1 and 7. The problem is that it is hard to know
> which VMs are stable enough to be promoted to stable.
>
> Some weeks ago Esteban tried to promote a VM as stable and he had to
> revert it the same day because a regression occurred in the VM.
>
> If you're able to tell us which VMs are stable among those present at
> http://files.pharo.org/vm/pharo-spur32/ and
> http://files.pharo.org/vm/pharo-spur64/ it would be a great help.
>
> Even better would be for the Pharo community to have a way to know which
> VMs are stable or not without having to ask you.

there is no “stable” branch in Cog, and that’s a problem.
“released” versions (the versions you can find as stable) are not working for Pharo :(

I tried to promote versions from the end of February and they crashed.

next week I will try again; maybe now they are stable enough… one thing is true: the versions that we consider stable (from Oct 2017) have problems that are already solved in the latest.

Esteban


>
> Have a nice day.
>
>>
>> --
>> _,,,^..^,,,_
>> best, Eliot
> --
> Cyril Ferlicot
> https://ferlicot.fr
>


Hi,
Several problems are mixed up here; let's try to decouple them:
- 1) there is ongoing development in the core of the VM that may introduce some instability
- 2) there is ongoing development in some plugins as well
- 3) there are infrastructure problems preventing the production of artifacts, whatever the intrinsic stability of the VM

you are right on all points, but for me this is a problem of process.

- we have no defined milestones, so nobody knows if they can jump in to help.
- plugin development happens “on its own” and nobody knows what happens, why it happens and how it happens.
- the infrastructure is not bad and a lot of effort has been made to make it work. But the code sources are scattered around the world and the only thing that unites them is the hand of the one who generates the C sources.

IMHO, it is this “disconnection” that causes most of the problems.

cheers,
Esteban


Hi Esteban,
I see nothing inevitable here, and GitHub also provides tools for that.
Look, here is the project page on GitHub:
https://github.com/OpenSmalltalk/opensmalltalk-vm/projects/1

Maybe the Pharo team is willing to collaborate and take an active part in the definition of milestones?
 

For 1), development happens in VMMaker, and we have to rely on experts. Today that is Eliot and Clement.
We all want a 64-bit VM, improved GC, improved become:, write barriers, ephemerons, threaded FFI calls and adaptive optimization.
Pharo is relying on this progress; it is vital.
IMO, we are reaching a good level of confidence, and I hope to see some VMMaker version blessed as stable pretty soon.

Instead of whining, the best we can do to reach this state is to help them by providing accurate bug reports and, even better, reproducible cases.
Thanks to all who are working in this direction.

For 2), we had a few problems, but again this is about improving important features (SSL...).
Much of the development already happens in feature branches.
But since we are targeting so many platforms, and don't yet have automated tests that scale, we still need beta testers.
We can discuss the introduction of such beta features with respect to release cycles; that would be a good thing.
Ideally we should tend toward continuous integration and very short cycles, but we're not there yet.

For 3), we had a lot of problems: stale links, invalid credentials, changing tool versions at the automated build site, etc.

If we don't build the artifacts, then we don't even have a chance to test the stability of 1) and 2).
We have to understand that 3) is absolutely vital.

May I remind you that for a very long period last year, the builds were broken due to a lack of work on the Pharo side.
Fortunately, this has changed in 2018.
Fabio has been working REALLY hard to improve 3), and without the help of Esteban, I don't think he could have reached the holy green build status.
We will never thank them enough for that. It also shows that cooperation can pay off.

But this is still very fragile.
If we want to make progress, we should ask why that is.
We could analyze the regressions and decide whether the complexity is sustainable, or eventually drop some drag.
We are chasing many hares by building the VM for Newspeak/Pharo/Squeak, i386/x86_64/ARM, Spur/Stack/V3, Sista/lowcode, Linux/macOS/Windows...
If a fix vital for Pharo/Squeak breaks the Newspeak tests, that slows down progress...
Maybe we would want to decouple the problems a bit more there too (they may come from some image-side weakness).

Over the last two years I have also observed some work done exclusively in the Pharo fork of the opensmalltalk VM.
This was counterproductive. Work must be produced upstream, or it's wasted.

This happened just once or twice. And it was because people were unaware of the “joint” repository, so they continued contributing as before (and people were pointed to the right place when we had the opportunity).
And I disagree that this was counterproductive, because I took the effort to merge the changes into osvm. This worked fine until I stopped doing that job, but well… just one PR got stalled there for months and Alistair integrated it recently.

What *did happen*, and I’m still not ready to let it go, is that a lot of the small changes we presented were rejected (or ignored) without further consideration. But well, let’s keep it positive and not enter into sterile discussions; I just think you are wrong with this argument.

No, it's important: we (the opensmalltalk-vm team) can't let such bad feelings and frustration creep in.
Every contribution counts; that does not mean that every PR will be accepted, but we owe an explanation if not.
Some are accepted instantly, some are accepted after modification requests, some are rejected (I hope with some rationale).
What is problematic is that some were ignored for too long. I regret the situation, but there is no deliberate intention to ignore them, just a lack of manpower IMO.
For example, the recent work of Alistair shows that there is nothing inevitable here; it's just that someone has to do the hard work (kudos!).

I’m sorry for having a bitter feeling, but I’m going to give you an example so you understand why I’m saying this: I proposed the refactoring of the Alien package (which is obviously right and simple) at least three times in the last three years (by different means). The first time I was told “no, we prefer it like that”. The second time I was told “we need to think about that” (and no answer later). The last time I didn’t even receive a response. So well, when Torsten proposed it he first came to me. I told him “talk on vm-dev and good luck”. His proposal was accepted (thank god).
Several situations like this one made me think that I’m a second-class citizen in this community. And you know what? I’m 46 years old and I do not want to be treated as if I’m a child who can only play with the toys someone lets me have. So yes, I’m sad and not very “into” this these days. I made a lot of effort to come back from the de facto fork we had, because I always pushed for working together. But I do not see the spirit of collaboration I hoped for.

Also, the question is coupled with stability: having a red status does not help. For almost every PR that I accepted, I had to dig into the Travis console reports and compare with the status of the previous build in order to know whether it was a regression or just a long-failing case... This does not scale!

It is worse than “does not scale”.
Again, let me give you an example: imagine I want to work on FFI and I need to touch both an external file and VMMaker. I cannot do a PR, because VMMaker is not there and the VM build will be broken; and I cannot push VMMaker, because the platform sources are not there and the VM build will be broken too.
So the only solution is what happens today: I need to push VMMaker and the changes at the same time. So I’m forced to work on the development branch (we are all forced to work there), instead of how I think it should be: each one should be able to work on their own branch and contribute changes through PRs (PRs that can be validated with a good CI process).

As a consequence, since all contributions go to the development branch… we have no stability and we need to wait for the blessing of a VM. That may or may not happen.

Meh, whatever… it is obvious that my way of working is different from that of most of the people in this community, so I will go back to what I do now: just the absolutely necessary.

cheers, 
Esteban

ps: I will not continue discussing this… I know how things are and I’m sorry to put such a negative perspective on this list, but I needed to say it.


I'm all for more distributed power, and that should come with responsibilities, first of all a cooperative "you break it, you fix it" attitude.

Or maybe do you want a clarified decision process?
For now, people who feel interested in a PR raise their voice.
I don't know if we need something more formal.
For important design decisions there is the vm-dev mailing list to discuss them.

cheers
Nicolas
 
cheers,
Esteban

I once thought that the Pharo fork could be the place for the Pharo team to manage official stable versions.
But I agree that this is too much duplicated work, and I would be very happy to see the work happen upstream too.

If you have constructive ideas that will help decouple all these problems, we are all ears.

PS: I had not posted this answer, to avoid sterile discussion, but since Phil asked...


Re: [Pharo-dev] Image crashing on startup, apparently during GC

Nicolas Cellier
 


2018-03-31 20:36 GMT+02:00 Esteban Lorenzano <[hidden email]>:
 
hi,

On 31 Mar 2018, at 17:34, Nicolas Cellier <[hidden email]> wrote:



2018-03-31 15:03 GMT+02:00 Esteban Lorenzano <[hidden email]>:
 


On 30 Mar 2018, at 23:56, Nicolas Cellier <[hidden email]> wrote:



2018-03-24 11:11 GMT+01:00 Esteban Lorenzano <[hidden email]>:

hi,

> On 24 Mar 2018, at 09:50, Cyril Ferlicot D. <[hidden email]> wrote:
>
> On 23/03/2018 at 21:52, Eliot Miranda wrote:
>> Hi Damien,
>>
>> Indeed the image is corrupt at start-up.  See below.
>>
>>
>> Right.  This VM is prior to the bug fixes in VMMaker.oscog-eem.2320:
>>
>> Spur:
>> Fix a bad bug in SpurPlanningCompactor.
>>  unmarkObjectsFromFirstFreeObject, used when the compactor requires more
>> than one pass due to insufficient savedFirstFieldsSpace, expects the
>> corpse of a moved object to be unmarked, but
>> copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.
>> Unmarking the corpse before the copy unmarks both.  This fixes a crash
>> with ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring:
>> creates lots of files, enough to push the system into the multi-pass regime.
>>
>>
>> Pharo urgently needs to upgrade the VM to one more up to date than 2017
>> 08 27 (in fact more up-to-date than opensmalltalk/vm commit
>> 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018
>> -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image
>> corruption in large images, and can occur (as it has here) at start-up,
>> causing one's work to be irretrievably lost.
>>
>
> Hi Eliot,
>
> I think that there are a lot of people who would like to get a newer
> stable VM for Pharo 6.1 and 7. The problem is that it is hard to know
> which VMs are stable enough to be promoted to stable.
>
> Some weeks ago Esteban tried to promote a VM as stable and he had to
> revert it the same day because a regression occurred in the VM.
>
> If you're able to tell us which VMs are stable among those present at
> http://files.pharo.org/vm/pharo-spur32/ and
> http://files.pharo.org/vm/pharo-spur64/ it would be a great help.
>
> Even better would be for the Pharo community to have a way to know which
> VMs are stable or not without having to ask you.

there is no “stable” branch in Cog, and that’s a problem.
“released” versions (the versions you can find as stable) are not working for Pharo :(

I tried to promote versions from the end of February and they crashed.

next week I will try again; maybe now they are stable enough… one thing is true: the versions that we consider stable (from Oct 2017) have problems that are already solved in the latest.

Esteban


>
> Have a nice day.
>
>>
>> --
>> _,,,^..^,,,_
>> best, Eliot
> --
> Cyril Ferlicot
> https://ferlicot.fr
>


Hi,
Several problems are mixed up here; let's try to decouple them:
- 1) there is ongoing development in the core of the VM that may introduce some instability
- 2) there is ongoing development in some plugins as well
- 3) there are infrastructure problems preventing the production of artifacts, whatever the intrinsic stability of the VM

you are right on all points, but for me this is a problem of process.

- we have no defined milestones, so nobody knows if they can jump in to help.
- plugin development happens “on its own” and nobody knows what happens, why it happens and how it happens.
- the infrastructure is not bad and a lot of effort has been made to make it work. But the code sources are scattered around the world and the only thing that unites them is the hand of the one who generates the C sources.

IMHO, it is this “disconnection” that causes most of the problems.

cheers,
Esteban


Hi Esteban,
I see nothing inevitable here, and GitHub also provides tools for that.
Look, here is the project page on GitHub:
https://github.com/OpenSmalltalk/opensmalltalk-vm/projects/1

Maybe the Pharo team is willing to collaborate and take an active part in the definition of milestones?
 

For 1), development happens in VMMaker, and we have to rely on experts. Today that is Eliot and Clement.
We all want a 64-bit VM, improved GC, improved become:, write barriers, ephemerons, threaded FFI calls and adaptive optimization.
Pharo is relying on this progress; it is vital.
IMO, we are reaching a good level of confidence, and I hope to see some VMMaker version blessed as stable pretty soon.

Instead of whining, the best we can do to reach this state is to help them by providing accurate bug reports and, even better, reproducible cases.
Thanks to all who are working in this direction.

For 2), we had a few problems, but again this is about improving important features (SSL...).
Much of the development already happens in feature branches.
But since we are targeting so many platforms, and don't yet have automated tests that scale, we still need beta testers.
We can discuss the introduction of such beta features with respect to release cycles; that would be a good thing.
Ideally we should tend toward continuous integration and very short cycles, but we're not there yet.

For 3), we had a lot of problems: stale links, invalid credentials, changing tool versions at the automated build site, etc.

If we don't build the artifacts, then we don't even have a chance to test the stability of 1) and 2).
We have to understand that 3) is absolutely vital.

May I remind you that for a very long period last year, the builds were broken due to a lack of work on the Pharo side.
Fortunately, this has changed in 2018.
Fabio has been working REALLY hard to improve 3), and without the help of Esteban, I don't think he could have reached the holy green build status.
We will never thank them enough for that. It also shows that cooperation can pay off.

But this is still very fragile.
If we want to make progress, we should ask why that is.
We could analyze the regressions and decide whether the complexity is sustainable, or eventually drop some drag.
We are chasing many hares by building the VM for Newspeak/Pharo/Squeak, i386/x86_64/ARM, Spur/Stack/V3, Sista/lowcode, Linux/macOS/Windows...
If a fix vital for Pharo/Squeak breaks the Newspeak tests, that slows down progress...
Maybe we would want to decouple the problems a bit more there too (they may come from some image-side weakness).

Over the last two years I have also observed some work done exclusively in the Pharo fork of the opensmalltalk VM.
This was counterproductive. Work must be produced upstream, or it's wasted.

This happened just once or twice. And it was because people were unaware of the “joint” repository, so they continued contributing as before (and people were pointed to the right place when we had the opportunity).
And I disagree that this was counterproductive, because I took the effort to merge the changes into osvm. This worked fine until I stopped doing that job, but well… just one PR got stalled there for months and Alistair integrated it recently.

What *did happen*, and I’m still not ready to let it go, is that a lot of the small changes we presented were rejected (or ignored) without further consideration. But well, let’s keep it positive and not enter into sterile discussions; I just think you are wrong with this argument.

No, it's important: we (the opensmalltalk-vm team) can't let such bad feelings and frustration creep in.
Every contribution counts; that does not mean that every PR will be accepted, but we owe an explanation if not.
Some are accepted instantly, some are accepted after modification requests, some are rejected (I hope with some rationale).
What is problematic is that some were ignored for too long. I regret the situation, but there is no deliberate intention to ignore them, just a lack of manpower IMO.
For example, the recent work of Alistair shows that there is nothing inevitable here; it's just that someone has to do the hard work (kudos!).

I’m sorry for having a bitter feeling, but I’m going to give you an example so you understand why I’m saying this: I proposed the refactoring of the Alien package (which is obviously right and simple) at least three times in the last three years (by different means). The first time I was told “no, we prefer it like that”. The second time I was told “we need to think about that” (and no answer later). The last time I didn’t even receive a response. So well, when Torsten proposed it he first came to me. I told him “talk on vm-dev and good luck”. His proposal was accepted (thank god).
Several situations like this one made me think that I’m a second-class citizen in this community. And you know what? I’m 46 years old and I do not want to be treated as if I’m a child who can only play with the toys someone lets me have. So yes, I’m sad and not very “into” this these days. I made a lot of effort to come back from the de facto fork we had, because I always pushed for working together. But I do not see the spirit of collaboration I hoped for.

Torsten brought up the technical merits of doing so, and the cons of the status quo; it was nothing personal.
I don't know how the topic was brought up on vm-dev at that time. Probably the technical merit was not perceived then.
For me, you are part of the opensmalltalk team; that does not mean second class (or the whole Smalltalk community is second class then).

Also, the question is coupled with stability: having a red status does not help. For almost every PR that I accepted, I had to dig into the Travis console reports and compare with the status of the previous build in order to know whether it was a regression or just a long-failing case... This does not scale!

is worst that "does not scale".
again, let me put you an example: Imagine I want to work on FFI and I need to touch both an external file and VMMaker: I cannot do a PR because VMMaker is not there and VM building will be broken and I cannot push VMMaker because platform sources are not there and VM building will be broken too.
So, the only solution is what happens today: I need to push VMMaker and changes at the same time. So I’m forced to work on the development branch (we are all forced to work there), instead how I think it should be: each one should be able to work on their branch and contribute changes through PRs (PRs that can be validated with a good CI process).

As a consequence, since all contributions go to the development branch… we have no stability and we need to wait of the blessing of a VM. That may or may not happen.

Meh, whatever… it is obvious that my way of working is different from most of the people in this community, so I will go back to what I do now: just the absolutely necessary.

It's not at all about my way or your way; you don't have to make it something personal.
It's about the feasibility and sustainability of the different solutions.
You are right, branch development is not compatible with versioning of generated code, because it leads to unsolvable merge conflicts.
Whatever the diverging opinions, the very first condition for automating code generation is to have reproducible artifacts (generated code).
You know that this is not the case today, even when generating twice from the very same image.
We can't shove this problem under the carpet.
Otherwise, two different builds of the same source could lead to different VM behavior, which would be worse than what we have today.
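
A property like this can be checked mechanically: generate twice into separate directories and compare the trees byte for byte. A minimal sketch, where `generate_sources` is a hypothetical stand-in for the real VMMaker invocation:

```shell
#!/bin/sh
# Reproducibility smoke test: run the generator twice and compare outputs.
set -e
rm -rf out1 out2
mkdir -p out1 out2

# Stand-in for the real generator; a reproducible one must write the same
# bytes every time it runs from the same image.
generate_sources() { printf 'int main(void){return 0;}\n' > "$1/interp.c"; }

generate_sources out1
generate_sources out2

if diff -r out1 out2 > /dev/null; then
    echo "reproducible"
else
    echo "NOT reproducible: generated artifacts differ between runs"
fi
```

With a deterministic generator this prints "reproducible"; today's Slang output, with embedded timestamps and unstable ordering, would take the other branch.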

Besides, except for Eliot, Clement and Ronaldo, most developers work on plugins.
Developing plugins in branches is still possible, unlike for the core VM.
There is a reduced probability of conflicts in generated code given the number of concurrent developers today.
So we commit to trunk more often than strictly necessary, IMO.

Finally, the case you are describing is rare and mostly concerns modifications of the core VM.
As said in another thread, the dev cycle of a core VM feature is long anyway (6+ months).
In this context, feature branches are not sustainable.
A good branch is a short branch; everything else is illusory and leads to merge nightmares.
In such conditions, one has to accept temporary instability and organize release cycles with a stabilisation phase.
How differently does the Pharo release cycle work?
You have raised good points concerning the organization of those cycles.
My answer does not change: take your place, throw the inferiority complex away, and participate.
You are very capable, and it will be beneficial to the community at large.

And don't take this answer as advocacy for the status quo.
I'm trying to analyze and identify locks.
It does not mean that we can't unlock :)

cheers
Nicolas

cheers, 
Esteban

ps: I will not continue discussing this… I know how things are and I’m sorry to put such a negative perspective in this list, but I needed to say it.


I'm all for more distributed power, and that should come with responsibilities, first a cooperative "you break it, you fix it" attitude.

Or maybe do you want a clarified decision process?
For now, people who feel interested in a PR raise their voice.
I don't know if we need something more formal.
For important design decisions there is the vm-dev mailing list.

cheers
Nicolas
 
cheers,
Esteban

I once thought that the Pharo fork could be the place for the Pharo team to manage official stable versions.
But I agree that this is too much duplicated work, and I would be very happy to see the work happen upstream too.

If you have constructive ideas that will help decouple all these problems, we are all ears.

PS: I did not post this answer earlier, to avoid a sterile discussion, but since Phil asked...




Re: [Pharo-dev] Image crashing on startup, apparently during GC

timrowledge
 


>

> You are right, branch development is not compatible with versionning of generated code because it leads to unsolvable merge conflicts.
> So you legitimately raise the question https://stackoverflow.com/questions/893913/should-i-store-generated-code-in-source-control
> Whatever diverging opinions, the very first condition for automating code generation is to have reproducible artifacts (generated  code).

I don't like the idea of sticking generated code into an SCCS either - it just seems wrong and likely to mislead people who just don't get the idea of dynamically generated code - BUT against that is the reproducibility thing, which is a colossal issue for practical sanity.

Maybe I should offer a snippet that sticks a big warning as a comment in front of every single generated routine
"This code was generated from XXXX on DDDDDD:HHMMSS and should not be edited let alone edited and pushed back into SCCS"
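
A stamp like that could be added by a post-processing step right after generation. A minimal sketch, where the directory layout, file contents, and banner wording are placeholders rather than anything VMMaker actually produces:

```shell
#!/bin/sh
# Prepend a do-not-edit banner to every generated C file under gen/.
set -e
mkdir -p gen
printf 'int foo(void){return 42;}\n' > gen/example.c   # stand-in generated file

# Note: not idempotent; a real stamper would run once, right after generation.
stamp="/* This code was generated on $(date -u +%Y-%m-%dT%H:%M:%SZ)
   and should not be edited, let alone edited and pushed back into SCCS. */"
for f in gen/*.c; do
    { printf '%s\n' "$stamp"; cat "$f"; } > "$f.tmp"
    mv "$f.tmp" "$f"
done
head -n 1 gen/example.c
```

The same loop could just as well substitute the generating class and VMMaker version into the banner, if the generator exports them.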

Or alternatively, maybe changing the VMMaker to output its work in BrainFuck or Whitespace or some other machine-but-not-human readable form.... :-)


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
If at first you don't succeed, destroy all evidence that you tried.



Re: [Pharo-dev] Image crashing on startup, apparently during GC

K K Subbu
 
On Sunday 01 April 2018 06:49 AM, tim Rowledge wrote:
> Maybe I should offer a snippet that sticks a big warning as a comment
> in front of every single generated routine "This code was generated
> from XXXX on DDDDDD:HHMMSS and should not be edited let alone edited
> and pushed back into SCCS"
I think there are two problems mixed up here. One is about sticking metadata tags, like version control tags, into generated source code. This is an easier problem to solve with Git.

The second, and more difficult, one is that an artifact depends on two types of sources, only one of which can be compiled directly by automated builds. The other type (auto-generated from .st) requires manual intervention during compilation, and the intermediate code needs to be preserved for debugging.

What if Slang could append a version hash at the end of the generated
file and use a special extension (say *.cpk) to mark such augmented files?

        HASH=xxxxxxxx

Build scripts can use this hash file to decide if the corresponding .c
file needs to be compiled or not:

foo.hash : foo.cpk
        update foo.hash iff its HASH differs from that in foo.cpk

foo.o : foo.hash
        unpack foo.c from foo.cpk
        compile foo.c to get foo.o

A .cpk could also concatenate a compressed C file with a HASH line, and the C part could be extracted by the build script. This would deter people from editing a pack file while still allowing VM developers access to the intermediate code.

If a single ST-coded component generates multiple *.[ch] files, they could all be packed into a single file and share the same hash. Build scripts can unpack and recompile the individual files if the hash changes.
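
The rules above can be sketched as a plain shell script. This assumes a `.cpk` is simply the C text followed by a trailing `HASH=` line, with `cksum` standing in for whatever hash Slang would actually emit (all file names and tools here are hypothetical):

```shell
#!/bin/sh
# Sketch: rebuild only when the HASH line inside the pack file changes.
set -e

# Build a stand-in pack file: the C text followed by its HASH line.
printf 'int bar(void){return 1;}\n' > src.c
{ cat src.c; printf 'HASH=%s\n' "$(cksum src.c | cut -d' ' -f1)"; } > foo.cpk

newhash=$(grep '^HASH=' foo.cpk)
if [ ! -f foo.hash ] || [ "$(cat foo.hash)" != "$newhash" ]; then
    printf '%s\n' "$newhash" > foo.hash
    grep -v '^HASH=' foo.cpk > foo.c      # unpack the C part
    echo "compiling foo.c"                # placeholder for: cc -c foo.c -o foo.o
else
    echo "up to date"
fi
```

Running it a second time without touching foo.cpk would take the "up to date" branch, which is exactly the gating the make rules describe.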

Regards .. Subbu

Re: [Pharo-dev] Image crashing on startup, apparently during GC

EstebanLM
In reply to this post by Nicolas Cellier
 


On 31 Mar 2018, at 23:31, Nicolas Cellier <[hidden email]> wrote:



2018-03-31 20:36 GMT+02:00 Esteban Lorenzano <[hidden email]>:
 
hi,

On 31 Mar 2018, at 17:34, Nicolas Cellier <[hidden email]> wrote:



2018-03-31 15:03 GMT+02:00 Esteban Lorenzano <[hidden email]>:
 


On 30 Mar 2018, at 23:56, Nicolas Cellier <[hidden email]> wrote:



2018-03-24 11:11 GMT+01:00 Esteban Lorenzano <[hidden email]>:

hi,

> On 24 Mar 2018, at 09:50, Cyril Ferlicot D. <[hidden email]> wrote:
>
> Le 23/03/2018 à 21:52, Eliot Miranda a écrit :
>> Hi Damien,
>>
>> Indeed the image is corrupt at start-up.  See below.
>>
>>
>> Right.  This VM is prior to the bug fixes in VMMaker.oscog-eem.2320:
>>
>> Spur:
>> Fix a bad bug in SpurPlanningCompactor.
>>  unmarkObjectsFromFirstFreeObject, used when the compactor requires more
>> than one pass due to insufficient savedFirstFieldsSpace, expects the
>> corpse of a moved object to be unmarked, but
>> copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.
>> Unmarking the corpse before the copy unmarks both.  This fixes a crash
>> with ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring:
>> creates lots of files, enough to push the system into the multi-pass regime.
>>
>>
>> Pharo urgently needs to upgrade the VM to one more up to date than 2017
>> 08 27 (in fact more up-to-date than opensmalltalk/vm commit
>> 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018
>> -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image
>> corruption in large images, and can occur (as it has here) at start-up,
>> causing one's work to be irretrievably lost.
>>
>
> Hi Eliot,
>
> I think that there are a lot of people who would like to get a newer
> stable VM for Pharo 6.1 and 7. The problem is that it is hard to know
> which VMs are stable enough to be promoted as stable.
>
> Some weeks ago Esteban tried to promote a VM as stable and he had to
> revert it the same day because a regression occurred in the VM.
>
> If you're able to tell us which vms are stable in those present at
> http://files.pharo.org/vm/pharo-spur32/ and
> http://files.pharo.org/vm/pharo-spur64/ it would be a great help.
>
> Even better would be for the pharo community to have a way to know which
> vms are stable or not without having to ask you.

there is no “stable” branch in Cog, and that’s a problem.
The “released” versions (the versions you can find as stable) are not working for Pharo :(

I tried to promote versions from the end of February and they crashed.

next week I will try again; maybe now they are stable enough… one thing is true: the versions that we consider stable (from oct/17) present problems that are already solved in the latest.

Esteban


>
> Have a nice day.
>
>>
>> --
>> _,,,^..^,,,_
>> best, Eliot
> --
> Cyril Ferlicot
> https://ferlicot.fr
>


Hi,
Several problems are mixed here, let's try and decouple:
- 1) there is ongoing development in the core of the VM that may introduce some instability
- 2) there is ongoing development in some plugins as well
- 3) there are infrastructure problems preventing the production of artifacts, whatever the intrinsic stability of the VM

you are right on all points, but for me this is a problem of process:

- we have no defined milestones, so nobody knows if they can jump in to help.
- plugin development happens “on its own” and nobody knows what happens, why it happens and how it happens.
- the infrastructure is not bad, and a lot of effort has been made to make it work. But the code sources are scattered around the world, and the only thing that reunites them is the hand of the one who generates the C sources.

IMHO, it is this “disconnection” that causes most of the problems.

cheers,
Esteban


Hi Esteban,
I see no fatality, and github also provides tools for that.
Look, there is the project page on github
https://github.com/OpenSmalltalk/opensmalltalk-vm/projects/1

Maybe the Pharo team is willing to collaborate and take an active part in the definition of milestones?
 

For 1), development happens in VMMaker, and we have to rely on experts. Today that is Eliot and Clement.
We all want the 64-bit VM, improved GC, improved become:, write barrier, ephemerons, threaded FFI calls and adaptive optimization.
Pharo is relying on these advances; they are vital.
IMO, we are reaching a good level of confidence, and I hope to see some VMMaker version blessed as stable pretty soon.

Instead of whining, the best we can do to reach this state is to help them by providing accurate bug reports and, even better, reproducible cases.
Thanks to all who are working in this direction.

For 2), we had a few problems, but again this is for improving important features (SSL...).
Much of the development happens in feature branches already.
But since we are targeting so many platforms, and don't have automated tests that scale yet, we still need beta testers.
We can discuss the introduction of such beta features wrt release cycles; that will be a good thing.
Ideally we should tend toward continuous integration and have very short cycles, but we're not there yet.

For 3), we had a lot of problems, like stale links, invalid credentials, evolution of the versions of tools at the automated build site, etc...

If we don't build the artifacts, then we can't even have a chance to test the stability of 1) and 2)
We have to understand that 3) is absolutely vital.

May I remind you that for a very long period last year, the builds were broken due to lack of work on the Pharo side.
Fortunately, this has changed in 2018.
Fabio has been working REALLY hard to improve 3), and without the help of Esteban, I don't think he could have reached the holy green build status.
We will never thank them enough for that. This also shows that cooperation can pay off.

But this is still very fragile.
If we want to make progress, we should ask why that is so.
We could analyze the regressions and decide whether the complexity is sustainable, or eventually drop some drag.
We are chasing many hares by building the VM for Newspeak/Pharo/Squeak, i386/x86_64/ARM, Spur/Stack/V3, Sista/lowcode, linux/macOS/Windows ...
If it happens that a fix vital for Pharo/Squeak breaks the Newspeak tests, then it slows down progress...
Maybe we would want to decouple the problems a bit more there too (they may come from some image-side weakness).

Over the last two years I've also observed some work done exclusively in the Pharo fork of the opensmalltalk VM.
This was counterproductive. Work must be produced upstream, or it's wasted.

This happened just once or twice. And it was because people were unaware of the “joint” repository, so they continued contributing as before (and people were pointed to the right place when we had the opportunity).
And I disagree that this was counterproductive, because I took on the effort of merging the changes into osvm. This worked fine until I stopped doing that job, but well… just one PR got stalled there for months, and Alistair integrated it recently.

What *did happen*, and I’m still not ready to let it go, is that a lot of the small changes that we presented were rejected (or ignored) without further consideration. But well, let’s keep it positive and not enter into sterile discussions; I just think you are wrong with this argument.

No, it's important: we (the opensmalltalk-vm team) can't let such bad feelings and frustration creep in.
Every contribution counts. That does not mean that every PR will be accepted, but we owe an explanation if not.
Some are accepted instantly, some are accepted after modification requests, some are rejected (I hope with some rationale).
What is problematic is that some were ignored for too long; I regret the situation, but there is no deliberate intention to ignore them, just lack of manpower IMO.
For example, the recent work of Alistair shows that there is no fatality here; it's just that someone has to do the hard work (kudos!).

Besides, except for Eliot, Clement and Ronaldo, most developers work on plugins.
Developing plugins in branches is still possible, unlike for the core VM.
There is a reduced probability of conflicts in generated code given the number of concurrent developers today.
So we commit to trunk more often than strictly necessary, IMO.



hi,

(concentrating on the positive)

yes and no :)
first, I find this discouraging for people who may jump in to help.
second, most plugins are developed as part of the VMMaker package, so the problem remains (guess who proposed to split VMMaker into SLANG, VMMaker, and Plugins* a long time ago? :P)

in my ideal, illusory world, plugins would be decoupled from VMMaker, each one in its own separate package. That would make what you propose possible.

cheers!
Esteban


