Hi all, I'm proposing a kind-of critical change that I believe is very good for the health of the system: I want that the startup of the system runs in maximum priority and becomes non-interruptable. Right now, when you save your image, the shutdown and startup are run in the same priority than the process that triggered the save (usually the ui or the command line, priority 40). This can cause lots of problems and race conditions: processes with higher priorities can interrupt the shutdown/startup and try to do something while the system is unstable. As a side effect also, when you use extensively the command line, you start stacking startup contexts from old sessions: ... session 3 ctxt 4 <- This guy makes a save and a new session starts session 3 ctxt 3 session 3 ctxt 2 session 3 ctxt 1 session 2 ctxt 4 <- This guy makes a save and a new session starts session 2 ctxt 3 session 2 ctxt 2 session 2 ctxt 1 session 1 ctxt 4 <- This guy makes a save and a new session starts session 1 ctxt 3 session 1 ctxt 2 session 1 ctxt 1 Old contexts are never collected, and the objects they referenced neither. To fix these two problems I propose to do every image save/session start in a new process in maximum priority. That way, other process should not be able to interrupt the startup process. Moreover, every session shutdown/startup should happen in a new clean process, to avoid the session stacking. For normal users, this should have no side effect at all. This change will have a good impact on people working on the debugger and the stack such as fueling-out the stack because they will have a cleaner stack. There is however a side-effect/design point to consider: startup actions should be quick to run. If a startup action requires to run a long-running action such as starting a server or managing a command line action, that should run in a separate process with lower priority (usually userPriority). In other words, the startup action should create a new process managing its action. If you want to review (and I'd be glad) Pull request: https://github.com/pharo-project/pharo/pull/198 Fogbugz issue: https://pharo.fogbugz.com/f/cases/20309 Current validation going on: https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/view/change-requests/job/PR-198/ Guille
|
I’m definitely in favour of doing something (as I found it confusing, and in a way also found it has the potential to leak what could be sensitive information through command line contexts that are still lurking in the image with details about previous directory structures and commands).
I will try and and see how it impacts PharoLambda as a test example. Tim
|
Hi guille thanks for raising this point. I will wait a bit that other people comment on it since it is a crucial point. Stef On Mon, Aug 14, 2017 at 4:47 PM, Tim Mackinnon <[hidden email]> wrote:
|
In reply to this post by Tim Mackinnon
On Mon, Aug 14, 2017 at 4:47 PM, Tim Mackinnon <[hidden email]> wrote:
Sure, go on and play with the build artifacts in here:
|
In reply to this post by Guillermo Polito
Hi Guille,
On Mon, Aug 14, 2017 at 3:42 AM, Guillermo Polito <[hidden email]> wrote:
I think you're mixing up two things here. One is wanting to make startup/shutdown uninterruptible by raising the priority of the process that does startup & shutdown. The other is starting the system with a new process. I don't want to express an opinion on the former, but I think the latter is not a good idea. One important feature of Smalltalks up until your proposed change is that snapshotAs:thenQuit: can be sent anywhere, causes a snapshot and resumes execution after the send of snapshotAs:thenQuit: once the internals of snapshotAs:thenQuit: have rebooted the system. This is really useful: - applications can snapshot the system as they choose for checkpointing, etc - for VM debugging one can execute the snapshotAs:thenQuit: anywhere (for example inside an inspector where self is bound to some interesting object) and follow it by whatever code one chooses, e.g. to provoked a bug. For example Smalltalk snapshotAs: 'foo' thenQuit: true. self doSomethingTerminal Your change will break this and make it much harder to debug certain things. One will be able to use the file-in scheme, but this implies the system will always run the compiler first, and one may not want that. Further, the system stacking looks like a bug that should be addressed elsewhere. Isn't it session management's responsibility for dropping older sessions?
_,,,^..^,,,_ best, Eliot |
Hi Eliot,
On Mon, Aug 14, 2017 at 9:07 PM, Eliot Miranda <[hidden email]> wrote:
Yes, we agree they are two different issues.
But this behaviour is kept in my proposal :). Actually, what I propose is mainly changing the following: SessionManager >> #snapshot: save andQuit: shouldQuit -> renamed into #launchSnapshot: save andQuit: quit. and then define: snapshot: save andQuit: quit | isImageStarting wait | wait := Semaphore new. [ isImageStarting := self launchSnapshot: save andQuit: quit. wait signal ] forkAt: Processor highestPriority. wait wait. ^ isImageStarting Since the snapshot runs in max priority, users will be suspended until the startup is run in a safe manner. Moreover, I added a semaphore to handle the case where a process in max priority calls a snapshot. To me this proposal is cleaner than changing the priority of the current thread to the top priority, to then reset it to the old priority.
I did not get this one. Could you explain further? If you mean calling the command line handlers to install .st files for example, the command line handlers are called before anything else today in any case, as they are subscribed to the startup. So even if you saved your image from the world menu, the saved image will try to handle command line arguments on the next startup.
Yes, it's a different issue, I know. The current issue is that there is code that creates a snapshot as part of the startup. Starting up from a separate process is part of the solution for this (as a way to prevent it, whatever the users do). Thanks for looking into it :) Guille |
On Mon, Aug 14, 2017 at 12:26 PM, Guillermo Polito <[hidden email]> wrote:
OK, good.
But to accomplish that you can simply do snapshot: save andQuit: quit ^[self launchSnapshot: save andQuit: quit] valueUnpreemptively right?
Not if encapsulated in valueUnpreemptively.
Ignore me. It's a non-issue since your proposal doesn't change what I thought it did. I thought you were proposing to start the system in a new process so that the state of the system didn't include the process that did the snapshot. Forgive the noise.
_,,,^..^,,,_ best, Eliot |
Thanks for this discussion!
It will improve Guille proposal and Pharo :) I'm happy! And learning things. (ok I should prepare my lectures instead of reading. I guess that I will have to use pomodoro techniques to force me to work :) Stef On Mon, Aug 14, 2017 at 9:48 PM, Eliot Miranda <[hidden email]> wrote: > > > On Mon, Aug 14, 2017 at 12:26 PM, Guillermo Polito > <[hidden email]> wrote: >> >> Hi Eliot, >> >> On Mon, Aug 14, 2017 at 9:07 PM, Eliot Miranda <[hidden email]> >> wrote: >>> >>> Hi Guille, >>> >>> On Mon, Aug 14, 2017 at 3:42 AM, Guillermo Polito >>> <[hidden email]> wrote: >>>> >>>> Hi all, >>>> >>>> I'm proposing a kind-of critical change that I believe is very good for >>>> the health of the system: I want that the startup of the system runs in >>>> maximum priority and becomes non-interruptable. >>>> >>>> Right now, when you save your image, the shutdown and startup are run in >>>> the same priority than the process that triggered the save (usually the ui >>>> or the command line, priority 40). This can cause lots of problems and race >>>> conditions: processes with higher priorities can interrupt the >>>> shutdown/startup and try to do something while the system is unstable. As a >>>> side effect also, when you use extensively the command line, you start >>>> stacking startup contexts from old sessions: >>>> >>>> ... >>>> session 3 ctxt 4 <- This guy makes a save and a new session starts >>>> session 3 ctxt 3 >>>> session 3 ctxt 2 >>>> session 3 ctxt 1 >>>> session 2 ctxt 4 <- This guy makes a save and a new session starts >>>> session 2 ctxt 3 >>>> session 2 ctxt 2 >>>> session 2 ctxt 1 >>>> session 1 ctxt 4 <- This guy makes a save and a new session starts >>>> session 1 ctxt 3 >>>> session 1 ctxt 2 >>>> session 1 ctxt 1 >>>> >>>> Old contexts are never collected, and the objects they referenced >>>> neither. >>>> >>>> To fix these two problems I propose to do every image save/session start >>>> in a new process in maximum priority. That way, other process should not be >>>> able to interrupt the startup process. Moreover, every session >>>> shutdown/startup should happen in a new clean process, to avoid the session >>>> stacking. >>> >>> >>> I think you're mixing up two things here. One is wanting to make >>> startup/shutdown uninterruptible by raising the priority of the process that >>> does startup & shutdown. The other is starting the system with a new >>> process. >> >> >> Yes, we agree they are two different issues. >> >>> >>> >>> I don't want to express an opinion on the former, but I think the latter >>> is not a good idea. One important feature of Smalltalks up until your >>> proposed change is that snapshotAs:thenQuit: can be sent anywhere, causes a >>> snapshot and resumes execution after the send of snapshotAs:thenQuit: once >>> the internals of snapshotAs:thenQuit: have rebooted the system. This is >>> really useful: >>> >>> - applications can snapshot the system as they choose for checkpointing, >>> etc >>> - for VM debugging one can execute the snapshotAs:thenQuit: anywhere (for >>> example inside an inspector where self is bound to some interesting object) >>> and follow it by whatever code one chooses, e.g. to provoked a bug. For >>> example >>> Smalltalk snapshotAs: 'foo' thenQuit: true. >>> self doSomethingTerminal >>> >>> Your change will break this and make it much harder to debug certain >>> things. >> >> >> But this behaviour is kept in my proposal :). > > > OK, good. > > >> >> Actually, what I propose is mainly changing the following: >> >> SessionManager >> #snapshot: save andQuit: shouldQuit >> -> renamed into #launchSnapshot: save andQuit: quit. >> >> and then define: >> >> snapshot: save andQuit: quit >> | isImageStarting wait | >> wait := Semaphore new. >> [ >> isImageStarting := self launchSnapshot: save andQuit: quit. >> wait signal >> ] forkAt: Processor highestPriority. >> wait wait. >> ^ isImageStarting >> >> >> Since the snapshot runs in max priority, users will be suspended until the >> startup is run in a safe manner. Moreover, I added a semaphore to handle the >> case where a process in max priority calls a snapshot. > > > But to accomplish that you can simply do > > snapshot: save andQuit: quit > ^[self launchSnapshot: save andQuit: quit] valueUnpreemptively > > > right? > >> >> To me this proposal is cleaner than changing the priority of the current >> thread to the top priority, to then reset it to the old priority. > > > Not if encapsulated in valueUnpreemptively. > >> >>> >>> One will be able to use the file-in scheme, but this implies the system >>> will always run the compiler first, and one may not want that. >> >> >> I did not get this one. Could you explain further? > > > Ignore me. It's a non-issue since your proposal doesn't change what I > thought it did. I thought you were proposing to start the system in a new > process so that the state of the system didn't include the process that did > the snapshot. Forgive the noise. > >> >> If you mean calling the command line handlers to install .st files for >> example, the command line handlers are called before anything else today in >> any case, as they are subscribed to the startup. So even if you saved your >> image from the world menu, the saved image will try to handle command line >> arguments on the next startup. >> >>> >>> Further, the system stacking looks like a bug that should be addressed >>> elsewhere. Isn't it session management's responsibility for dropping older >>> sessions? >> >> >> Yes, it's a different issue, I know. The current issue is that there is >> code that creates a snapshot as part of the startup. Starting up from a >> separate process is part of the solution for this (as a way to prevent it, >> whatever the users do). >> >> Thanks for looking into it :) >> >> Guille > > > > > -- > _,,,^..^,,,_ > best, Eliot |
In reply to this post by Guillermo Polito
In case any of the shutdown/startup scripts use a delay, now or in the future,
I'd first try at highestPriority-1 to avoid influence on the DelayScheduler. but then Eliot's suggestion to valueUnpreemptively may avoid that anyway. btw, what happens if an error occurs inside valueUnpreemptively? Does the normal priority debugger still get to run? cheers -ben On Mon, Aug 14, 2017 at 6:42 PM, Guillermo Polito <[hidden email]> wrote:
|
Hi Ben,
It does not.
BTW, on a related note, on snapshot, the primitive should not do a GC. Instead, the system should go a GC and allow any resulting finalization complete before the snapshot and well before the quit. Consider a buffered file that is flushed on finalization. This must happen before the snapshot. The GC in the snapshot is fine for ensuring the snapshot file is compact but very wrong for adding things to the finalization queue.
|
In reply to this post by Ben Coman
On Tue, Aug 15, 2017 at 3:25 PM, Ben Coman <[hidden email]> wrote:
Why should a shutdown/startup use a delay? startup and shutdown should be fast and not be blocked... If there is a delay on client code, it should block a client thread, not a system thread... Moreover, I see a series of issues in having the delay process running in higher priority than the startup. If I'm wrong, please correct me because otherwise that means there is something I'm not getting. First, today the Delay scheduling process is being terminated on shutdown and being re-initialized on startup. This means that even if a shutdown/startup action tries to use a delay that will fail/block indefinitely? Second, what happens with race conditions between the startup and the delay process? If the shutdown is in the middle of terminating the delay process and the delay process gets suddenly activated? stopTimerEventLoop "Stop the timer event loop" timerEventLoop ifNotNil: [ timerEventLoop terminate ]. timerEventLoop := nil. Maybe before terminating the timerEventLoop we need to suspend it? That will at least atomically (primitive) remove the process from the ready list and avoid it from being activated again, no? In any case, I see no good in letting a delay work on startup. That is far too low level and the system would be in a far too unstable state to run any code other than the startup itself.
|
There is possibility where delay can be used during startup/shutdown. Library can clean resources which are managed by kind of pool which organizes timeout logic to enter synchronization monitor. I checked Seamless which manages connections this way. But I not found any issue there. 2017-08-16 1:46 GMT+02:00 Guillermo Polito <[hidden email]>:
|
On Wed, Aug 16, 2017 at 10:50 AM, Denis Kudriashov <[hidden email]> wrote:
Can you explain this in more detail? I want to know exactly what "clean resources", "kind of pool" and "timeout logic to enter synchronization monitor" mean concretely. Also, how are you subscribing seamless to the startup list?
|
2017-08-16 12:02 GMT+02:00 Guillermo Polito <[hidden email]>:
Seamless server cleans all opened connections on image save. It registers using:
I copied this logic from ZnServer. But Seamless manages connections using ObjectPool. So when image save is performed "connectionPool clear" is evaluated. It closes connections and reset all caches. Problem that ObjectPool is protected by Monitor with timeout option. And #clear method enters this monitor. It is possible that at the time of image save monitor will be busy and #clear method will wait for delay to enter critical section.
|
On Wed, Aug 16, 2017 at 12:13 PM, Denis Kudriashov <[hidden email]> wrote:
Does this means that there may be a timeout while closing connections during shutdown? Because in that case the session manager will continue shutting down things and seamless may be not fully cleaned up upon next startup.
|
2017-08-16 12:28 GMT+02:00 Guillermo Polito <[hidden email]>:
Yes. It is possible. But it looks like very rare case. Interesting if there is real solution to this. From the other side I am not sure why connections should be closed when image is saved. In case of Seamless pool is constructed in the way that it checks if socket is valid before borrow it to user.
|
On Wed, Aug 16, 2017 at 1:21 PM, Denis Kudriashov <[hidden email]> wrote:
Well... maybe we need some actions that can "cancel" the shutdown? Like in the operating system, when there is an app that cannot be closed, it asks you to force close or not... Otherwise, I'd advice that at shutdown seamless should be in any case safer and avoid that case. Maybe you want to clear the pool without a timeout? Maybe you want to force kill connections without waiting for them (because they may never end)?
Well, that's a seamless issue, isn't it? Did you try not subscribing seamless to the startup to see if it still behaves well?
|
Well, that's a seamless issue, isn't it? Did you try not subscribing seamless to the startup to see if it still behaves well? It is example to show that delay can be used during snapshot logic. And it can be hidden inside libraries. In my case it is hidden inside ObjectPool. And I of course not thought about #clear method when setup timeout. I think implementations of database connection pool can be affected too. What we have? Voyage? (I tested Seamless scenario: it works fine without cleanup) 2017-08-16 13:28 GMT+02:00 Guillermo Polito <[hidden email]>:
|
In reply to this post by Guillermo Polito
On Wed, Aug 16, 2017 at 7:46 AM, Guillermo Polito <[hidden email]> wrote:
okay. My response was a just reflex to not assume what some third-party developer might do. I'm not familiar with the startup/shutdown and had not considered it deeply.
That is not how I understand it works. The DelayScheduler process is not terminated, just suspended by DelayMicrosecondScheduler>>shutdown. #stopTimerEventLoop is not called during shutdown.
That is effectively only called when changing which between different DelayScheduler-implementations.
I'm not completely happy with how the DelayScheduler shutdown happens now, whether some case might cause timingSemaphore to be triggered to run, and how #save/#restoreResumptionTimes are called from low-priority code which feels susceptible to race conditions (although I've not identified any). A while a go I was playing with the idea that to **ensure** the timerEventLoop doesn't run, you add another semaphore in middle activated by #shutDown, - but it was near a Pharo 5 release so I didn't push it in, and then failed to get back to it. From memory something like... DelayScheduler>>handleTimerEventLoop: microsecondNowTick .... suspendDelaysSemaphore ifNotNil: [ self saveResumptionTimes. suspendDelaysSemaphore wait. suspendDelaysSemaphore := nil. self restoreResumptionTimes. ]. DelayScheduler>>shutDown suspendDelaysSemaphore := Semaphore new. DelayScheduler>>startUp suspendDelaysSemaphore signal. I feel a dedicated sempahore would be more robust to control of the suspension of the DelayScehduler during shutdown/start, rather than leaving it waiting on the timingSemaphore that several things interact with and hoping none signal it. Maybe it time for me to try for my first Pharo 7 contribution. cheers -ben
|
Free forum by Nabble | Edit this page |