Hello, i trying to provide a fix to the issue
http://bugs.squeak.org/view.php?id=6822 what is needed to make following code work w/o error: |sema proc | sema := Semaphore new. proc := [ sema wait. self error: 'should not be there' ] fork. Processor yield. proc suspend. proc resume. The problem here, is that once you issue a #suspend on a process it puts nil to its myList ivar, ignoring the fact that it could be waiting on semaphore. This leads to disconnecting the process from semaphore, and any subsequent 'sema signal' will be simply lost: suspend | oldList | <primitive: 88> "primitive will fail, if receiver is not active process, which is just our case" myList ifNil:[^nil]. oldList := myList. myList := nil. oldList remove: self ifAbsent:[]. ^oldList the possible fix would be: - add an instance variable to Process, say 'semaphore'. - fix the #suspend to following: suspend | oldList | <primitive: 88> "primitive will fail, if receiver is not active process, which is just our case" myList ifNil:[^nil]. oldList := myList. myList := nil. oldList remove: self ifAbsent:[]. oldList isKindOf: Semaphore ifTrue: [ semaphore := oldList ]. ^oldList - then, in the #resume resume suspendedContext ifNil: [^ self primitiveFailed]. semaphore ifNotNil: [ semaphore resumeWaiting: self. semaphore := nil. ^ self. ] ^ self primitiveResume and Semaphore>>resumeWaiting: aProcess excessSignals>0 ifTrue: [excessSignals := excessSignals-1. aProcess primitiveResume. "resume immediately" ] ifFalse: [self addLastLink: aProcess] ---------- An attached file is the changeset which makes following test to work w/o failure: |sema proc bool | bool := true. sema := Semaphore new. proc := [ sema wait. bool ifTrue: [self error: 'should not be there']. Transcript show: 'semaphore signaled' ] fork. Processor yield. proc newSuspend. proc newResume. self assert: (sema last == proc). bool := false. sema signal. P.S. i didn't checked how this fix will affect other things around processes, like mentioned by Mike: Note that Process>>signalException: (as of in Squeak 3.8, it may be fixed in later versions?) relies on this broken behaviour. -- Best regards, Igor Stasenko AKA sig. process-suspend-fix.1.cs (3K) Download Attachment |
Hi Igor -
Having spent more time than I dare admitting in debugging multi-process interactions on heavily utilized servers I can tell you that your fix isn't going to work very well ;-) If only because it relies on #suspend being non-atomic which was one of the largest sources of problems when used with semaphores. Atomic suspension is absolutely critical since the VM may or may not decide to switch processes when you are in the midst of modifying that list and when this happens you're in for a very bad time (processes will vanish in thin air and garbage collected). Making Process>>suspend be atomic in all circumstances was one of the major advances for stability on our servers and I'm kinda surprised these changes are not in the latest VMs (I will verify this later). One thing I'm curious about is what is the use case are you looking at? I have never come across this particular problem in practice since explicit suspend and resume operations are exceptionally rare outside of the debugger. That said, I am not certain the behavior is "broken" to gegin with, in particular consider the alternatives. Depending on what problem you're trying to solve, you might want to work around the issue by utilizing the return value from #suspend. For example, doing something like: list := process suspend. list resumeProcess: process. and then implement Semaphore>>resumeProcess: aProcess "The process was waiting on this semaphore. Put it back on." Of course, this also illustrates one of the problems with doing this in general which is: Are you going to put the process at the beginning of the list? In the middle? At the end? Since any decision probably depends on application specific context, one could argue that the best way to deal with it then is along the lines of, say: list := process suspend. list ifNil:[process resume "process was active or suspended already"] ifNotNil:[list addFirst: process] "put process back on list" But of course the latter then raises the issue that if this is an already signaled semaphore this shouldn't be doing that either. Icky stuff. Which is why I really do prefer the current behavior which gives you a consistent behavior across the board. #suspend takes a process of a semaphore, #resume simply makes the process runnable again. It may not be what you expected it to be but it is consistent and sometimes this is all you can ask for. If you need specifically different behavior you can implement that in a way described above but it isn't clear to me whether any of these would be improvements because when it comes to the edge cases they are just as problematic as the current solution. Cheers, - Andreas Igor Stasenko wrote: > Hello, i trying to provide a fix to the issue > http://bugs.squeak.org/view.php?id=6822 > what is needed to make following code work w/o error: > > |sema proc | > sema := Semaphore new. > proc := [ sema wait. self error: 'should not be there' ] fork. > Processor yield. > proc suspend. > proc resume. > > The problem here, is that once you issue a #suspend on a process it > puts nil to its myList ivar, ignoring the fact that it could be > waiting on semaphore. > This leads to disconnecting the process from semaphore, and any > subsequent 'sema signal' will be simply lost: > > suspend > | oldList | > <primitive: 88> "primitive will fail, if receiver is not active > process, which is just our case" > myList ifNil:[^nil]. > oldList := myList. > myList := nil. > oldList remove: self ifAbsent:[]. > ^oldList > > the possible fix would be: > - add an instance variable to Process, say 'semaphore'. > > - fix the #suspend to following: > > suspend > | oldList | > <primitive: 88> "primitive will fail, if receiver is not active > process, which is just our case" > myList ifNil:[^nil]. > oldList := myList. > myList := nil. > oldList remove: self ifAbsent:[]. > oldList isKindOf: Semaphore ifTrue: [ semaphore := oldList ]. > ^oldList > > > - then, in the #resume > > resume > suspendedContext ifNil: [^ self primitiveFailed]. > semaphore ifNotNil: [ > semaphore resumeWaiting: self. > semaphore := nil. > ^ self. > ] > ^ self primitiveResume > > and > > Semaphore>>resumeWaiting: aProcess > excessSignals>0 > ifTrue: [excessSignals := excessSignals-1. > aProcess primitiveResume. "resume immediately" ] > ifFalse: [self addLastLink: aProcess] > > ---------- > An attached file is the changeset which makes following test to work > w/o failure: > > |sema proc bool | > bool := true. > sema := Semaphore new. > proc := [ sema wait. bool ifTrue: [self error: 'should not be there']. > Transcript show: 'semaphore signaled' ] fork. > Processor yield. > proc newSuspend. > proc newResume. > self assert: (sema last == proc). > bool := false. > sema signal. > > > P.S. i didn't checked how this fix will affect other things around > processes, like mentioned by Mike: > > Note that Process>>signalException: (as of in Squeak 3.8, it may be > fixed in later versions?) relies on this broken behaviour. > > > > > ------------------------------------------------------------------------ > > |
2009/4/28 Andreas Raab <[hidden email]>:
> Hi Igor - > > Having spent more time than I dare admitting in debugging multi-process > interactions on heavily utilized servers I can tell you that your fix isn't > going to work very well ;-) If only because it relies on #suspend being > non-atomic which was one of the largest sources of problems when used with > semaphores. Atomic suspension is absolutely critical since the VM may or may > not decide to switch processes when you are in the midst of modifying that > list and when this happens you're in for a very bad time (processes will > vanish in thin air and garbage collected). Making Process>>suspend be atomic > in all circumstances was one of the major advances for stability on our > servers and I'm kinda surprised these changes are not in the latest VMs (I > will verify this later). > VM, just because i sure that everyone who is interested in this topic, understands it by default :) > One thing I'm curious about is what is the use case are you looking at? I > have never come across this particular problem in practice since explicit > suspend and resume operations are exceptionally rare outside of the > debugger. > Well, i discovered this issue when wanted to make a current process be the only process which can be running during a certain period of time - without any chance for being interrupted by another, higher priority process. And how naive i am, thinking that doing simply suspend/resume will do what i want to. :) IMO, the suspend/resume should guarantee that suspended process will not perform any actions under any circumstances and be able to continue normally after issuing corresponding #resume. As my issue illustrates, this is not true for processes which is waiting on semaphore. Ask yourself, why a developer, who may want to suspend any process (regardless of his intents) to resume it later, should make any assertions like "what will be broken if i suspend it?". > That said, I am not certain the behavior is "broken" to gegin with, in > particular consider the alternatives. Depending on what problem you're > trying to solve, you might want to work around the issue by utilizing the > return value from #suspend. For example, doing something like: > > list := process suspend. > list resumeProcess: process. > > and then implement > > Semaphore>>resumeProcess: aProcess > "The process was waiting on this semaphore. Put it back on." > Of course, there is alternatives. " sarcasm starts here " For instance an integer multiplication for positive a*b can be implemented as: result := 0. 1 to: a do: [:i | result := result + b ]. ^ result and then we could tell developers how to implement a 'rare' cases when they need to multiply non-positive integers, or floating point values. and so on. Depending on their problem, we could always propose a solution :) " end of sarcasm " > Of course, this also illustrates one of the problems with doing this in > general which is: Are you going to put the process at the beginning of the > list? In the middle? At the end? Since any decision probably depends on > application specific context, one could argue that the best way to deal with > it then is along the lines of, say: > > list := process suspend. > list > ifNil:[process resume "process was active or suspended already"] > ifNotNil:[list addFirst: process] "put process back on list" > > But of course the latter then raises the issue that if this is an already > signaled semaphore this shouldn't be doing that either. Icky stuff. > > Which is why I really do prefer the current behavior which gives you a > consistent behavior across the board. #suspend takes a process of a > semaphore, #resume simply makes the process runnable again. It may not be > what you expected it to be but it is consistent and sometimes this is all > you can ask for. If you need specifically different behavior you can > implement that in a way described above but it isn't clear to me whether any > of these would be improvements because when it comes to the edge cases they > are just as problematic as the current solution. > multiple processes waiting on a single semaphore. It is correct only for a single process. If we suppose, that we are running in ideal environment, where processes are running in parallel, then nothing prevents us to implement waiting as following: Semaphore>>wait | flag | flag := Array with: false with: nil. "a second slot will be used for the 'next' pointer, for chaining the list of consumers who waiting on signal" self scheduleFlagToSignal: flag. [ flag at: 1 ] whileFalse: [] it is clear, that to exit from loop, there should be external process who alters the flag from false to true. Also, it is clear, that suspending the loop in the middle can't affect other processes which possibly waiting for same semaphore signal, as well as it guarantees that semaphore signal will be consumed in correct order. This example illustrates the fundamental difference between waiting and suspending/preemption, while in Squeak, unfortunately there is an equal sign between them (semaphores use a VM's preemption mechanism to implement waiting). If we take this example as a base, then of course we can implement a correct scenario which deals with preemption and scheduling. > Cheers, > - Andreas > > > Igor Stasenko wrote: >> >> Hello, i trying to provide a fix to the issue >> http://bugs.squeak.org/view.php?id=6822 >> what is needed to make following code work w/o error: >> >> |sema proc | >> sema := Semaphore new. >> proc := [ sema wait. self error: 'should not be there' ] fork. >> Processor yield. >> proc suspend. >> proc resume. >> >> The problem here, is that once you issue a #suspend on a process it >> puts nil to its myList ivar, ignoring the fact that it could be >> waiting on semaphore. >> This leads to disconnecting the process from semaphore, and any >> subsequent 'sema signal' will be simply lost: >> >> suspend >> | oldList | >> <primitive: 88> "primitive will fail, if receiver is not active >> process, which is just our case" >> myList ifNil:[^nil]. >> oldList := myList. >> myList := nil. >> oldList remove: self ifAbsent:[]. >> ^oldList >> >> the possible fix would be: >> - add an instance variable to Process, say 'semaphore'. >> >> - fix the #suspend to following: >> >> suspend >> | oldList | >> <primitive: 88> "primitive will fail, if receiver is not active >> process, which is just our case" >> myList ifNil:[^nil]. >> oldList := myList. >> myList := nil. >> oldList remove: self ifAbsent:[]. >> oldList isKindOf: Semaphore ifTrue: [ semaphore := oldList ]. >> ^oldList >> >> >> - then, in the #resume >> >> resume >> suspendedContext ifNil: [^ self primitiveFailed]. >> semaphore ifNotNil: [ >> semaphore resumeWaiting: self. >> semaphore := nil. >> ^ self. >> ] >> ^ self primitiveResume >> >> and >> >> Semaphore>>resumeWaiting: aProcess >> excessSignals>0 >> ifTrue: [excessSignals := excessSignals-1. >> aProcess primitiveResume. "resume immediately" ] >> ifFalse: [self addLastLink: aProcess] >> >> ---------- >> An attached file is the changeset which makes following test to work >> w/o failure: >> >> |sema proc bool | >> bool := true. >> sema := Semaphore new. >> proc := [ sema wait. bool ifTrue: [self error: 'should not be there']. >> Transcript show: 'semaphore signaled' ] fork. >> Processor yield. >> proc newSuspend. >> proc newResume. >> self assert: (sema last == proc). >> bool := false. >> sema signal. >> >> >> P.S. i didn't checked how this fix will affect other things around >> processes, like mentioned by Mike: >> >> Note that Process>>signalException: (as of in Squeak 3.8, it may be >> fixed in later versions?) relies on this broken behaviour. >> >> >> >> >> ------------------------------------------------------------------------ >> >> > > > -- Best regards, Igor Stasenko AKA sig. |
2009/4/28 Igor Stasenko <[hidden email]>:
[snip] > > If we suppose, that we are running in ideal environment, where > processes are running in parallel, then nothing prevents us to > implement waiting as following: > > Semaphore>>wait > | flag | > flag := Array with: false with: nil. "a second slot will be used > for the 'next' pointer, for chaining the list of consumers who waiting > on signal" > > self scheduleFlagToSignal: flag. > > [ flag at: 1 ] whileFalse: [] > > it is clear, that to exit from loop, there should be external process > who alters the flag from false to true. > Also, it is clear, that suspending the loop in the middle can't affect > other processes which possibly waiting for same semaphore signal, as > well as it guarantees that semaphore signal will be consumed in > correct order. > environment, except that it will waste the CPU cycles in loop when waiting. > This example illustrates the fundamental difference between waiting > and suspending/preemption, while in Squeak, unfortunately there is an > equal sign between them (semaphores use a VM's preemption mechanism to > implement waiting). > If we take this example as a base, then of course we can implement a > correct scenario which deals with preemption and scheduling. > > -- Best regards, Igor Stasenko AKA sig. |
In reply to this post by Igor Stasenko
Igor Stasenko wrote:
> 2009/4/28 Andreas Raab <[hidden email]>: >> One thing I'm curious about is what is the use case are you looking at? I >> have never come across this particular problem in practice since explicit >> suspend and resume operations are exceptionally rare outside of the >> debugger. > > Well, i discovered this issue when wanted to make a current process be > the only process which can be running during a certain period of time > - without any chance for being interrupted by another, higher priority > process. I'm missing some context here. How does this issue relate to sending a process suspend; resume and expect it to keep waiting on a semaphore? If I'd have to solve this problem I would just bump the process' priority temporarily. > IMO, the suspend/resume should guarantee that suspended process will > not perform any actions under any circumstances and be able to > continue normally after issuing corresponding #resume. > As my issue illustrates, this is not true for processes which is > waiting on semaphore. Yes, but even with your proposal this wouldn't be true since a process suspended from some position in the list wouldn't be put back on in the same position. In practice, there are *severe* limits to that statement about how the system "ought" to behave when you run hundreds of processes with some 50,000 network interrupts per second behind a Tweak UI ;-) I think I can prove that your implied definition of "continuing normally" is impossible to achieve in any system that has to deal with asynchronous signals. > Ask yourself, why a developer, who may want to suspend any process > (regardless of his intents) to resume it later, should make any > assertions like "what will be broken if i suspend it?". Thus my question about use cases. I haven't seen many uses of suspend outside of the debugger. And I don't think that's by accident - suspend is very tricky to deal with in a realistic setting that needs to deal with asynchronous signals. Most of the time it is a last resort solution (i.e., don't care too much about what happens afterwards) not something that you would do casually and expect to be side-effect free. > Right. My proposal doesn't deals correctly with cases when there are > multiple processes waiting on a single semaphore. > It is correct only for a single process. > > If we suppose, that we are running in ideal environment, where > processes are running in parallel, then nothing prevents us to > implement waiting as following: The problem is that in a "real" environment signals are asynchronous. Unless you have some way of stopping time and other external interrupts at the same time you simply cannot guarantee that after the #suspend there isn't an external signal which causes some other process waiting on the semaphore to execute before the process that "ought" to be released. For example, just consider a mutex where for some reason ordering matters like in Tweak (which does break if processes are not put back in the same order in which they were taken off the list): You have a process which holds the mutex, two more are waiting. You send #suspend to the first (waiting) one, it is off-list. Now the current mutex owner leaves that mutex. What should happen? Should the entire mutex stall because the process that was supposed to go next was suspended? If it proceeds it changes the ordering and that can cause all sorts of problems (as I found out when testing some earlier versions of the semaphore fixes that weren't quite correct ;-) That is the main problem with the whole idea of trying to have fine-grained control of processes externally - you cannot know the precise circumstances when these things happen and whether issues such as ordering matter which is why it is generally better to do this either implicitly (using only priorities) or by messaging (send a signal/message to the process and have it pick it up later). Outside of that transparent external control of process execution ranges somewhere between very tricky and plain impossible when asynchronous signals are involved. Cheers, - Andreas |
2009/4/28 Andreas Raab <[hidden email]>:
> Igor Stasenko wrote: >> >> 2009/4/28 Andreas Raab <[hidden email]>: >>> >>> One thing I'm curious about is what is the use case are you looking at? I >>> have never come across this particular problem in practice since explicit >>> suspend and resume operations are exceptionally rare outside of the >>> debugger. >> >> Well, i discovered this issue when wanted to make a current process be >> the only process which can be running during a certain period of time >> - without any chance for being interrupted by another, higher priority >> process. > > I'm missing some context here. How does this issue relate to sending a > process suspend; resume and expect it to keep waiting on a semaphore? If I'd > have to solve this problem I would just bump the process' priority > temporarily. > Process suspension is a STRONG guarantee that given process will not perform any actions,until it receives #resume. Priority is a wrong way to ensure this: different VMs could break this contract easily , while breaking a #suspend contract is something what doesn't fits in my mind :) >> IMO, the suspend/resume should guarantee that suspended process will >> not perform any actions under any circumstances and be able to >> continue normally after issuing corresponding #resume. >> As my issue illustrates, this is not true for processes which is >> waiting on semaphore. > > Yes, but even with your proposal this wouldn't be true since a process > suspended from some position in the list wouldn't be put back on in the same > position. In practice, there are *severe* limits to that statement about how > the system "ought" to behave when you run hundreds of processes with some > 50,000 network interrupts per second behind a Tweak UI ;-) I think I can > prove that your implied definition of "continuing normally" is impossible to > achieve in any system that has to deal with asynchronous signals. > Do not try to scare me with numbers: if things working correctly for 2-3 processes, why they should fail for 50000? ;) Certainly, the problem is to correctly identify a set of operations which require atomicity (at language side and at VM side, if its using many native threads). But if its done right, then who cares about numbers? >> Ask yourself, why a developer, who may want to suspend any process >> (regardless of his intents) to resume it later, should make any >> assertions like "what will be broken if i suspend it?". > > Thus my question about use cases. I haven't seen many uses of suspend > outside of the debugger. And I don't think that's by accident - suspend is > very tricky to deal with in a realistic setting that needs to deal with > asynchronous signals. Most of the time it is a last resort solution (i.e., > don't care too much about what happens afterwards) not something that you > would do casually and expect to be side-effect free. > Suspeding process is an explicit way to control on what happens in your system. Many facilities can benefit from it, is we guarantee a certain contracts to be fullfilled. Actually, we are using suspend/resume every day, even without noticing it - consider an image snapshot/startup :) Does processes which were waiting for semaphore and saved in image in such state start working after startup as if semaphore signalled? Do such processes lose their 'wait' state? >> Right. My proposal doesn't deals correctly with cases when there are >> multiple processes waiting on a single semaphore. >> It is correct only for a single process. >> >> If we suppose, that we are running in ideal environment, where >> processes are running in parallel, then nothing prevents us to >> implement waiting as following: > > The problem is that in a "real" environment signals are asynchronous. Unless > you have some way of stopping time and other external interrupts at the same > time you simply cannot guarantee that after the #suspend there isn't an > external signal which causes some other process waiting on the semaphore to > execute before the process that "ought" to be released. > this is certanly doable, because at primitive (VM) level there is no other activity at language side, other than VM does. > For example, just consider a mutex where for some reason ordering matters > like in Tweak (which does break if processes are not put back in the same > order in which they were taken off the list): You have a process which holds > the mutex, two more are waiting. You send #suspend to the first (waiting) > one, it is off-list. Now the current mutex owner leaves that mutex. What > should happen? Should the entire mutex stall because the process that was > supposed to go next was suspended? If it proceeds it changes the ordering > and that can cause all sorts of problems (as I found out when testing some > earlier versions of the semaphore fixes that weren't quite correct ;-) > It should stall, of course, because first process who waiting on mutex should obtain it first. The fact that its suspended is not relevant. This is what i'm trying to say: waiting semantics should be kept separated from scheduling. A proof case is: mutex critical: [ proc := Processor activeProcess. [ proc suspend. self do something here. proc resume. ] fork. Processor yield. ]. it shows, that process which obtained a mutex can be suspended at any point of time. And it gives you the right answer: any other processes who waiting on same mutex will stall forever, until eventually, suspended process will be resumed and release the mutex. > That is the main problem with the whole idea of trying to have fine-grained > control of processes externally - you cannot know the precise circumstances > when these things happen and whether issues such as ordering matter which is > why it is generally better to do this either implicitly (using only > priorities) or by messaging (send a signal/message to the process and have > it pick it up later). Outside of that transparent external control of > process execution ranges somewhere between very tricky and plain impossible > when asynchronous signals are involved. > I do not agree. As i said before, priorities is a fluid essence, which simply shows to VM , which process have a better chance to take control over computing resources (other VMs can treat priority differently - like a percentage of computing resources which can be allocated for a given process and guarantee that all active processes will not starve during a certain period of time). I don't like implicit control, explicit is much more better , because it guarantees that under any circumstances your code will work same as before. I will try to implement a VM-side primitives which will guarantee atomicity for Semaphore wait/signal operations. Then we can continue our discussion using more grounded arguments. :) > Cheers, > - Andreas > > -- Best regards, Igor Stasenko AKA sig. |
Igor Stasenko wrote:
> 2009/4/28 Andreas Raab <[hidden email]>: >> Igor Stasenko wrote: >>> 2009/4/28 Andreas Raab <[hidden email]>: >>>> One thing I'm curious about is what is the use case are you looking at? I >>>> have never come across this particular problem in practice since explicit >>>> suspend and resume operations are exceptionally rare outside of the >>>> debugger. >>> Well, i discovered this issue when wanted to make a current process be >>> the only process which can be running during a certain period of time >>> - without any chance for being interrupted by another, higher priority >>> process. >> I'm missing some context here. How does this issue relate to sending a >> process suspend; resume and expect it to keep waiting on a semaphore? If I'd >> have to solve this problem I would just bump the process' priority >> temporarily. >> > > Process suspension is a STRONG guarantee that given process will not > perform any actions,until it receives #resume. This is not an answer to my (honest) question. I don't understand how suspending and resuming processes does allow you to solve your originally stated problem that is "running during a certain period of time without any chance for being interrupted by another, higher priority process". I can do what you are stating as the problem by changing the process priority, I don't see how suspending a process would even begin to address this. In short, I don't see what solution you are proposing to address your stated problem and why the issue with suspend and resume arises from that. > Priority is a wrong way to ensure this: different VMs could break this > contract easily , while breaking a #suspend contract is something what > doesn't fits in my mind :) I'll fix that when I have a reason to switch to a VM that has a different scheduling policy. I'm not into discussing problems I don't have. >>> IMO, the suspend/resume should guarantee that suspended process will >>> not perform any actions under any circumstances and be able to >>> continue normally after issuing corresponding #resume. >>> As my issue illustrates, this is not true for processes which is >>> waiting on semaphore. >> Yes, but even with your proposal this wouldn't be true since a process >> suspended from some position in the list wouldn't be put back on in the same >> position. In practice, there are *severe* limits to that statement about how >> the system "ought" to behave when you run hundreds of processes with some >> 50,000 network interrupts per second behind a Tweak UI ;-) I think I can >> prove that your implied definition of "continuing normally" is impossible to >> achieve in any system that has to deal with asynchronous signals. >> > > Do not try to scare me with numbers: if things working correctly for > 2-3 processes, why they should fail for 50000? ;) Because your assumption that the code is "working correctly" is just that - a wild guess about what might happen in what circumstances. Since there is no mathematically sound proof that the code is indeed "working correctly" (in fact you probably don't even have a definition of what it means to "work correctly"), running that code at 10,000 times the frequency allows you to find problems with a much higher probability than you would otherwise be able to. Seriously, we didn't find the problems we fixed over the last years by reasoning - we found them because the system came to a screeching halt often enough to allow us to find the issues. > Certainly, the problem is to correctly identify a set of operations > which require atomicity (at language side and at VM side, if its using > many native threads). But if its done right, then who cares about > numbers? Numbers show that your code *actually* works instead you just thinking it works. Do you really believe that people who wrote the code that we fixed did know that their code was buggy and were just too lazy to write correct code? Come on, get serious. >>> Ask yourself, why a developer, who may want to suspend any process >>> (regardless of his intents) to resume it later, should make any >>> assertions like "what will be broken if i suspend it?". >> Thus my question about use cases. I haven't seen many uses of suspend >> outside of the debugger. And I don't think that's by accident - suspend is >> very tricky to deal with in a realistic setting that needs to deal with >> asynchronous signals. Most of the time it is a last resort solution (i.e., >> don't care too much about what happens afterwards) not something that you >> would do casually and expect to be side-effect free. > > Suspeding process is an explicit way to control on what happens in your system. > Many facilities can benefit from it, is we guarantee a certain > contracts to be fullfilled. > Actually, we are using suspend/resume every day, even without noticing > it - consider an image snapshot/startup :) But that a) suspends *all* processes (in effect stops time) and b) it is not side-effect free and nobody expects it to be. In fact image snapshot is a great example of cooperating processes (including shutdown / startup processing) and why one cannot assume that external suspend / resume can be completely side effect free. > Does processes which were waiting for semaphore and saved in image in > such state start working after startup as if semaphore signalled? Do > such processes lose their 'wait' state? In the non-trivial cases, they do. That's because they are being shut down during the system shutdown and startup sequence. Check out what (for example) web servers do - they stop the listener process and restart it after the system has been restarted. But even that is beside the point because you are cherry-picking one particular aspect of image saving without looking at the parts that make this possible. Including for example, the *great* care Delay goes through when adjusting wakeup times, or the resource management necessary for sockets and files just so the rest of the system can *pretend* nothing of importance just happened. Or are you now suggesting that a process that is waiting on a delay must also adjust the wakeup time for that delay when it is suspended and resumed later? Close and reopen files? Shut down and reopen sockets? ;-) >> The problem is that in a "real" environment signals are asynchronous. Unless >> you have some way of stopping time and other external interrupts at the same >> time you simply cannot guarantee that after the #suspend there isn't an >> external signal which causes some other process waiting on the semaphore to >> execute before the process that "ought" to be released. >> > If you speaking about Squeak VM, and its green threading model then > this is certanly doable, because at primitive (VM) level there is no > other activity at language side, other than VM does. I don't understand that sentence. > It should stall, of course, because first process who waiting on mutex > should obtain it first. The fact that its suspended is not relevant. Well, I guess that's the end of this discussion for me. I have little interest to discuss hypotheticals. All of my needs are very practical. > This is what i'm trying to say: waiting semantics should be kept > separated from scheduling. > A proof case is: > > mutex critical: [ > proc := Processor activeProcess. > [ proc suspend. self do something here. proc resume. ] fork. > Processor yield. > ]. > > it shows, that process which obtained a mutex can be suspended at any > point of time. And it gives you the right answer: any other processes > who waiting on same mutex will stall forever, until eventually, > suspended process will be resumed and release the mutex. I don't get what you are trying to show with the above. Yes a process which holds a mutex can be suspended while it holds the mutex. This has been true forever and it has nothing to do with the case I was describing which was about the order at which processes arrive at and leave a mutex. > I do not agree. As i said before, priorities is a fluid essence, which > simply shows to VM , which process have a better chance to take > control over computing resources (other VMs can treat priority > differently - like a percentage of computing resources which can be > allocated for a given process and guarantee that all active processes > will not starve during a certain period of time). > I don't like implicit control, explicit is much more better , because > it guarantees that under any circumstances your code will work same as > before. > > I will try to implement a VM-side primitives which will guarantee > atomicity for Semaphore wait/signal operations. Then we can continue > our discussion using more grounded arguments. :) Good luck with that. I'll settle for a definition of "correct" behavior in the presence of asynchronous interrupts since it is really not clear to me what you mean when you say "correct". My problem here is that I've been knee-deep for years now into all of the things that *actually* happen that I have no illusions left about hypotheticals. I'll settle for a least-surprise approach that gives me "a" result consistently. I'll rather take a wrong result 100% of the time that a right result 95% of the time because the former you can learn to work around quickly whereas the latter may work for weeks and then fall over three times in a day. But you know all that already - just compare our discussion about the expected behavior of forking processes at the same priority. Obviously we have different opinions about these issues but in my experience with these (and related) issues whenever we removed uncertainty we improved robustness because it's the rare and unexpected things that really get you, not the obvious ones. Cheers, - Andreas |
Andreas Raab wrote:
> I'll settle for a definition of "correct" behavior > in the presence of asynchronous interrupts since it is really not clear > to me what you mean when you say "correct". I had a thought about that this morning. One possible definition of "correct" in this context may mean that a process in a "frozen" state (to disambiguate from a current "suspended" state) simply does not advance its pc but behaves identical to an unfrozen process in all other respects. The effect can be simulated (within reason) by something that simply drops the priority of a process to below the idle process' priority so that it just doesn't get any computational resources. The point being that a freezing a process doesn't even touch the suspended list. You could implement this for example by using negative priorities to indicate that a process is frozen, along the lines of: Process>>isFrozen "Answer if the process is frozen" ^priority < 0 Process>>freeze "This needs to be a primitive" self isFrozen ifTrue:[^self]. priority := priority negated. Process>>unfreeze "This needs to be a primitive" self isFrozen ifFalse:[^self]. priority := priority negated. Changing synchronousSignal: not to allow frozen processes to become runnable (i.e., test for negative priority and just ignore it if that's the case) would achieve most of what you need (modulo the primitives for freeze/unfreeze). Obviously the effects will be "interesting" (to say the least) if you freeze a process on its way into a mutex since that process will indeed hold the mutex until such a point that it becomes unfrozen. But I guess that is kind of what you were asking for after all ;-) Cheers, - Andreas |
2009/4/28 Andreas Raab <[hidden email]>:
> Andreas Raab wrote: >> >> I'll settle for a definition of "correct" behavior in the presence of >> asynchronous interrupts since it is really not clear to me what you mean >> when you say "correct". > > I had a thought about that this morning. One possible definition of > "correct" in this context may mean that a process in a "frozen" state (to > disambiguate from a current "suspended" state) simply does not advance its > pc but behaves identical to an unfrozen process in all other respects. > > The effect can be simulated (within reason) by something that simply drops > the priority of a process to below the idle process' priority so that it > just doesn't get any computational resources. > > The point being that a freezing a process doesn't even touch the suspended > list. You could implement this for example by using negative priorities to > indicate that a process is frozen, along the lines of: > > Process>>isFrozen > "Answer if the process is frozen" > ^priority < 0 > > Process>>freeze > "This needs to be a primitive" > self isFrozen ifTrue:[^self]. > priority := priority negated. > > Process>>unfreeze > "This needs to be a primitive" > self isFrozen ifFalse:[^self]. > priority := priority negated. > > Changing synchronousSignal: not to allow frozen processes to become runnable > (i.e., test for negative priority and just ignore it if that's the case) > would achieve most of what you need (modulo the primitives for > freeze/unfreeze). > > Obviously the effects will be "interesting" (to say the least) if you freeze > a process on its way into a mutex since that process will indeed hold the > mutex until such a point that it becomes unfrozen. But I guess that is kind > of what you were asking for after all ;-) > Good idea, except that you losing the original priority value. And freezing/suspending is very much synonyms as to me :) After your criticism (sorry i haven't commented it yet..), my thoughts vent somewhere else, trying to identify what is wrong with Squeak scheduling/concurrency and why its so brittle even at core levels (not mentioning an application levels which rely on it). I came to an idea , you might be interested in. As many of us know, some CPUs having a special mode - interrupt mode. What if we introduce the interrupt mode for scheduler? Processor interruptWith: aBlock "evaluate the block closure in interrupt mode. VM guarantees the atomicity of block evaluation (no external events could interrupt its evaluation, since processor are already in interrupt mode). A 'Processor interruptedProcess' holds a process which is being interrupted. Once block evaluation is done, VM continues with process which holding interruptedProcess ivar. In case of any external events(such as semaphore signal), which may occur during interrupt, they are queued in list. Interpreter continues to handle interrupts in interrupt mode until there are pending interrupts in its queue. If code sends #interruptWith: when already in interrupt mode , then block will be evaluated immediately, as if sending #value message to it " Now i trying to imagine, how a basic stuff might look like(please correct me if its utterly wrong way ;), if we will be able to use interrupt mode. First, task switching, and scheduling policy is in hands of developers, not VM: Processor>>initializeSchedulingAction "when VM going to switch tasks , like in wait/suspend/running to the bottom of stack, it picks a block closure , located in Processor's schedulingAction instance variable, and runs it as an interrupt. " schedulingAction := [ self interruptedProcess: self pickNextScheduledProcessBasedOnWhateverCriteria. ]. and next is trivial: Processor>>yield "yield the current process" self interruptWith: [ self schedulingAction value. ]. As a consequence of this, semaphores can be implemented as is (w/o the need of special primitives) Semaphore>>signal Processor interruptWith: [ self isEmpty ifTrue: [excessSignals := excessSignals+1] ifFalse: [Processor resume: self removeFirstLink] ]. Semaphore>>wait Processor interruptWith: [ excessSignals>0 ifTrue: [excessSignals := excessSignals-1] ifFalse: [self addLastLink: Processor activeProcess suspend] ] Delays: Delay>>wait "action & delayedProcess is instance variables" [ Processor interruptWith: [ delayedProcess := Processor interruptedProcess. Processor unschedule: delayedProcess. action := [ Processor rescheduleProcess: delayedProcess ]. self class scheduleDelay: self. Processor continueWithNextScheduledProcess. ]. ] ifCurtailed: [ action := [] ]. --- Delay class>>scheduleDelay: delay "gory details about picking the nearest delay to be signaled... " ActiveDelay := ... self primitiveScheduleAction: [ ActiveDelay action value. self sheduleNextDelay ] registerAsExternalObject atMilliseconds: pointInTime. A 'primitiveScheduleAction...' will schedule an interrupt to evaluate a closure which were registered in external objects table. Note, that since access to state (ActiveDelay) is possible only during interrupt phase (we can enforce this quite easy), we don't need an extra synchronizations like AccessProtect critical: []. So, the obvious benefit - a code is greatly simplified. Also, note the tricks with 'action' ivar of delay. An ifCurtailed: block sets action to empty block, so nothing bad happens when given delay will be activated by timerInterrupt. In similar way, we could add a 'signalAction' ivar to Semaphore, and replace it in case, if we don't want it to be performed (such in #terminate /#critical: case). And, what is now possible - a shift in mindset - use direct actions (closures), instead of semaphores. Lets get straight to the business: why do we need to wrap stuff with semaphores first, and only then do something? :) There could be many variations in actions - a different mixtures of waiting for semaphore + abandoning wait after timeout.. etc etc. There is almost no need for extra primitives, except one which gives you interrupt mode. Like in sockets, what if we instead of creating a bunch of semaphores: semaIndex := Smalltalk registerExternalObject: semaphore. readSemaIndex := Smalltalk registerExternalObject: readSemaphore. writeSemaIndex := Smalltalk registerExternalObject: writeSemaphore. and passing them to primitive: socketHandle := self primSocketCreateNetwork: 0 type: socketType receiveBufferSize: 8000 sendBufSize: 8000 semaIndex: semaIndex readSemaIndex: readSemaIndex writeSemaIndex: writeSemaIndex. pass the closures instead? Then, when something happens on socket -> VM will evaluate given closure in interrupt mode. And you're free to react on these events as you like. What you think? > Cheers, > - Andreas > > > -- Best regards, Igor Stasenko AKA sig. |
2009/4/28 Igor Stasenko <[hidden email]>:
> > Good idea, except that you losing the original priority value. And > freezing/suspending is very much synonyms as to me :) > Na, look better, Andreas keep the priority by just negating it... :) |
In reply to this post by Igor Stasenko
On 4/28/09, Igor Stasenko <[hidden email]> wrote:
> Hello, i trying to provide a fix to the issue > http://bugs.squeak.org/view.php?id=6822 > what is needed to make following code work w/o error: > > |sema proc | > sema := Semaphore new. > proc := [ sema wait. self error: 'should not be there' ] fork. > Processor yield. > proc suspend. > proc resume. > > The problem here, is that once you issue a #suspend on a process it > puts nil to its myList ivar, ignoring the fact that it could be > waiting on semaphore. > This leads to disconnecting the process from semaphore, and any > subsequent 'sema signal' will be simply lost: > > suspend > | oldList | > <primitive: 88> "primitive will fail, if receiver is not active > process, which is just our case" > myList ifNil:[^nil]. > oldList := myList. > myList := nil. > oldList remove: self ifAbsent:[]. > ^oldList > > the possible fix would be: > - add an instance variable to Process, say 'semaphore'. Personally, I'd add an boolean instance variable called "suspended" to Process, and then modify the scheduler (which unfortunately is in the VM code) to respect it. Gulik. -- http://gulik.pbwiki.com/ |
In reply to this post by Andreas.Raab
On 4/28/09, Andreas Raab <[hidden email]> wrote:
> Igor Stasenko wrote: >> Ask yourself, why a developer, who may want to suspend any process >> (regardless of his intents) to resume it later, should make any >> assertions like "what will be broken if i suspend it?". > > Thus my question about use cases. I haven't seen many uses of suspend > outside of the debugger. And I don't think that's by accident - suspend > is very tricky to deal with in a realistic setting that needs to deal > with asynchronous signals. Most of the time it is a last resort solution > (i.e., don't care too much about what happens afterwards) not something > that you would do casually and expect to be side-effect free. IMHO Process>>suspend should never by used in "normal" code. It should only be used from debuggers and system tools. But it should still be implemented to exhibit the correct behaviour. The above is exactly the use case where this bug has bitten me. When I was trying to debug concurrent code, the debugger would simply ignore Semaphore>>wait and step right over it! The debugger quickly became useless when I had to manually keep track of which semaphores were signalled and which weren't. Of course, the debugger would also need improvement to make sure that it doesn't suspend the entire GUI every time its simulated process waits on a semaphore, but that's another issue. > The problem is that in a "real" environment signals are asynchronous. > Unless you have some way of stopping time and other external interrupts > at the same time you simply cannot guarantee that after the #suspend > there isn't an external signal which causes some other process waiting > on the semaphore to execute before the process that "ought" to be released. By my understanding of how suspending a process should work, if a Process is suspended (by calling >>suspend) then no force on Earth other than called >>resume on it should resume it again. Any events or signals on it should accumulate until it is resumed. > For example, just consider a mutex where for some reason ordering > matters like in Tweak (which does break if processes are not put back in > the same order in which they were taken off the list): You have a > process which holds the mutex, two more are waiting. You send #suspend > to the first (waiting) one, it is off-list. Now the current mutex owner > leaves that mutex. What should happen? Should the entire mutex stall > because the process that was supposed to go next was suspended? If it > proceeds it changes the ordering and that can cause all sorts of > problems (as I found out when testing some earlier versions of the > semaphore fixes that weren't quite correct ;-) What list? Are you referring to the linked list that a Semaphore maintains? I would consider the linked list of Processes that Semaphores maintain to be an implementation detail of Processes and Semaphores. Your code should be written to be completely oblivious to it. I believe the correct behaviour of Semaphore>>signal should be that the next process to be run would be either the process doing the signalling, or any other process waiting on that semaphore. Assuming a multi-core capable VM, two processes might end up concurrrently continuing execution. There shouldn't be any guaranteed ordering in the resuming of processes; that's an implementation detail in the VM that could potentially change. And yes, I believe that in your example, it is correct that the entire mutex should "stall", meaning that the process that entered the mutex has entered a "suspended" state and all processes still waiting on that mutex remain in their "waiting" state. If the mutex didn't "stall", the debugger wouldn't be particularly helpful. Gulik. -- http://gulik.pbwiki.com/ |
Hi Michael,
responding to clear-up some apparent confusions. On Tue, Apr 28, 2009 at 3:29 PM, Michael van der Gulik <[hidden email]> wrote: On 4/28/09, Andreas Raab <[hidden email]> wrote: Define "normal" :) The above is exactly the use case where this bug has bitten me. When I Signals accumulate on Semaphores, not on processes.
This is more than an implementation detail. Processes are subclasses of Link that add a "myList" instance variable, and Semaphores subclasses of LinkedList. The runnable process lists in Processor are LinkedList instances. So a process is either the sole activeProcess and not on any list, or suspended, and not on any list, or on one of the runnable process lists, or waiting on a Semaphore. The "myList" instance variable is set to the list a process is waiting on or not. So a process is suspended iff myList == nil and the process ~~ Processor activeProcess. None of Processor (the access of the value of (Smalltalk associationAt: #Process), activeProcess (an inst var accessor) and #== or #and: are suspension points so one can write
Process isSuspended ^myList == nil and: [self ~~ Processor activeProcess] But wait!?!? The Squeak definition is simply Process methods for accessing isSuspended ^myList isNil
which is broken. Take for example the following two forks, the first of which gets added to the runnable process lists before it runs, the second of which runs immediately:
| s r |
s := Semaphore new.
[| thisProcess |
thisProcess := Processor activeProcess.
r := {(Processor instVarNamed: 'quiescentProcessLists') indexOf: (thisProcess instVarNamed: 'myList').
thisProcess priority.
thisProcess isSuspended }.
s signal] forkAt: Processor userBackgroundPriority.
s wait.
r
answers #(1 30 false)
| s r |
s := Semaphore new.
[| thisProcess |
thisProcess := Processor activeProcess.
r := {(Processor instVarNamed: 'quiescentProcessLists') indexOf: (thisProcess instVarNamed: 'myList').
thisProcess priority.
thisProcess isSuspended }.
s signal] forkAt: Processor activePriority + 20.
s wait.
r
answers #(0 60 true) So I'm confused too! There's a bug in the VM and in the definition of isSuspended. isSuspended needs to read isSuspended
^myList == nil and: [self ~~ Processor activeProcess] and the VM needs to set myList to nil in transferTo:. If I were thinking of a truly multi-process implementation I'd set myList to the process itself so that we could define
isSuspended ^myList == nil isActive
but one step at a timee. I believe the correct behaviour of Semaphore>>signal should be that Um, this seems very confused. The process doing the signalling should not be affected at all. Only processes waiting on the semaphore should be affected, and by definition the signalling process can't be waiting on the semapihre it seignals because it has to be runnable to be able to signal.
Further, yes there *must* be an ordering in the resumption order. It must be strictly FIFO for many scheduling algorithms to work. Even on a multi-core CPU the scheduler can be well-defined such that processes waiting on a semaphore become active in the order they are waiting on the semaphore. Even in a hypothetical multi-core VM with two concurrent processes simultaneously signalling a semaphore with two processes waiting on it the system would have to ensure that the two signals ended up scheduling both processes, not that both signals ended up somehow proceeding only one; i.e. the VM is going to have to serialize or mutually-exclude access to the semaphore to prevent signals being lost.
And yes, this is guaranteed, very conciously so. The Smalltalk scheduler is strictly a real-time scheduler preemptive across priority and cooperatively-scheduled round-robin within priority, such that all lower-priority runnable processes are preempted by any higher-priority runnable processes as soon as any higher-priority process becomes runnable. The only bug in this in Squeak (apart from the myList snafu above) is that when a lower-priority process is preempted by a higher one it gets added to the back of its scheduling queue, so within a priority processes can't rely on being co-operatively scheduled. This has been fixed in VisualWorks, where a preempted process is added to the front of its runnable process list. (But VisualWorks goes on to break Semaphores by making them priority queues where the first highest priority process is the one that gets run, which will break certain scheduling algorithms).
And yes, I believe that in your example, it is correct that the entire Cheers Eliot
|
On 4/29/09, Eliot Miranda <[hidden email]> wrote:
> Hi Michael, >> What list? Are you referring to the linked list that a Semaphore >> maintains? I would consider the linked list of Processes that >> Semaphores maintain to be an implementation detail of Processes and >> Semaphores. Your code should be written to be completely oblivious to >> it. > > > This is more than an implementation detail. Processes are subclasses of > Link that add a "myList" instance variable, and Semaphores subclasses of > LinkedList. The runnable process lists in Processor are LinkedList > instances. So a process is either the sole activeProcess > and not on any > list, or suspended, and not on any list, or on one of the runnable process > lists, or waiting on a Semaphore. The "myList" instance variable is set to > the list a process is waiting on or not. So a process is suspended iff > myList == nil and the process ~~ Processor activeProcess. <snip> Okay, so we have two types of "suspended" going on here: 1: The process is not currently running, but might be soon. 2: The process is in a special "suspended" state for debugging or management, and won't be scheduled. Again, you're also working with implementation details. As a programmer of "normal" code (where "normal" isn't defined here :-P ), I don't know or care whether Process is a subclass of Link or whether Semaphores are a subclass of LinkedList. They have protocols that I'd hopefully never need to use. That sort of stuff should be abstracted away from me. What I care about is that: * When I fork a process, conceptually it is running at the same time. How the processes are actually scheduled or run should not be of interest to me. * When a process waits on a semaphore, it won't restart until that semaphore is signalled... unless I've asked for a timeout. This includes when the process has been suspended and resumed - it should still remain in the "waiting" state. And either of (depending on who is right between us; discussion continued below): * When a semaphore is signalled, *some* process that was waiting on it shall, at some stage in the future depending on scheduling behaviour, resume execution. or * When a semaphore is signalled, the process that has been waiting on this semaphore the longest shall, at some stage in the future depending on scheduling behaviour, resume execution. (noting that the current implementation in Squeak resumes the waiting process immediately) And then: * If Process>>suspend is invoked on a Process, that Process enteres a suspended state and won't run until >>resume is invoked on it. (If the Process was waiting on a semaphore, it will continue waiting on that semaphore when resumed). This is my impression of the API that should be presented to application developers. > I believe the correct behaviour of Semaphore>>signal should be that >> the next process to be run would be either the process doing the >> signalling, or any other process waiting on that semaphore. Assuming a >> multi-core capable VM, two processes might end up concurrently >> continuing execution. There shouldn't be any guaranteed ordering in >> the resuming of processes; that's an implementation detail in the VM >> that could potentially change. > > > Um, this seems very confused. The process doing the signalling should not > be affected at all. Only processes waiting on the semaphore should be > affected, and by definition the signalling process can't be waiting on the > semapihre it seignals because it has to be runnable to be able to signal. > > Further, yes there *must* be an ordering in the resumption order. It must > be strictly FIFO for many scheduling algorithms to work. Would you point me in the right direction on this? Do you have examples of algorithms where the resumption order of processes waiting on a Semaphore need to be FIFO? > Even on a > multi-core CPU the scheduler can be well-defined such that processes waiting > on a semaphore become active in the order they are waiting on the semaphore. > Even in a hypothetical multi-core VM with two concurrent processes > simultaneously signalling a semaphore with two processes waiting on it the > system would have to ensure that the two signals ended up scheduling both > processes, not that both signals ended up somehow proceeding only one; i.e. > the VM is going to have to serialize or mutually-exclude access to the > semaphore to prevent signals being lost. > > And yes, this is guaranteed, very conciously so. The Smalltalk scheduler is > strictly a real-time scheduler preemptive across priority > and cooperatively-scheduled round-robin within priority, such that all > lower-priority runnable processes are preempted by any higher-priority > runnable processes as soon as any higher-priority process becomes runnable. This would break on a multi-core capable VM. On such a hypothetical VM (we can dream, right?), any number of processes from any number of runlevels could be running truly concurrently. I keep going on about the hypothetical multi-core capable VM, because I honestly believe that one day some bright spark will make us one and there will be much rejoicing and joviality. If nobody else does, I will, once I've finished all my current projects(*). If all our code relies on the behaviour of the scheduler, then it's going to be painful for a very long time before our images will be run nice and concurrently on multi-core PCs. Standard PCs these days have two or four cores, and I can see the number of cores going all exponential on us. Gulik. (*) ha ha ha ha. See my wiki below. -- http://gulik.pbwiki.com/ |
In reply to this post by Igor Stasenko
Igor Stasenko wrote:
> I came to an idea , you might be interested in. > As many of us know, some CPUs having a special mode - interrupt mode. > What if we introduce the interrupt mode for scheduler? [... snip ...] > Now i trying to imagine, how a basic stuff might look like(please > correct me if its utterly wrong way ;), if we will be able to use > interrupt mode. This is actually along similar lines of thought that I had when I was thinking of how to get rid of the builtin VM scheduling behavior. The main thought that I had was that the VM may have a "special" process - the scheduler process (duh!) which it runs when it doesn't know what else to do. The VM would then not directly schedule processes after semaphore signals but rather put them onto a "ready" queue that can be read by the scheduler process and switch to the scheduler process. The scheduler process decides what to run next and resumes the process via a primitive. Whenever an external signal comes in, the VM automatically activates the scheduler process and the scheduler process then decides whether to resume the previously running process or to switch to a different process. In a way this folds the timer process into the scheduler (which makes good sense from my perspective because much of the work in the timer is stuff that could be more effectively take place in the scheduler). The implementation should be relatively straightforward - just add a scheduler process and a ready list to the special objects, and wherever the VM would normally process switch you just switch to the scheduler. Voila, there is your user-manipulable scheduler ;-) And obviously, anything that is run out of the scheduler process is by definition non-interruptable because there is simply nothing to switch to! Cheers, - Andreas |
Hi Andreas,
On Tue, Apr 28, 2009 at 8:33 PM, Andreas Raab <[hidden email]> wrote:
How would you generalise this to a natively multi-threaded VM? Obviously one simple way is to stop the other processors at a suspension point before letting the scheduler process proceed, but is there anything cleverer that doesn't halt all processors until the singleton scheduler has made its mind up?
Cheers, |
On 4/29/09, Eliot Miranda <[hidden email]> wrote:
> Hi Andreas, > > On Tue, Apr 28, 2009 at 8:33 PM, Andreas Raab <[hidden email]> wrote: >> This is actually along similar lines of thought that I had when I was >> thinking of how to get rid of the builtin VM scheduling behavior. The main >> thought that I had was that the VM may have a "special" process - the >> scheduler process (duh!) which it runs when it doesn't know what else to >> do. >> The VM would then not directly schedule processes after semaphore signals >> but rather put them onto a "ready" queue that can be read by the scheduler >> process and switch to the scheduler process. The scheduler process decides >> what to run next and resumes the process via a primitive. Whenever an >> external signal comes in, the VM automatically activates the scheduler >> process and the scheduler process then decides whether to resume the >> previously running process or to switch to a different process. >> >> In a way this folds the timer process into the scheduler (which makes good >> sense from my perspective because much of the work in the timer is stuff >> that could be more effectively take place in the scheduler). The >> implementation should be relatively straightforward - just add a scheduler >> process and a ready list to the special objects, and wherever the VM would >> normally process switch you just switch to the scheduler. Voila, there is >> your user-manipulable scheduler ;-) And obviously, anything that is run >> out >> of the scheduler process is by definition non-interruptable because there >> is >> simply nothing to switch to! > > > How would you generalise this to a natively multi-threaded VM? Obviously > one simple way is to stop the other processors at a suspension point before > letting the scheduler process proceed, but is there anything cleverer that > doesn't halt all processors until the singleton scheduler has made its mind > up? Every OS thread would have it's own scheduler process. Each scheduler process has it's own data structure (collection of process lists in runlevels / balancing tree / whatever) which it has complete ownership over meaning that it doesn't need to synchonise access to it. Each process has affinity with a particular scheduler so that the interrupt-like mechanism Igor mentioned would hold only for that particular scheduler instance and it's associated list/tree/whatever of processes. Each scheduler process would then communicate with other scheduler processes using something like... umm... a shared queue of messages that it would poll on... or something. I haven't thought out all the details here yet. The communication would be necessary for (at least:) * moving processes between schedulers. This would happen when a scheduler goes idle so that load balancing happens. * signalling semaphores where the waiting process belongs to another scheduler. * and other stuff I haven't thought of yet. Gulik. -- http://gulik.pbwiki.com/ |
In reply to this post by Eliot Miranda-2
Eliot Miranda wrote:
> How would you generalise this to a natively multi-threaded VM? > Obviously one simple way is to stop the other processors at a > suspension point before letting the scheduler process proceed, but is > there anything cleverer that doesn't halt all processors until the > singleton scheduler has made its mind up? There is probably something cleverer - just not as the first step ;-) The advantage of this approach is that if you halt all threads then you know you're in a safe spot which gives you room to experiment. Note that you could even run GC only from the scheduler (i.e., treat GC as an external signal that requests a GC from the scheduler) which solves the concurrent GC problem even with the current VM design. I think this would be a great starting point to learn more about multi-threaded issues in your environment. As you learn how to cope with the situation and what the implications of having a single scheduler really are you can move to relax the dependencies and see where this gets you. Cheers, - Andreas |
On 4/29/09, Andreas Raab <[hidden email]> wrote:
> Eliot Miranda wrote: > Note that > you could even run GC only from the scheduler (i.e., treat GC as an > external signal that requests a GC from the scheduler) which solves the > concurrent GC problem even with the current VM design. ...or even implement the GC in bytecodes. You could provide primitives which gives lower level access to the image - raw memory maybe, or perhaps objects indexed by oop. It would be a bit tricky managing the garbage the GC itself would generate. Maybe you could just leave it lying around, or maybe deallocate it manually? And it would be a bit slow. Gulik. -- http://gulik.pbwiki.com/ |
In reply to this post by Andreas.Raab
2009/4/29 Andreas Raab <[hidden email]>:
> Igor Stasenko wrote: >> >> I came to an idea , you might be interested in. >> As many of us know, some CPUs having a special mode - interrupt mode. >> What if we introduce the interrupt mode for scheduler? > > [... snip ...] >> >> Now i trying to imagine, how a basic stuff might look like(please >> correct me if its utterly wrong way ;), if we will be able to use >> interrupt mode. > > This is actually along similar lines of thought that I had when I was > thinking of how to get rid of the builtin VM scheduling behavior. The main > thought that I had was that the VM may have a "special" process - the > scheduler process (duh!) which it runs when it doesn't know what else to do. > The VM would then not directly schedule processes after semaphore signals > but rather put them onto a "ready" queue that can be read by the scheduler > process and switch to the scheduler process. The scheduler process decides > what to run next and resumes the process via a primitive. Whenever an > external signal comes in, the VM automatically activates the scheduler > process and the scheduler process then decides whether to resume the > previously running process or to switch to a different process. > > In a way this folds the timer process into the scheduler (which makes good > sense from my perspective because much of the work in the timer is stuff > that could be more effectively take place in the scheduler). The > implementation should be relatively straightforward - just add a scheduler > process and a ready list to the special objects, and wherever the VM would > normally process switch you just switch to the scheduler. Voila, there is > your user-manipulable scheduler ;-) And obviously, anything that is run out > of the scheduler process is by definition non-interruptable because there is > simply nothing to switch to! > Very nice indeed. That's even better that my first proposal. ProcessorScheduler>>schedulingProcessLoop [ self handlePendingSignalsAndActions. activeProcess ifNil: [ self idle ] ifNotNil: [ self primitiveTransferControlTo: activeProcess]. ] repeat. and when any process, somehow stops running (suspend/wait/terminate/interrupted etc), VM will again switch to scheduler process loop. What is important in having it, that there is guarantee to be not preempted by anything. Simply by having this, many concurrency/scheduling related problems can be solved by language-side implementation, without fear of having gotchas from VM side. Also, VM doesn't needs to know details about priorities, suspending, etc etc.. - which means that we can simplify VM considerably and implement same parts on the language side, where everything is late bound :) As for moving to multi-cores.. yes, as Gulik suggests, its like adding a new dimension: - local scheduler for each core - single global scheduler for freezing everything This, of course, if we could afford running same object memory over multiple cores. Handling interpreter/object memory state(s) with multiple cores is not trivial thing. If we going to keep more isolated model (islands, hydra ) then we need no/minimal changes to scheduler - each scheduler serves own island and receives asynchronous signals from other collegues through shared queue. > Cheers, > - Andreas > -- Best regards, Igor Stasenko AKA sig. |
Free forum by Nabble | Edit this page |