Hey!
I am looking into using TaskIt for a new development and wondered about some features. What is the right upstream repository? What are the goals to get the builds green? Has anybody thought about remote task execution?

What I am missing is handling for overload. E.g. before queuing too many tasks I would prefer an exception to be raised (or the task blocking/slowing down). Signalling an exception is probably more reasonable, as one task could queue another task (while >>#value is being executed...). What are the plans here? I can mitigate by always using futures and using >>#waitForCompletion:.

Are there ideas on how to add remote task scheduling? Maybe use Seamless for it? Have workers connect to the scheduler? Other ideas? Who would have time to review an approach and the code?

cheers
	holger
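P.S. To make the mitigation concrete, roughly what I have in mind (a sketch only; I am assuming >>#waitForCompletion: takes a timeout and raises if the task has not finished in time, and handleRequest is a placeholder for the real work):

	| future |
	"Block the producer on the task's future, so work cannot pile up
	without bound."
	future := [ self handleRequest ] future.
	future waitForCompletion: 2 seconds.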
Hi Holger!
I respond in bold.

On Tue, 24 Apr 2018 at 12:00 Holger Freyther <[hidden email]> wrote:

> Hey! What is the right upstream repository?

The main repo so far is https://github.com/sbragagnolo/taskit

> What are the goals to get the builds green?

I don't really have a strategy yet; I did not have time to check on them. I can put my hands on it for a while and come up with a plan for that.

> I wondered if somebody thought of remote task execution?

* If by remote calls you mean through REST APIs or things like that, not yet, but it is easy to do as:

	[ service call ] schedule.

  or

	result := [ service call ] future.

* If what you mean is to execute a program on the underlying operating system, we already have a way to do it for Linux (not tested on Mac, but it may work there):

	result := [ :spec |
		spec
			command: 'bash';
			option: '-c';
			argument: command ] asOSTask future.

  The result deployed on the future is the stdout of the process; the text of the exception is the stderr of the process. It is based on OSSubProcess, and it does not work properly with long stdout. (A sketch of consuming that future follows below.)

* If what you mean is the execution of code deployed on other images, earlier versions of TaskIt had this feature, based on OSProcess, but we are not really ready for doing it for real. We need to be able to define whether an image is suitable or not to run a command.

* If you mean something else, I would need more information :).
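For instance, consuming that future could look like this (a sketch; 'ls -la' is just an example command, and I am assuming the futures' onSuccessDo:/onFailureDo: callbacks):

	| result |
	"Run a shell command as an OS task; the success value is the
	process stdout, a failure carries the stderr."
	result := [ :spec |
		spec
			command: 'bash';
			option: '-c';
			argument: 'ls -la' ] asOSTask future.
	result
		onSuccessDo: [ :stdout | Transcript showln: stdout ];
		onFailureDo: [ :error | Transcript showln: error messageText ].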
For raising exceptions on too many tasks you only need to set a custom-made kind of queue. I don't think this is the way we want to go, since there is not much to do in the case of 'too many tasks scheduled'. It is easy to try again later from a do-it, but from an application point of view it is too much. For dealing with the sharing of resources, we have implemented the worker pool abstractions. When you do [ action ] schedule / [ action ] future, both created tasks are scheduled into the default runner. The default runner is a worker pool with a default 'poolMaxSize' of 4, meaning a limit of 4 processes working over the tasks. (This is a dynamic configuration; you can change it with: TKTConfiguration runner poolMaxSize: 20.)
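You can also create a dedicated pool instead of going through the default runner. For example (a sketch based on the worker pool API):

	| pool |
	"Create a worker pool with up to 4 worker processes and schedule
	tasks into it directly."
	pool := TKTWorkerPool new.
	pool poolMaxSize: 4.
	pool start.
	pool schedule: [ 1 second wait. Transcript showln: 'done' ].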
> Are there ideas how to add remote task scheduling? Maybe use Seamless for it?

Since you speak about Seamless here, I suppose you mean two different images, no matter where they run. It's not a bad idea to go with Seamless, but I have not yet worked through the first restriction of remote execution (whether the remote image can execute the task or not, and whether both images share the same semantics for the execution), so I have not yet checked which communication platform to use for it.

> Have workers connect to the scheduler? Other ideas?

What do you mean by connecting to the scheduler? The workers we use do not know their pools, if that is what you mean.

> Who would have time to review an approach and the code?

You can send it to me. Disclaimer: I am going on vacation for a while starting tomorrow.

Nice to know your interests!

cheers.

Santiago
> On 24. Apr 2018, at 20:16, Santiago Bragagnolo <[hidden email]> wrote:
>
> Hi Holger!
> I respond in bold

hehe. And in the reply I am back to non-rich text. Let me see if I quote it correctly.

> When you do [ action ] schedule / [ action ] future, both created tasks are scheduled into the default runner. The default runner is a worker pool with a default 'poolMaxSize' of 4, meaning a limit of 4 processes working over the tasks. (This is a dynamic configuration; you can change it with: TKTConfiguration runner poolMaxSize: 20.)

Yes. But with more work than the workers can handle, the queue will grow. Which means the (median/max) latency of the system will monotonically increase... to the point of the entire system failing (tasks handled after the external deadlines expired, effectively no work being done).

For network-connected systems I like to think in terms of "back pressure" (not reading more from the socket than the image can handle, eventually leading to the TCP window shrinking), and one way of achieving it is to have bounded queues (and/or to sleep when scheduling work). I can see multiple parts of a solution, and they have different benefits and issues (see the sketch after my signature):

* Be able to attach a deadline to a task (e.g. see context.Context in Go).
* Be able to have a "block until the queue has fewer than X elements" schedule (but that is difficult, as one task might be scheduled during the >>#value of another task).

> Are there ideas how to add remote task scheduling? Maybe use Seamless for it?
>
> Since you speak about Seamless here, I suppose two different images, doesn't matter where.

Right, it would need to be homogeneous images (and care taken that the external interface remains similar enough).

> Have workers connect to the scheduler? Other ideas?
>
> What do you mean by connecting to the scheduler? The workers we use do not know their pools, if that is what you mean.

Let's assume scheduling a task is simple (read something from a socket) but the computation is expensive (database look-up, math, etc.). Hopefully one will reach the point where one image can schedule more tasks than a single worker image can handle. At that point it could be neat to scale by just starting another image. By inverting the launch order (workers connect to the scheduler) scaling can become easier.

holger
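P.S. A rough sketch of the bounded schedule I have in mind (hypothetical code, not TaskIt API; the limit of 100 is arbitrary, and the shared counter is protected by a Mutex):

	| limit pending mutex scheduleBounded |
	"Signal an Error instead of letting the queue grow without limit.
	Each completed task decrements the pending counter again."
	limit := 100.
	pending := 0.
	mutex := Mutex new.
	scheduleBounded := [ :aBlock |
		mutex critical: [
			pending >= limit ifTrue: [ Error signal: 'task queue full' ].
			pending := pending + 1 ].
		[ [ aBlock value ]
			ensure: [ mutex critical: [ pending := pending - 1 ] ] ] schedule ].
	scheduleBounded value: [ Transcript showln: 'work' ].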
On Tue, 24 Apr 2018 at 16:18 Holger Freyther <[hidden email]> wrote:
hahahaha, non-rich? How come? I will keep bolding, hoping that if you need to check the content you will look at it in a rich-text client :D
Normally the worker pool adjusts to the minimal number of workers needed (there is a watchdog checking how many idle processes there are, or whether more workers are needed, and ensuring processes are spawned or stopped according to that state). So the number poolMaxSize is just a maximal limit. This limit should be set to ensure that the tasks running concurrently do not incur too much resource consumption, or too much overhead leading to a kind of trashing. I am not really a friend of setting only a number for such a complex problem, but so far it is the only approach I found that does not lead to a complex design. If you have better ideas to discuss on this subject, I am completely open. (The same goes for dealing with priorities by general system understanding rather than absolute numbers.)
I wouldn't mind having a second type of queue with this behaviour, with a means of configuration for setting one queue or the other, each with its specific management encapsulated. Personally, in my domains of usage (crawling, querying, and sensors/actuators) I wouldn't use it. But I suppose you have a better domain for this case. It would be good to discuss it to get a better understanding of the need.
I would like to understand a bit better what you are trying to do. I have the hunch that you are looking for a multiple-image solution, for load balancing between images. TaskIT is meant to plan tasks onto processes, according to the needs of the local image. You seem to need something for planning the tasks of a general system, beyond one image, and maybe taking a process/network topology into account. If it's more that side, we should discuss what extensions we can make to TaskIT to make it suitable for this use case, but I would surely be inclined to build a higher-level framework, or even middleware, that uses TaskIT, rather than add all those complexities into TaskIT itself. The good news is that I may need something similar, so I will be able to help there.
hahaha, now I re-read this paragraph and it is not a hunch anymore. You are looking for multiple images :).
What about using VertStix for remote execution?

Andrew
In reply to this post by Santiago Bragagnolo
Generally, to avoid this I've used the Synapse microservice bus. It also allows the creation of an unlimited number of queues, allowing higher-priority tasks to "jump the queue". "Backpressure" is precisely what message buses avoid in distributed computing. One of my never-have-time-for projects is to port Synapse to Pharo.

SST has a 'start a slave on another node and route to it' methodology, but it's hella complex, especially in terms of distributed garbage collection etc. For real-time systems SST is great; otherwise it's not really necessary to get into that kind of complexity.

Andrew
In reply to this post by Santiago Bragagnolo
Btw, I think you meant "thrashing", not "trashing". Trashing is what my team leads do when they read my code. 😉

Andrew
In reply to this post by aglynn42
> On 25. Apr 2018, at 08:42, Andrew Glynn <[hidden email]> wrote:
>
> Generally to avoid this I've used the Synapse micro service bus. It also allows the creation of an unlimited number of queues, allowing higher priority tasks to "jump the queue". 'Backpressure' is precisely what message buses avoid in distributed computing.

Can you elaborate and point to which Synapse you mean? If you use transport protocols like TCP (in contrast to QUIC or SCTP) there will be head-of-line blocking; how do you jump the queue on a single TCP connection?
In reply to this post by Santiago Bragagnolo
> On 24. Apr 2018, at 23:31, Santiago Bragagnolo <[hidden email]> wrote:
>
>> Yes. But with more work than the workers can handle, the queue will grow. Which means the (median/max) latency of the system will monotonically increase... to the point of the entire system failing (tasks handled after the external deadlines expired, effectively no work being done).
>
> Normally the worker pool adjusts to the minimal number of workers needed (there is a watchdog checking how many idle processes there are, or whether more workers are needed, and ensuring processes are spawned or stopped according to that state).
> So the number poolMaxSize is just a maximal limit. This limit should be set to ensure that the tasks running concurrently do not incur too much resource consumption, or too much overhead leading to a kind of trashing.
> I am not really a friend of setting only a number for such a complex problem, but so far it is the only approach I found that does not lead to a complex design. If you have better ideas to discuss on this subject, I am completely open. (The same goes for dealing with priorities by general system understanding rather than absolute numbers.)

I think we might not be talking about the same thing. Any system might end up being driven close to or above its limits. One question is whether it can recover from that. Let me try to give you a basic example (and if one changes from 'dev' to a proper worker pool, one just needs to adjust the timings to show the same problem). The code schedules a block that on each invocation takes about one second to execute, but the completion time is monotonically increasing:

	| completions |
	completions := OrderedCollection new.
	1 to: 1000 do: [ :each |
		| start |
		start := DateAndTime now.
		[ (Delay forSeconds: 1) wait.
		  completions add: (DateAndTime now - start) ] schedule.
		(Delay forMilliseconds: 200) wait ].
	completions

Now why is this a problem? It is a problem because once the system is in overload it will never recover (unless tasks are being stopped). The question is what can be done from a framework point of view to degrade gracefully? I am leaving this here right now.

holger
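P.S. One direction for graceful degradation (a hypothetical sketch, not TaskIt API): attach a deadline to each task and have the worker drop tasks whose deadline has already passed, so an overloaded backlog drains instead of stalling the system forever:

	| deadline |
	"Skip tasks whose deadline expired before a worker got to them,
	instead of running them pointlessly."
	deadline := DateAndTime now + 5 seconds.
	[ DateAndTime now > deadline
		ifTrue: [ Transcript showln: 'task expired, dropped' ]
		ifFalse: [ (Delay forSeconds: 1) wait.
			Transcript showln: 'task done' ] ] schedule.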