Voluntarily cancelling requests ("applying an expiration date")


Holger Freyther
tl;dr: I am searching for a pattern (later code) to apply expiration to operations.



Introduction:

One nice aspect of MongoDB is that it has built-in data distribution[1] and configurable write guarantees[2]. The upstream project has a document called "Server Discovery and Monitoring" (SDAM), defining how a client should behave. Martin Dias is currently implementing SDAM in MongoTalk/Voyage and I took it for a test drive.


Behavior:

My software stack uses Zinc, Zinc-REST, Voyage and MongoDB. When a new REST request arrives I use Voyage (e.g. >>#selectOne:), which in turn uses MongoTalk. The MongoTalk code needs to select the right server; this is currently done by blocking and waiting for a result (see the sketch below).
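
For concreteness, a minimal sketch of that stack; the Person root class, host, database, port and the query itself are made up for illustration, the selectors are the usual Zinc/Voyage ones:

"Minimal sketch (illustrative only): a Zinc handler answering a REST request
by querying MongoDB through Voyage. Assumes a Person root class; host,
database, port and the query are invented."
| repository server |
repository := VOMongoRepository host: 'localhost' database: 'people'.
server := ZnServer startDefaultOn: 8080.
server onRequestRespond: [ :request |
        | name person |
        name := request uri queryAt: 'name'.
        "This call goes through Voyage into MongoTalk and blocks until a
        suitable server has been selected and answers."
        person := repository selectOne: Person where: [ :each | each name = name ].
        person
                ifNil: [ ZnResponse notFound: request uri ]
                ifNotNil: [ ZnResponse ok: (ZnEntity text: person printString) ] ]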

Next I started to simulate database outages. The REST clients retried when they did not receive a result within two seconds (no back-off/jitter). What happened was roughly the following:


[
        1.) ZnServer accepts a new connection
        2.) MongoTalk waits for a server longer than 2s
        "nothing.. the above waits..."
] repeat.




Problem:

What happened next surprised me. I expected to have a bad time once the database recovered and all the stale requests (remember, the REST clients had already given up and closed their sockets) were answered. Instead, my image crashed early in the test because the ExternalSemaphoreTable was full.

Let's focus on the timeout behavior here and discuss the existence of the ExternalSemaphoreTable and its number of entries separately, at another time.




The two main problems I see are:


1.) Lack of back-pressure for ZnManagingMultiThreadedServer

2.) A disconnect between how long the application layer handling the REST request is allowed to take and how long, further down the stack, MongoTalk may sleep and wait for a server.


The first item is difficult. Even answering HTTP 500 when we are out of space in the ExternalSemaphoreTable is difficult... Let's ignore this for now as well.






What I am looking for:


1.) Voluntary timeout

Inside my application code I would like to tag an operation with a timeout. This means everything done within it should complete within X seconds. It can be used on a voluntary basis.


>>#lookupPerson
   "We expect all database operations to complete within two seconds."
   person := ComputeContext current withTimeout: 2 seconds during: [
        repository selectOne: Person where: [ :each | each name = ... ] ].
 


MongoTalk>>stuff
  "See if the outer context timeout has expired and signal, e.g. before writing
  something into the socket to keep consistency."
  ComputeContext current checkExpired.


MongoTalk>>other
  "Sleep for up to the remaining timeout."
  (someSemaphore waitTimeoutContext: ComputeContext current) ifFalse: [
     SomethingExpired signal ]
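
For the last snippet, one possible (purely hypothetical) implementation of #waitTimeoutContext: on Semaphore, in terms of the existing #waitTimeoutMSecs: and an assumed #remainingTime accessor on the context answering a Duration:

Semaphore>>waitTimeoutContext: aComputeContext
  "Wait on the receiver for at most the time remaining in the context.
  Answer true if the semaphore was signalled in time, false if the
  deadline expired (#waitTimeoutMSecs: answers true on timeout)."
  | timedOut |
  timedOut := self waitTimeoutMSecs: aComputeContext remainingTime asMilliSeconds.
  ^ timedOut not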



2.) Cancellation


This is more difficult to write as pseudo code (without TaskIt?). In the case above we are waiting for the database to become ready while the client has already closed the file descriptor, and we are not able to see this until much later.

The idea is that, in addition to the timeout, we can pass a block that is called when an operation should be cancelled, and the ComputeContext can be queried to see whether something has been cancelled.




The above takes inspiration from Go's context package[3]. In Go the context is passed as a parameter, but we could make it a Process variable?
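
To make the "Process variable" idea concrete, here is a rough, hypothetical sketch built on Pharo's DynamicVariable (a process-specific variable, so the value is held per process). Every other class and selector name is invented to match the pseudocode above:

"Hypothetical sketch only -- all names except DynamicVariable, Error,
Duration and DateAndTime are invented for illustration."

Error subclass: #SomethingExpired
        instanceVariableNames: ''
        classVariableNames: ''
        package: 'ComputeContext'.

"Holds the context of the currently running process."
DynamicVariable subclass: #CurrentComputeContext
        instanceVariableNames: ''
        classVariableNames: ''
        package: 'ComputeContext'.

Object subclass: #ComputeContext
        instanceVariableNames: 'deadline cancelled'
        classVariableNames: ''
        package: 'ComputeContext'.

ComputeContext class>>current
        "Answer the context installed for the running process,
        or a fresh one that never expires."
        ^ CurrentComputeContext value ifNil: [ self new ]

ComputeContext>>withTimeout: aDuration during: aBlock
        "Install the receiver as the process' context with a deadline.
        (A nested call would ideally derive a child context, as Go does.)"
        deadline := DateAndTime now + aDuration.
        cancelled := false.
        ^ CurrentComputeContext value: self during: aBlock

ComputeContext>>remainingTime
        "Answer how much time is left before the deadline, as a Duration."
        deadline ifNil: [ ^ 1 hour ].
        ^ (deadline - DateAndTime now) max: 0 seconds

ComputeContext>>checkExpired
        "Signal if the deadline passed or the context was cancelled."
        (cancelled == true or: [ deadline notNil and: [ DateAndTime now > deadline ] ])
                ifTrue: [ SomethingExpired signal ]

ComputeContext>>cancel
        "To be called e.g. when we notice the client closed its socket."
        cancelled := true

With something like this, the MongoTalk pseudocode above only ever reads ComputeContext current; whether the context should instead be passed explicitly, as Go recommends, is the main design decision.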





Question:

How do you handle this in your systems? Is this something we can consider for Pharo9?



thanks
        holger








[1] It has the concept of a "replica set" and works by having a primary, secondaries and arbiters running.
[2] For every write one can configure whether it should be acknowledged immediately (before it is even on disk) or only once it has been written to multiple stores (e.g. a majority, or both US and EMEA).
[3] https://golang.org/pkg/context/




Re: Voluntarily cancelling requests ("applying an expiration date")

Sven Van Caekenberghe
Hi Holger,

That is a complicated story ;-)

But running out of external semaphores means that you are using too many sockets, are not closing/releasing them (in time), and/or your GC does not run often enough to keep up (it is easy to deplete the external semaphore table without the GC kicking in).

You must have a loop somewhere that goes too fast and maybe does not clean up properly while doing so.

YMMV, but I have been doing similar things -- implementing/offering REST services that call other REST/network services, all with timeouts, in several variations -- for years, and I do not have problems like the ones you describe.

I would suggest enabling logging so that you can see better where the allocations happen and if your cleanup code does its work.

Sven

PS: Zinc logging is easy, just do

  ZnLogEvent logToTranscript
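
In addition, two expressions that may help while investigating (assuming a recent Pharo, where the VM object exposes #maxExternalSemaphores alongside the setters):

  "Check the current external semaphore table limit and force a GC so that
  unreferenced sockets are finalized and give their semaphore slots back."
  Smalltalk vm maxExternalSemaphores.
  Smalltalk garbageCollect.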




Re: Voluntarily cancelling requests ("applying an expiration date")

Sabine Manaa
Hi Holger,

I did not completely understand your mail, but when reading Sven's answer I remembered that some time ago we also had problems with running out of semaphores.

After corresponding with Esteban Lorenzano, we did the following (see the startup snippet below):
1) at startup of the application, evaluate Smalltalk vm maxExternalSemaphoresSilently: 65535.
2) reduce the default pool size of VOMongoRepository from 10 to 2 (we have our own VOMongoRepository subclass)
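
For reference, the first item as a startup snippet (the pool size change lives in our VOMongoRepository subclass and depends on the Voyage version, so it is only mentioned in the comment):

  "At application startup: raise the limit of the external semaphore table.
  The pool size reduction (10 -> 2) is done in our own VOMongoRepository
  subclass; the exact hook depends on the Voyage version in use."
  Smalltalk vm maxExternalSemaphoresSilently: 65535.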

Perhaps this is not your topic, but perhaps it helps.
Sabine

