What is the status of Zinc on 3.1? I loaded the gemstone3.1 branch from GitHub and tried to start fixing failing tests, but I managed to segfault a 3.1.0.2 gem after I tried to push the forkAt:named: method up a level in the Block class hierarchy...
--
Ken Treis
Miriam Technologies, Inc.
(866) 652-2040 x221
Ken,
I'm going to try to take a look at this later this afternoon ... after a very quick look, I'm curious if the latest gemstone2.4 should be merged back into the gemstone3.1 branch ... but that shouldn't have an impact on a segfault ...

Do you have the C stack trace from the crash (in the log file)? I can pass that information to the vm guys to get them started on looking at things ...

I'm on vacation until the beginning of the year, which means I might be MIA every once in a while, but I intend to keep an eye on things when I get the opportunity :)

Dale

----- Original Message -----
| From: "Ken Treis" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Saturday, December 15, 2012 2:09:31 PM
| Subject: [GS/SS Beta] Zinc on 3.1
Here's what I've got in the log -- I am fighting some different fires right now but can maybe get some better steps to reproduce a little later.
Gemstone Signal Handler: Signal 11, SIGSEGV Received
HostFaultHandler: signal = 11
info->si_signo = 11 = 0xb
info->si_code = 1 = 0x1
info->si_errno = 0 = 0x0
info->si_addr = 0x24

Begin attempt to print C-level stack for current thread of process 44154 at: [Info]: libicudata-3.1.0.2-64.dylib: loaded
[Info]: libicuuc-3.1.0.2-64.dylib: loaded
[Info]: libicui18n-3.1.0.2-64.dylib: loaded
Mon Dec 10 18:03:17 PST 2012

0 libgcilnk-3.1.0.2-64.dylib 0x0000000100551211 _Z15HostPrintCStackv + 158
1 libgcilnk-3.1.0.2-64.dylib 0x00000001005a00a6 HostFaultHandler + 838
2 libsystem_c.dylib 0x00007fff86aa1cfa _sigtramp + 26
3 ??? 0x6e6172742f617461 0x0 + 7953764260249695329
4 libgcilnk-3.1.0.2-64.dylib 0x0000000100407a2b .LcheckPrimResult + 0
5 libgcilnk-3.1.0.2-64.dylib 0x00000001003de0ac _Z19IntLpSupControlLoopP2omiPP10omObjSType + 764
6 libgcilnk-3.1.0.2-64.dylib 0x00000001003d429a _Z11IntContinueP13IntStateSTypePP10omObjSTypeS3_ijP11GciErrSType + 346
7 libgcilnk-3.1.0.2-64.dylib 0x0000000100291a51 _ZL14gemDoContinue_P2omyyiP11GciErrSType + 225
8 libgcilnk-3.1.0.2-64.dylib 0x0000000100291b34 _Z13GemDoContinueyPyyiP11GciErrSType + 116
9 libgcilnk-3.1.0.2-64.dylib 0x000000010024e405 _ZL12dispatchLoopP2omP13LgcStateSTypeP13RpcStateSType + 8853
10 libgcilnk-3.1.0.2-64.dylib 0x000000010025007f _Z12GemDoRpcLoopP2omP13LgcStateSTypeP19IntGciActStateSType + 479
11 libgcilnk-3.1.0.2-64.dylib 0x000000010023a674 _ZL9doConnectv + 516
12 libgcilnk-3.1.0.2-64.dylib 0x000000010023ae31 _ZL15dispatchCommandv + 817
13 libgcilnk-3.1.0.2-64.dylib 0x000000010023b431 _Z4GdbgPKci + 1281
14 libgcilnk-3.1.0.2-64.dylib 0x00000001002512fc gemMain + 2124
15 gem 0x0000000100001c21 main + 376
16 gem 0x0000000100001504 start + 52
17 ??? 0x0000000000000004 0x0 + 4

End of C-level stack for current thread process 44154

--
Ken Treis

On Dec 17, 2012, at 1:47 PM, Dale Henrichs wrote:

| Do you have the c stack trace from the crash ( in the log file)? I
| can pass that information to the vm guys to get them started on
| looking at things ...
Ken,
Okay, I've forwarded the C stack and will let you know what Allen learns from it. I plan to look at zinc this afternoon...

Dale

----- Original Message -----
| From: "Ken Treis" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Monday, December 17, 2012 2:50:07 PM
| Subject: Re: [GS/SS Beta] Zinc on 3.1
Ken,
I've just finished taking a look at the state of zinc[1] for gemstone and I think that the gemstone3.1 port should be based on the current gemstone2.4 branch, so I've merged gemstone2.4 into gemstone3.1 ... there were a bunch of tests that needed to be changed for gemstone (a number of places where strings and symbols were being compared, which causes tests to fail in gemstone) ...

Besides the forkAt:named: change (which I haven't quite gotten to), it looks to me like the SocketStream class needs to be ported to GemStone 3.1. The SocketStream tests are passing in GemStone3.1, but a number of the failures I see in GemStone3.1 are related to socket errors ...

Dale

[1] https://github.com/glassdb/zinc
Ken,
The problems are definitely related to socket issues ... at the moment it looks like we're not closing sockets when we should (via the ifCurtailed: logic?)...

I've been using 3.1.0.1 so far and haven't hit the sigsegv...yet.

I'll try to spend some more time tomorrow and get a handle on the socket issues.

Dale

----- Original Message -----
| From: "Dale Henrichs" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Tuesday, December 18, 2012 7:50:53 PM
| Subject: Re: [GS/SS Beta] Zinc on 3.1
Ken,
We're running the ifCurtailed: logic, but there's a Delay in the code immediately before the socket close that appears to be the culprit ... I think we monkeyed with the process switching logic for 3.x which presumably explains the differences in behavior between Pharo, 2.4.x and 3.1.x ... If I'm not mistaken, we made additional changes for 3.1.0.2 which might be responsible for the SIGSEGV ...

The upshot at this point in time is that I think it is likely that there's not much more work to be done in porting the gemstone2.4 branch of zinc to 3.1.x ...

I will talk to one of the vm guys tomorrow and get the scoop on the process issues...

Dale

----- Original Message -----
| From: "Dale Henrichs" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Tuesday, December 18, 2012 8:37:19 PM
| Subject: Re: [GS/SS Beta] Zinc on 3.1
Ken,
I have confirmed that the code following the delay in the ifCurtailed: block (where the socket is supposed to be closed) is not being executed when the process is terminated in 3.1.0.1 ...

Don't know if that's related to your sigsegv (haven't tried yet), but it is one of the blockades for zinc running in 3.1.0.1 (and maybe the only one :).

I'm not sure that the delay is a good idea...I might check with Sven about his logic in that case ...

Dale

----- Original Message -----
| From: "Dale Henrichs" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Tuesday, December 18, 2012 9:38:13 PM
| Subject: Re: [GS/SS Beta] Zinc on 3.1
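To make the cleanup shape under discussion concrete, here is a minimal sketch. It is not Zinc's actual source: `serve` and `release` are hypothetical stand-in blocks, and the curtailment is simulated with a handled exception instead of a terminated process, so it does not reproduce the 3.1 termination behavior itself. It only illustrates a cleanup block passed to ifCurtailed: that closes immediately, with no Delay in front of the close.

```smalltalk
"Hypothetical stand-ins, not Zinc's actual code."
| closed serve release result |
closed := false.
"serve simulates request handling that is cut short by an error."
serve := [ Error signal: 'simulated failure while serving' ].
"release stands in for closing the socket; nothing (such as a Delay) runs before it."
release := [ closed := true ].
"The handler returns from the protected block, which unwinds through serve."
result := [ serve ifCurtailed: release ]
    on: Error
    do: [ :ex | ex return: #curtailed ].
"closed now records whether the cleanup block was run during the unwind."
closed
```

Keeping the cleanup block free of anything that can yield the processor is the design choice being weighed in this thread; whether a terminated process still runs the block is up to the VM.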
On Dec 20, 2012, at 6:34 PM, Dale Henrichs wrote:

| I have confirmed that in the code following the delay in the
| ifCurtailed: block (where the socket is supposed to be closed) is
| not being executed when the process is terminated in 3.1.0.1 ...
|
| Don't know if that's related to your sigsegv (haven't tried yet), but
| it is one of the blockades for zinc running in 3.1.0.1 (and maybe
| the only one:).

Thanks for working on this for me. Weren't you supposed to be on vacation? I appreciate it and hope you didn't sacrifice any family time for my sake.

Sven and I got streaming responses working in Zinc (for Seaside 3.1 in Pharo), so now I'm looking to try it in GemStone. A couple questions:

1. GRGemStonePlatform>>seasideProcessRequest:adaptor:resultBlock: has an exponential back-off/retry that seems a bit over-the-top: wait 10 ms, then 100ms, 1s, 10s (!), 100s (!!!), etc. Is it really expensive to try to get the session lock? Could we maybe set an overall retry timeout and then just spin with a short delay until we can actually grab it? I implemented a strategy like this for one of our old Seaside 2.8 FastCGI apps as follows:

    | timeout |
    timeout := Time now addSeconds: 60.
    [self answerResponderRoleCheckingLock: aFSResponderRole]
        whileFalse: [
            Time now > timeout
                ifTrue: [
                    | string |
                    string := self internalServerMalfunction: 'The system is too busy, please try again in a moment.'.
                    aFSResponderRole nextPutAll: string.
                    ^self ].
            (Delay forMilliseconds: 10) wait ].

2. Can you help me understand why the lock is necessary? I'm too cowardly to just remove it and see what happens. It seems like the "abort and reprocess on commit failure" approach could obviate the need for the big lock, especially given that WASession uses WAMutualExclusionFilter by default.

| I'm not sure that the delay is a good idea...I might check with Sven
| about his logic in that case ...

I don't know if he's on this list, so I'm copying him. Although it seems like a bug in GS (delay in ifCurtailed: means that later code is never run), you're thinking we could get around it if that delay wasn't necessary. Am I understanding correctly?

Sven, what is the purpose of the delay in #releaseServerSocket? I removed the delay in my Pharo 2.0 image, and the tests still pass...

--
Ken Treis
Miriam Technologies, Inc.

Sorry, just realized the answer to my own question #2. Obviously mutexes are only scoped to one VM (gem), so you'd need the lock to get the behavior that the filter intends. I'm not quite as dumb as I look, just.. almost.

Maybe I'm really struggling with why I would need the WAMutualExclusionFilter in my app in the first place...

Ken

Ken,

No family members were harmed while working on vacation :) With 5 dogs, just being home helps my wife get things done that would otherwise be difficult :)

Q#1: Yeah, you are right about the exponential back-off ... something is needed to avoid going super hot waiting for the lock ... it costs a call to the stone to acquire a lock (not horribly expensive), but there is no telling how long it will be before the lock is released and you don't want the gem going too hot polling ... your suggestion looks good, but I'd be inclined to spin fast and then plateau, but your solution is definitely superior to the existing one :)

ifCurtailed: bug: The fact that the ifCurtailed: block doesn't finish running is definitely a bug that was introduced in 3.x at some point in time. Right now (3.1.0.2) the vm notices that the process is non-responsive and the process is terminated without running any more ifCurtailed: blocks (GsProcess>>terminate9)... definitely too harsh. If push comes to shove I can provide a patch for this particular bug (still working through the best fix ...), but I am also curious about the rationale for the Delay in this particular case ...

Dale

----- Original Message -----
| From: "Ken Treis" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Cc: "Sven Van Caekenberghe" <[hidden email]>
| Sent: Monday, January 14, 2013 1:00:42 PM
| Subject: Re: [GS/SS Beta] Zinc on 3.1
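Dale's "spin fast and then plateau" idea for Q#1 could look roughly like the sketch below. `tryLock` is a hypothetical stand-in for the real lock attempt (for example Ken's answerResponderRoleCheckingLock: call), and the 1 ms starting delay, 50 ms plateau and 60 s budget are illustrative values that do not come from the thread. Elapsed time is approximated by summing the requested delays, which avoids clock arithmetic but ignores the time spent in the lock attempt itself.

```smalltalk
"Hypothetical sketch: tryLock answers true once the lock is held."
| tryLock delayMs maxDelayMs timeoutMs waitedMs acquired |
tryLock := [ true ].    "stand-in for the real session-lock attempt"
delayMs := 1.           "start by polling quickly"
maxDelayMs := 50.       "plateau: never wait longer than this between attempts"
timeoutMs := 60000.     "overall retry budget"
waitedMs := 0.
"Stop as soon as the lock is acquired or the budget is used up."
[ (acquired := tryLock value) or: [ waitedMs >= timeoutMs ] ]
    whileFalse: [
        (Delay forMilliseconds: delayMs) wait.
        waitedMs := waitedMs + delayMs.
        "Double the delay until it reaches the plateau."
        delayMs := (delayMs * 2) min: maxDelayMs ].
"acquired is false only if the budget ran out; the caller would then report 'too busy'."
acquired
```

Compared with the unbounded 10 ms, 100 ms, 1 s, 10 s progression, the capped delay keeps the worst-case wait after a lock release small while still backing off from the initial hot polling.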
On Jan 14, 2013, at 4:14 PM, Dale Henrichs wrote:

| ifCurtailed: bug:

From Sven:

| I am not on any Gemstone lists ;-)
|
| The delay there is from some code that I initially based the server
| on, maybe two years ago. It seemed like some safety measure. But I
| am not at all surprised that removing it does no harm. On the other
| hand, I would also be surprised if it hurt anybody.
|
| This is all about your networking primitives and the OS interface:
| things are never perfect there…

I'd say we should try taking out the delay and hope no people/dogs get hurt.

Ken

Sounds like a plan...

----- Original Message -----
| From: "Ken Treis" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Cc: "Sven Van Caekenberghe" <[hidden email]>
| Sent: Monday, January 14, 2013 4:17:23 PM
| Subject: Re: [GS/SS Beta] Zinc on 3.1
Getting closer… I had to make a couple of changes to get this working in 3.1:
* Push forkAt:named: up to BlockClosure
* Removed delay from ZnSingleThreadedServer>>releaseServerSocket

… and I needed a fix to DateAndTime>>asTimeStamp that's in a recent Squeak package but not in the Squeak.v3 package yet.

This is all on github under ktreis/zinc in a branch called "no-delay". I hope I did this right -- I'm not very well-versed in git yet.

Debugging is a pain. When a test fails due to an exception in a ZnServer process, it seems that the GemTools image gets out of sync with the backend. It's like there's a backlog of exceptions to work through, and it takes a logout/login to get back in sync. And 3.1.0.2 segfaults all over the place on my Mac when running the Zinc tests. Once I dropped back to 3.1.0.1, I was able to make a little more progress.

I'm still getting some test failures/errors, but I think they're mostly related to my development setup. One of them tries to connect to Google by name, but it ends up resolving as an IPv6 address which my network doesn't currently support. I looked quickly but didn't see an easy way within SocketStream / SPort to request only IPv4 addresses.

--
Ken Treis
Miriam Technologies, Inc.

On Jan 14, 2013, at 5:35 PM, Dale Henrichs wrote:

| Sounds like a plan...
Comments embedded ...
----- Original Message -----
| From: "Ken Treis" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Tuesday, January 22, 2013 12:14:23 PM
| Subject: Re: [GS/SS Beta] Zinc on 3.1
|
| Getting closer… I had to make a couple of changes to get this working
| in 3.1:
|
| * Push forkAt:named: up to BlockClosure
| * Removed delay from ZnSingleThreadedServer>>releaseServerSocket

Makes sense...

| … and I needed a fix to DateAndTime>>asTimeStamp that's in a recent
| Squeak package but not in the Squeak.v3 package yet.

Good catch ...

| This is all on github under ktreis/zinc in a branch called
| "no-delay". I hope I did this right -- I'm not very well-versed in
| git yet.

That is the right way to do it ... you've isolated your changes in a topic branch and they can be easily merged into the 2.4 and 3.1 branches as needed ...

| Debugging is a pain. When a test fails due to an exception in a
| ZnServer process, it seems that the GemTools image gets out of sync
| with the backend. It's like there's a backlog of exceptions to work
| through, and it takes a logout/login to get back in sync. And
| 3.1.0.2 segfaults all over the place on my Mac when running the Zinc
| tests. Once I dropped back to 3.1.0.1, I was able to make a little
| more progress.

Yes ... if there are multiple threads and each of the threads has an exception pending, things can get a bit muddled ... If there are one or more errors on other threads, they will pop up when you make a new gci call for any reason ... and only one gets raised at a time ... You might find a bit of help by using the process browser as you can get an overview of all of the processes that are running ...

Typically I think these things go haywire because the process handling in zinc for multi-threaded processing is not very tolerant of errors ... I've seen a lot of infinite process forking loops get started (especially in the tests) when an error condition is hit ... perhaps the delay in the ensure block simply slowed down the infinite forking loop enough to make it manageable ...

While GemStone does not make it easier to debug the multi-process rats nest ... I think that there are only a handful of spots that need to be cleaned up (where the forks are) to make sure that certain types of errors short-circuit a fork that will lead right back to the original problem ...

I have yet to try the zinc stuff in 3.1.0.2, but it sounds like we've got some 3.1.0.2-specific sigsegv problems on the mac that will be easier for us to debug if we can reproduce the problems locally ... Have you been getting sigsegvs in 3.1.0.1 as well?

| I'm still getting some test failures/errors, but I think they're
| mostly related to my development setup. One of them tries to connect
| to Google by name, but it ends up resolving as an IPv6 address which
| my network doesn't currently support. I looked quickly but didn't
| see an easy way within SocketStream / SPort to request only IPv4
| addresses.

I think we only do IPv6 calls these days ... I'm not completely up to speed on how IPv4 networks are handled, but I think that you have to have IPv6 support enabled in the kernel for 3.x...
On Jan 22, 2013, at 1:29 PM, Dale Henrichs wrote:

| | … and I needed a fix to DateAndTime>>asTimeStamp that's in a recent
| | Squeak package but not in the Squeak.v3 package yet.
|
| Good catch ...

Published as Squeak.v3-KenTreis.287.1 in case it's helpful for your merge.

| Typically I think these things go haywire because the process
| handling in zinc for multi-threaded processing is not very tolerant
| of errors ... I've seen a lot of infinite process forking loops get
| started (especially in the tests) when an error condition is hit ...

I may have seen this too, some failing tests would lead to a temp object memory overflow. I didn't characterize it any further.

| I have yet to try the zinc stuff in 3.1.0.2, but it sounds like we've
| got some 3.1.0.2 specific sigsegv problems on the mac that will be
| easier for us to debug if we can reproduce the problems locally ...
| Have you been getting sigsegvs in 3.1.0.1 as well?

Just in 3.1.0.2. I haven't seen any in 3.1.0.1.

| | I looked quickly but didn't see an easy way within SocketStream /
| | SPort to request only IPv4 addresses.
|
| I think we only do IPv6 calls these days ... I'm not completely up to
| speed on how IPv4 networks are handled, but I think that you have to
| have IPv6 support enabled in the kernel for 3.x…

Found a little more on this -- I think it may be partially in Sport and partially in GS. I have hacked around Sport for now, but the larger issue is inside a primitive invoked by GsSocket>>connectTo:on:. When debugging `GsSocket new connectTo: 80 on: 'www.google.com'`, I see that the address list returned by two-arg primitive #25 only has one entry: Google's IPv6 address. As far as I can tell, the list should contain multiple entries, like what I get from the command line:

$ host www.google.com
www.google.com has address 74.125.141.99
www.google.com has address 74.125.141.105
www.google.com has address 74.125.141.104
www.google.com has address 74.125.141.106
www.google.com has address 74.125.141.147
www.google.com has address 74.125.141.103
www.google.com has IPv6 address 2607:f8b0:400e:c00::69

I'm guessing that the primitive is preferring the IPv6 address, if present, at the expense of the IPv4 addresses.

Ken

Ciao,

+1 regarding the www.google.com IP resolution.

I only read this now; my last email, "HTTP request for 'maps.google.com' don't answer into GLASS", is related to the same problem.

Dario

Ciao,

Has anyone addressed the issue with the www.google.com IP resolution in GLASS 3.1.0.2?

Any idea about it?

Thanks,

Dario

Dario and Ken,

I'm having one of our engineers look into the ipv4/ipv6 issues ...

Dale

----- Original Message -----
| From: "Dario Trussardi" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Tuesday, January 29, 2013 2:52:42 AM
| Subject: Re: [GS/SS Beta] Zinc on 3.1
Dario and Ken,
We've gotten to the bottom of the problem: we have some bugs in our socket code related to IPv6 and IPv4 addresses ... As Ken observed, we are not listing all of the entries that we should. The primitive code used in both GsSocket class>>getHostAddressesByName: and GsSocket>>connectTo:on:timeoutMs: has the same bug.

We have a bug fix, but it is in the primitive C code. The fix will be included in 3.1.0.3 when it becomes available.

The only workaround that we know of is to either use one of the IPv4 addresses (acquired via a different means ... System class>>performOnServer: probably) or to enable IPv6 support on your machine.

Dale

----- Original Message -----
| From: "Dale Henrichs" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Friday, February 1, 2013 10:13:45 AM
| Subject: Re: [GS/SS Beta] Zinc on 3.1
|
| Dario and Ken,
|
| I'm having one of our engineers look into the ipv4/ipv6 issues ...
|
| Dale
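For the "acquire an IPv4 address via a different means" half of the workaround, the text handling might look roughly like the sketch below. It is only an illustration in Python, not GemStone code: it assumes the output of `host <name>` has already been captured as a string (for example through System class>>performOnServer:) and that the output has the same shape as Ken's sample above. The function name `ipv4_addresses` and the `SAMPLE` text are made up for this example.

```python
import ipaddress


def ipv4_addresses(host_output: str) -> list[str]:
    """Return the IPv4 addresses found in the output of `host <name>`.

    Hypothetical helper: assumes lines of the form
    "<name> has address <a.b.c.d>", as in the sample in this thread.
    """
    found = []
    for line in host_output.splitlines():
        # IPv6 lines read "has IPv6 address", so this separator skips them.
        _name, sep, candidate = line.partition(" has address ")
        if not sep:
            continue
        candidate = candidate.strip()
        try:
            # Validate that the text really is a dotted-quad IPv4 address.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        found.append(candidate)
    return found


# Shortened copy of the `host` output quoted in the thread.
SAMPLE = """\
www.google.com has address 74.125.141.99
www.google.com has address 74.125.141.105
www.google.com has IPv6 address 2607:f8b0:400e:c00::69
"""

print(ipv4_addresses(SAMPLE))
```

Any one of the returned addresses could then be used in place of the host name when connecting, which sidesteps the name lookup inside the primitive until 3.1.0.3 is available.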