squeak on opensolaris 2009.06

Randal L. Schwartz
 

I'm trying to get squeak running on opensolaris 2009.06 (in the cloud on
Amazon EC2, so I don't have a desktop).

I've tried both the squeakvm.org prebuilt binaries and the binaries
I've built from the src tarball.

I'm getting roughly the same behavior in both a Seaside image and an image to
which I've added RFB.  They work fine locally (OSX 10.5 with a closure VM),
but they seem to be hung up waiting for I/O when I start them on the cloud.

As in, I can connect to the RFB image, and it takes about 20 seconds to
refresh.  If I click on anything, no response.  But if I break the RFB
connection, and connect again, I get the *result* of that action.

It might be related to the fact that 2009.06 solaris kernel added some extra
error return from the poll() call that nobody was expecting (which broke
apache), and I presume Squeak is using poll at the lowest level.

Here's the stack where squeak seems to be frozen:

 cf6b1687 pollsys  (8047850, 1, 8047908, 0)
 cf65ce01 pselect  (8, 8047a50, 80479d0, 8047950, 8047908, 0) + 199
 cf65d1d6 select   (8, 8047a50, 80479d0, 8047950, 8047948, 80f94e0) + 78
 0807d45a aioPoll  (0, 0, 0, 0, 0, 0) + 10a
 00000000 ???????? ()

Is anyone using Squeak on solaris/opensolaris and can help me?
I'd hate to have to keep renting my openbsd box just for this app. :(

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion

Re: squeak on opensolaris 2009.06

johnmci
 
Well, on the iPhone flavour of OS X I determined that in aioPoll.c, if
maxFd was zero and you did the

      n= select(maxFd, &rd, &wr, &ex, &tv);

then the return value of 'n' was random junk.  This of course was a
problem: Apple bug ID 5971805, now closed in 3.x as unable to recreate.

To work around it I altered the check for maxFd

   ms= ioMSecs();
   if (maxFd == 0)
     return 0;

   for (;;)
     {

However, this likely isn't your problem, since you have a socket open
for listen? Assuming your

 cf65d1d6 select   (8, 8047a50, 80479d0, 8047950, 8047948, 80f94e0) + 78

means 8 file/socket descriptors in use?
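
On reading that trace: the first argument to select() is the highest-numbered descriptor in any of the sets plus one, not a count of descriptors, so the 8 there says the highest watched descriptor is 7. A minimal sketch of how that value is usually computed follows; `build_fd_set` is a hypothetical helper for illustration, not a function from the Squeak VM. A result of 0 corresponds to the maxFd == 0 case guarded against above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper (not from the Squeak VM): fill `set` from
 * fds[0..count) and return the value to pass as select()'s first
 * argument, i.e. highest descriptor + 1, or 0 when nothing is watched. */
static int build_fd_set(const int *fds, size_t count, fd_set *set)
{
    int nfds = 0;
    size_t i;

    FD_ZERO(set);
    for (i = 0; i < count; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], set);
        if (fds[i] + 1 > nfds)
            nfds = fds[i] + 1;
    }
    return nfds;
}
```

Watching only descriptors 3 and 7 gives an nfds of 8, the same first argument as in the trace, even though just two descriptors are in use.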


The other issue, of course, is whether the image has the patch applied
for the process lockup issue that shows up when the image is running as
a headless server, and whether you toggled the server mode, aka

Preferences togglePreference: #serverMode

That of course depends on what the image heritage is.


--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================

Re: squeak on opensolaris 2009.06

Randal L. Schwartz
 
>>>>> "John" == John M McIntosh <[hidden email]> writes:

John> However this likely isn't your problem since you have a socket open  for
John> listen?
John> Assuming your
John> cf65d1d6 select   (8, 8047a50, 80479d0, 8047950, 8047948, 80f94e0) + 78
John> means 8 file/socket descriptors in use?

I suspect it does.


John> The other issue of course is if the image has the patch applied for  the
John> process lockup issue if the
John> image is running as a headless server, and you toggled the server  mode, aka

John> Preferences togglePreference: #serverMode

John> That of course depends on what the image heritage is.

I'm using Squeak 3.10.2-trunk as of today, with the latest RFB from squeak
source, and it *does* work headful(? as in non-headless :-) on my laptop.  I
didn't switch any preference bits before uploading it.  Should I have?

What is the purpose of #serverMode?


Re: squeak on opensolaris 2009.06

johnmci
 

On 2009-11-07, at 1:52 PM, Randal L. Schwartz wrote:

> I'm using Squeak 3.10.2-trunk as of today, with the latest RFB from  
> squeak
> source, and it *does* work headful(? as in non-headless :-) on my  
> laptop.  I
> didn't switch any preference bits before uploading it.  Should I have?
>
> What is the purpose of #serverMode?

http://bugs.squeak.org/view.php?id=6581


Re: squeak on opensolaris 2009.06

Randal L. Schwartz
 
>>>>> "John" == John M McIntosh <[hidden email]> writes:

John> On 2009-11-07, at 1:52 PM, Randal L. Schwartz wrote:

>> I'm using Squeak 3.10.2-trunk as of today, with the latest RFB from  squeak
>> source, and it *does* work headful(? as in non-headless :-) on my  laptop.  I
>> didn't switch any preference bits before uploading it.  Should I have?
>>
>> What is the purpose of #serverMode?

John> http://bugs.squeak.org/view.php?id=6581

OK, I tried enabling that, and no change.  Any other ideas?


Re: squeak on opensolaris 2009.06

David T. Lewis
In reply to this post by Randal L. Schwartz
 
On Sat, Nov 07, 2009 at 12:45:35PM -0800, Randal L. Schwartz wrote:

>  
> It might be related to the fact that 2009.06 solaris kernel added some extra
> error return from the poll() call that nobody was expecting (which broke
> apache), and I presume Squeak is using poll at the lowest level.
>
> Here's the stack where squeak seems to be frozen:
>
>  cf6b1687 pollsys  (8047850, 1, 8047908, 0)
>  cf65ce01 pselect  (8, 8047a50, 80479d0, 8047950, 8047908, 0) + 199
>  cf65d1d6 select   (8, 8047a50, 80479d0, 8047950, 8047948, 80f94e0) + 78
>  0807d45a aioPoll  (0, 0, 0, 0, 0, 0) + 10a
>  00000000 ???????? ()

This is probably a red herring. An idle image will normally be spending
most of its time in aioPoll(), so if you interrupt it you are most
likely to see a backtrace like this. If aioPoll() were not working,
you would certainly see other problems in a normal headful image.

The function in the unix VM is (from sqaio.h):

  /* Sleep for at most `microSeconds'.  Any event(s) arriving for
   * handled fd(s) will terminate the sleep, with the appropriate
   * handler(s) being called before returning.
   */
  extern int aioPoll(int microSeconds);

This is implemented using select() in platforms/unix/vm/aio.c. Off
topic, but FYI: the aio functions are accessible from the image via
the AioPlugin, which is distributed with Unix VMs.
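
To illustrate the contract in that comment (sleep for at most a given number of microseconds, wake early on activity), here is a minimal sketch built directly on select(). `wait_readable` is a hypothetical stand-in that watches a single descriptor and calls no handlers; it is not the VM's aioPoll().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-in, not the VM's aioPoll(): sleep for at most
 * `microSeconds`, returning early if `fd` becomes readable.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int microSeconds)
{
    fd_set rd;
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);
    do {
        /* select() may modify both the set and the timeout, so rebuild
         * them on every pass. A restart after EINTR therefore sleeps
         * for the full interval again. */
        FD_ZERO(&rd);
        FD_SET(fd, &rd);
        tv.tv_sec  = microSeconds / 1000000;
        tv.tv_usec = microSeconds % 1000000;
        n = select(fd + 1, &rd, NULL, NULL, &tv);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Usage: with a pipe p[2] from pipe(), wait_readable(p[0], 1000)
 * returns 0 until something is written to p[1], and 1 afterwards. */
```

An idle process that calls a function like this in a loop spends nearly all its time inside select(), which is why a backtrace taken at a random moment tends to land there.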

Dave


Re: squeak on opensolaris 2009.06

johnmci
In reply to this post by Randal L. Schwartz
 
Mmm, well, then you are into following the procedures outlined in

http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-July/119023.html

Set up the message tracing and process stack dump logic in a VM and
use that to figure out why the VM is not responding.



Re: squeak on opensolaris 2009.06

Randal L. Schwartz
 
>>>>> "John" == John M McIntosh <[hidden email]> writes:

John> Mmm well then you are into following the procedures as outlined in
John> http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-July/119023.html

John> Set up the message tracing and process stack dump logic in a VM and
John> use that to figure out why the VM is not responding.

I've launched Squeak successfully in VirtualBox running OpenSolaris
2009.06, using the squeakvm.org's Solaris binaries and the 3.10.2-trunk
image.

However, every time it tries to open a socket, I get:

  sqConnectToPort: Interrupted system call.

for valid connections, and

  sqConnectToPort: Connection refused

for non-existing ports.

This may be related to the new returns possible from poll(), the
same thing that broke Apache.


Re: squeak on opensolaris 2009.06

johnmci
 
That's because, in

  void sqSocketConnectToPort(SocketPtr s, sqInt addr, sqInt port)

the call

       result= connect(SOCKET(s), (struct sockaddr *)&saddr, sizeof(saddr));

does not check for

  [EINTR]            Its execution was interrupted by a signal.

which should cause a retry condition.

sqUnixSocket.c is extremely poor in its ability to handle the EINTR
return code. You could check with whoever supports that code and see
if it can be fixed.

Also, I recall something about being able to turn timer interrupts
off via a unix squeak command-line option, which might help a bit?
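
One caveat on retrying connect() in particular: POSIX specifies that when connect() fails with EINTR the connection request is not aborted and is established asynchronously, so calling connect() a second time is not the usual recovery. A common approach is to wait until the socket becomes writable and then read SO_ERROR. The following is a minimal sketch of that approach; `connect_handling_eintr` is a hypothetical helper, not code from sqUnixSocket.c, and how the VM's own socket code should be changed is not something this sketch settles.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not from sqUnixSocket.c): connect() that copes
 * with EINTR. Returns 0 on success, -1 with errno set on failure. */
static int connect_handling_eintr(int fd, const struct sockaddr *addr,
                                  socklen_t len)
{
    struct pollfd pfd;
    socklen_t errlen;
    int err = 0;
    int n;

    assert(addr != NULL);
    if (connect(fd, addr, len) == 0)
        return 0;
    if (errno != EINTR && errno != EINPROGRESS)
        return -1;              /* a real failure, e.g. ECONNREFUSED */

    /* The attempt continues in the background: wait for the socket to
     * become writable, restarting poll() itself if it is interrupted. */
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    do {
        n = poll(&pfd, 1, -1);
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return -1;

    /* SO_ERROR holds the final outcome of the connection attempt. */
    errlen = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

With a wrapper like this, a refused connection is still reported as an error, while an interrupted one is waited out instead of surfacing as "Interrupted system call".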



Re: squeak on opensolaris 2009.06

Bert Freudenberg
 

This very much sounds like it could be the issue. A couple years back  
I was called in to make the Browser plugin work on Solaris. The main  
issue turned out to be unhandled return codes. The dominance of Linux  
makes us sloppy I guess ...

- Bert -


Re: squeak on opensolaris 2009.06

Randal L. Schwartz
 
>>>>> "Bert" == Bert Freudenberg <[hidden email]> writes:

>>> sqConnectToPort: Interrupted system call.

Bert> This very much sounds like it could be the issue. A couple years back I
Bert> was called in to make the Browser plugin work on Solaris. The main issue
Bert> turned out to be unhandled return codes. The dominance of Linux makes us
Bert> sloppy I guess ...

Is it just a matter of looping when you see EINTR, or is there a reason that
should send you down some other path, handled at a higher level (above the VM,
for example)?

I'm gonna try building a VM in the next couple of days, and I need to
understand how to handle this.


Re: squeak on opensolaris 2009.06

johnmci
 
Something like

  do {
     result = some_system_call(...);
  } while (result < 0 && errno == EINTR);
  ...

for recv, select, close, recvfrom, read and write. I've likely missed
some, and I also haven't considered the IPv6 calls.

So for example the

static int socketReadable(int s)
{
   char buf[1];
   int n= recv(s, (void *)buf, 1, MSG_PEEK);
   if (n > 0) return 1;
   if ((n < 0) && (errno == EWOULDBLOCK)) return 0;
   return -1; /* EOF */
}

would become

static int socketReadable(int s)
{
   char buf[1];

   int n;
   do {
      n = recv(s, (void *)buf, 1, MSG_PEEK);
   } while (n < 0 && errno == EINTR);

   if (n > 0) return 1;
   if ((n < 0) && (errno == EWOULDBLOCK)) return 0;
   return -1; /* EOF */
}
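
Rather than repeating the loop at every call site, the retry can live in one small wrapper per system call. A minimal sketch for read() follows; `read_retry` is a hypothetical name, not an existing VM function. One call that is usually left out of such wrappers is close(): POSIX leaves the descriptor's state unspecified after close() fails with EINTR, so retrying it blindly may close an unrelated descriptor that was opened in the meantime.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: read() that restarts when a signal interrupts
 * it before any data was transferred.
 * Returns >0 bytes read, 0 at EOF, -1 on a real error. */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

Callers then see only genuine errors, and EWOULDBLOCK on a non-blocking descriptor still comes through unchanged for the existing checks to handle.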
