networking problem (repost)

Michael Roberts-2
i'm not sure this email came through whilst the list was down
sunday/monday so i am reposting by request. i will add some further
content at the end
=====================================================
I have dug around the network problem (OS X) in Pharo 1.0. I am
convinced #useOldNetwork = true does not work.  Looking through old
core images, it never worked; at least with my current OS and VM
choice.

i have been using Magma as a test case. Indeed it is broken on 10508
rc2, however if I set

NetNameResolver classPool at: #UseOldNetwork put: false

and hack #initializeNetwork to make the reference false as well (since
the value can change once set), I can run the first 10 minutes of the
Magma test suite, which is heavy on network use.  The only reason I
can't complete the Magma test case is unrelated issues.  I've asked
Chris to see if these changes work for him.

i propose in Pharo Core 1.0
-set UseOldNetwork := false, and change all references that try and
set it to true
-now we are only using the new network infrastructure,
-gather fresh test cases from the community where networking fails and
fix them with a final push.
-do not try and fix the UseOldNetwork := true branches

If we cannot get the approach above to work then as a fallback I
propose to remove all the new network code and revert it all back to
the point where it was simpler, where socket addresses were just byte
arrays... We then spend more time in 1.1 trying to get it to work
properly. Does anyone care at this point how the infrastructure looks
in 1.0? The fact that our rc2 network layer is totally bust is enough
to do something drastic to finish the cycle.

I propose in Pharo 1.1
-strip out all useOldNetwork = true branches.  it does not work, and
it is a mess to have these two branches side by side, especially when
the value of useOldNetwork can change on its own (this happened a
number of times to me) I guess due to all the attempts to check and
initialize the network. somewhere it flips.

obviously we need some focus on testing a final set of changes, but
please consider the direction I outline.

thanks,
Mike
====================================================

I spent some time after writing this above looking at the magma test
case more deeply and chatting to Chris offline.  After fixing a
quoting issue with magma image launching i managed to get the magma
test suite to run for about 30 mins (until it stopped...). So in
principle the networking does work...

Then to my astonishment i could not get it to run again at all for
another few hours, i got nowhere... it was like it was a dream. I
suspect the problem is the exact network interfaces that the server
and client try to connect/bind to. In the old ipv4 world it was fairly
simple.  i suspect in the ipv6 world, different parts of magma are
binding to different interfaces and i find it impenetrable to work out
which. i have lots of interfaces on my machine (wifi, ethernet,
parallels virtual, etc) and when you do NetNameResolver
localHostAddress (or whatever the message is) you get back only the
first one. I think this could be the cause of the issues. Adrian
mentioned to me that John thought they came back randomly.
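
To illustrate the point about getting back only the first address: at the OS level a single name can map to several addresses, in both the IPv4 and IPv6 families. The following is a minimal Python sketch, not Pharo code, and `addresses_for` is a hypothetical helper name. It lists every address the system resolver reports for a name, so nothing is hidden behind a "first one wins" answer.

```python
import socket

def addresses_for(host):
    """Return every (family, address) pair the OS resolver reports for host."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []  # the name could not be resolved at all
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    # A set removes the duplicates that appear once per socket type.
    pairs = {(names.get(family, str(family)), sockaddr[0])
             for family, _type, _proto, _canon, sockaddr in infos}
    return sorted(pairs)

if __name__ == "__main__":
    # What this prints depends entirely on the machine's own configuration.
    for family, address in addresses_for(socket.gethostname()):
        print(family, address)
```

On a machine with wifi, ethernet and virtual interfaces this may print several lines, which is the situation described above.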

So now my conclusion is firmly we have to take a step back and
simplify.  for pharo 1.0 unless we have someone with a lot of time i
do not feel we can untangle the resolver and socket code.  We should
revert it back to the original #[192 168 0 10] style code when it used
to be simple. This assumes the primitives still support this. Then
what I would like to see in Pharo 1.1 or 1.2, is that we separate out
classes for ip4 and ip6 support, and make them work cleanly and
transparently together. Just because we have ip6 support i don't want
to be forced to use it. we should have a clearer interface to
enumerate all the network interfaces on the machine, and we should
really document this heavily. I think the new help system would be
great for this.  I started to have to write my own scripts just to
document the change in API across images. these can go straight into a
help system...

cheers,
Mike

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Re: networking problem (repost)

Michael Rueger-6
On 3/9/2010 10:08 AM, Michael Roberts wrote:

> So now my conclusion is firmly we have to take a step back and
> simplify.  for pharo 1.0 unless we have someone with a lot of time i
> do not feel we can untangle the resolver and socket code.  We should

As the one originally being responsible for integrating the new socket
code I'm voting in favor of this solution.

The new socket code was initiated in the OLPC project and I was
(falsely) assuming that it would be stable and/or be maintained to a
satisfactory level.
So either the OLPC people have fixes that we don't know about (just
looked at the latest etoys image and the code seems to be the same) or
it just happens to work for them. Or people using the etoys version
simply don't do anything exciting with the network.


Michael

Re: networking problem (repost)

Schwab,Wilhelm K
The only thing I do not like is going back to when "addresses were just byte arrays."  I think we should probably do that in the interest of time, but we in fact need to go the other way (certainly for 1.1), toward an InternetAddress class with aspects for address and host name (allowing lazy resolution in either direction).  We should also separate port numbers from the address.
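
As a rough illustration of the idea, here is a Python sketch (not Pharo code) of an address object that holds a host name and a numeric address and resolves each from the other only on first use, with the port kept in a separate object. The names `InternetAddress`, `Endpoint`, `resolved_address` and `resolved_host_name` are assumptions made for this sketch, and the forward lookup used here is IPv4-only.

```python
import socket
from dataclasses import dataclass
from typing import Optional

@dataclass
class InternetAddress:
    """Hypothetical address object: either aspect may be missing at creation
    and is resolved from the other one only when first asked for."""
    host_name: Optional[str] = None
    address: Optional[str] = None

    def __post_init__(self):
        if self.host_name is None and self.address is None:
            raise ValueError("need a host name or an address")

    def resolved_address(self):
        if self.address is None:
            # Lazy forward lookup (IPv4 only), cached after the first call.
            self.address = socket.gethostbyname(self.host_name)
        return self.address

    def resolved_host_name(self):
        if self.host_name is None:
            # Lazy reverse lookup, cached after the first call.
            self.host_name = socket.gethostbyaddr(self.address)[0]
        return self.host_name

@dataclass
class Endpoint:
    """The port lives next to the address instead of inside it."""
    address: InternetAddress
    port: int
```

A caller would build `Endpoint(InternetAddress(host_name="example.org"), 80)` and no lookup happens until `resolved_address()` is actually sent.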

Another thing we should do (again no later than 1.1) is separate the resolver into IPv4 and 6 subclasses to make it understandable.

Bill



Re: networking problem (repost)

Miguel Cobá
In reply to this post by Michael Roberts-2
On Tue, 09-03-2010 at 09:08 +0000, Michael Roberts wrote:

> So now my conclusion is firmly we have to take a step back and
> simplify.  for pharo 1.0 unless we have someone with a lot of time i
> do not feel we can untangle the resolver and socket code.  We should
> revert it back to the original #[192 168 0 10] style code when it used
> to be simple. This assumes the primitives still support this. [...]

Yes, please, I vote for reverting to the old working code. In 1.1 or now
with the GSoC we can propose a new infrastructure for Networking.



--
Miguel Cobá
http://miguel.leugim.com.mx



Re: networking problem (repost)

johnmci
Well Michael is dropping by for a visit. I think we'll reserve a few hours to figure out the technical side
of what the proper answer is when you want to know what your IP and interface in use is.  Either this
is easy to do and *fixes* all the hassles, or unlikely...

From what I gather this is the key issue of the IPV6 usage ?

>
> Yes, please, I vote for reverting to the old working code. In 1.1 or now
> with the GSoC we can propose a new infrastructure for Networking.

--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter:  squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================






Re: networking problem (repost)

Stéphane Ducasse
In reply to this post by Michael Rueger-6
Thanks mike for the update.

Stef


Re: networking problem (repost)

Stéphane Ducasse
In reply to this post by johnmci
Excellent!
Have fun... and beers from us :)

Stef


Re: networking problem (repost)

Michael Roberts-2
In reply to this post by johnmci
Hi John, thanks for volunteering!  I have dug into the magma issue a
bit more. it is difficult to say.  i can describe my test case, and
YMMV.

(if you need more detail just ping me off list, i suspect your own
investigation path would bear interesting fruit)
-take latest rc 2 Pharo image

-force use new network class var
-force default on initialization to use new network

-install OSProcess
-install Magma
-fix os process launching issue in magma code
-stop magma changing OS x display resolution (annoying)
-run magma test suite
-<launches 4 images>

so the first thing to check is that the os process launch does not
have the side-effect  of killing the network. i suspect not, there are
so many hooks to reinit.

i tried to force server creation to the local host address; in that
case the test failure reduces to this:

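| if socket |
"ask the resolver for the local address (an ipv6 structure here)"
if := NetNameResolver localHostAddress.
socket := Socket newTCP.
"listen on one specific interface; xxxxx stands for the port number"
socket
        listenOn: xxxxx
        backlogSize: 50
        interface: if.
socket isValid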

where if is an ipv6 structure.  socket isValid returns false, so it is
dead.  However, if I look on the shell

mike-mac:~ mike$ lsof -i4 | grep Squeak
Squeak     9944 mike    9u  IPv4 0x6a9f66c      0t0  TCP *:* (CLOSED)
Squeak     9944 mike   10u  IPv4 0x6ab0a68      0t0  TCP *:* (CLOSED)
Squeak     9944 mike   11u  IPv4 0x6a75a68      0t0  TCP *:* (CLOSED)
Squeak     9944 mike   12u  IPv4 0x8e8a270      0t0  TCP *:* (CLOSED)
Squeak     9944 mike   13u  IPv4 0x6a9ee64      0t0  TCP *:* (CLOSED)
Squeak     9944 mike   14u  IPv4 0x6aa4a68      0t0  TCP *:* (CLOSED)
Squeak     9944 mike   15u  IPv4 0x6a94a68      0t0  TCP *:* (CLOSED)
Squeak     9944 mike   16u  IPv4 0x8e86e64      0t0  TCP *:* (CLOSED)
Squeak     9944 mike   17u  IPv4 0x6a7966c      0t0  TCP *:* (CLOSED)
Squeak     9944 mike   18u  IPv4 0x6539e64      0t0  TCP *:* (CLOSED)
Squeak     9944 mike   19u  IPv4 0x919066c      0t0  TCP *:* (CLOSED)
Squeak     9944 mike   21u  IPv4 0x9192a68      0t0  TCP *:* (CLOSED)
Squeak     9944 mike   22u  IPv4 0x918866c      0t0  TCP *:* (CLOSED)
mike-mac:~ mike$ lsof -i6 | grep Squeak
mike-mac:~ mike$

so i have lots of dead ipv4 sockets, and no ipv6 ones... even though
netname resolver returned an ip6 structure. i guess i don't know
really how i expect it to work. maybe just using the wrong api...
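
One possible reading of the output above, and only a guess about what the VM primitive does, is a family mismatch: an IPv4 socket being asked to listen on an IPv6 address. The Python sketch below (not Pharo code; `can_listen` is a hypothetical helper) shows that at the OS level such a request fails, while the same socket type accepts an IPv4 address.

```python
import socket

def can_listen(family, address, backlog=50):
    """Answer True if a TCP socket of the given family can bind to address
    (on a port chosen by the OS) and start listening."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    try:
        sock.bind((address, 0))   # port 0 lets the OS pick a free port
        sock.listen(backlog)
        return True
    except OSError:               # socket.gaierror is a subclass of OSError
        return False
    finally:
        sock.close()

if __name__ == "__main__":
    print(can_listen(socket.AF_INET, "127.0.0.1"))  # IPv4 socket, IPv4 address
    print(can_listen(socket.AF_INET, "::1"))        # IPv4 socket, IPv6 address
```

If something similar happens inside the image, it would fit with seeing only closed IPv4 sockets and no IPv6 ones in lsof.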

I also tried clamping the interface to a known ip address on my machine, i.e.

if := NetNameResolver addressForName: '192.168.1.15'.

but it does not work either.

hope that debugging helps in any way.

cheers,
Mike

Re: networking problem (repost)

Michael Roberts-2
but if i do this

| if socket |
if := NetNameResolver localHostAddress.
socket := Socket newTCP.
socket bindTo: if.
socket listenWithBacklog: 50.
socket isValid.

then it is at least a socket that lsof says is listening.  perhaps
the old socket code worked up to a point and then broke, and the
assumption was that one could just switch the flag to 'new', but no
one has updated the callers to the new API. with so much old and new
code on the same classes i can see why one would assume it works....
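
The bind-then-listen sequence can also be checked from code rather than from lsof. This is a small Python sketch of the same two steps followed by a connection attempt; it is not Pharo code, and `open_listener` and `accepts_connections` are hypothetical names used only here.

```python
import socket

def open_listener(address="127.0.0.1", backlog=50):
    """Bind first, then listen, in the same order as bindTo: followed by
    listenWithBacklog: in the snippet above."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((address, 0))     # port 0 lets the OS pick a free port
    server.listen(backlog)
    return server

def accepts_connections(server, timeout=2.0):
    """Answer True if a client can connect to the address the server is bound to."""
    host, port = server.getsockname()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A check like this could become one of the fresh networking test cases, since it does not depend on reading lsof output by hand.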

I think for pharo 1.0 we really need a functioning ipv4 layer -
however we do it.

cheers,
Mike
