Linux locks up when handling large data sets

Stan Shepherd
Hi, I've been testing Squeak's ability to handle large amounts of data. This
snippet:

testArrayFilling
        | startTime endTime iArray jArray kArray |
        iArray := Array ofSize: 100.
        1
                to: 100
                do: [:i |
                        jArray := Array ofSize: 1000.
                        1
                                to: 1000
                                do: [:j |
                                        startTime := Time millisecondClockValue.
                                        kArray := Array ofSize: 1000.
                                        1
                                                to: 1000
                                                do: [:k | kArray at: k put: Object new].
                                        jArray at: j put: kArray.
                                        endTime := Time millisecondClockValue.
                                        Transcript cr; show: i asString , ',' , j asString , ' ' , (endTime - startTime) asFloat asString].
                        iArray at: i put: jArray].
        Transcript cr; show: 'Finished'

creates about 5 million objects, then the image freezes. When I run on the same
machine under Windows, it happily continues until the short of memory warning
(about 70 million objects in my case). The VM is 3.7 in both cases, and the image
is Damien's 3.9 development image.

There seem to be a number of Unix lockup issues. Is a later (or earlier) VM
likely to fix this please?

Thanks,   ...Stan

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners

Re: Linux locks up when handling large data sets

John Foster-3
Hi Stan!

Have you tried using the -mmap option when starting Squeak?  I notice that
according to the squeakvm man page:

  squeak uses a dynamic heap by default with the maximum size  set  to
  75%  of the available virtual memory or 1 gigabyte, whichever is smaller.

Perhaps Windows doesn't have this limit.  Is one instance of Object bigger
than 200 bytes? If so you'd expect it to oom.
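
As a rough check on where the 200-byte figure comes from: dividing the 1 gigabyte default cap by the roughly 5 million objects reported gives the average space each object would have to occupy to hit the limit. A minimal sketch of that arithmetic follows. The helper name bytes_per_object is made up for illustration, and it ignores the memory the image already uses before the test starts.

```shell
# bytes_per_object HEAP_BYTES OBJECT_COUNT
# Integer average of heap bytes available per object (hypothetical helper).
bytes_per_object() {
    echo $(( $1 / $2 ))
}

# 1 gigabyte default heap cap, about 5 million objects created before the freeze.
bytes_per_object $(( 1024 * 1024 * 1024 )) 5000000
```

If a plain Object plus its array slot is much smaller than the number printed, the heap cap alone would not explain a freeze at 5 million objects.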

Assuming you have 2 gig of virtual memory, try starting the VM with

-mmap  1500m

as an option and see if it goes further.

Note you can temporarily increase your virtual memory by running (as root):

dd if=/dev/zero of=/tmp/swapfile bs=1024 count=1M
mkswap /tmp/swapfile
swapon /tmp/swapfile

This will increase your virtual memory by 1 gigabyte.  Note that on an Intel
machine your virtual memory is usually restricted to a max of about 2 gig (unless
you start the kernel with the right options, and you have a large memory machine,
and you have a kernel built to support over 4 gig of memory).  I'm also
assuming you have a spare gig of space in /tmp.  This lasts till you
reboot or until you run

     swapoff /tmp/swapfile

as root.

You'll also want to rm that file when you've finished!

I guess that the *nix ports have this restriction for the squeak VM
because of the assumption that apps should play nice and not suck the
system into oblivion!

It might also be instructive to run

     vmstat 10 20

in another terminal while you allocate your insanely large array.  You'll
be able to see if the "freeze" is simply massive disk usage as Linux goes
crazy seeking inside the swapfile.  Your arrays hold 100 million
references in total, before you even count the Objects they point to!  Read
the man page on vmstat on how to interpret it if you don't know the command.
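
To make the swap check less of an eyeball exercise, here is a small sketch that reads vmstat output and says whether any swap-in (si) or swap-out (so) traffic was seen. The function name swap_activity is an illustrative choice, and it assumes the usual Linux column order in which si and so are the 7th and 8th fields. The first sample is skipped because it reports averages since boot.

```shell
# swap_activity: read vmstat output on stdin and report whether any
# sample after the first shows non-zero si (field 7) or so (field 8).
swap_activity() {
    awk '
        $1 ~ /^[0-9]+$/ {            # data rows start with a number; headers do not
            n++
            if (n == 1) next         # first sample is the since-boot average
            if ($7 > 0 || $8 > 0) active = 1
        }
        END { print (active ? "swapping" : "no swap activity") }
    '
}

# Typical use (not run here):
#   vmstat 10 20 | swap_activity
```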

I hope this helps, and doesn't just confuse...

John


Re: Linux locks up when handling large data sets

David T. Lewis
In reply to this post by Stan Shepherd
On Fri, May 02, 2008 at 10:36:36PM +0200, [hidden email] wrote:

> Hi, I've been testing Squeak's ability to handle large amounts of data. This
> snippet:
>
> <snipped testArrayFilling code>
>
> creates about 5 million objects , then the image freezes. When I run on the same
> machine under Windows, it happily continues until the short of memory warning (
> about 70 Million objects in my case). The VM is 3.7 in both cases, the image
> Damien's 3.9 development image.

You are probably just growing your image to the point where the operating
system starts swapping. The image will appear to be unresponsive, but if
you interrupt it with <alt>period, it will eventually wake up and return
control to you.

The Unix VM automatically allocates memory from the operating system
as the Squeak object memory grows. This has the nice property of making
the image just as big as it needs to be without you worrying about it,
but it also means that if you write code that allocates far more memory
than is physically available, the operating system is going to start
thrashing to the point where Squeak becomes unusable. It will not really
be frozen though; if you are patient enough you can still save the image,
install more memory on your computer, and restart it ;-)
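
One way to watch that growth from outside the image is to sample the VM's resident memory while the test runs. The sketch below is Linux-specific and reads the VmRSS line from /proc. The helper name rss_kb is hypothetical, and finding the VM with `pgrep squeak` assumes the process is actually called squeak on your system.

```shell
# rss_kb PID: print the resident set size of a process in kB,
# taken from the VmRSS line of /proc/PID/status (Linux only).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Example: sample this shell's own memory once. For Squeak you would
# pass the VM's process id instead, e.g. rss_kb "$(pgrep squeak)"
# (process name is an assumption).
rss_kb $$
```

If the number approaches the machine's physical memory, swapping is the likely cause of the slowdown; if it stays well below, something else is going on.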

[hidden email] gave some good tips on how to control VM memory
allocation for the Unix VM.

I have also found that large images that I make on a Linux box
may refuse to load on Windows, with the Windows VM apparently
unable to allocate enough memory (even though plenty of memory
is available). I'm afraid that I never bothered to figure out
why (possibly error or ignorance on my part) but I mention it
so you won't be surprised if you try to show off your large
data sets on someone else's Windows laptop.

Dave


Re: Linux locks up when handling large data sets

Stan Shepherd
In reply to this post by John Foster-3
Quoting [hidden email]:

> <snipped large post>

Hi John, not confusing- an excellent response, thanks.

With the memory option it also cruises on under Linux, until it freezes at 70
million objects.

While it's still loading vmstat shows:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0  32528  38552  22792 758560    0    0   183   520  295 2399 66  9 17  7
 3  0  32528  38420  22800 758568    0    0     0     3  254 1779 92  8  0  0
 4  0  32528  38420  22804 758568    0    0     0     6  234 2100 93  7  0  0
 4  0  32528  38420  22804 758568    0    0     0     0  239 2125 93  7  0  0
 2  0  32528  38420  22808 758568    0    0     0     0  233 1379 95  5  0  0
 2  0  32528  38420  22816 758568    0    0     0     1  251 2255 92  8  0  0
 2  0  32528  38296  22824 758568    0    0     0     3  244 1963 94  6  0  0
 4  0  32528  38296  22824 758568    0    0     0     0  230 1863 93  7  0  0
 5  0  32528  38296  22832 758568    0    0     0     4  234 2108 91  9  0  0
 2  0  32528  38296  22840 758568    0    0     0    12  228 1616 94  6  0  0
 4  0  32528  38296  22840 758568    0    0     0     0  234 2142 93  7  0  0
 2  0  32528  36608  22680 751824    0    0     0     1  228 1639 94  6  0  0
 1  0  32528  36608  22680 751824    0    0     0     7  234 2128 92  8  0  0
 3  0  32528  36484  22680 751824    0    0     0     3  248 2170 92  9  0  0
 2  0  32528  39080  22560 749472    0    0     2     4  231 1610 93  7  0  0
 4  0  32528  35104  22560 749584    0    0     0     0  239 2137 89 11  0  0
 3  0  32528  34636  22560 749584    0    0     0     0  236 2084 92  8  0  0
 3  0  32528  34960  22568 749584    0    0     0     4  233 1898 93  7  0  0
 4  0  32528  38956  22568 749584    0    0     0     1  238 2277 90 10  0  0
 2  0  32528  36724  22568 749584    0    0     0     0  227 1719 94  6  0  0

Without the -mmap option, once the image has frozen, vmstat shows:


 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0  32528  41884  23220 703504    0    0   130   370  272 2402 74  9 12  5
 1  0  32528  41624  23228 703512    0    0     0     4  215 1094 97  3  0  0
 1  0  32528  41628  23228 703512    0    0     0     0  186  703 98  2  0  0
 1  0  32528  41628  23228 703512    0    0     0     0  179  667 97  3  0  0
 1  0  32528  41628  23236 703512    0    0     0     6  180  668 97  3  0  0
 1  0  32528  41628  23244 703512    0    0     0     4  180  667 97  3  0  0
 1  0  32528  41628  23244 703512    0    0     0     0  179  667 98  2  0  0
 1  0  32528  41628  23244 703512    0    0     0     0  204  765 96  4  0  0
 1  0  32528  41628  23244 703512    0    0     0     2  205 1019 97  3  0  0
 1  0  32528  41628  23260 703516    0    0     0     6  180  665 97  3  0  0
 1  0  32528  41628  23268 703516    0    0     0     1  179  658 98  2  0  0
 1  0  32528  41628  23276 703520    0    0     0     6  180  657 97  3  0  0
 1  0  32528  41628  23276 703520    0    0     0     3  180  655 97  3  0  0
 1  0  32528  41628  23276 703520    0    0     0     0  180  654 97  3  0  0
 1  0  32528  41628  23280 703520    0    0     0     0  179  656 97  3  0  0
 1  0  32528  41628  23280 703520    0    0     0     0  179  659 98  3  0  0
 1  0  32528  41628  23288 703520    0    0     0     1  179  653 97  3  0  0
 1  0  32528  41628  23288 703520    0    0     0     0  179  654 98  2  0  0
 1  0  32528  41628  23288 703520    0    0     0     0  179  654 98  2  0  0
 1  0  32528  41628  23296 703520    0    0     0     3  180  658 98  2  0  0

Nothing obviously different to me.

At least I can work around this as long as I keep sizes moderate.

Thanks again,   Stan

Re: Linux locks up when handling large data sets

Stan Shepherd
In reply to this post by David T. Lewis
David T. Lewis wrote:
> You are probably just growing your image to the point where the operating
> system starts swapping. The image will appear to be unresponsive, but if
> you interrupt it with <alt>period, it will eventually wake up and return
> control to you.

David, my image doesn't respond to <alt>period, although it is still using
the processor. It didn't sort itself out when left overnight.

David T. Lewis wrote:
> The Unix VM automatically allocates memory from the operating system
> as the Squeak object memory grows. This has the nice property of making
> the image just as big as it needs to be without you worrying about it,
> but it also means that if you write code that allocates far more memory
> than is physically available, the operating system is going to start
> thrashing to the point where Squeak becomes unusable. It will not really
> be frozen though; if you are patient enough you can still save the image,
> install more memory on your computer, and restart it ;-)
>
> [hidden email] gave some good tips on how to control VM memory
> allocation for the Unix VM.

Yes, using John's tips I can handle much larger data sets; I posted the
results in my reply to his post.

David T. Lewis wrote:
> I have also found that large images that I make on a Linux box
> may refuse to load on Windows, with the Windows VM apparently
> unable to allocate enough memory (even though plenty of memory
> is available). I'm afraid that I never bothered to figure out
> why (possibly error or ignorance on my part) but I mention it
> so you won't be surprised if you try to show off your large
> data sets on someone else's Windows laptop.
>
> Dave

Interesting, I might try this too to see what happens.

Thanks for your reply,    Stan

Re: Linux locks up when handling large data sets

John Foster-3
In reply to this post by Stan Shepherd
> Quoting [hidden email]:
>
>> Hi Stan!
<snipped large post>

>
> Hi John, not confusing- an excellent response, thanks.
>
> With the memory option it also cruises on under Linux, until it freezes at
> 70 million objects.
>
> While it's still loading vmstat shows:
>
> <snipped vmstat output>
>
> Without the -mmap option, once the image has frozen, vmstat shows:
>
> <snipped vmstat output>
>
> Nothing obviously different to me.
>
> At least I can work around this as long as I keep sizes moderate.
>
> Thanks again,   Stan

Hi Stan!

What vmstat seems to suggest is that the issue isn't Linux going mad
swapping.  Perhaps there's a hard limit to the number of references the VM
can hold, under Linux or Windows.  I also notice that there is the same
amount of stuff swapped out in both traces, and no swap activity (si and
so are both zero; you should always ignore the first line of vmstat output,
as it shows the averages since boot).

The other odd thing is your total swapped column never changes.  How much
virtual memory do you have? The output of

    free

would tell you.

The drop in the number of context switches when it's locked up suggests
that the CPU is completely busy in the Squeak VM.  Maybe the garbage
collector gets in a bit of a tizz when abused.  Is there some way to easily
trace what the Squeak GC is doing?  Perhaps some of the people who are
knowledgeable about the deep internals of the VM could shed light on how
to dynamically trace the time spent in the GC thread versus the user space
thread in the VM.  I know I found a process manager in Squeak once and saw
the garbage collector in there, but I can't recall how I found it or if it
showed how much time was being spent in each "Squeak VM thread".
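
The context-switch comparison can be put into numbers by averaging the cs column of each trace. This is only a sketch: the function name avg_cs is made up, and it assumes the usual Linux vmstat layout in which cs is the 12th field. As before, the first sample is dropped.

```shell
# avg_cs: read vmstat output on stdin and print the mean of the "cs"
# (context switches per second) column, skipping the first sample.
avg_cs() {
    awk '
        $1 ~ /^[0-9]+$/ {            # data rows only
            n++
            if (n == 1) next         # first sample is the since-boot average
            sum += $12
            count++
        }
        END { if (count) printf "%d\n", sum / count }
    '
}

# Typical use (not run here):
#   vmstat 10 20 | avg_cs
```

In Stan's traces the cs values are roughly 2000 while loading and mostly between 650 and 670 once frozen.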

Yours,

John
