Memory limits?


Thomas Worthington
I'm testing a bit of code intended for analysing Apache logs and hitting a memory limit. The log I'm using has a million lines with 18 fields each, so it's not completely trivial (I've already tested the code on a 300-line snippet), but it is still only an extract of the full-sized logs I want to run it on.
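Roughly, the shape of the thing is a loop like the sketch below (simplified, not the real code; 'access.log' stands in for the actual file). Keeping one Array of 18 field Strings per line means on the order of 19 million live objects for a million lines.

"Simplified sketch only, not the actual analysis code."
| rows file |
rows := OrderedCollection new.
file := FileStream open: 'access.log' mode: FileStream read.
[ file atEnd ] whileFalse: [
    rows add: ((file upTo: Character nl) subStrings: ' ') ].
file close.
rows size printNl.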

When the code runs I initially get a load of

"Global garbage collection... done"
"Global garbage collection... done, heap grown"
"Global garbage collection... done, heap grown"
"Global garbage collection... done, heap grown"
"Global garbage collection... done, heap grown"
"Global garbage collection... done, heap compacted"
"Global garbage collection... done, heap grown"

Which is expected, but once gst has about a gig of RAM allocated to it I get

[Memory allocation failure]
Can't allocate enough memory to continue.
done"
"Global garbage collection...

which repeats over and over (I think once per line in the log file from that point on). pidstat on gst at that point gives me

11:01:36      UID       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
11:01:36     1000     18064      0.06      0.00  989144  813120   9.96  gst

The test box has 8G of RAM and the system I want to run it on has 680GB of RAM, so it would be nice to be able to use it. Is there some hard-coded limit in the source which I can lift or am I stuck? Or just missing something?

Thanks,
Thomas


Re: Memory limits?

stes

I'm new to GNU Smalltalk so I don't know the answer, but reading your message raises some questions.

First of all, I wonder whether GNU Smalltalk 3.2.91 is the latest version, and whether any development is still going on.

(version from 2015 https://alpha.gnu.org/gnu/smalltalk/)

How exactly was GNU Smalltalk configured and built?

What's the version and operating system?

Did you build a 32-bit or a 64-bit gst?

For example:
#smalltalk-3.2.91$ file gst
gst:      ELF 64-bit LSB executable AMD64 Version 1, dynamically linked, not stripped, no debugging information available

Did 'gmake check' report any failed checks?

David Stes


Re: Memory limits?

Thomas Worthington
I'm on the master branch (updated Sat Mar 10 06:12:43 2018) and I built using

configure
make

So I assume that it picked up the fact that this is a 64-bit machine (Gentoo Linux). All the checks pass.

"file" gives

ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, with debug_info, not stripped

Thomas


Re: Memory limits?

Gnu mailing list
In reply to this post by Thomas Worthington

Thomas Worthington writes:
>
> The test box has 8G of RAM and the system I want to run it on has 680GB of RAM, so it would be nice to be able to use it. Is there some hard-coded limit in the source which I can lift or am I stuck? Or just missing something?

Are you using a lot of objects? GNU Smalltalk has an OOP table (object table) which is limited in size by default:

libgst/oop.h:

/* The number of OOPs in the system.  This is exclusive of Character,
   True, False, and UndefinedObject (nil) oops, which are
   built-ins.  */
#define INITIAL_OOP_TABLE_SIZE  (1024 * 128 + BUILTIN_OBJECT_BASE)
#define MAX_OOP_TABLE_SIZE      (1 << 23)

So the OOP table has an initial size of 128K entries and will only grow to about 8M entries (1 << 23 = 8,388,608). The limit can be lifted by editing the source, but I am not sure what the next limit after that would be.

On top of that, the generational GC engine that GNU Smalltalk uses does not handle very large objects very well.

Anyway, if you really have to use more than a few million objects or multiple gigabytes of memory for your application, GNU Smalltalk is not the best tool.

Derek



Re: Memory limits?

stes
In reply to this post by Thomas Worthington

OK, so you have built a 64-bit executable.

The autoconf/configure script is probably doing the right thing, then.

The reason I asked is that a 1 GB limit could point towards some 32-bit limitation; just a guess.

The documentation
https://www.gnu.org/software/smalltalk/manual/html_node/GC.html

says:

"by default, GNU Smalltalk uses 420=300+60*2 kilobytes of memory, while a simpler configuration would use 720=360*2 kilobytes"

So that documentation seems to point to some 'default' that perhaps can be tuned.
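If that default really is tunable at run time, something along these lines might be worth a try (untested sketch, assuming the ObjectMemory class-side selectors described in the manual's memory-management chapter behave as documented; the values are arbitrary examples):

"Untested: pre-grow the heap instead of growing it in small steps."
ObjectMemory spaceGrowRate: 100.            "growth rate, a percentage according to the manual"
ObjectMemory growTo: 512 * 1024 * 1024.     "grow the object heap to roughly 512 MB now"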

Another issue: I am not very familiar with GNU Smalltalk, but I've noticed that there is a configure option:

    ./configure --disable-generational-gc

to disable generational garbage collection.

It's a good opportunity to ask whether anyone can explain, or point to documentation for, what this option does.

The documentation above mentions something like a "copying garbage collector", so perhaps with --disable-generational-gc a different kind of garbage collection algorithm is used.


For example, if I configure like that and run the torture test in smalltalk-3.2.91/unsupported/torture.st, it still prints

"Global garbage collection... done, heap grown"

just as in your initial message, despite the fact that I specified --disable-generational-gc.

Perhaps it is worth trying with and without --disable-generational-gc, since you're on a test system anyway.


David Stes



Re: Memory limits?

stes
In reply to this post by Thomas Worthington

I've just tested this myself with a binary compiled with --disable-generational-gc.

It probably does not help, as I can reproduce your problem with --disable-generational-gc as well.

# cat memory.st
"by David Stes -

test to measure memory size (RSS and/or SIZE) of gst process"

| d a b |

a := 100000.
b := 100000.

d := OrderedCollection new.
1 to: a do: [ :i |
    1 to: b do: [ :j |
        d add: Object new ] ] !

results in

[Memory allocation failure]
Can't allocate enough memory to continue.

In my case the process only seems to have consumed about 166 MB of memory when it fails, so the limit does not look like physical memory:

 26589 stes      166M  161M stop     0    0   0:00:31  30% gst/1





Re: Memory limits?

stes
In reply to this post by stes

> The reason I asked is because a 1GB limit could point towards some 32bit
> limitation, just a guess.

Inspection of the source code shows some use of the 'int' type.

The 'int' type is 4 bytes even in 64-bit executables, whereas 'long' can be 8 or 4 bytes depending on whether it is compiled in 32-bit or 64-bit mode.

So the use of the 'int' type may in some cases impose 32-bit limitations; it is an indirect limit, as the counter refers to the number of objects, not to the actual memory size.

So basically what you have to find out is whether anyone is using GNU Smalltalk in 64-bit mode with large numbers of objects.

It is perhaps not so difficult after all to remove those usages of 'int' and the resulting limitations.

There is also an (almost non-existent) possibility that int is a different size on your platform, but that is extremely unlikely:

$ cc -m32 sizeofint.c
$ ./a.out
sizeof(int) is 4

$ cc -m64 sizeofint.c
$ ./a.out
sizeof(int) is 4

$ cat sizeofint.c  

#include <stdio.h>

int main(int argc, char *argv[])
{
  int i = 1 << 23;   /* same value as MAX_OOP_TABLE_SIZE; fits comfortably in an int */

  printf("sizeof(int) is %zu\n", sizeof(int));   /* sizeof yields a size_t, so use %zu */
  return 0;
}


Re: Memory limits?

Thomas Worthington
I fiddled with the value of MAX_OOP_TABLE_SIZE and was able to get further through the original log file, but I hit another barrier at about 2 GB, which may be the int limitation, since int is signed.
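(That guess at least lines up with the arithmetic; the largest value a signed 32-bit int can hold is one byte short of 2 GiB:)

((2 raisedTo: 31) - 1) printNl.   "prints 2147483647"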

I reworked the code slightly to try it in Squeak and it worked fine, actually using less memory for the whole file than gst was using when it hit its limits at about the halfway mark. It was faster too, but I hadn't tried the JIT in gst, so the comparison doesn't mean much.

I confirmed that an int is 4 bytes here too, BTW.

Thanks for the help.

Thomas

