I'm testing a bit of code intended for analysing Apache logs and am hitting a memory limit. The log I'm using has a million lines with 18 fields each, so it's not completely trivial (I've already tested the code on a 300-line snippet), but it is still only an extract of the full-sized logs I want to run it on.
When the code runs I initially get a load of

"Global garbage collection... done"
"Global garbage collection... done, heap grown"
"Global garbage collection... done, heap grown"
"Global garbage collection... done, heap grown"
"Global garbage collection... done, heap grown"
"Global garbage collection... done, heap compacted"
"Global garbage collection... done, heap grown"

which is expected, but once gst has about a gig of RAM allocated to it I get

[Memory allocation failure]
Can't allocate enough memory to continue.
done"
"Global garbage collection...

which repeats over and over (I think once per line in the log file from that point on). pidstat on gst at that point gives me

11:01:36   UID    PID     minflt/s  majflt/s  VSZ     RSS     %MEM  Command
11:01:36   1000   18064   0.06      0.00      989144  813120  9.96  gst

The test box has 8 GB of RAM and the system I want to run it on has 680 GB of RAM, so it would be nice to be able to use it. Is there some hard-coded limit in the source which I can lift, or am I stuck? Or am I just missing something?

Thanks,
Thomas
I'm new to GNU Smalltalk so I don't know the answer, but reading your message raises some questions.

First of all, I wonder whether GNU Smalltalk 3.2.91 is the latest version, and whether any development is still going on (the version from 2015 is at https://alpha.gnu.org/gnu/smalltalk/).

How exactly was GNU Smalltalk configured and built?

What's the version and operating system?

Did you build a 32-bit or a 64-bit gst? For example:

#smalltalk-3.2.91$ file gst
gst: ELF 64-bit LSB executable AMD64 Version 1, dynamically linked, not stripped, no debugging information available

Was 'gmake check' reporting any failed checks?

David Stes

----- On 21 Sep 2020 at 12:19, Thomas Worthington [hidden email] wrote:
> [...]
I'm on the master branch (updated Sat Mar 10 06:12:43 2018) and I built using
configure
make

So I assume that it picked up the fact that this is a 64-bit machine (Gentoo Linux). All the checks pass.

"file" gives

ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, with debug_info, not stripped

Thomas

[hidden email] writes:
> [...]
In reply to this post by Thomas Worthington
Thomas Worthington writes:
> The test box has 8G of RAM and the system I want to run it on has 680GB
> of RAM, so it would be nice to be able to use it. Is there some
> hard-coded limit in the source which I can lift or am I stuck? Or just
> missing something?

Are you using a lot of objects? GNU Smalltalk has an OOP table which is limited in size by default. From libgst/oop.h:

/* The number of OOPs in the system.  This is exclusive of Character,
   True, False, and UndefinedObject (nil) oops, which are built-ins.  */
#define INITIAL_OOP_TABLE_SIZE (1024 * 128 + BUILTIN_OBJECT_BASE)
#define MAX_OOP_TABLE_SIZE (1 << 23)

So the OOP table has an initial size of 128K entries and will only grow to about 8M entries. That limit can be lifted by editing the source, but I am not sure what the next limit after that would be. On top of that, the generational GC engine in GNU Smalltalk does not handle very large objects very well.

Anyway, if you really have to use more than a few million objects, or multiple gigabytes of memory, for your application, GNU Smalltalk is not the best tool.

Derek
In reply to this post by Thomas Worthington
OK, so you have built a 64-bit executable; the autoconf/configure script is presumably doing the right thing. The reason I asked is that a 1 GB limit could point towards some 32-bit limitation, just a guess.

The documentation at https://www.gnu.org/software/smalltalk/manual/html_node/GC.html says:

"by default, GNU Smalltalk uses 420=300+60*2 kilobytes of memory, while a simpler configuration would use 720=360*2 kilobytes"

So that documentation seems to point to some 'default' that can perhaps be tuned.

Another issue: I am not very familiar with GNU Smalltalk, but I've noticed that there is an option

./configure --disable-generational-gc

to disable generational garbage collection. This is a good opportunity to ask whether anyone can explain, or point to documentation for, what this option does. The documentation above mentions that there is something like a "copying garbage collector", so perhaps with --disable-generational-gc a different garbage collection algorithm is used.

For example, if I configure like that and run the 'torture.st' test (the one in smalltalk-3.2.91/unsupported/torture.st), it still prints

"Global garbage collection... done, heap grown"

just as in your initial message, despite the fact that I specified --disable-generational-gc.

Perhaps it is worth trying with and without --disable-generational-gc, since you're on a test system anyway.

David Stes

----- On 21 Sep 2020 at 15:24, Thomas Worthington [hidden email] wrote:
> [...]
In reply to this post by Thomas Worthington
I've just tested myself with a binary compiled with --disable-generational-gc. It probably does not help, as I can reproduce your problem with --disable-generational-gc as well.

# cat memory.st
"by David Stes - test to measure memory size (RSS and/or SIZE) of gst process"
| d a b |
a := 100000.
b := 100000.
d := OrderedCollection new.
1 to: a do: [ :i |
    1 to: b do: [ :j | d add: Object new ] ]
!

This results in

[Memory allocation failure]
Can't allocate enough memory to continue.

In my case it only seems to have consumed 166 MB of memory:

26589 stes 166M 161M stop 0 0 0:00:31 30% gst/1
In reply to this post by stes
> The reason I asked is because a 1GB limit could point towards some 32bit
> limitation, just a guess.

Inspection of the source code shows some use of the 'int' type. The 'int' type is 4 bytes even in 64-bit executables, whereas 'long' can be 8 bytes or 4 bytes, depending on whether you compile in 32-bit or 64-bit mode.

So the use of the 'int' type may in some cases impose a kind of 32-bit limitation; it is an indirect limitation, as the counter refers to the number of objects, not to the actual memory size.

So basically what you have to find out is whether anyone is using GNU Smalltalk in 64-bit mode with large numbers of objects. It is perhaps not so difficult, after all, to remove those usages of 'int' and the resulting limitations.

There is also an (almost non-existent) possibility that it is different on your platform, but that is extremely unlikely:

$ cc -m32 sizeofint.c
$ ./a.out
sizeof(int) is 4

$ cc -m64 sizeofint.c
$ ./a.out
sizeof(int) is 4

$ cat sizeofint.c
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i = 1 << 23;

    printf("sizeof(int) is %d\n", (int) sizeof(int));
}
I fiddled with the value of MAX_OOP_TABLE_SIZE and was able to get further through the original log file but I hit another barrier at about 2GB, which may be the int limitation as int is signed.
I redid the code slightly to try it in Squeak and it worked fine, actually taking up less memory for the whole file, whereas I was hitting the limits in gst at about the halfway mark. It was faster too, but I hadn't tried the JIT in gst, so the comparison doesn't mean much. I confirmed that an int was 4 bytes there too, BTW.

Thanks for the help.

Thomas

[hidden email] writes:
> [...]