Reasons for Memory panic interrupt ?

Henrik Høyer
At one of our customers, we are experiencing "Memory panic interrupt" on their Windows 2008 R2 Citrix servers. The app runs fine on their Windows 2003 Citrix servers and on their Windows XP and Windows 7 workstations.

The memory panic occurs well below the VM memory size settings, and there is plenty of available memory on the servers.

Will the VM trigger a memory panic if resources other than "plain memory" are exhausted?


--
Henrik Høyer
Chief Software Architect
[hidden email] * (+45) 4029 2092
Marievej 15 * 4600 Køge
www.sPeople.dk * (+45) 7023 7775

***           this signature added by listserv             ***
*** Visit  http://www.listserv.dfn.de/archives/vswe-l.html ***
*** for archive browsing and VSWE-L membership management  ***

Re: Reasons for Memory panic interrupt ?

Todor Todorov
I guess it has to do with memory allocation. Remember, the VM reserves and commits memory at different times; a GC, for example, can trigger this. Also, weak objects use more memory, because the VM needs to remember the weak (rescued) array.
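A minimal sketch of the reserve/commit split mentioned above, assuming nothing about how the VSE VM does it internally. The Windows branch uses VirtualAlloc with MEM_RESERVE and, later, MEM_COMMIT; the non-Windows branch is an assumed stand-in using mmap(PROT_NONE) and mprotect (Linux may still overcommit, so it only approximates a real commit). The function names are hypothetical. The point is that a large reservation can succeed at startup while a later commit, for example one triggered by a GC, fails.

```c
#if !defined(_WIN32) && !defined(_DEFAULT_SOURCE)
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS / MAP_NORESERVE in glibc */
#endif
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Step 1: reserve address space only; nothing is backed by RAM or page file yet. */
void *reserve_region(size_t size)
{
#ifdef _WIN32
    return VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_NOACCESS);
#else
    void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#endif
}

/* Step 2, possibly much later (e.g. when a GC needs room): commit part of the
   reservation. This is the call that can fail even though step 1 succeeded.
   Returns 1 on success, 0 on failure. */
int commit_range(void *addr, size_t size)
{
    assert(addr != NULL && size > 0);
#ifdef _WIN32
    return VirtualAlloc(addr, size, MEM_COMMIT, PAGE_READWRITE) != NULL;
#else
    return mprotect(addr, size, PROT_READ | PROT_WRITE) == 0;
#endif
}

void release_region(void *base, size_t size)
{
#ifdef _WIN32
    (void)size; /* MEM_RELEASE requires size 0 and frees the whole reservation */
    VirtualFree(base, 0, MEM_RELEASE);
#else
    munmap(base, size);
#endif
}
```

For example, reserving 64 MB and committing only the first 1 MB leaves the remaining 63 MB as address space that still has to be committed before it can be used.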

As a side note, I looked at the VirtualAlloc function documentation (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx). It says that one must call FlushInstructionCache when code is written to executable memory. It says: "When creating a region that will be executable, the calling program bears responsibility for ensuring cache coherency via an appropriate call to FlushInstructionCache once the code has been set in place. Otherwise attempts to execute code out of the newly executable region may produce unpredictable results."

I guess the VM is so old that it doesn't know about the FlushInstructionCache function and never calls it. This could give "unpredictable results" on multi-processor (not multi-core) systems. Does your new Citrix server have more physical processors than the old one? That said, this would corrupt the JIT cache rather than end in the interrupt you described.
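The pattern the VirtualAlloc documentation asks for looks roughly like this. It is a sketch, not the VM's code: install_code is a hypothetical helper, and the non-Windows branch substitutes mmap/mprotect plus the GCC/Clang builtin __builtin___clear_cache for FlushInstructionCache so the sketch also builds elsewhere.

```c
#if !defined(_WIN32) && !defined(_DEFAULT_SOURCE)
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS in glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Copies machine code into a fresh region, makes it executable and keeps the
   instruction cache coherent. Returns the code address or NULL on failure. */
void *install_code(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);
#ifdef _WIN32
    DWORD old_protect;
    void *p = VirtualAlloc(NULL, len, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (p == NULL)
        return NULL;
    memcpy(p, code, len);
    if (!VirtualProtect(p, len, PAGE_EXECUTE_READ, &old_protect)) {
        VirtualFree(p, 0, MEM_RELEASE);
        return NULL;
    }
    /* The step the VirtualAlloc documentation asks for. */
    FlushInstructionCache(GetCurrentProcess(), p, len);
    return p;
#else
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, code, len);
    if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, len);
        return NULL;
    }
    /* GCC/Clang builtin serving the same purpose as FlushInstructionCache. */
    __builtin___clear_cache((char *)p, (char *)p + len);
    return p;
#endif
}
```

The bytes handed to install_code are only copied, never executed, so the sketch can be exercised safely.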




Re: Reasons for Memory panic interrupt ?

Andreas Rosenberg
I don't think this problem is related to FlushInstructionCache.

I think it is related to the "MMU page size" http://en.wikipedia.org/wiki/Page_(computer_memory)

If the VM reports out of memory, there could be several reasons:
 - no memory left in the memory pool
 - no more free entries in the page table (but still plenty of memory left)
 - trying to commit memory that has already been committed

Here is a quote from a website explaining it a bit more:
"
Page Table Entries (PTEs) are an important Windows resource. In fact, a PTE shortage can cause Windows to become so unstable that the operating system crashes. There is even a Performance Monitor counter (Memory \ Free System Page Table Entries) that you can use to see how many PTEs Windows has available. The generally accepted rule is that if the number of available PTEs drops below 7,000, the system's stability becomes questionable.

The reason that PTEs are such a big deal is because Windows reserves a very limited amount of space for PTEs. A 32-bit version of Windows Server 2003 only allocates about 900 MB of space for PTEs by default. If you use the /3GB switch in the BOOT.INI file to allocate more address space to user mode processes, then you severely decrease the number of available PTEs. To put it simply, in the 32-bit version of Windows Server 2003, PTE space is a scarce and valuable commodity. The more processes that run on a server, the more PTE space is consumed. This is especially problematic in terminal server environments since the server must run its own processes and a full set of processes for each user session.
"

Maybe this server is still running a 32-bit OS but uses PAE to address more memory.

The VM is designed for a 4K page size. Because even modern CPUs can hold only a limited number of page table entries, page sizes are being increased so that more memory can be addressed.

As far as I know, the VM code expects pages to be 4K, but the OS rounds commit requests up to the next page boundary. So if the OS uses 2MB pages, this may conflict with the way the VM manages committed memory: VirtualAlloc calls may fail because the VM tries to commit the same memory twice.
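The rounding effect described above can be checked with plain arithmetic. The addresses are made up, and the sketch does not model what the VM or Windows actually does; it only shows that two commit requests that are disjoint at 4 KB granularity cover the same page once both are widened to 2 MB boundaries. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address down to a page boundary; page must be a power of two. */
uintptr_t page_floor(uintptr_t addr, uintptr_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return addr & ~(page - 1);
}

/* Round an address up to the next page boundary. */
uintptr_t page_ceil(uintptr_t addr, uintptr_t page)
{
    return page_floor(addr + page - 1, page);
}

/* Do the requests [a, a+alen) and [b, b+blen) touch a common page once each
   is widened to whole pages of the given size? */
int requests_share_page(uintptr_t a, uintptr_t alen,
                        uintptr_t b, uintptr_t blen, uintptr_t page)
{
    uintptr_t a_lo = page_floor(a, page), a_hi = page_ceil(a + alen, page);
    uintptr_t b_lo = page_floor(b, page), b_hi = page_ceil(b + blen, page);
    return a_lo < b_hi && b_lo < a_hi;
}
```

For example, 4 KB requests at 0x401000 and 0x402000 do not share a 4 KB page, but both round to the 2 MB page starting at 0x400000.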

Regards,
  Andreas

Andreas Rosenberg | eMail: [hidden email]
APIS GmbH         | Phone: +49 9482 9415-0
Im Haslet 42      | Fax: +49 9482 9415-55
93086 Wörth/D     | WWW: <http://www.apis.de/>
Germany           | <http://www.fmea.de/>





Re: Reasons for Memory panic interrupt ?

Henrik Høyer
Windows Server 2008 and later don't have these PTE issues.

http://media.kingston.com/images/usb/pdf/MKMS_1068_MSWS_wp_bw.pdf states:
"Windows Server 2008 introduces an architectural change to the Windows Memory Manager, which enables dynamic kernel memory allocation of the above-listed shared virtual memory kernel components. [Non-paged pool, paged pool, system cache and PTE]"

I will investigate the page size issue further; thanks for the hint!

The above article mentions: "One of the key findings at the conclusion of the testing was that although 32-bit applications performed just as well on a 64-bit platform as they did on 32-bit operating systems, the memory requirements of running the same 32-bit applications on x64 Windows terminal servers almost doubled, largely due to the increased memory data structures associated with 64-bit architecture. As a result, Microsoft recommends increasing the amount of RAM used in 64-bit terminal servers by 1.5 to 2 times the capacity typically used in 32-bit implementations to accommodate the increased memory demands expected from running 32-bit applications on a 64-bit platform"

I will check whether their servers are indeed under-provisioned.



--
Henrik Høyer, sPeople Aps
Chief Software Architect
(+45) 4029 2092


Re: Reasons for Memory panic interrupt ?

Henrik Høyer
In reply to this post by Andreas Rosenberg
Windows will use 4K pages by default, you need special operations to enable large pages.

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366720(v=vs.85).aspx
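To see which page sizes a process actually gets, a sketch like the following can be used. On Windows it calls GetSystemInfo (dwPageSize) and GetLargePageMinimum; large pages are only used when an allocation passes MEM_LARGE_PAGES to VirtualAlloc and the process holds SeLockMemoryPrivilege, as the page linked above describes. The helper names are hypothetical, and the non-Windows branch (sysconf) is an assumption added so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Page size used for ordinary allocations. */
size_t normal_page_size(void)
{
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return si.dwPageSize;
#else
    long n = sysconf(_SC_PAGESIZE);
    assert(n > 0);
    return (size_t)n;
#endif
}

/* Smallest supported large page; 0 if none (or not queried). */
size_t large_page_minimum(void)
{
#ifdef _WIN32
    return GetLargePageMinimum(); /* 0 if the processor has no large pages */
#else
    return 0; /* not queried in this sketch */
#endif
}
```

On x86 Windows the first call should report 4096 bytes; the large-page minimum is typically 2 MB on x64 hardware.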

Reading about the topic, I stumbled upon:
http://www.7-max.com/
Has anybody tried anything like this with VSE?




Pushing the memory limits for VSE and detected some VM bugs

Andreas Rosenberg
In reply to this post by Henrik Høyer
Hi everybody!

As some of you may know, our app may deal with a lot of data,
so we are interested in getting as much memory as possible.

Some time ago we rebased the load address of the VM from 0x10000000
to 0x02000000. This gives an additional 224 MB.
By the way, anyone can do this using the Rebase tool, which is part of the Microsoft SDK.

http://www.codeproject.com/Articles/1434/Using-the-Rebase-utility-in-project-makefile
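The 224 MB figure follows directly from the two load addresses. A tiny check, assuming the range between the new and the old base is otherwise unused, so the free region above the VM DLL grows by exactly the distance the DLL moved (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* How much the free region above a DLL grows when its load address moves
   down from old_base to new_base, assuming nothing else lives in between. */
uint32_t contiguous_gain(uint32_t old_base, uint32_t new_base)
{
    assert(new_base <= old_base);
    return old_base - new_base;
}
```

0x10000000 - 0x02000000 = 0x0E000000 bytes, which is 224 MB.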

Our modified VM tries to allocate memory according to the available physical memory.

GlobalMemoryStatus (memStatus.dwTotalPhys) reports a maximum of 2 GB
for 32-bit processes, even if the computer has more physical memory.
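GlobalMemoryStatusEx is documented to report the actual physical memory in its 64-bit ullTotalPhys field, so it avoids that cap. The sketch below queries it and then limits the arena request to the largest contiguous block; the capping policy and the arena_request name are hypothetical, not the policy of the modified VM, and the non-Windows branch (sysconf with _SC_PHYS_PAGES) is only there so the sketch compiles elsewhere.

```c
#if !defined(_WIN32) && !defined(_DEFAULT_SOURCE)
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Total physical memory in bytes, or 0 if it cannot be determined. */
uint64_t total_physical_memory(void)
{
#ifdef _WIN32
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof ms;              /* must be set before the call */
    if (!GlobalMemoryStatusEx(&ms))
        return 0;
    return ms.ullTotalPhys;               /* 64-bit field */
#else
    long pages = sysconf(_SC_PHYS_PAGES); /* widely available, not strict POSIX */
    long page = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page <= 0)
        return 0;
    return (uint64_t)pages * (uint64_t)page;
#endif
}

/* Hypothetical policy: ask for physical memory, but never more than the
   largest contiguous block the address space can offer. */
uint64_t arena_request(uint64_t physical, uint64_t max_contiguous)
{
    assert(max_contiguous > 0);
    return physical < max_contiguous ? physical : max_contiguous;
}
```

On a machine with 8 GB of RAM and a largest free block of about 1.6 GB, the request is capped at the block size.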

Some of our customers are still using Windows XP. I noticed that on Windows XP
the largest contiguous memory block that can be allocated is only about 1.2 GB.

I remembered that some years ago I had been able to set up an image with an
arena of ~1.6 GB.

A closer look revealed that our modified VM loads comctl32.dll,
which is a side effect of calling CommandLineToArgvW, exported by
SHELL32.DLL. Unfortunately comctl32.dll has an
unusually low base address: 0x5D090000.
Nearly all other system DLLs are at addresses above 0x70000000.
So comctl32.dll in fact fragments the available address space.
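The effect of one DLL sitting in the middle of the address space can be modelled by computing the largest free gap between occupied ranges. This is a simplified model: only the 0x5D090000 and 0x70000000 addresses come from the post, and the Range type and largest_gap name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base; /* first occupied address */
    uint32_t size; /* occupied bytes */
} Range;

/* Largest free gap inside [lo, hi), given occupied ranges sorted by base. */
uint32_t largest_gap(const Range *used, size_t n, uint32_t lo, uint32_t hi)
{
    uint32_t best = 0, cursor = lo;
    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || used[i].base >= used[i - 1].base);
        if (used[i].base > cursor && used[i].base - cursor > best)
            best = used[i].base - cursor;
        uint32_t end = used[i].base + used[i].size;
        if (end > cursor)
            cursor = end;
    }
    if (hi > cursor && hi - cursor > best)
        best = hi - cursor;
    return best;
}
```

With a hypothetical layout (EXE at 0x00400000, the rebased VM at 0x02000000, comctl32.dll at 0x5D090000, system DLLs from 0x70000000, with made-up module sizes), the largest gap in the user range 0x00010000 to 0x7FFF0000 is 0x5AE90000 bytes (about 1454 MB); without comctl32.dll at that address it is 0x6DE00000 bytes (about 1758 MB). A real process has more occupants, so the figures observed on XP are lower.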

Tests on Win7 showed that Win7 does not have this behavior.

Rebasing comctl32.dll was not an option.

Then I had an idea: allocating some memory at address 0x5D090000 using
VirtualAlloc before calling CommandLineToArgvW should force Windows
to rebase comctl32.dll dynamically. I knew that this causes additional overhead for
the EXE loader, but I don't think this matters in our case.
Of course, one needs to free the space at address 0x5D090000 before
allocating the VM arena.
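The placeholder trick can be sketched for Windows as follows. This is not Andreas's code: the placeholder size is a guess, the function name is hypothetical, and whether CommandLineToArgvW pulls in comctl32.dll depends on the system, as the post describes. On other platforms the function does nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#include <shellapi.h> /* CommandLineToArgvW; link with shell32.lib */
#endif

/* From the post: comctl32.dll's preferred base on the affected XP systems. */
#define COMCTL32_PREFERRED_BASE 0x5D090000u
/* Hypothetical: meant to cover the DLL image; not a measured value. */
#define PLACEHOLDER_SIZE 0x00100000u

/* Parses the command line while a placeholder occupies comctl32's preferred
   base, so the loader has to relocate the DLL if it is loaded as a side
   effect. Returns argc, or -1 on failure. */
int parse_args_with_placeholder(void)
{
#ifdef _WIN32
    int argc = 0;
    /* 1. Block the preferred base (reserve only, nothing committed). */
    void *hole = VirtualAlloc((void *)(uintptr_t)COMCTL32_PREFERRED_BASE,
                              PLACEHOLDER_SIZE, MEM_RESERVE, PAGE_NOACCESS);
    /* 2. The call that loads comctl32.dll as a side effect. */
    LPWSTR *argv = CommandLineToArgvW(GetCommandLineW(), &argc);
    /* 3. Release the placeholder before the VM arena is allocated. */
    if (hole != NULL)
        VirtualFree(hole, 0, MEM_RELEASE);
    if (argv == NULL)
        return -1;
    LocalFree(argv);
    assert(argc >= 0);
    return argc;
#else
    return 0; /* the comctl32.dll placement issue is Windows-specific */
#endif
}
```

The arena allocation would then follow after this function returns, once the placeholder has been released.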

And voilà: allocating memory up to ~1.6 GB on Windows XP was possible again.

Along with this memory research, I took a closer look at the GC routines,
because customers had reported GPFs in out-of-memory situations.

Eventually I found several bugs in the GC routines that may cause a GPF
under very specific circumstances.
The VM has internal lists used by the garbage collector to manage
references from oldSpace to newSpace, and so on.

These lists are arrays residing in oldSpace.
Sometimes these internal lists need to grow, so additional memory in oldSpace
is needed, and a subroutine is called to allocate it.
But the code does not check whether this allocation was successful, so the error status
code is interpreted as the new address of the list, resulting in a GPF.

This GPF can happen each time when an object residing in newSpace is assigned
to a slot of an object residing in oldSpace.
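The failure mode can be illustrated with a simplified generational store barrier. Everything here is hypothetical: the Object layout, the RememberedSet type and the realloc-style grow hook stand in for the VM's internal lists and its oldSpace allocator. The important line is the NULL check after growing the list; the bug described above is the absence of such a check, so an error value ends up being used as the list's address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Object {
    bool in_old_space;
    bool remembered;          /* already in the remembered set? */
    struct Object *slots[4];
} Object;

/* Old-space objects that may point into new space. */
typedef struct {
    Object **items;
    size_t count, capacity;
    /* realloc-style hook: returns NULL on failure and leaves the old block alone */
    void *(*grow)(void *old, size_t bytes);
} RememberedSet;

/* Stand-in for an exhausted oldSpace allocator, to exercise the failure path. */
void *grow_always_fails(void *old, size_t bytes)
{
    (void)old;
    (void)bytes;
    return NULL;
}

static bool remember(RememberedSet *rs, Object *obj)
{
    if (rs->count == rs->capacity) {
        size_t new_cap = rs->capacity ? rs->capacity * 2 : 8;
        Object **grown = rs->grow(rs->items, new_cap * sizeof *grown);
        if (grown == NULL)  /* the check the post says is missing */
            return false;   /* report failure instead of using a bogus address */
        rs->items = grown;
        rs->capacity = new_cap;
    }
    rs->items[rs->count++] = obj;
    obj->remembered = true;
    return true;
}

/* Store barrier: storing a new-space object into an old-space object's slot
   records the old-space object. Returns false if the set could not grow. */
bool store_slot(RememberedSet *rs, Object *target, size_t i, Object *value)
{
    assert(i < sizeof target->slots / sizeof target->slots[0]);
    target->slots[i] = value;
    if (target->in_old_space && !target->remembered &&
        value != NULL && !value->in_old_space)
        return remember(rs, target);
    return true;
}
```

A caller that gets false back can terminate with a meaningful error message instead of running into a GPF.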

A similar GPF can happen during a global compact.

There is not much you can do in these situations except terminate the process.
But at least we can give our customers a meaningful error message.

Regards,
  Andreas
