Crash dump report

Crash dump report

sebastianconcept@gmail.co
Hi guys,

I'm sorry for this news.

We have a web app that is increasingly crashing in production (on Linux).

In development (OS X) we couldn't reproduce it.

We are falling back to the StackVM.

Here is some dump data we could get:




Attachment: dump.zip (849K)

Re: Crash dump report

Stéphane Ducasse
Thanks a lot for sharing that with us.
Did you try the VM with the new semaphores, and without the new semaphore handling?
In the two cases the primitives used look different.
Could you also provide the size of the image?
Is the crash reproducible?
Stef




Re: Crash dump report

sebastianconcept@gmail.co
sure Stef.

> Did you try the VM with the new semaphores?
Not sure. Does the method below answer this question?

> and without the new semaphore handling?
Again, not sure.

> In the two cases the primitives used look different.
What should I look at?

> Could you also provide the size of the image?
It's a 53MB image, but how much RAM it was using at the moment of the crash we don't know (we wanted to know, but we didn't catch it in flagrante).

> Is the crash reproducible?
In development on OS X, no, it doesn't reproduce (same data).

In production, since we started the app with the StackVM, we are having uninterrupted uptime (so far and counting...).


Pharo1.4
Latest update: #14459

VirtualMachine>>maxExternalSemaphores: aSize
    "This method should never be called as a result of normal program
    execution. If it is, however, handle it differently:
    - In development, signal an error to prompt the user to set a bigger size
      at startup immediately.
    - In production, accept the cost of potentially unhandled interrupts,
      but log the action for later review.
    See the comment in maxExternalObjectsSilently: for why this behaviour is
    desirable."
    "Can't find a place where development/production is decided.
    Suggest Smalltalk image inProduction, but use an overridable temp meanwhile."
    | inProduction |
    self maxExternalSemaphores ifNil: [^ 0].
    inProduction := true.
    ^ inProduction
        ifTrue: [
            self maxExternalSemaphoresSilently: aSize.
            self crTrace: 'WARNING: Had to increase size of semaphore signal handling table due to many external objects concurrently in use';
                crTrace: 'You should increase this size at startup using #maxExternalObjectsSilently:';
                crTrace: 'Current table size: ', self maxExternalSemaphores printString]
        ifFalse: [
            "Smalltalk image"
            self error: 'Not enough space for external objects, set a larger size at startup!']









Re: Crash dump report

Stéphane Ducasse

On Nov 12, 2012, at 9:32 PM, Sebastian Sastre wrote:

> sure Stef.
>
>> Did you try the VM with the new semaphores?
> Not sure. Does the method below answer this question?
>
>> and without the new semaphore handling?
> Again, not sure.
>
>> In the two cases the primitives used look different.
> What should I look at?

Sorry, I was unclear. Look at the log: at the end you have the recently invoked primitives.

>> Could you also provide the size of the image?
> It's a 53MB image

So it does not look that large.

> but how much RAM it was using at the moment of the crash we don't know (we wanted to know, but we didn't catch it in flagrante)
>
>> Is the crash reproducible?
> In development on OS X, no, it doesn't reproduce (same data).
>
> In production, since we started the app with the StackVM, we are having uninterrupted uptime (so far and counting…)

We should probably see if Igor/Esteban can have a look.
What you are saying is that it would be more of a problem on Linux.




Re: Crash dump report

Igor Stasenko
Yeah, the dumps are really strange...
It seems like most of the time it's crashing at this point:

0xffc05bd8 M ZnChunkedReadStream>next -613198972: a(n) ZnChunkedReadStream
0xffc05bfc M ZnUTF8Encoder>nextFromStream: -613185216: a(n) ZnUTF8Encoder
0xffc05c30 M [] in ZnStringEntity>readUpToEndFrom: -613186408: a(n) ZnStringEntity
0xffc1333c M BlockClosure>on:do: -613184868: a(n) BlockClosure
0xffc1336c M [] in ZnStringEntity>readUpToEndFrom: -613186408: a(n) ZnStringEntity
0xffc1338c M String class(SequenceableCollection class)>new:streamContents: -679940384: a(n) String class
0xffc133ac M String class(SequenceableCollection class)>streamContents: -679940384: a(n) String class
0xffc133d8 M ZnStringEntity>readUpToEndFrom: -613186408: a(n) ZnStringEntity
0xffc133f4 M ZnStringEntity>readFrom: -613186408: a(n) ZnStringEntity
0xffc13414 M ZnEntity class>readFrom:usingType:andLength: -678925448: a(n) ZnEntity class
0xffc13440 M ZnEntityReader>readFrom:usingType:andLength: -613217204: a(n) ZnEntityReader
0xffc13474 M ZnEntityReader>readEntityFromStream -613217204: a(n) ZnEntityReader
0xffc13490 M ZnEntityReader>readEntity -613217204: a(n) ZnEntityReader
0xffc134ac M ZnResponse(ZnMessage)>readEntityFrom: -614400344: a(n) ZnResponse
0xffc134c8 M ZnResponse>readEntityFrom: -614400344: a(n) ZnResponse
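
That stack looks like the client side of Zinc reading a chunked, UTF-8 encoded response entity into a String. A hypothetical workspace sketch (not from the dump; the URL is only a placeholder for an endpoint that answers with Transfer-Encoding: chunked) that exercises the same code path:

"Fetch a chunked text response so ZnChunkedReadStream>>next and
ZnStringEntity>>readUpToEndFrom: are exercised."
| response |
response := ZnEasy get: 'http://localhost:8080/some-chunked-endpoint'.
Transcript crShow: 'Read ', response contents size printString, ' characters'.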




--
Best regards,
Igor Stasenko.


Re: Crash dump report

Sven Van Caekenberghe-2
These situations are very hard to debug by outsiders; for the community to help, we need a reproducible case that does not depend on internal or private services.

The stack trace is not exceptional as far as I can see: it is trying to read a response until EOF, after a POST.

It is quite strange that it would work with the Stack VM, since the same Smalltalk code is then running, using the same primitives. Maybe the other VM just postpones/changes the way it fails.

Running out of semaphores is a resource management problem, caused directly or indirectly by certain usage patterns. Again, see the first sentence.

I would advise running unit and/or stress tests on the server, or on a Linux VM with a UI on your Mac; for example, hammering the public endpoints in a loop, as in the sketch below.

Bughunts are never easy…
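
A minimal sketch of such a stress loop (not from the thread; the URL, endpoint, and request count are placeholders, and ZnEasy is assumed to be available as part of Zinc):

"Hypothetical stress test: issue many requests against the app and log any
failures, to try to reproduce the crash on the Linux box."
1000 timesRepeat: [
    [ ZnEasy get: 'http://localhost:8080/your-app-endpoint' ]
        on: Error
        do: [ :e | Transcript crShow: 'Request failed: ', e description ] ].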




Re: Crash dump report

EstebanLM
It's the same Smalltalk code with the same primitives, but not the same VM.
I have spotted this problem from time to time, and I have the feeling it is something related to the JIT (that's why my first suggestion to Sebastian was to use the StackVM). Most probably some optimization flags behave slightly differently on different platforms, and then some operations made by the jitter end up trying to access invalid memory segments.

But as I said, it's just a "feeling", since it is really hard to catch and I don't have real proof (besides the "indirect" one: it works fine if we take the JIT off).

Esteban




Re: Crash dump report

sebastianconcept@gmail.co
Can't say anything about the JIT but, for what it's worth, it's been up flawlessly since we started using the StackVM.

