Smalltalk › Cincom › VisualWorks

[vwnc] Absolutely weird PostgreSQLEXDI mistery

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

11 messages Options

Andre Schnoor

[vwnc] Absolutely weird PostgreSQLEXDI mistery

Hi,

a strange Postgres mistery currently drives me mad. I've never
experienced such an absurd anomaly. I lost the entire day trying to
find the cause, to no avail. Hopefully someone can point me to
something helpful.

Literally overnight, without changing any code, queries submitted via
PostgresSQLEXDI that worked yesterday do no longer work now. The only
thing I did was I restarted the postgres server this morning (Linux).
However, it is always started by the same script (via init.d), so the
config shouldn't have changed. The server also works fine with all
other applications.

As the stack below implies, a SocketAccessor waits infinitely for an
answer to arrive from the server after the query has successfully been
submitted (the server log confirms a successful execution, along with
the execution time of the query).

Now for the fun: This happens with /particular/ queries and tables
only (reliably), while most others work fine. Example 1: After
inserting four tuples into a particular table (simple INSERT INTO ...
VALUES ...), it always jams. Example 2: A simple "SELECT * FROM ...
ORDER BY" jams (not a single row returned).

Even more fun:
- The same queries work on a local database without problems!
- The same queries work fine with another tool (pgAdmin3), so it can't
be
a server or hardware (disk) issue.
- The PostgreSQLEXDI parcel is the same as yesterday
- No errors are reported (neither local, nor at the server side)
- I'm on a Mac, but this also happens on a Windows VM (image unchanged
since days)
- No, I didn't change any database related code
- No, I didn't tamper with the VM sources

What the heck? Is someone sniffing packets off my wire? What can I do
to narrow down the issue? It must be something really creepy ...

I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is
appreciated, as I'm really desperate.

Many thanks in advance!

Andre

--

User Interrupt
Semaphore>>wait
SocketAccessor(IOAccessor)>>readWaitWithTimeoutMs:
SpSocket>>waitForReadDataUpToMs:
OSkSocketReadStream>>fillBufferFromSocket
OSkSocketReadStream>>read:into:
OSkSocketReadStream>>readFromBuffer:
OSkSocketReadStream>>next:
ByteValue(FixedSizeValue)>>size:readFrom:for:
ByteValue class(FixedSizeValue class)>>size:readFrom:for:
PostgreSQLMessage class>>readFrom:for:
PostgreSQLMessageReadStream>>nextInTheContextOf:
PostgreSQLMessageReadStream>>next
PostgreSQLQueryResult(PostgreSQLRequestResult)>>readResult
PostgreSQLQueryResult(PostgreSQLRequestResult)>>for:
PostgreSQLQueryResult class(PostgreSQLRequestResult class)>>for:
PostgreSQLQueryResultList>>for:
PostgreSQLQueryResultList class>>for:
PostgreSQLQuery>>readResults
PostgreSQLQuery>>execute
optimized [] in PostgreSQLEXDISession>>executeExternal
BlockClosure>>on:do:
PostgreSQLEXDISession>>executeExternal
optimized [] in ExternalDatabaseSession>>execute
BlockClosure>>ifCurtailed:
PostgreSQLEXDISession(ExternalDatabaseSession)>>execute

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Alan Knight-2

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

When you do this, does the server confirm the number of rows returned as well? If it thinks it's done the query, then it seems like there are only a few options. The bits aren't ever getting to the Smalltalk process, or the wrong bits arrive. The bits get to the Smalltalk process, and are somehow eaten before they get to Smalltalk code. The bits arrive in Smalltalk and are misinterpreted. The last one seems easiest to deal with. So, when the socket hangs, how much data has it already read, and what does it say? You might also try the queries using a current Postgresql driver and/or a current image/VM, and if that works then you at least know the problem is fixed, and it's just a matter of finding out where.

At 05:33 PM 2009-11-26, Andre Schnoor wrote:

Hi,

a strange Postgres mistery currently drives me mad. I've never
experienced such an absurd anomaly. I lost the entire day trying to
find the cause, to no avail. Hopefully someone can point me to
something helpful.

Literally overnight, without changing any code, queries submitted via
PostgresSQLEXDI that worked yesterday do no longer work now. The only
thing I did was I restarted the postgres server this morning (Linux).
However, it is always started by the same script (via init.d), so the
config shouldn't have changed. The server also works fine with all
other applications.

As the stack below implies, a SocketAccessor waits infinitely for an
answer to arrive from the server after the query has successfully been
submitted (the server log confirms a successful execution, along with
the execution time of the query).

Now for the fun: This happens with /particular/ queries and tables
only (reliably), while most others work fine. Example 1: After
inserting four tuples into a particular table (simple INSERT INTO ...
VALUES ...), it always jams. Example 2: A simple "SELECT * FROM ...
ORDER BY" jams (not a single row returned).

Even more fun:
- The same queries work on a local database without problems!
- The same queries work fine with another tool (pgAdmin3), so it can't
be
a server or hardware (disk) issue.
- The PostgreSQLEXDI parcel is the same as yesterday
- No errors are reported (neither local, nor at the server side)
- I'm on a Mac, but this also happens on a Windows VM (image unchanged
since days)
- No, I didn't change any database related code
- No, I didn't tamper with the VM sources

What the heck? Is someone sniffing packets off my wire? What can I do
to narrow down the issue? It must be something really creepy ...

I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is
appreciated, as I'm really desperate.

Many thanks in advance!

Andre

--

User Interrupt
Semaphore>>wait
SocketAccessor(IOAccessor)>>readWaitWithTimeoutMs:
SpSocket>>waitForReadDataUpToMs:
OSkSocketReadStream>>fillBufferFromSocket
OSkSocketReadStream>>read:into:
OSkSocketReadStream>>readFromBuffer:
OSkSocketReadStream>>next:
ByteValue(FixedSizeValue)>>size:readFrom:for:
ByteValue class(FixedSizeValue class)>>size:readFrom:for:
PostgreSQLMessage class>>readFrom:for:
PostgreSQLMessageReadStream>>nextInTheContextOf:
PostgreSQLMessageReadStream>>next
PostgreSQLQueryResult(PostgreSQLRequestResult)>>readResult
PostgreSQLQueryResult(PostgreSQLRequestResult)>>for:
PostgreSQLQueryResult class(PostgreSQLRequestResult class)>>for:
PostgreSQLQueryResultList>>for:
PostgreSQLQueryResultList class>>for:
PostgreSQLQuery>>readResults
PostgreSQLQuery>>execute
optimized [] in PostgreSQLEXDISession>>executeExternal
BlockClosure>>on:do:
PostgreSQLEXDISession>>executeExternal
optimized [] in ExternalDatabaseSession>>execute
BlockClosure>>ifCurtailed:
PostgreSQLEXDISession(ExternalDatabaseSession)>>execute

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Alan Knight [|], Engineering Manager, Cincom Smalltalk

[hidden email]

http://www.cincom.com/smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Andres Valloud-6

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Andre, I would not be surprised if sockets misbehaved every so often with a 7.5 configuration. I'd suggest using at least a 7.6c VM, preferably with a 7.7 image (or at least using the support resolution that addresses potential socket issues when multiple Smalltalk processes use the same socket --- the more thorough fix for this problem is included in 7.7). The 7.7 VM/image socket implementation is significantly better than that of 7.6 and earlier releases, and especially so on Windows.

Andres.

From: [hidden email] [mailto:[hidden email]] On Behalf Of Alan Knight
Sent: Thursday, November 26, 2009 3:56 PM
To: Andre Schnoor; VWNC NC
Subject: Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Hi,

a strange Postgres mistery currently drives me mad. I've never
experienced such an absurd anomaly. I lost the entire day trying to
find the cause, to no avail. Hopefully someone can point me to
something helpful.

Literally overnight, without changing any code, queries submitted via
PostgresSQLEXDI that worked yesterday do no longer work now. The only
thing I did was I restarted the postgres server this morning (Linux).
However, it is always started by the same script (via init.d), so the
config shouldn't have changed. The server also works fine with all
other applications.

As the stack below implies, a SocketAccessor waits infinitely for an
answer to arrive from the server after the query has successfully been
submitted (the server log confirms a successful execution, along with
the execution time of the query).

Now for the fun: This happens with /particular/ queries and tables
only (reliably), while most others work fine. Example 1: After
inserting four tuples into a particular table (simple INSERT INTO ...
VALUES ...), it always jams. Example 2: A simple "SELECT * FROM ...
ORDER BY" jams (not a single row returned).

Even more fun:
- The same queries work on a local database without problems!
- The same queries work fine with another tool (pgAdmin3), so it can't
be
a server or hardware (disk) issue.
- The PostgreSQLEXDI parcel is the same as yesterday
- No errors are reported (neither local, nor at the server side)
- I'm on a Mac, but this also happens on a Windows VM (image unchanged
since days)
- No, I didn't change any database related code
- No, I didn't tamper with the VM sources

What the heck? Is someone sniffing packets off my wire? What can I do
to narrow down the issue? It must be something really creepy ...

I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is
appreciated, as I'm really desperate.

Many thanks in advance!

Andre

--

User Interrupt
Semaphore>>wait
SocketAccessor(IOAccessor)>>readWaitWithTimeoutMs:
SpSocket>>waitForReadDataUpToMs:
OSkSocketReadStream>>fillBufferFromSocket
OSkSocketReadStream>>read:into:
OSkSocketReadStream>>readFromBuffer:
OSkSocketReadStream>>next:
ByteValue(FixedSizeValue)>>size:readFrom:for:
ByteValue class(FixedSizeValue class)>>size:readFrom:for:
PostgreSQLMessage class>>readFrom:for:
PostgreSQLMessageReadStream>>nextInTheContextOf:
PostgreSQLMessageReadStream>>next
PostgreSQLQueryResult(PostgreSQLRequestResult)>>readResult
PostgreSQLQueryResult(PostgreSQLRequestResult)>>for:
PostgreSQLQueryResult class(PostgreSQLRequestResult class)>>for:
PostgreSQLQueryResultList>>for:
PostgreSQLQueryResultList class>>for:
PostgreSQLQuery>>readResults
PostgreSQLQuery>>execute
optimized [] in PostgreSQLEXDISession>>executeExternal
BlockClosure>>on:do:
PostgreSQLEXDISession>>executeExternal
optimized [] in ExternalDatabaseSession>>execute
BlockClosure>>ifCurtailed:
PostgreSQLEXDISession(ExternalDatabaseSession)>>execute

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Alan Knight [|], Engineering Manager, Cincom Smalltalk

[hidden email]

http://www.cincom.com/smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Alan Knight-2

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

I doubt that that would lead to it failing consistently on the same tables.

At 09:13 PM 2009-11-26, Valloud, Andres wrote:

Content-class: urn:content-classes:message
Content-Type: multipart/alternative;
boundary="----_=_NextPart_001_01CA6F07.40CFD357"

Andre, I would not be surprised if sockets misbehaved every so often with a 7.5 configuration. I'd suggest using at least a 7.6c VM, preferably with a 7.7 image (or at least using the support resolution that addresses potential socket issues when multiple Smalltalk processes use the same socket --- the more thorough fix for this problem is included in 7.7). The 7.7 VM/image socket implementation is significantly better than that of 7.6 and earlier releases, and especially so on Windows.

Andres.

From: [hidden email] [[hidden email]] On Behalf Of Alan Knight
Sent: Thursday, November 26, 2009 3:56 PM
To: Andre Schnoor; VWNC NC
Subject: Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

When you do this, does the server confirm the number of rows returned as well? If it thinks it's done the query, then it seems like there are only a few options. The bits aren't ever getting to the Smalltalk process, or the wrong bits arrive. The bits get to the Smalltalk process, and are somehow eaten before they get to Smalltalk code. The bits arrive in Smalltalk and are misinterpreted. The last one seems easiest to deal with. So, when the socket hangs, how much data has it already read, and what does it say? You might also try the queries using a current Postgresql driver and/or a current image/VM, and if that works then you at least know the problem is fixed, and it's just a matter of finding out where.

At 05:33 PM 2009-11-26, Andre Schnoor wrote:

Hi,

a strange Postgres mistery currently drives me mad. I've never
experienced such an absurd anomaly. I lost the entire day trying to
find the cause, to no avail. Hopefully someone can point me to
something helpful.

Literally overnight, without changing any code, queries submitted via
PostgresSQLEXDI that worked yesterday do no longer work now. The only
thing I did was I restarted the postgres server this morning (Linux).
However, it is always started by the same script (via init.d), so the
config shouldn't have changed. The server also works fine with all
other applications.

As the stack below implies, a SocketAccessor waits infinitely for an
answer to arrive from the server after the query has successfully been
submitted (the server log confirms a successful execution, along with
the execution time of the query).

Now for the fun: This happens with /particular/ queries and tables
only (reliably), while most others work fine. Example 1: After
inserting four tuples into a particular table (simple INSERT INTO ...
VALUES ...), it always jams. Example 2: A simple "SELECT * FROM ...
ORDER BY" jams (not a single row returned).

Even more fun:
- The same queries work on a local database without problems!
- The same queries work fine with another tool (pgAdmin3), so it can't
be
a server or hardware (disk) issue.
- The PostgreSQLEXDI parcel is the same as yesterday
- No errors are reported (neither local, nor at the server side)
- I'm on a Mac, but this also happens on a Windows VM (image unchanged
since days)
- No, I didn't change any database related code
- No, I didn't tamper with the VM sources

What the heck? Is someone sniffing packets off my wire? What can I do
to narrow down the issue? It must be something really creepy ...

I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is
appreciated, as I'm really desperate.

Many thanks in advance!

Andre

--

User Interrupt
Semaphore>>wait
SocketAccessor(IOAccessor)>>readWaitWithTimeoutMs:
SpSocket>>waitForReadDataUpToMs:
OSkSocketReadStream>>fillBufferFromSocket
OSkSocketReadStream>>read:into:
OSkSocketReadStream>>readFromBuffer:
OSkSocketReadStream>>next:
ByteValue(FixedSizeValue)>>size:readFrom:for:
ByteValue class(FixedSizeValue class)>>size:readFrom:for:
PostgreSQLMessage class>>readFrom:for:
PostgreSQLMessageReadStream>>nextInTheContextOf:
PostgreSQLMessageReadStream>>next
PostgreSQLQueryResult(PostgreSQLRequestResult)>>readResult
PostgreSQLQueryResult(PostgreSQLRequestResult)>>for:
PostgreSQLQueryResult class(PostgreSQLRequestResult class)>>for:
PostgreSQLQueryResultList>>for:
PostgreSQLQueryResultList class>>for:
PostgreSQLQuery>>readResults
PostgreSQLQuery>>execute
optimized [] in PostgreSQLEXDISession>>executeExternal
BlockClosure>>on:do:
PostgreSQLEXDISession>>executeExternal
optimized [] in ExternalDatabaseSession>>execute
BlockClosure>>ifCurtailed:
PostgreSQLEXDISession(ExternalDatabaseSession)>>execute

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Alan Knight [|], Engineering Manager, Cincom Smalltalk

[hidden email]

http://www.cincom.com/smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

David Caster

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Um, I hate to stick my nose in where it might not be wanted, but did you rebuild your image Andre, that is, did you start with a fresh image and reconstruct your application by loading the code and trying your tests?

David L. Caster

On Nov 26, 2009, at 6:28 PM, Alan Knight wrote:

I doubt that that would lead to it failing consistently on the same tables.

At 09:13 PM 2009-11-26, Valloud, Andres wrote:

Content-class: <a href="urn:content-classes:message">urn:content-classes:message
Content-Type: multipart/alternative;
boundary="----_=_NextPart_001_01CA6F07.40CFD357"

Andre, I would not be surprised if sockets misbehaved every so often with a 7.5 configuration. I'd suggest using at least a 7.6c VM, preferably with a 7.7 image (or at least using the support resolution that addresses potential socket issues when multiple Smalltalk processes use the same socket --- the more thorough fix for this problem is included in 7.7). The 7.7 VM/image socket implementation is significantly better than that of 7.6 and earlier releases, and especially so on Windows.

Andres.

From: [hidden email] [[hidden email]] On Behalf Of Alan Knight
Sent: Thursday, November 26, 2009 3:56 PM
To: Andre Schnoor; VWNC NC
Subject: Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

When you do this, does the server confirm the number of rows returned as well? If it thinks it's done the query, then it seems like there are only a few options. The bits aren't ever getting to the Smalltalk process, or the wrong bits arrive. The bits get to the Smalltalk process, and are somehow eaten before they get to Smalltalk code. The bits arrive in Smalltalk and are misinterpreted. The last one seems easiest to deal with. So, when the socket hangs, how much data has it already read, and what does it say? You might also try the queries using a current Postgresql driver and/or a current image/VM, and if that works then you at least know the problem is fixed, and it's just a matter of finding out where.

At 05:33 PM 2009-11-26, Andre Schnoor wrote:

Hi,

a strange Postgres mistery currently drives me mad. I've never
experienced such an absurd anomaly. I lost the entire day trying to
find the cause, to no avail. Hopefully someone can point me to
something helpful.

Literally overnight, without changing any code, queries submitted via
PostgresSQLEXDI that worked yesterday do no longer work now. The only
thing I did was I restarted the postgres server this morning (Linux).
However, it is always started by the same script (via init.d), so the
config shouldn't have changed. The server also works fine with all
other applications.

As the stack below implies, a SocketAccessor waits infinitely for an
answer to arrive from the server after the query has successfully been
submitted (the server log confirms a successful execution, along with
the execution time of the query).

Now for the fun: This happens with /particular/ queries and tables
only (reliably), while most others work fine. Example 1: After
inserting four tuples into a particular table (simple INSERT INTO ...
VALUES ...), it always jams. Example 2: A simple "SELECT * FROM ...
ORDER BY" jams (not a single row returned).

Even more fun:
- The same queries work on a local database without problems!
- The same queries work fine with another tool (pgAdmin3), so it can't
be
a server or hardware (disk) issue.
- The PostgreSQLEXDI parcel is the same as yesterday
- No errors are reported (neither local, nor at the server side)
- I'm on a Mac, but this also happens on a Windows VM (image unchanged
since days)
- No, I didn't change any database related code
- No, I didn't tamper with the VM sources

What the heck? Is someone sniffing packets off my wire? What can I do
to narrow down the issue? It must be something really creepy ...

I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is
appreciated, as I'm really desperate.

Many thanks in advance!

Andre

--

User Interrupt
Semaphore>>wait
SocketAccessor(IOAccessor)>>readWaitWithTimeoutMs:
SpSocket>>waitForReadDataUpToMs:
OSkSocketReadStream>>fillBufferFromSocket
OSkSocketReadStream>>read:into:
OSkSocketReadStream>>readFromBuffer:
OSkSocketReadStream>>next:
ByteValue(FixedSizeValue)>>size:readFrom:for:
ByteValue class(FixedSizeValue class)>>size:readFrom:for:
PostgreSQLMessage class>>readFrom:for:
PostgreSQLMessageReadStream>>nextInTheContextOf:
PostgreSQLMessageReadStream>>next
PostgreSQLQueryResult(PostgreSQLRequestResult)>>readResult
PostgreSQLQueryResult(PostgreSQLRequestResult)>>for:
PostgreSQLQueryResult class(PostgreSQLRequestResult class)>>for:
PostgreSQLQueryResultList>>for:
PostgreSQLQueryResultList class>>for:
PostgreSQLQuery>>readResults
PostgreSQLQuery>>execute
optimized [] in PostgreSQLEXDISession>>executeExternal
BlockClosure>>on:do:
PostgreSQLEXDISession>>executeExternal
optimized [] in ExternalDatabaseSession>>execute
BlockClosure>>ifCurtailed:
PostgreSQLEXDISession(ExternalDatabaseSession)>>execute

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--

Alan Knight [|], Engineering Manager, Cincom Smalltalk

[hidden email]

[hidden email]

http://www.cincom.com/smalltalk
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Andres Valloud-6

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

In reply to this post by Alan Knight-2

Yes, the same table failure does seem odd. I'd be assuming, however, that whatever it is that is causing the problem is not exposing the socket bugs we fixed recently. For example, let's assume that some table index got damaged and now the db has to do linear search over the whole table. This takes time, and because it does, some socket misbehavior that would otherwise not be exercised is exposed. My concern is that, if something like that is going on, then proper socket behavior might allow proper Store behavior, even if it is slow, which would be better than outright failure to work. Maybe it's a far fetched assumption, but the possibility concerns me in light of the socket problems that existed in 7.5 and, thus, socket misbehavior is something I'd like to rule out if I was debugging the problem.

From: Alan Knight [mailto:[hidden email]]
Sent: Thursday, November 26, 2009 6:28 PM
To: Valloud, Andres; VWNC NC
Subject: Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

I doubt that that would lead to it failing consistently on the same tables.

At 09:13 PM 2009-11-26, Valloud, Andres wrote:

Content-class: urn:content-classes:message
Content-Type: multipart/alternative;
boundary="----_=_NextPart_001_01CA6F07.40CFD357"

Andre, I would not be surprised if sockets misbehaved every so often with a 7.5 configuration. I'd suggest using at least a 7.6c VM, preferably with a 7.7 image (or at least using the support resolution that addresses potential socket issues when multiple Smalltalk processes use the same socket --- the more thorough fix for this problem is included in 7.7). The 7.7 VM/image socket implementation is significantly better than that of 7.6 and earlier releases, and especially so on Windows.

Andres.

From: [hidden email] [[hidden email]] On Behalf Of Alan Knight
Sent: Thursday, November 26, 2009 3:56 PM
To: Andre Schnoor; VWNC NC
Subject: Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

When you do this, does the server confirm the number of rows returned as well? If it thinks it's done the query, then it seems like there are only a few options. The bits aren't ever getting to the Smalltalk process, or the wrong bits arrive. The bits get to the Smalltalk process, and are somehow eaten before they get to Smalltalk code. The bits arrive in Smalltalk and are misinterpreted. The last one seems easiest to deal with. So, when the socket hangs, how much data has it already read, and what does it say? You might also try the queries using a current Postgresql driver and/or a current image/VM, and if that works then you at least know the problem is fixed, and it's just a matter of finding out where.

At 05:33 PM 2009-11-26, Andre Schnoor wrote:

Hi,

a strange Postgres mistery currently drives me mad. I've never
experienced such an absurd anomaly. I lost the entire day trying to
find the cause, to no avail. Hopefully someone can point me to
something helpful.

Literally overnight, without changing any code, queries submitted via
PostgresSQLEXDI that worked yesterday do no longer work now. The only
thing I did was I restarted the postgres server this morning (Linux).
However, it is always started by the same script (via init.d), so the
config shouldn't have changed. The server also works fine with all
other applications.

As the stack below implies, a SocketAccessor waits infinitely for an
answer to arrive from the server after the query has successfully been
submitted (the server log confirms a successful execution, along with
the execution time of the query).

Now for the fun: This happens with /particular/ queries and tables
only (reliably), while most others work fine. Example 1: After
inserting four tuples into a particular table (simple INSERT INTO ...
VALUES ...), it always jams. Example 2: A simple "SELECT * FROM ...
ORDER BY" jams (not a single row returned).

Even more fun:
- The same queries work on a local database without problems!
- The same queries work fine with another tool (pgAdmin3), so it can't
be
a server or hardware (disk) issue.
- The PostgreSQLEXDI parcel is the same as yesterday
- No errors are reported (neither local, nor at the server side)
- I'm on a Mac, but this also happens on a Windows VM (image unchanged
since days)
- No, I didn't change any database related code
- No, I didn't tamper with the VM sources

What the heck? Is someone sniffing packets off my wire? What can I do
to narrow down the issue? It must be something really creepy ...

I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is
appreciated, as I'm really desperate.

Many thanks in advance!

Andre

--

User Interrupt
Semaphore>>wait
SocketAccessor(IOAccessor)>>readWaitWithTimeoutMs:
SpSocket>>waitForReadDataUpToMs:
OSkSocketReadStream>>fillBufferFromSocket
OSkSocketReadStream>>read:into:
OSkSocketReadStream>>readFromBuffer:
OSkSocketReadStream>>next:
ByteValue(FixedSizeValue)>>size:readFrom:for:
ByteValue class(FixedSizeValue class)>>size:readFrom:for:
PostgreSQLMessage class>>readFrom:for:
PostgreSQLMessageReadStream>>nextInTheContextOf:
PostgreSQLMessageReadStream>>next
PostgreSQLQueryResult(PostgreSQLRequestResult)>>readResult
PostgreSQLQueryResult(PostgreSQLRequestResult)>>for:
PostgreSQLQueryResult class(PostgreSQLRequestResult class)>>for:
PostgreSQLQueryResultList>>for:
PostgreSQLQueryResultList class>>for:
PostgreSQLQuery>>readResults
PostgreSQLQuery>>execute
optimized [] in PostgreSQLEXDISession>>executeExternal
BlockClosure>>on:do:
PostgreSQLEXDISession>>executeExternal
optimized [] in ExternalDatabaseSession>>execute
BlockClosure>>ifCurtailed:
PostgreSQLEXDISession(ExternalDatabaseSession)>>execute

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Alan Knight [|], Engineering Manager, Cincom Smalltalk

[hidden email]

http://www.cincom.com/smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Yanni Chiu

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

In reply to this post by Andre Schnoor

Andre Schnoor wrote:
> What the heck? Is someone sniffing packets off my wire? What can I do
> to narrow down the issue? It must be something really creepy ...
>
> I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is
> appreciated, as I'm really desperate.

Things to try:

1. Increase the debug level of your postgres server. I think it can be
done with an interrupt signal, without restarting the server. You'll
then know for sure whether the server thinks that it has processed the
SQL, and sent the result, by looking a the server log.

2. Use a packet sniffer to see what bytes have been sent from the
postgres server to the smalltalk VM. It helps to know the postgres
frontend/backend protocol, but it may give some clue.

3. Does your app. have multiple threads/processes that read off the same
socket - i.e. more than one Process is using the same database
connection. If so, then it could be that one thread has consumed some
acknowledgment bytes that are part of the frontend/backend protocol.

4. Since this started with a server restart, are you sure the hardware
is okay. Maybe there's bad RAM or bad disk blocks.

Good luck.

--
Yanni
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Andre Schnoor

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Thanks everyone for your suggestions. I much appreciate your help.

It looks this is a server-side problem, for the most part at least.
After a full vacuum and reindex of all tables, the issue became less
penetrant and eventually went away (?). Hm. I don't like issues that
go away without knowing why. The thing I still wonder about is why the
same queries worked fine on other clients. Actually this excludes a
server-side problem.

One thing I forgot to mention is that "ps -ef" showed the postmaster
was running since 2011(!), which was the actual reason to restart it
in the first place. This clock anomaly might have had some influence
on postgres meta data. Otherwise hardware seems to be ok (SMART),
although it's up for five years now. SMART shows many seek errors and
hardware ECC events, but no failures or sector relocations yet, so at
least data integrity is ok.

I also observed that on an already established and used connection,
jams occured less likely and rather randomly. I don't now if its
normal that the postmaster spawned multiple workers in idle state,
listing the same client IP. Actually it should reuse them?

@Yanni:

> 1. Increase the debug level of your postgres server.

I did that. It showed the query statements and execution times, but no
indication whether the answer was actually sent.

> 2. Use a packet sniffer

Very good idea. If it reoccurs, I'll try that.

> 3. ... multiple threads/processes that read off the same socket ...

In theory yes, however I use a RecursionLock to prevent this.

> For example, let's assume that some table index got damaged and now
> the db has to do linear search over the whole table. This takes
> time, and because it does, some socket misbehavior that would
> otherwise not be exercised is exposed.

@Andres: This would support the assumption of a combined server/client
issue. Both the connection and the server were very slow yesterday.

Unfortunately, I can't move to 7.7 yet, as this is way to expensive at
this time. This would hold up my business for weeks, if not months. I
would rather copy selected code only in both VM sources and image.

> ... did you rebuild your image Andre ... ?

@David: Sure, multiple times.

Thanks again,
Andre

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Jan Weerts

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Hi Andre!

Andre Schnoor wrote:
> It looks this is a server-side problem, for the most part at least.
> After a full vacuum and reindex of all tables, the issue became
> less penetrant and eventually went away (?). Hm. I don't like
> issues that go away without knowing why. The thing I still wonder
> about is why the same queries worked fine on other clients.
> Actually this excludes a server-side problem.

The already given suggestion to sniff the traffic with something like
wireshark might help. Comparing the traffic for the same request from
the failing connection to a working connection might give a clue, if
the server sends different responses.

> One thing I forgot to mention is that "ps -ef" showed the
> postmaster was running since 2011(!), which was the actual reason
> to restart it in the first place. This clock anomaly might have had
> some influence on postgres meta data. Otherwise hardware seems to
> be ok (SMART), although it's up for five years now. SMART shows
> many seek errors and hardware ECC events, but no failures or sector
> relocations yet, so at least data integrity is ok.

Well having these events is (IMHO) a sign of looming trouble, even if
no errors were recorded yet.

Did you upgrade the postgres software? Since you mentioned vacuuming
and reindexing: at least vacuuming should be done automatically by a
version 8 or above server (see autovacuum). If you are still using v7
or earlier server my advice is "upgrade asap". IIRC postgres offers
also some way to check your database, but i forgot about the details.

> I also observed that on an already established and used connection,
> jams occured less likely and rather randomly. I don't now if its
> normal that the postmaster spawned multiple workers in idle state,
> listing the same client IP. Actually it should reuse them?

Each user connected via store has an idle process running, my "ps"
output shows the idle connections and the user using it (this is
postgres v8.1).

Regards
Jan
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Alan Knight-2

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

In reply to this post by Andre Schnoor

One possible reason for a difference between behaviour from Smalltalk and other clients is that the Postgresql client is using an older protocol which is deprecated. It continues to work, but it's possible that there are edge cases that expose problems that don't show up for things using the newer protocol.

At 04:55 AM 2009-11-27, Andre Schnoor wrote:

Thanks everyone for your suggestions. I much appreciate your help.

It looks this is a server-side problem, for the most part at least.
After a full vacuum and reindex of all tables, the issue became less
penetrant and eventually went away (?). Hm. I don't like issues that
go away without knowing why. The thing I still wonder about is why the
same queries worked fine on other clients. Actually this excludes a
server-side problem.

One thing I forgot to mention is that "ps -ef" showed the postmaster
was running since 2011(!), which was the actual reason to restart it
in the first place. This clock anomaly might have had some influence
on postgres meta data. Otherwise hardware seems to be ok (SMART),
although it's up for five years now. SMART shows many seek errors and
hardware ECC events, but no failures or sector relocations yet, so at
least data integrity is ok.

I also observed that on an already established and used connection,
jams occured less likely and rather randomly. I don't now if its
normal that the postmaster spawned multiple workers in idle state,
listing the same client IP. Actually it should reuse them?

@Yanni:

> 1. Increase the debug level of your postgres server.

I did that. It showed the query statements and execution times, but no
indication whether the answer was actually sent.

> 2. Use a packet sniffer

Very good idea. If it reoccurs, I'll try that.

> 3. ... multiple threads/processes that read off the same socket ...

In theory yes, however I use a RecursionLock to prevent this.

> For example, let's assume that some table index got damaged and now
> the db has to do linear search over the whole table. This takes
> time, and because it does, some socket misbehavior that would
> otherwise not be exercised is exposed.

@Andres: This would support the assumption of a combined server/client
issue. Both the connection and the server were very slow yesterday.

Unfortunately, I can't move to 7.7 yet, as this is way to expensive at
this time. This would hold up my business for weeks, if not months. I
would rather copy selected code only in both VM sources and image.

> ... did you rebuild your image Andre ... ?

@David: Sure, multiple times.

Thanks again,
Andre

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Alan Knight [|], Engineering Manager, Cincom Smalltalk

[hidden email]

http://www.cincom.com/smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Bruce Badger

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Old, yes. Deprecated?

http://www.postgresql.org/docs/8.4/static/protocol.html

Does not seem to be. IIRC the focus of the newer versions of the
protocol was performance, not robustness. Nonetheless it would be
nice to move up the newer protocol version.

Any sponsors? :-)

All the best,
Bruce

2009/11/27 Alan Knight <[hidden email]>:

> One possible reason for a difference between behaviour from Smalltalk and
> other clients is that the Postgresql client is using an older protocol which
> is deprecated. It continues to work, but it's possible that there are edge
> cases that expose problems that don't show up for things using the newer
> protocol.
>
> At 04:55 AM 2009-11-27, Andre Schnoor wrote:
>
> Thanks everyone for your suggestions. I much appreciate your help.
>
> It looks this is a server-side problem, for the most part at least.
> After a full vacuum and reindex of all tables, the issue became less
> penetrant and eventually went away (?). Hm. I don't like issues that
> go away without knowing why. The thing I still wonder about is why the
> same queries worked fine on other clients. Actually this excludes a
> server-side problem.
>
> One thing I forgot to mention is that "ps -ef" showed the postmaster
> was running since 2011(!), which was the actual reason to restart it
> in the first place. This clock anomaly might have had some influence
> on postgres meta data. Otherwise hardware seems to be ok (SMART),
> although it's up for five years now. SMART shows many seek errors and
> hardware ECC events, but no failures or sector relocations yet, so at
> least data integrity is ok.
>
> I also observed that on an already established and used connection,
> jams occured less likely and rather randomly. I don't now if its
> normal that the postmaster spawned multiple workers in idle state,
> listing the same client IP. Actually it should reuse them?
>
>
> @Yanni:
>
>> 1. Increase the debug level of your postgres server.
>
> I did that. It showed the query statements and execution times, but no
> indication whether the answer was actually sent.
>
>> 2. Use a packet sniffer
>
> Very good idea. If it reoccurs, I'll try that.
>
>> 3. ... multiple threads/processes that read off the same socket ...
>
> In theory yes, however I use a RecursionLock to prevent this.
>
>> For example, let's assume that some table index got damaged and now
>> the db has to do linear search over the whole table. This takes
>> time, and because it does, some socket misbehavior that would
>> otherwise not be exercised is exposed.
>
> @Andres: This would support the assumption of a combined server/client
> issue. Both the connection and the server were very slow yesterday.
>
> Unfortunately, I can't move to 7.7 yet, as this is way to expensive at
> this time. This would hold up my business for weeks, if not months. I
> would rather copy selected code only in both VM sources and image.
>
>> ... did you rebuild your image Andre ... ?
>
> @David: Sure, multiple times.
>
> Thanks again,
> Andre
>
>
>
>
> _______________________________________________
> vwnc mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
>
> --
> Alan Knight [|], Engineering Manager, Cincom Smalltalk
> [hidden email]
> [hidden email]
> http://www.cincom.com/smalltalk
> _______________________________________________
> vwnc mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
>
>

--
Make the most of your skills - with OpenSkills
http://www.openskills.org/

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc