[vwnc] Absolutely weird PostgreSQLEXDI mistery

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[vwnc] Absolutely weird PostgreSQLEXDI mistery

Andre Schnoor
Hi,

a strange Postgres mistery currently drives me mad. I've never  
experienced such an absurd anomaly. I lost the entire day trying to  
find the cause, to no avail. Hopefully someone can point me to  
something helpful.

Literally overnight, without changing any code, queries submitted via  
PostgresSQLEXDI that worked yesterday do no longer work now. The only  
thing I did was I restarted the postgres server this morning (Linux).  
However, it is always started by the same script (via init.d), so the  
config shouldn't have changed. The server also works fine with all  
other applications.

As the stack below implies, a SocketAccessor waits infinitely for an  
answer to arrive from the server after the query has successfully been  
submitted (the server log confirms a successful execution, along with  
the execution time of the query).

Now for the fun: This happens with /particular/ queries and tables  
only (reliably), while most others work fine. Example 1: After  
inserting four tuples into a particular table (simple INSERT INTO ...  
VALUES ...), it always jams. Example 2: A simple "SELECT * FROM ...  
ORDER BY" jams (not a single row returned).

Even more fun:
- The same queries work on a local database without problems!
- The same queries work fine with another tool (pgAdmin3), so it can't  
be
   a server or hardware (disk) issue.
- The PostgreSQLEXDI parcel is the same as yesterday
- No errors are reported (neither local, nor at the server side)
- I'm on a Mac, but this also happens on a Windows VM (image unchanged  
since days)
- No, I didn't change any database related code
- No, I didn't tamper with the VM sources

What the heck? Is someone sniffing packets off my wire? What can I do  
to narrow down the issue? It must be something really creepy ...

I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is  
appreciated, as I'm really desperate.

Many thanks in advance!

Andre


--

User Interrupt
Semaphore>>wait
SocketAccessor(IOAccessor)>>readWaitWithTimeoutMs:
SpSocket>>waitForReadDataUpToMs:
OSkSocketReadStream>>fillBufferFromSocket
OSkSocketReadStream>>read:into:
OSkSocketReadStream>>readFromBuffer:
OSkSocketReadStream>>next:
ByteValue(FixedSizeValue)>>size:readFrom:for:
ByteValue class(FixedSizeValue class)>>size:readFrom:for:
PostgreSQLMessage class>>readFrom:for:
PostgreSQLMessageReadStream>>nextInTheContextOf:
PostgreSQLMessageReadStream>>next
PostgreSQLQueryResult(PostgreSQLRequestResult)>>readResult
PostgreSQLQueryResult(PostgreSQLRequestResult)>>for:
PostgreSQLQueryResult class(PostgreSQLRequestResult class)>>for:
PostgreSQLQueryResultList>>for:
PostgreSQLQueryResultList class>>for:
PostgreSQLQuery>>readResults
PostgreSQLQuery>>execute
optimized [] in PostgreSQLEXDISession>>executeExternal
BlockClosure>>on:do:
PostgreSQLEXDISession>>executeExternal
optimized [] in ExternalDatabaseSession>>execute
BlockClosure>>ifCurtailed:
PostgreSQLEXDISession(ExternalDatabaseSession)>>execute

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Alan Knight-2
When you do this, does the server confirm the number of rows returned as well? If it thinks it's done the query, then it seems like there are only a few options. The bits aren't ever getting to the Smalltalk process, or the wrong bits arrive. The bits get to the Smalltalk process, and are somehow eaten before they get to Smalltalk code. The bits arrive in Smalltalk and are misinterpreted. The last one seems easiest to deal with. So, when the socket hangs, how much data has it already read, and what does it say? You might also try the queries using a current Postgresql driver and/or a current image/VM, and if that works then you at least know the problem is fixed, and it's just a matter of finding out where.

At 05:33 PM 2009-11-26, Andre Schnoor wrote:
Hi,

a strange Postgres mistery currently drives me mad. I've never 
experienced such an absurd anomaly. I lost the entire day trying to 
find the cause, to no avail. Hopefully someone can point me to 
something helpful.

Literally overnight, without changing any code, queries submitted via 
PostgresSQLEXDI that worked yesterday do no longer work now. The only 
thing I did was I restarted the postgres server this morning (Linux). 
However, it is always started by the same script (via init.d), so the 
config shouldn't have changed. The server also works fine with all 
other applications.

As the stack below implies, a SocketAccessor waits infinitely for an 
answer to arrive from the server after the query has successfully been 
submitted (the server log confirms a successful execution, along with 
the execution time of the query).

Now for the fun: This happens with /particular/ queries and tables 
only (reliably), while most others work fine. Example 1: After 
inserting four tuples into a particular table (simple INSERT INTO ... 
VALUES ...), it always jams. Example 2: A simple "SELECT * FROM ... 
ORDER BY" jams (not a single row returned).

Even more fun:
- The same queries work on a local database without problems!
- The same queries work fine with another tool (pgAdmin3), so it can't 
be
   a server or hardware (disk) issue.
- The PostgreSQLEXDI parcel is the same as yesterday
- No errors are reported (neither local, nor at the server side)
- I'm on a Mac, but this also happens on a Windows VM (image unchanged 
since days)
- No, I didn't change any database related code
- No, I didn't tamper with the VM sources

What the heck? Is someone sniffing packets off my wire? What can I do 
to narrow down the issue? It must be something really creepy ...

I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is 
appreciated, as I'm really desperate.

Many thanks in advance!

Andre


--

User Interrupt
Semaphore>>wait
SocketAccessor(IOAccessor)>>readWaitWithTimeoutMs:
SpSocket>>waitForReadDataUpToMs:
OSkSocketReadStream>>fillBufferFromSocket
OSkSocketReadStream>>read:into:
OSkSocketReadStream>>readFromBuffer:
OSkSocketReadStream>>next:
ByteValue(FixedSizeValue)>>size:readFrom:for:
ByteValue class(FixedSizeValue class)>>size:readFrom:for:
PostgreSQLMessage class>>readFrom:for:
PostgreSQLMessageReadStream>>nextInTheContextOf:
PostgreSQLMessageReadStream>>next
PostgreSQLQueryResult(PostgreSQLRequestResult)>>readResult
PostgreSQLQueryResult(PostgreSQLRequestResult)>>for:
PostgreSQLQueryResult class(PostgreSQLRequestResult class)>>for:
PostgreSQLQueryResultList>>for:
PostgreSQLQueryResultList class>>for:
PostgreSQLQuery>>readResults
PostgreSQLQuery>>execute
optimized [] in PostgreSQLEXDISession>>executeExternal
BlockClosure>>on:do:
PostgreSQLEXDISession>>executeExternal
optimized [] in ExternalDatabaseSession>>execute
BlockClosure>>ifCurtailed:
PostgreSQLEXDISession(ExternalDatabaseSession)>>execute

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Andres Valloud-6
Andre, I would not be surprised if sockets misbehaved every so often with a 7.5 configuration.  I'd suggest using at least a 7.6c VM, preferably with a 7.7 image (or at least using the support resolution that addresses potential socket issues when multiple Smalltalk processes use the same socket --- the more thorough fix for this problem is included in 7.7).  The 7.7 VM/image socket implementation is significantly better than that of 7.6 and earlier releases, and especially so on Windows.
 
Andres.


From: [hidden email] [mailto:[hidden email]] On Behalf Of Alan Knight
Sent: Thursday, November 26, 2009 3:56 PM
To: Andre Schnoor; VWNC NC
Subject: Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

When you do this, does the server confirm the number of rows returned as well? If it thinks it's done the query, then it seems like there are only a few options. The bits aren't ever getting to the Smalltalk process, or the wrong bits arrive. The bits get to the Smalltalk process, and are somehow eaten before they get to Smalltalk code. The bits arrive in Smalltalk and are misinterpreted. The last one seems easiest to deal with. So, when the socket hangs, how much data has it already read, and what does it say? You might also try the queries using a current Postgresql driver and/or a current image/VM, and if that works then you at least know the problem is fixed, and it's just a matter of finding out where.

At 05:33 PM 2009-11-26, Andre Schnoor wrote:
Hi,

a strange Postgres mistery currently drives me mad. I've never 
experienced such an absurd anomaly. I lost the entire day trying to 
find the cause, to no avail. Hopefully someone can point me to 
something helpful.

Literally overnight, without changing any code, queries submitted via 
PostgresSQLEXDI that worked yesterday do no longer work now. The only 
thing I did was I restarted the postgres server this morning (Linux). 
However, it is always started by the same script (via init.d), so the 
config shouldn't have changed. The server also works fine with all 
other applications.

As the stack below implies, a SocketAccessor waits infinitely for an 
answer to arrive from the server after the query has successfully been 
submitted (the server log confirms a successful execution, along with 
the execution time of the query).

Now for the fun: This happens with /particular/ queries and tables 
only (reliably), while most others work fine. Example 1: After 
inserting four tuples into a particular table (simple INSERT INTO ... 
VALUES ...), it always jams. Example 2: A simple "SELECT * FROM ... 
ORDER BY" jams (not a single row returned).

Even more fun:
- The same queries work on a local database without problems!
- The same queries work fine with another tool (pgAdmin3), so it can't 
be
   a server or hardware (disk) issue.
- The PostgreSQLEXDI parcel is the same as yesterday
- No errors are reported (neither local, nor at the server side)
- I'm on a Mac, but this also happens on a Windows VM (image unchanged 
since days)
- No, I didn't change any database related code
- No, I didn't tamper with the VM sources

What the heck? Is someone sniffing packets off my wire? What can I do 
to narrow down the issue? It must be something really creepy ...

I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is 
appreciated, as I'm really desperate.

Many thanks in advance!

Andre


--

User Interrupt
Semaphore>>wait
SocketAccessor(IOAccessor)>>readWaitWithTimeoutMs:
SpSocket>>waitForReadDataUpToMs:
OSkSocketReadStream>>fillBufferFromSocket
OSkSocketReadStream>>read:into:
OSkSocketReadStream>>readFromBuffer:
OSkSocketReadStream>>next:
ByteValue(FixedSizeValue)>>size:readFrom:for:
ByteValue class(FixedSizeValue class)>>size:readFrom:for:
PostgreSQLMessage class>>readFrom:for:
PostgreSQLMessageReadStream>>nextInTheContextOf:
PostgreSQLMessageReadStream>>next
PostgreSQLQueryResult(PostgreSQLRequestResult)>>readResult
PostgreSQLQueryResult(PostgreSQLRequestResult)>>for:
PostgreSQLQueryResult class(PostgreSQLRequestResult class)>>for:
PostgreSQLQueryResultList>>for:
PostgreSQLQueryResultList class>>for:
PostgreSQLQuery>>readResults
PostgreSQLQuery>>execute
optimized [] in PostgreSQLEXDISession>>executeExternal
BlockClosure>>on:do:
PostgreSQLEXDISession>>executeExternal
optimized [] in ExternalDatabaseSession>>execute
BlockClosure>>ifCurtailed:
PostgreSQLEXDISession(ExternalDatabaseSession)>>execute

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Alan Knight-2
I doubt that that would lead to it failing consistently on the same tables.

At 09:13 PM 2009-11-26, Valloud, Andres wrote:
Content-class: urn:content-classes:message
Content-Type: multipart/alternative;
         boundary="----_=_NextPart_001_01CA6F07.40CFD357"

Andre, I would not be surprised if sockets misbehaved every so often with a 7.5 configuration.  I'd suggest using at least a 7.6c VM, preferably with a 7.7 image (or at least using the support resolution that addresses potential socket issues when multiple Smalltalk processes use the same socket --- the more thorough fix for this problem is included in 7.7).  The 7.7 VM/image socket implementation is significantly better than that of 7.6 and earlier releases, and especially so on Windows.
 
Andres.


From: [hidden email] [[hidden email]] On Behalf Of Alan Knight
Sent: Thursday, November 26, 2009 3:56 PM
To: Andre Schnoor; VWNC NC
Subject: Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

When you do this, does the server confirm the number of rows returned as well? If it thinks it's done the query, then it seems like there are only a few options. The bits aren't ever getting to the Smalltalk process, or the wrong bits arrive. The bits get to the Smalltalk process, and are somehow eaten before they get to Smalltalk code. The bits arrive in Smalltalk and are misinterpreted. The last one seems easiest to deal with. So, when the socket hangs, how much data has it already read, and what does it say? You might also try the queries using a current Postgresql driver and/or a current image/VM, and if that works then you at least know the problem is fixed, and it's just a matter of finding out where.

At 05:33 PM 2009-11-26, Andre Schnoor wrote:
Hi,

a strange Postgres mistery currently drives me mad. I've never 
experienced such an absurd anomaly. I lost the entire day trying to 
find the cause, to no avail. Hopefully someone can point me to 
something helpful.

Literally overnight, without changing any code, queries submitted via 
PostgresSQLEXDI that worked yesterday do no longer work now. The only 
thing I did was I restarted the postgres server this morning (Linux). 
However, it is always started by the same script (via init.d), so the 
config shouldn't have changed. The server also works fine with all 
other applications.

As the stack below implies, a SocketAccessor waits infinitely for an 
answer to arrive from the server after the query has successfully been 
submitted (the server log confirms a successful execution, along with 
the execution time of the query).

Now for the fun: This happens with /particular/ queries and tables 
only (reliably), while most others work fine. Example 1: After 
inserting four tuples into a particular table (simple INSERT INTO ... 
VALUES ...), it always jams. Example 2: A simple "SELECT * FROM ... 
ORDER BY" jams (not a single row returned).

Even more fun:
- The same queries work on a local database without problems!
- The same queries work fine with another tool (pgAdmin3), so it can't 
be
   a server or hardware (disk) issue.
- The PostgreSQLEXDI parcel is the same as yesterday
- No errors are reported (neither local, nor at the server side)
- I'm on a Mac, but this also happens on a Windows VM (image unchanged 
since days)
- No, I didn't change any database related code
- No, I didn't tamper with the VM sources

What the heck? Is someone sniffing packets off my wire? What can I do 
to narrow down the issue? It must be something really creepy ...

I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is 
appreciated, as I'm really desperate.

Many thanks in advance!

Andre


--

User Interrupt
Semaphore>>wait
SocketAccessor(IOAccessor)>>readWaitWithTimeoutMs:
SpSocket>>waitForReadDataUpToMs:
OSkSocketReadStream>>fillBufferFromSocket
OSkSocketReadStream>>read:into:
OSkSocketReadStream>>readFromBuffer:
OSkSocketReadStream>>next:
ByteValue(FixedSizeValue)>>size:readFrom:for:
ByteValue class(FixedSizeValue class)>>size:readFrom:for:
PostgreSQLMessage class>>readFrom:for:
PostgreSQLMessageReadStream>>nextInTheContextOf:
PostgreSQLMessageReadStream>>next
PostgreSQLQueryResult(PostgreSQLRequestResult)>>readResult
PostgreSQLQueryResult(PostgreSQLRequestResult)>>for:
PostgreSQLQueryResult class(PostgreSQLRequestResult class)>>for:
PostgreSQLQueryResultList>>for:
PostgreSQLQueryResultList class>>for:
PostgreSQLQuery>>readResults
PostgreSQLQuery>>execute
optimized [] in PostgreSQLEXDISession>>executeExternal
BlockClosure>>on:do:
PostgreSQLEXDISession>>executeExternal
optimized [] in ExternalDatabaseSession>>execute
BlockClosure>>ifCurtailed:
PostgreSQLEXDISession(ExternalDatabaseSession)>>execute

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

David Caster
Um, I hate to stick my nose in where it might not be wanted, but did you rebuild your image Andre, that is, did you start with a fresh image and reconstruct your application by loading the code and trying your tests?

David L. Caster

~~ 
On Nov 26, 2009, at 6:28 PM, Alan Knight wrote:

I doubt that that would lead to it failing consistently on the same tables.

At 09:13 PM 2009-11-26, Valloud, Andres wrote:
Content-class: <a href="urn:content-classes:message">urn:content-classes:message
Content-Type: multipart/alternative;
         boundary="----_=_NextPart_001_01CA6F07.40CFD357"

Andre, I would not be surprised if sockets misbehaved every so often with a 7.5 configuration.  I'd suggest using at least a 7.6c VM, preferably with a 7.7 image (or at least using the support resolution that addresses potential socket issues when multiple Smalltalk processes use the same socket --- the more thorough fix for this problem is included in 7.7).  The 7.7 VM/image socket implementation is significantly better than that of 7.6 and earlier releases, and especially so on Windows.
 
Andres.


From: [hidden email] [[hidden email]] On Behalf Of Alan Knight
Sent: Thursday, November 26, 2009 3:56 PM
To: Andre Schnoor; VWNC NC
Subject: Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

When you do this, does the server confirm the number of rows returned as well? If it thinks it's done the query, then it seems like there are only a few options. The bits aren't ever getting to the Smalltalk process, or the wrong bits arrive. The bits get to the Smalltalk process, and are somehow eaten before they get to Smalltalk code. The bits arrive in Smalltalk and are misinterpreted. The last one seems easiest to deal with. So, when the socket hangs, how much data has it already read, and what does it say? You might also try the queries using a current Postgresql driver and/or a current image/VM, and if that works then you at least know the problem is fixed, and it's just a matter of finding out where.

At 05:33 PM 2009-11-26, Andre Schnoor wrote:
Hi,

a strange Postgres mistery currently drives me mad. I've never 
experienced such an absurd anomaly. I lost the entire day trying to 
find the cause, to no avail. Hopefully someone can point me to 
something helpful.

Literally overnight, without changing any code, queries submitted via 
PostgresSQLEXDI that worked yesterday do no longer work now. The only 
thing I did was I restarted the postgres server this morning (Linux). 
However, it is always started by the same script (via init.d), so the 
config shouldn't have changed. The server also works fine with all 
other applications.

As the stack below implies, a SocketAccessor waits infinitely for an 
answer to arrive from the server after the query has successfully been 
submitted (the server log confirms a successful execution, along with 
the execution time of the query).

Now for the fun: This happens with /particular/ queries and tables 
only (reliably), while most others work fine. Example 1: After 
inserting four tuples into a particular table (simple INSERT INTO ... 
VALUES ...), it always jams. Example 2: A simple "SELECT * FROM ... 
ORDER BY" jams (not a single row returned).

Even more fun:
- The same queries work on a local database without problems!
- The same queries work fine with another tool (pgAdmin3), so it can't 
be
   a server or hardware (disk) issue.
- The PostgreSQLEXDI parcel is the same as yesterday
- No errors are reported (neither local, nor at the server side)
- I'm on a Mac, but this also happens on a Windows VM (image unchanged 
since days)
- No, I didn't change any database related code
- No, I didn't tamper with the VM sources

What the heck? Is someone sniffing packets off my wire? What can I do 
to narrow down the issue? It must be something really creepy ...

I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is 
appreciated, as I'm really desperate.

Many thanks in advance!

Andre


--

User Interrupt
Semaphore>>wait
SocketAccessor(IOAccessor)>>readWaitWithTimeoutMs:
SpSocket>>waitForReadDataUpToMs:
OSkSocketReadStream>>fillBufferFromSocket
OSkSocketReadStream>>read:into:
OSkSocketReadStream>>readFromBuffer:
OSkSocketReadStream>>next:
ByteValue(FixedSizeValue)>>size:readFrom:for:
ByteValue class(FixedSizeValue class)>>size:readFrom:for:
PostgreSQLMessage class>>readFrom:for:
PostgreSQLMessageReadStream>>nextInTheContextOf:
PostgreSQLMessageReadStream>>next
PostgreSQLQueryResult(PostgreSQLRequestResult)>>readResult
PostgreSQLQueryResult(PostgreSQLRequestResult)>>for:
PostgreSQLQueryResult class(PostgreSQLRequestResult class)>>for:
PostgreSQLQueryResultList>>for:
PostgreSQLQueryResultList class>>for:
PostgreSQLQuery>>readResults
PostgreSQLQuery>>execute
optimized [] in PostgreSQLEXDISession>>executeExternal
BlockClosure>>on:do:
PostgreSQLEXDISession>>executeExternal
optimized [] in ExternalDatabaseSession>>execute
BlockClosure>>ifCurtailed:
PostgreSQLEXDISession(ExternalDatabaseSession)>>execute

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Andres Valloud-6
In reply to this post by Alan Knight-2
Yes, the same table failure does seem odd.  I'd be assuming, however, that whatever it is that is causing the problem is not exposing the socket bugs we fixed recently.  For example, let's assume that some table index got damaged and now the db has to do linear search over the whole table.  This takes time, and because it does, some socket misbehavior that would otherwise not be exercised is exposed.  My concern is that, if something like that is going on, then proper socket behavior might allow proper Store behavior, even if it is slow, which would be better than outright failure to work.  Maybe it's a far fetched assumption, but the possibility concerns me in light of the socket problems that existed in 7.5 and, thus, socket misbehavior is something I'd like to rule out if I was debugging the problem.


From: Alan Knight [mailto:[hidden email]]
Sent: Thursday, November 26, 2009 6:28 PM
To: Valloud, Andres; VWNC NC
Subject: Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

I doubt that that would lead to it failing consistently on the same tables.

At 09:13 PM 2009-11-26, Valloud, Andres wrote:
Content-class: urn:content-classes:message
Content-Type: multipart/alternative;
         boundary="----_=_NextPart_001_01CA6F07.40CFD357"

Andre, I would not be surprised if sockets misbehaved every so often with a 7.5 configuration.  I'd suggest using at least a 7.6c VM, preferably with a 7.7 image (or at least using the support resolution that addresses potential socket issues when multiple Smalltalk processes use the same socket --- the more thorough fix for this problem is included in 7.7).  The 7.7 VM/image socket implementation is significantly better than that of 7.6 and earlier releases, and especially so on Windows.
 
Andres.


From: [hidden email] [[hidden email]] On Behalf Of Alan Knight
Sent: Thursday, November 26, 2009 3:56 PM
To: Andre Schnoor; VWNC NC
Subject: Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

When you do this, does the server confirm the number of rows returned as well? If it thinks it's done the query, then it seems like there are only a few options. The bits aren't ever getting to the Smalltalk process, or the wrong bits arrive. The bits get to the Smalltalk process, and are somehow eaten before they get to Smalltalk code. The bits arrive in Smalltalk and are misinterpreted. The last one seems easiest to deal with. So, when the socket hangs, how much data has it already read, and what does it say? You might also try the queries using a current Postgresql driver and/or a current image/VM, and if that works then you at least know the problem is fixed, and it's just a matter of finding out where.

At 05:33 PM 2009-11-26, Andre Schnoor wrote:
Hi,

a strange Postgres mistery currently drives me mad. I've never 
experienced such an absurd anomaly. I lost the entire day trying to 
find the cause, to no avail. Hopefully someone can point me to 
something helpful.

Literally overnight, without changing any code, queries submitted via 
PostgresSQLEXDI that worked yesterday do no longer work now. The only 
thing I did was I restarted the postgres server this morning (Linux). 
However, it is always started by the same script (via init.d), so the 
config shouldn't have changed. The server also works fine with all 
other applications.

As the stack below implies, a SocketAccessor waits infinitely for an 
answer to arrive from the server after the query has successfully been 
submitted (the server log confirms a successful execution, along with 
the execution time of the query).

Now for the fun: This happens with /particular/ queries and tables 
only (reliably), while most others work fine. Example 1: After 
inserting four tuples into a particular table (simple INSERT INTO ... 
VALUES ...), it always jams. Example 2: A simple "SELECT * FROM ... 
ORDER BY" jams (not a single row returned).

Even more fun:
- The same queries work on a local database without problems!
- The same queries work fine with another tool (pgAdmin3), so it can't 
be
   a server or hardware (disk) issue.
- The PostgreSQLEXDI parcel is the same as yesterday
- No errors are reported (neither local, nor at the server side)
- I'm on a Mac, but this also happens on a Windows VM (image unchanged 
since days)
- No, I didn't change any database related code
- No, I didn't tamper with the VM sources

What the heck? Is someone sniffing packets off my wire? What can I do 
to narrow down the issue? It must be something really creepy ...

I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is 
appreciated, as I'm really desperate.

Many thanks in advance!

Andre


--

User Interrupt
Semaphore>>wait
SocketAccessor(IOAccessor)>>readWaitWithTimeoutMs:
SpSocket>>waitForReadDataUpToMs:
OSkSocketReadStream>>fillBufferFromSocket
OSkSocketReadStream>>read:into:
OSkSocketReadStream>>readFromBuffer:
OSkSocketReadStream>>next:
ByteValue(FixedSizeValue)>>size:readFrom:for:
ByteValue class(FixedSizeValue class)>>size:readFrom:for:
PostgreSQLMessage class>>readFrom:for:
PostgreSQLMessageReadStream>>nextInTheContextOf:
PostgreSQLMessageReadStream>>next
PostgreSQLQueryResult(PostgreSQLRequestResult)>>readResult
PostgreSQLQueryResult(PostgreSQLRequestResult)>>for:
PostgreSQLQueryResult class(PostgreSQLRequestResult class)>>for:
PostgreSQLQueryResultList>>for:
PostgreSQLQueryResultList class>>for:
PostgreSQLQuery>>readResults
PostgreSQLQuery>>execute
optimized [] in PostgreSQLEXDISession>>executeExternal
BlockClosure>>on:do:
PostgreSQLEXDISession>>executeExternal
optimized [] in ExternalDatabaseSession>>execute
BlockClosure>>ifCurtailed:
PostgreSQLEXDISession(ExternalDatabaseSession)>>execute

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Yanni Chiu
In reply to this post by Andre Schnoor
Andre Schnoor wrote:
> What the heck? Is someone sniffing packets off my wire? What can I do  
> to narrow down the issue? It must be something really creepy ...
>
> I'm using VW 7.5 and the PostgreSQLEXDI that came with it. Any hint is  
> appreciated, as I'm really desperate.

Things to try:

1. Increase the debug level of your postgres server. I think it can be
done with an interrupt signal, without restarting the server. You'll
then know for sure whether the server thinks that it has processed the
SQL, and sent the result, by looking a the server log.

2. Use a packet sniffer to see what bytes have been sent from the
postgres server to the smalltalk VM. It helps to know the postgres
frontend/backend protocol, but it may give some clue.

3. Does your app. have multiple threads/processes that read off the same
socket - i.e. more than one Process is using the same database
connection. If so, then it could be that one thread has consumed some
acknowledgment bytes that are part of the frontend/backend protocol.

4. Since this started with a server restart, are you sure the hardware
is okay. Maybe there's bad RAM or bad disk blocks.

Good luck.

--
Yanni
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Andre Schnoor
Thanks everyone for your suggestions. I much appreciate your help.

It looks this is a server-side problem, for the most part at least.  
After a full vacuum and reindex of all tables, the issue became less  
penetrant and eventually went away (?). Hm. I don't like issues that  
go away without knowing why. The thing I still wonder about is why the  
same queries worked fine on other clients. Actually this excludes a  
server-side problem.

One thing I forgot to mention is that "ps -ef" showed the postmaster  
was running since 2011(!), which was the actual reason to restart it  
in the first place. This clock anomaly might have had some influence  
on postgres meta data. Otherwise hardware seems to be ok (SMART),  
although it's up for five years now. SMART shows many seek errors and  
hardware ECC events, but no failures or sector relocations yet, so at  
least data integrity is ok.

I also observed that on an already established and used connection,  
jams occured less likely and rather randomly. I don't now if its  
normal that the postmaster spawned multiple workers in idle state,  
listing the same client IP. Actually it should reuse them?


@Yanni:

> 1. Increase the debug level of your postgres server.

I did that. It showed the query statements and execution times, but no  
indication whether the answer was actually sent.

> 2. Use a packet sniffer

Very good idea. If it reoccurs, I'll try that.

> 3. ... multiple threads/processes that read off the same socket ...

In theory yes, however I use a RecursionLock to prevent this.

> For example, let's assume that some table index got damaged and now  
> the db has to do linear search over the whole table.  This takes  
> time, and because it does, some socket misbehavior that would  
> otherwise not be exercised is exposed.

@Andres: This would support the assumption of a combined server/client  
issue. Both the connection and the server were very slow yesterday.

Unfortunately, I can't move to 7.7 yet, as this is way to expensive at  
this time. This would hold up my business for weeks, if not months. I  
would rather copy selected code only in both VM sources and image.

> ... did you rebuild your image Andre ... ?

@David: Sure, multiple times.

Thanks again,
Andre




_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Jan Weerts
Hi Andre!

Andre Schnoor wrote:
> It looks this is a server-side problem, for the most part at least.
> After a full vacuum and reindex of all tables, the issue became
> less penetrant and eventually went away (?). Hm. I don't like
> issues that go away without knowing why. The thing I still wonder
> about is why the same queries worked fine on other clients.
> Actually this excludes a server-side problem.

The already given suggestion to sniff the traffic with something like
wireshark might help. Comparing the traffic for the same request from
the failing connection to a working connection might give a clue, if
the server sends different responses.

> One thing I forgot to mention is that "ps -ef" showed the
> postmaster was running since 2011(!), which was the actual reason
> to restart it in the first place. This clock anomaly might have had
> some influence on postgres meta data. Otherwise hardware seems to
> be ok (SMART), although it's up for five years now. SMART shows
> many seek errors and hardware ECC events, but no failures or sector
> relocations yet, so at least data integrity is ok.

Well having these events is (IMHO) a sign of looming trouble, even if
no errors were recorded yet.

Did you upgrade the postgres software? Since you mentioned vacuuming
and reindexing: at least vacuuming should be done automatically by a
version 8 or above server (see autovacuum). If you are still using v7
or earlier server my advice is "upgrade asap". IIRC postgres offers
also some way to check your database, but i forgot about the details.

> I also observed that on an already established and used connection,
> jams occured less likely and rather randomly. I don't now if its
> normal that the postmaster spawned multiple workers in idle state,
> listing the same client IP. Actually it should reuse them?

Each user connected via store has an idle process running, my "ps"
output shows the idle connections and the user using it (this is
postgres v8.1).

Regards
   Jan
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Alan Knight-2
In reply to this post by Andre Schnoor
One possible reason for a difference between behaviour from Smalltalk and other clients is that the Postgresql client is using an older protocol which is deprecated. It continues to work, but it's possible that there are edge cases that expose problems that don't show up for things using the newer protocol.

At 04:55 AM 2009-11-27, Andre Schnoor wrote:
Thanks everyone for your suggestions. I much appreciate your help.

It looks this is a server-side problem, for the most part at least. 
After a full vacuum and reindex of all tables, the issue became less 
penetrant and eventually went away (?). Hm. I don't like issues that 
go away without knowing why. The thing I still wonder about is why the 
same queries worked fine on other clients. Actually this excludes a 
server-side problem.

One thing I forgot to mention is that "ps -ef" showed the postmaster 
was running since 2011(!), which was the actual reason to restart it 
in the first place. This clock anomaly might have had some influence 
on postgres meta data. Otherwise hardware seems to be ok (SMART), 
although it's up for five years now. SMART shows many seek errors and 
hardware ECC events, but no failures or sector relocations yet, so at 
least data integrity is ok.

I also observed that on an already established and used connection, 
jams occured less likely and rather randomly. I don't now if its 
normal that the postmaster spawned multiple workers in idle state, 
listing the same client IP. Actually it should reuse them?


@Yanni:

> 1. Increase the debug level of your postgres server.

I did that. It showed the query statements and execution times, but no 
indication whether the answer was actually sent.

> 2. Use a packet sniffer

Very good idea. If it reoccurs, I'll try that.

> 3. ... multiple threads/processes that read off the same socket ...

In theory yes, however I use a RecursionLock to prevent this.

> For example, let's assume that some table index got damaged and now 
> the db has to do linear search over the whole table.  This takes 
> time, and because it does, some socket misbehavior that would 
> otherwise not be exercised is exposed.

@Andres: This would support the assumption of a combined server/client 
issue. Both the connection and the server were very slow yesterday.

Unfortunately, I can't move to 7.7 yet, as this is way to expensive at 
this time. This would hold up my business for weeks, if not months. I 
would rather copy selected code only in both VM sources and image.

> ... did you rebuild your image Andre ... ?

@David: Sure, multiple times.

Thanks again,
Andre




_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: [vwnc] Absolutely weird PostgreSQLEXDI mistery

Bruce Badger
Old, yes.  Deprecated?

http://www.postgresql.org/docs/8.4/static/protocol.html

Does not seem to be.   IIRC the focus of the newer versions of the
protocol was performance, not robustness.  Nonetheless it would be
nice to move up the newer protocol version.

Any sponsors? :-)

All the best,
    Bruce

2009/11/27 Alan Knight <[hidden email]>:

> One possible reason for a difference between behaviour from Smalltalk and
> other clients is that the Postgresql client is using an older protocol which
> is deprecated. It continues to work, but it's possible that there are edge
> cases that expose problems that don't show up for things using the newer
> protocol.
>
> At 04:55 AM 2009-11-27, Andre Schnoor wrote:
>
> Thanks everyone for your suggestions. I much appreciate your help.
>
> It looks this is a server-side problem, for the most part at least.
> After a full vacuum and reindex of all tables, the issue became less
> penetrant and eventually went away (?). Hm. I don't like issues that
> go away without knowing why. The thing I still wonder about is why the
> same queries worked fine on other clients. Actually this excludes a
> server-side problem.
>
> One thing I forgot to mention is that "ps -ef" showed the postmaster
> was running since 2011(!), which was the actual reason to restart it
> in the first place. This clock anomaly might have had some influence
> on postgres meta data. Otherwise hardware seems to be ok (SMART),
> although it's up for five years now. SMART shows many seek errors and
> hardware ECC events, but no failures or sector relocations yet, so at
> least data integrity is ok.
>
> I also observed that on an already established and used connection,
> jams occured less likely and rather randomly. I don't now if its
> normal that the postmaster spawned multiple workers in idle state,
> listing the same client IP. Actually it should reuse them?
>
>
> @Yanni:
>
>> 1. Increase the debug level of your postgres server.
>
> I did that. It showed the query statements and execution times, but no
> indication whether the answer was actually sent.
>
>> 2. Use a packet sniffer
>
> Very good idea. If it reoccurs, I'll try that.
>
>> 3. ... multiple threads/processes that read off the same socket ...
>
> In theory yes, however I use a RecursionLock to prevent this.
>
>> For example, let's assume that some table index got damaged and now
>> the db has to do linear search over the whole table.  This takes
>> time, and because it does, some socket misbehavior that would
>> otherwise not be exercised is exposed.
>
> @Andres: This would support the assumption of a combined server/client
> issue. Both the connection and the server were very slow yesterday.
>
> Unfortunately, I can't move to 7.7 yet, as this is way to expensive at
> this time. This would hold up my business for weeks, if not months. I
> would rather copy selected code only in both VM sources and image.
>
>> ... did you rebuild your image Andre ... ?
>
> @David: Sure, multiple times.
>
> Thanks again,
> Andre
>
>
>
>
> _______________________________________________
> vwnc mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
>
> --
> Alan Knight [|], Engineering Manager, Cincom Smalltalk
> [hidden email]
> [hidden email]
> http://www.cincom.com/smalltalk
> _______________________________________________
> vwnc mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
>
>



--
Make the most of your skills - with OpenSkills
http://www.openskills.org/

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc