Psql or test application hangs when interface is down for the DB server
Hi,
Environment used:
Postgres 8.3.1
psqlODBC 08.03.0200
Testcase:
In the postgres database there is a table 'COUNTER_TABLE' with an integer
column 'COUNTER'. The test application attached to this mail starts a
transaction, gets the current value of COUNTER, increments it, and writes
the incremented value back to the COUNTER column. This is done in a loop.
The program is started on a remote client, and after a few transactions
the interface between the client and the database server is brought down
(for example, I used "ifconfig eth0 down" on the server). The test
application then hangs and does not return from the postgres API call
(e.g. 'PQexec').
<<pg_test_app.cpp>>
In another example, run psql from the remote client and connect to the
database server. Execute an SQL statement that updates COUNTER_TABLE. After
it succeeds, bring the network interface on the database server down (e.g.
"ifconfig eth0 down"), then execute the update on COUNTER_TABLE again from
the same remote client in the same DB session. The SQL command hangs.
regards,
Niranjan
"K, Niranjan (NSN - IN/Bangalore)" <niranjan.k@nsn.com> writes:
In the postgres database there is a table 'COUNTER_TABLE' with an integer
column 'COUNTER'. The test application attached to this mail starts a
transaction, gets the current value of COUNTER, increments it, and writes
the incremented value back to the COUNTER column. This is done in a loop.
The program is started on a remote client, and after a few transactions
the interface between the client and the database server is brought down
(for example, I used "ifconfig eth0 down" on the server). The test
application then hangs and does not return from the postgres API call
(e.g. 'PQexec').
If you waited long enough for the TCP connection to time out, it would
return (with an error, of course). This behavior is not a bug, it is
the expected behavior of any program using a network connection.
regards, tom lane
Currently the test application or psql unblocks only after ~15
minutes. That is far too long for programs that do database updates to
discover this situation.
As far as I have debugged, the execution is waiting in the
'poll()' system call in the function pqSocketPoll(), which is called as a
result of 'PQexec()', and the timeout parameter passed is -1,
which means an infinite wait. It is not clear how this gets
unblocked after 15 minutes. Who writes to the socket, or who
interrupts the poll() system call?
Is there any workaround or alternative so that the application learns
that the interface is down and, based on that, 'PQexec' does not
block for ~15 minutes?
regards,
Niranjan
"K, Niranjan (NSN - IN/Bangalore)" <niranjan.k@nsn.com> writes:
Is there any workaround or alternative so that the application learns
that the interface is down and, based on that, 'PQexec' does not
block for ~15 minutes?
Absent threads, I think you have to use alarm() and a SIGALRM signal handler.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
Gregory Stark <stark@enterprisedb.com> writes:
"K, Niranjan (NSN - IN/Bangalore)" <niranjan.k@nsn.com> writes:
Is there any workaround or alternative so that the application learns
that the interface is down and, based on that, 'PQexec' does not
block for ~15 minutes?
Absent threads, I think you have to use alarm() and a SIGALRM signal handler.
On most modern platforms you can adjust the TCP timeouts for the
connection. There's no explicit support for that in libpq, but
you can just get the socket FD from it and do setsockopt().
regards, tom lane
I have noticed this as well. It blocks in poll() with a timeout parameter of -1, meaning infinite; then, after 4 minutes on my system, poll() returns 1 and
getsockopt() is called with SO_ERROR. SYN packets are tried only for the default TCP timeout of 20 seconds.
Consider using threads; that way you can set your own timeout value.
Regards
Val
"Valentin Bogdanov" <valiouk@yahoo.co.uk> writes:
I have noticed this as well. Blocks in poll(), timeout parameter -1,
Oh, good point. Non-blocking sockets and poll/select let you control the
timeout too.
meaning infinite then after 4 minutes on my system poll() returns 1 and
getsockopt() is called with SO_ERROR. SYN packets are tried only for the
default tcp timeout of 20 seconds.
Uhm, 20 seconds would be an unreasonably low default. I think the RFCs mandate
timeouts closer to the 4 minutes you describe.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
Thanks Gregory,
You're right, of course, about that. It is 4 minutes; I wasn't paying attention and thought I had found something odd. The last packet is sent a minute and a half after the first, and I misread that as 20 seconds.
Cheers,
Val
Isn't it possible to detect in advance that connectivity is broken, so
that waiting on the socket is not required?
If we have to time out (even after 1-2 seconds), that is quite long for
highly available applications.
Is there any way to check the health of the interface?
regards,
Niranjan