Psql or test application hangs when interface is down for the DB server

Started by K, Niranjan (NSN - IN/Bangalore) | almost 18 years ago | 10 messages | bugs

Hi,

Environment used:
Postgres 8.3.1
psqlODBC 08.03.0200

Testcase:
The postgres database has a table 'COUNTER_TABLE' with an integer column
'COUNTER'. The test application attached to this mail starts a
transaction, reads the current value of COUNTER, increments it, and
writes the incremented value back to the COUNTER column. This is done in
a loop. The program is started on a remote client and, after a few
transactions, the interface between the client and the database server
is brought down (for example, I ran "ifconfig eth0 down" on the server).
The test application then hangs and does not return from the postgres
API call (e.g. 'PQexec').

<<pg_test_app.cpp>>
In another example, run psql on the remote client and connect to the
database server. Execute the SQL to update COUNTER_TABLE. After it
succeeds, bring the network interface down on the database server (e.g.
with "ifconfig eth0 down"), then execute the SQL command to update
COUNTER_TABLE again from the same remote client and the same DB session.
The SQL command hangs.

regards,
Niranjan

Attachments:

pg_test_app.cpp (application/octet-stream)

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: K, Niranjan (NSN - IN/Bangalore) (#1)
Re: Psql or test application hangs when interface is down for the DB server

"K, Niranjan (NSN - IN/Bangalore)" <niranjan.k@nsn.com> writes:

> In the postgres database there is table 'COUNTER_TABLE' with column
> integer type 'COUNTER'. The test application attached in this mail, will
> start a transaction, gets the current value in the COUNTER, increments
> the value and updates the incremented value into the COUNTER column.
> This is being done in a loop. The program is started in a remote client
> and after few transactions, the interface between the client & the
> database server is brought down (example I used "ifconfig eth0 down" in
> the server). With this the test application hangs and does not return
> from the API of postgres (ex. 'PQexec').

If you waited long enough for the TCP connection to time out, it would
return (with an error, of course). This behavior is not a bug, it is
the expected behavior of any program using a network connection.

regards, tom lane

#3 K, Niranjan (NSN - IN/Bangalore)
niranjan.k@nsn.com
In reply to: Tom Lane (#2)
Re: Psql or test application hangs when interface is down for the DB server

Currently the test application or psql unblocks after ~15 minutes. That
is a very long time for programs that perform database updates to detect
this situation.
As far as I have debugged, execution is waiting in the 'poll()' system
call in the function pqSocketPoll(), which is called as a result of
'PQexec()', and the timeout parameter passed is -1, which means an
infinite wait. It is not clear how this gets unblocked after 15 minutes.
Who writes to the socket, or who interrupts the poll() system call?

Is there any other workaround or alternative, so that the interface
being down is detected and 'PQexec' does not block for ~15 minutes?

regards,
Niranjan

#4 Gregory Stark
stark@enterprisedb.com
In reply to: K, Niranjan (NSN - IN/Bangalore) (#3)
Re: Psql or test application hangs when interface is down for the DB server

"K, Niranjan (NSN - IN/Bangalore)" <niranjan.k@nsn.com> writes:

> Is there any other workaround or alternative so that the situation about
> the interface is down is known and based on that the 'PQexec' does not
> get blocked for ~15 minutes.

Absent threads I think you have to use alarm() and a SIGALRM signal handler.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
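
The alarm() suggestion above can be illustrated with a small sketch. It is shown on a plain read() rather than on PQexec(), because the thread does not confirm how libpq reacts to an interrupted system call; libpq may simply retry its wait after EINTR, in which case the handler would have to do more than return. The helper name read_with_alarm and the handler on_alarm are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The handler only has to exist: its delivery makes the blocked call fail
 * with EINTR instead of killing the process. */
static void on_alarm(int signo)
{
    (void) signo;
}

/*
 * Hypothetical helper: read() that gives up after `seconds`.
 * Returns the number of bytes read, or -1 with errno == EINTR if the
 * alarm fired before any data arrived.
 */
static ssize_t read_with_alarm(int fd, void *buf, size_t len, unsigned seconds)
{
    struct sigaction sa, old;
    ssize_t n;
    int saved_errno;

    assert(seconds > 0);        /* alarm(0) would cancel the timer instead */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_RESTART, so read() is not resumed */
    if (sigaction(SIGALRM, &sa, &old) != 0)
        return -1;

    alarm(seconds);
    n = read(fd, buf, len);
    saved_errno = errno;

    alarm(0);                   /* cancel the timer if it has not fired */
    sigaction(SIGALRM, &old, NULL);

    errno = saved_errno;
    return n;
}
```

There is a small race in this pattern: if the signal arrives before read() is entered, the call still blocks. With timeouts of a second or more that is unlikely, but it is one reason to prefer a poll()-based timeout where the API allows it.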

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#4)
Re: Psql or test application hangs when interface is down for the DB server

Gregory Stark <stark@enterprisedb.com> writes:

> "K, Niranjan (NSN - IN/Bangalore)" <niranjan.k@nsn.com> writes:
>
>> Is there any other workaround or alternative so that the situation about
>> the interface is down is known and based on that the 'PQexec' does not
>> get blocked for ~15 minutes.
>
> Absent threads I think you have to use alarm() and a SIGALRM signal handler.

On most modern platforms you can adjust the TCP timeouts for the
connection. There's no explicit support for that in libpq, but
you can just get the socket FD from it and do setsockopt().

regards, tom lane
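
A minimal sketch of the setsockopt() approach described above, assuming a Linux client. The option names (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT, TCP_USER_TIMEOUT) are Linux-specific, the helper name set_tcp_timeouts and all values are illustrative, and the descriptor is assumed to come from libpq's PQsocket(). Keepalive probes only cover an idle connection; as far as I understand, when a query has been sent and is unacknowledged it is the retransmission timer that governs, which is what TCP_USER_TIMEOUT bounds.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: shorten how long the kernel waits before declaring
 * a TCP connection dead. Returns 0 on success, -1 if a setsockopt() call
 * fails (errno is left as set by that call).
 */
static int set_tcp_timeouts(int fd, int idle_s, int interval_s, int probes,
                            unsigned int user_timeout_ms)
{
    int on = 1;

    assert(fd >= 0);

    /* Enable keepalive probing on an otherwise idle connection. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;
    /* Seconds of idleness before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    /* Seconds between successive probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s,
                   sizeof interval_s) != 0)
        return -1;
    /* Unanswered probes tolerated before the connection is dropped. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) != 0)
        return -1;

#ifdef TCP_USER_TIMEOUT
    /* Milliseconds that sent data may remain unacknowledged. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout_ms,
                   sizeof user_timeout_ms) != 0)
        return -1;
#else
    (void) user_timeout_ms;     /* option not available on this platform */
#endif

    return 0;
}
```

With libpq this would be called once the connection is established, for example set_tcp_timeouts(PQsocket(conn), 10, 5, 3, 30000). Once the kernel gives up, the blocked call returns with an error, which the application must then handle by reconnecting.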

#6 Valentin Bogdanov
valiouk@yahoo.co.uk
In reply to: K, Niranjan (NSN - IN/Bangalore) (#3)
Re: Psql or test application hangs when interface is down for the DB server

I have noticed this as well. It blocks in poll() with a timeout parameter of -1, meaning infinite; then after 4 minutes on my system poll() returns 1 and
getsockopt() is called with SO_ERROR. SYN packets are tried only for the default TCP timeout of 20 seconds.

Consider using threads; that way you can set your own timeout value.

Regards

Val
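
The sequence described above (poll() returning, then getsockopt() with SO_ERROR) is how a program learns why the kernel gave up on a socket. Nobody has to write to the socket for poll() to wake: when the kernel's own TCP timers expire, it records an error on the socket and reports the descriptor as ready. The sketch below shows only the SO_ERROR step; the helper name pending_socket_error is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fetch (and clear) the error the kernel has recorded
 * on a socket. Returns 0 if no error is pending, an errno value such as
 * ETIMEDOUT or ECONNREFUSED if one is, or -1 if getsockopt() itself fails.
 */
static int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) != 0)
        return -1;

    assert(len == sizeof err);  /* SO_ERROR is reported as an int */
    return err;
}
```

A caller would typically invoke this after poll() reports the socket ready and pass a non-zero result to strerror() for the message.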

#7 Gregory Stark
stark@enterprisedb.com
In reply to: Valentin Bogdanov (#6)
Re: Psql or test application hangs when interface is down for the DB server

"Valentin Bogdanov" <valiouk@yahoo.co.uk> writes:

> I have noticed this as well. Blocks in poll(), timeout parameter -1,

Oh, good point. Non-blocking sockets and poll/select let you control the
timeout too.

> meaning infinite then after 4 minutes on my system poll() returns 1 and
> getsockopt() is called with SO_ERROR. SYN packets are tried only for the
> default tcp timeout of 20 seconds.

Uhm, 20 seconds would be an unreasonably low default. I think the RFCs mandate
timeouts closer to the 4 minutes you describe.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
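
The point about poll/select controlling the timeout can be sketched on a plain file descriptor. This is an assumption-laden illustration, not libpq code: the helper name read_with_timeout is hypothetical, and applying the idea to libpq would mean using its asynchronous interface (PQsendQuery, PQsocket, PQconsumeInput, PQisBusy, PQgetResult) instead of PQexec, which is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait at most timeout_ms for fd to become readable,
 * then read from it. Returns the number of bytes read, or -1 with errno
 * set to ETIMEDOUT if nothing arrived in time (other errno values come
 * from poll() or read()).
 */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(timeout_ms >= 0);    /* a negative value would wait forever */

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;              /* poll() failed, e.g. EINTR */
    if (rc == 0) {
        errno = ETIMEDOUT;      /* the caller-chosen timeout expired */
        return -1;
    }

    /* Readable, or in an error/hangup state that read() will report. */
    return read(fd, buf, len);
}
```

The caller chooses the timeout, so the wait is bounded by the application rather than by the kernel's TCP timers.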

#8 Valentin Bogdanov
valiouk@yahoo.co.uk
In reply to: Gregory Stark (#7)
Re: Psql or test application hangs when interface is down for the DB server

Thanks Gregory,

You are right, of course, about that. It is 4 minutes; I wasn't paying attention and thought that I had found something odd. The last packet is sent a minute and a half after the first, and I misread that as 20 seconds.

Cheers,

Val

#9 Valentin Bogdanov
valiouk@yahoo.co.uk
In reply to: Gregory Stark (#7)
Re: Psql or test application hangs when interface is down for the DB server

Thanks Gregory,

You are right, of course, about that. It is 4 minutes; I wasn't paying attention and thought that I had found something odd. The last packet is sent a minute and a half after the first, and I misread that as 20 seconds.

Cheers,

Val

#10 K, Niranjan (NSN - IN/Bangalore)
niranjan.k@nsn.com
In reply to: Tom Lane (#5)
Re: Psql or test application hangs when interface is down for the DB server

Is it not possible to check in advance whether connectivity is broken,
so that waiting on the socket would not be required?

If we have to time out (even after 1-2 seconds), that will be quite long
for highly available applications.

Is there any way to check the health of the interface?

regards,
Niranjan
