32 bit libpq fail to connecting when set a very large "connect_timeout" value
Hi,all
This a bug report.
1)The problem
When set a very large value(such as 2147483647 which is the maximum of int ) to connect_timeout,and then connect to the backend using 32 bit psql, a timeout will occur.
For example:
[chenhj@node2 ~]$ export PGCONNECT_TIMEOUT=2147483647
[chenhj@node2 ~]$ psql
psql: timeout expired
While it's OK when using 64 bit psql.
[chenhj@node2 ~]$ export PGCONNECT_TIMEOUT=2147483647
[chenhj@node2 ~]$ psql
psql (9.2.8)
Type "help" for help.
postgres=# \q
2)The reason
I looked into the source code, and found the reason is a data overflow when calculating connecting's finish_time as the following.
src\interfaces\libpq\fe-connect.c:
static int
connectDBComplete(PGconn *conn)
{
PostgresPollingStatusType flag = PGRES_POLLING_WRITING;
time_tfinish_time = ((time_t) -1);
if (conn == NULL || conn->status == CONNECTION_BAD)
return 0;
/*
* Set up a time limit, if connect_timeout isn't zero.
*/
if (conn->connect_timeout != NULL)
{
inttimeout = atoi(conn->connect_timeout);
if (timeout > 0)
{
/*
* Rounding could cause connection to fail; need at least 2 secs
*/
if (timeout < 2)
timeout = 2;
/* calculate the finish time based on start + timeout */
finish_time = time(NULL) + timeout;// In 32 bit application time_t is a 32 bit, when timeout is very large, data overflow will occur and finish_time became a negative value.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
}
}
...
}
Of course this is a infrequence bug, but i think it should be fixed, such as the following(may be there's better way to fix it).
finish_time = time(NULL) + timeout;
==>
finish_time = time(NULL) + timeout;
if(finish_time < ((time_t) 0))
{
finish_time = ((time_t) -1);
}
Best Regards
Chen Huajun
chenhj <chjischj@163.com> writes:
[chenhj@node2 ~]$ export PGCONNECT_TIMEOUT=2147483647
That's a pretty darn silly setting, wouldn't you agree?
finish_time = time(NULL) + timeout;
if(finish_time < ((time_t) 0))
{
finish_time = ((time_t) -1);
}
I don't care for this "fix" because it assumes that all current and future
time_t's are positive. On 32-bit machines that would break in 2038 ...
unless time_t is redefined as unsigned 32-bit, in which case the test
becomes useless.
It might be possible to develop a test for overflow that would work
regardless of the width or signedness of time_t, but ISTM the work would
be vastly out of proportion to the benefit. And you'd still not have
dealt with PGCONNECT_TIMEOUT > 2^31, which arguably is another "bug"
case here.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Tom
Thanks for your response!
That's a pretty darn silly setting, wouldn't you agree?
In practice, yes.
And the manual of PG also does not say how large value can be set to "connect_timeout" , It cannot be even call it a "bug"
But sometimes it maybe useful to know the the value range of "connect_timeout".
If without this problem we can say 2~2^31,but now ...
I don't care for this "fix" because it assumes that all current and future
time_t's are positive. On 32-bit machines that would break in 2038 ...
Currently PG has assumed time_t's are positive, didn't it?
static int
pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
{
...
if (end_time == ((time_t) -1))
timeout_ms = -1;
else
{
time_tnow = time(NULL);
if (end_time > now)
...
}
And you'd still not have dealt with PGCONNECT_TIMEOUT > 2^31, which arguably is another "bug"
case here.
In my test when PGCONNECT_TIMEOUT > 2^31, atoi() will always return 2^31.
inttimeout = atoi(conn->connect_timeout);
Best Regards
Chen Huajun
On 10/21/2014 9:09 AM, chenhj wrote:
Currently PG has assumed time_t's are positive, didn't it?
IIRC, time_t is an unsigned int.
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
John R Pierce <pierce@hogranch.com> writes:
On 10/21/2014 9:09 AM, chenhj wrote:
Currently PG has assumed time_t's are positive, didn't it?
IIRC, time_t is an unsigned int.
If it were, Unix programs couldn't deal with dates before 1970.
Originally time_t was usually a signed 32-bit int, allowing a range of
dates from about 1901 to 2038. There were proposals at one time to "fix"
the Y2038 problem by redefining time_t as unsigned 32-bit, which would
kick the can down the road another 70 years at the price of breaking
pre-1970 dates. But I think this idea has mostly fallen by the wayside
in favor of migrating to 64-bit-signed time_t.
On my Linux box, it looks like time_t is defined as "long int", which
means problem solved on 64-bit machines. If anyone is still using
32-bit hardware in 2038, they're probably going to have to endure an
ABI break to widen time_t to int64.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 10/21/2014 10:49 AM, Tom Lane wrote:
On my Linux box, it looks like time_t is defined as "long int", which
means problem solved on 64-bit machines. If anyone is still using
32-bit hardware in 2038, they're probably going to have to endure an
ABI break to widen time_t to int64.
FWIW, on a 32bit CentOS 5 (kernel 2.6.18.el5) box I still have around,
it looks like time_t is __TIME_T_TYPE which is __SLONGWORD_TYPE which is
long int
(per grepping .h stuff in /usr/include...)
$ uname -a
Linux *** 2.6.18-398.el5 #1 SMP Tue Sep 16 20:51:48 EDT 2014 i686 i686
i386 GNU/Linux
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs