BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

Started by Igorover 13 years ago28 messagesbugs
Jump to latest
#1Igor
igorya.inscriptio@gmail.com

The following bug has been logged on the website:

Bug reference: 7670
Logged by: Igor
Email address: igorya.inscriptio@gmail.com
PostgreSQL version: 9.2.1
Operating system: FreeBSD 9.0
Description:

The same a BUG as 7545.

Pgsql compiled from FreeBSD ports with options:

_OPTIONS_READ=postgresql-server-9.2.1
_FILE_COMPLETE_OPTIONS_LIST= NLS DTRACE PAM LDAP MIT_KRB5 HEIMDAL_KRB5
GSSAPI OPTIMIZED_CFLAGS XML TZDATA DEBUG ICU INTDATE SSL
OPTIONS_FILE_SET+=NLS
OPTIONS_FILE_UNSET+=DTRACE
OPTIONS_FILE_UNSET+=PAM
OPTIONS_FILE_UNSET+=LDAP
OPTIONS_FILE_UNSET+=MIT_KRB5
OPTIONS_FILE_UNSET+=HEIMDAL_KRB5
OPTIONS_FILE_UNSET+=GSSAPI
OPTIONS_FILE_UNSET+=OPTIMIZED_CFLAGS
OPTIONS_FILE_SET+=XML
OPTIONS_FILE_SET+=TZDATA
OPTIONS_FILE_UNSET+=DEBUG
OPTIONS_FILE_UNSET+=ICU
OPTIONS_FILE_SET+=INTDATE
OPTIONS_FILE_SET+=SSL

#2Andres Freund
andres@anarazel.de
In reply to: Igor (#1)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

On 2012-11-17 21:00:36 +0000, igorya.inscriptio@gmail.com wrote:

The following bug has been logged on the website:

Bug reference: 7670
Logged by: Igor
Email address: igorya.inscriptio@gmail.com
PostgreSQL version: 9.2.1
Operating system: FreeBSD 9.0
Description:

The same a BUG as 7545.

Pgsql compiled from FreeBSD ports with options:

_OPTIONS_READ=postgresql-server-9.2.1
_FILE_COMPLETE_OPTIONS_LIST= NLS DTRACE PAM LDAP MIT_KRB5 HEIMDAL_KRB5
GSSAPI OPTIMIZED_CFLAGS XML TZDATA DEBUG ICU INTDATE SSL
OPTIONS_FILE_SET+=NLS
OPTIONS_FILE_UNSET+=DTRACE
OPTIONS_FILE_UNSET+=PAM
OPTIONS_FILE_UNSET+=LDAP
OPTIONS_FILE_UNSET+=MIT_KRB5
OPTIONS_FILE_UNSET+=HEIMDAL_KRB5
OPTIONS_FILE_UNSET+=GSSAPI
OPTIONS_FILE_UNSET+=OPTIMIZED_CFLAGS
OPTIONS_FILE_SET+=XML
OPTIONS_FILE_SET+=TZDATA
OPTIONS_FILE_UNSET+=DEBUG
OPTIONS_FILE_UNSET+=ICU
OPTIONS_FILE_SET+=INTDATE
OPTIONS_FILE_SET+=SSL

Could you give a bit more context about when exactly that error is
happening?
It would also be helpful if you could set log_error_verbosity =
'verbose'; in the config file and report back with the complete error
message, including line numbers.

If the error is easy to reproduce it would also be helpful to see the
failing command with ktruss or similar.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3Igor
igorya.inscriptio@gmail.com
In reply to: Andres Freund (#2)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

...

Could you give a bit more context about when exactly that error is

happening?

No :( This is the random event.

It would also be helpful if you could set log_error_verbosity =

'verbose';

Done. And will wait.

--
Regards
System and web developer.
<http://amkmobile.com&gt;

#4Igor
igorya.inscriptio@gmail.com
In reply to: Andres Freund (#2)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

..

Could you give a bit more context about when exactly that error is
happening?
It would also be helpful if you could set log_error_verbosity =
'verbose'; in the config file and report back with the complete error
message, including line numbers.

I try to restart DB and seems get this situation:

===
2012-11-18 18:30:25 EET pid:93744 LOG: received SIGHUP, reloading
configuration files
2012-11-18 18:30:25 EET pid:93744 LOG: 00000: parameter
"log_error_verbosity" changed to "verbose"
2012-11-18 18:30:25 EET pid:93744 LOCATION: ProcessConfigFile,
guc-file.l:318
2012-11-18 18:36:45 EET pid:93744 LOG: 00000: received fast shutdown
request
2012-11-18 18:36:45 EET pid:93744 LOCATION: pmdie, postmaster.c:2187
2012-11-18 18:36:45 EET pid:93744 LOG: 00000: aborting any active
transactions
2012-11-18 18:36:45 EET pid:93744 LOCATION: pmdie, postmaster.c:2211
2012-11-18 18:36:45 EET pid:93749 LOG: 00000: autovacuum launcher
shutting down
2012-11-18 18:36:45 EET pid:93749 LOCATION: AutoVacLauncherMain,
autovacuum.c:767
2012-11-18 18:36:45 EET pid:93746 LOG: 00000: shutting down
2012-11-18 18:36:45 EET pid:93746 LOCATION: ShutdownXLOG, xlog.c:7560
2012-11-18 18:36:45 EET pid:93746 LOG: 00000: database system is shut
down
2012-11-18 18:36:45 EET pid:93746 LOCATION: ShutdownXLOG, xlog.c:7582
2012-11-18 18:36:46 EET pid:95528 FATAL: XX000: poll() failed: Invalid
argument
2012-11-18 18:36:46 EET pid:95528 LOCATION: WaitLatchOrSocket,
pg_latch.c:296
===

After several minutes the rc.d script said:

pg_ctl: could not start server
Examine the log output.

Then I did:
killall -2 postgres
killall -3 postgres

And this remains procesess:
95527 pgsql 1 52 0 2159M 82400K pipewr 10 0:00 0.00%
postgres
95540 pgsql 1 20 0 2163M 83432K pipewr 0 0:00 0.00%
postgres

I can stop only via killall -9 :(

--
Regards
System and web developer.
<http://amkmobile.com&gt;

#5Igor
igorya.inscriptio@gmail.com
In reply to: Igor (#4)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

And for my previous letter:

I feel :) that the problem is in "logging_collector" setting - I used this
feature. But now I comments this setting in config. And will watching ..

--
Regards
System and web developer.
<http://amkmobile.com&gt;

#6Andres Freund
andres@anarazel.de
In reply to: Igor (#4)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

On 2012-11-18 18:53:13 +0200, Igor wrote:

..

Could you give a bit more context about when exactly that error is
happening?
It would also be helpful if you could set log_error_verbosity =
'verbose'; in the config file and report back with the complete error
message, including line numbers.

I try to restart DB and seems get this situation:

===
2012-11-18 18:30:25 EET pid:93744 LOG: received SIGHUP, reloading
configuration files
2012-11-18 18:30:25 EET pid:93744 LOG: 00000: parameter
"log_error_verbosity" changed to "verbose"
2012-11-18 18:30:25 EET pid:93744 LOCATION: ProcessConfigFile,
guc-file.l:318
2012-11-18 18:36:45 EET pid:93744 LOG: 00000: received fast shutdown
request
2012-11-18 18:36:45 EET pid:93744 LOCATION: pmdie, postmaster.c:2187
2012-11-18 18:36:45 EET pid:93744 LOG: 00000: aborting any active
transactions
2012-11-18 18:36:45 EET pid:93744 LOCATION: pmdie, postmaster.c:2211
2012-11-18 18:36:45 EET pid:93749 LOG: 00000: autovacuum launcher
shutting down
2012-11-18 18:36:45 EET pid:93749 LOCATION: AutoVacLauncherMain,
autovacuum.c:767
2012-11-18 18:36:45 EET pid:93746 LOG: 00000: shutting down
2012-11-18 18:36:45 EET pid:93746 LOCATION: ShutdownXLOG, xlog.c:7560
2012-11-18 18:36:45 EET pid:93746 LOG: 00000: database system is shut
down
2012-11-18 18:36:45 EET pid:93746 LOCATION: ShutdownXLOG, xlog.c:7582
2012-11-18 18:36:46 EET pid:95528 FATAL: XX000: poll() failed: Invalid
argument
2012-11-18 18:36:46 EET pid:95528 LOCATION: WaitLatchOrSocket,
pg_latch.c:296
===

Ok, this is already helpful. Do you have anything the log that shows
what process pid 95528 is? Could you grep for it?

Are you able to recompile postgres with other compilation options and/or
some debugging patches applied?
The first thing would be to recompile postgres with --enable-cassert...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Igor
igorya.inscriptio@gmail.com
In reply to: Andres Freund (#6)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

2012/11/18 Andres Freund <andres@2ndquadrant.com>

On 2012-11-18 18:53:13 +0200, Igor wrote:

..

Could you give a bit more context about when exactly that error is
happening?
It would also be helpful if you could set log_error_verbosity =
'verbose'; in the config file and report back with the complete error
message, including line numbers.

I try to restart DB and seems get this situation:

===
2012-11-18 18:30:25 EET pid:93744 LOG: received SIGHUP, reloading
configuration files
2012-11-18 18:30:25 EET pid:93744 LOG: 00000: parameter
"log_error_verbosity" changed to "verbose"
2012-11-18 18:30:25 EET pid:93744 LOCATION: ProcessConfigFile,
guc-file.l:318
2012-11-18 18:36:45 EET pid:93744 LOG: 00000: received fast shutdown
request
2012-11-18 18:36:45 EET pid:93744 LOCATION: pmdie, postmaster.c:2187
2012-11-18 18:36:45 EET pid:93744 LOG: 00000: aborting any active
transactions
2012-11-18 18:36:45 EET pid:93744 LOCATION: pmdie, postmaster.c:2211
2012-11-18 18:36:45 EET pid:93749 LOG: 00000: autovacuum launcher
shutting down
2012-11-18 18:36:45 EET pid:93749 LOCATION: AutoVacLauncherMain,
autovacuum.c:767
2012-11-18 18:36:45 EET pid:93746 LOG: 00000: shutting down
2012-11-18 18:36:45 EET pid:93746 LOCATION: ShutdownXLOG,

xlog.c:7560

2012-11-18 18:36:45 EET pid:93746 LOG: 00000: database system is

shut

down
2012-11-18 18:36:45 EET pid:93746 LOCATION: ShutdownXLOG,

xlog.c:7582

2012-11-18 18:36:46 EET pid:95528 FATAL: XX000: poll() failed:

Invalid

argument
2012-11-18 18:36:46 EET pid:95528 LOCATION: WaitLatchOrSocket,
pg_latch.c:296
===

Ok, this is already helpful. Do you have anything the log that shows
what process pid 95528 is? Could you grep for it?

No, this string - is the last string in the log for this event.
I think, I finished the process 95528 by kill -3 (or -2) signal. I did this
after unsuccessfully report of the pg_ctl (rc.d script).

Are you able to recompile postgres with other compilation options and/or
some debugging patches applied?

Will try. Give me this patches.

The first thing would be to recompile postgres with --enable-cassert...

That all ?
Do you understand my english ? :)

--
Regards
System and web developer.
<http://amkmobile.com&gt;

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Igor (#5)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

Igor <igorya.inscriptio@gmail.com> writes:

I feel :) that the problem is in "logging_collector" setting - I used this
feature. But now I comments this setting in config. And will watching ..

I was just about to ask what you have log_rotation_age set to ...

regards, tom lane

#9Andres Freund
andres@anarazel.de
In reply to: Igor (#7)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

On 2012-11-18 19:22:04 +0200, Igor wrote:

2012/11/18 Andres Freund <andres@2ndquadrant.com>

On 2012-11-18 18:53:13 +0200, Igor wrote:

..

Could you give a bit more context about when exactly that error is
happening?
It would also be helpful if you could set log_error_verbosity =
'verbose'; in the config file and report back with the complete error
message, including line numbers.

I try to restart DB and seems get this situation:

===
2012-11-18 18:30:25 EET pid:93744 LOG: received SIGHUP, reloading
configuration files
2012-11-18 18:30:25 EET pid:93744 LOG: 00000: parameter
"log_error_verbosity" changed to "verbose"
2012-11-18 18:30:25 EET pid:93744 LOCATION: ProcessConfigFile,
guc-file.l:318
2012-11-18 18:36:45 EET pid:93744 LOG: 00000: received fast shutdown
request
2012-11-18 18:36:45 EET pid:93744 LOCATION: pmdie, postmaster.c:2187
2012-11-18 18:36:45 EET pid:93744 LOG: 00000: aborting any active
transactions
2012-11-18 18:36:45 EET pid:93744 LOCATION: pmdie, postmaster.c:2211
2012-11-18 18:36:45 EET pid:93749 LOG: 00000: autovacuum launcher
shutting down
2012-11-18 18:36:45 EET pid:93749 LOCATION: AutoVacLauncherMain,
autovacuum.c:767
2012-11-18 18:36:45 EET pid:93746 LOG: 00000: shutting down
2012-11-18 18:36:45 EET pid:93746 LOCATION: ShutdownXLOG,

xlog.c:7560

2012-11-18 18:36:45 EET pid:93746 LOG: 00000: database system is

shut

down
2012-11-18 18:36:45 EET pid:93746 LOCATION: ShutdownXLOG,

xlog.c:7582

2012-11-18 18:36:46 EET pid:95528 FATAL: XX000: poll() failed:

Invalid

argument
2012-11-18 18:36:46 EET pid:95528 LOCATION: WaitLatchOrSocket,
pg_latch.c:296
===

Ok, this is already helpful. Do you have anything the log that shows
what process pid 95528 is? Could you grep for it?

No, this string - is the last string in the log for this event.

Actually I would be interested in *earlier* entries in the log ;)

Are you able to recompile postgres with other compilation options and/or
some debugging patches applied?

Will try. Give me this patches.

Ok, let me prepare something.

The first thing would be to recompile postgres with --enable-cassert...

That all ?

Yes, that could already catch the source of the error. Note that you
probably will want to revert to non --enable-cassert builds after we
debugged this, it has a non-negligible overhead.

Do you understand my english ? :)

I try ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10Igor
igorya.inscriptio@gmail.com
In reply to: Tom Lane (#8)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

2012/11/18 Tom Lane <tgl@sss.pgh.pa.us>

Igor <igorya.inscriptio@gmail.com> writes:

I feel :) that the problem is in "logging_collector" setting - I used

this

feature. But now I comments this setting in config. And will watching ..

I was just about to ask what you have log_rotation_age set to ...

Was:
log_rotation_age = 30d

--
Regards
System and web developer.
<http://amkmobile.com&gt;

In reply to: Andres Freund (#6)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

On 18 November 2012 17:12, Andres Freund <andres@2ndquadrant.com> wrote:

Ok, this is already helpful. Do you have anything the log that shows
what process pid 95528 is? Could you grep for it?

I haven't looked at this in detail, but are you sure that this isn't
the bug that was fixed by commit
e81e8f9342b037246b284bad15e42e21b1929301 ? The "invalid argument"
poll() error is too generic to be sure what is really at fault, but
this doesn't need to be a FATAL error, it seems.

Is this a 32-bit platform? It's probably just that there is a large
number of sockets, which Google seems to think can cause this error
with poll(), so this seems like a slight variant of the problem
reported by Sean Chittenden (kernel resource exhaustion causes pipe
creation to fail, with unnecessarily bad consequences for Postgres).

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#12Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#8)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

On 2012-11-18 12:29:29 -0500, Tom Lane wrote:

Igor <igorya.inscriptio@gmail.com> writes:

I feel :) that the problem is in "logging_collector" setting - I used this
feature. But now I comments this setting in config. And will watching ..

I was just about to ask what you have log_rotation_age set to ...

According to the manpages freebsds poll won't return EINVAL if a
contained fd is invalid... I wonder if the the timeout value is screwed
and overflowed the cast from 'long' to int and got negative.

Igor, I guess youre on a 64bit system?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Igor (#10)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

Igor <igorya.inscriptio@gmail.com> writes:

2012/11/18 Tom Lane <tgl@sss.pgh.pa.us>

I was just about to ask what you have log_rotation_age set to ...

Was:
log_rotation_age = 30d

That's the problem then. Internally that gets converted to
milliseconds, which'll overflow at 24-something days. Use a smaller
value.

So the actual bug here is an over-optimistic maximum value for
log_rotation_age in guc.c. After looking through all the callers of
WaitLatch, I think the only other similar issue is for
wal_sender_timeout, which currently is allowed to range up to INT_MAX;
but such values seem pretty useless.

I'm inclined to propose limiting both of these to the equivalent of 15
days. Alternatively we could try to rejigger things to prevent asking
WaitLatch to wait for more than 2^31 msec, but it's not real clear to
me that it's worth the trouble.

regards, tom lane

#14Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#11)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

On 2012-11-18 17:36:40 +0000, Peter Geoghegan wrote:

On 18 November 2012 17:12, Andres Freund <andres@2ndquadrant.com> wrote:

Ok, this is already helpful. Do you have anything the log that shows
what process pid 95528 is? Could you grep for it?

I haven't looked at this in detail, but are you sure that this isn't
the bug that was fixed by commit
e81e8f9342b037246b284bad15e42e21b1929301 ? The "invalid argument"
poll() error is too generic to be sure what is really at fault, but
this doesn't need to be a FATAL error, it seems.

I looked at that commit at first as well, but I don't really see how it
could have fixed that issue.
Btw, its not a FATAL but a plain ERROR.

Is this a 32-bit platform? It's probably just that there is a large
number of sockets, which Google seems to think can cause this error
with poll(), so this seems like a slight variant of the problem
reported by Sean Chittenden (kernel resource exhaustion causes pipe
creation to fail, with unnecessarily bad consequences for Postgres).

We never have more than 3 fds in the poll. And a failure to create the
pipe should have caused problems at other locations before this...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#11)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

Peter Geoghegan <peter@2ndquadrant.com> writes:

I haven't looked at this in detail, but are you sure that this isn't
the bug that was fixed by commit
e81e8f9342b037246b284bad15e42e21b1929301 ? The "invalid argument"
poll() error is too generic to be sure what is really at fault, but
this doesn't need to be a FATAL error, it seems.

He's running BSD, which is not at all wishy-washy about what it means:

[EINVAL] The specified time limit is negative.

The reason it's FATAL is that it's happening in the syslogger process,
which has no error recovery capability.

regards, tom lane

#16Igor
igorya.inscriptio@gmail.com
In reply to: Andres Freund (#12)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

2012/11/18 Andres Freund <andres@2ndquadrant.com>

On 2012-11-18 12:29:29 -0500, Tom Lane wrote:

Igor <igorya.inscriptio@gmail.com> writes:

I feel :) that the problem is in "logging_collector" setting - I used

this

feature. But now I comments this setting in config. And will watching

..

I was just about to ask what you have log_rotation_age set to ...

According to the manpages freebsds poll won't return EINVAL if a
contained fd is invalid... I wonder if the the timeout value is screwed
and overflowed the cast from 'long' to int and got negative.

Igor, I guess youre on a 64bit system?

Yes. FreeBSD 9.0-RELEASE amd64
And the problems of free resources on the server does not exist. And there
is minimal load.

--
Regards
System and web developer.
<http://amkmobile.com&gt;

#17Igor
igorya.inscriptio@gmail.com
In reply to: Tom Lane (#13)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

2012/11/18 Tom Lane <tgl@sss.pgh.pa.us>

Igor <igorya.inscriptio@gmail.com> writes:

2012/11/18 Tom Lane <tgl@sss.pgh.pa.us>

I was just about to ask what you have log_rotation_age set to ...

Was:
log_rotation_age = 30d

That's the problem then. Internally that gets converted to
milliseconds, which'll overflow at 24-something days. Use a smaller
value.

So the actual bug here is an over-optimistic maximum value for
log_rotation_age in guc.c. After looking through all the callers of
WaitLatch, I think the only other similar issue is for
wal_sender_timeout, which currently is allowed to range up to INT_MAX;
but such values seem pretty useless.

I'm inclined to propose limiting both of these to the equivalent of 15
days.

# Set this options to:
logging_collector = on
log_rotation_age = 15d
log_rotation_size = 100MB # was early

And will watch..

--
Regards
System and web developer.
<http://amkmobile.com&gt;

#18Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#13)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

On 2012-11-18 12:44:55 -0500, Tom Lane wrote:

Igor <igorya.inscriptio@gmail.com> writes:

2012/11/18 Tom Lane <tgl@sss.pgh.pa.us>

I was just about to ask what you have log_rotation_age set to ...

Was:
log_rotation_age = 30d

That's the problem then. Internally that gets converted to
milliseconds, which'll overflow at 24-something days. Use a smaller
value.

Nicely guessed.

So the actual bug here is an over-optimistic maximum value for
log_rotation_age in guc.c. After looking through all the callers of
WaitLatch, I think the only other similar issue is for
wal_sender_timeout, which currently is allowed to range up to INT_MAX;
but such values seem pretty useless.

I'm inclined to propose limiting both of these to the equivalent of 15
days. Alternatively we could try to rejigger things to prevent asking
WaitLatch to wait for more than 2^31 msec, but it's not real clear to
me that it's worth the trouble.

In general I have no problem imposing lower limits, but it seems to be a
ugly to get errors for an invalid configuration file after a minor
version upgrade. While the wal_sender_timeout isn't really likely to be
that high I don't think the log_rotation_age one is unlikely to be set
to something in the month range, thats not an unreasonable value.

I suggest adding an Assert before the (int)timeout cast, but I guess you
already plan to do so.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#18)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

Andres Freund <andres@2ndquadrant.com> writes:

On 2012-11-18 12:44:55 -0500, Tom Lane wrote:

I'm inclined to propose limiting both of these to the equivalent of 15
days. Alternatively we could try to rejigger things to prevent asking
WaitLatch to wait for more than 2^31 msec, but it's not real clear to
me that it's worth the trouble.

In general I have no problem imposing lower limits, but it seems to be a
ugly to get errors for an invalid configuration file after a minor
version upgrade. While the wal_sender_timeout isn't really likely to be
that high I don't think the log_rotation_age one is unlikely to be set
to something in the month range, thats not an unreasonable value.

Well, we have two reports of people trying such values (assuming that
#7545 actually is the same thing), and it didn't work for either of
them. I don't think it's a problem to restrict the value to something
that will work rather than fail.

If you're worried that there's somebody out there using 20 or 21 days
as log_rotation_age, I guess we could limit to INT_MAX/1000 seconds or
something just less than that.

regards, tom lane

#20Igor
igorya.inscriptio@gmail.com
In reply to: Tom Lane (#19)
Re: BUG #7670: BUG #7545: Unresponsive server with error log reporting: "poll() failed: Invalid argument"

2012/11/18 Tom Lane <tgl@sss.pgh.pa.us>

Andres Freund <andres@2ndquadrant.com> writes:

On 2012-11-18 12:44:55 -0500, Tom Lane wrote:

I'm inclined to propose limiting both of these to the equivalent of 15
days. Alternatively we could try to rejigger things to prevent asking
WaitLatch to wait for more than 2^31 msec, but it's not real clear to
me that it's worth the trouble.

In general I have no problem imposing lower limits, but it seems to be a
ugly to get errors for an invalid configuration file after a minor
version upgrade. While the wal_sender_timeout isn't really likely to be
that high I don't think the log_rotation_age one is unlikely to be set
to something in the month range, thats not an unreasonable value.

Well, we have two reports of people trying such values (assuming that
#7545 actually is the same thing), and it didn't work for either of
them. I don't think it's a problem to restrict the value to something
that will work rather than fail.

If you're worried that there's somebody out there using 20 or 21 days
as log_rotation_age, I guess we could limit to INT_MAX/1000 seconds or
something just less than that.

Some experiments are necessary on my side ?

--
Regards
System and web developer.
<http://amkmobile.com&gt;

In reply to: Tom Lane (#19)
#22Marc Balmer
marc@msys.ch
In reply to: Peter Geoghegan (#21)
#23Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#19)
In reply to: Marc Balmer (#22)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#23)
#26Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marc Balmer (#22)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#26)