recovery_min_delay casting problems lead to busy looping
Hi,
recoveryApplyDelay() does:

    TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
                        &secs, &microsecs);

    if (secs <= 0 && microsecs <= 0)
        break;

    elog(DEBUG2, "recovery apply delay %ld seconds, %d milliseconds",
         secs, microsecs / 1000);

    WaitLatch(&XLogCtl->recoveryWakeupLatch,
              WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
              secs * 1000L + microsecs / 1000);

The problem is that the 'microsecs <= 0' comparison is done while in
microsecs, but the sleeping converts to milliseconds. Which will often
be 0. I've seen this cause ~15-20 iterations per loop. Annoying, but not
terrible.
I think we should simply make the abort condition '&& microsecs / 1000
<= 0'.
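
To make the unit mismatch concrete, here is a minimal standalone sketch (not PostgreSQL source). The helper names wait_timeout_ms, done_current and done_proposed are hypothetical, and the sketch assumes the remaining time arrives as a non-negative (secs, microsecs) pair, as in the snippet above.

```c
#include <assert.h>
#include <stdbool.h>

/* Timeout in milliseconds, computed the same way as the WaitLatch() argument. */
static long
wait_timeout_ms(long secs, int microsecs)
{
    /* Assumption: the remaining time is never negative. */
    assert(secs >= 0 && microsecs >= 0 && microsecs < 1000000);
    return secs * 1000L + microsecs / 1000;
}

/* Exit test as currently written: compares at microsecond resolution. */
static bool
done_current(long secs, int microsecs)
{
    return secs <= 0 && microsecs <= 0;
}

/* Proposed exit test: compares at millisecond resolution. */
static bool
done_proposed(long secs, int microsecs)
{
    return secs <= 0 && microsecs / 1000 <= 0;
}
```

With 500 microseconds remaining, done_current() reports that the wait is not over, yet wait_timeout_ms() yields 0, so the loop spins without sleeping. done_proposed() ends the loop in that case.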
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Mar 23, 2015 at 10:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> recoveryApplyDelay() does:
>
>     TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
>                         &secs, &microsecs);
>
>     if (secs <= 0 && microsecs <= 0)
>         break;
>
>     elog(DEBUG2, "recovery apply delay %ld seconds, %d milliseconds",
>          secs, microsecs / 1000);
>
>     WaitLatch(&XLogCtl->recoveryWakeupLatch,
>               WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
>               secs * 1000L + microsecs / 1000);
>
> The problem is that the 'microsecs <= 0' comparison is done while in
> microsecs, but the sleeping converts to milliseconds. Which will often
> be 0. I've seen this cause ~15-20 iterations per loop. Annoying, but not
> terrible.
>
> I think we should simply make the abort condition '&& microsecs / 1000
> <= 0'.

That's a subtle violation of the documented behavior, although there's
a good chance nobody would ever care. What about just changing the
WaitLatch call to say Max(secs * 1000L + microsecs / 1000, 1)?
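
As a rough illustration of this alternative, the sketch below clamps the computed timeout to at least one millisecond. It is standalone rather than PostgreSQL source: Max is redefined locally so the example compiles on its own, and clamped_wait_timeout_ms is a hypothetical helper name.

```c
#include <assert.h>

/* Local stand-in so the sketch compiles outside the PostgreSQL tree. */
#define Max(x, y) ((x) > (y) ? (x) : (y))

/*
 * Timeout in milliseconds, never less than 1, so a sub-millisecond
 * remainder still results in a real sleep instead of a zero timeout.
 */
static long
clamped_wait_timeout_ms(long secs, int microsecs)
{
    /* Assumption: the remaining time is never negative. */
    assert(secs >= 0 && microsecs >= 0 && microsecs < 1000000);
    return Max(secs * 1000L + microsecs / 1000, 1);
}
```

With this variant the exit test stays at microsecond resolution, so the delay is never cut short. The cost is that the final wait may overshoot the target by up to a millisecond.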
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2015-03-23 10:25:48 -0400, Robert Haas wrote:
> On Mon, Mar 23, 2015 at 10:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> recoveryApplyDelay() does:
>>
>>     TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
>>                         &secs, &microsecs);
>>
>>     if (secs <= 0 && microsecs <= 0)
>>         break;
>>
>>     elog(DEBUG2, "recovery apply delay %ld seconds, %d milliseconds",
>>          secs, microsecs / 1000);
>>
>>     WaitLatch(&XLogCtl->recoveryWakeupLatch,
>>               WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
>>               secs * 1000L + microsecs / 1000);
>>
>> The problem is that the 'microsecs <= 0' comparison is done while in
>> microsecs, but the sleeping converts to milliseconds. Which will often
>> be 0. I've seen this cause ~15-20 iterations per loop. Annoying, but not
>> terrible.
>>
>> I think we should simply make the abort condition '&& microsecs / 1000
>> <= 0'.
>
> That's a subtle violation of the documented behavior

Would it be? The delay is specified at millisecond resolution, so not
waiting when less than one ms remains doesn't seem unreasonable to me.

> , although there's
> a good chance nobody would ever care. What about just changing the
> WaitLatch call to say Max(secs * 1000L + microsecs / 1000, 1)?

I could live with that as well. Although we at least should convert the
elog(DEBUG) to log milliseconds in floating point in that case.
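
One possible shape for that logging change, shown as a standalone sketch: compute the remaining delay as a floating-point number of milliseconds so that sub-millisecond remainders stay visible. The helper name delay_ms and the "%.3f" format are assumptions for illustration, not taken from the PostgreSQL tree.

```c
#include <assert.h>

/*
 * Remaining delay in milliseconds as a double, keeping the fractional
 * part that the integer expression 'microsecs / 1000' truncates away.
 */
static double
delay_ms(long secs, int microsecs)
{
    /* Assumption: the remaining time is never negative. */
    assert(secs >= 0 && microsecs >= 0 && microsecs < 1000000);
    return secs * 1000.0 + microsecs / 1000.0;
}

/*
 * Hypothetical use in the debug message:
 *
 *     elog(DEBUG2, "recovery apply delay %.3f milliseconds",
 *          delay_ms(secs, microsecs));
 */
```

With 500 microseconds remaining this yields 0.5 rather than 0, so the log would show why the loop is still waiting.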
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services