Allow interrupts on waiting standby
Currently a waiting standby doesn't allow interrupts.
Patch implements that.
Barring objection, patching today with backpatches.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
interrupt_waiting_standby.v1.patchapplication/octet-stream; name=interrupt_waiting_standby.v1.patchDownload
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 547f1a8..174e42a 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -160,6 +160,8 @@ WaitExceedsMaxStandbyDelay(void)
{
TimestampTz ltime;
+ CHECK_FOR_INTERRUPTS();
+
/* Are we past the limit time? */
ltime = GetStandbyLimitTime();
if (ltime && GetCurrentTimestamp() >= ltime)
On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Currently a waiting standby doesn't allow interrupts.
Patch implements that.
Barring objection, patching today with backpatches.
"today" is a little quick, but the patch looks fine. I doubt anyone's
going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-01-26 12:24:44 -0500, Robert Haas wrote:
On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Currently a waiting standby doesn't allow interrupts.
Patch implements that.
Barring objection, patching today with backpatches.
"today" is a little quick, but the patch looks fine. I doubt anyone's
going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call.
I don't quite get asking for agreement, and then not waiting as
suggested. I'm personally fine with going with a CHECK_FOR_INTERRUPTS
for now, but I think it'd better to replace it with a latch.
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
On 2017-01-26 12:24:44 -0500, Robert Haas wrote:
On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Currently a waiting standby doesn't allow interrupts.
Patch implements that.
Barring objection, patching today with backpatches.
"today" is a little quick, but the patch looks fine. I doubt anyone's
going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call.I don't quite get asking for agreement, and then not waiting as
suggested. I'm personally fine with going with a CHECK_FOR_INTERRUPTS
for now, but I think it'd better to replace it with a latch.
I have waited, so not sure what you mean. Tomorrow is too late.
Replacing with a latch wouldn't be backpatchable, IMHO.
I've no problem if you want to work on a deeper fix for future versions.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-01-26 19:36:11 +0000, Simon Riggs wrote:
On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
On 2017-01-26 12:24:44 -0500, Robert Haas wrote:
On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Currently a waiting standby doesn't allow interrupts.
Patch implements that.
Barring objection, patching today with backpatches.
"today" is a little quick, but the patch looks fine. I doubt anyone's
going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call.I don't quite get asking for agreement, and then not waiting as
suggested. I'm personally fine with going with a CHECK_FOR_INTERRUPTS
for now, but I think it'd better to replace it with a latch.I have waited, so not sure what you mean.
Well, Robert today said >> "today" is a little quick <<.
Tomorrow is too late.
Huh? We're not wrapping today/tomorrow, are we? If I missed something
and we are, then sure, it makes sense to push ahead.
Replacing with a latch wouldn't be backpatchable, IMHO.
Hm, don't quite see why - isn't it just like three lines?
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
* Andres Freund (andres@anarazel.de) wrote:
On 2017-01-26 19:36:11 +0000, Simon Riggs wrote:
Tomorrow is too late.
Huh? We're not wrapping today/tomorrow, are we? If I missed something
and we are, then sure, it makes sense to push ahead.
I haven't seen anyone suggest that we're changing from the regular
release cycle, which would mean that we wrap on Monday, Feb 6th, for
release on Feb 9th.
Thanks!
Stephen
Simon Riggs wrote:
On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
I'm personally fine with going with a CHECK_FOR_INTERRUPTS
for now, but I think it'd better to replace it with a latch.I have waited, so not sure what you mean. Tomorrow is too late.
Replacing with a latch wouldn't be backpatchable, IMHO.
Hmm, you pushed this to the master branch as commit e8ee3d6b859a, but I
see no backpatch.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jan 27, 2017 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
On 2017-01-26 12:24:44 -0500, Robert Haas wrote:
On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Currently a waiting standby doesn't allow interrupts.
Patch implements that.
Barring objection, patching today with backpatches.
"today" is a little quick, but the patch looks fine. I doubt anyone's
going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call.I don't quite get asking for agreement, and then not waiting as
suggested. I'm personally fine with going with a CHECK_FOR_INTERRUPTS
for now, but I think it'd better to replace it with a latch.I have waited, so not sure what you mean. Tomorrow is too late.
This gives really little time for any feedback :(
Replacing with a latch wouldn't be backpatchable, IMHO.
I've no problem if you want to work on a deeper fix for future versions.
A deeper fix for HEAD proves to not be that complicated. Please see
the attached. The other two calls of pg_usleep() in standby.c are
waiting with 5ms and 10ms, it is not worth switching them to a latch.
--
Michael
Attachments:
standby-delay-latch.patchapplication/octet-stream; name=standby-delay-latch.patchDownload
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 6532240dd1..19267b83b9 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -148,8 +148,8 @@ GetStandbyLimitTime(void)
}
}
-#define STANDBY_INITIAL_WAIT_US 1000
-static int standbyWait_us = STANDBY_INITIAL_WAIT_US;
+#define STANDBY_INITIAL_WAIT_MS 1
+static int standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
/*
* Standby wait logic for ResolveRecoveryConflictWithVirtualXIDs.
@@ -171,15 +171,19 @@ WaitExceedsMaxStandbyDelay(void)
/*
* Sleep a bit (this is essential to avoid busy-waiting).
*/
- pg_usleep(standbyWait_us);
+ (void) WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ standbyWait_ms,
+ WAIT_EVENT_STANDBY_DELAY);
+ ResetLatch(MyLatch);
/*
- * Progressively increase the sleep times, but not to more than 1s, since
- * pg_usleep isn't interruptable on some platforms.
+ * Progressively increase the sleep times, but not to more than 1s
+ * to keep process in a respectable range.
*/
- standbyWait_us *= 2;
- if (standbyWait_us > 1000000)
- standbyWait_us = 1000000;
+ standbyWait_ms *= 2;
+ if (standbyWait_ms > 1000)
+ standbyWait_ms = 1000;
return false;
}
@@ -206,8 +210,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
while (VirtualTransactionIdIsValid(*waitlist))
{
- /* reset standbyWait_us for each xact we wait for */
- standbyWait_us = STANDBY_INITIAL_WAIT_US;
+ /* reset standbyWait_ms for each xact we wait for */
+ standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
/* wait until the virtual xid is gone */
while (!VirtualXactLock(*waitlist, false))
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index de8225b989..83194db520 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -800,7 +800,8 @@ typedef enum
{
WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
WAIT_EVENT_PG_SLEEP,
- WAIT_EVENT_RECOVERY_APPLY_DELAY
+ WAIT_EVENT_RECOVERY_APPLY_DELAY,
+ WAIT_EVENT_STANDBY_DELAY
} WaitEventTimeout;
/* ----------
On 26 January 2017 at 20:36, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Simon Riggs wrote:
On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
I'm personally fine with going with a CHECK_FOR_INTERRUPTS
for now, but I think it'd better to replace it with a latch.I have waited, so not sure what you mean. Tomorrow is too late.
Replacing with a latch wouldn't be backpatchable, IMHO.
Hmm, you pushed this to the master branch as commit e8ee3d6b859a, but I
see no backpatch.
There appeared to be a complaint at my speed, so I slowed down.
On reading that there are no objections to this being backpatched, I
will do that now.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 27 January 2017 at 01:35, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Jan 27, 2017 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
I'm personally fine with going with a CHECK_FOR_INTERRUPTS
for now, but I think it'd better to replace it with a latch.
Replacing with a latch wouldn't be backpatchable, IMHO.
I've no problem if you want to work on a deeper fix for future versions.A deeper fix for HEAD proves to not be that complicated. Please see
the attached. The other two calls of pg_usleep() in standby.c are
waiting with 5ms and 10ms, it is not worth switching them to a latch.
So you think 2 calls to pg_usleep() can stay; my opinion is 3 can stay.
I'm not clear why this particular call is worthy, while dozens of
calls in other modules remain unchanged. This seems like a code issue
rather than anything to do with Hot Standby in particular, so it
should be another thread. Doesn't seem important compared to other
things for this release I should work on.
Please add to the next CF so it gets proper review.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jan 27, 2017 at 9:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 27 January 2017 at 01:35, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Jan 27, 2017 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
A deeper fix for HEAD proves to not be that complicated. Please see
the attached. The other two calls of pg_usleep() in standby.c are
waiting with 5ms and 10ms, it is not worth switching them to a latch.So you think 2 calls to pg_usleep() can stay; my opinion is 3 can stay.
This patch replaces one call of pg_usleep() where the wait can go up
to 1s, this is largely noticeable by the user. The two others sleep
for a maximum of 10ms. Even if a latch is used the code path is going
to exit immediately anyway. That's why you added a call to
CHECK_FOR_INTERRUPTS only here, no?
I'm not clear why this particular call is worthy, while dozens of
calls in other modules remain unchanged. This seems like a code issue
rather than anything to do with Hot Standby in particular, so it
should be another thread.
Some should switch to Latch.
Doesn't seem important compared to other
things for this release I should work on.
That's your call.
Please add to the next CF so it gets proper review.
Sure.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jan 27, 2017 at 10:35 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Jan 27, 2017 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
On 2017-01-26 12:24:44 -0500, Robert Haas wrote:
On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Currently a waiting standby doesn't allow interrupts.
Patch implements that.
Barring objection, patching today with backpatches.
"today" is a little quick, but the patch looks fine. I doubt anyone's
going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call.I don't quite get asking for agreement, and then not waiting as
suggested. I'm personally fine with going with a CHECK_FOR_INTERRUPTS
for now, but I think it'd better to replace it with a latch.I have waited, so not sure what you mean. Tomorrow is too late.
This gives really little time for any feedback :(
Replacing with a latch wouldn't be backpatchable, IMHO.
I've no problem if you want to work on a deeper fix for future versions.A deeper fix for HEAD proves to not be that complicated. Please see
the attached. The other two calls of pg_usleep() in standby.c are
waiting with 5ms and 10ms, it is not worth switching them to a latch.
Two things I forgot in this patch:
- documentation for the new wait event
- the string for the wait event or this would show up as "???" in
pg_stat_activity.
There are no default clauses in the pgstat_get_wait_* routines so my
compiler is actually complaining...
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jan 27, 2017 at 8:17 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
There are no default clauses in the pgstat_get_wait_* routines so my
compiler is actually complaining...
That's exactly WHY there are no default clauses there. :-)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jan 28, 2017 at 2:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jan 27, 2017 at 8:17 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:There are no default clauses in the pgstat_get_wait_* routines so my
compiler is actually complaining...That's exactly WHY there are no default clauses there. :-)
And that's exactly why I missed them! ;p
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jan 27, 2017 at 10:17 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Two things I forgot in this patch:
- documentation for the new wait event
- the string for the wait event or this would show up as "???" in
pg_stat_activity.
There are no default clauses in the pgstat_get_wait_* routines so my
compiler is actually complaining...
Both things are fixed in the new version attached. I have added as
well this patch to the next commit fest:
https://commitfest.postgresql.org/13/977/
--
Michael
Attachments:
standby-delay-latch-v2.patchapplication/octet-stream; name=standby-delay-latch-v2.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 01fad3870f..56128016af 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1260,6 +1260,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>RecoveryApplyDelay</></entry>
<entry>Waiting to apply WAL at recovery because it is delayed.</entry>
</row>
+ <row>
+ <entry><literal>RecoveryConflict</></entry>
+ <entry>Waiting for conflict resolution of query running on standby.</entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7176cf1bbe..1b57c10bbd 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3426,6 +3426,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
case WAIT_EVENT_RECOVERY_APPLY_DELAY:
event_name = "RecoveryApplyDelay";
break;
+ case WAIT_EVENT_RECOVERY_CONFLICT:
+ event_name = "RecoveryConflict";
+ break;
/* no default case, so that compiler will warn */
}
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 6532240dd1..eed1581c0b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -148,8 +148,8 @@ GetStandbyLimitTime(void)
}
}
-#define STANDBY_INITIAL_WAIT_US 1000
-static int standbyWait_us = STANDBY_INITIAL_WAIT_US;
+#define STANDBY_INITIAL_WAIT_MS 1
+static int standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
/*
* Standby wait logic for ResolveRecoveryConflictWithVirtualXIDs.
@@ -171,15 +171,19 @@ WaitExceedsMaxStandbyDelay(void)
/*
* Sleep a bit (this is essential to avoid busy-waiting).
*/
- pg_usleep(standbyWait_us);
+ (void) WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ standbyWait_ms,
+ WAIT_EVENT_RECOVERY_CONFLICT);
+ ResetLatch(MyLatch);
/*
- * Progressively increase the sleep times, but not to more than 1s, since
- * pg_usleep isn't interruptable on some platforms.
+ * Progressively increase the sleep times, but not to more than 1s
+ * to keep process in a respectable range.
*/
- standbyWait_us *= 2;
- if (standbyWait_us > 1000000)
- standbyWait_us = 1000000;
+ standbyWait_ms *= 2;
+ if (standbyWait_ms > 1000)
+ standbyWait_ms = 1000;
return false;
}
@@ -206,8 +210,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
while (VirtualTransactionIdIsValid(*waitlist))
{
- /* reset standbyWait_us for each xact we wait for */
- standbyWait_us = STANDBY_INITIAL_WAIT_US;
+ /* reset standbyWait_ms for each xact we wait for */
+ standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
/* wait until the virtual xid is gone */
while (!VirtualXactLock(*waitlist, false))
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index de8225b989..e5d45b585c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -800,7 +800,8 @@ typedef enum
{
WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
WAIT_EVENT_PG_SLEEP,
- WAIT_EVENT_RECOVERY_APPLY_DELAY
+ WAIT_EVENT_RECOVERY_APPLY_DELAY,
+ WAIT_EVENT_RECOVERY_CONFLICT
} WaitEventTimeout;
/* ----------
On 1/30/17 20:34, Michael Paquier wrote:
On Fri, Jan 27, 2017 at 10:17 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:Two things I forgot in this patch:
- documentation for the new wait event
- the string for the wait event or this would show up as "???" in
pg_stat_activity.
There are no default clauses in the pgstat_get_wait_* routines so my
compiler is actually complaining...Both things are fixed in the new version attached. I have added as
well this patch to the next commit fest:
https://commitfest.postgresql.org/13/977/
I'm not clear now what this patch is intending to accomplish, seeing
that the original issue has already been fixed.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Mar 8, 2017 at 5:45 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 1/30/17 20:34, Michael Paquier wrote:
Both things are fixed in the new version attached. I have added as
well this patch to the next commit fest:
https://commitfest.postgresql.org/13/977/I'm not clear now what this patch is intending to accomplish, seeing
that the original issue has already been fixed.
This makes ResolveRecoveryConflictWithVirtualXIDs() more responsive if
the process is signaled, as the wait in pg_usleep can go up to 1s if
things remain stuck for a long time.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier <michael.paquier@gmail.com> writes:
Both things are fixed in the new version attached. I have added as
well this patch to the next commit fest:
https://commitfest.postgresql.org/13/977/
Couple of thoughts on this patch ---
1. Shouldn't WaitExceedsMaxStandbyDelay's CHECK_FOR_INTERRUPTS be moved to
after the WaitLatch call? Not much point in being woken immediately by
an interrupt if you're not going to respond.
2. Is it OK to ResetLatch here? If the only possible latch event in this
process is interrupt requests, then I think WaitLatch, then ResetLatch,
then CHECK_FOR_INTERRUPTS is OK; but otherwise it seems like you risk
discarding events that need to be serviced later.
3. In the same vein, if we're going to check WL_POSTMASTER_DEATH, should
there be a test for that and immediate exit(1) here?
4. I'd be inclined to increase the sleep interval only if we did time out,
not if we were awakened by some other event.
5. The comment about maximum sleep length needs some work. At first
glance you might think that without the motivation of preventing long
uninterruptible sleeps, we might as well allow the sleep length to grow
well past 1s. I think that'd be bad, because we want to wake up
reasonably soon after the xact(s) we're waiting for commit. But neither
the original text nor the proposed replacement mention this.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Mar 19, 2017 at 5:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Couple of thoughts on this patch ---
Thanks!
1. Shouldn't WaitExceedsMaxStandbyDelay's CHECK_FOR_INTERRUPTS be moved to
after the WaitLatch call? Not much point in being woken immediately by
an interrupt if you're not going to respond.2. Is it OK to ResetLatch here? If the only possible latch event in this
process is interrupt requests, then I think WaitLatch, then ResetLatch,
then CHECK_FOR_INTERRUPTS is OK; but otherwise it seems like you risk
discarding events that need to be serviced later.
Right, I have switched to WaitLatch(), ResetLatch() and then
CHECK_FOR_INTERRUPTS().
3. In the same vein, if we're going to check WL_POSTMASTER_DEATH, should
there be a test for that and immediate exit(1) here?
OK, if the postmaster has died, there is not much recovery conflict
needed anyway.
4. I'd be inclined to increase the sleep interval only if we did time out,
not if we were awakened by some other event.
OK, that makes sense.
5. The comment about maximum sleep length needs some work. At first
glance you might think that without the motivation of preventing long
uninterruptible sleeps, we might as well allow the sleep length to grow
well past 1s. I think that'd be bad, because we want to wake up
reasonably soon after the xact(s) we're waiting for commit. But neither
the original text nor the proposed replacement mention this.
OK, I did some work on this comment.
What do you think about the updated version attached?
--
Michael
Attachments:
standby-delay-latch-v3.patchapplication/octet-stream; name=standby-delay-latch-v3.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dcb2d3303c..11d63bd4e8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1543,6 +1543,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>WALWrite</></entry>
<entry>Waiting for a write to a WAL file.</entry>
</row>
+ <row>
+ <entry><literal>RecoveryConflict</></entry>
+ <entry>Waiting for conflict resolution of query running on standby.</entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3a50488db3..9a7dca8577 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3443,6 +3443,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
case WAIT_EVENT_RECOVERY_APPLY_DELAY:
event_name = "RecoveryApplyDelay";
break;
+ case WAIT_EVENT_RECOVERY_CONFLICT:
+ event_name = "RecoveryConflict";
+ break;
/* no default case, so that compiler will warn */
}
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 6259070722..17792e4678 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -148,8 +148,8 @@ GetStandbyLimitTime(void)
}
}
-#define STANDBY_INITIAL_WAIT_US 1000
-static int standbyWait_us = STANDBY_INITIAL_WAIT_US;
+#define STANDBY_INITIAL_WAIT_MS 1
+static int standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
/*
* Standby wait logic for ResolveRecoveryConflictWithVirtualXIDs.
@@ -160,8 +160,7 @@ static bool
WaitExceedsMaxStandbyDelay(void)
{
TimestampTz ltime;
-
- CHECK_FOR_INTERRUPTS();
+ int rc;
/* Are we past the limit time? */
ltime = GetStandbyLimitTime();
@@ -171,15 +170,32 @@ WaitExceedsMaxStandbyDelay(void)
/*
* Sleep a bit (this is essential to avoid busy-waiting).
*/
- pg_usleep(standbyWait_us);
+ rc = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ standbyWait_ms,
+ WAIT_EVENT_RECOVERY_CONFLICT);
+
+ if (rc & WL_LATCH_SET)
+ ResetLatch(MyLatch);
+
+ /* emergency bailout if postmaster has died */
+ if (rc & WL_POSTMASTER_DEATH)
+ proc_exit(1);
/*
- * Progressively increase the sleep times, but not to more than 1s, since
- * pg_usleep isn't interruptable on some platforms.
+ * Progressively increase the sleep times, but not to more than 1s.
+ * This process is expected to wake up reasonably fast after the
+ * transactions this is waiting for are committed.
*/
- standbyWait_us *= 2;
- if (standbyWait_us > 1000000)
- standbyWait_us = 1000000;
+ if (rc & WL_TIMEOUT)
+ {
+ standbyWait_ms *= 2;
+ if (standbyWait_ms > 1000)
+ standbyWait_ms = 1000;
+ }
+
+ /* An interrupt may have occurred while waiting */
+ CHECK_FOR_INTERRUPTS();
return false;
}
@@ -206,8 +222,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
while (VirtualTransactionIdIsValid(*waitlist))
{
- /* reset standbyWait_us for each xact we wait for */
- standbyWait_us = STANDBY_INITIAL_WAIT_US;
+ /* reset standbyWait_ms for each xact we wait for */
+ standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
/* wait until the virtual xid is gone */
while (!VirtualXactLock(*waitlist, false))
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f2daf32e1a..24148c9ac2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -803,7 +803,8 @@ typedef enum
{
WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
WAIT_EVENT_PG_SLEEP,
- WAIT_EVENT_RECOVERY_APPLY_DELAY
+ WAIT_EVENT_RECOVERY_APPLY_DELAY,
+ WAIT_EVENT_RECOVERY_CONFLICT
} WaitEventTimeout;
/* ----------
Hi, Michael,
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
What do you think about the updated version attached?
I reviewed this patch. Here are some comments and questions:
(1) monitoring.sgml
The new row needs to be placed in the "Timeout" group. The current patch puts the row at the end of IO group. Please insert the new row according to the alphabetical order.
In addition, "morerows" attribute of the following line appears to be increased.
<entry morerows="2"><literal>Timeout</></entry>
By the way, doesn't this wait event belong to IPC wait event type, because the process is waiting for other conflicting processes to terminate the conflict conditions? Did you choose Timeout type because max_standby_{archive | streaming}_delay relates to this wait? I'm not confident which is appropriate, but I'm afraid users can associate this wait with a timeout.
(2) standby.c
Do we need to specify WL_LATCH_SET? Who can set the latch? Do the backends who ended the conflict set the latch?
(3) standby.c
+ if (rc & WL_LATCH_SET)
+ ResetLatch(MyLatch);
+
+ /* emergency bailout if postmaster has died */
+ if (rc & WL_POSTMASTER_DEATH)
+ proc_exit(1);
I thought the child processes have to terminate as soon as postmaster vanishes. So, it would be better for the order of the two if statements above to be reversed. proc_exit() could be exit(), as some children do in postmaster/*.c.
(4) standby.c
The latch is not reset when the wait timed out. The next WaitLatch() would return immediately.
Regards
Takayuki Tsunakawa
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
(4) standby.c
The latch is not reset when the wait timed out. The next WaitLatch() would
return immediately.
Sorry, let me withdraw this. This is my misunderstanding.
OTOH, when is the latch reset before the wait? Is there an assumption that MyLatch has been in reset state since it was initialized?
Is it safe to use MyLatch here, not the special latch like something used in xlog.c?
Regards
Takayuki Tsunakawa
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Mar 29, 2017 at 5:04 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
What do you think about the updated version attached?I reviewed this patch. Here are some comments and questions:
Thanks!
(1) monitoring.sgml
The new row needs to be placed in the "Timeout" group. The current patch puts the row at the end of IO group. Please insert the new row according to the alphabetical order.In addition, "morerows" attribute of the following line appears to be increased.
<entry morerows="2"><literal>Timeout</></entry>
By the way, doesn't this wait event belong to IPC wait event type, because the process is waiting for other conflicting processes to terminate the conflict conditions? Did you choose Timeout type because max_standby_{archive | streaming}_delay relates to this wait? I'm not confident which is appropriate, but I'm afraid users can associate this wait with a timeout.
Actually I think that this event belongs to the timeout category,
because we wait until the timeout has finished, the latch being here
to be sure that the process is more responsive in case of a postmaster
death.
(2) standby.c
Do we need to specify WL_LATCH_SET? Who can set the latch? Do the backends who ended the conflict set the latch?
This makes the process able to react on SIGHUP. That's useful for
responsiveness.
(3) standby.c + if (rc & WL_LATCH_SET) + ResetLatch(MyLatch); + + /* emergency bailout if postmaster has died */ + if (rc & WL_POSTMASTER_DEATH) + proc_exit(1);I thought the child processes have to terminate as soon as postmaster vanishes. So, it would be better for the order of the two if statements above to be reversed. proc_exit() could be exit(), as some children do in postmaster/*.c.
OK, reversed this order.
Attached is v4.
--
Michael
Attachments:
standby-delay-latch-v4.patchapplication/octet-stream; name=standby-delay-latch-v4.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9856968997..16c5a6ee65 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1281,7 +1281,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for confirmation from remote server during synchronous replication.</entry>
</row>
<row>
- <entry morerows="2"><literal>Timeout</></entry>
+ <entry morerows="3"><literal>Timeout</></entry>
<entry><literal>BaseBackupThrottle</></entry>
<entry>Waiting during base backup when throttling activity.</entry>
</row>
@@ -1294,6 +1294,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting to apply WAL at recovery because it is delayed.</entry>
</row>
<row>
+ <entry><literal>RecoveryConflict</></entry>
+ <entry>Waiting for conflict resolution of query running on standby.</entry>
+ </row>
+ <row>
<entry morerows="65"><literal>IO</></entry>
<entry><literal>BufFileRead</></entry>
<entry>Waiting for a read from a buffered file.</entry>
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 56a8bf2d17..6309014703 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3600,6 +3600,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
case WAIT_EVENT_RECOVERY_APPLY_DELAY:
event_name = "RecoveryApplyDelay";
break;
+ case WAIT_EVENT_RECOVERY_CONFLICT:
+ event_name = "RecoveryConflict";
+ break;
/* no default case, so that compiler will warn */
}
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 8e57f933ca..762165ca58 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -148,8 +148,8 @@ GetStandbyLimitTime(void)
}
}
-#define STANDBY_INITIAL_WAIT_US 1000
-static int standbyWait_us = STANDBY_INITIAL_WAIT_US;
+#define STANDBY_INITIAL_WAIT_MS 1
+static int standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
/*
* Standby wait logic for ResolveRecoveryConflictWithVirtualXIDs.
@@ -160,8 +160,7 @@ static bool
WaitExceedsMaxStandbyDelay(void)
{
TimestampTz ltime;
-
- CHECK_FOR_INTERRUPTS();
+ int rc;
/* Are we past the limit time? */
ltime = GetStandbyLimitTime();
@@ -171,15 +170,32 @@ WaitExceedsMaxStandbyDelay(void)
/*
* Sleep a bit (this is essential to avoid busy-waiting).
*/
- pg_usleep(standbyWait_us);
+ rc = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ standbyWait_ms,
+ WAIT_EVENT_RECOVERY_CONFLICT);
+
+ /* emergency bailout if postmaster has died */
+ if (rc & WL_POSTMASTER_DEATH)
+ proc_exit(1);
+
+ if (rc & WL_LATCH_SET)
+ ResetLatch(MyLatch);
/*
- * Progressively increase the sleep times, but not to more than 1s, since
- * pg_usleep isn't interruptable on some platforms.
+ * Progressively increase the sleep times, but not to more than 1s.
+ * This process is expected to wake up reasonably fast after the
+ * transactions this is waiting for are committed.
*/
- standbyWait_us *= 2;
- if (standbyWait_us > 1000000)
- standbyWait_us = 1000000;
+ if (rc & WL_TIMEOUT)
+ {
+ standbyWait_ms *= 2;
+ if (standbyWait_ms > 1000)
+ standbyWait_ms = 1000;
+ }
+
+ /* An interrupt may have occurred while waiting */
+ CHECK_FOR_INTERRUPTS();
return false;
}
@@ -206,8 +222,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
while (VirtualTransactionIdIsValid(*waitlist))
{
- /* reset standbyWait_us for each xact we wait for */
- standbyWait_us = STANDBY_INITIAL_WAIT_US;
+ /* reset standbyWait_ms for each xact we wait for */
+ standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
/* wait until the virtual xid is gone */
while (!VirtualXactLock(*waitlist, false))
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index e29397f25b..2bb213fdd4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -824,7 +824,8 @@ typedef enum
{
WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
WAIT_EVENT_PG_SLEEP,
- WAIT_EVENT_RECOVERY_APPLY_DELAY
+ WAIT_EVENT_RECOVERY_APPLY_DELAY,
+ WAIT_EVENT_RECOVERY_CONFLICT
} WaitEventTimeout;
/* ----------
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
By the way, doesn't this wait event belong to IPC wait event type, because
the process is waiting for other conflicting processes to terminate the
conflict conditions? Did you choose Timeout type because
max_standby_{archive | streaming}_delay relates to this wait? I'm not
confident which is appropriate, but I'm afraid users can associate this
wait with a timeout.Actually I think that this event belongs to the timeout category, because
we wait until the timeout has finished, the latch being here to be sure
that the process is more responsive in case of a postmaster death.
OK. I confirmed the doc is fixed.
(2) standby.c
Do we need to specify WL_LATCH_SET? Who can set the latch? Do thebackends who ended the conflict set the latch?
This makes the process able to react on SIGHUP. That's useful for
responsiveness.
Oh, I see. But how does the startup process respond quickly? It seems that you need to call HandleStartupProcInterrupts() instead of CHECK_FOR_INTERRUPTS(). But I'm not sure whether HandleStartupProcInterrupts() can be called here.
(3) standby.c + if (rc & WL_LATCH_SET) + ResetLatch(MyLatch); + + /* emergency bailout if postmaster has died */ + if (rc & WL_POSTMASTER_DEATH) + proc_exit(1);I thought the child processes have to terminate as soon as postmaster
vanishes. So, it would be better for the order of the two if statements
above to be reversed. proc_exit() could be exit(), as some children do
in postmaster/*.c.OK, reversed this order.
I think exit() instead of proc_exit() better represents what the code wants to do -- terminate the process ASAP without cleaning up. Many other background children do so.
Regards
Takayuki Tsunakawa
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Mar 30, 2017 at 1:52 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
From: pgsql-hackers-owner@postgresql.org
(2) standby.c
Do we need to specify WL_LATCH_SET? Who can set the latch? Do thebackends who ended the conflict set the latch?
This makes the process able to react on SIGHUP. That's useful for
responsiveness.Oh, I see. But how does the startup process respond quickly? It seems that you need to call HandleStartupProcInterrupts() instead of CHECK_FOR_INTERRUPTS(). But I'm not sure whether HandleStartupProcInterrupts() can be called here.
Bah. Of course you are right. We don't care about SetLatch() here as
signals are processed with a different code path than normal backends.
(3) standby.c + if (rc & WL_LATCH_SET) + ResetLatch(MyLatch); + + /* emergency bailout if postmaster has died */ + if (rc & WL_POSTMASTER_DEATH) + proc_exit(1);I thought the child processes have to terminate as soon as postmaster
vanishes. So, it would be better for the order of the two if statements
above to be reversed. proc_exit() could be exit(), as some children do
in postmaster/*.c.OK, reversed this order.
I think exit() instead of proc_exit() better represents what the code wants to do -- terminate the process ASAP without cleaning up. Many other background children do so.
Hm... OK.
--
Michael
Attachments:
standby-delay-latch-v5.patchapplication/octet-stream; name=standby-delay-latch-v5.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9856968997..16c5a6ee65 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1281,7 +1281,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for confirmation from remote server during synchronous replication.</entry>
</row>
<row>
- <entry morerows="2"><literal>Timeout</></entry>
+ <entry morerows="3"><literal>Timeout</></entry>
<entry><literal>BaseBackupThrottle</></entry>
<entry>Waiting during base backup when throttling activity.</entry>
</row>
@@ -1294,6 +1294,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting to apply WAL at recovery because it is delayed.</entry>
</row>
<row>
+ <entry><literal>RecoveryConflict</></entry>
+ <entry>Waiting for conflict resolution of query running on standby.</entry>
+ </row>
+ <row>
<entry morerows="65"><literal>IO</></entry>
<entry><literal>BufFileRead</></entry>
<entry>Waiting for a read from a buffered file.</entry>
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 56a8bf2d17..6309014703 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3600,6 +3600,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
case WAIT_EVENT_RECOVERY_APPLY_DELAY:
event_name = "RecoveryApplyDelay";
break;
+ case WAIT_EVENT_RECOVERY_CONFLICT:
+ event_name = "RecoveryConflict";
+ break;
/* no default case, so that compiler will warn */
}
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 8e57f933ca..8f0017e8e2 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -148,8 +148,8 @@ GetStandbyLimitTime(void)
}
}
-#define STANDBY_INITIAL_WAIT_US 1000
-static int standbyWait_us = STANDBY_INITIAL_WAIT_US;
+#define STANDBY_INITIAL_WAIT_MS 1
+static int standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
/*
* Standby wait logic for ResolveRecoveryConflictWithVirtualXIDs.
@@ -160,8 +160,7 @@ static bool
WaitExceedsMaxStandbyDelay(void)
{
TimestampTz ltime;
-
- CHECK_FOR_INTERRUPTS();
+ int rc;
/* Are we past the limit time? */
ltime = GetStandbyLimitTime();
@@ -171,15 +170,29 @@ WaitExceedsMaxStandbyDelay(void)
/*
* Sleep a bit (this is essential to avoid busy-waiting).
*/
- pg_usleep(standbyWait_us);
+ rc = WaitLatch(MyLatch,
+ WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ standbyWait_ms,
+ WAIT_EVENT_RECOVERY_CONFLICT);
+
+ /* emergency bailout if postmaster has died */
+ if (rc & WL_POSTMASTER_DEATH)
+ exit(1);
/*
- * Progressively increase the sleep times, but not to more than 1s, since
- * pg_usleep isn't interruptable on some platforms.
+ * Progressively increase the sleep times, but not to more than 1s.
+ * This process is expected to wake up reasonably fast after the
+ * transactions this is waiting for are committed.
*/
- standbyWait_us *= 2;
- if (standbyWait_us > 1000000)
- standbyWait_us = 1000000;
+ if (rc & WL_TIMEOUT)
+ {
+ standbyWait_ms *= 2;
+ if (standbyWait_ms > 1000)
+ standbyWait_ms = 1000;
+ }
+
+ /* An interrupt may have occurred while waiting */
+ CHECK_FOR_INTERRUPTS();
return false;
}
@@ -206,8 +219,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
while (VirtualTransactionIdIsValid(*waitlist))
{
- /* reset standbyWait_us for each xact we wait for */
- standbyWait_us = STANDBY_INITIAL_WAIT_US;
+ /* reset standbyWait_ms for each xact we wait for */
+ standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
/* wait until the virtual xid is gone */
while (!VirtualXactLock(*waitlist, false))
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index e29397f25b..2bb213fdd4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -824,7 +824,8 @@ typedef enum
{
WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
WAIT_EVENT_PG_SLEEP,
- WAIT_EVENT_RECOVERY_APPLY_DELAY
+ WAIT_EVENT_RECOVERY_APPLY_DELAY,
+ WAIT_EVENT_RECOVERY_CONFLICT
} WaitEventTimeout;
/* ----------
Hi Michael, Simon,
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
Oh, I see. But how does the startup process respond quickly? It seems
that you need to call HandleStartupProcInterrupts() instead of
CHECK_FOR_INTERRUPTS(). But I'm not sure whether
HandleStartupProcInterrupts() can be called here.Bah. Of course you are right. We don't care about SetLatch() here as signals
are processed with a different code path than normal backends.
So, I understood you agreed that CHECK_FOR_INTERRUPTS() here does nothing. But your patch still calls it:
+ /* An interrupt may have occurred while waiting */
+ CHECK_FOR_INTERRUPTS();
I got confused because the problem is not defined in this thread. What problem does this patch address? These ones?
* The startup process terminates as soon as postmaster dies.
* pg_stat_activity does not show the wait event of startup process waiting for a recovery conflict resolution.
My guess about why you decided to not call HandleStartupProcInterrupts() here is:
* Respond to postmaster death here.
* No need to reload config file here because there's no parameter to affect this conflict wait. But max_standby_{archive | streaming}_delay seems to affect the wait period.
* No need to handle SIGTERM and exit here, because the startup process doesn't wait for a conflict resolution here when he can terminate.
I think you can call HandleStartupProcInterrupts() here, instead of checking postmaster death. Did you perform tests?
Did Simon's committed patch solve the problem as expected?
Regards
Takayuki Tsunakawa
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Mar 30, 2017 at 4:02 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
I think you can call HandleStartupProcInterrupts() here, instead of checking postmaster death.
Oops, sorry for that, I quite mess up with this patch. The WaitLatch()
call should still have WL_POSTMASTER_DEATH so as it can leave earlier,
but yes I agree with your analysis that HandleStartupProcInterrupts()
as this is part of the redo work.
Did Simon's committed patch solve the problem as expected?
Does not seem so, I'll let Simon comment on this matter...
--
Michael
Attachments:
standby-delay-latch-v6.patchapplication/octet-stream; name=standby-delay-latch-v6.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9856968997..16c5a6ee65 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1281,7 +1281,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting for confirmation from remote server during synchronous replication.</entry>
</row>
<row>
- <entry morerows="2"><literal>Timeout</></entry>
+ <entry morerows="3"><literal>Timeout</></entry>
<entry><literal>BaseBackupThrottle</></entry>
<entry>Waiting during base backup when throttling activity.</entry>
</row>
@@ -1294,6 +1294,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting to apply WAL at recovery because it is delayed.</entry>
</row>
<row>
+ <entry><literal>RecoveryConflict</></entry>
+ <entry>Waiting for conflict resolution of query running on standby.</entry>
+ </row>
+ <row>
<entry morerows="65"><literal>IO</></entry>
<entry><literal>BufFileRead</></entry>
<entry>Waiting for a read from a buffered file.</entry>
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 56a8bf2d17..6309014703 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3600,6 +3600,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
case WAIT_EVENT_RECOVERY_APPLY_DELAY:
event_name = "RecoveryApplyDelay";
break;
+ case WAIT_EVENT_RECOVERY_CONFLICT:
+ event_name = "RecoveryConflict";
+ break;
/* no default case, so that compiler will warn */
}
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 8e57f933ca..38eebe4f5d 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -148,8 +148,8 @@ GetStandbyLimitTime(void)
}
}
-#define STANDBY_INITIAL_WAIT_US 1000
-static int standbyWait_us = STANDBY_INITIAL_WAIT_US;
+#define STANDBY_INITIAL_WAIT_MS 1
+static int standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
/*
* Standby wait logic for ResolveRecoveryConflictWithVirtualXIDs.
@@ -160,8 +160,7 @@ static bool
WaitExceedsMaxStandbyDelay(void)
{
TimestampTz ltime;
-
- CHECK_FOR_INTERRUPTS();
+ int rc;
/* Are we past the limit time? */
ltime = GetStandbyLimitTime();
@@ -171,15 +170,25 @@ WaitExceedsMaxStandbyDelay(void)
/*
* Sleep a bit (this is essential to avoid busy-waiting).
*/
- pg_usleep(standbyWait_us);
+ rc = WaitLatch(MyLatch,
+ WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ standbyWait_ms,
+ WAIT_EVENT_RECOVERY_CONFLICT);
/*
- * Progressively increase the sleep times, but not to more than 1s, since
- * pg_usleep isn't interruptable on some platforms.
+ * Progressively increase the sleep times, but not to more than 1s.
+ * This process is expected to wake up reasonably fast after the
+ * transactions this is waiting for are committed.
*/
- standbyWait_us *= 2;
- if (standbyWait_us > 1000000)
- standbyWait_us = 1000000;
+ if (rc & WL_TIMEOUT)
+ {
+ standbyWait_ms *= 2;
+ if (standbyWait_ms > 1000)
+ standbyWait_ms = 1000;
+ }
+
+ /* An interrupt may have occurred while waiting */
+ HandleStartupProcInterrupts();
return false;
}
@@ -206,8 +215,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
while (VirtualTransactionIdIsValid(*waitlist))
{
- /* reset standbyWait_us for each xact we wait for */
- standbyWait_us = STANDBY_INITIAL_WAIT_US;
+ /* reset standbyWait_ms for each xact we wait for */
+ standbyWait_ms = STANDBY_INITIAL_WAIT_MS;
/* wait until the virtual xid is gone */
while (!VirtualXactLock(*waitlist, false))
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index e29397f25b..2bb213fdd4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -824,7 +824,8 @@ typedef enum
{
WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
WAIT_EVENT_PG_SLEEP,
- WAIT_EVENT_RECOVERY_APPLY_DELAY
+ WAIT_EVENT_RECOVERY_APPLY_DELAY,
+ WAIT_EVENT_RECOVERY_CONFLICT
} WaitEventTimeout;
/* ----------
From: Michael Paquier [mailto:michael.paquier@gmail.com]
Oops, sorry for that, I quite mess up with this patch. The WaitLatch() call
should still have WL_POSTMASTER_DEATH so as it can leave earlier, but yes
I agree with your analysis that HandleStartupProcInterrupts() as this is
part of the redo work.
Thank you, but did you remove WL_LATCH_SET from WaitLatch() intentionally? I understood you added it for startup process to respond quickly to events other than the postmaster death. Why don't we restore WL_LATCH_SET? I won't object to not adding the flag if there's a reason.
I'll mark this as ready for committer when I see WL_LATCH_SET added (optional) and you have reported that you did the following test cases:
* Startup process vanishes immediately after postmaster dies, while it is waiting for a recovery conflict to be resolved.
* Startup process vanishes immediately after "pg_ctl stop -m fast", while it is waiting for a recovery conflict to be resolved.
* Startup process resumes WAL application when max_standby_{archive | streaming}_delay is changed from the default -1 to a short period, e.g. 10s, and "pg_ctl reload" is performed, while it is waiting for a recovery conflict to be resolved.
Did Simon's committed patch solve the problem as expected?
Does not seem so, I'll let Simon comment on this matter...
Agreed. I guess his patch for earlier releases should work if CHECK_FOR_INTERRUPTS() is replaced with HandleStartupProcInterrupts().
Regards
Takayuki Tsunakawa
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Mar 31, 2017 at 4:46 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
Thank you, but did you remove WL_LATCH_SET from WaitLatch() intentionally? I understood you added it for startup process to respond quickly to events other than the postmaster death. Why don't we restore WL_LATCH_SET? I won't object to not adding the flag if there's a reason.
It doesn't matter. MyLatch is never set in the context of the startup
process. The other code paths of the startup process using WaitLatch()
may be awaken by the latch set by the WAL receiver, but that does not
apply here.
I'll mark this as ready for committer when I see WL_LATCH_SET added (optional)
Objection on this point.
and you have reported that you did the following test cases:
* Startup process vanishes immediately after postmaster dies, while it is waiting for a recovery conflict to be resolved.
* Startup process vanishes immediately after "pg_ctl stop -m fast", while it is waiting for a recovery conflict to be resolved.
* Startup process resumes WAL application when max_standby_{archive | streaming}_delay is changed from the default -1 to a short period, e.g. 10s, and "pg_ctl reload" is performed, while it is waiting for a recovery conflict to be resolved.
Fine for me, I have checked those multiple scenarios and the startup
process is more responsive. I have emulated the conflict with a
transaction doing repeatable read on the standby while the master was
deleting and vacuuming a table.
But, actually, after looking again at this patch with fresher eyes, I
am being a bit doubtful that this is really correct... Calling
HandleStartupProcInterrupts() is actually dangerous I think, because
it would reload the GUC context while replaying a record. But I think
that it is cleaner to reload the context after being done with a
record.
It seems to me that the correct way to do things would be to switch
all the conflict code paths to use a latch instead of pg_usleep, by
either use a dedicated latch like the recovery wakeup one, or the
MyLatch of the startup process. Then, when a signal arrives in, we
simply set up the conflict latch which wakes up what is waiting for
activity. The latch needs to be reset because calling WaitLatch().
Leaving immediately on WL_POSTMASTER_DEATH looks fine though as far as
I have tested.
As time is growing short, I am marking this patch as returned with
feedback. Thanks for the input, Tsunakawa-san!
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers