Gripes about walsender command processing
While trying to understand a recent buildfarm failure [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&dt=2020-09-13%2016%3A54%3A03, I got annoyed
by the fact that a walsender exposes its last SQL command in
pg_stat_activity even when that was a long time ago and what it's been
doing since then is replication commands. This leads to *totally*
misleading reports, for instance in [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&dt=2020-09-13%2016%3A54%3A03 we read
2020-09-13 19:10:09.632 CEST [5f5e526d.242415:4] LOG: server process (PID 2368853) was terminated by signal 11: Segmentation fault
2020-09-13 19:10:09.632 CEST [5f5e526d.242415:5] DETAIL: Failed process was running: SELECT pg_catalog.set_config('search_path', '', false);
even though the process's own log messages paint a much better picture of
reality:
2020-09-13 19:10:07.302 CEST [5f5e526f.242555:1] LOG: connection received: host=[local]
2020-09-13 19:10:07.303 CEST [5f5e526f.242555:2] LOG: replication connection authorized: user=bf application_name=sub2
2020-09-13 19:10:07.303 CEST [5f5e526f.242555:3] LOG: statement: SELECT pg_catalog.set_config('search_path', '', false);
2020-09-13 19:10:07.334 CEST [5f5e526f.242555:4] LOG: received replication command: IDENTIFY_SYSTEM
2020-09-13 19:10:07.335 CEST [5f5e526f.242555:5] LOG: received replication command: START_REPLICATION SLOT "sub2" LOGICAL 0/0 (proto_version '2', publication_names '"pub2"')
2020-09-13 19:10:07.335 CEST [5f5e526f.242555:6] LOG: starting logical decoding for slot "sub2"
2020-09-13 19:10:07.335 CEST [5f5e526f.242555:7] DETAIL: Streaming transactions committing after 0/159EB38, reading WAL from 0/159EB00.
2020-09-13 19:10:07.335 CEST [5f5e526f.242555:8] LOG: logical decoding found consistent point at 0/159EB00
2020-09-13 19:10:07.335 CEST [5f5e526f.242555:9] DETAIL: There are no running transactions.
I think that walsenders ought to report their current replication command
as though it were a SQL command. They oughta set debug_query_string, too.
While trying to fix this, I also observed that exec_replication_command
fails to clean up the temp context it made for parsing the command string,
if that turns out to be a SQL command. This very accidentally fails to
lead to a long-term memory leak, because that context will be a child of
MessageContext so it'll be cleaned out in the next iteration of
PostgresMain's main loop. But it's still unacceptably sloppy coding
IMHO, and it's closely related to a lot of other randomness in the
function; it'd be better to have a separate early-exit path for SQL
commands.
Attached find a proposed fix for these things.
What I'd really like to do is adjust pgstat_report_activity so that
callers are *required* to provide some kind of command string when
transitioning into STATE_RUNNING state, ie something like
Assert(state == STATE_RUNNING ? cmd_str != NULL : cmd_str == NULL);
However, before we could do that, we'd have to clean up the rat's nest
of seemingly-inserted-with-the-aid-of-a-dartboard pgstat_report_activity
calls in replication/logical/worker.c. I couldn't make heads or tails
of what is going on there, so I did not try.
BTW, while I didn't change it here, isn't the second
SnapBuildClearExportedSnapshot call in exec_replication_command just
useless? We already did that before parsing the command.
regards, tom lane
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&dt=2020-09-13%2016%3A54%3A03
Attachments:
clean-up-exec_replication_command.patchtext/x-diff; charset=us-ascii; name=clean-up-exec_replication_command.patchDownload+34-22
I wrote:
I think that walsenders ought to report their current replication command
as though it were a SQL command. They oughta set debug_query_string, too.
I also notice that a walsender does not maintain its ps status:
postgres 921109 100 0.0 32568 11880 ? Rs 18:57 0:51 postgres: publisher: walsender postgres [local] idle
I don't mind if we decide we don't need to reflect the walsender's
true activity, but let's not have it showing an outright lie.
regards, tom lane
On Sun, Sep 13, 2020 at 03:47:51PM -0400, Tom Lane wrote:
While trying to fix this, I also observed that exec_replication_command
fails to clean up the temp context it made for parsing the command string,
if that turns out to be a SQL command. This very accidentally fails to
lead to a long-term memory leak, because that context will be a child of
MessageContext so it'll be cleaned out in the next iteration of
PostgresMain's main loop. But it's still unacceptably sloppy coding
IMHO, and it's closely related to a lot of other randomness in the
function; it'd be better to have a separate early-exit path for SQL
commands.
Looks fine seen from here. +1.
What I'd really like to do is adjust pgstat_report_activity so that
callers are *required* to provide some kind of command string when
transitioning into STATE_RUNNING state, ie something likeAssert(state == STATE_RUNNING ? cmd_str != NULL : cmd_str == NULL);
However, before we could do that, we'd have to clean up the rat's nest
of seemingly-inserted-with-the-aid-of-a-dartboard pgstat_report_activity
calls in replication/logical/worker.c. I couldn't make heads or tails
of what is going on there, so I did not try.
For this one, I don't actually agree. For now the state and the query
string, actually the activity string, are completely independent. So
external modules like bgworkers can mix the state enum and the
activity string as they wish. I think that this flexibility is useful
to keep.
BTW, while I didn't change it here, isn't the second
SnapBuildClearExportedSnapshot call in exec_replication_command just
useless? We already did that before parsing the command.
Indeed.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Sun, Sep 13, 2020 at 03:47:51PM -0400, Tom Lane wrote:
While trying to fix this, I also observed that exec_replication_command
fails to clean up the temp context it made for parsing the command string,
if that turns out to be a SQL command. This very accidentally fails to
lead to a long-term memory leak, because that context will be a child of
MessageContext so it'll be cleaned out in the next iteration of
PostgresMain's main loop. But it's still unacceptably sloppy coding
IMHO, and it's closely related to a lot of other randomness in the
function; it'd be better to have a separate early-exit path for SQL
commands.
Looks fine seen from here. +1.
Pushed, thanks for looking.
What I'd really like to do is adjust pgstat_report_activity so that
callers are *required* to provide some kind of command string when
transitioning into STATE_RUNNING state, ie something like
Assert(state == STATE_RUNNING ? cmd_str != NULL : cmd_str == NULL);
For this one, I don't actually agree. For now the state and the query
string, actually the activity string, are completely independent. So
external modules like bgworkers can mix the state enum and the
activity string as they wish. I think that this flexibility is useful
to keep.
Meh ... but I'm not excited enough about the point to argue.
I looked into the other point about ps status always claiming the
walsender is "idle". This turns out to be something PostgresMain
does automatically, so unless we want to uglify that logic, we have
to make walsenders set the field honestly. So I propose the attached.
(Is there a better way to get the name of a replication command?
I didn't look very hard for one.)
regards, tom lane
Attachments:
make-walsenders-set-ps-status.patchtext/x-diff; charset=us-ascii; name=make-walsenders-set-ps-status.patchDownload+12-2
On Mon, Sep 14, 2020 at 12:51:39PM -0400, Tom Lane wrote:
I looked into the other point about ps status always claiming the
walsender is "idle". This turns out to be something PostgresMain
does automatically, so unless we want to uglify that logic, we have
to make walsenders set the field honestly. So I propose the attached.
(Is there a better way to get the name of a replication command?
I didn't look very hard for one.)
Wouldn't that just be in cmdtaglist.h, but extended for nodes used for
replication commands? Then you could just call CreateCommandTag() to
get the command string to print as postgres.c does. There is already
one for repslot drop, in some way. This would have the advantage to
just call once set_ps_display() before the switch split.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Mon, Sep 14, 2020 at 12:51:39PM -0400, Tom Lane wrote:
(Is there a better way to get the name of a replication command?
I didn't look very hard for one.)
Wouldn't that just be in cmdtaglist.h, but extended for nodes used for
replication commands? Then you could just call CreateCommandTag() to
get the command string to print as postgres.c does. There is already
one for repslot drop, in some way. This would have the advantage to
just call once set_ps_display() before the switch split.
Mmm, not sure whether having CreateCommandTag know about replication
commands is a good thing or not. We certainly could do it like that,
since there's only one namespace of NodeTag values, but conceptually
it feels a bit weird to me. There's a lot of other infrastructure
for SQL command nodes that we're surely never going to build out for
replication commands, so should we do it in CreateCommandTag?
Anybody else have an opinion about it?
(I don't quite follow your comment about repslot drop, btw.)
regards, tom lane
On Mon, Sep 14, 2020 at 11:34:44PM -0400, Tom Lane wrote:
(I don't quite follow your comment about repslot drop, btw.)
There is already a command tag equivalent to DROP_REPLICATION_SLOT:
$ git grep REPLICATION -- */cmdtaglist.h
src/include/tcop/cmdtaglist.h:PG_CMDTAG(CMDTAG_DROP_REPLICATION_SLOT,
"DROP REPLICATION SLOT", false, false, false)
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Mon, Sep 14, 2020 at 11:34:44PM -0400, Tom Lane wrote:
(I don't quite follow your comment about repslot drop, btw.)
There is already a command tag equivalent to DROP_REPLICATION_SLOT:
$ git grep REPLICATION -- */cmdtaglist.h
src/include/tcop/cmdtaglist.h:PG_CMDTAG(CMDTAG_DROP_REPLICATION_SLOT,
"DROP REPLICATION SLOT", false, false, false)
[ blink ... ] So why is it that DropReplicationSlot does
SetQueryCompletion(&qc, CMDTAG_DROP_REPLICATION_SLOT, 0);
EndCommand(&qc, DestRemote, false);
when the caller will immediately after that do
SetQueryCompletion(&qc, CMDTAG_SELECT, 0);
EndCommand(&qc, DestRemote, true);
StartLogicalReplication has a similar weirdness.
The more I look at this code, the more broken it seems.
Anyway, independently of whether walsender should be sending multiple
command-complete messages, I don't think I really approve of including
replication commands in the CommandTag enum. It still seems like a
type pun. However, it looks like we'd have to duplicate
SetQueryCompletion/EndCommand if we don't want to do that, so maybe
I'd better just hold my nose and do it that way.
regards, tom lane
On 2020-Sep-15, Tom Lane wrote:
[ blink ... ] So why is it that DropReplicationSlot does
SetQueryCompletion(&qc, CMDTAG_DROP_REPLICATION_SLOT, 0);
EndCommand(&qc, DestRemote, false);when the caller will immediately after that do
SetQueryCompletion(&qc, CMDTAG_SELECT, 0);
EndCommand(&qc, DestRemote, true);StartLogicalReplication has a similar weirdness.
The more I look at this code, the more broken it seems.
I overlooked this in 2f9661311b83. From this perspective, I agree it
looks wrong. We still have to send *some* completion tag (the 'C'
message), but maybe we can invent a separate entry point in dest.c for
that -- EndReplicationCommand() or some such -- that takes values from a
separate enum?
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Sep-15, Alvaro Herrera wrote:
I overlooked this in 2f9661311b83. From this perspective, I agree it
looks wrong. We still have to send *some* completion tag (the 'C'
message), but maybe we can invent a separate entry point in dest.c for
that -- EndReplicationCommand() or some such -- that takes values from a
separate enum?
It seems simpler than that actually; we don't need to build a lot of
infrastructure.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
replic-cmdtag.patchtext/x-diff; charset=us-asciiDownload+18-10
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2020-Sep-15, Alvaro Herrera wrote:
I overlooked this in 2f9661311b83. From this perspective, I agree it
looks wrong. We still have to send *some* completion tag (the 'C'
message), but maybe we can invent a separate entry point in dest.c for
that -- EndReplicationCommand() or some such -- that takes values from a
separate enum?
It seems simpler than that actually; we don't need to build a lot of
infrastructure.
This looks moderately reasonable to me. However, with the process
title reporting I want to add, we're going to end up with a switch
that looks like
case T_IdentifySystemCmd:
+ set_ps_display("IDENTIFY_SYSTEM");
IdentifySystem();
+ EndReplicationCommand("IDENTIFY_SYSTEM");
break;
case T_BaseBackupCmd:
+ set_ps_display("BASE_BACKUP");
PreventInTransactionBlock(true, "BASE_BACKUP");
SendBaseBackup((BaseBackupCmd *) cmd_node);
+ EndReplicationCommand("BASE_BACKUP");
break;
which is starting to look a bit repetitive and copy-pasteo-prone.
I don't see an easy way to improve on it though. The only obvious
alternative would be to put another switch before the main one that
just fills a "const char *cmdtag" variable, but that seems ugly.
regards, tom lane
On 2020-Sep-16, Tom Lane wrote:
This looks moderately reasonable to me. However, with the process
title reporting I want to add, we're going to end up with a switch
that looks likecase T_IdentifySystemCmd: + set_ps_display("IDENTIFY_SYSTEM"); IdentifySystem(); + EndReplicationCommand("IDENTIFY_SYSTEM"); break;case T_BaseBackupCmd: + set_ps_display("BASE_BACKUP"); PreventInTransactionBlock(true, "BASE_BACKUP"); SendBaseBackup((BaseBackupCmd *) cmd_node); + EndReplicationCommand("BASE_BACKUP"); break;which is starting to look a bit repetitive and copy-pasteo-prone.
Agreed.
I don't see an easy way to improve on it though. The only obvious
alternative would be to put another switch before the main one that
just fills a "const char *cmdtag" variable, but that seems ugly.
The alternative of doing the assignment in each case of the same switch
does not look too terrible:
case T_IdentifySystemCmd:
+ cmdtag = "IDENTIFY_SYSTEM";
+ set_ps_display(cmdtag);
IdentifySystem();
+ EndReplicationCommand(cmdtag);
break;
case T_BaseBackupCmd:
+ cmdtag = "BASE_BACKUP";
+ set_ps_display(cmdtag);
PreventInTransactionBlock(true, cmdtag);
SendBaseBackup((BaseBackupCmd *) cmd_node);
+ EndReplicationCommand(cmdtag);
break;
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2020-Sep-16, Tom Lane wrote:
I don't see an easy way to improve on it though. The only obvious
alternative would be to put another switch before the main one that
just fills a "const char *cmdtag" variable, but that seems ugly.
The alternative of doing the assignment in each case of the same switch
does not look too terrible:
case T_IdentifySystemCmd: + cmdtag = "IDENTIFY_SYSTEM"; + set_ps_display(cmdtag); IdentifySystem(); + EndReplicationCommand(cmdtag); break;
case T_BaseBackupCmd:
+ cmdtag = "BASE_BACKUP";
+ set_ps_display(cmdtag);
PreventInTransactionBlock(true, cmdtag);
SendBaseBackup((BaseBackupCmd *) cmd_node);
+ EndReplicationCommand(cmdtag);
break;
Yeah, that works for me. It doesn't allow for having just one
set_ps_display() call ahead of the switch, but that isn't that
big a loss. We cannot merge the EndReplicationCommand calls to
after the switch, because some of the cases don't want one here;
so that partial duplication is inescapable.
Note that your changes need to be backpatched into v13,
because AFAICS this code is violating the FE/BE protocol
right now --- it's just luck that libpq isn't moaning
about extra CommandComplete messages. But I don't think
we want to change the ps title behavior in v13 at this
late date, so that part should be HEAD-only.
regards, tom lane
On 2020-Sep-16, Tom Lane wrote:
Yeah, that works for me. It doesn't allow for having just one
set_ps_display() call ahead of the switch, but that isn't that
big a loss. We cannot merge the EndReplicationCommand calls to
after the switch, because some of the cases don't want one here;
so that partial duplication is inescapable.
Right.
Note that your changes need to be backpatched into v13,
because AFAICS this code is violating the FE/BE protocol
right now --- it's just luck that libpq isn't moaning
about extra CommandComplete messages. But I don't think
we want to change the ps title behavior in v13 at this
late date, so that part should be HEAD-only.
Ok, will do.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Sep-16, Tom Lane wrote:
Note that your changes need to be backpatched into v13,
because AFAICS this code is violating the FE/BE protocol
right now --- it's just luck that libpq isn't moaning
about extra CommandComplete messages. But I don't think
we want to change the ps title behavior in v13 at this
late date, so that part should be HEAD-only.
I pushed to 13, but it's because of the removal of the command tag; the
duplicate tags we've been sending since 9.3. ¯\_(ツ)_/¯
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2020-Sep-16, Tom Lane wrote:
Note that your changes need to be backpatched into v13,
because AFAICS this code is violating the FE/BE protocol
right now --- it's just luck that libpq isn't moaning
about extra CommandComplete messages. But I don't think
we want to change the ps title behavior in v13 at this
late date, so that part should be HEAD-only.
I pushed to 13, but it's because of the removal of the command tag; the
duplicate tags we've been sending since 9.3. ¯\_(ツ)_/¯
Ugh. Well, fixing v13 is an improvement anyway. Thanks!
regards, tom lane