Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

Started by Michael Paquierover 7 years ago30 messageshackers

michael@paquier.xyz

over 7 years ago

Hi all,

Since 2dedf4d, recovery.conf is dead and all recovery parameters are now
GUCs. While looking at a patch to switch primary_conninfo and
primary_slot_name to be reloadable, Sergei Kornilov had a very good
point that the WAL receiver uses a connection string and a physical slot
name based on what the startup process wants the WAL receiver to use:
/messages/by-id/20181212043208.GI17695@paquier.xyz

It seems to me that doing so is now strange as the WAL receiver knows
about the GUC context, and actually it knows the parameters it should
use, so there is no point to pass down the values when requesting the
WAL receiver to start.

What do you think about the attached to simplify the logic? Even if
primary_conninfo and primary_slot_name are not switched to SIGHUP this
cleanup looks like a good thing to me.

Thoughts?
--
Michael

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Michael Paquier (#1)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On December 11, 2018 9:30:42 PM PST, Michael Paquier <michael@paquier.xyz> wrote:

Hi all,

Since 2dedf4d, recovery.conf is dead and all recovery parameters are
now
GUCs. While looking at a patch to switch primary_conninfo and
primary_slot_name to be reloadable, Sergei Kornilov had a very good
point that the WAL receiver uses a connection string and a physical
slot
name based on what the startup process wants the WAL receiver to use:
/messages/by-id/20181212043208.GI17695@paquier.xyz

It seems to me that doing so is now strange as the WAL receiver knows
about the GUC context, and actually it knows the parameters it should
use, so there is no point to pass down the values when requesting the
WAL receiver to start.

What do you think about the attached to simplify the logic? Even if
primary_conninfo and primary_slot_name are not switched to SIGHUP this
cleanup looks like a good thing to me.

I am not convinced this is a good idea. This allows the state of walrcv and startup to diverge, they could e.g. have different configuration, due to differently time config reloads. And they need to communicate via shmem anyway, so there's not a lot of complexity avoided. And I think it's good to be able to show the active connection via functions, rather than the one currently in pg.conf, which might be different.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Andres Freund (#2)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On Tue, Dec 11, 2018 at 09:34:58PM -0800, Andres Freund wrote:

I am not convinced this is a good idea. This allows the state of
walrcv and startup to diverge, they could e.g. have different
configuration, due to differently time config reloads. And they need
to communicate via shmem anyway, so there's not a lot of complexity
avoided. And I think it's good to be able to show the active
connection via functions, rather than the one currently in pg.conf,
which might be different.

Well, the conninfo and slot name accessible to the user are the values
available only once the information of the WAL receiver in shared memory
is ready to be displayed. Relying more on the GUC context has the
advantage to never have in shared memory the original string and only
store the clobbered one, which actually simplifies what's stored in
shared memory because you can entirely remove ready_to_display (I forgot
to remove that in the patch actually). If those parameters become
reloadable, you actually rely only on what the param context has, and
not on what the shared memory context may have or not.

Removing entirely ready_to_display is quite appealing actually..
--
Michael

Simon Riggs

simon@2ndQuadrant.com

over 7 years ago

In reply to: Andres Freund (#2)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On Wed, 12 Dec 2018 at 05:35, Andres Freund <andres@anarazel.de> wrote:

What do you think about the attached to simplify the logic? Even if
primary_conninfo and primary_slot_name are not switched to SIGHUP this
cleanup looks like a good thing to me.

I am not convinced this is a good idea. This allows the state of walrcv
and startup to diverge, they could e.g. have different configuration, due
to differently time config reloads.

That sounds bad, but most parameters apply to one or the other, not both.

If there are some that apply to both, then yes, coordination would be
important.

It does seem likely that the new scheme will require us to look carefully
at when parameters are reloaded, since the timing of reloads was never
taken into account in the previous coding.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Sergei Kornilov

sk@zsrv.org

over 7 years ago

In reply to: Andres Freund (#2)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

Hello

This allows the state of walrcv and startup to diverge, they could e.g. have different configuration, due to differently time config reloads.

So we have exactly same problem with, for example, hot_standby_feedback, right?
We change hot_standby_feedback value, reload it and we can have 'show hot_standby_feedback' different to currently running walreceiver.
But why we have nothing about hot_standby_feedback in pg_stat_get_wal_receiver()?
Where is difference?

regards, Sergei

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Sergei Kornilov (#5)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On Wed, Dec 12, 2018 at 06:55:04PM +0300, Sergei Kornilov wrote:

This allows the state of walrcv and startup to diverge, they could
e.g. have different configuration, due to differently time config
reloads.

So we have exactly same problem with, for example, hot_standby_feedback, right?
We change hot_standby_feedback value, reload it and we can have 'show
hot_standby_feedback' different to currently running walreceiver. But
why we have nothing about hot_standby_feedback in pg_stat_get_wal_receiver()?
Where is difference?

The difference is in the fact that a WAL receiver uses the connection
string and the slot name when establishing the connection when using
START_STREAMING through the replication protocol, and
hot_standby_feedback gets used depending on the context of the messages
exchanged.
--
Michael

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Michael Paquier (#3)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On Wed, Dec 12, 2018 at 02:55:17PM +0900, Michael Paquier wrote:

Well, the conninfo and slot name accessible to the user are the values
available only once the information of the WAL receiver in shared memory
is ready to be displayed. Relying more on the GUC context has the
advantage to never have in shared memory the original string and only
store the clobbered one, which actually simplifies what's stored in
shared memory because you can entirely remove ready_to_display (I forgot
to remove that in the patch actually). If those parameters become
reloadable, you actually rely only on what the param context has, and
not on what the shared memory context may have or not.

Removing entirely ready_to_display is quite appealing actually..

I have been looking at that stuff this morning, and indeed
ready_for_display can be entirely removed. I think that's quite an
advantage as WalRcv->conninfo only stores the connection string
cloberred to hide security-sensitive fields and it never has to save the
initial state which could have sensitive data, simplifying how the WAL
receiver data is filled. I am adding that to the next commit fest.
--
Michael

Sergei Kornilov

sk@zsrv.org

over 7 years ago

In reply to: Michael Paquier (#7)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

Not sure i can be reviewer since this was initially my proposal.
I vote +1 for this patch. Code looks good and i think we have no reason to leave RequestXLogStreaming as-is.

regards, Sergei

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Sergei Kornilov (#8)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On Fri, Dec 14, 2018 at 12:23:09PM +0300, Sergei Kornilov wrote:

Not sure i can be reviewer since this was initially my proposal.

No issues from me if you wish to review the code. In my opinion, it is
not a problem if you are a reviewer as I wrote the code.

I vote +1 for this patch. Code looks good and i think we have no
reason to leave RequestXLogStreaming as-is.

Thanks for the input. The take on my side is to be able to remove
ready_to_display from WalRcv so let's see what others think.
--
Michael

#10

Donald Dong

xdong@csumb.edu

over 7 years ago

In reply to: Michael Paquier (#3)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

Hi Michael,

Thank you for the information!

On Dec 11, 2018, at 9:55 PM, Michael Paquier <michael@paquier.xyz> wrote:

Well, the conninfo and slot name accessible to the user are the values
available only once the information of the WAL receiver in shared memory
is ready to be displayed. Relying more on the GUC context has the
advantage to never have in shared memory the original string and only
store the clobbered one, which actually simplifies what's stored in
shared memory because you can entirely remove ready_to_display (I forgot
to remove that in the patch actually). If those parameters become
reloadable, you actually rely only on what the param context has, and
not on what the shared memory context may have or not.

I wonder why do we need to have this addition?

src/backend/replication/walreceiver.c
+       memset(walrcv->slotname, 0, NAMEDATALEN);
+       if (PrimarySlotName)
+               strlcpy((char *) walrcv->slotname, PrimarySlotName, NAMEDATALEN);

src/backend/replication/walreceiver.c
288: PrimaryConnInfo == NULL || PrimaryConnInfo[0] == '\0'
291: errmsg("cannot connect to the primary server as \"primary_conninfo\" is not defined")));
392: PrimarySlotName && PrimarySlotName[0] != '\0'

src/backend/access/transam/xlog.c
11824: * If primary_conninfo is set, launch walreceiver to try
11833: PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0

I notice these patterns appear in different places. Maybe it will be better to have a common method such as pg_str_empty()?

Regards,
Donald Dong

#11

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Donald Dong (#10)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On Wed, Jan 09, 2019 at 06:00:23AM -0800, Donald Dong wrote:

I wonder why do we need to have this addition?

src/backend/replication/walreceiver.c
+       memset(walrcv->slotname, 0, NAMEDATALEN);
+       if (PrimarySlotName)
+               strlcpy((char *) walrcv->slotname, PrimarySlotName, NAMEDATALEN);

That's much cleaner to me to clean up the field for safety before
starting the process. When requesting a WAL receiver to start,
slotname and conninfo gets zeroed before doing anything, you are right
that we could do without it actually.

I notice these patterns appear in different places. Maybe it will be
better to have a common method such as pg_str_empty()?

That's a pattern used quite a lot for many GUCs, so that's quite
separate from this patch. Perhaps it would make sense to have a macro
for that purpose for GUCs, I am not sure if that's worth it though.
--
Michael

#12

Donald Dong

xdong@csumb.edu

over 7 years ago

In reply to: Michael Paquier (#11)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On Jan 9, 2019, at 4:47 PM, Michael Paquier <michael@paquier.xyz> wrote:

That's much cleaner to me to clean up the field for safety before
starting the process. When requesting a WAL receiver to start,
slotname and conninfo gets zeroed before doing anything, you are right
that we could do without it actually.

Cool! I agree that cleaning up the field would make the logic cleaner.

That's a pattern used quite a lot for many GUCs, so that's quite
separate from this patch. Perhaps it would make sense to have a macro
for that purpose for GUCs, I am not sure if that's worth it though.

Yeah. I think having such macro would make the code a bit cleaner. Tho the
duplication of logic is not too significant.

#13

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Donald Dong (#12)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On Wed, Jan 09, 2019 at 05:38:53PM -0800, Donald Dong wrote:

On Jan 9, 2019, at 4:47 PM, Michael Paquier <michael@paquier.xyz> wrote:

That's much cleaner to me to clean up the field for safety before
starting the process. When requesting a WAL receiver to start,
slotname and conninfo gets zeroed before doing anything, you are right
that we could do without it actually.

Cool! I agree that cleaning up the field would make the logic cleaner.

I was just double-checking the code, and it is possible to remove the
part from RequestXLogStreaming() which sets slotname and conninfo to
'\0', so I removed that part from my local branch to simplify the
code.
--
Michael

#14

Sergei Kornilov

sk@zsrv.org

over 7 years ago

In reply to: Michael Paquier (#13)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

Hello

I was just double-checking the code, and it is possible to remove the
part from RequestXLogStreaming() which sets slotname and conninfo to
'\0', so I removed that part from my local branch to simplify the
code.

Without both ready_to_display and cleaning in RequestXLogStreaming we do not show outdated PrimaryConnInfo in case walreceiver just started, but does not connected yet? While waiting walrcv_connect for example.

regards, Sergei

#15

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Sergei Kornilov (#14)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On Thu, Jan 10, 2019 at 01:10:16PM +0300, Sergei Kornilov wrote:

Without both ready_to_display and cleaning in RequestXLogStreaming
we do not show outdated PrimaryConnInfo in case walreceiver just
started, but does not connected yet? While waiting walrcv_connect
for example.

Good point. There is a gap between the moment the WAL receiver PID is
set when the WAL receiver starts up and the moment where the different
fields are reset to 0 which is not good as it could be possible that
the user sees the information from the previous WAL receiver, so we
should move the memset calls when the PID is set, reaching a state
where the PID is alive but where there is no connection yet. The port
number needs a similar treatment.

Looking closer at the code, it seems to me that it would be a good
thing if we update the shared memory state when the WAL receiver dies,
so as not only the PID is set to 0, but all connection-related
information gets the call.

With all those comments I am finishing with the updated attached.
Thoughts?
--
Michael

#16

Sergei Kornilov

sk@zsrv.org

over 7 years ago

In reply to: Michael Paquier (#15)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

Thank you, patch looks good for me.

regards, Sergei

#17

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Sergei Kornilov (#16)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On Thu, Jan 10, 2019 at 02:20:27PM +0300, Sergei Kornilov wrote:

Thank you, patch looks good for me.

Thanks Sergei for the review. I have been looking at the patch again
this morning with a fresh set of eyes and the thing looks in a
committable shape (the GUC values for NULL checks are a bit
inconsistent in the last patch by the way, so I have fixed them
locally).

I'll do an extra pass on it in the next couple of days and commit.
Then let's move on with the reloadable portions.
--
Michael

#18

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Michael Paquier (#17)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

Hi,

On 2019-01-11 09:34:28 +0900, Michael Paquier wrote:

On Thu, Jan 10, 2019 at 02:20:27PM +0300, Sergei Kornilov wrote:

Thank you, patch looks good for me.

Thanks Sergei for the review. I have been looking at the patch again
this morning with a fresh set of eyes and the thing looks in a
committable shape (the GUC values for NULL checks are a bit
inconsistent in the last patch by the way, so I have fixed them
locally).

I'll do an extra pass on it in the next couple of days and commit.
Then let's move on with the reloadable portions.

I still think this whole direction of accessing the GUC in walreceiver
is a bad idea and shouldn't be pursued further. There's definite
potential for startup process and WAL receiver having different states
of GUCs, the code doesn't get meaningfully simpler, the GUC value checks
in walreceiver make for horrible reporting up the chain.

Greetings,

Andres Freund

#19

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Andres Freund (#18)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On Thu, Jan 10, 2019 at 04:41:47PM -0800, Andres Freund wrote:

I still think this whole direction of accessing the GUC in walreceiver
is a bad idea and shouldn't be pursued further. There's definite
potential for startup process and WAL receiver having different states
of GUCs, the code doesn't get meaningfully simpler, the GUC value checks
in walreceiver make for horrible reporting up the chain.

Did you notice the set of messages from upthread? The code *gets*
simpler by removing ready_to_display and the need to manipulate the
non-clobbered connection string sent directly from the startup
process. In my opinion that's a clear gain. We gain also the
possibility to track down that a WAL receiver is started but not
connected yet for monitoring tools.
--
Michael

#20

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Michael Paquier (#19)

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

On 2019-01-11 09:50:49 +0900, Michael Paquier wrote:

On Thu, Jan 10, 2019 at 04:41:47PM -0800, Andres Freund wrote:

I still think this whole direction of accessing the GUC in walreceiver
is a bad idea and shouldn't be pursued further. There's definite
potential for startup process and WAL receiver having different states
of GUCs, the code doesn't get meaningfully simpler, the GUC value checks
in walreceiver make for horrible reporting up the chain.

Did you notice the set of messages from upthread? The code *gets*
simpler by removing ready_to_display and the need to manipulate the
non-clobbered connection string sent directly from the startup
process.

It's a minor simplification for code that's existed for quite a
while. Not worth it.

#21

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Andres Freund (#20)

#22

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Michael Paquier (#21)

#23