A recent message added to pg_upgade

Started by Kyotaro Horiguchiover 2 years ago89 messages
Jump to latest
#1Kyotaro Horiguchi
horikyota.ntt@gmail.com

Hello.

Some messages recently introduced by commit 29d0a77fa6 seem to deviate
slightly from our standards.

+		if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+		{
+			ereport(ERROR,
+					errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					errmsg("replication slots must not be invalidated during the upgrade"),
+					errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));

The message for errhint is not a complete sentence. And errmsg is not
in telegraph style. The first attached makes minimum changes.

However, if allowed, I'd like to propose an alternative set of
messages as follows:

+					errmsg("replication slot is invalidated during upgrade"),
+					errhint("Set \"max_slot_wal_keep_size\" to -1 to avoid invalidation."));

The second attached does this.

What do you think about those?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

fix_messages_in_slot_c_1.patchtext/x-patch; charset=us-asciiDownload+2-2
fix_messages_in_slot_c_2.patchtext/x-patch; charset=us-asciiDownload+2-2
#2Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Kyotaro Horiguchi (#1)
Re: A recent message added to pg_upgade

On Fri, Oct 27, 2023 at 8:28 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Hello.

Some messages recently introduced by commit 29d0a77fa6 seem to deviate
slightly from our standards.

+               if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+               {
+                       ereport(ERROR,
+                                       errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                       errmsg("replication slots must not be invalidated during the upgrade"),
+                                       errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));

The message for errhint is not a complete sentence.

Yeah, the hint message should have ended with a period -
https://www.postgresql.org/docs/current/error-style-guide.html#ERROR-STYLE-GUIDE-GRAMMAR-PUNCTUATION.

The second attached does this.

What do you think about those?

+                    errmsg("replication slot is invalidated during upgrade"),
+                    errhint("Set \"max_slot_wal_keep_size\" to -1 to
avoid invalidation."));
         }

The above errhint LGTM. How about a slightly different errmsg, like
the following?

+                    errmsg("cannot invalidate replication slots when
in binary upgrade mode"),
+                    errhint("Set \"max_slot_wal_keep_size\" to -1 to
avoid invalidation."));

".... when in binary upgrade mode" is being used in many places.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#2)
Re: A recent message added to pg_upgade

On Fri, Oct 27, 2023 at 8:52 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Oct 27, 2023 at 8:28 AM Kyotaro Horiguchi:
The above errhint LGTM. How about a slightly different errmsg, like
the following?

+                    errmsg("cannot invalidate replication slots when
in binary upgrade mode"),
+                    errhint("Set \"max_slot_wal_keep_size\" to -1 to
avoid invalidation."));

".... when in binary upgrade mode" is being used in many places.

By this time slot may be already invalidated, so how about:
"replication slot was invalidated when in binary upgrade mode"?

--
With Regards,
Amit Kapila.

#4Peter Smith
smithpb2250@gmail.com
In reply to: Kyotaro Horiguchi (#1)
Re: A recent message added to pg_upgade

On Fri, Oct 27, 2023 at 1:58 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Hello.

Some messages recently introduced by commit 29d0a77fa6 seem to deviate
slightly from our standards.

+               if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+               {
+                       ereport(ERROR,
+                                       errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                       errmsg("replication slots must not be invalidated during the upgrade"),
+                                       errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));

The message for errhint is not a complete sentence. And errmsg is not
in telegraph style. The first attached makes minimum changes.

However, if allowed, I'd like to propose an alternative set of
messages as follows:

+                                       errmsg("replication slot is invalidated during upgrade"),
+                                       errhint("Set \"max_slot_wal_keep_size\" to -1 to avoid invalidation."));

The second attached does this.

What do you think about those?

IIUC the only possible way to reach this error (according to the
comment preceding it) is by the user overriding the GUC value (which
was defaulted -1) on the command line.

+ /*
+ * The logical replication slots shouldn't be invalidated as
+ * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+ *
+ * The following is just a sanity check.
+ */

Given that, I felt a more relevant msg/hint might be like:

errmsg("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"),
errhint("Do not override \"max_slot_wal_keep_size\" using command line
options."));

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#4)
Re: A recent message added to pg_upgade

On Fri, Oct 27, 2023 at 9:37 AM Peter Smith <smithpb2250@gmail.com> wrote:

On Fri, Oct 27, 2023 at 1:58 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Hello.

Some messages recently introduced by commit 29d0a77fa6 seem to deviate
slightly from our standards.

+               if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+               {
+                       ereport(ERROR,
+                                       errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                       errmsg("replication slots must not be invalidated during the upgrade"),
+                                       errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));

The message for errhint is not a complete sentence. And errmsg is not
in telegraph style. The first attached makes minimum changes.

However, if allowed, I'd like to propose an alternative set of
messages as follows:

+                                       errmsg("replication slot is invalidated during upgrade"),
+                                       errhint("Set \"max_slot_wal_keep_size\" to -1 to avoid invalidation."));

The second attached does this.

What do you think about those?

IIUC the only possible way to reach this error (according to the
comment preceding it) is by the user overriding the GUC value (which
was defaulted -1) on the command line.

Yeah, this is my understanding as well.

+ /*
+ * The logical replication slots shouldn't be invalidated as
+ * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+ *
+ * The following is just a sanity check.
+ */

Given that, I felt a more relevant msg/hint might be like:

errmsg("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"),
errhint("Do not override \"max_slot_wal_keep_size\" using command line
options."));

But OTOH, we don't have a value of user-passed options to ensure that.
So, how about a slightly different message: "This can be caused by
overriding \"max_slot_wal_keep_size\" using command line options." or
something along those lines? I see a somewhat similar message in the
existing code (errhint("This can be caused ...")).

--
With Regards,
Amit Kapila.

#6Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#3)
Re: A recent message added to pg_upgade

On Fri, Oct 27, 2023 at 9:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 27, 2023 at 8:52 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Oct 27, 2023 at 8:28 AM Kyotaro Horiguchi:
The above errhint LGTM. How about a slightly different errmsg, like
the following?

+                    errmsg("cannot invalidate replication slots when
in binary upgrade mode"),
+                    errhint("Set \"max_slot_wal_keep_size\" to -1 to
avoid invalidation."));

".... when in binary upgrade mode" is being used in many places.

By this time slot may be already invalidated, so how about:
"replication slot was invalidated when in binary upgrade mode"?

In this error spot, the is invalidated in memory but the invalidated
state is not persisted to disk which happens after somewhere later:

else
{
/*
* We hold the slot now and have already invalidated it; flush it
* to ensure that state persists.
*

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#7Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#5)
Re: A recent message added to pg_upgade

On Fri, Oct 27, 2023 at 9:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

errmsg("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"),
errhint("Do not override \"max_slot_wal_keep_size\" using command line
options."));

But OTOH, we don't have a value of user-passed options to ensure that.
So, how about a slightly different message: "This can be caused by
overriding \"max_slot_wal_keep_size\" using command line options." or
something along those lines? I see a somewhat similar message in the
existing code (errhint("This can be caused ...")).

I get it. I think having errdetail explaining the possible cause of
the error is wanted here, something like:

errmsg("cannot invalidate replication slots when in binary upgrade mode"),
errdetail("This can be caused by overriding \"max_slot_wal_keep_size\"
using command line options."));
errhint("Do not override or set \"max_slot_wal_keep_size\" to -1 ."));

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#8Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Amit Kapila (#5)
Re: A recent message added to pg_upgade

At Fri, 27 Oct 2023 09:51:43 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in

On Fri, Oct 27, 2023 at 9:37 AM Peter Smith <smithpb2250@gmail.com> wrote:

IIUC the only possible way to reach this error (according to the
comment preceding it) is by the user overriding the GUC value (which
was defaulted -1) on the command line.

Yeah, this is my understanding as well.

+ /*
+ * The logical replication slots shouldn't be invalidated as
+ * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+ *
+ * The following is just a sanity check.
+ */

Given that, I felt a more relevant msg/hint might be like:

errmsg("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"),
errhint("Do not override \"max_slot_wal_keep_size\" using command line
options."));

But OTOH, we don't have a value of user-passed options to ensure that.
So, how about a slightly different message: "This can be caused by
overriding \"max_slot_wal_keep_size\" using command line options." or
something along those lines? I see a somewhat similar message in the
existing code (errhint("This can be caused ...")).

The suggested error message looks to me like that of the GUC
mechanism. While I don't have the wider picture about the feature,
might we consider rejecting the parameter setting? With that
modification, this message can be changed to elog one.

I believe it's somewhat inconsiderate to point out what shouldn't have
been done only after a problem has occurred.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kyotaro Horiguchi (#1)
Re: A recent message added to pg_upgade

On 2023-Oct-27, Kyotaro Horiguchi wrote:

@@ -1433,8 +1433,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
{
ereport(ERROR,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					errmsg("replication slots must not be invalidated during the upgrade"),
-					errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));

Hmm, if I read this code right, this error is going to be thrown by the
checkpointer while finishing a checkpoint. Fortunately, the checkpoint
record has already been written, but I don't know what would happen if
this is thrown while trying to write the shutdown checkpoint. Probably
nothing terribly good.

I don't think this is useful. If the setting is invalid during binary
upgrade, let's prevent it from being set at all right from the start of
the upgrade process. In InvalidatePossiblyObsoleteSlot() we could have
just an Assert() or elog(PANIC).

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#10Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#9)
Re: A recent message added to pg_upgade

On Fri, Oct 27, 2023 at 2:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Oct-27, Kyotaro Horiguchi wrote:

@@ -1433,8 +1433,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
{
ereport(ERROR,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                                     errmsg("replication slots must not be invalidated during the upgrade"),
-                                     errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));

Hmm, if I read this code right, this error is going to be thrown by the
checkpointer while finishing a checkpoint. Fortunately, the checkpoint
record has already been written, but I don't know what would happen if
this is thrown while trying to write the shutdown checkpoint. Probably
nothing terribly good.

I don't think this is useful. If the setting is invalid during binary
upgrade, let's prevent it from being set at all right from the start of
the upgrade process.

We are already forcing the required setting
"max_slot_wal_keep_size=-1" during the upgrade similar to some of the
other settings like "full_page_writes". However, the user can provide
an option for "max_slot_wal_keep_size" in which case it will be
overridden. Now, I think (a) we can ensure that our setting always
takes precedence in this case. The other idea is (b) to parse the
user-provided options and check if "max_slot_wal_keep_size" has a
value different than expected and raise an error accordingly. Or we
can simply (c) document the usage of max_slot_wal_keep_size in the
upgrade. I am not sure whether it is worth complicating the code for
this as the user shouldn't be using such an option during the upgrade.
So, I think doing (a) and (c) could be simpler.

In InvalidatePossiblyObsoleteSlot() we could have
just an Assert() or elog(PANIC).

Yeah, we can change to either of those.

--
With Regards,
Amit Kapila.

#11Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Amit Kapila (#10)
Re: A recent message added to pg_upgade

At Fri, 27 Oct 2023 14:57:10 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in

On Fri, Oct 27, 2023 at 2:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Oct-27, Kyotaro Horiguchi wrote:

@@ -1433,8 +1433,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
{
ereport(ERROR,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                                     errmsg("replication slots must not be invalidated during the upgrade"),
-                                     errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));

Hmm, if I read this code right, this error is going to be thrown by the
checkpointer while finishing a checkpoint. Fortunately, the checkpoint
record has already been written, but I don't know what would happen if
this is thrown while trying to write the shutdown checkpoint. Probably
nothing terribly good.

I don't think this is useful. If the setting is invalid during binary
upgrade, let's prevent it from being set at all right from the start of
the upgrade process.

We are already forcing the required setting
"max_slot_wal_keep_size=-1" during the upgrade similar to some of the
other settings like "full_page_writes". However, the user can provide
an option for "max_slot_wal_keep_size" in which case it will be
overridden. Now, I think (a) we can ensure that our setting always
takes precedence in this case. The other idea is (b) to parse the
user-provided options and check if "max_slot_wal_keep_size" has a
value different than expected and raise an error accordingly. Or we
can simply (c) document the usage of max_slot_wal_keep_size in the
upgrade. I am not sure whether it is worth complicating the code for
this as the user shouldn't be using such an option during the upgrade.
So, I think doing (a) and (c) could be simpler.

In InvalidatePossiblyObsoleteSlot() we could have
just an Assert() or elog(PANIC).

Yeah, we can change to either of those.

This discussion seems like a bit off from my point. I suggested adding
a check for that setting when IsBinaryUpgraded is true at the GUC
level as shown in the attached patch. I believe Álvaro made a similar
suggestion. While the error message is somewhat succinct, I think it
is sufficient given the low possilibility of the scenario and the fact
that it cannot occur inadvertently.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

inhibit_m_s_w_k_s_during_upgrade.txttext/plain; charset=us-asciiDownload+15-1
#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro Horiguchi (#11)
Re: A recent message added to pg_upgade

On Mon, Oct 30, 2023 at 7:58 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Fri, 27 Oct 2023 14:57:10 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in

On Fri, Oct 27, 2023 at 2:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Oct-27, Kyotaro Horiguchi wrote:

@@ -1433,8 +1433,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
{
ereport(ERROR,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                                     errmsg("replication slots must not be invalidated during the upgrade"),
-                                     errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));

Hmm, if I read this code right, this error is going to be thrown by the
checkpointer while finishing a checkpoint. Fortunately, the checkpoint
record has already been written, but I don't know what would happen if
this is thrown while trying to write the shutdown checkpoint. Probably
nothing terribly good.

I don't think this is useful. If the setting is invalid during binary
upgrade, let's prevent it from being set at all right from the start of
the upgrade process.

We are already forcing the required setting
"max_slot_wal_keep_size=-1" during the upgrade similar to some of the
other settings like "full_page_writes". However, the user can provide
an option for "max_slot_wal_keep_size" in which case it will be
overridden. Now, I think (a) we can ensure that our setting always
takes precedence in this case. The other idea is (b) to parse the
user-provided options and check if "max_slot_wal_keep_size" has a
value different than expected and raise an error accordingly. Or we
can simply (c) document the usage of max_slot_wal_keep_size in the
upgrade. I am not sure whether it is worth complicating the code for
this as the user shouldn't be using such an option during the upgrade.
So, I think doing (a) and (c) could be simpler.

In InvalidatePossiblyObsoleteSlot() we could have
just an Assert() or elog(PANIC).

Yeah, we can change to either of those.

This discussion seems like a bit off from my point. I suggested adding
a check for that setting when IsBinaryUpgraded is true at the GUC
level as shown in the attached patch. I believe Álvaro made a similar
suggestion. While the error message is somewhat succinct, I think it
is sufficient given the low possilibility of the scenario and the fact
that it cannot occur inadvertently.

I think we can simply change that error message to assert if we want
to go with the check hook idea of yours. BTW, can we add
GUC_check_errdetail() with a better message as some of the other check
function uses? Also, I guess we can add some comments or at least
refer to the existing comments to explain the reason of such a check.

--
With Regards,
Amit Kapila.

#13Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Kyotaro Horiguchi (#11)
RE: A recent message added to pg_upgade

On Monday, October 30, 2023 10:29 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Fri, 27 Oct 2023 14:57:10 +0530, Amit Kapila <amit.kapila16@gmail.com>
wrote in

On Fri, Oct 27, 2023 at 2:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org>

wrote:

On 2023-Oct-27, Kyotaro Horiguchi wrote:

@@ -1433,8 +1433,8 @@

InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,

{
ereport(ERROR,

errcode(ERRCODE_INVALID_PARAMETER_VALUE),

- errmsg("replication slots must not

be invalidated during the upgrade"),

-

errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade"));

Hmm, if I read this code right, this error is going to be thrown by
the checkpointer while finishing a checkpoint. Fortunately, the
checkpoint record has already been written, but I don't know what
would happen if this is thrown while trying to write the shutdown
checkpoint. Probably nothing terribly good.

I don't think this is useful. If the setting is invalid during
binary upgrade, let's prevent it from being set at all right from
the start of the upgrade process.

We are already forcing the required setting
"max_slot_wal_keep_size=-1" during the upgrade similar to some of the
other settings like "full_page_writes". However, the user can provide
an option for "max_slot_wal_keep_size" in which case it will be
overridden. Now, I think (a) we can ensure that our setting always
takes precedence in this case. The other idea is (b) to parse the
user-provided options and check if "max_slot_wal_keep_size" has a
value different than expected and raise an error accordingly. Or we
can simply (c) document the usage of max_slot_wal_keep_size in the
upgrade. I am not sure whether it is worth complicating the code for
this as the user shouldn't be using such an option during the upgrade.
So, I think doing (a) and (c) could be simpler.

In InvalidatePossiblyObsoleteSlot() we could have just an Assert()
or elog(PANIC).

Yeah, we can change to either of those.

This discussion seems like a bit off from my point. I suggested adding a check
for that setting when IsBinaryUpgraded is true at the GUC level as shown in the
attached patch. I believe Álvaro made a similar suggestion. While the error
message is somewhat succinct, I think it is sufficient given the low possilibility
of the scenario and the fact that it cannot occur inadvertently.

Thanks for the diff and I think the approach basically works.

One notable behavior of this approach it will reject the GUC setting even if there
are no slots on old cluster or user set the value to a big enough value which
doesn't cause invalidation. The behavior doesn't look bad to me but just mention it
for reference.

Best Regards,
Hou zj

#14Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amit Kapila (#12)
Re: A recent message added to pg_upgade

On Mon, Oct 30, 2023 at 8:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

This discussion seems like a bit off from my point. I suggested adding
a check for that setting when IsBinaryUpgraded is true at the GUC
level as shown in the attached patch. I believe Álvaro made a similar
suggestion. While the error message is somewhat succinct, I think it
is sufficient given the low possilibility of the scenario and the fact
that it cannot occur inadvertently.

I think we can simply change that error message to assert if we want
to go with the check hook idea of yours. BTW, can we add
GUC_check_errdetail() with a better message as some of the other check
function uses? Also, I guess we can add some comments or at least
refer to the existing comments to explain the reason of such a check.

Will the check_hook approach work correctly? I haven't checked that by
myself, but I see InitializeGUCOptions() getting called before
IsBinaryUpgrade is set to true and the passed-in config options ('c')
are parsed.

If the check_hook approach works correctly, I think we must add a test
hitting the error in check_max_slot_wal_keep_size for the
IsBinaryUpgrade case. And, I agree with Amit to have a detailed
messaging with GUC_check_errmsg/GUC_check_errdetail. Also, IMV,
leaving the error message in InvalidatePossiblyObsoleteSlot() there
(if required with a better wording as discussed initially in this
thread) does no harm. Actually, it acts as another safety net given
that max_slot_wal_keep_size GUC is reloadable via SIGHUP.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#15Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Amit Kapila (#12)
Re: A recent message added to pg_upgade

At Mon, 30 Oct 2023 08:51:18 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in

I think we can simply change that error message to assert if we want
to go with the check hook idea of yours. BTW, can we add
GUC_check_errdetail() with a better message as some of the other check
function uses? Also, I guess we can add some comments or at least
refer to the existing comments to explain the reason of such a check.

Definitely. I've attached the following changes.

1. Added GUC_check_errdetail() to the check function.

2. Added a comment to the check function (based on my knowledge about
the feature).

3. Altered the ereport() into Assert() in
InvalidatePossiblyObsoleteSlot(). I considered removing the
!SlotIsLogical() condition since pg_upgrade always sets
max_slot_wal_keep_size to -1, but I left it unchanged.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

inhibit_m_s_w_k_s_during_upgrade_2.txttext/plain; charset=us-asciiDownload+29-12
#16Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#13)
Re: A recent message added to pg_upgade

At Mon, 30 Oct 2023 03:36:41 +0000, "Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com> wrote in

Thanks for the diff and I think the approach basically works.

One notable behavior of this approach it will reject the GUC setting even if there
are no slots on old cluster or user set the value to a big enough value which
doesn't cause invalidation. The behavior doesn't look bad to me but just mention it
for reference.

Indeed. pg_upgrade anyway sets the variable to -1 irrespective of the
slot's existence, and I see no justification for allowing users to
forcibly change it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#17Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bharath Rupireddy (#14)
Re: A recent message added to pg_upgade

At Mon, 30 Oct 2023 12:38:47 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in

Will the check_hook approach work correctly? I haven't checked that by
myself, but I see InitializeGUCOptions() getting called before
IsBinaryUpgrade is set to true and the passed-in config options ('c')
are parsed.

I'm not sure about the wanted behavior exactly, but the fast you
pointed doesn't matter because the check is required after parsing the
command line options. On the other hand, I'm not sure about the
behavior that a setting in postgresql.conf is rejected.

If the check_hook approach works correctly, I think we must add a test
hitting the error in check_max_slot_wal_keep_size for the
IsBinaryUpgrade case. And, I agree with Amit to have a detailed
messaging with GUC_check_errmsg/GUC_check_errdetail. Also, IMV,
leaving the error message in InvalidatePossiblyObsoleteSlot() there
(if required with a better wording as discussed initially in this
thread) does no harm. Actually, it acts as another safety net given
that max_slot_wal_keep_size GUC is reloadable via SIGHUP.

The error message, which is deemed impossible, adds an additional
message translation. In another thread, we are discussing the
reduction of translatable messages. Therefore, I suggest using elog()
for the condition at the very least. Whether it should be elog() or
Assert() remains open for discussion, as I don't have a firm stance on
it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#18Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Kyotaro Horiguchi (#17)
Re: A recent message added to pg_upgade

On Mon, Oct 30, 2023 at 1:42 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 30 Oct 2023 12:38:47 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in

Will the check_hook approach work correctly? I haven't checked that by
myself, but I see InitializeGUCOptions() getting called before
IsBinaryUpgrade is set to true and the passed-in config options ('c')
are parsed.

I'm not sure about the wanted behavior exactly, but the fast you
pointed doesn't matter because the check is required after parsing the
command line options. On the other hand, I'm not sure about the
behavior that a setting in postgresql.conf is rejected.

Yeah. The check_hook is called even after the param is specified in
postgresql.conf during the upgrade, so I see no problem there.

If the check_hook approach works correctly, I think we must add a test
hitting the error in check_max_slot_wal_keep_size for the
IsBinaryUpgrade case. And, I agree with Amit to have a detailed
messaging with GUC_check_errmsg/GUC_check_errdetail. Also, IMV,
leaving the error message in InvalidatePossiblyObsoleteSlot() there
(if required with a better wording as discussed initially in this
thread) does no harm. Actually, it acts as another safety net given
that max_slot_wal_keep_size GUC is reloadable via SIGHUP.

The error message, which is deemed impossible, adds an additional
message translation. In another thread, we are discussing the
reduction of translatable messages. Therefore, I suggest using elog()
for the condition at the very least. Whether it should be elog() or
Assert() remains open for discussion, as I don't have a firm stance on
it.

I get it. I agree to go with just the assert because the GUC
check_hook kinda tightens the screws against setting
max_slot_wal_keep_size to a value other than -1 during the binary
upgrade,

A few comments on your inhibit_m_s_w_k_s_during_upgrade_2.txt:

1.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#19Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Bharath Rupireddy (#14)
RE: A recent message added to pg_upgade

Dear Bharath,

Will the check_hook approach work correctly?

I tested by using the first version and worked well (rejected). Please see the
log which recorded the output and log. Below lines were copied from server
log and found that max_slot_wal_keep_size must not be >= 0.

```
waiting for server to start....2023-10-30 08:53:32.529 GMT [6903] FATAL: invalid value for parameter "max_slot_wal_keep_size": 1
stopped waiting
pg_ctl: could not start serve
```

I haven't checked that by
myself, but I see InitializeGUCOptions() getting called before
IsBinaryUpgrade is set to true and the passed-in config options ('c')
are parsed.

I thought the key point was that user-defined options are aligned after the "-b".
User-defined options are set after the '-b' option, so check_hook could work
as we expected. Thought?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

output.txttext/plain; name=output.txtDownload
pg_upgrade_server.logapplication/octet-stream; name=pg_upgrade_server.logDownload
#20Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#18)
Re: A recent message added to pg_upgade

On Mon, Oct 30, 2023 at 2:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Never mind. Previous message was accidentally sent before I finished
writing my comments.

Yeah. The check_hook is called even after the param is specified in
postgresql.conf during the upgrade, so I see no problem there.

The error message, which is deemed impossible, adds an additional
message translation. In another thread, we are discussing the
reduction of translatable messages. Therefore, I suggest using elog()
for the condition at the very least. Whether it should be elog() or
Assert() remains open for discussion, as I don't have a firm stance on
it.

I get it. I agree to go with just the assert because the GUC
check_hook kinda tightens the screws against setting
max_slot_wal_keep_size to a value other than -1 during the binary
upgrade,

A few comments on your inhibit_m_s_w_k_s_during_upgrade_2.txt:

1.
+ * All WAL files on the publisher node must be retained during an upgrade to
+ * maintain connections from downstreams.  While pg_upgrade explicitly sets

It's not just the publisher, anyone using logical slots. Also, no
downstream please. If you want, you can repurpose the comment that's
added by 29d0a77f.

/*
* Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
* checkpointer process. If WALs required by logical replication slots
* are removed, the slots are unusable. This setting prevents the
* invalidation of slots during the upgrade. We set this option when
* cluster is PG17 or later because logical replication slots can only be
* migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
*/

2.
At present, only logical slots really require
+ * this.

Can we remove the above comment as the code with SlotIsLogical(s)
explains it all?

3.
+        GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set
to -1 during the upgrade.");
+        return false;

How about we be explicit like the following which helps users reason
about this restriction instead of them looking at the comments/docs?

GUC_check_errcode(ERRCODE_INVALID_PARAMETER_VALUE);
GUC_check_errmsg(""\"max_slot_wal_keep_size\" must be set
to -1 when in binary upgrade mode");
GUC_check_errdetail("A value of -1 prevents the removal of
WAL required for logical slots upgrade.");
return false;

4. I think a test case to hit the error in the check hook in
003_logical_slots.pl will help a lot here - not only covers the code
but also helps demonstrate how one can reach the error.

5. I think the check_hook is better defined in xlog.c the place where
it's actually being declared and in action. IMV, there's no reason for
it to be in slot.c as it doesn't deal with any slot related
variables/structs. This also avoids an unnecessary "utils/guc_hooks.h"
inclusion in slot.c.
+bool
+check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source)
+{

5. A possible problem with this check_hook approach is that it doesn't
let anyone setting max_slot_wal_keep_size to a value other than -1
during pg_ugprade even if someone doesn't have logical slots or
doesn't want to upgrade logical slots in which case the WAL file
growth during pg_upgrade may be huge (transiently) unless the
pg_resetwal step of pg_upgrade removes it at the end.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#21Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bharath Rupireddy (#20)
#22Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Kyotaro Horiguchi (#21)
#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Bharath Rupireddy (#22)
#24Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Kyotaro Horiguchi (#21)
#25Peter Smith
smithpb2250@gmail.com
In reply to: Kyotaro Horiguchi (#21)
#26Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Peter Smith (#25)
#27Peter Smith
smithpb2250@gmail.com
In reply to: Kyotaro Horiguchi (#26)
#28Peter Smith
smithpb2250@gmail.com
In reply to: Peter Smith (#27)
#29Michael Paquier
michael@paquier.xyz
In reply to: Peter Smith (#28)
#30Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#29)
#31Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#30)
#32Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#31)
#33Michael Paquier
michael@paquier.xyz
In reply to: Peter Smith (#32)
#34Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#33)
#35Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#34)
#36Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#35)
#37Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#36)
#38Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#37)
#39Peter Smith
smithpb2250@gmail.com
In reply to: Michael Paquier (#38)
#40Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Amit Kapila (#37)
#41Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro Horiguchi (#40)
#42Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Amit Kapila (#41)
#43Michael Paquier
michael@paquier.xyz
In reply to: Peter Smith (#39)
#44Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#43)
#45Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#44)
#46Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Kyotaro Horiguchi (#26)
#47Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kyotaro Horiguchi (#26)
#48Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#47)
#49Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Kapila (#48)
#50Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#45)
#51Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#50)
#52Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#51)
#53Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#52)
#54Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#53)
#55Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#54)
#56Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#55)
#57Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#55)
#58Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#57)
#59Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#58)
#60Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#59)
#61Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#60)
#62Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#59)
#63Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#62)
#64Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#63)
#65Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Kapila (#48)
#66Dilip Kumar
dilipbalaut@gmail.com
In reply to: Tom Lane (#65)
#67Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#66)
#68Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dilip Kumar (#67)
#69Dilip Kumar
dilipbalaut@gmail.com
In reply to: Tom Lane (#68)
#70Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#69)
#71Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#69)
#72Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#71)
#73Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#72)
#74Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#73)
#75Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#74)
#76Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dilip Kumar (#75)
#77Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#76)
#78Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#77)
#79Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Kapila (#78)
#80Dilip Kumar
dilipbalaut@gmail.com
In reply to: Tom Lane (#79)
#81vignesh C
vignesh21@gmail.com
In reply to: Dilip Kumar (#77)
#82Dilip Kumar
dilipbalaut@gmail.com
In reply to: vignesh C (#81)
#83Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#82)
#84vignesh C
vignesh21@gmail.com
In reply to: Dilip Kumar (#83)
#85Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#84)
#86Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#85)
#87vignesh C
vignesh21@gmail.com
In reply to: Dilip Kumar (#83)
#88Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#87)
#89Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Kapila (#88)