is_superuser versus set_config_option's parallelism check

Started by Tom Laneover 1 year ago10 messages
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

I came across this misbehavior:

regression=# create or replace function foo() returns text as
$$select current_setting('role')$$ language sql
parallel safe set role = postgres;
CREATE FUNCTION
regression=# select foo();
foo
----------
postgres
(1 row)

regression=# set debug_parallel_query to 1;
SET
regression=# select foo();
ERROR: cannot set parameters during a parallel operation
CONTEXT: parallel worker

What is failing is the attempt to update the "is_superuser" GUC
as a side-effect of setting "role". That's coming from here:

/*
* GUC_ACTION_SAVE changes are acceptable during a parallel operation,
* because the current worker will also pop the change. We're probably
* dealing with a function having a proconfig entry. Only the function's
* body should observe the change, and peer workers do not share in the
* execution of a function call started by this worker.
*
* Other changes might need to affect other workers, so forbid them.
*/
if (IsInParallelMode() && changeVal && action != GUC_ACTION_SAVE)
// throw error

Since we're using GUC_ACTION_SET to set "is_superuser", this spits up.

The simplest fix would be to hack this test to allow the action anyway
when context == PGC_INTERNAL, excusing that as "assume the caller
knows what it's doing". That feels pretty grotty though. Perhaps
a cleaner way would be to move this check to some higher code level,
but I'm not sure where would be a good place.

Thoughts?

regards, tom lane

#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#1)
Re: is_superuser versus set_config_option's parallelism check

On Fri, Aug 09, 2024 at 02:43:59PM -0400, Tom Lane wrote:

The simplest fix would be to hack this test to allow the action anyway
when context == PGC_INTERNAL, excusing that as "assume the caller
knows what it's doing". That feels pretty grotty though. Perhaps
a cleaner way would be to move this check to some higher code level,
but I'm not sure where would be a good place.

From a couple of quick tests, it looks like setting
"current_role_is_superuser" directly works. That's still grotty, but at
least the grottiness would be localized and not require broad assumptions
about callers knowing what they're doing when using PGC_INTERNAL. I
wouldn't be surprised if there are other problems with this approach, too.

--
nathan

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#2)
Re: is_superuser versus set_config_option's parallelism check

Nathan Bossart <nathandbossart@gmail.com> writes:

On Fri, Aug 09, 2024 at 02:43:59PM -0400, Tom Lane wrote:

The simplest fix would be to hack this test to allow the action anyway
when context == PGC_INTERNAL, excusing that as "assume the caller
knows what it's doing". That feels pretty grotty though. Perhaps
a cleaner way would be to move this check to some higher code level,
but I'm not sure where would be a good place.

From a couple of quick tests, it looks like setting
"current_role_is_superuser" directly works.

Yeah, I had been thinking along the same lines. Here's a draft
patch. (Still needs some attention to nearby comments, and I can't
avoid the impression that the miscinit.c code in this area could
use refactoring.)

A problem with this is that it couldn't readily be back-patched
further than v14, since we didn't have ReportChangedGUCOptions
before that. Maybe that doesn't matter; given the lack of
previous complaints, maybe we only need to fix this in HEAD.

regards, tom lane

Attachments:

v1-dont-use-SetConfigOption-for-is-superuser.patchtext/x-diff; charset=us-ascii; name=v1-dont-use-SetConfigOption-for-is-superuser.patchDownload+32-14
#4Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#3)
Re: is_superuser versus set_config_option's parallelism check

On Fri, Aug 09, 2024 at 04:04:15PM -0400, Tom Lane wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

From a couple of quick tests, it looks like setting
"current_role_is_superuser" directly works.

Yeah, I had been thinking along the same lines. Here's a draft
patch. (Still needs some attention to nearby comments, and I can't
avoid the impression that the miscinit.c code in this area could
use refactoring.)

Hm. That's a bit more code than I expected.

A problem with this is that it couldn't readily be back-patched
further than v14, since we didn't have ReportChangedGUCOptions
before that. Maybe that doesn't matter; given the lack of
previous complaints, maybe we only need to fix this in HEAD.

Another option might be to introduce a new GUC flag or source for anything
we want to bypass the check (perhaps with the stipulation that it must also
be marked PGC_INTERNAL). I think a new flag would require moving the
parallel check down a stanza, but that seems fine. A new source would
allow us to limit the damage to specific SetConfigOption() call-sites, but
I haven't thought through that idea fully.

--
nathan

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#4)
Re: is_superuser versus set_config_option's parallelism check

Nathan Bossart <nathandbossart@gmail.com> writes:

On Fri, Aug 09, 2024 at 04:04:15PM -0400, Tom Lane wrote:

Yeah, I had been thinking along the same lines. Here's a draft
patch. (Still needs some attention to nearby comments, and I can't
avoid the impression that the miscinit.c code in this area could
use refactoring.)

Hm. That's a bit more code than I expected.

Yeah. I can see a couple of points of attraction in this, but
they're not strong:

* Fewer cycles involved in setting session_authorization or role.
But nobody has really complained that those are slow.

* Gets us out from any other gotchas that may exist or be added
in the SetConfigOption code path, not just this one point.
This is mostly hypothetical, and a regression test case or two
would likely catch any new problems anyway.

Another option might be to introduce a new GUC flag or source for anything
we want to bypass the check (perhaps with the stipulation that it must also
be marked PGC_INTERNAL).

A new GUC flag seems like a promising approach, and better than
giving a blanket exemption to PGC_INTERNAL. At least for
is_superuser, there's no visible value in restricting which
SetConfigOption calls can change it; they'd all need the escape
hatch.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
Re: is_superuser versus set_config_option's parallelism check

I wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

Another option might be to introduce a new GUC flag or source for anything
we want to bypass the check (perhaps with the stipulation that it must also
be marked PGC_INTERNAL).

A new GUC flag seems like a promising approach, and better than
giving a blanket exemption to PGC_INTERNAL. At least for
is_superuser, there's no visible value in restricting which
SetConfigOption calls can change it; they'd all need the escape
hatch.

Here's a draft patch to fix it with a flag, now with regression tests.

Also, now that the error depends on which parameter we're talking
about, I thought it best to include the parameter name in the error
and to re-word it to be more like all the other can't-set-this-now
errors just below it. I'm half tempted to change the errcode and
set_config_option return value to match the others too, ie
ERRCODE_CANT_CHANGE_RUNTIME_PARAM and "return 0" not -1.
I don't think the existing choices here are very well thought
through, and they're certainly inconsistent with a lot of
otherwise-similar-seeming refusals in set_config_option.

regards, tom lane

Attachments:

v1-mark-is_superuser-as-settable-in-parallel.patchtext/x-diff; charset=us-ascii; name=v1-mark-is_superuser-as-settable-in-parallel.patchDownload+74-13
#7Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#6)
Re: is_superuser versus set_config_option's parallelism check

On Fri, Aug 09, 2024 at 06:50:14PM -0400, Tom Lane wrote:

Here's a draft patch to fix it with a flag, now with regression tests.

Looks reasonable.

Also, now that the error depends on which parameter we're talking
about, I thought it best to include the parameter name in the error
and to re-word it to be more like all the other can't-set-this-now
errors just below it. I'm half tempted to change the errcode and
set_config_option return value to match the others too, ie
ERRCODE_CANT_CHANGE_RUNTIME_PARAM and "return 0" not -1.
I don't think the existing choices here are very well thought
through, and they're certainly inconsistent with a lot of
otherwise-similar-seeming refusals in set_config_option.

This comment for set_config_option() leads me to think we should be
returning -1 instead of 0 in many more places in set_config_with_handle():

* Return value:
* +1: the value is valid and was successfully applied.
* 0: the name or value is invalid (but see below).
* -1: the value was not applied because of context, priority, or changeVal.

But I haven't thought through it, either. At this point, maybe the comment
is wrong...

--
nathan

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#7)
Re: is_superuser versus set_config_option's parallelism check

Nathan Bossart <nathandbossart@gmail.com> writes:

On Fri, Aug 09, 2024 at 06:50:14PM -0400, Tom Lane wrote:

Also, now that the error depends on which parameter we're talking
about, I thought it best to include the parameter name in the error
and to re-word it to be more like all the other can't-set-this-now
errors just below it. I'm half tempted to change the errcode and
set_config_option return value to match the others too, ie
ERRCODE_CANT_CHANGE_RUNTIME_PARAM and "return 0" not -1.

This comment for set_config_option() leads me to think we should be
returning -1 instead of 0 in many more places in set_config_with_handle():

* Return value:
* +1: the value is valid and was successfully applied.
* 0: the name or value is invalid (but see below).
* -1: the value was not applied because of context, priority, or changeVal.

But I haven't thought through it, either. At this point, maybe the comment
is wrong...

I poked through all the call sites. The only one that makes a
distinction between 0 and -1 is ProcessConfigFileInternal(),
and what it thinks is:

else if (scres == 0)
{
error = true;
item->errmsg = pstrdup("setting could not be applied");
ConfFileWithError = item->filename;
}
else
{
/* no error, but variable's active value was not changed */
item->applied = true;
}

Now, I don't believe that ProcessConfigFileInternal is ever executed
while IsInParallelMode, so it appears that no caller really cares
about which return code this case would return. However, if you
look through set_config_with_handle the general pattern is that
we "return 0" after any ereport call (either one directly in that
function, or one in a called function). Getting to those of course
implies that elevel is too low to throw an error; but we did think
there was an error condition. We "return -1" in cases where we didn't
ereport anything. So I am still of the opinion that the -1 usage here
is inconsistent, even if it happens to not make a difference today.

Yeah, the header comment could stand to be improved to make this
clearer. I think there are more conditions being checked now than
existed when the comment was written. But the para right below the
bit you quoted is pretty clear that "return 0" is associated with
an ereport.

Maybe

* Return value:
* +1: the value is valid and was successfully applied.
* 0: the name or value is invalid, or it's invalid to try to set
* this GUC now; but elevel was less than ERROR (see below).
* -1: no error detected, but the value was not applied, either
* because changeVal is false or there is some overriding value.

regards, tom lane

#9Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#8)
Re: is_superuser versus set_config_option's parallelism check

On Sat, Aug 10, 2024 at 12:57:36PM -0400, Tom Lane wrote:

Yeah, the header comment could stand to be improved to make this
clearer. I think there are more conditions being checked now than
existed when the comment was written. But the para right below the
bit you quoted is pretty clear that "return 0" is associated with
an ereport.

Ah, sorry, ENOCOFFEE.

Maybe

* Return value:
* +1: the value is valid and was successfully applied.
* 0: the name or value is invalid, or it's invalid to try to set
* this GUC now; but elevel was less than ERROR (see below).
* -1: no error detected, but the value was not applied, either
* because changeVal is false or there is some overriding value.

Nevertheless, I think this is an improvement.

Regarding returning 0 instead of -1 for the parallel case, I think that
follows. While doing some additional research, I noticed this return value
was just added in December (commit 059de3c [0]/messages/by-id/2089235.1703617353@sss.pgh.pa.us). Before that, it
apparently assumed that elevel >= ERROR. With that and your analysis of
the call sites, it seems highly unlikely that changing it will cause any
problems.

For the errcode, I do see that we pretty consistently use
ERRCODE_INVALID_TRANSACTION_STATE for "can't do thing during a parallel
operation." In fact, it looks like all but one use is for parallel errors.
I don't have any particular qualms about changing it to
ERRCODE_CANT_CHANGE_RUNTIME_PARAM in set_config_with_handle(), but I
thought that was interesting.

[0]: /messages/by-id/2089235.1703617353@sss.pgh.pa.us

--
nathan

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#9)
Re: is_superuser versus set_config_option's parallelism check

Nathan Bossart <nathandbossart@gmail.com> writes:

Regarding returning 0 instead of -1 for the parallel case, I think that
follows. While doing some additional research, I noticed this return value
was just added in December (commit 059de3c [0]). Before that, it
apparently assumed that elevel >= ERROR. With that and your analysis of
the call sites, it seems highly unlikely that changing it will cause any
problems.

Hah ... so the failure to think clearly about which value to use
was mine :-(.

For the errcode, I do see that we pretty consistently use
ERRCODE_INVALID_TRANSACTION_STATE for "can't do thing during a parallel
operation." In fact, it looks like all but one use is for parallel errors.

OK, I'll leave that alone but will change the return code.

regards, tom lane