Parallel worker error
Hello,
While investing an issue in Postgres-XL 10, I came across this rather
surprisingly behaviour in PG master. See this test case:
create role testuser1;
set role to testuser1;
show role; -- shows testuser1
-- reset back to default role
reset session authorization ;
show role; -- shows none
set force_parallel_mode TO 1;
select count(*) from pg_class ; -- ok
-- now drop the role and try parallel query again
drop role testuser1 ;
select count(*) from pg_class ; -- fails
The last statement in this test fails with an error:
ERROR: role "testuser1" does not exist
CONTEXT: parallel worker
Looks like the leader process when serialises the GUC state, saves the
"role" value, which is still set to testuser1 (and which is now dropped).
Later, when the parallel worker process tries to restore the GUC state, it
fails on catalog lookup.
Comments in show_role() mentions a kludge because apparently SET SESSION
AUTHORIZATION cannot call set_config_option and change the current value of
"role". So that probably explains why show_role() displays the correct
information, but parallel worker fails to handle the case.
It's quite possible that I don't understand the differences in "role" and
"session authorization", but it still looks like a bug to me. May
be SerializeGUCState() should check if SetRoleIsActive is true and only
then save the role information?
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Aug 30, 2017 at 6:58 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
It's quite possible that I don't understand the differences in "role" and
"session authorization", but it still looks like a bug to me. May be
SerializeGUCState() should check if SetRoleIsActive is true and only then
save the role information?
Ugh. Well, this is definitely a bug, but I'm not sure if that's the right fix.
Mutually interdependent GUCs are bad news.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Pavan Deolasee <pavan.deolasee@gmail.com> writes:
The last statement in this test fails with an error:
ERROR: role "testuser1" does not exist
CONTEXT: parallel worker
I can duplicate this in HEAD and v10, but not 9.6, so I've added it
as an open issue for v10. No idea what broke it.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 30, 2017 at 5:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 30, 2017 at 6:58 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:It's quite possible that I don't understand the differences in "role" and
"session authorization", but it still looks like a bug to me. May be
SerializeGUCState() should check if SetRoleIsActive is true and only then
save the role information?Ugh. Well, this is definitely a bug, but I'm not sure if that's the right fix.
Yeah.
Mutually interdependent GUCs are bad news.
I am able to reproduce this without involving session authorization
guc as well. One needs to drop the newly created role from another
session, then also we can see the same error.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
I can duplicate this in HEAD and v10, but not 9.6, so I've added it
as an open issue for v10. No idea what broke it.
Oh, scratch that: the issue exists in 9.6, it's just that the particular
test query you're using here does not generate a parallelized plan in
9.6, as shown by "explain". With a different query that does get
parallelized, 9.6 fails too:
regression=# select sum(ten) from tenk1;
ERROR: role "testuser1" does not exist
CONTEXT: parallel worker
Still, I think it's reasonable to characterize this as "must fix for v10".
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 30, 2017 at 5:19 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
I am able to reproduce this without involving session authorization
guc as well. One needs to drop the newly created role from another
session, then also we can see the same error.
Yeah, that's how I first created the case. But concurrent dropping of role
(which is surprisingly allowed even when there are active connections with
the role active) opens another set of errors. Hence I tried to reproduce
this in a single session. While it might be interesting to fix the
concurrent role drop problem someday, the single session problem looks more
pressing.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Amit Kapila <amit.kapila16@gmail.com> writes:
I am able to reproduce this without involving session authorization
guc as well. One needs to drop the newly created role from another
session, then also we can see the same error.
Hm. I suspect the basic shape of what's happening here is "an existing
session can continue to run with OuterUserId corresponding to a dropped
role, but we fail when trying to duplicate that state into a parallel
worker". I wonder whether there aren't similar gotchas for other GUCs
whose interpretation depends on catalog lookups, eg search_path.
We might need to redesign the GUC-propagation mechanism so it sends
the various internal representations of GUC values, not the user-visible
strings. (I'm thinking of the blobs that guc.c can use to restore a
previous state at transaction abort ... don't recall what the code
calls them ATM.)
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 30, 2017 at 8:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We might need to redesign the GUC-propagation mechanism so it sends
the various internal representations of GUC values, not the user-visible
strings.
That would probably be better in the long run, but I'm not keen to do
it in a back-branch under time pressure. I think one approach is to
stick another test against InitializingParallelWorker into check_role,
making it treat that case like no-role-specified. But I'm a little
worried about whether that would ever lead to a case where the role
should get set and doesn't, leading to a security bug.
Another approach is to attack this problem right at the root, which
IMHO is that changing "session_authorization" is implicitly setting
"role", but for the reasons mentioned in the comment in show_role(),
it doesn't set it explicitly. Well, that results in a situation where
the actual value assigned to the GUC doesn't match the value that
needs to be passed to the worker, so it breaks. I'm not sure what
would be a practical approach to that problem, but it's a thought.
A third approach is to do something like what you're proposing but
just limited to "role". In particular, the obvious thing to do would
be exclude it from SerializeGUCState and somehow propagate it as part
of FixedParallelState, which already has authenticated_user_id and
current_user_id.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Aug 30, 2017 at 8:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We might need to redesign the GUC-propagation mechanism so it sends
the various internal representations of GUC values, not the user-visible
strings.
That would probably be better in the long run, but I'm not keen to do
it in a back-branch under time pressure.
Definitely a valid objection. But before assuming that this issue is
limited to SET ROLE, it'd be wise to push a bit on the other GUCs with
catalog-dependent values, to see if there are any others we need to
worry about. I"m okay with a narrow solution if SET ROLE really is
the only problem, but at this stage I'm not convinced of that.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 30, 2017 at 9:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Aug 30, 2017 at 8:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We might need to redesign the GUC-propagation mechanism so it sends
the various internal representations of GUC values, not the user-visible
strings.That would probably be better in the long run, but I'm not keen to do
it in a back-branch under time pressure.Definitely a valid objection. But before assuming that this issue is
limited to SET ROLE, it'd be wise to push a bit on the other GUCs with
catalog-dependent values, to see if there are any others we need to
worry about. I"m okay with a narrow solution if SET ROLE really is
the only problem, but at this stage I'm not convinced of that.
I don't think the problem with role is that it's catalog-dependent,
but that the effective value is changed by code that never calls
SetConfigOption() or set_config_option() or anything other mechanism
that the GUC code knows about. That coding pattern is known to be
broken (see the commit message for
a16d421ca4fc639929bc964b2585e8382cf16e33, for example) and my bet is
the only reason why set role has gotten by with it for so long is
because the code was written a long time ago and nobody wants to risk
breaking anything by trying to clean it up. It's almost unfair to
blame this on parallel query; if someone were to submit a patch
tomorrow that changes the effective value of a GUC without going
through SetConfigOption(), you'd punt it into outer space at
relativistic speed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Aug 30, 2017 at 9:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I"m okay with a narrow solution if SET ROLE really is
the only problem, but at this stage I'm not convinced of that.
I don't think the problem with role is that it's catalog-dependent,
but that the effective value is changed by code that never calls
SetConfigOption() or set_config_option() or anything other mechanism
that the GUC code knows about.
It's certainly possible that that's a contributing factor, but the
variant that Amit alluded to demonstrates that catalog dependency
is part of the problem:
regression=# create user testuser1;
CREATE ROLE
regression=# \c - testuser1
You are now connected to database "regression" as user "testuser1".
-- in a different session, do "drop user testuser1;", then:
regression=> show role;
role
------
none
(1 row)
regression=> show session authorization;
session_authorization
-----------------------
testuser1
(1 row)
regression=> select count(*) from pg_class;
count
-------
1113
(1 row)
regression=> set force_parallel_mode TO 1;
SET
regression=> select count(*) from pg_class;
ERROR: role with OID 110981 does not exist
CONTEXT: parallel worker
regression=> set force_parallel_mode TO 0;
SET
regression=> select count(*) from pg_class;
count
-------
1113
(1 row)
The problem here is exactly that we cannot transmit the leader's
state to the worker. You can't blame it on SET ROLE, because
I didn't do one.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 30, 2017 at 9:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The problem here is exactly that we cannot transmit the leader's
state to the worker. You can't blame it on SET ROLE, because
I didn't do one.
Hmm, that's a good reason for holding it blameless. In this case,
I'll blame the fact that we allow a role to be dropped while there are
users connected using that role. That's about as sensible as allowing
a table to be dropped while there are users reading from it, or a
database to be dropped while there are users connected to it, both of
which we in fact prohibit. IOW, I'd say the problem is that we've
allowed the system state to become inconsistent and now we're sad
because stuff doesn't work.
But since that's an established design fl^H^Hprinciple, maybe that
means we should go with the approach of teaching SerializeGUCState()
to ignore role altogether and instead have ParallelWorkerMain call
SetCurrentRoleId using information passed via the FixedParallelState
(not sure of the precise details here).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas wrote:
In this case,
I'll blame the fact that we allow a role to be dropped while there are
users connected using that role. That's about as sensible as allowing
a table to be dropped while there are users reading from it, or a
database to be dropped while there are users connected to it, both of
which we in fact prohibit. IOW, I'd say the problem is that we've
allowed the system state to become inconsistent and now we're sad
because stuff doesn't work.But since that's an established design fl^H^Hprinciple,
Actually, my first comment when Pavan mentioned this on IM was that we
should look into fixing that problem sometime. It's not terribly urgent
since it doesn't seem to hurt anything too badly, but it's still a bug.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Robert Haas wrote:
In this case,
I'll blame the fact that we allow a role to be dropped while there are
users connected using that role.
Actually, my first comment when Pavan mentioned this on IM was that we
should look into fixing that problem sometime. It's not terribly urgent
since it doesn't seem to hurt anything too badly, but it's still a bug.
My feeling is that it's going to be unreasonably expensive. Are we
going to take a lock every time we call a SECURITY DEFINER function?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 30, 2017 at 11:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
But since that's an established design fl^H^Hprinciple, maybe that
means we should go with the approach of teaching SerializeGUCState()
to ignore role altogether and instead have ParallelWorkerMain call
SetCurrentRoleId using information passed via the FixedParallelState
(not sure of the precise details here).
Could I get some opinions on the virtues of this approach, vs. any of
the other suggestions at or near
/messages/by-id/CA+TgmoaSP90E33-MU2YpGs73TtJ37m5Hv-xqHjc7TPqX9wX8ew@mail.gmail.com
?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 31, 2017 at 03:11:10PM -0400, Robert Haas wrote:
On Wed, Aug 30, 2017 at 11:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
But since that's an established design fl^H^Hprinciple, maybe that
means we should go with the approach of teaching SerializeGUCState()
to ignore role altogether and instead have ParallelWorkerMain call
SetCurrentRoleId using information passed via the FixedParallelState
(not sure of the precise details here).Could I get some opinions on the virtues of this approach, vs. any of
the other suggestions at or near
/messages/by-id/CA+TgmoaSP90E33-MU2YpGs73TtJ37m5Hv-xqHjc7TPqX9wX8ew@mail.gmail.com
?
It seems good to me, better than the other options in that mail. This does
rely on "role" being on the only vulnerable GUC. Looking at callers of
GUC_check_errmsg(), I don't expect trouble in any other GUC. (I suspect one
can hit the "permission denied to set role" during parallel initialization
after a concurrent ALTER ROLE removes a membership.)
[Action required within three days. This is a generic notification.]
The above-described topic is currently a PostgreSQL 10 open item. Robert,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1]/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.
[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 30, 2017 at 8:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 30, 2017 at 9:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The problem here is exactly that we cannot transmit the leader's
state to the worker. You can't blame it on SET ROLE, because
I didn't do one.Hmm, that's a good reason for holding it blameless. In this case,
I'll blame the fact that we allow a role to be dropped while there are
users connected using that role.
The similar could happen with schema as well. For example, even if
you have set the schema name in the search_path of session-1, it will
still be allowed to drop the schema from another session. Now, we
will pass the dropped schema name as part of search_path to the
parallel worker. It won't lead to any error because we don't check
catalog while setting the value of search_path GUC.
I am not sure whether we need to bother about this, but I thought it
might help in choosing the approach to fix this problem.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Sep 2, 2017 at 12:21 PM, Noah Misch <noah@leadboat.com> wrote:
On Thu, Aug 31, 2017 at 03:11:10PM -0400, Robert Haas wrote:
On Wed, Aug 30, 2017 at 11:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
But since that's an established design fl^H^Hprinciple, maybe that
means we should go with the approach of teaching SerializeGUCState()
to ignore role altogether and instead have ParallelWorkerMain call
SetCurrentRoleId using information passed via the FixedParallelState
(not sure of the precise details here).Could I get some opinions on the virtues of this approach, vs. any of
the other suggestions at or near
/messages/by-id/CA+TgmoaSP90E33-MU2YpGs73TtJ37m5Hv-xqHjc7TPqX9wX8ew@mail.gmail.com
?It seems good to me, better than the other options in that mail. This does
rely on "role" being on the only vulnerable GUC. Looking at callers of
GUC_check_errmsg(), I don't expect trouble in any other GUC. (I suspect one
can hit the "permission denied to set role" during parallel initialization
after a concurrent ALTER ROLE removes a membership.)
I think that error won't happen during parallel initialization because
of 'InitializingParallelWorker' check.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Sep 2, 2017 at 2:51 AM, Noah Misch <noah@leadboat.com> wrote:
[Action required within three days. This is a generic notification.]
The above-described topic is currently a PostgreSQL 10 open item. Robert,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.
It's not really a valid v10 open item, because it's not new in v10.
But I agree it should be fixed soon.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 30, 2017 at 8:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
But since that's an established design fl^H^Hprinciple, maybe that
means we should go with the approach of teaching SerializeGUCState()
to ignore role altogether and instead have ParallelWorkerMain call
SetCurrentRoleId using information passed via the FixedParallelState
(not sure of the precise details here).
Seems like a reasonable approach to me.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services