Assert failure on running a completed portal again
Hi,
I noticed that the following Assert in PortalRun fails when the same portal is
executed more than once by Execute messages whose "maximum number of rows"
is specified as zero, that is, "no limit".
/* Set run_once flag. Shouldn't be clear if previously set. */
Assert(!portal->run_once || run_once);
portal->run_once = run_once;
I tested this using pgproto [1] in Pgpool-II.
I believe the server should return CommandComplete normally in this case.
This can be fixed by not setting the execute_is_fetch flag (so that run_once
is set as a result) when the portal has already completed, since no rows will
be fetched in this case.
I've attached a patch taking this approach.
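Roughly, the idea is as follows (a sketch only, paraphrasing from my reading of
exec_execute_message() in postgres.c; the attached patch is the authoritative
version):

    /*
     * Currently any re-Execute of an existing portal is treated as a fetch:
     *
     *     execute_is_fetch = !portal->atStart;
     *
     * which makes the run_once argument of PortalRun() come out false on the
     * second Execute and trips the Assert.  Sketch of the fix: if the portal
     * has already run to completion, no rows will be fetched, so don't treat
     * the Execute as a fetch; run_once then comes out true again and the
     * Assert holds.
     */
    execute_is_fetch = !portal->atStart && !portal->atEnd;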
[1]: https://www.pgpool.net/docs/latest/en/html/pgproto.html
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
0001-Prevent-Assert-failure-when-a-completed-portal-is-ru.patch (text/x-diff, +5/-6)
On Fri, 6 Dec 2024 06:25:49 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
Hi,
I noticed that the following Assert in PortalRun fails when the same portal is
executed more than once by Execute messages whose "maximum number of rows"
is specified as zero, that is, "no limit".
/* Set run_once flag. Shouldn't be clear if previously set. */
Assert(!portal->run_once || run_once);
portal->run_once = run_once;
I tested this using pgproto [1] in Pgpool-II.
I believe the server should return CommandComplete normally in this case.
This can be fixed by not setting the execute_is_fetch flag (so that run_once
is set as a result) when the portal has already completed, since no rows will
be fetched in this case.
I've attached a patch taking this approach.
Another idea is to not call PortalRun at all in this case, as in the attached patch.
Which approach is better? Or should we fix this in some other way?
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
portal_assert_another_way.patch.txt (text/plain, +14/-7)
Yugo Nagata <nagata@sraoss.co.jp> writes:
I noticed that the following Assert in PortalRun fails when the same portal is
executed more than once by Execute messages whose "maximum number of rows"
is specified as zero, that is, "no limit".
/* Set run_once flag. Shouldn't be clear if previously set. */
Assert(!portal->run_once || run_once);
portal->run_once = run_once;
Hmm. It's fairly hard to tell what was meant there, because
(a) there is exactly zero documentation of what run_once means
(b) that comment is opaque and doesn't appear to describe the
code behavior anyway.
I believe the server should return CommandComplete normally in this case.
This can be fixed by not setting the execute_is_fetch flag (so that run_once
is set as a result) when the portal has already completed, since no rows will
be fetched in this case.
Don't like this much. I'm not sure I believe any part of this
approach to deciding whether the portal is "run once". In any
case, a proper fix for this needs to include actually documenting
what that field means and does.
regards, tom lane
I wrote:
Don't like this much. I'm not sure I believe any part of this
approach to deciding whether the portal is "run once". In any
case, a proper fix for this needs to include actually documenting
what that field means and does.
After looking at this further, I think this whole "run_once"
business is badly designed, worse documented, and redundant
(as well as buggy). It can be replaced with three self-contained
lines in ExecutePlan, as per the attached.
(Obviously, the API changes in this wouldn't do for the back
branches. But we could leave those arguments in place as
unused dummies.)
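For the archives, the gist of those three lines is roughly this (a sketch; see
the attached patch for the exact comment wording and placement):

    /*
     * Parallel mode only supports running the plan to completion, and we only
     * want to set it up once per plan tree; so force non-parallel execution
     * unless this is the first, complete execution of the plan.
     */
    if (queryDesc->already_executed || numberTuples != 0)
        use_parallel_mode = false;
    queryDesc->already_executed = true;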
regards, tom lane
Attachments:
v1-remove-run_once-execute_once-flags.patch (text/x-diff, +44/-70)
I wrote:
After looking at this further, I think this whole "run_once"
business is badly designed, worse documented, and redundant
(as well as buggy). It can be replaced with three self-contained
lines in ExecutePlan, as per the attached.
(Obviously, the API changes in this wouldn't do for the back
branches. But we could leave those arguments in place as
unused dummies.)
Here's a cut-down version that I think would be OK to use in the
back branches.
regards, tom lane
Attachments:
v1-remove-run_once-execute_once-flags-v17.patch (text/x-diff, +24/-36)
On Sun, 08 Dec 2024 14:25:53 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
After looking at this further, I think this whole "run_once"
business is badly designed, worse documented, and redundant
(as well as buggy). It can be replaced with three self-contained
lines in ExecutePlan, as per the attached.
(Obviously, the API changes in this wouldn't do for the back
branches. But we could leave those arguments in place as
unused dummies.)
Here's a cut-down version that I think would be OK to use in the
back branches.
I confirmed that both versions of the patch fix the issue using pgproto.
Also, I feel your fix makes the code easier to understand, at least for me.
I have a few minor comments:
* moving in the specified direction.
*
* Runs to completion if numberTuples is 0
- *
- * Note: the ctid attribute is a 'junk' attribute that is removed before the
- * user can see it
* ----------------------------------------------------------------
*/
This comment removal seems unrelated to the fix in ExecutePlan. Should it
actually have been done in 8a5849b7ff24c?
static void
-ExecutePlan(EState *estate,
- PlanState *planstate,
+ExecutePlan(QueryDesc *queryDesc,
bool use_parallel_mode,
CmdType operation,
bool sendTuples,
uint64 numberTuples,
ScanDirection direction,
- DestReceiver *dest,
- bool execute_once)
+ DestReceiver *dest)
{
+ EState *estate = queryDesc->estate;
+ PlanState *planstate = queryDesc->planstate;
TupleTableSlot *slot;
uint64 current_tuple_count
I guess planstate is removed because it is redundant with what is already in
queryDesc. If so, I think we can also remove use_parallel_mode for the
same reason.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Yugo NAGATA <nagata@sraoss.co.jp> writes:
On Sun, 08 Dec 2024 14:25:53 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
- *
- * Note: the ctid attribute is a 'junk' attribute that is removed before the
- * user can see it
* ----------------------------------------------------------------
*/
This comment removal seems unrelated to the fix in ExecutePlan. Should it
actually have been done in 8a5849b7ff24c?
Yeah, I'm sure it should have been removed a long time ago.
Presumably it's there because ctid was once an explicit part
of ExecutePlan's API; but that's surely not been so for a long
time, and I don't see that this sentence has any remaining
relevance where it is.
You could say that getting rid of that dead comment should be
a separate commit, but I thought that was a bit too pedantic.
static void
-ExecutePlan(EState *estate,
- PlanState *planstate,
+ExecutePlan(QueryDesc *queryDesc,
bool use_parallel_mode,
CmdType operation,
bool sendTuples,
uint64 numberTuples,
ScanDirection direction,
- DestReceiver *dest,
- bool execute_once)
+ DestReceiver *dest)
{
+ EState *estate = queryDesc->estate;
+ PlanState *planstate = queryDesc->planstate;
TupleTableSlot *slot;
uint64 current_tuple_count
I guess planstate is removed because it is redundant with what is already in
queryDesc. If so, I think we can also remove use_parallel_mode for the
same reason.
Certainly there's a bit of aesthetic judgment involved here. In
principle, now that we're passing down the QueryDesc, we could remove
the operation and dest arguments too. I chose not to do that on
the grounds that ExecutorRun is deciding what ExecutePlan should do,
and at least in principle it could decide to pass something else
for these arguments than what it passes today. But it's hard to
see how it would pass a different EState or PlanState tree than
what is in the QueryDesc, so I thought it was okay to remove those
arguments. Somebody else writing this patch might have done that
differently, but I think that's fine.
A different line of thought is that the reason we need this change
is that the already_executed flag is in the wrong place. If we
moved it into EState, which feels like a more natural place, we
could avoid refactoring ExecutePlan's API. But that seemed more
invasive in the big picture, and it'd involve more risk in the
back branches, so I didn't go that way.
Thanks for reviewing and testing the patch!
regards, tom lane
I wrote:
Yugo NAGATA <nagata@sraoss.co.jp> writes:
I guess planstate is removed due to the redundancy that this is included
in queryDesc. If so, I think we can also remove use_parallel_mode for the
same reason.
Certainly there's a bit of aesthetic judgment involved here.
After thinking awhile longer, I'm coming around to the idea that
you're right and it'd be better to get rid of ExecutePlan's
use_parallel_mode argument. We clearly have to pass down count and
direction, and it's sensible to pass down operation, sendTuples,
and dest because ExecutorRun is involved with those to perform
dest-receiver startup and shutdown. But ExecutorRun has no real
concern with the parallelism decision, so it seems better to unify
that logic in one place.
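Concretely, ExecutePlan's signature would then be reduced to something like
this (a sketch):

    static void
    ExecutePlan(QueryDesc *queryDesc,
                CmdType operation,
                bool sendTuples,
                uint64 numberTuples,
                ScanDirection direction,
                DestReceiver *dest)

with use_parallel_mode computed inside ExecutePlan, presumably from
queryDesc->plannedstmt->parallelModeNeeded plus the re-execution and
partial-fetch checks.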
I'll make that change before pushing.
regards, tom lane
On Thu, Dec 5, 2024 at 8:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
After looking at this further, I think this whole "run_once"
business is badly designed, worse documented, and redundant
(as well as buggy). It can be replaced with three self-contained
lines in ExecutePlan, as per the attached.
Did you look at the commit message for
691b8d59281b5177f16fe80858df921f77a8e955? IMHO, that's pretty clear
about what the goals of this were.
I didn't consider the scenario mentioned by Nagata-san in his original
post, namely, executing a portal with a row count of 0 and then doing
the same thing again. So in that sense the assertion fulfilled its
intended purpose: alerting somebody that there is a case which is
reachable but which the original programmer believed to be
unreachable.
I haven't tested, but I'm guessing that what happens in Nagata-san's
case with the committed fix is that we execute the plan once to
completion in parallel mode; and then the second execute message
executes the same plan tree again in non-parallel mode but this time
it ends up returning zero tuples because it's already been executed to
completion. If that is correct, then I think this is subtly changing
the rules for parallel Plan trees: before, the rule as I understood
it was "only complete execution must be supported", but now it's
"complete execution must be supported and, in addition, a no-op
re-execution after complete execution must be supported". Off the top
of my head that seems like it should be fine.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Dec 5, 2024 at 8:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
After looking at this further, I think this whole "run_once"
business is badly designed, worse documented, and redundant
(as well as buggy). It can be replaced with three self-contained
lines in ExecutePlan, as per the attached.
Did you look at the commit message for
691b8d59281b5177f16fe80858df921f77a8e955? IMHO, that's pretty clear
about what the goals of this were.
Well, it's not very clear about what implementation limitations we
are trying to protect.
I haven't tested, but I'm guessing that what happens in Nagata-san's
case with the committed fix is that we execute the plan once to
completion in parallel mode; and then the second execute message
executes the same plan tree again in non-parallel mode but this time
it ends up returning zero tuples because it's already been executed to
completion. If that is correct, then I think this is subtly changing
the rules for parallel Plan trees: before, the rule as I understood
it was "only complete execution must be supported", but now it's
"complete execution must be supported and, in addition, a no-op
re-execution after complete execution must be supported". Off the top
of my head that seems like it should be fine.
Hmm. As committed, the re-execution will happen with
use_parallel_mode disabled, which might make that safer, or maybe it's
less safe? Do you think we need something closer to the upthread
proposal to not run the executor at all during the "re-execution"?
(I'd still be inclined to put that short-circuit into ExecutePlan
not anything higher-level.)
My recollection is that even before parallelism we had some plan node
types that didn't cope well with being called again after they'd once
returned EOF (ie a null tuple). So maybe a defense against trying to
do that would be wise. It could probably be as simple as checking
estate->es_finished.
regards, tom lane
On Tue, Dec 10, 2024 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Did you look at the commit message for
691b8d59281b5177f16fe80858df921f77a8e955? IMHO, that's pretty clear
about what the goals of this were.
Well, it's not very clear about what implementation limitations we
are trying to protect.
OK, I see that I was vague about that. The core problem here is that
when we start parallel workers, we do a one-time state transfer from
the leader to the workers. If the leader makes any changes to any of
that state, or the workers do, then they're out of sync, and very bad
things will happen. To prevent that, we need a mechanism to keep any
of that state from changing while there are active workers, and that
mechanism is parallel mode. The leader enters parallel mode (via
EnterParallelMode()) before launching any workers and must only exit
parallel mode after they're all gone; the workers run in parallel mode
the whole time. Hence, we are (hopefully) guaranteed that the relevant
state can't change under us.
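In ExecutePlan() terms, the bracketing looks roughly like this (a simplified
sketch, not the exact code):

    if (use_parallel_mode)
        EnterParallelMode();        /* freeze the state shared with workers */

    for (;;)
    {
        slot = ExecProcNode(planstate);     /* may pull tuples from workers */
        if (TupIsNull(slot))
            break;                          /* plan ran to completion */
        /* ... send the tuple to dest ... */
        current_tuple_count++;
        if (numberTuples && numberTuples == current_tuple_count)
            break;                          /* early exit: workers could still be active */
    }

    if (use_parallel_mode)
        ExitParallelMode();         /* must not be reached with workers still running */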
Hence, the most fundamental restriction here is that ExecutePlan() had
better not reach the call to ExitParallelMode() at the end of the
function while there are still workers active. Since workers don't
exit until they've run their portion of the plan to completion (except
in case of error), we need to run the plan to completion in the leader
too before ExecutePlan() escapes from the loop; otherwise,
some worker could have filled up its output queue with tuples and then
blocked waiting for us to read them, which we would never do. In the case at
issue, if the plan was already run to completion, then there shouldn't
be any workers active any more and trying to re-execute the completed
tree should not launch any new ones, so in theory there's no problem,
but...
Hmm. As committed, the re-execution will happen with
use_parallel_mode disabled, which might make that safer, or maybe it's
less safe? Do you think we need something closer to the upthread
proposal to not run the executor at all during the "re-execution"?
(I'd still be inclined to put that short-circuit into ExecutePlan
not anything higher-level.)
...hypothetically, there could be code which doesn't enjoy being
called on the same plan tree first in parallel mode and then later in
non-parallel mode. I can't really see any good reason for such code to
actually exist. For example, ExecGather() bases its decision about
whether to launch workers on gather->num_workers > 0 &&
estate->es_use_parallel_mode, but it looks to me like all the
decisions about whether to read from workers or close down workers are
based on what workers actually got launched without further reference
to the criteria that ExecGather() used to decide whether to launch
them in the first place. That seems like the right way for things to
be coded, and I think that probably all the things are actually coded
that way. I just wanted to point out that it wasn't theoretically
impossible for this change to run into some lower-level problem.
My recollection is that even before parallelism we had some plan node
types that didn't cope well with being called again after they'd once
returned EOF (ie a null tuple). So maybe a defense against trying to
do that would be wise. It could probably be as simple as checking
estate->es_finished.
It's not a crazy idea, but it might be better to fix those node types.
It might just be me, but I find the comments in the executor to be
pretty lacking and there's a good amount of boilerplate that seems to
have been copy-and-pasted around without the author totally
understanding what was happening. That never gets better if we just
paper over the bugs.
--
Robert Haas
EDB: http://www.enterprisedb.com
I wrote:
My recollection is that even before parallelism we had some plan node
types that didn't cope well with being called again after they'd once
returned EOF (ie a null tuple). So maybe a defense against trying to
do that would be wise.
I spent a little time trying to run that recollection to ground,
and after a while arrived at this comment in heapam.c:
* Note: when we fall off the end of the scan in either direction, we
* reset rs_inited. This means that a further request with the same
* scan direction will restart the scan, which is a bit odd, but a
* request with the opposite scan direction will start a fresh scan
* in the proper direction. The latter is required behavior for cursors,
* while the former case is generally undefined behavior in Postgres
* so we don't care too much.
So indeed node types as basic as SeqScan will misbehave if that
happens. However, there is also logic in pquery.c that intends
to provide defenses against this, for instance in PortalRunSelect:
* Determine which direction to go in, and check to see if we're already
* at the end of the available tuples in that direction. If so, set the
* direction to NoMovement to avoid trying to fetch any tuples. (This
* check exists because not all plan node types are robust about being
* called again if they've already returned NULL once.) Then call the
* executor (we must not skip this, because the destination needs to see a
* setup and shutdown even if no tuples are available). Finally, update
* the portal position state depending on the number of tuples that were
* retrieved.
*/
if (forward)
{
if (portal->atEnd || count <= 0)
{
direction = NoMovementScanDirection;
count = 0; /* don't pass negative count to executor */
}
And ExecutorRun skips ExecutePlan if direction is
NoMovementScanDirection. So unless there are gaps in pquery.c's EOF
checks, I think we're OK: the "re-execution" call will not reach
ExecutePlan. This could probably do with more commentary though.
regards, tom lane
On Tue, Dec 10, 2024 at 1:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
And ExecutorRun skips ExecutePlan if direction is
NoMovementScanDirection. So unless there are gaps in pquery.c's EOF
checks, I think we're OK: the "re-execution" call will not reach
ExecutePlan. This could probably do with more commentary though.
Thanks for the research, and +1 if you feel like adding more commentary.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
Thanks for the research, and +1 if you feel like adding more commentary.
I'm thinking about something like this:
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 13f3fcdaee..7ebb17fc2e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -348,7 +348,15 @@ standard_ExecutorRun(QueryDesc *queryDesc,
dest->rStartup(dest, operation, queryDesc->tupDesc);
/*
- * run plan
+ * Run plan, unless direction is NoMovement.
+ *
+ * Note: pquery.c selects NoMovement if a prior call already reached
+ * end-of-data in the user-specified fetch direction. This is important
+ * because various parts of the executor can misbehave if called again
+ * after reporting EOF. For example, heapam.c would actually restart a
+ * heapscan and return all its data afresh. There is also reason to be
+ * concerned about whether parallel query mode would operate properly if a
+ * fresh execution request arrives after completing the query.
*/
if (!ScanDirectionIsNoMovement(direction))
ExecutePlan(queryDesc,
It's slightly annoying that the executor is depending on the Portal
machinery to guarantee that it won't do something wrong. However,
that was true in the previous formulation that relied on
Portal.run_once, too. I don't see another way that wouldn't
amount to duplicating the Portal-level state in the executor,
and I don't think it's worth doing that.
regards, tom lane
On Tue, Dec 10, 2024 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Thanks for the research, and +1 if you feel like adding more commentary.
I'm thinking about something like this:
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 13f3fcdaee..7ebb17fc2e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -348,7 +348,15 @@ standard_ExecutorRun(QueryDesc *queryDesc,
dest->rStartup(dest, operation, queryDesc->tupDesc);
/*
- * run plan
+ * Run plan, unless direction is NoMovement.
+ *
+ * Note: pquery.c selects NoMovement if a prior call already reached
+ * end-of-data in the user-specified fetch direction. This is important
+ * because various parts of the executor can misbehave if called again
+ * after reporting EOF. For example, heapam.c would actually restart a
+ * heapscan and return all its data afresh. There is also reason to be
+ * concerned about whether parallel query mode would operate properly if a
+ * fresh execution request arrives after completing the query.
*/
if (!ScanDirectionIsNoMovement(direction))
ExecutePlan(queryDesc,
That seems pretty good, although the last sentence seems like it might
be a little too alarming. Maybe add "although we know of no specific
problem" or something like that.
It's slightly annoying that the executor is depending on the Portal
machinery to guarantee that it won't do something wrong. However,
that was true in the previous formulation that relied on
Portal.run_once, too. I don't see another way that wouldn't
amount to duplicating the Portal-level state in the executor,
and I don't think it's worth doing that.
The portal mechanism is a hot mess, IMHO, and needs some serious
redesign, or at least cleanup. For example, the fact that
portal->cleanup is always either PortalCleanup or NULL is an
"interesting" design choice. MarkPortalDone() and MarkPortalFailed()
looked like they do the same amount of cleanup but, ha ha, no they
don't, because the one and only cleanup hook peeks behind the curtain
to figure out who is calling it. This code looks like it was written
by somebody who didn't initially understand the requirements and then
kept hitting it with a sledgehammer until it stopped complaining
without ever stopping to reconsider the basic design.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Dec 10, 2024 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm thinking about something like this:
That seems pretty good, although the last sentence seems like it might
be a little too alarming. Maybe add "although we know of no specific
problem" or something like that.
OK, I'll tone it down a bit.
The portal mechanism is a hot mess, IMHO, and needs some serious
redesign, or at least cleanup. For example, the fact that
portal->cleanup is always either PortalCleanup or NULL is an
"interesting" design choice.
While I'm not here to defend that code particularly, that part
doesn't seem totally insane to me. The intent clearly was to
support more than one form of cleanup. Maybe after 25 years
we can conclude that we'll never need more than one form, but
I'm not going to fault whoever wrote it for not having an
operational crystal ball.
MarkPortalDone() and MarkPortalFailed()
looked like they do the same amount of cleanup but, ha ha, no they
don't, because the one and only cleanup hook peeks behind the curtain
to figure out who is calling it.
If you're talking about
* Shut down executor, if still running. We skip this during error abort,
* since other mechanisms will take care of releasing executor resources,
* and we can't be sure that ExecutorEnd itself wouldn't fail.
it's hardly the fault of the Portal logic that ExecutorEnd is unsafe
to call during abort. (ExecutorEnd shares that property with a
boatload of other code, too.)
Anyway, if you feel like rewriting that stuff, step right up.
My feeling about it is that the law of conservation of cruft
will prevent a replacement from being all that much cleaner,
but maybe I'm wrong.
regards, tom lane
On Tue, Dec 10, 2024 at 4:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Anyway, if you feel like rewriting that stuff, step right up.
My feeling about it is that the law of conservation of cruft
will prevent a replacement from being all that much cleaner,
but maybe I'm wrong.
Thanks for the thoughts. It's always good to get your perspective on
stuff like this, or at least I find it so.
--
Robert Haas
EDB: http://www.enterprisedb.com