Centralizing protective copying of utility statements
I haven't pushed my quick-hack fix for bug #17053 ([1]/messages/by-id/17053-3ca3f501bbc212b4@postgresql.org) because
I wasn't really satisfied with band-aiding that problem in one
more place. I took a look around to see if we could protect
against the whole class of scribble-on-a-utility-statement
issues in a more centralized way.
What I found is that there are only two places that call
ProcessUtility with a statement that might be coming from the plan
cache. _SPI_execute_plan is always doing so, so it can just
unconditionally copy the statement. The other one is
PortalRunUtility, which can examine the Portal to see if the
parsetree came out of cache or not. Having added copyObject
calls there, we can get rid of the retail calls that exist
in not-quite-enough utility statement execution routines.
I think this would have been more complicated before plpgsql
started using the plancache; at least, some of the comments
removed here refer to plpgsql as being an independent hazard.
Also, I didn't risk removing any copyObject calls that are
further down than the top level of statement execution handlers.
Some of those are visibly still necessary, and others are hard
to be sure about.
Although this adds some overhead in the form of copying of
utility node trees that won't actually mutate during execution,
I think that won't be too bad because those trees tend to be
small and hence cheap to copy. The statements that can have
a lot of substructure usually contain expression trees or the
like, which do have to be copied for safety. Moreover, we buy
back a lot of cost by removing pointless copying when we're
not executing on a cached plan.
(BTW, in case you are wondering: this hazard only exists for
utility statements, because we long ago made the executor
not modify the Plan tree it's given.)
This is possibly too aggressive to consider for back-patching.
In the back branches, perhaps we should just use my original
localized fix. Another conservative (but expensive) answer
for the back branches is to add the new copyObject calls
but not remove any of the old ones.
Thoughts?
regards, tom lane
Attachments:
centralize-utility-stmt-copying-1.patchtext/x-diff; charset=us-ascii; name=centralize-utility-stmt-copying-1.patchDownload+37-80
On Wed, Jun 16, 2021 at 6:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I haven't pushed my quick-hack fix for bug #17053 ([1]) because
I wasn't really satisfied with band-aiding that problem in one
more place. I took a look around to see if we could protect
against the whole class of scribble-on-a-utility-statement
issues in a more centralized way.What I found is that there are only two places that call
ProcessUtility with a statement that might be coming from the plan
cache. _SPI_execute_plan is always doing so, so it can just
unconditionally copy the statement. The other one is
PortalRunUtility, which can examine the Portal to see if the
parsetree came out of cache or not. Having added copyObject
calls there, we can get rid of the retail calls that exist
in not-quite-enough utility statement execution routines.I think this would have been more complicated before plpgsql
started using the plancache; at least, some of the comments
removed here refer to plpgsql as being an independent hazard.
Also, I didn't risk removing any copyObject calls that are
further down than the top level of statement execution handlers.
Some of those are visibly still necessary, and others are hard
to be sure about.Although this adds some overhead in the form of copying of
utility node trees that won't actually mutate during execution,
I think that won't be too bad because those trees tend to be
small and hence cheap to copy. The statements that can have
a lot of substructure usually contain expression trees or the
like, which do have to be copied for safety. Moreover, we buy
back a lot of cost by removing pointless copying when we're
not executing on a cached plan.(BTW, in case you are wondering: this hazard only exists for
utility statements, because we long ago made the executor
not modify the Plan tree it's given.)This is possibly too aggressive to consider for back-patching.
In the back branches, perhaps we should just use my original
localized fix. Another conservative (but expensive) answer
for the back branches is to add the new copyObject calls
but not remove any of the old ones.Thoughts?
regards, tom lane
[1]
/messages/by-id/17053-3ca3f501bbc212b4@postgresql.orgHi,
For back-patching, if we wait for a while (a few weeks) after this patch
gets committed to master branch (and see there is no regression),
it seems that would give us more confidence in backporting to older
branches.
Cheers
I wrote:
What I found is that there are only two places that call
ProcessUtility with a statement that might be coming from the plan
cache. _SPI_execute_plan is always doing so, so it can just
unconditionally copy the statement. The other one is
PortalRunUtility, which can examine the Portal to see if the
parsetree came out of cache or not. Having added copyObject
calls there, we can get rid of the retail calls that exist
in not-quite-enough utility statement execution routines.
In the light of morning, I'm not too pleased with this patch either.
It's essentially creating a silent API change for ProcessUtility.
I don't know whether there are any out-of-tree ProcessUtility
callers, but if there are, this could easily break them in a way
that basic testing might not catch.
What seems like a much safer answer is to make the API change noisy.
That is, move the responsibility for actually calling copyObject
into ProcessUtility itself, and add a bool parameter saying whether
that needs to be done. That forces all callers to consider the
issue, and gives them the tool they need to make the right thing
happen.
However, this clearly is not a back-patchable approach. I'm
thinking there are two plausible alternatives for the back branches:
1. Narrow fix as per my original patch for #17053. Low cost,
but no protection against other bugs of the same ilk.
2. Still move the responsibility for calling copyObject into
ProcessUtility; but lacking the bool parameter, just do it
unconditionally. Offers solid protection at some uncertain
performance cost.
I'm not terribly certain which way to go. Thoughts?
regards, tom lane
I wrote:
What seems like a much safer answer is to make the API change noisy.
That is, move the responsibility for actually calling copyObject
into ProcessUtility itself, and add a bool parameter saying whether
that needs to be done. That forces all callers to consider the
issue, and gives them the tool they need to make the right thing
happen.
Here's a v2 that does it like that. In this formulation, we're
basically hoisting the responsibility for doing copyObject up into
ProcessUtility from its direct children, which seems like a clearer
way of thinking about what has to change.
We could avoid the side-effects on users of ProcessUtility_hook by
doing the copy step in ProcessUtility itself rather than passing the
flag on to standard_ProcessUtility. But that sounded like a bit of a
kluge. Also, putting the work in standard_ProcessUtility preserves
the option to redistribute it into the individual switch arms, in case
anyone does find the extra copying overhead annoying for statement
types that don't need it. (I don't plan to do any such thing as part
of this bug-fix patch, though.)
Barring objections, I'm going to push this into HEAD fairly soon,
since beta2 is hard upon us. Still thinking about which way to
fix it in the back branches.
regards, tom lane
Attachments:
centralize-utility-stmt-copying-2.patchtext/x-diff; charset=us-ascii; name=centralize-utility-stmt-copying-2.patchDownload+60-91
On Thu, Jun 17, 2021 at 01:03:29PM -0400, Tom Lane wrote:
Here's a v2 that does it like that. In this formulation, we're
basically hoisting the responsibility for doing copyObject up into
ProcessUtility from its direct children, which seems like a clearer
way of thinking about what has to change.
I agree that forcing an API break is better. Just a nit:
+ * readOnlyTree: treat pstmt's node tree as read-only
Maybe it's because I'm not a native english speaker, or because it's quite
late here, but I don't find "treat as read-only" really clear. I don't have a
concise better wording to suggest.
Still thinking about which way to fix it in the back branches.
I'm +0.5 for a narrow fix, due to the possibility of unspotted similar problem
vs possibility of performance regression ratio.
Hi,
On 2021-06-16 21:39:49 -0400, Tom Lane wrote:
Although this adds some overhead in the form of copying of
utility node trees that won't actually mutate during execution,
I think that won't be too bad because those trees tend to be
small and hence cheap to copy. The statements that can have
a lot of substructure usually contain expression trees or the
like, which do have to be copied for safety. Moreover, we buy
back a lot of cost by removing pointless copying when we're
not executing on a cached plan.
Have you evaluated the cost in some form? I don't think it a relevant
cost for most utility statements, but there's a few exceptions that *do*
worry me. In particular, in some workloads transaction statements are
very frequent.
Greetings,
Andres Freund
Hi,
On 2021-06-17 13:03:29 -0400, Tom Lane wrote:
Here's a v2 that does it like that. In this formulation, we're
basically hoisting the responsibility for doing copyObject up into
ProcessUtility from its direct children, which seems like a clearer
way of thinking about what has to change.We could avoid the side-effects on users of ProcessUtility_hook by
doing the copy step in ProcessUtility itself rather than passing the
flag on to standard_ProcessUtility. But that sounded like a bit of a
kluge. Also, putting the work in standard_ProcessUtility preserves
the option to redistribute it into the individual switch arms, in case
anyone does find the extra copying overhead annoying for statement
types that don't need it. (I don't plan to do any such thing as part
of this bug-fix patch, though.)Barring objections, I'm going to push this into HEAD fairly soon,
since beta2 is hard upon us. Still thinking about which way to
fix it in the back branches.
Phew. Do we really want to break a quite significant number of
extensions this long after feature freeze? Since we already need to find
a backpatchable way to deal with the issue it seems like deferring the
API change to 15 might be prudent?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
Phew. Do we really want to break a quite significant number of
extensions this long after feature freeze? Since we already need to find
a backpatchable way to deal with the issue it seems like deferring the
API change to 15 might be prudent?
Uh, nobody ever promised that server-internal APIs are frozen as of beta1;
that would be a horrid crimp on our ability to fix bugs during beta.
I've generally supposed that we don't start expecting that till RC stage.
regards, tom lane
Hi,
On 2021-06-17 15:53:22 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
Phew. Do we really want to break a quite significant number of
extensions this long after feature freeze? Since we already need to find
a backpatchable way to deal with the issue it seems like deferring the
API change to 15 might be prudent?Uh, nobody ever promised that server-internal APIs are frozen as of beta1;
that would be a horrid crimp on our ability to fix bugs during beta.
Sure, there's no promise. But I still think it's worth taking the amount
of breakage more into account than pre beta?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-06-16 21:39:49 -0400, Tom Lane wrote:
Although this adds some overhead in the form of copying of
utility node trees that won't actually mutate during execution,
I think that won't be too bad because those trees tend to be
small and hence cheap to copy.
Have you evaluated the cost in some form? I don't think it a relevant
cost for most utility statements, but there's a few exceptions that *do*
worry me. In particular, in some workloads transaction statements are
very frequent.
I hadn't, but since you mention it, I tried this test case:
$ cat trivial.sql
begin;
commit;
$ pgbench -n -M prepared -f trivial.sql -T 60
I got these results on HEAD:
tps = 23853.244130 (without initial connection time)
tps = 23810.759969 (without initial connection time)
tps = 23167.608493 (without initial connection time)
tps = 23784.432746 (without initial connection time)
and adding the v2 patch:
tps = 23298.081147 (without initial connection time)
tps = 23614.466755 (without initial connection time)
tps = 23475.297853 (without initial connection time)
tps = 23530.826527 (without initial connection time)
So if you squint there might be a sub-one-percent difference
there, but it's barely distinguishable from the noise. In
any situation where the transactions are doing actual work,
I doubt you could measure a difference.
(In any case, if someone does get excited about this, they
could rearrange things to push the copyObject calls into the
individual arms of the switch in ProcessUtility. Personally
though I doubt it could be worth the code bloat.)
regards, tom lane
Andres Freund <andres@anarazel.de> writes:
On 2021-06-17 15:53:22 -0400, Tom Lane wrote:
Uh, nobody ever promised that server-internal APIs are frozen as of beta1;
that would be a horrid crimp on our ability to fix bugs during beta.
Sure, there's no promise. But I still think it's worth taking the amount
of breakage more into account than pre beta?
Are there really so many people using the ProcessUtility hook?
In a quick look on codesearch.debian.net, I found
hypopg
pgaudit
pgextwlist
pglogical
which admittedly is more than none, but it's not a huge number
either. I have to think that fixing this bug reliably is a
more important consideration.
regards, tom lane
Hi,
On 2021-06-17 16:50:57 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2021-06-17 15:53:22 -0400, Tom Lane wrote:
Uh, nobody ever promised that server-internal APIs are frozen as of beta1;
that would be a horrid crimp on our ability to fix bugs during beta.Sure, there's no promise. But I still think it's worth taking the amount
of breakage more into account than pre beta?Are there really so many people using the ProcessUtility hook?
In a quick look on codesearch.debian.net, I foundhypopg
pgaudit
pgextwlist
pglogical
The do seem to be quite a few more outside of Debian. E.g.
https://github.com/search?p=2&q=ProcessUtility_hook&type=Code
shows quite a few.
Unfortunately github is annoying to search through - it doesn't seem to
have any logic to prevent duplicates across repositories :(. Which means
there's dozens of copies of postgres code included...
which admittedly is more than none, but it's not a huge number
either. I have to think that fixing this bug reliably is a
more important consideration.
Sure!
Greetings,
Andres Freund
I wrote:
(In any case, if someone does get excited about this, they
could rearrange things to push the copyObject calls into the
individual arms of the switch in ProcessUtility. Personally
though I doubt it could be worth the code bloat.)
It occurred to me to try making the copying code look like
if (readOnlyTree)
{
switch (nodeTag(parsetree))
{
case T_TransactionStmt:
/* stmt is immutable anyway, no need to copy */
break;
default:
pstmt = copyObject(pstmt);
parsetree = pstmt->utilityStmt;
break;
}
}
This didn't move the needle at all, in fact it seems maybe a
shade slower:
tps = 23502.288878 (without initial connection time)
tps = 23643.821923 (without initial connection time)
tps = 23082.976795 (without initial connection time)
tps = 23547.527641 (without initial connection time)
So I think this confirms my gut feeling that copyObject on a
TransactionStmt is negligible. To the extent that the prior
measurement shows a real difference, it's probably a chance effect
of changing code layout elsewhere.
regards, tom lane
On Thu, Jun 17, 2021 at 02:08:34PM -0700, Andres Freund wrote:
Unfortunately github is annoying to search through - it doesn't seem to
have any logic to prevent duplicates across repositories :(. Which means
there's dozens of copies of postgres code included...
I agree with the position of doing something now while in beta. I
have not looked at the tree, but I am rather sure that we had changes
in the hooks while in beta phase in the past.
which admittedly is more than none, but it's not a huge number
either. I have to think that fixing this bug reliably is a
more important consideration.Sure!
The DECLARE CURSOR case in ExplainOneUtility() does a copy of a Query.
Perhaps a comment should be added to explain why a copy is still
required?
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
The DECLARE CURSOR case in ExplainOneUtility() does a copy of a Query.
Perhaps a comment should be added to explain why a copy is still
required?
I did add a comment about that in the v2 patch --- the issue is the
call path for EXPLAIN EXECUTE.
regards, tom lane
Julien Rouhaud <rjuju123@gmail.com> writes:
On Thu, Jun 17, 2021 at 01:03:29PM -0400, Tom Lane wrote:
+ * readOnlyTree: treat pstmt's node tree as read-only
Maybe it's because I'm not a native english speaker, or because it's quite
late here, but I don't find "treat as read-only" really clear. I don't have a
concise better wording to suggest.
Maybe "if true, pstmt's node tree must not be modified" ?
Still thinking about which way to fix it in the back branches.
I'm +0.5 for a narrow fix, due to the possibility of unspotted similar problem
vs possibility of performance regression ratio.
After sleeping on it another day, I'm inclined to think the same. The
key attraction of a centralized fix is that it prevents the possibility
of new bugs of the same ilk in newly-added features. Given how long
these CREATE/ALTER DOMAIN bugs escaped detection, it's hard to have
full confidence that there are no others in the back branches --- but
they must be in really lightly-used features.
regards, tom lane
On Fri, Jun 18, 2021 at 10:24:20AM -0400, Tom Lane wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
On Thu, Jun 17, 2021 at 01:03:29PM -0400, Tom Lane wrote:
+ * readOnlyTree: treat pstmt's node tree as read-onlyMaybe it's because I'm not a native english speaker, or because it's quite
late here, but I don't find "treat as read-only" really clear. I don't have a
concise better wording to suggest.Maybe "if true, pstmt's node tree must not be modified" ?
Thanks, I find it way better!
Julien Rouhaud <rjuju123@gmail.com> writes:
On Fri, Jun 18, 2021 at 10:24:20AM -0400, Tom Lane wrote:
Maybe "if true, pstmt's node tree must not be modified" ?
Thanks, I find it way better!
OK, pushed that way, and with a couple other comment tweaks from
an additional re-reading.
regards, tom lane
On Fri, Jun 18, 2021 at 11:24:00AM -0400, Tom Lane wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
On Fri, Jun 18, 2021 at 10:24:20AM -0400, Tom Lane wrote:
Maybe "if true, pstmt's node tree must not be modified" ?
Thanks, I find it way better!
OK, pushed that way, and with a couple other comment tweaks from
an additional re-reading.
Thanks! For the record I already pushed the required compatibility changes for
hypopg extension.