moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

Started by Amit Langoteover 3 years ago12 messageshackers
Jump to latest
#1Amit Langote
Langote_Amit_f8@lab.ntt.co.jp

Per Alvaro's advice, forking this from [1]/messages/by-id/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com.

In that thread, Tom had asked if it wouldn't be better to find a new
place to put extraUpdatedCols [2]/messages/by-id/3098829.1658956718@sss.pgh.pa.us instead of RangeTblEntry, along with
the permission-checking fields are now no longer stored in
RangeTblEntry.

In [3]/messages/by-id/CA+HiwqEHoLgN=vSsaNMaHP-+qYPT40-ooySyrieXZHNzbSBj0w@mail.gmail.com of the same thread, I proposed to move it into a List of
Bitmapsets in ModifyTable, as implemented in the attached patch that I
had been posting to that thread.

The latest version of that patch is attached herewith. I'll add this
one to the January CF too.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1]: /messages/by-id/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com
[2]: /messages/by-id/3098829.1658956718@sss.pgh.pa.us
[3]: /messages/by-id/CA+HiwqEHoLgN=vSsaNMaHP-+qYPT40-ooySyrieXZHNzbSBj0w@mail.gmail.com

Attachments:

v1-0001-Add-per-result-relation-extraUpdatedCols-to-Modif.patchapplication/octet-stream; name=v1-0001-Add-per-result-relation-extraUpdatedCols-to-Modif.patchDownload+115-78
#2Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#1)
Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

On Wed, Dec 7, 2022 at 8:54 PM Amit Langote <amitlangote09@gmail.com> wrote:

Per Alvaro's advice, forking this from [1].

In that thread, Tom had asked if it wouldn't be better to find a new
place to put extraUpdatedCols [2] instead of RangeTblEntry, along with
the permission-checking fields are now no longer stored in
RangeTblEntry.

In [3] of the same thread, I proposed to move it into a List of
Bitmapsets in ModifyTable, as implemented in the attached patch that I
had been posting to that thread.

The latest version of that patch is attached herewith. I'll add this
one to the January CF too.

Done.

https://commitfest.postgresql.org/41/4049/

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#3Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#1)
Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

On Wed, Dec 7, 2022 at 8:54 PM Amit Langote <amitlangote09@gmail.com> wrote:

Per Alvaro's advice, forking this from [1].

In that thread, Tom had asked if it wouldn't be better to find a new
place to put extraUpdatedCols [2] instead of RangeTblEntry, along with
the permission-checking fields are now no longer stored in
RangeTblEntry.

In [3] of the same thread, I proposed to move it into a List of
Bitmapsets in ModifyTable, as implemented in the attached patch that I
had been posting to that thread.

The latest version of that patch is attached herewith.

Updated to replace a list_nth() with list_nth_node() and rewrote the
commit message.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v2-0001-Add-per-result-relation-extraUpdatedCols-to-Modif.patchapplication/octet-stream; name=v2-0001-Add-per-result-relation-extraUpdatedCols-to-Modif.patchDownload+115-78
#4vignesh C
vignesh21@gmail.com
In reply to: Amit Langote (#3)
Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

On Thu, 8 Dec 2022 at 08:17, Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Dec 7, 2022 at 8:54 PM Amit Langote <amitlangote09@gmail.com> wrote:

Per Alvaro's advice, forking this from [1].

In that thread, Tom had asked if it wouldn't be better to find a new
place to put extraUpdatedCols [2] instead of RangeTblEntry, along with
the permission-checking fields are now no longer stored in
RangeTblEntry.

In [3] of the same thread, I proposed to move it into a List of
Bitmapsets in ModifyTable, as implemented in the attached patch that I
had been posting to that thread.

The latest version of that patch is attached herewith.

Updated to replace a list_nth() with list_nth_node() and rewrote the
commit message.

The patch does not apply on top of HEAD as in [1]http://cfbot.cputube.org/patch_41_4049.log, please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
e351f85418313e97c203c73181757a007dfda6d0 ===
=== applying patch
./v2-0001-Add-per-result-relation-extraUpdatedCols-to-Modif.patch
...
patching file src/backend/optimizer/util/inherit.c
Hunk #2 FAILED at 50.
Hunk #9 succeeded at 926 with fuzz 2 (offset -1 lines).
1 out of 9 hunks FAILED -- saving rejects to file
src/backend/optimizer/util/inherit.c.rej

[1]: http://cfbot.cputube.org/patch_41_4049.log

Regards,
Vignesh

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#3)
Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

Amit Langote <amitlangote09@gmail.com> writes:

Updated to replace a list_nth() with list_nth_node() and rewrote the
commit message.

So I was working through this with intent to commit, when I realized
that the existing code it's revising is flat out broken. You can't
simply translate a parent rel's set of dependent generated columns
to obtain the correct set for a child. Maybe that's sufficient for
partitioned tables, but it fails miserably for general inheritance:

regression=# create table pp(f1 int);
CREATE TABLE
regression=# create table cc(f2 int generated always as (f1+1) stored) inherits(pp);
CREATE TABLE
regression=# insert into cc values(42);
INSERT 0 1
regression=# table cc;
f1 | f2
----+----
42 | 43
(1 row)

regression=# update pp set f1 = f1*10;
UPDATE 1
regression=# table cc;
f1 | f2
-----+----
420 | 43
(1 row)

So we have a long-standing existing bug to fix here.

I think what we have to do basically is repeat what fill_extraUpdatedCols
does independently for each target table. That's not really horrible:
given the premise that we're moving this calculation into the planner,
we can have expand_single_inheritance_child run the code while we have
each target table open. It'll require some rethinking though, and we
will need to have the set of update target columns already available
at that point. This suggests that we want to put the updated_cols and
extraUpdatedCols fields into RelOptInfo not PlannerInfo.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

I wrote:

I think what we have to do basically is repeat what fill_extraUpdatedCols
does independently for each target table. That's not really horrible:
given the premise that we're moving this calculation into the planner,
we can have expand_single_inheritance_child run the code while we have
each target table open. It'll require some rethinking though, and we
will need to have the set of update target columns already available
at that point. This suggests that we want to put the updated_cols and
extraUpdatedCols fields into RelOptInfo not PlannerInfo.

After further thought: maybe we should get radical and postpone this
work all the way to executor startup. The downside of that is having
to do it over again on each execution of a prepared plan. But the
upside is that when the UPDATE targets a many-partitioned table,
we would have a chance at not doing the work at all for partitions
that get pruned at runtime. I'm not sure if that win would emerge
immediately or if we still have executor work to do to manage pruning
of the target table. I'm also not sure that this'd be a net win
overall. But it seems worth considering.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

I wrote:

After further thought: maybe we should get radical and postpone this
work all the way to executor startup. The downside of that is having
to do it over again on each execution of a prepared plan. But the
upside is that when the UPDATE targets a many-partitioned table,
we would have a chance at not doing the work at all for partitions
that get pruned at runtime. I'm not sure if that win would emerge
immediately or if we still have executor work to do to manage pruning
of the target table. I'm also not sure that this'd be a net win
overall. But it seems worth considering.

Here's a draft patch that does it like that. This seems like a win
for more reasons than just pruning, because I was able to integrate
the calculation into runtime setup of the expressions, so that we
aren't doing an extra stringToNode() on them.

There's still a code path that does such a calculation at plan time
(get_rel_all_updated_cols), but it's only used by postgres_fdw which
has some other rather-inefficient behaviors in the same area.

I've not looked into what it'd take to back-patch this. We can't
add a field to ResultRelInfo in released branches (cf 4b3e37993),
but we might be able to repurpose RangeTblEntry.extraUpdatedCols.

regards, tom lane

Attachments:

compute-extraUpdatedCols-at-runtime-1.patchtext/x-diff; charset=us-ascii; name=compute-extraUpdatedCols-at-runtime-1.patchDownload+216-152
#8Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#7)
Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

On Thu, Jan 5, 2023 at 4:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

After further thought: maybe we should get radical and postpone this
work all the way to executor startup. The downside of that is having
to do it over again on each execution of a prepared plan. But the
upside is that when the UPDATE targets a many-partitioned table,
we would have a chance at not doing the work at all for partitions
that get pruned at runtime. I'm not sure if that win would emerge
immediately or if we still have executor work to do to manage pruning
of the target table. I'm also not sure that this'd be a net win
overall. But it seems worth considering.

Here's a draft patch that does it like that. This seems like a win
for more reasons than just pruning, because I was able to integrate
the calculation into runtime setup of the expressions, so that we
aren't doing an extra stringToNode() on them.

Thanks for the patch. This looks pretty neat and I agree that this
seems like a net win overall.

As an aside, I wonder why AttrDefault (and other things in
RelationData that need stringToNode() done on them to put into a Query
or a plan tree) doesn't store the expression Node tree to begin with?

There's still a code path that does such a calculation at plan time
(get_rel_all_updated_cols), but it's only used by postgres_fdw which
has some other rather-inefficient behaviors in the same area.

I've not looked into what it'd take to back-patch this. We can't
add a field to ResultRelInfo in released branches (cf 4b3e37993),
but we might be able to repurpose RangeTblEntry.extraUpdatedCols.

I think we can make that work. Would you like me to give that a try?

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#8)
Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

Amit Langote <amitlangote09@gmail.com> writes:

On Thu, Jan 5, 2023 at 4:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's a draft patch that does it like that. This seems like a win
for more reasons than just pruning, because I was able to integrate
the calculation into runtime setup of the expressions, so that we
aren't doing an extra stringToNode() on them.

Thanks for the patch. This looks pretty neat and I agree that this
seems like a net win overall.

Thanks for looking.

I've not looked into what it'd take to back-patch this. We can't
add a field to ResultRelInfo in released branches (cf 4b3e37993),
but we might be able to repurpose RangeTblEntry.extraUpdatedCols.

I think we can make that work. Would you like me to give that a try?

I'm on it already. AFAICT, the above won't actually work because
we don't have RTEs for all ResultRelInfos (per the
"relinfo->ri_RangeTableIndex != 0" test in ExecGetExtraUpdatedCols).
Probably we need something more like what 4b3e37993 did.

regards, tom lane

#10Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#9)
Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

On Fri, Jan 6, 2023 at 12:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

On Thu, Jan 5, 2023 at 4:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've not looked into what it'd take to back-patch this. We can't
add a field to ResultRelInfo in released branches (cf 4b3e37993),
but we might be able to repurpose RangeTblEntry.extraUpdatedCols.

I think we can make that work. Would you like me to give that a try?

I'm on it already.

Thanks. What you committed seems fine to me.

AFAICT, the above won't actually work because
we don't have RTEs for all ResultRelInfos (per the
"relinfo->ri_RangeTableIndex != 0" test in ExecGetExtraUpdatedCols).

Yeah, I noticed that too. I was thinking that that wouldn't be a
problem, because it is only partitions that are execution-time routing
targets that don't have their own RTEs and using a translated copy of
the root parent's RTE's extraUpdatedCols for them as before isn't
wrong. Note that partitions can't have generated columns that are not
present in its parent.

BTW, you wrote in the commit message:

However, there's nothing that says a traditional-inheritance child
can't have generated columns that aren't there in its parent, or that
have different dependencies than are in the parent's expression.
(At present it seems that we don't enforce that for partitioning
either, which is likely wrong to some degree or other; but the case
clearly needs to be handled with traditional inheritance.)

Maybe I'm missing something, but AFICS, neither
traditional-inheritance child tables nor partitions allow a generated
column with an expression that is not the same as the parent's
expression for the same generated column:

-- traditional inheritance
create table inhp (a int, b int generated always as (a+1) stored);
create table inhc (b int generated always as (a+2) stored) inherits (inhp);
NOTICE: moving and merging column "b" with inherited definition
DETAIL: User-specified column moved to the position of the inherited column.
ERROR: child column "b" specifies generation expression
HINT: Omit the generation expression in the definition of the child
table column to inherit the generation expression from the parent
table.
create table inhc (a int, b int);
alter table inhc inherit inhp;
ERROR: column "b" in child table must be a generated column
alter table inhc drop b, add b int generated always as (a+2) stored;
alter table inhc inherit inhp;
ERROR: column "b" in child table has a conflicting generation expression

-- partitioning
create table partp (a int, b int generated always as (a+1) stored)
partition by list (a);
create table partc partition of partp (b generated always as (a+2)
stored) for values in (1);
ERROR: generated columns are not supported on partitions
create table partc (a int, b int);
alter table partp attach partition partc for values in (1);
ERROR: column "b" in child table must be a generated column
alter table partc drop b, add b int generated always as (a+2) stored;
alter table partp attach partition partc for values in (1);
ERROR: column "b" in child table has a conflicting generation expression

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#10)
Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

Amit Langote <amitlangote09@gmail.com> writes:

BTW, you wrote in the commit message:
(At present it seems that we don't enforce that for partitioning
either, which is likely wrong to some degree or other; but the case
clearly needs to be handled with traditional inheritance.)

Maybe I'm missing something, but AFICS, neither
traditional-inheritance child tables nor partitions allow a generated
column with an expression that is not the same as the parent's
expression for the same generated column:

Well, there's some large holes in that, as per my post at [1]/messages/by-id/2793383.1672944799@sss.pgh.pa.us.
I'm on board with locking this down for partitioning, but we haven't
done so yet.

regards, tom lane

[1]: /messages/by-id/2793383.1672944799@sss.pgh.pa.us

#12Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#11)
Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

On Fri, Jan 6, 2023 at 3:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

BTW, you wrote in the commit message:
(At present it seems that we don't enforce that for partitioning
either, which is likely wrong to some degree or other; but the case
clearly needs to be handled with traditional inheritance.)

Maybe I'm missing something, but AFICS, neither
traditional-inheritance child tables nor partitions allow a generated
column with an expression that is not the same as the parent's
expression for the same generated column:

Well, there's some large holes in that, as per my post at [1].
I'm on board with locking this down for partitioning, but we haven't
done so yet.

[1] /messages/by-id/2793383.1672944799@sss.pgh.pa.us

Ah, I had missed that. Will check and reply there.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com