pgsql: Allow UPDATE to move rows between partitions.

Started by Robert Haasalmost 8 years ago4 messages
#1Robert Haas
rhaas@postgresql.org

Allow UPDATE to move rows between partitions.

When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint. In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one. This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented. (There is a pending patch to improve the
situation further, but it needs more review.)

Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me. A few final revisions by me.

Discussion: /messages/by-id/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/2f178441044be430f6b4d626e4dae68a9a6f6cec

Modified Files
--------------
contrib/file_fdw/input/file_fdw.source | 1 +
contrib/file_fdw/output/file_fdw.source | 2 +
doc/src/sgml/ddl.sgml | 24 +-
doc/src/sgml/ref/update.sgml | 13 +-
doc/src/sgml/trigger.sgml | 23 ++
src/backend/commands/copy.c | 40 +-
src/backend/commands/trigger.c | 52 ++-
src/backend/executor/execPartition.c | 241 ++++++++++-
src/backend/executor/nodeModifyTable.c | 583 +++++++++++++++++++++------
src/backend/nodes/copyfuncs.c | 2 +
src/backend/nodes/equalfuncs.c | 1 +
src/backend/nodes/outfuncs.c | 3 +
src/backend/nodes/readfuncs.c | 1 +
src/backend/optimizer/path/allpaths.c | 4 +-
src/backend/optimizer/plan/createplan.c | 4 +
src/backend/optimizer/plan/planner.c | 19 +-
src/backend/optimizer/prep/prepunion.c | 28 +-
src/backend/optimizer/util/pathnode.c | 4 +
src/include/executor/execPartition.h | 34 +-
src/include/nodes/execnodes.h | 4 +-
src/include/nodes/plannodes.h | 1 +
src/include/nodes/relation.h | 3 +
src/include/optimizer/pathnode.h | 1 +
src/include/optimizer/planner.h | 3 +-
src/test/regress/expected/update.out | 681 ++++++++++++++++++++++++++++++--
src/test/regress/sql/update.sql | 458 +++++++++++++++++++--
src/tools/pgindent/typedefs.list | 1 +
27 files changed, 1957 insertions(+), 274 deletions(-)

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#1)
Re: pgsql: Allow UPDATE to move rows between partitions.

On Sat, Jan 20, 2018 at 2:03 AM, Robert Haas <rhaas@postgresql.org> wrote:

Allow UPDATE to move rows between partitions.

+    If an <command>UPDATE</command> on a partitioned table causes a row to move
+    to another partition, it will be performed as a <command>DELETE</command>
+    from the original partition followed by an <command>INSERT</command> into
+    the new partition. In this case, all row-level <literal>BEFORE</literal>
+    <command>UPDATE</command> triggers and all row-level
+    <literal>BEFORE</literal> <command>DELETE</command> triggers are fired on
+    the original partition.

Do we need to maintain triggers related behavior for logical
replication? In logical replication, we use ExecSimpleRelationDelete
to perform Delete operation which is not aware of this special
behavior (execute before update trigger for this case).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#3Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#2)
Re: pgsql: Allow UPDATE to move rows between partitions.

On Tue, Jan 23, 2018 at 8:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Jan 20, 2018 at 2:03 AM, Robert Haas <rhaas@postgresql.org> wrote:

Allow UPDATE to move rows between partitions.

+    If an <command>UPDATE</command> on a partitioned table causes a row to move
+    to another partition, it will be performed as a <command>DELETE</command>
+    from the original partition followed by an <command>INSERT</command> into
+    the new partition. In this case, all row-level <literal>BEFORE</literal>
+    <command>UPDATE</command> triggers and all row-level
+    <literal>BEFORE</literal> <command>DELETE</command> triggers are fired on
+    the original partition.

Do we need to maintain triggers related behavior for logical
replication? In logical replication, we use ExecSimpleRelationDelete
to perform Delete operation which is not aware of this special
behavior (execute before update trigger for this case).

Hmm. I don't think there's any way for the logical decoding
infrastructure to identify this case at present. I suppose if we want
that behavior, we'd need to modify the WAL format, and the changes
might not be too straightforward because the after-image of the tuple
wouldn't be available in the DELETE record. I think the only reason
we fire the UPDATE triggers is because we can't decide until after
they've finished executing that we really want to DELETE and INSERT
instead; by the time we are replicating the changes, we know what the
final shape of the operation ended up being, so it's not clear to me
that firing UPDATE triggers at that point would be useful. I fear
that trying to change this is going to cost performance (and developer
time) for no real benefit, so my gut feeling is to leave it alone.
However, what do other people think?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#3)
Re: pgsql: Allow UPDATE to move rows between partitions.

On Tue, Jan 23, 2018 at 9:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 23, 2018 at 8:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Jan 20, 2018 at 2:03 AM, Robert Haas <rhaas@postgresql.org> wrote:

Allow UPDATE to move rows between partitions.

+    If an <command>UPDATE</command> on a partitioned table causes a row to move
+    to another partition, it will be performed as a <command>DELETE</command>
+    from the original partition followed by an <command>INSERT</command> into
+    the new partition. In this case, all row-level <literal>BEFORE</literal>
+    <command>UPDATE</command> triggers and all row-level
+    <literal>BEFORE</literal> <command>DELETE</command> triggers are fired on
+    the original partition.

Do we need to maintain triggers related behavior for logical
replication? In logical replication, we use ExecSimpleRelationDelete
to perform Delete operation which is not aware of this special
behavior (execute before update trigger for this case).

Hmm. I don't think there's any way for the logical decoding
infrastructure to identify this case at present. I suppose if we want
that behavior, we'd need to modify the WAL format, and the changes
might not be too straightforward because the after-image of the tuple
wouldn't be available in the DELETE record. I think the only reason
we fire the UPDATE triggers is because we can't decide until after
they've finished executing that we really want to DELETE and INSERT
instead; by the time we are replicating the changes, we know what the
final shape of the operation ended up being, so it's not clear to me
that firing UPDATE triggers at that point would be useful. I fear
that trying to change this is going to cost performance (and developer
time) for no real benefit, so my gut feeling is to leave it alone.

I think the problem is not only for triggers behavior but also for the
pending patch which takes care of concurrent updates/deletes. I think
in case of logical replication we won't be able to detect that the row
has been moved to another partition by the logical worker as part of
replaying the actions done on master. See my response to Amul's patch
[1]: /messages/by-id/CAA4eK1LHVnNWYF53F1gUGx6CTxuvznozvU-Lr-dfE=Qeu1gEcg@mail.gmail.com

[1]: /messages/by-id/CAA4eK1LHVnNWYF53F1gUGx6CTxuvznozvU-Lr-dfE=Qeu1gEcg@mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com