propagating replica identity to partitions

Started by Alvaro Herreraabout 7 years ago31 messageshackers
Jump to latest
#1Alvaro Herrera
alvherre@2ndquadrant.com

Hello

If you do ALTER TABLE .. REPLICA IDENTITY to a partitioned table, the
command operates on the parent table itself and does not propagate to
partitions. Why is this? Maybe not recursing was the right call when
we only had regular inheritance (back in 9.4), but since partitioned
tables got introduced, I think it is completely the other way around:
not recursing is an usability fail.

At the same time, I think that psql failing to display the replica
identity for partitioned tables is just an oversight and not designed
in.

I propose to change the behavior to:

1. When replica identity is changed on a partitioned table, all partitions
are updated also. Do we need to care about regular inheritance?
My inclination is not to touch those; this'd become the first case
in ATPrepCmd that recurses on partitioning but not inheritance.

2. When a partition is created, the replica identity is set to the
same that the parent table has. If it's type index, figure out the
corresponding index in the partition, set that. If the index doesn't
exist, raise an error (i.e. replica identity cannot be set to an
index until it has propagated to all children).

3. psql should display replica identity for partitioned tables.

Thoughts?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#1)
Re: propagating replica identity to partitions

I propose to change the behavior to:

... this patch.

I'll now look more carefully at the cases involving indexes; thus far I
was looking at the ones using FULL. Those seem to work as intended.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-index_get_partition.patchtext/x-diff; charset=us-asciiDownload+47-30
0002-Propagate-replica-identity-to-partitions.patchtext/x-diff; charset=us-asciiDownload+189-6
#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#2)
Re: propagating replica identity to partitions

On 2019-Feb-04, Alvaro Herrera wrote:

I'll now look more carefully at the cases involving indexes; thus far I
was looking at the ones using FULL. Those seem to work as intended.

Yeah, that didn't work so well -- it was trying to propagate the command
verbatim to each partition, and obviously the index names did not match.
So this subcommand has to reimplement the recursion internally, as in
the attached.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v2-0001-index_get_partition.patchtext/x-diff; charset=us-asciiDownload+47-30
v2-0002-Propagate-replica-identity-to-partitions.patchtext/x-diff; charset=us-asciiDownload+304-11
#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#3)
Re: propagating replica identity to partitions

Actually, index-based replica identities failed in pg_dump: we first
create the index ONLY on the partitioned table (marking it as invalid),
so when we immediately do the ALTER TABLE/REPLICA IDENTITY, it rejects
the invalid index. If we change the code to allow invalid indexes on
partitioned tables to become replica identities, we hit the problem that
the index didn't exist when processing the partition list. In order to
fix that, I added a flag so that partitions are allowed not to have the
index, in hopes that the missing index are created in subsequent
commands; those indexes should become valid & identity afterwards.

There's a small emerging problem, which is that if you have an invalid
index marked as replica identity, you cannot create any more partitions;
the reason is that we want to propagate the replica identity to the
partition, but the index is not there (since invalid indexes are not
propagated). I don't think this case is worth supporting; it can be
fixed but requires some work[1]In DefineRelation, we obtain the list of indexes to clone by RelationGetIndexList, which ignores invalid indexes. In order to implement this we'd need to include invalid indexes in that list (possibly by just not using RelationGetIndexList.).

New patch attached.

[1]: In DefineRelation, we obtain the list of indexes to clone by RelationGetIndexList, which ignores invalid indexes. In order to implement this we'd need to include invalid indexes in that list (possibly by just not using RelationGetIndexList.)
RelationGetIndexList, which ignores invalid indexes. In order to
implement this we'd need to include invalid indexes in that list
(possibly by just not using RelationGetIndexList.)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v3-0001-index_get_partition.patchtext/x-diff; charset=us-asciiDownload+47-30
v3-0002-Propagate-replica-identity-to-partitions.patchtext/x-diff; charset=us-asciiDownload+326-13
#5Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#4)
Re: propagating replica identity to partitions

On Tue, Feb 5, 2019 at 12:54 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Actually, index-based replica identities failed in pg_dump: we first
create the index ONLY on the partitioned table (marking it as invalid),
so when we immediately do the ALTER TABLE/REPLICA IDENTITY, it rejects
the invalid index. If we change the code to allow invalid indexes on
partitioned tables to become replica identities, we hit the problem that
the index didn't exist when processing the partition list. In order to
fix that, I added a flag so that partitions are allowed not to have the
index, in hopes that the missing index are created in subsequent
commands; those indexes should become valid & identity afterwards.

There's a small emerging problem, which is that if you have an invalid
index marked as replica identity, you cannot create any more partitions;
the reason is that we want to propagate the replica identity to the
partition, but the index is not there (since invalid indexes are not
propagated). I don't think this case is worth supporting; it can be
fixed but requires some work[1].

New patch attached.

Could there be a race condition somewhere? The idea and the code looks
reasonable, but when I'm testing ALTER TABLE ... REPLICA IDENTITY with this
patch and concurrent partition creation, I've got the following error on ATTACH
PARTITION:

ERROR: 42P17: replica index does not exist in partition "test373"
LOCATION: MatchReplicaIdentity, tablecmds.c:15018

This error seems unstable, other time I've got a deadlock. And I don't observe
this behaviour on the master.

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dmitry Dolgov (#5)
Re: propagating replica identity to partitions

On 2019-Feb-07, Dmitry Dolgov wrote:

Could there be a race condition somewhere? The idea and the code looks
reasonable, but when I'm testing ALTER TABLE ... REPLICA IDENTITY with this
patch and concurrent partition creation, I've got the following error on ATTACH
PARTITION:

ERROR: 42P17: replica index does not exist in partition "test373"
LOCATION: MatchReplicaIdentity, tablecmds.c:15018

This error seems unstable, other time I've got a deadlock. And I don't observe
this behaviour on the master.

Clearly there is a problem somewhere. I'll investigate. Thanks for
testing.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dmitry Dolgov (#5)
Re: propagating replica identity to partitions

On 2019-Feb-07, Dmitry Dolgov wrote:

Could there be a race condition somewhere? The idea and the code looks
reasonable, but when I'm testing ALTER TABLE ... REPLICA IDENTITY with this
patch and concurrent partition creation, I've got the following error on ATTACH
PARTITION:

ERROR: 42P17: replica index does not exist in partition "test373"
LOCATION: MatchReplicaIdentity, tablecmds.c:15018

This error seems unstable, other time I've got a deadlock. And I don't observe
this behaviour on the master.

Can you share your reproducer?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#7)
Re: propagating replica identity to partitions

On Fri, Feb 15, 2019 at 10:02 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Feb-07, Dmitry Dolgov wrote:

Could there be a race condition somewhere? The idea and the code looks
reasonable, but when I'm testing ALTER TABLE ... REPLICA IDENTITY with this
patch and concurrent partition creation, I've got the following error on ATTACH
PARTITION:

ERROR: 42P17: replica index does not exist in partition "test373"
LOCATION: MatchReplicaIdentity, tablecmds.c:15018

This error seems unstable, other time I've got a deadlock. And I don't observe
this behaviour on the master.

Can you share your reproducer?

Sure, I was running concurrently the attached script, that creates a chain of
nested partitions test1,test2,..., and in a separate session:

alter table test1 replica identity full;

Checking this time, I've got from the script:

ERROR: 40P01: deadlock detected
DETAIL: Process 10547 waits for AccessShareLock on relation 30449
of database
29898; blocked by process 9714.
Process 9714 waits for AccessExclusiveLock on relation 30454 of
database 29898;
blocked by process 10547.
HINT: See server log for query details.
LOCATION: DeadLockReport, deadlock.c:1140
Time: 1001.917 ms (00:01.002)

Attachments:

nested_partitions.shtext/x-sh; charset=US-ASCII; name=nested_partitions.shDownload
#9Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#1)
Re: propagating replica identity to partitions

On Mon, Feb 4, 2019 at 11:30 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

If you do ALTER TABLE .. REPLICA IDENTITY to a partitioned table, the
command operates on the parent table itself and does not propagate to
partitions. Why is this? Maybe not recursing was the right call when
we only had regular inheritance (back in 9.4), but since partitioned
tables got introduced, I think it is completely the other way around:
not recursing is an usability fail.

This sounds an awful like the TABLESPACE case. I wonder how many more
similar things there are.

It's not unreasonable to use the parent's REPLICA IDENTITY setting as
the default for new partitions, much as we now do for the TABLESPACE,
because the parent's replica identity is otherwise without meaning.
But I'm less convinced that it's reasonable to have changing the
parent's replica identity recurse to existing children, because:

1. That's different from the TABLESPACE case. We shouldn't just treat
each new instance of this problem as a green field where we can pick
any old behavior at random; it should be consistent as far as
reasonable, and

2. It confuses a change to the default for new partitions with a
change to the value for existing partitions. Say that you've got 5
existing partitions that use one setting, but now you want new
partitions to use a different setting. You can't get that with your
proposed semantics, because trying to change the default will change
all the existing partitions to the new value also, even though that's
not what you wanted.

We should really, really try to figure out all of the things that are
like this -- a property that is meaningless for the partitioned table
itself but may serve as a useful default for its children -- and try
to make them all work the same, ideally in one release, rather than
changing them at different times, back-patching sometimes, and having
no consistency in the details.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#9)
Re: propagating replica identity to partitions

On 2019-Feb-19, Robert Haas wrote:

It's not unreasonable to use the parent's REPLICA IDENTITY setting as
the default for new partitions, much as we now do for the TABLESPACE,
because the parent's replica identity is otherwise without meaning.
But I'm less convinced that it's reasonable to have changing the
parent's replica identity recurse to existing children, because:

1. That's different from the TABLESPACE case. We shouldn't just treat
each new instance of this problem as a green field where we can pick
any old behavior at random; it should be consistent as far as
reasonable,

Maybe we should be using the inheritance marker in the command to note
whether to recurse to partitions? That is, if you say
ALTER TABLE ONLY parent SET REPLICA IDENTITY
then we don't recurse and just change the parent table and future
partitions, whereas if you leave out the ONLY then it does affect
existing partitions.

However, I think TABLESPACE and any other property that involves
rewriting tables wholesale can reasonably be expected to behave
differently (w.r.t. recursing) from commands that just modify the
catalogs. I think we already have such a distinction somewhere.

EXPLAIN ALTER TABLE would sure be handy :-)

2. It confuses a change to the default for new partitions with a
change to the value for existing partitions. Say that you've got 5
existing partitions that use one setting, but now you want new
partitions to use a different setting. You can't get that with your
proposed semantics, because trying to change the default will change
all the existing partitions to the new value also, even though that's
not what you wanted.

If we make sure to heed ONLY (and I admit to not thinking about it in my
original patch) then I think this angle is covered.

We should really, really try to figure out all of the things that are
like this -- a property that is meaningless for the partitioned table
itself but may serve as a useful default for its children -- and try
to make them all work the same, ideally in one release, rather than
changing them at different times, back-patching sometimes, and having
no consistency in the details.

I have no argument against somebody doing that, but I don't think I have
the resources to do in in time for 12; on the other hand, leaving a
known misfeature unfixed because nobody has looked for hypothetical
similar bugs would be bad.

I'm not proposing to back-patch this, but I would love it if I can
borrow somebody's time machine to retroactively fix it for 11.0.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#10)
Re: propagating replica identity to partitions

On Tue, Feb 19, 2019 at 3:40 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Maybe we should be using the inheritance marker in the command to note
whether to recurse to partitions? That is, if you say
ALTER TABLE ONLY parent SET REPLICA IDENTITY
then we don't recurse and just change the parent table and future
partitions, whereas if you leave out the ONLY then it does affect
existing partitions.

However, I think TABLESPACE and any other property that involves
rewriting tables wholesale can reasonably be expected to behave
differently (w.r.t. recursing) from commands that just modify the
catalogs. I think we already have such a distinction somewhere.

I don't really follow why that should be different, or why the user
should be expected to know or care whether a particular property
involves rewriting.

I have no argument against somebody doing that, but I don't think I have
the resources to do in in time for 12; on the other hand, leaving a
known misfeature unfixed because nobody has looked for hypothetical
similar bugs would be bad.

Oh, come on. If you don't have time to study the issue and come up
with a plan that can at least in theory be applied to all similar
cases in a principled way, whether or not you have time to apply it to
all of those cases, then you have no business committing anything at
all. You're basically saying that you don't have time to do this
right, so you're just going to do something at random and hope it
doesn't create too many problems later. That's terrible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#11)
Re: propagating replica identity to partitions

On 2019-Feb-19, Robert Haas wrote:

On Tue, Feb 19, 2019 at 3:40 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Maybe we should be using the inheritance marker in the command to note
whether to recurse to partitions? That is, if you say
ALTER TABLE ONLY parent SET REPLICA IDENTITY
then we don't recurse and just change the parent table and future
partitions, whereas if you leave out the ONLY then it does affect
existing partitions.

However, I think TABLESPACE and any other property that involves
rewriting tables wholesale can reasonably be expected to behave
differently (w.r.t. recursing) from commands that just modify the
catalogs. I think we already have such a distinction somewhere.

I don't really follow why that should be different, or why the user
should be expected to know or care whether a particular property
involves rewriting.

OK, let me concede that point -- it's not rewriting that makes things
act differently, but rather TABLESPACE (as well as some other things)
behave that way. ALTER TABLE ... SET DATA TYPE is the obvious
counterexample.

The Notes section of ALTER TABLE says:

: The actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY), as
: well as the actions TRIGGER, CLUSTER, OWNER, and TABLESPACE never recurse to
: descendant tables; that is, they always act as though ONLY were specified.
: Adding a constraint recurses only for CHECK constraints that are not marked NO
: INHERIT.

Since REPLICA IDENTITY does not appear in this list, the documented
behavior is to recurse, per the description of the "name" parameter:

: The name (optionally schema-qualified) of an existing table to
: alter. If ONLY is specified before the table name, only that table
: is altered. If ONLY is not specified, the table and all its
: descendant tables (if any) are altered. Optionally, * can be
: specified after the table name to explicitly indicate that
: descendant tables are included.

I didn't come up with this on my own, as you imply.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#12)
Re: propagating replica identity to partitions

On Tue, Feb 19, 2019 at 5:41 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

OK, let me concede that point -- it's not rewriting that makes things
act differently, but rather TABLESPACE (as well as some other things)
behave that way. ALTER TABLE ... SET DATA TYPE is the obvious
counterexample.

The Notes section of ALTER TABLE says:

: The actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY), as
: well as the actions TRIGGER, CLUSTER, OWNER, and TABLESPACE never recurse to
: descendant tables; that is, they always act as though ONLY were specified.
: Adding a constraint recurses only for CHECK constraints that are not marked NO
: INHERIT.

I don't see that in the NOTES section for ALTER TABLE. Are you
looking at some patch, or at master?

More generally, our documentation in this area seems to be somewhat
lacking. For instance, the TABLESPACE section of the ALTER TABLE
documentation appears to make no mention of what exactly the behavior
is when applied to a partition table. It doesn't seem sufficient to
simply say "well, it doesn't recurse," because that would imply that
it doesn't do anything at all, which isn't the case.

Since REPLICA IDENTITY does not appear in this list, the documented
behavior is to recurse, per the description of the "name" parameter:

: The name (optionally schema-qualified) of an existing table to
: alter. If ONLY is specified before the table name, only that table
: is altered. If ONLY is not specified, the table and all its
: descendant tables (if any) are altered. Optionally, * can be
: specified after the table name to explicitly indicate that
: descendant tables are included.

I didn't come up with this on my own, as you imply.

Fair enough. I don't think it's entirely useful to think about this
in terms of which operations do and don't recurse; the question in my
mind is WHY some things recurse and other things don't, and what will
be easiest for users to understand. Obviously any property where the
parents and children have to match, like adding a column or changing a
data type, has to always recurse; nothing else is sensible. For
others, it seems we have a choice. It's unclear to me why we'd choose
to make REPLICA IDENTITY recurse by default and, say, OWNER not
recurse. In both cases, the default assumption is presumably that the
whole partitioning hierarchy should match, but in a particular case a
user could want to do something else. Consequently, I don't
understand why a user would want one of those operations to descend to
children by default and the other not. It feels like we're choosing
the behavior in individual cases, as Tom would say, with the aid of a
dartboard. Maybe there's some coherent principle here that I'm just
missing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#13)
Re: propagating replica identity to partitions

On 2019-Feb-20, Robert Haas wrote:

On Tue, Feb 19, 2019 at 5:41 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

OK, let me concede that point -- it's not rewriting that makes things
act differently, but rather TABLESPACE (as well as some other things)
behave that way. ALTER TABLE ... SET DATA TYPE is the obvious
counterexample.

The Notes section of ALTER TABLE says:

: The actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY), as
: well as the actions TRIGGER, CLUSTER, OWNER, and TABLESPACE never recurse to
: descendant tables; that is, they always act as though ONLY were specified.
: Adding a constraint recurses only for CHECK constraints that are not marked NO
: INHERIT.

I don't see that in the NOTES section for ALTER TABLE. Are you
looking at some patch, or at master?

I was looking here:
https://www.postgresql.org/docs/11/sql-altertable.html

More generally, our documentation in this area seems to be somewhat
lacking. For instance, the TABLESPACE section of the ALTER TABLE
documentation appears to make no mention of what exactly the behavior
is when applied to a partition table. It doesn't seem sufficient to
simply say "well, it doesn't recurse," because that would imply that
it doesn't do anything at all, which isn't the case.

That's true. Maybe we can add a short blurb in the stanza for the SET
TABLESPACE form in the Description section. It currently says:

: This form changes the table's tablespace to the specified tablespace
: and moves the data file(s) associated with the table to the new
: tablespace. Indexes on the table, if any, are not moved; but they
: can be moved separately with additional SET TABLESPACE commands. All
: tables in the current database in a tablespace can be moved by using
: the ALL IN TABLESPACE form, which will lock all tables to be moved
: first and then move each one. This form also supports OWNED BY,
: which will only move tables owned by the roles specified. If the
: NOWAIT option is specified then the command will fail if it is
: unable to acquire all of the locks required immediately. Note that
: system catalogs are not moved by this command, use ALTER DATABASE or
: explicit ALTER TABLE invocations instead if desired. The
: information_schema relations are not considered part of the system
: catalogs and will be moved. See also CREATE TABLESPACE.

Maybe the ALL IN TABLESPACE and OWNED BY sub-forms should be split to a
separate para. I suggest:

: This form changes the table's tablespace to the specified tablespace
: and moves the data file(s) associated with the table to the new
: tablespace. Indexes on the table, if any, are not moved; but they
: can be moved separately with additional SET TABLESPACE commands.
: When applied to a partitioned table, nothing is moved, but any
: partitions created afterwards with CREATE TABLE PARTITION OF
: will use that tablespace.
:
: All
: tables in the current database in a tablespace can be moved by using
: the ALL IN TABLESPACE form, which will lock all tables to be moved
: first and then move each one. This form also supports OWNED BY,
: which will only move tables owned by the roles specified. If the
: NOWAIT option is specified then the command will fail if it is
: unable to acquire all of the locks required immediately. Note that
: system catalogs are not moved by this command, use ALTER DATABASE or
: explicit ALTER TABLE invocations instead if desired. The
: information_schema relations are not considered part of the system
: catalogs and will be moved. See also CREATE TABLESPACE.

Since REPLICA IDENTITY does not appear in this list, the documented
behavior is to recurse, per the description of the "name" parameter:

: The name (optionally schema-qualified) of an existing table to
: alter. If ONLY is specified before the table name, only that table
: is altered. If ONLY is not specified, the table and all its
: descendant tables (if any) are altered. Optionally, * can be
: specified after the table name to explicitly indicate that
: descendant tables are included.

I didn't come up with this on my own, as you imply.

Fair enough. I don't think it's entirely useful to think about this
in terms of which operations do and don't recurse; the question in my
mind is WHY some things recurse and other things don't, and what will
be easiest for users to understand.

I think the reason tablespace was made not to recurse was to avoid
acquiring access exclusive lock on too many tables at once, but I'm not
sure. This is very old behavior.

Obviously any property where the
parents and children have to match, like adding a column or changing a
data type, has to always recurse; nothing else is sensible. For
others, it seems we have a choice. It's unclear to me why we'd choose
to make REPLICA IDENTITY recurse by default and, say, OWNER not
recurse.

Ah. I did argue that OWNER should recurse:
/messages/by-id/20171017163203.uw7hmlqonidlfeqj@alvherre.pgsql
but it was too late then to change it for pg10. We can change it now,
surely.

In both cases, the default assumption is presumably that the
whole partitioning hierarchy should match, but in a particular case a
user could want to do something else. Consequently, I don't
understand why a user would want one of those operations to descend to
children by default and the other not.

I agree that OWNER TO should recurse for partitioned tables. I'm -0 on
changing it for legacy inheritance, but if the majority is to change it
for both, let's do that.

It feels like we're choosing the behavior in individual cases, as Tom
would say, with the aid of a dartboard. Maybe there's some coherent
principle here that I'm just missing.

I don't think that's a thing we're doing now. Rather we all did it as a
group years ago, and we're now finding that some of the darts landed in
the wrong spot of the board. I don't disagree with fixing all the ones
we know about (I volunteer to do that), but I don't want to conduct a
full audit of tablecmds.c.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#14)
Re: propagating replica identity to partitions

On Wed, 20 Feb 2019 at 17:38, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Fair enough. I don't think it's entirely useful to think about this
in terms of which operations do and don't recurse; the question in my
mind is WHY some things recurse and other things don't, and what will
be easiest for users to understand.

I think the reason tablespace was made not to recurse was to avoid
acquiring access exclusive lock on too many tables at once, but I'm not
sure. This is very old behavior.

-1 for changing that; here's why:

Partitioning is designed to support very large tables.

Since the table is very large there is a valid use case for moving older
partitions to other tablespaces, one at a time.

If we moved all partitions at once, while holding AccessExclusiveLock, it
would a) likely fail with out of space errors, b) if you are unlucky enough
for it to succeed it would lock the table for a ridiculously long time. The
main use case for changing the tablespace is to define where new partitions
should be created. So in this specific case only, recursion makes no sense.

Obviously any property where the
parents and children have to match, like adding a column or changing a
data type, has to always recurse; nothing else is sensible. For
others, it seems we have a choice. It's unclear to me why we'd choose
to make REPLICA IDENTITY recurse by default and, say, OWNER not
recurse.

Ah. I did argue that OWNER should recurse:
/messages/by-id/20171017163203.uw7hmlqonidlfeqj@alvherre.pgsql
but it was too late then to change it for pg10. We can change it now,
surely.

In both cases, the default assumption is presumably that the
whole partitioning hierarchy should match, but in a particular case a
user could want to do something else. Consequently, I don't
understand why a user would want one of those operations to descend to
children by default and the other not.

I agree that OWNER TO should recurse for partitioned tables.

+1

That was clearly just missed. It's a strange thought to have a partitioned
table with different pieces owned by different users. So strange that the
lack of complaints about the recursion are surely because nobody ever tried
it in a real situation. I would prefer to disallow differences in ownership
across partitions, since that almost certainly prevents running sensible
DDL and its just a huge footgun.

Recursion should be the default for partitioned tables.

I'm -0 on

changing it for legacy inheritance, but if the majority is to change it
for both, let's do that.

-1 for changing legacy inheritance. Two separate features. Inheritance has
been around for many years and its feature set is stable. No need to touch
it.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#14)
Re: propagating replica identity to partitions

On Wed, Feb 20, 2019 at 12:38 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I don't see that in the NOTES section for ALTER TABLE. Are you
looking at some patch, or at master?

I was looking here:
https://www.postgresql.org/docs/11/sql-altertable.html

OK, I was looking at the wrong thing. Not enough caffeine, apparently.

Maybe the ALL IN TABLESPACE and OWNED BY sub-forms should be split to a
separate para. I suggest:

: This form changes the table's tablespace to the specified tablespace
: and moves the data file(s) associated with the table to the new
: tablespace. Indexes on the table, if any, are not moved; but they
: can be moved separately with additional SET TABLESPACE commands.
: When applied to a partitioned table, nothing is moved, but any
: partitions created afterwards with CREATE TABLE PARTITION OF
: will use that tablespace.
:
: All
: tables in the current database in a tablespace can be moved by using
: the ALL IN TABLESPACE form, which will lock all tables to be moved
: first and then move each one. This form also supports OWNED BY,
: which will only move tables owned by the roles specified. If the
: NOWAIT option is specified then the command will fail if it is
: unable to acquire all of the locks required immediately. Note that
: system catalogs are not moved by this command, use ALTER DATABASE or
: explicit ALTER TABLE invocations instead if desired. The
: information_schema relations are not considered part of the system
: catalogs and will be moved. See also CREATE TABLESPACE.

Seems reasonable.

I think the reason tablespace was made not to recurse was to avoid
acquiring access exclusive lock on too many tables at once, but I'm not
sure. This is very old behavior.

Obviously any property where the
parents and children have to match, like adding a column or changing a
data type, has to always recurse; nothing else is sensible. For
others, it seems we have a choice. It's unclear to me why we'd choose
to make REPLICA IDENTITY recurse by default and, say, OWNER not
recurse.

Ah. I did argue that OWNER should recurse:
/messages/by-id/20171017163203.uw7hmlqonidlfeqj@alvherre.pgsql
but it was too late then to change it for pg10. We can change it now,
surely.

Yeah, we could. I wonder, though, whether we should just make
everything recurse. I think that's what people are commonly going to
want, at least for partitioned tables, and it doesn't seem to me that
it would hurt anything to make the inheritance case work that way,
too. Right now it looks like we have this list of exceptions:

- actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY)
- TRIGGER
- CLUSTER
- OWNER
- TABLESPACE never recurse to descendant tables
- Adding a constraint recurses only for CHECK constraints that are not
marked NO INHERIT.

I have a feeling we eventually want this list to be empty, right? We
want a partitioned table to work as much like a non-partitioned table
as possible, unless the user asks for some other behavior. Going from
six exceptions to four and maybe having some of them depend on whether
it's partitioning or inheritance doesn't seem like it's as good and
clear as trying to adopt a really uniform policy.

I don't buy Simon's argument that we should treat TABLESPACE
differently because the tables might be really big and take a long
time to move. I agree that this could well be true, but nobody is
proposing to remove the ability to move tables individually or to use
ONLY here. Allowing TABLESPACE to recurse just gives people one
additional choice that they don't have today: to move everything at
once. We don't lose any functionality by enabling that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#16)
Re: propagating replica identity to partitions

On Wed, 20 Feb 2019 at 18:51, Robert Haas <robertmhaas@gmail.com> wrote:

I don't buy Simon's argument that we should treat TABLESPACE

differently because the tables might be really big and take a long
time to move. I agree that this could well be true, but nobody is
proposing to remove the ability to move tables individually or to use
ONLY here. Allowing TABLESPACE to recurse just gives people one
additional choice that they don't have today: to move everything at
once. We don't lose any functionality by enabling that.

Doing that would add the single largest footgun in Postgres, given that
command's current behavior and the size of partitioned tables.

If it moved partitions concurrently I'd feel differently.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#16)
Re: propagating replica identity to partitions

On 2019-Feb-20, Robert Haas wrote:

On Wed, Feb 20, 2019 at 12:38 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Maybe the ALL IN TABLESPACE and OWNED BY sub-forms should be split to a
separate para. I suggest:

: This form changes the table's tablespace to the specified tablespace
: and moves the data file(s) associated with the table to the new
: tablespace. Indexes on the table, if any, are not moved; but they
: can be moved separately with additional SET TABLESPACE commands.
: When applied to a partitioned table, nothing is moved, but any
: partitions created afterwards with CREATE TABLE PARTITION OF
: will use that tablespace.
:
: All
: tables in the current database in a tablespace can be moved by using
: the ALL IN TABLESPACE form, which will lock all tables to be moved
: first and then move each one. This form also supports OWNED BY,
: which will only move tables owned by the roles specified. If the
: NOWAIT option is specified then the command will fail if it is
: unable to acquire all of the locks required immediately. Note that
: system catalogs are not moved by this command, use ALTER DATABASE or
: explicit ALTER TABLE invocations instead if desired. The
: information_schema relations are not considered part of the system
: catalogs and will be moved. See also CREATE TABLESPACE.

Seems reasonable.

Pushed this bit, thanks.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#16)
Re: propagating replica identity to partitions

Added Peter E to CC; question at the very end.

On 2019-Feb-20, Robert Haas wrote:

Yeah, we could. I wonder, though, whether we should just make
everything recurse. I think that's what people are commonly going to
want, at least for partitioned tables, and it doesn't seem to me that
it would hurt anything to make the inheritance case work that way,
too. Right now it looks like we have this list of exceptions:

- actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY)
- TRIGGER
- CLUSTER
- OWNER
- TABLESPACE never recurse to descendant tables
- Adding a constraint recurses only for CHECK constraints that are not
marked NO INHERIT.

I have a feeling we eventually want this list to be empty, right? We
want a partitioned table to work as much like a non-partitioned table
as possible, unless the user asks for some other behavior. Going from
six exceptions to four and maybe having some of them depend on whether
it's partitioning or inheritance doesn't seem like it's as good and
clear as trying to adopt a really uniform policy.

I don't buy Simon's argument that we should treat TABLESPACE
differently because the tables might be really big and take a long
time to move. I agree that this could well be true, but nobody is
proposing to remove the ability to move tables individually or to use
ONLY here. Allowing TABLESPACE to recurse just gives people one
additional choice that they don't have today: to move everything at
once. We don't lose any functionality by enabling that.

Tablespaces already behave a little bit especially in another sense:
it doesn't recurse to indexes. I think it's not possible to implement a
three-way flag, where you tell it to move only the table, or to recurse
to child tables but leave local indexes alone, or just to move
everything. If we don't move the child indexes, are we really recursing
in that sense?

I think changing the behavior of tablespaces is likely to cause pain for
people with existing scripts that expect the command not to recurse. We
would have to add a switch of some kind to provide the old behavior in
order not to break those, and that doesn't seem so good either.

I agree we definitely want to treat a partitioned table as similarly as
possible to a non-partitioned table, but in the particular case of
tablespace it seems to me better not to mess with the behavior.

CLUSTER: I'm not sure about this one. For legacy inheritance there's
no principled way to figure out the right index to recurse to. For
partitioning that seems a very simple problem, but I do wonder if
there's any use case at all for ALTER TABLE CLUSTER there. My
inclination would be to reject ALTER TABLE CLUSTER on a partitioned
table altogether, if it isn't already.

OWNER: let's just change this one to recurse always. I think we have a
consensus about this one.

TRIGGER: I think it would be good to recurse, based on the trigger name.
I'm not sure what to do about legacy inheritance; the trigger isn't
likely to be there. An option is to silently ignore each inheritance
child (not partition) if the trigger isn't present on it.

We all seem to agree that REPLICA IDENTITY should recurse.

Maybe ADD GENERATED AS IDENTITY / DROP IDENTITY should recurse; not
really sure about this one. Peter?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#19)
Re: propagating replica identity to partitions

On Thu, Feb 28, 2019 at 07:41:11PM -0300, Alvaro Herrera wrote:

We all seem to agree that REPLICA IDENTITY should recurse.

(entering the ring)
FWIW, I agree that having REPLICA IDENTITY recurse on partitions feels
more natural, as much as being able to use ALTER TABLE ONLY to only
update the definition on a given partition member.
--
Michael

#21Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#19)
#22Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#19)
#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#22)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#23)
#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#24)
#26Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#25)
#27Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#26)
#28Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#27)
#29Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#28)
#30Simon Riggs
simon@2ndQuadrant.com
In reply to: Michael Paquier (#29)
#31Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#4)