Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

Started by Rushabh Lathiaalmost 8 years ago18 messages
#1Rushabh Lathia
rushabh.lathia@gmail.com

Hi,

Consider the below test:

CREATE TABLE foo (a INT, b INT, c VARCHAR) PARTITION BY LIST(a);
CREATE TABLE foo_p1 PARTITION OF foo FOR VALUES IN (1,2);
CREATE TABLE foo_p2 PARTITION OF foo FOR VALUES IN (3,4);
INSERT INTO foo select i,i,i from generate_series(1,4)i;

CREATE TABLE foo_d (like foo);
INSERT INTO foo_d select i,i,i from generate_series(1,9)i;

ALTER TABLE foo ATTACH PARTITION foo_d DEFAULT;

Above ATTACH PARTITION should fail with "partition constraint is violated"
error, but instead it's working on a master branch.

Looking further I found that problem introduced with below commit:

commit 4dba331cb3dc1b5ffb0680ed8efae847de216796
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue Mar 20 11:19:41 2018 -0300

Fix CommandCounterIncrement in partition-related DDL

It makes sense to do the CCIs in the places that do catalog updates,
rather than before the places that error out because the former ones
fail to do it. In particular, it looks like StorePartitionBound() and
IndexSetParentIndex() ought to make their own CCIs.

Per review comments from Peter Eisentraut for row-level triggers on
partitioned tables.

I noticed that further below commit tried to correct thing in similar
area, but still I am noticing the broken behavior in case of attaching
the partition as DEFAULT.

commit 56163004b8b2151db279744b77138d4d90e2d5cb
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed Mar 21 12:03:35 2018 -0300

Fix relcache handling of the 'default' partition

Regards,

--
Rushabh Lathia

#2Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Rushabh Lathia (#1)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

Rushabh Lathia wrote:

CREATE TABLE foo (a INT, b INT, c VARCHAR) PARTITION BY LIST(a);
CREATE TABLE foo_p1 PARTITION OF foo FOR VALUES IN (1,2);
CREATE TABLE foo_p2 PARTITION OF foo FOR VALUES IN (3,4);
INSERT INTO foo select i,i,i from generate_series(1,4)i;

CREATE TABLE foo_d (like foo);
INSERT INTO foo_d select i,i,i from generate_series(1,9)i;

ALTER TABLE foo ATTACH PARTITION foo_d DEFAULT;

Above ATTACH PARTITION should fail with "partition constraint is violated"
error, but instead it's working on a master branch.

Hmm, offhand I don't quite see why this error fails to be thrown.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: Alvaro Herrera (#2)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

On Thu, Mar 29, 2018 at 7:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

Rushabh Lathia wrote:

CREATE TABLE foo (a INT, b INT, c VARCHAR) PARTITION BY LIST(a);
CREATE TABLE foo_p1 PARTITION OF foo FOR VALUES IN (1,2);
CREATE TABLE foo_p2 PARTITION OF foo FOR VALUES IN (3,4);
INSERT INTO foo select i,i,i from generate_series(1,4)i;

CREATE TABLE foo_d (like foo);
INSERT INTO foo_d select i,i,i from generate_series(1,9)i;

ALTER TABLE foo ATTACH PARTITION foo_d DEFAULT;

Above ATTACH PARTITION should fail with "partition constraint is

violated"

error, but instead it's working on a master branch.

Hmm, offhand I don't quite see why this error fails to be thrown.

ATTACH PARTITION should throw an error, because partition table "foo"
already have two partition with key values (1, 2,3 4). And table "foo_d"
which we are attaching here has :

postgres@76099=#select * from foo_d;
a | b | c
---+---+---
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
4 | 4 | 4
5 | 5 | 5
6 | 6 | 6
7 | 7 | 7
8 | 8 | 8
9 | 9 | 9
(9 rows)

After ATTACH PARTITION, when we see the partition constraint for
the newly attached partition:

postgres@76099=#\d+ foo_d
Table "public.foo_d"
Column | Type | Collation | Nullable | Default | Storage |
Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
a | integer | | | | plain |
|
b | integer | | | | plain |
|
c | character varying | | | | extended |
|
Partition of: foo DEFAULT
Partition constraint: (NOT ((a IS NOT NULL) AND (a = ANY (ARRAY[1, 2, 3,
4]))))

So, it says that this partition (table) should not include values (1,2, 3,
4). But
it sill has those values.

postgres@76099=#select tableoid::regclass, * from foo;
tableoid | a | b | c
----------+---+---+---
foo_p1 | 1 | 1 | 1
foo_p1 | 2 | 2 | 2
foo_p2 | 3 | 3 | 3
foo_p2 | 4 | 4 | 4
foo_d | 1 | 1 | 1
foo_d | 2 | 2 | 2
foo_d | 3 | 3 | 3
foo_d | 4 | 4 | 4
foo_d | 5 | 5 | 5
foo_d | 6 | 6 | 6
foo_d | 7 | 7 | 7
foo_d | 8 | 8 | 8
foo_d | 9 | 9 | 9
(13 rows)

So basically ATTACH PARTITION should throw an error when partition
constraint is violated.

--
Rushabh Lathia
www.EnterpriseDB.com

#4Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Rushabh Lathia (#3)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

Rushabh Lathia wrote:

On Thu, Mar 29, 2018 at 7:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

Hmm, offhand I don't quite see why this error fails to be thrown.

ATTACH PARTITION should throw an error, because partition table "foo"
already have two partition with key values (1, 2,3 4). And table "foo_d"
which we are attaching here has :

Oh, I understand how it's supposed to work. I was just saying I don't
understand how this bug occurs. Is it because we fail to determine the
correct partition constraint for the default partition in time for its
verification scan?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Alvaro Herrera (#4)
1 attachment(s)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

At Thu, 29 Mar 2018 13:04:06 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in <20180329160406.ii2wgbkmlnfxtwbt@alvherre.pgsql>

Rushabh Lathia wrote:

On Thu, Mar 29, 2018 at 7:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

Hmm, offhand I don't quite see why this error fails to be thrown.

ATTACH PARTITION should throw an error, because partition table "foo"
already have two partition with key values (1, 2,3 4). And table "foo_d"
which we are attaching here has :

Oh, I understand how it's supposed to work. I was just saying I don't
understand how this bug occurs. Is it because we fail to determine the
correct partition constraint for the default partition in time for its
verification scan?

The reason is that CommandCounterIncrement added in
StorePartitionBound reveals the just added default partition to
get_default_oid_from_partdesc too early. The revealed partition
has immature constraint and it overrites the right constraint
generated just above.

ATExecAttachPartition checks for default partition oid twice but
the second is just needless before the commit and harms after it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

fix_attach_partition.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8f83aa4675..96b62bd4b6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14082,11 +14082,9 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 	}
 
 	/*
-	 * Check whether default partition has a row that would fit the partition
-	 * being attached.
+	 * Check whether existing default partition has a row that would fit the
+	 * partition being attached.
 	 */
-	defaultPartOid =
-		get_default_oid_from_partdesc(RelationGetPartitionDesc(rel));
 	if (OidIsValid(defaultPartOid))
 	{
 		Relation	defaultrel;
#6Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#5)
1 attachment(s)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

On 2018/03/30 17:31, Kyotaro HORIGUCHI wrote:

At Thu, 29 Mar 2018 13:04:06 -0300, Alvaro Herrera wrote:

Rushabh Lathia wrote:

On Thu, Mar 29, 2018 at 7:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

Hmm, offhand I don't quite see why this error fails to be thrown.

ATTACH PARTITION should throw an error, because partition table "foo"
already have two partition with key values (1, 2,3 4). And table "foo_d"
which we are attaching here has :

Oh, I understand how it's supposed to work. I was just saying I don't
understand how this bug occurs. Is it because we fail to determine the
correct partition constraint for the default partition in time for its
verification scan?

The reason is that CommandCounterIncrement added in
StorePartitionBound reveals the just added default partition to
get_default_oid_from_partdesc too early. The revealed partition
has immature constraint and it overrites the right constraint
generated just above.

ATExecAttachPartition checks for default partition oid twice but
the second is just needless before the commit and harms after it.

Yes. What happens as of the commit mentioned in $subject is that the
partition constraint that's set as tab->partition_constraint during the
first call to ValidatePartitionConstraints (which is the correct one) is
overwritten by a wrong one during the 2nd call, which wouldn't happen
before the commit. In the wrongly occurring 2nd call, we'd end up setting
tab->partition_constraint to the negation of the clause expression that
would've been set by the first call (in this case). Thus
tab->partition_constraint ends up returning true for all the values it
contains.

I noticed that there were no tests covering this case causing 4dba331cb3
to not notice this failure in the first place. I updated your patch to
add a few tests. Also, I revised the comment changed by your patch a bit.

Thanks,
Amit

Attachments:

v2-fix_attach_partition.patchtext/plain; charset=UTF-8; name=v2-fix_attach_partition.patchDownload
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 83a881eff3..8bba9a2e20 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14114,11 +14114,9 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 	}
 
 	/*
-	 * Check whether default partition has a row that would fit the partition
-	 * being attached.
+	 * Check if the default partition contains a row that would belong in the
+	 * partition being attached.
 	 */
-	defaultPartOid =
-		get_default_oid_from_partdesc(RelationGetPartitionDesc(rel));
 	if (OidIsValid(defaultPartOid))
 	{
 		Relation	defaultrel;
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index a80d16a394..82b195e034 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3891,3 +3891,24 @@ ALTER TABLE attmp ALTER COLUMN i RESET (n_distinct_inherited);
 ANALYZE attmp;
 DROP TABLE attmp;
 DROP USER regress_alter_table_user1;
+-- check that violating rows are correctly reported when attching as the
+-- default partition
+create table defpart_attach_test (a int) partition by list (a);
+create table defpart_attach_test1 partition of defpart_attach_test for values in (1);
+create table defpart_attach_test_d (like defpart_attach_test);
+insert into defpart_attach_test_d values (1), (2);
+-- error because its constraint as the default partition would be violated
+-- by the row containing 1
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+ERROR:  partition constraint is violated by some row
+delete from defpart_attach_test_d where a = 1;
+alter table defpart_attach_test_d add check (a > 1);
+-- should be attached successfully and without needing to be scanned
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+INFO:  partition constraint for table "defpart_attach_test_d" is implied by existing constraints
+-- trying to add a partition for 2 should fail because the default
+-- partition contains a row that would violate its new constraint which
+-- prevents rows containing 2
+create table defpart_attach_test2 partition of defpart_attach_test for values in (2);
+ERROR:  updated partition constraint for default partition "defpart_attach_test_d" would be violated by some row
+drop table defpart_attach_test;
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 8198d1e930..e0f4009b19 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -2565,3 +2565,26 @@ ANALYZE attmp;
 DROP TABLE attmp;
 
 DROP USER regress_alter_table_user1;
+
+-- check that violating rows are correctly reported when attching as the
+-- default partition
+create table defpart_attach_test (a int) partition by list (a);
+create table defpart_attach_test1 partition of defpart_attach_test for values in (1);
+create table defpart_attach_test_d (like defpart_attach_test);
+insert into defpart_attach_test_d values (1), (2);
+
+-- error because its constraint as the default partition would be violated
+-- by the row containing 1
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+delete from defpart_attach_test_d where a = 1;
+alter table defpart_attach_test_d add check (a > 1);
+
+-- should be attached successfully and without needing to be scanned
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+
+-- trying to add a partition for 2 should fail because the default
+-- partition contains a row that would violate its new constraint which
+-- prevents rows containing 2
+create table defpart_attach_test2 partition of defpart_attach_test for values in (2);
+
+drop table defpart_attach_test;
#7Jeevan Ladhe
jeevan.ladhe@enterprisedb.com
In reply to: Amit Langote (#6)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

Hi,

I noticed that there were no tests covering this case causing 4dba331cb3

to not notice this failure in the first place. I updated your patch to
add a few tests. Also, I revised the comment changed by your patch a bit.

1. A minor typo:

+-- check that violating rows are correctly reported when attching as the
s/attching/attaching

2. I think following part of the test is already covered:

+-- trying to add a partition for 2 should fail because the default
+-- partition contains a row that would violate its new constraint which
+-- prevents rows containing 2
+create table defpart_attach_test2 partition of defpart_attach_test for
values in (2);
+ERROR:  updated partition constraint for default partition
"defpart_attach_test_d" would be violated by some row
+drop table defpart_attach_test;

IIUC, the test in create_table covers the same scenario as of above:

-- check default partition overlap
INSERT INTO list_parted2 VALUES('X');
CREATE TABLE fail_part PARTITION OF list_parted2 FOR VALUES IN ('W', 'X',
'Y');
ERROR: updated partition constraint for default partition
"list_parted2_def" would be violated by some row

Regards,
Jeevan Ladhe

#8Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Jeevan Ladhe (#7)
1 attachment(s)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

Thanks Jeevan for reviewing.

On 2018/04/02 13:10, Jeevan Ladhe wrote:

Hi,

I noticed that there were no tests covering this case causing 4dba331cb3

to not notice this failure in the first place. I updated your patch to
add a few tests. Also, I revised the comment changed by your patch a bit.

1. A minor typo:

+-- check that violating rows are correctly reported when attching as the
s/attching/attaching

Oops, fixed.

2. I think following part of the test is already covered:

+-- trying to add a partition for 2 should fail because the default
+-- partition contains a row that would violate its new constraint which
+-- prevents rows containing 2
+create table defpart_attach_test2 partition of defpart_attach_test for
values in (2);
+ERROR:  updated partition constraint for default partition
"defpart_attach_test_d" would be violated by some row
+drop table defpart_attach_test;

IIUC, the test in create_table covers the same scenario as of above:

-- check default partition overlap
INSERT INTO list_parted2 VALUES('X');
CREATE TABLE fail_part PARTITION OF list_parted2 FOR VALUES IN ('W', 'X',
'Y');
ERROR: updated partition constraint for default partition
"list_parted2_def" would be violated by some row

Sorry, didn't realize that it was already covered in create_tabel.sql.
Removed this one.

Attached updated patch. Adding this to the v11 open items list.

Thanks,
Amit

Attachments:

v3-fix_attach_partition.patchtext/plain; charset=UTF-8; name=v3-fix_attach_partition.patchDownload
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c8da82217d..9d474ad5e2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14114,11 +14114,9 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 	}
 
 	/*
-	 * Check whether default partition has a row that would fit the partition
-	 * being attached.
+	 * Check if the default partition contains a row that would belong in the
+	 * partition being attached.
 	 */
-	defaultPartOid =
-		get_default_oid_from_partdesc(RelationGetPartitionDesc(rel));
 	if (OidIsValid(defaultPartOid))
 	{
 		Relation	defaultrel;
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index a80d16a394..4712fab540 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3891,3 +3891,19 @@ ALTER TABLE attmp ALTER COLUMN i RESET (n_distinct_inherited);
 ANALYZE attmp;
 DROP TABLE attmp;
 DROP USER regress_alter_table_user1;
+-- check that violating rows are correctly reported when attaching as the
+-- default partition
+create table defpart_attach_test (a int) partition by list (a);
+create table defpart_attach_test1 partition of defpart_attach_test for values in (1);
+create table defpart_attach_test_d (like defpart_attach_test);
+insert into defpart_attach_test_d values (1), (2);
+-- error because its constraint as the default partition would be violated
+-- by the row containing 1
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+ERROR:  partition constraint is violated by some row
+delete from defpart_attach_test_d where a = 1;
+alter table defpart_attach_test_d add check (a > 1);
+-- should be attached successfully and without needing to be scanned
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+INFO:  partition constraint for table "defpart_attach_test_d" is implied by existing constraints
+drop table defpart_attach_test;
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 8198d1e930..c557b050af 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -2565,3 +2565,21 @@ ANALYZE attmp;
 DROP TABLE attmp;
 
 DROP USER regress_alter_table_user1;
+
+-- check that violating rows are correctly reported when attaching as the
+-- default partition
+create table defpart_attach_test (a int) partition by list (a);
+create table defpart_attach_test1 partition of defpart_attach_test for values in (1);
+create table defpart_attach_test_d (like defpart_attach_test);
+insert into defpart_attach_test_d values (1), (2);
+
+-- error because its constraint as the default partition would be violated
+-- by the row containing 1
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+delete from defpart_attach_test_d where a = 1;
+alter table defpart_attach_test_d add check (a > 1);
+
+-- should be attached successfully and without needing to be scanned
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+
+drop table defpart_attach_test;
#9Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#8)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.
/*
-	 * Check whether default partition has a row that would fit the partition
-	 * being attached.
+	 * Check if the default partition contains a row that would belong in the
+	 * partition being attached.
*/
-	defaultPartOid =
-		get_default_oid_from_partdesc(RelationGetPartitionDesc(rel));
if (OidIsValid(defaultPartOid))

Oh my. This code is terrible, and I think this patch is wrong. More
later.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#9)
1 attachment(s)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

Why do we need AccessExclusiveLock on all children of a relation that we
want to scan to search for rows not satisfying the constraint? I think
it should be enough to get ShareLock, which prevents INSERT, no? I have
a feeling I'm missing something here, but I don't know what, and all
tests pass with that change.

Also: the change proposed to remove the get_default_oid_from_partdesc()
call actually fixes the bug, but to me it was not at all obvious why.

To figure out why, I first had to realize that
ValidatePartitionConstraints was lying to me, both in name and in
comments: it doesn't actually validate anything, it merely queues
entries so that alter table's phase 3 would do the validation. I found
this extremely confusing, so I fixed the comments to match reality, and
later decided to rename the function also.

At that point I was able to understand what the problem was: when
attaching the default partition, we were setting the constraint to be
validated for that partition to the correctly computed partition
constraint; and later in the same function we would set the constraint
to "the immature constraint" because we were now seeing that the default
partition OID was not invalid. So it was rather a bug in
ValidatePartitionConstraints, in that it was accepting to set the
expression to validate when another expression had already been set! I
added an assert to protect against this. And then changed the decision
of whether or not to run this block based on the attached partition
being the default one or not; because as the "if" test was, it was just
a recipe for confusion. (Now, if you look carefully you realize that
the new test for bound->is_default I added is kinda redundant: it can
only be set if there was a default partition OID at start of the
function, and therefore we know we're not adding a default partition.
And for the case where we *are* adding the default partition, it is
still Invalid because we don't re-read its own OID. But relying on that
seems brittle because it breaks if we happen to update the default
partition OID in the middle of that function, which is what we were
doing. Hence the new is_default test.)

I looked at the implementation of ValidatePartitionConstraints and
didn't like it, so I rewrote it also.

This comment is mistaken, too:
- /*
- * Skip if the partition is itself a partitioned table. We can only
- * ever scan RELKIND_RELATION relations.
- */
... because it ignores the possibility of a partition being a foreign
table. The code seems to work, but I think there is no test to cover
the case that a foreign table containing data that doesn't satisfy the
constraint is attached, because when I set that case to return doing
nothing (ie. ATGetQueueEntry is not called for a foreign partition), no
test failed.

Generally speaking, I didn't like ATExecAttachPartition; it's doing too
much that should have been done in ATPrepAttachPartition. The only
reason that's not broken is because we don't allow ATTACH PARTITION to
appear together with other commands in alter table, which seems
disappointing ... for example, when attaching multiple tables and a
default partition exist, you have to scan the default one multiple
times, which is lame.

(Speaking of lame: I noticed that RelationGetPartitionQual obtains lock
on the parent relation ... I wonder if this can be used to cause a
deadlock during InitResultRelationInfo.)

BTW, I think this is already broken for the case where the default
partition is partitioned and you attach a partition colliding with a row
that's being concurrently inserted in a partition of the default
partition, though I didn't bother to write a test for that.

Anyway, I'm just an onlooker fixing a CommandCounterIncrement change.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v4-fix_attach_partition.patchtext/plain; charset=iso-8859-1Download
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c8da82217d..357b1078f8 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -483,10 +483,9 @@ static void RemoveInheritance(Relation child_rel, Relation parent_rel);
 static ObjectAddress ATExecAttachPartition(List **wqueue, Relation rel,
 					  PartitionCmd *cmd);
 static void AttachPartitionEnsureIndexes(Relation rel, Relation attachrel);
-static void ValidatePartitionConstraints(List **wqueue, Relation scanrel,
-							 List *scanrel_children,
-							 List *partConstraint,
-							 bool validate_default);
+static void QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
+								   List *partConstraint,
+								   bool validate_default);
 static void CloneRowTriggersToPartition(Relation parent, Relation partition);
 static ObjectAddress ATExecDetachPartition(Relation rel, RangeVar *name);
 static ObjectAddress ATExecAttachPartitionIdx(List **wqueue, Relation rel,
@@ -13765,29 +13764,23 @@ PartConstraintImpliedByRelConstraint(Relation scanrel,
 }
 
 /*
- * ValidatePartitionConstraints
+ * QueuePartitionConstraintValidation
  *
- * Check whether all rows in the given table obey the given partition
- * constraint; if so, it can be attached as a partition.  We do this by
- * scanning the table (or all of its leaf partitions) row by row, except when
- * the existing constraints are sufficient to prove that the new partitioning
- * constraint must already hold.
+ * Add an entry to wqueue to have the given partition constraint validated by
+ * Phase 3, for the given relation, and all its children.
+ *
+ * We first verify whether the given constraint is implied by pre-existing
+ * relation constraints; if it is, there's no need to scan the table to
+ * validate, so don't queue in that case.
  */
 static void
-ValidatePartitionConstraints(List **wqueue, Relation scanrel,
-							 List *scanrel_children,
-							 List *partConstraint,
-							 bool validate_default)
+QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
+								   List *partConstraint,
+								   bool validate_default)
 {
-	bool		found_whole_row;
-	ListCell   *lc;
-
-	if (partConstraint == NIL)
-		return;
-
 	/*
-	 * Based on the table's existing constraints, determine if we can skip
-	 * scanning the table to validate the partition constraint.
+	 * Based on the table's existing constraints, determine whether or not we
+	 * may skip scanning the table.
 	 */
 	if (PartConstraintImpliedByRelConstraint(scanrel, partConstraint))
 	{
@@ -13802,68 +13795,54 @@ ValidatePartitionConstraints(List **wqueue, Relation scanrel,
 		return;
 	}
 
-	/* Constraints proved insufficient, so we need to scan the table. */
-	foreach(lc, scanrel_children)
+	/*
+	 * Constraints proved insufficient. For plain relations, queue a validation
+	 * item now; for partitioned tables, recurse to process each partition.
+	 */
+	if (scanrel->rd_rel->relkind == RELKIND_RELATION ||
+		scanrel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 	{
 		AlteredTableInfo *tab;
-		Oid			part_relid = lfirst_oid(lc);
-		Relation	part_rel;
-		List	   *my_partconstr = partConstraint;
 
-		/* Lock already taken */
-		if (part_relid != RelationGetRelid(scanrel))
-			part_rel = heap_open(part_relid, NoLock);
-		else
-			part_rel = scanrel;
+		/* Grab a work queue entry. */
+		tab = ATGetQueueEntry(wqueue, scanrel);
+		Assert(tab->partition_constraint == NULL);
+		tab->partition_constraint = (Expr *) linitial(partConstraint);
+		tab->validate_default = validate_default;
+	}
+	else
+	{
+		PartitionDesc partdesc = RelationGetPartitionDesc(scanrel);
+		int			i;
 
-		/*
-		 * Skip if the partition is itself a partitioned table.  We can only
-		 * ever scan RELKIND_RELATION relations.
-		 */
-		if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		for (i = 0; i < partdesc->nparts; i++)
 		{
-			if (part_rel != scanrel)
-				heap_close(part_rel, NoLock);
-			continue;
-		}
+			Relation	part_rel;
+			bool		found_whole_row;
+			List	   *thisPartConstraint;
+
+			/*
+			 * This is the minimum lock we need to prevent concurrent data
+			 * additions.
+			 */
+			part_rel = heap_open(partdesc->oids[i], ShareLock);
 
-		if (part_rel != scanrel)
-		{
 			/*
 			 * Adjust the constraint for scanrel so that it matches this
 			 * partition's attribute numbers.
 			 */
-			my_partconstr = map_partition_varattnos(my_partconstr, 1,
-													part_rel, scanrel,
-													&found_whole_row);
+			thisPartConstraint =
+				map_partition_varattnos(partConstraint, 1,
+										part_rel, scanrel, &found_whole_row);
 			/* There can never be a whole-row reference here */
 			if (found_whole_row)
-				elog(ERROR, "unexpected whole-row reference found in partition key");
+				elog(ERROR, "unexpected whole-row reference found in partition constraint");
 
-			/* Can we skip scanning this part_rel? */
-			if (PartConstraintImpliedByRelConstraint(part_rel, my_partconstr))
-			{
-				if (!validate_default)
-					ereport(INFO,
-							(errmsg("partition constraint for table \"%s\" is implied by existing constraints",
-									RelationGetRelationName(part_rel))));
-				else
-					ereport(INFO,
-							(errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
-									RelationGetRelationName(part_rel))));
-				heap_close(part_rel, NoLock);
-				continue;
-			}
+			QueuePartitionConstraintValidation(wqueue, part_rel,
+											   thisPartConstraint,
+											   validate_default);
+			heap_close(part_rel, NoLock);	/* keep lock till commit */
 		}
-
-		/* Grab a work queue entry. */
-		tab = ATGetQueueEntry(wqueue, part_rel);
-		tab->partition_constraint = (Expr *) linitial(my_partconstr);
-		tab->validate_default = validate_default;
-
-		/* keep our lock until commit */
-		if (part_rel != scanrel)
-			heap_close(part_rel, NoLock);
 	}
 }
 
@@ -13891,8 +13870,8 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 	List	   *partBoundConstraint;
 
 	/*
-	 * We must lock the default partition, because attaching a new partition
-	 * will change its partition constraint.
+	 * We must lock the default partition if one exists, because attaching a
+	 * new partition will change its partition constraint.
 	 */
 	defaultPartOid =
 		get_default_oid_from_partdesc(RelationGetPartitionDesc(rel));
@@ -13957,17 +13936,17 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 	 *
 	 * We do that by checking if rel is a member of the list of attachrel's
 	 * partitions provided the latter is partitioned at all.  We want to avoid
-	 * having to construct this list again, so we request the strongest lock
-	 * on all partitions.  We need the strongest lock, because we may decide
-	 * to scan them if we find out that the table being attached (or its leaf
-	 * partitions) may contain rows that violate the partition constraint. If
-	 * the table has a constraint that would prevent such rows, which by
-	 * definition is present in all the partitions, we need not scan the
-	 * table, nor its partitions.  But we cannot risk a deadlock by taking a
-	 * weaker lock now and the stronger one only when needed.
+	 * having to construct this list again, so we request a lock on all
+	 * partitions.  We need ShareLock, preventing data changes, because we
+	 * may decide to scan them if we find out that the table being attached (or
+	 * its leaf partitions) may contain rows that violate the partition
+	 * constraint.  If the table has a constraint that would prevent such rows,
+	 * which by definition is present in all the partitions, we need not scan
+	 * the table, nor its partitions.  But we cannot risk a deadlock by taking
+	 * a weaker lock now and the stronger one only when needed.
 	 */
 	attachrel_children = find_all_inheritors(RelationGetRelid(attachrel),
-											 AccessExclusiveLock, NULL);
+											 ShareLock, NULL);
 	if (list_member_oid(attachrel_children, RelationGetRelid(rel)))
 		ereport(ERROR,
 				(errcode(ERRCODE_DUPLICATE_TABLE),
@@ -14086,9 +14065,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 		/*
 		 * Run the partition quals through const-simplification similar to
 		 * check constraints.  We skip canonicalize_qual, though, because
-		 * partition quals should be in canonical form already; also, since
-		 * the qual is in implicit-AND format, we'd have to explicitly convert
-		 * it to explicit-AND format and back again.
+		 * partition quals should be in canonical form already.
 		 */
 		partConstraint =
 			(List *) eval_const_expressions(NULL,
@@ -14109,32 +14086,28 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 				 "unexpected whole-row reference found in partition key");
 
 		/* Validate partition constraints against the table being attached. */
-		ValidatePartitionConstraints(wqueue, attachrel, attachrel_children,
-									 partConstraint, false);
+		QueuePartitionConstraintValidation(wqueue, attachrel, partConstraint,
+										   false);
 	}
 
 	/*
-	 * Check whether default partition has a row that would fit the partition
-	 * being attached.
+	 * If we're attaching a partition other than the default partition and a
+	 * default one exists, then that partition's partition constraint changes,
+	 * so add an entry to the work queue to validate it, too.  (We must not
+	 * do this when the partition being attached is the default one; we
+	 * already did it above!)
 	 */
-	defaultPartOid =
-		get_default_oid_from_partdesc(RelationGetPartitionDesc(rel));
-	if (OidIsValid(defaultPartOid))
+	if (!cmd->bound->is_default && OidIsValid(defaultPartOid))
 	{
 		Relation	defaultrel;
-		List	   *defaultrel_children;
 		List	   *defPartConstraint;
 
-		/* We already have taken a lock on default partition. */
+		/* we already hold a lock on the default partition */
 		defaultrel = heap_open(defaultPartOid, NoLock);
 		defPartConstraint =
 			get_proposed_default_constraint(partBoundConstraint);
-		defaultrel_children =
-			find_all_inheritors(defaultPartOid,
-								AccessExclusiveLock, NULL);
-		ValidatePartitionConstraints(wqueue, defaultrel,
-									 defaultrel_children,
-									 defPartConstraint, true);
+		QueuePartitionConstraintValidation(wqueue, defaultrel,
+										   defPartConstraint, true);
 
 		/* keep our lock until commit. */
 		heap_close(defaultrel, NoLock);
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index a80d16a394..4712fab540 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3891,3 +3891,19 @@ ALTER TABLE attmp ALTER COLUMN i RESET (n_distinct_inherited);
 ANALYZE attmp;
 DROP TABLE attmp;
 DROP USER regress_alter_table_user1;
+-- check that violating rows are correctly reported when attaching as the
+-- default partition
+create table defpart_attach_test (a int) partition by list (a);
+create table defpart_attach_test1 partition of defpart_attach_test for values in (1);
+create table defpart_attach_test_d (like defpart_attach_test);
+insert into defpart_attach_test_d values (1), (2);
+-- error because its constraint as the default partition would be violated
+-- by the row containing 1
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+ERROR:  partition constraint is violated by some row
+delete from defpart_attach_test_d where a = 1;
+alter table defpart_attach_test_d add check (a > 1);
+-- should be attached successfully and without needing to be scanned
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+INFO:  partition constraint for table "defpart_attach_test_d" is implied by existing constraints
+drop table defpart_attach_test;
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 8198d1e930..c557b050af 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -2565,3 +2565,21 @@ ANALYZE attmp;
 DROP TABLE attmp;
 
 DROP USER regress_alter_table_user1;
+
+-- check that violating rows are correctly reported when attaching as the
+-- default partition
+create table defpart_attach_test (a int) partition by list (a);
+create table defpart_attach_test1 partition of defpart_attach_test for values in (1);
+create table defpart_attach_test_d (like defpart_attach_test);
+insert into defpart_attach_test_d values (1), (2);
+
+-- error because its constraint as the default partition would be violated
+-- by the row containing 1
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+delete from defpart_attach_test_d where a = 1;
+alter table defpart_attach_test_d add check (a > 1);
+
+-- should be attached successfully and without needing to be scanned
+alter table defpart_attach_test attach partition defpart_attach_test_d default;
+
+drop table defpart_attach_test;
#11Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Alvaro Herrera (#10)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

Hello.

At Mon, 2 Apr 2018 16:11:12 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in <20180402191112.wneiyj4v5upnfjst@alvherre.pgsql>

Why do we need AccessExclusiveLock on all children of a relation that we
want to scan to search for rows not satisfying the constraint? I think
it should be enough to get ShareLock, which prevents INSERT, no? I have
a feeling I'm missing something here, but I don't know what, and all
tests pass with that change.

Mmm. I'm not sure if there's a lock-upgrade case but the
following sentense is left at the last of one of the modified
comments. ATREwriteTables is called with AEL after that (that has
finally no effect in this case).

| But we cannot risk a deadlock by taking
| * a weaker lock now and the stronger one only when needed.

I don't actually find places where the children's lock can be
raised but this seems just following the lock parent first
principle.

By the way check_default_allows_bound() (CREATE TABLE case)
contains a similar usage of find_all_inheritors(default_rel,
AEL).

Also: the change proposed to remove the get_default_oid_from_partdesc()
call actually fixes the bug, but to me it was not at all obvious why.

CommandCounterIncrement updates the content of a relcache entry
via invalidation. It can be surprising for callers of a function
like StorePartitionBound.

CommandCounterIncrement
<skip inval mechanism>
RelationCacheInvalidateEntry
RelationFlushRelation
RelationClearRelation

To figure out why, I first had to realize that
ValidatePartitionConstraints was lying to me, both in name and in
comments: it doesn't actually validate anything, it merely queues
entries so that alter table's phase 3 would do the validation. I found
this extremely confusing, so I fixed the comments to match reality, and
later decided to rename the function also.

It is reasonable. Removing exccessive extension of lower-level
partitions is also reasonable. The previous code extended
inheritances in different manner for levels at odd and even
depth.

At that point I was able to understand what the problem was: when
attaching the default partition, we were setting the constraint to be
validated for that partition to the correctly computed partition
constraint; and later in the same function we would set the constraint
to "the immature constraint" because we were now seeing that the default
partition OID was not invalid. So it was rather a bug in
ValidatePartitionConstraints, in that it was accepting to set the
expression to validate when another expression had already been set! I
added an assert to protect against this. And then changed the decision
of whether or not to run this block based on the attached partition
being the default one or not; because as the "if" test was, it was just
a recipe for confusion. (Now, if you look carefully you realize that
the new test for bound->is_default I added is kinda redundant: it can
only be set if there was a default partition OID at start of the
function, and therefore we know we're not adding a default partition.
And for the case where we *are* adding the default partition, it is
still Invalid because we don't re-read its own OID. But relying on that
seems brittle because it breaks if we happen to update the default
partition OID in the middle of that function, which is what we were
doing. Hence the new is_default test.)

Seems reasonable. But even if we assume so, rereading
defaultPartOid is still breaking the assumption that
defaultPartOid is that at the time of entering to this function
and the added condition just conceals the fact.

So I think it should be an assertion.

| if (OidIsValid(defaultPartOid))
| {
| /*
| * The command cannot be adding default partition if the
| * defaultPartOid is valid.
| */
| Assert(!cmd->bound->is_default);

I looked at the implementation of ValidatePartitionConstraints and
didn't like it, so I rewrote it also.

This comment is mistaken, too:
- /*
- * Skip if the partition is itself a partitioned table. We can only
- * ever scan RELKIND_RELATION relations.
- */
... because it ignores the possibility of a partition being a foreign
table. The code seems to work, but I think there is no test to cover
the case that a foreign table containing data that doesn't satisfy the
constraint is attached, because when I set that case to return doing
nothing (ie. ATGetQueueEntry is not called for a foreign partition), no
test failed.

Foreign tables are intentionally not verified on attaching (in my
understanding). ATRewriteTables ignores foreign tables so it
contradicts to what ATRewriteTables thinks. As my understanding
foreign tables are assumed to be unrestrictable by local
constraint after attaching so the users are responsible to the
consistency.

Generally speaking, I didn't like ATExecAttachPartition; it's doing too
much that should have been done in ATPrepAttachPartition. The only
reason that's not broken is because we don't allow ATTACH PARTITION to
appear together with other commands in alter table, which seems
disappointing ... for example, when attaching multiple tables and a
default partition exist, you have to scan the default one multiple
times, which is lame.

Indeed, currently only partition commands are isolated from other
alter table subcommands (except all in tablespace cases). We can
improve that in the next step?

(Speaking of lame: I noticed that RelationGetPartitionQual obtains lock
on the parent relation ... I wonder if this can be used to cause a
deadlock during InitResultRelationInfo.)

Mmm. That seems strange since the RealtionGetPartitionQual
doesn't return anything (specifically NIL) there, because we
don't allow a partition to be attached to partitioned tables. It
seems totally useless.

Addition to that the code tries to add the partition qual (which
is always NIL) destructively and assign to partConstraint..

BTW, I think this is already broken for the case where the default
partition is partitioned and you attach a partition colliding with a row
that's being concurrently inserted in a partition of the default
partition, though I didn't bother to write a test for that.

How is it broken? Every attaching partitions are checked for the
specified partition bound and every partitions of the default
partition are also checked against the new default part bound. We
already hold required locks on all the participants.

Anyway, I'm just an onlooker fixing a CommandCounterIncrement change.

It's reassuring. Thanks.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#11)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

On 2018/04/03 14:45, Kyotaro HORIGUCHI wrote:

Hello.

At Mon, 2 Apr 2018 16:11:12 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Why do we need AccessExclusiveLock on all children of a relation that we
want to scan to search for rows not satisfying the constraint? I think
it should be enough to get ShareLock, which prevents INSERT, no? I have
a feeling I'm missing something here, but I don't know what, and all
tests pass with that change.

Thinking on this a bit, I see no problem with locking the children with
just ShareLock. It was just a paranoia that if we're going to lock the
table itself being attached with AEL, we must its children (if any) with
AEL, too.

Mmm. I'm not sure if there's a lock-upgrade case but the
following sentense is left at the last of one of the modified
comments. ATREwriteTables is called with AEL after that (that has
finally no effect in this case).

| But we cannot risk a deadlock by taking
| * a weaker lock now and the stronger one only when needed.

I don't actually find places where the children's lock can be
raised but this seems just following the lock parent first
principle.

No lock upgrade happen as of now. The comment was added by the commit
972b6ec20bf [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=972b6ec20bf, which removed the code that could cause such a deadlock.
The comment fragment is simply trying to explain why we don't postpone the
locking of children to a later time, say, to the point where we actually
know that they need to be scanned. Previously the code next to the
comment used to lock the children using AccessShareLock, because at that
point we just needed to check if the table being attached is one of those
children and then later locked with AEL if it turned out that they need to
be scanned to check the partition constraint.

By the way check_default_allows_bound() (CREATE TABLE case)
contains a similar usage of find_all_inheritors(default_rel,
AEL).

Good catch. Its job is more or less same as
ValidatePartitionConstraints(), except all the children (if any) are
scanned right away instead of queuing it like in the AT case.

Also: the change proposed to remove the get_default_oid_from_partdesc()
call actually fixes the bug, but to me it was not at all obvious why.

CommandCounterIncrement updates the content of a relcache entry
via invalidation. It can be surprising for callers of a function
like StorePartitionBound.

CommandCounterIncrement
<skip inval mechanism>
RelationCacheInvalidateEntry
RelationFlushRelation
RelationClearRelation

Because of the CCI() after storing the default partition OID into the
system catalog, RelationClearRelation() would changes what
rel->rd_partdesc points to where 'rel' is the ATExecAttachPartition()'s
reference to the relcache entry of the parent that it passed to
StorePartitionBound.

So, whereas the rel->rd_partdesc wouldn't contain the default partition
before StorePartitionBound() was called, it would after.

To figure out why, I first had to realize that
ValidatePartitionConstraints was lying to me, both in name and in
comments: it doesn't actually validate anything, it merely queues
entries so that alter table's phase 3 would do the validation. I found
this extremely confusing, so I fixed the comments to match reality, and
later decided to rename the function also.

It is reasonable. Removing exccessive extension of lower-level
partitions is also reasonable. The previous code extended
inheritances in different manner for levels at odd and even
depth.

I like the new code including the naming, but I notice this changes the
order in which we do the locking now. There are still sites in the code
where the locking order is breadth-first, that is, as determined by
find_all_inheritors(), as this function would too previously.

Also note that beside the breadth-first -> depth-first change, this also
changes the locking order of partitions for a given partitioned table.
The OIDs in partdesc->oids[] are canonically ordered (that is order of
their partition bounds), whereas find_inheritance_children() that's called
by find_all_inheritors() would lock them in the order in which the
individual OIDs were found in the system catalog.

Not sure if there is anything to be alarmed of here, but in all previous
discussions, this has been a thing to avoid.

At that point I was able to understand what the problem was: when
attaching the default partition, we were setting the constraint to be
validated for that partition to the correctly computed partition
constraint; and later in the same function we would set the constraint
to "the immature constraint" because we were now seeing that the default
partition OID was not invalid. So it was rather a bug in
ValidatePartitionConstraints, in that it was accepting to set the
expression to validate when another expression had already been set! I
added an assert to protect against this. And then changed the decision
of whether or not to run this block based on the attached partition
being the default one or not; because as the "if" test was, it was just
a recipe for confusion. (Now, if you look carefully you realize that
the new test for bound->is_default I added is kinda redundant: it can
only be set if there was a default partition OID at start of the
function, and therefore we know we're not adding a default partition.
And for the case where we *are* adding the default partition, it is
still Invalid because we don't re-read its own OID. But relying on that
seems brittle because it breaks if we happen to update the default
partition OID in the middle of that function, which is what we were
doing. Hence the new is_default test.)

Seems reasonable.

+1

But even if we assume so, rereading
defaultPartOid is still breaking the assumption that
defaultPartOid is that at the time of entering to this function
and the added condition just conceals the fact.

Afaics, defaultPartOid is only set at the beginning of
ATExecAttachPartition, so it seems fine.

So I think it should be an assertion.

| if (OidIsValid(defaultPartOid))
| {
| /*
| * The command cannot be adding default partition if the
| * defaultPartOid is valid.
| */
| Assert(!cmd->bound->is_default);

I guess that makes sense, because when trying to attach a default
partition to the table that already has one, check_new_partition_bound
that's called before this errors out before we could get here.

I looked at the implementation of ValidatePartitionConstraints and
didn't like it, so I rewrote it also.

This comment is mistaken, too:
- /*
- * Skip if the partition is itself a partitioned table. We can only
- * ever scan RELKIND_RELATION relations.
- */
... because it ignores the possibility of a partition being a foreign
table. The code seems to work, but I think there is no test to cover
the case that a foreign table containing data that doesn't satisfy the
constraint is attached, because when I set that case to return doing
nothing (ie. ATGetQueueEntry is not called for a foreign partition), no
test failed.

Foreign tables are intentionally not verified on attaching (in my
understanding). ATRewriteTables ignores foreign tables so it
contradicts to what ATRewriteTables thinks. As my understanding
foreign tables are assumed to be unrestrictable by local
constraint after attaching so the users are responsible to the
consistency.

That and ATRewriteTable() in phase 3 cannot really cope with being handed
a foreign table as it can only work with RELKIND_RELATION tables.
Actually, the following in ATRewriteTables() also prevents passing foreign
tables:

static void
ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode)
{
ListCell *ltab;

/* Go through each table that needs to be checked or rewritten */
foreach(ltab, *wqueue)
{
AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);

/*
* Foreign tables have no storage, nor do partitioned tables and
* indexes.
*/
if (tab->relkind == RELKIND_FOREIGN_TABLE ||
tab->relkind == RELKIND_PARTITIONED_TABLE ||
tab->relkind == RELKIND_PARTITIONED_INDEX)
continue;

Generally speaking, I didn't like ATExecAttachPartition; it's doing too
much that should have been done in ATPrepAttachPartition. The only
reason that's not broken is because we don't allow ATTACH PARTITION to
appear together with other commands in alter table, which seems
disappointing ... for example, when attaching multiple tables and a
default partition exist, you have to scan the default one multiple
times, which is lame.

Indeed, currently only partition commands are isolated from other
alter table subcommands (except all in tablespace cases). We can
improve that in the next step?

I think we can improve this.

(Speaking of lame: I noticed that RelationGetPartitionQual obtains lock
on the parent relation ... I wonder if this can be used to cause a
deadlock during InitResultRelationInfo.)

It does seem that there is a possibility of deadlock, because when
InitResultRelInfo(), that's initializing a partition, calls
RelationGetPartitionQual() to get its partition constraint, the partition
would already have been locked. So this reverses the generally
established order of locking the parent first; another session which tries
to add a column to the parent, for example, will lock the parent and then
partitions.

I think we need for inserts directly into a partition to lock all of its
ancestors from the root to the direct parent (in that order) before
locking the partition itself, or maybe at least the parent if not all
ancestors.

Mmm. That seems strange since the RealtionGetPartitionQual
doesn't return anything (specifically NIL) there, because we
don't allow a partition to be attached to partitioned tables. It
seems totally useless.

Addition to that the code tries to add the partition qual (which
is always NIL) destructively and assign to partConstraint..

It is not always NIL. Imagine attaching a partition at a lower level.

create table foo (a int, b char) partition by list (a);
create table foo1 partition of foo for values in (1) partition by list (b);
create table foo1a (a, b) as values (2, 'b');

-- note that we're attaching to foo1, not foo
alter table foo1 attach partition foo1a for values in ('a');

If we didn't include foo1's (the parent) constraint (that is, a = 1), the
above command will wrongly succeed. It must include a = 1 in the
constraint to be be checked when scanning foo1a.

Although, I noticed there is no test covering this.

BTW, I think this is already broken for the case where the default
partition is partitioned and you attach a partition colliding with a row
that's being concurrently inserted in a partition of the default
partition, though I didn't bother to write a test for that.

How is it broken? Every attaching partitions are checked for the
specified partition bound and every partitions of the default
partition are also checked against the new default part bound. We
already hold required locks on all the participants.

Yes, concurrent insertions to either the default partition or any of its
partitions couldn't be occurring as we'd have locked them.

Anyway, I'm just an onlooker fixing a CommandCounterIncrement change.

It's reassuring. Thanks.

Yes, thank you for taking the time out to clean things up.

Thanks,
Amit

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=972b6ec20bf
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=972b6ec20bf

#13Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: Amit Langote (#12)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

Added to the open items list.

https://wiki.postgresql.org/wiki/PostgreSQL_11_Open_Items#Open_Issues

On Tue, Apr 3, 2018 at 2:52 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:

On 2018/04/03 14:45, Kyotaro HORIGUCHI wrote:

Hello.

At Mon, 2 Apr 2018 16:11:12 -0300, Alvaro Herrera <

alvherre@alvh.no-ip.org> wrote:

Why do we need AccessExclusiveLock on all children of a relation that we
want to scan to search for rows not satisfying the constraint? I think
it should be enough to get ShareLock, which prevents INSERT, no? I have
a feeling I'm missing something here, but I don't know what, and all
tests pass with that change.

Thinking on this a bit, I see no problem with locking the children with
just ShareLock. It was just a paranoia that if we're going to lock the
table itself being attached with AEL, we must its children (if any) with
AEL, too.

Mmm. I'm not sure if there's a lock-upgrade case but the
following sentense is left at the last of one of the modified
comments. ATREwriteTables is called with AEL after that (that has
finally no effect in this case).

| But we cannot risk a deadlock by

taking

| * a weaker lock now and the stronger one only when needed.

I don't actually find places where the children's lock can be
raised but this seems just following the lock parent first
principle.

No lock upgrade happen as of now. The comment was added by the commit
972b6ec20bf [1], which removed the code that could cause such a deadlock.
The comment fragment is simply trying to explain why we don't postpone the
locking of children to a later time, say, to the point where we actually
know that they need to be scanned. Previously the code next to the
comment used to lock the children using AccessShareLock, because at that
point we just needed to check if the table being attached is one of those
children and then later locked with AEL if it turned out that they need to
be scanned to check the partition constraint.

By the way check_default_allows_bound() (CREATE TABLE case)
contains a similar usage of find_all_inheritors(default_rel,
AEL).

Good catch. Its job is more or less same as
ValidatePartitionConstraints(), except all the children (if any) are
scanned right away instead of queuing it like in the AT case.

Also: the change proposed to remove the get_default_oid_from_partdesc()
call actually fixes the bug, but to me it was not at all obvious why.

CommandCounterIncrement updates the content of a relcache entry
via invalidation. It can be surprising for callers of a function
like StorePartitionBound.

CommandCounterIncrement
<skip inval mechanism>
RelationCacheInvalidateEntry
RelationFlushRelation
RelationClearRelation

Because of the CCI() after storing the default partition OID into the
system catalog, RelationClearRelation() would changes what
rel->rd_partdesc points to where 'rel' is the ATExecAttachPartition()'s
reference to the relcache entry of the parent that it passed to
StorePartitionBound.

So, whereas the rel->rd_partdesc wouldn't contain the default partition
before StorePartitionBound() was called, it would after.

To figure out why, I first had to realize that
ValidatePartitionConstraints was lying to me, both in name and in
comments: it doesn't actually validate anything, it merely queues
entries so that alter table's phase 3 would do the validation. I found
this extremely confusing, so I fixed the comments to match reality, and
later decided to rename the function also.

It is reasonable. Removing exccessive extension of lower-level
partitions is also reasonable. The previous code extended
inheritances in different manner for levels at odd and even
depth.

I like the new code including the naming, but I notice this changes the
order in which we do the locking now. There are still sites in the code
where the locking order is breadth-first, that is, as determined by
find_all_inheritors(), as this function would too previously.

Also note that beside the breadth-first -> depth-first change, this also
changes the locking order of partitions for a given partitioned table.
The OIDs in partdesc->oids[] are canonically ordered (that is order of
their partition bounds), whereas find_inheritance_children() that's called
by find_all_inheritors() would lock them in the order in which the
individual OIDs were found in the system catalog.

Not sure if there is anything to be alarmed of here, but in all previous
discussions, this has been a thing to avoid.

At that point I was able to understand what the problem was: when
attaching the default partition, we were setting the constraint to be
validated for that partition to the correctly computed partition
constraint; and later in the same function we would set the constraint
to "the immature constraint" because we were now seeing that the default
partition OID was not invalid. So it was rather a bug in
ValidatePartitionConstraints, in that it was accepting to set the
expression to validate when another expression had already been set! I
added an assert to protect against this. And then changed the decision
of whether or not to run this block based on the attached partition
being the default one or not; because as the "if" test was, it was just
a recipe for confusion. (Now, if you look carefully you realize that
the new test for bound->is_default I added is kinda redundant: it can
only be set if there was a default partition OID at start of the
function, and therefore we know we're not adding a default partition.
And for the case where we *are* adding the default partition, it is
still Invalid because we don't re-read its own OID. But relying on that
seems brittle because it breaks if we happen to update the default
partition OID in the middle of that function, which is what we were
doing. Hence the new is_default test.)

Seems reasonable.

+1

But even if we assume so, rereading
defaultPartOid is still breaking the assumption that
defaultPartOid is that at the time of entering to this function
and the added condition just conceals the fact.

Afaics, defaultPartOid is only set at the beginning of
ATExecAttachPartition, so it seems fine.

So I think it should be an assertion.

| if (OidIsValid(defaultPartOid))
| {
| /*
| * The command cannot be adding default partition if the
| * defaultPartOid is valid.
| */
| Assert(!cmd->bound->is_default);

I guess that makes sense, because when trying to attach a default
partition to the table that already has one, check_new_partition_bound
that's called before this errors out before we could get here.

I looked at the implementation of ValidatePartitionConstraints and
didn't like it, so I rewrote it also.

This comment is mistaken, too:
- /*
- * Skip if the partition is itself a partitioned table. We can

only

- * ever scan RELKIND_RELATION relations.
- */
... because it ignores the possibility of a partition being a foreign
table. The code seems to work, but I think there is no test to cover
the case that a foreign table containing data that doesn't satisfy the
constraint is attached, because when I set that case to return doing
nothing (ie. ATGetQueueEntry is not called for a foreign partition), no
test failed.

Foreign tables are intentionally not verified on attaching (in my
understanding). ATRewriteTables ignores foreign tables so it
contradicts to what ATRewriteTables thinks. As my understanding
foreign tables are assumed to be unrestrictable by local
constraint after attaching so the users are responsible to the
consistency.

That and ATRewriteTable() in phase 3 cannot really cope with being handed
a foreign table as it can only work with RELKIND_RELATION tables.
Actually, the following in ATRewriteTables() also prevents passing foreign
tables:

static void
ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE
lockmode)
{
ListCell *ltab;

/* Go through each table that needs to be checked or rewritten */
foreach(ltab, *wqueue)
{
AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);

/*
* Foreign tables have no storage, nor do partitioned tables and
* indexes.
*/
if (tab->relkind == RELKIND_FOREIGN_TABLE ||
tab->relkind == RELKIND_PARTITIONED_TABLE ||
tab->relkind == RELKIND_PARTITIONED_INDEX)
continue;

Generally speaking, I didn't like ATExecAttachPartition; it's doing too
much that should have been done in ATPrepAttachPartition. The only
reason that's not broken is because we don't allow ATTACH PARTITION to
appear together with other commands in alter table, which seems
disappointing ... for example, when attaching multiple tables and a
default partition exist, you have to scan the default one multiple
times, which is lame.

Indeed, currently only partition commands are isolated from other
alter table subcommands (except all in tablespace cases). We can
improve that in the next step?

I think we can improve this.

(Speaking of lame: I noticed that RelationGetPartitionQual obtains lock
on the parent relation ... I wonder if this can be used to cause a
deadlock during InitResultRelationInfo.)

It does seem that there is a possibility of deadlock, because when
InitResultRelInfo(), that's initializing a partition, calls
RelationGetPartitionQual() to get its partition constraint, the partition
would already have been locked. So this reverses the generally
established order of locking the parent first; another session which tries
to add a column to the parent, for example, will lock the parent and then
partitions.

I think we need for inserts directly into a partition to lock all of its
ancestors from the root to the direct parent (in that order) before
locking the partition itself, or maybe at least the parent if not all
ancestors.

Mmm. That seems strange since the RealtionGetPartitionQual
doesn't return anything (specifically NIL) there, because we
don't allow a partition to be attached to partitioned tables. It
seems totally useless.

Addition to that the code tries to add the partition qual (which
is always NIL) destructively and assign to partConstraint..

It is not always NIL. Imagine attaching a partition at a lower level.

create table foo (a int, b char) partition by list (a);
create table foo1 partition of foo for values in (1) partition by list (b);
create table foo1a (a, b) as values (2, 'b');

-- note that we're attaching to foo1, not foo
alter table foo1 attach partition foo1a for values in ('a');

If we didn't include foo1's (the parent) constraint (that is, a = 1), the
above command will wrongly succeed. It must include a = 1 in the
constraint to be be checked when scanning foo1a.

Although, I noticed there is no test covering this.

BTW, I think this is already broken for the case where the default
partition is partitioned and you attach a partition colliding with a row
that's being concurrently inserted in a partition of the default
partition, though I didn't bother to write a test for that.

How is it broken? Every attaching partitions are checked for the
specified partition bound and every partitions of the default
partition are also checked against the new default part bound. We
already hold required locks on all the participants.

Yes, concurrent insertions to either the default partition or any of its
partitions couldn't be occurring as we'd have locked them.

Anyway, I'm just an onlooker fixing a CommandCounterIncrement change.

It's reassuring. Thanks.

Yes, thank you for taking the time out to clean things up.

Thanks,
Amit

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=
commitdiff;h=972b6ec20bf

--
Rushabh Lathia

#14Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#12)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

Thanks for the discussion. Per your suggestions, I changed the check
for default partition OID to an assert instead of the 'if' condition,
and removed the code that attempted vainly to verify the constraint when
attaching a foreign table as a partition. And pushed.

I think we're done here, so marked the Open Item as fixed.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#10)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

On Mon, Apr 2, 2018 at 3:11 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Why do we need AccessExclusiveLock on all children of a relation that we
want to scan to search for rows not satisfying the constraint? I think
it should be enough to get ShareLock, which prevents INSERT, no? I have
a feeling I'm missing something here, but I don't know what, and all
tests pass with that change.

I don't think it was a good idea to change this without a lot more
discussion, as part of another commit that really was about something
else, and after feature freeze.

As Kyotaro Horiguchi also mentioned, this introduces a deadlock
hazard. With current master:

Setup:

create table foo (a int, b text) partition by range (a);
create table foo1 partition of foo for values from (1) to (100);
create table food (a int, b text) partition by range (a);
create table food1 partition of food for values from (1) to (100);

Session 1:
begin;
BEGIN
rhaas=# select * from food1;
a | b
---+---
(0 rows)

rhaas=# insert into food1 values (1, 'thunk');

Session 2:
rhaas=# alter table foo attach partition food default;

At which point session 1 deadlocks, because the lock has to be
upgraded to AccessExclusiveLock since we're changing the constraints.

Now you might think about relaxing the lock level for the later
acquisition, too, but I'm not sure that's safe. The issue is that
scanning a relation for rows that don't match the new constraint isn't
by itself sufficient: you also have to be sure that nobody can add one
later. If they don't have the relation open, they'll certainly
rebuild their relcache entry when they open it, so it'll be fine. But
if they already have the relation open, I'm not sure we can be certain
it will get rebuilt if, later in the same transaction, they try to
insert data. This whole area needs more research -- there may very
well be good opportunities to reduce lock levels in this area, but it
needs careful study and analysis.

Please revert the part of this commit that changed the lock level.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Robert Haas (#15)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

Robert Haas wrote:

I don't think it was a good idea to change this without a lot more
discussion, as part of another commit that really was about something
else, and after feature freeze.

Please revert the part of this commit that changed the lock level.

You're right, that was too hasty. Will revert.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Robert Haas (#15)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

Robert Haas wrote:

Please revert the part of this commit that changed the lock level.

Done.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#18Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#17)
Re: Commit 4dba331cb3 broke ATTACH PARTITION behaviour.

On Thu, Apr 12, 2018 at 3:59 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Robert Haas wrote:

Please revert the part of this commit that changed the lock level.

Done.

Thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company