ALTER TABLE validate foreign key dependency problem

Started by David Rowleyover 5 years ago4 messages
#1David Rowley
dgrowleyml@gmail.com
1 attachment(s)

Hi,

I had an ALTER TABLE dependency problem reported to me. Here's a
simplified version of it:

CREATE TABLE t (a INT, PRIMARY KEY(a));
ALTER TABLE t ADD CONSTRAINT t_fkey FOREIGN KEY (a) REFERENCES t(a) NOT VALID;
ALTER TABLE t VALIDATE CONSTRAINT t_fkey, ALTER a TYPE BIGINT;

Which results in:

ERROR: could not read block 0 in file "base/12854/16411": read only 0
of 8192 bytes
CONTEXT: SQL statement "SELECT fk."a" FROM ONLY "public"."t" fk LEFT
OUTER JOIN ONLY "public"."t" pk ON ( pk."a" OPERATOR(pg_catalog.=)
fk."a") WHERE pk."a" IS NULL AND (fk."a" IS NOT NULL)"

What's going on here is that due to the ALTER TYPE, a table rewrite is
pending. The primary key index of the table is also due to be
rewritten which ATExecAddIndex() delays due to the pending table
rewrite. When we process AT_PASS_MISC level changes and attempt to
validate the foreign key constraint, the table is still pending a
rewrite and the new index still does not exist.
validateForeignKeyConstraint() executes regardless of the pending
rewrite and bumps into the above error during the SPI call while
trying to check the _bt_getrootheight() in get_relation_info().

I think the fix is just to delay the foreign key validation when
there's a rewrite pending until the rewrite is complete.

I also considered that we could just delay all foreign key validations
until phase 3, but I ended up just doing then only when a rewrite is
pending.

David

Attachments:

delay_altertable_foreignkey_validation_during_table_rewrite.patchapplication/octet-stream; name=delay_altertable_foreignkey_validation_during_table_rewrite.patchDownload
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f79044f39f..f64dfffef0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -328,7 +328,8 @@ static void AlterSeqNamespaces(Relation classRel, Relation rel,
 							   LOCKMODE lockmode);
 static ObjectAddress ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
 										   bool recurse, bool recursing, LOCKMODE lockmode);
-static ObjectAddress ATExecValidateConstraint(Relation rel, char *constrName,
+static ObjectAddress ATExecValidateConstraint(AlteredTableInfo *tab,
+											  Relation rel, char *constrName,
 											  bool recurse, bool recursing, LOCKMODE lockmode);
 static int	transformColumnNameList(Oid relId, List *colList,
 									int16 *attnums, Oid *atttypids);
@@ -4500,13 +4501,13 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			address = ATExecAlterConstraint(rel, cmd, false, false, lockmode);
 			break;
 		case AT_ValidateConstraint: /* VALIDATE CONSTRAINT */
-			address = ATExecValidateConstraint(rel, cmd->name, false, false,
-											   lockmode);
+			address = ATExecValidateConstraint(tab, rel, cmd->name, false,
+											   false, lockmode);
 			break;
 		case AT_ValidateConstraintRecurse:	/* VALIDATE CONSTRAINT with
 											 * recursion */
-			address = ATExecValidateConstraint(rel, cmd->name, true, false,
-											   lockmode);
+			address = ATExecValidateConstraint(tab, rel, cmd->name, true,
+											   false, lockmode);
 			break;
 		case AT_DropConstraint: /* DROP CONSTRAINT */
 			ATExecDropConstraint(rel, cmd->name, cmd->behavior,
@@ -9727,8 +9728,9 @@ ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
  * was already validated, InvalidObjectAddress is returned.
  */
 static ObjectAddress
-ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
-						 bool recursing, LOCKMODE lockmode)
+ATExecValidateConstraint(AlteredTableInfo *tab, Relation rel,
+						 char *constrName, bool recurse, bool recursing,
+						 LOCKMODE lockmode)
 {
 	Relation	conrel;
 	SysScanDesc scan;
@@ -9779,22 +9781,50 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
 
 		if (con->contype == CONSTRAINT_FOREIGN)
 		{
-			Relation	refrel;
-
 			/*
-			 * Triggers are already in place on both tables, so a concurrent
-			 * write that alters the result here is not possible. Normally we
-			 * can run a query here to do the validation, which would only
-			 * require AccessShareLock. In some cases, it is possible that we
-			 * might need to fire triggers to perform the check, so we take a
-			 * lock at RowShareLock level just in case.
+			 * If a table rewrite is pending then we may not be able to
+			 * perform a foreign key constriant validation right now. We'll
+			 * just mark that the validation must be done after the rewrite
+			 * instead.
 			 */
-			refrel = table_open(con->confrelid, RowShareLock);
+			if (tab->rewrite > 0)
+			{
+				NewConstraint *newcon;
+				Constraint *fkconstraint;
+
+				fkconstraint = makeNode(Constraint);
+				/* for now this is all we need */
+				fkconstraint->conname = constrName;
+
+				newcon = (NewConstraint *) palloc0(sizeof(NewConstraint));
+				newcon->name =  constrName;
+				newcon->contype = CONSTR_FOREIGN;
+				newcon->refrelid = con->confrelid;
+				newcon->refindid = con->conindid;
+				newcon->conid = con->oid;
+				newcon->qual = (Node *) fkconstraint;
+
+				tab->constraints = lappend(tab->constraints, newcon);
+			}
+			else
+			{
+				Relation	refrel;
 
-			validateForeignKeyConstraint(constrName, rel, refrel,
-										 con->conindid,
-										 con->oid);
-			table_close(refrel, NoLock);
+				/*
+				 * Triggers are already in place on both tables, so a concurrent
+				 * write that alters the result here is not possible. Normally we
+				 * can run a query here to do the validation, which would only
+				 * require AccessShareLock. In some cases, it is possible that we
+				 * might need to fire triggers to perform the check, so we take a
+				 * lock at RowShareLock level just in case.
+				 */
+				refrel = table_open(con->confrelid, RowShareLock);
+
+				validateForeignKeyConstraint(constrName, rel, refrel,
+											 con->conindid,
+											 con->oid);
+				table_close(refrel, NoLock);
+			}
 
 			/*
 			 * We disallow creating invalid foreign keys to or from
@@ -9844,7 +9874,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
 				/* find_all_inheritors already got lock */
 				childrel = table_open(childoid, NoLock);
 
-				ATExecValidateConstraint(childrel, constrName, false,
+				ATExecValidateConstraint(tab, childrel, constrName, false,
 										 true, lockmode);
 				table_close(childrel, NoLock);
 			}
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 002079601f..a3b8e7c044 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -997,6 +997,12 @@ alter table atacc1
   add column b float8 not null default random(),
   add primary key(a);
 drop table atacc1;
+-- additionally, we've seen issues with foreign key validation not being
+-- properly delayed until after a table rewrite.  Check that works ok.
+create table atacc1 (a int primary key);
+alter table atacc1 add constraint atacc1_fkey foreign key (a) references atacc1 (a) not valid;
+alter table atacc1 validate constraint atacc1_fkey, alter A type bigint;
+drop table atacc1;
 -- something a little more complicated
 create table atacc1 ( test int, test2 int);
 -- add a primary key constraint
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index ec272d78f8..6f9091060b 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -757,6 +757,13 @@ alter table atacc1
   add primary key(a);
 drop table atacc1;
 
+-- additionally, we've seen issues with foreign key validation not being
+-- properly delayed until after a table rewrite.  Check that works ok.
+create table atacc1 (a int primary key);
+alter table atacc1 add constraint atacc1_fkey foreign key (a) references atacc1 (a) not valid;
+alter table atacc1 validate constraint atacc1_fkey, alter A type bigint;
+drop table atacc1;
+
 -- something a little more complicated
 create table atacc1 ( test int, test2 int);
 -- add a primary key constraint
#2David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#1)
3 attachment(s)
Re: ALTER TABLE validate foreign key dependency problem

On Thu, 9 Jul 2020 at 15:54, David Rowley <dgrowleyml@gmail.com> wrote:

I think the fix is just to delay the foreign key validation when
there's a rewrite pending until the rewrite is complete.

I looked over this again and only slightly reworded a comment. The
problem exists as far back as 9.5 so I've attached 3 patches that,
pending any objections, I plan to push about 24 hours from now.

I also considered that we could just delay all foreign key validations
until phase 3, but I ended up just doing then only when a rewrite is
pending.

I still wonder if it's best to delay the validation of the foreign key
regardless of if there's a pending table rewrite, but the patch as it
is now only delays if there's a pending rewrite.

David

Attachments:

delay_altertable_foreignkey_validation_during_table_rewrite10_v2.patchapplication/octet-stream; name=delay_altertable_foreignkey_validation_during_table_rewrite10_v2.patchDownload
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 375e035f4e..ffd8db2563 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -313,8 +313,9 @@ static void AlterSeqNamespaces(Relation classRel, Relation rel,
 				   LOCKMODE lockmode);
 static ObjectAddress ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
 					  bool recurse, bool recursing, LOCKMODE lockmode);
-static ObjectAddress ATExecValidateConstraint(Relation rel, char *constrName,
-						 bool recurse, bool recursing, LOCKMODE lockmode);
+static ObjectAddress ATExecValidateConstraint(AlteredTableInfo *tab,
+						 Relation rel, char *constrName, bool recurse,
+						 bool recursing, LOCKMODE lockmode);
 static int transformColumnNameList(Oid relId, List *colList,
 						int16 *attnums, Oid *atttypids);
 static int transformFkeyGetPrimaryKey(Relation pkrel, Oid *indexOid,
@@ -3951,13 +3952,13 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			address = ATExecAlterConstraint(rel, cmd, false, false, lockmode);
 			break;
 		case AT_ValidateConstraint: /* VALIDATE CONSTRAINT */
-			address = ATExecValidateConstraint(rel, cmd->name, false, false,
-											   lockmode);
+			address = ATExecValidateConstraint(tab, rel, cmd->name, false,
+											   false, lockmode);
 			break;
 		case AT_ValidateConstraintRecurse:	/* VALIDATE CONSTRAINT with
 											 * recursion */
-			address = ATExecValidateConstraint(rel, cmd->name, true, false,
-											   lockmode);
+			address = ATExecValidateConstraint(tab, rel, cmd->name, true,
+											   false, lockmode);
 			break;
 		case AT_DropConstraint: /* DROP CONSTRAINT */
 			ATExecDropConstraint(rel, cmd->name, cmd->behavior,
@@ -7660,8 +7661,9 @@ ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
  * was already validated, InvalidObjectAddress is returned.
  */
 static ObjectAddress
-ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
-						 bool recursing, LOCKMODE lockmode)
+ATExecValidateConstraint(AlteredTableInfo *tab, Relation rel,
+						 char *constrName, bool recurse, bool recursing,
+						 LOCKMODE lockmode)
 {
 	Relation	conrel;
 	SysScanDesc scan;
@@ -7713,22 +7715,45 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
 
 		if (con->contype == CONSTRAINT_FOREIGN)
 		{
-			Relation	refrel;
+			if (tab->rewrite > 0)
+			{
+				NewConstraint *newcon;
+				Constraint *fkconstraint;
+
+				fkconstraint = makeNode(Constraint);
+				/* for now this is all we need */
+				fkconstraint->conname = constrName;
+
+				newcon = (NewConstraint *) palloc0(sizeof(NewConstraint));
+				newcon->name =  constrName;
+				newcon->contype = CONSTR_FOREIGN;
+				newcon->refrelid = con->confrelid;
+				newcon->refindid = con->conindid;
+				newcon->conid = HeapTupleGetOid(tuple);
+				newcon->qual = (Node *) fkconstraint;
+
+				tab->constraints = lappend(tab->constraints, newcon);
+			}
+			else
+			{
 
-			/*
-			 * Triggers are already in place on both tables, so a concurrent
-			 * write that alters the result here is not possible. Normally we
-			 * can run a query here to do the validation, which would only
-			 * require AccessShareLock. In some cases, it is possible that we
-			 * might need to fire triggers to perform the check, so we take a
-			 * lock at RowShareLock level just in case.
-			 */
-			refrel = heap_open(con->confrelid, RowShareLock);
+				Relation	refrel;
 
-			validateForeignKeyConstraint(constrName, rel, refrel,
-										 con->conindid,
-										 HeapTupleGetOid(tuple));
-			heap_close(refrel, NoLock);
+				/*
+				 * Triggers are already in place on both tables, so a concurrent
+				 * write that alters the result here is not possible. Normally we
+				 * can run a query here to do the validation, which would only
+				 * require AccessShareLock. In some cases, it is possible that we
+				 * might need to fire triggers to perform the check, so we take a
+				 * lock at RowShareLock level just in case.
+				 */
+				refrel = heap_open(con->confrelid, RowShareLock);
+
+				validateForeignKeyConstraint(constrName, rel, refrel,
+											 con->conindid,
+											 HeapTupleGetOid(tuple));
+				heap_close(refrel, NoLock);
+			}
 
 			/*
 			 * Foreign keys do not inherit, so we purposely ignore the
@@ -7778,7 +7803,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
 				/* find_all_inheritors already got lock */
 				childrel = heap_open(childoid, NoLock);
 
-				ATExecValidateConstraint(childrel, constrName, false,
+				ATExecValidateConstraint(tab, childrel, constrName, false,
 										 true, lockmode);
 				heap_close(childrel, NoLock);
 			}
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index eabd3e160c..ce9ea9ee33 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -941,6 +941,12 @@ ERROR:  column "test2" contains null values
 -- now add a primary key column with a default (succeeds).
 alter table atacc1 add column test2 int default 0 primary key;
 drop table atacc1;
+-- additionally, we've seen issues with foreign key validation not being
+-- properly delayed until after a table rewrite.  Check that works ok.
+create table atacc1 (a int primary key);
+alter table atacc1 add constraint atacc1_fkey foreign key (a) references atacc1 (a) not valid;
+alter table atacc1 validate constraint atacc1_fkey, alter A type bigint;
+drop table atacc1;
 -- something a little more complicated
 create table atacc1 ( test int, test2 int);
 -- add a primary key constraint
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 8afdf0613e..c52fc9e3fb 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -713,6 +713,13 @@ alter table atacc1 add column test2 int primary key;
 alter table atacc1 add column test2 int default 0 primary key;
 drop table atacc1;
 
+-- additionally, we've seen issues with foreign key validation not being
+-- properly delayed until after a table rewrite.  Check that works ok.
+create table atacc1 (a int primary key);
+alter table atacc1 add constraint atacc1_fkey foreign key (a) references atacc1 (a) not valid;
+alter table atacc1 validate constraint atacc1_fkey, alter A type bigint;
+drop table atacc1;
+
 -- something a little more complicated
 create table atacc1 ( test int, test2 int);
 -- add a primary key constraint
delay_altertable_foreignkey_validation_during_table_rewrite95_v2.patchapplication/octet-stream; name=delay_altertable_foreignkey_validation_during_table_rewrite95_v2.patchDownload
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f1daf4b148..c6447e2570 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -288,8 +288,9 @@ static void AlterSeqNamespaces(Relation classRel, Relation rel,
 				   LOCKMODE lockmode);
 static ObjectAddress ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
 					  bool recurse, bool recursing, LOCKMODE lockmode);
-static ObjectAddress ATExecValidateConstraint(Relation rel, char *constrName,
-						 bool recurse, bool recursing, LOCKMODE lockmode);
+static ObjectAddress ATExecValidateConstraint(AlteredTableInfo *tab,
+						 Relation rel, char *constrName, bool recurse,
+						 bool recursing, LOCKMODE lockmode);
 static int transformColumnNameList(Oid relId, List *colList,
 						int16 *attnums, Oid *atttypids);
 static int transformFkeyGetPrimaryKey(Relation pkrel, Oid *indexOid,
@@ -3582,13 +3583,13 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			address = ATExecAlterConstraint(rel, cmd, false, false, lockmode);
 			break;
 		case AT_ValidateConstraint:		/* VALIDATE CONSTRAINT */
-			address = ATExecValidateConstraint(rel, cmd->name, false, false,
-											   lockmode);
+			address = ATExecValidateConstraint(tab, rel, cmd->name, false,
+											   false, lockmode);
 			break;
 		case AT_ValidateConstraintRecurse:		/* VALIDATE CONSTRAINT with
 												 * recursion */
-			address = ATExecValidateConstraint(rel, cmd->name, true, false,
-											   lockmode);
+			address = ATExecValidateConstraint(tab, rel, cmd->name, true,
+											   false, lockmode);
 			break;
 		case AT_DropConstraint:	/* DROP CONSTRAINT */
 			ATExecDropConstraint(rel, cmd->name, cmd->behavior,
@@ -6851,8 +6852,9 @@ ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
  * was already validated, InvalidObjectAddress is returned.
  */
 static ObjectAddress
-ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
-						 bool recursing, LOCKMODE lockmode)
+ATExecValidateConstraint(AlteredTableInfo *tab, Relation rel,
+						 char *constrName, bool recurse, bool recursing,
+						 LOCKMODE lockmode)
 {
 	Relation	conrel;
 	SysScanDesc scan;
@@ -6904,22 +6906,45 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
 
 		if (con->contype == CONSTRAINT_FOREIGN)
 		{
-			Relation	refrel;
+			if (tab->rewrite > 0)
+			{
+				NewConstraint *newcon;
+				Constraint *fkconstraint;
+
+				fkconstraint = makeNode(Constraint);
+				/* for now this is all we need */
+				fkconstraint->conname = constrName;
+
+				newcon = (NewConstraint *) palloc0(sizeof(NewConstraint));
+				newcon->name =  constrName;
+				newcon->contype = CONSTR_FOREIGN;
+				newcon->refrelid = con->confrelid;
+				newcon->refindid = con->conindid;
+				newcon->conid = HeapTupleGetOid(tuple);
+				newcon->qual = (Node *) fkconstraint;
+
+				tab->constraints = lappend(tab->constraints, newcon);
+			}
+			else
+			{
 
-			/*
-			 * Triggers are already in place on both tables, so a concurrent
-			 * write that alters the result here is not possible. Normally we
-			 * can run a query here to do the validation, which would only
-			 * require AccessShareLock. In some cases, it is possible that we
-			 * might need to fire triggers to perform the check, so we take a
-			 * lock at RowShareLock level just in case.
-			 */
-			refrel = heap_open(con->confrelid, RowShareLock);
+				Relation	refrel;
 
-			validateForeignKeyConstraint(constrName, rel, refrel,
-										 con->conindid,
-										 HeapTupleGetOid(tuple));
-			heap_close(refrel, NoLock);
+				/*
+				 * Triggers are already in place on both tables, so a concurrent
+				 * write that alters the result here is not possible. Normally we
+				 * can run a query here to do the validation, which would only
+				 * require AccessShareLock. In some cases, it is possible that we
+				 * might need to fire triggers to perform the check, so we take a
+				 * lock at RowShareLock level just in case.
+				 */
+				refrel = heap_open(con->confrelid, RowShareLock);
+
+				validateForeignKeyConstraint(constrName, rel, refrel,
+											 con->conindid,
+											 HeapTupleGetOid(tuple));
+				heap_close(refrel, NoLock);
+			}
 
 			/*
 			 * Foreign keys do not inherit, so we purposely ignore the
@@ -6968,7 +6993,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
 				/* find_all_inheritors already got lock */
 				childrel = heap_open(childoid, NoLock);
 
-				ATExecValidateConstraint(childrel, constrName, false,
+				ATExecValidateConstraint(tab, childrel, constrName, false,
 										 true, lockmode);
 				heap_close(childrel, NoLock);
 			}
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 46402b4a6d..1b54e01f95 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -932,6 +932,12 @@ ERROR:  column "test2" contains null values
 -- now add a primary key column with a default (succeeds).
 alter table atacc1 add column test2 int default 0 primary key;
 drop table atacc1;
+-- additionally, we've seen issues with foreign key validation not being
+-- properly delayed until after a table rewrite.  Check that works ok.
+create table atacc1 (a int primary key);
+alter table atacc1 add constraint atacc1_fkey foreign key (a) references atacc1 (a) not valid;
+alter table atacc1 validate constraint atacc1_fkey, alter A type bigint;
+drop table atacc1;
 -- something a little more complicated
 create table atacc1 ( test int, test2 int);
 -- add a primary key constraint
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 421d4c7d31..9a1dc69b64 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -710,6 +710,13 @@ alter table atacc1 add column test2 int primary key;
 alter table atacc1 add column test2 int default 0 primary key;
 drop table atacc1;
 
+-- additionally, we've seen issues with foreign key validation not being
+-- properly delayed until after a table rewrite.  Check that works ok.
+create table atacc1 (a int primary key);
+alter table atacc1 add constraint atacc1_fkey foreign key (a) references atacc1 (a) not valid;
+alter table atacc1 validate constraint atacc1_fkey, alter A type bigint;
+drop table atacc1;
+
 -- something a little more complicated
 create table atacc1 ( test int, test2 int);
 -- add a primary key constraint
delay_altertable_foreignkey_validation_during_table_rewrite12_v2.patchapplication/octet-stream; name=delay_altertable_foreignkey_validation_during_table_rewrite12_v2.patchDownload
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f79044f39f..b5a67ecda7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -328,7 +328,8 @@ static void AlterSeqNamespaces(Relation classRel, Relation rel,
 							   LOCKMODE lockmode);
 static ObjectAddress ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
 										   bool recurse, bool recursing, LOCKMODE lockmode);
-static ObjectAddress ATExecValidateConstraint(Relation rel, char *constrName,
+static ObjectAddress ATExecValidateConstraint(AlteredTableInfo *tab,
+											  Relation rel, char *constrName,
 											  bool recurse, bool recursing, LOCKMODE lockmode);
 static int	transformColumnNameList(Oid relId, List *colList,
 									int16 *attnums, Oid *atttypids);
@@ -4500,13 +4501,13 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			address = ATExecAlterConstraint(rel, cmd, false, false, lockmode);
 			break;
 		case AT_ValidateConstraint: /* VALIDATE CONSTRAINT */
-			address = ATExecValidateConstraint(rel, cmd->name, false, false,
-											   lockmode);
+			address = ATExecValidateConstraint(tab, rel, cmd->name, false,
+											   false, lockmode);
 			break;
 		case AT_ValidateConstraintRecurse:	/* VALIDATE CONSTRAINT with
 											 * recursion */
-			address = ATExecValidateConstraint(rel, cmd->name, true, false,
-											   lockmode);
+			address = ATExecValidateConstraint(tab, rel, cmd->name, true,
+											   false, lockmode);
 			break;
 		case AT_DropConstraint: /* DROP CONSTRAINT */
 			ATExecDropConstraint(rel, cmd->name, cmd->behavior,
@@ -9727,8 +9728,9 @@ ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
  * was already validated, InvalidObjectAddress is returned.
  */
 static ObjectAddress
-ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
-						 bool recursing, LOCKMODE lockmode)
+ATExecValidateConstraint(AlteredTableInfo *tab, Relation rel,
+						 char *constrName, bool recurse, bool recursing,
+						 LOCKMODE lockmode)
 {
 	Relation	conrel;
 	SysScanDesc scan;
@@ -9779,22 +9781,49 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
 
 		if (con->contype == CONSTRAINT_FOREIGN)
 		{
-			Relation	refrel;
-
 			/*
-			 * Triggers are already in place on both tables, so a concurrent
-			 * write that alters the result here is not possible. Normally we
-			 * can run a query here to do the validation, which would only
-			 * require AccessShareLock. In some cases, it is possible that we
-			 * might need to fire triggers to perform the check, so we take a
-			 * lock at RowShareLock level just in case.
+			 * If a table rewrite is pending then we may not be able to
+			 * perform a foreign key constriant validation right now. Instead
+			 * we add the to-be-validated foreign key to Phase 3's queue.
 			 */
-			refrel = table_open(con->confrelid, RowShareLock);
+			if (tab->rewrite > 0)
+			{
+				NewConstraint *newcon;
+				Constraint *fkconstraint;
+
+				fkconstraint = makeNode(Constraint);
+				/* for now this is all we need */
+				fkconstraint->conname = constrName;
+
+				newcon = (NewConstraint *) palloc0(sizeof(NewConstraint));
+				newcon->name =  constrName;
+				newcon->contype = CONSTR_FOREIGN;
+				newcon->refrelid = con->confrelid;
+				newcon->refindid = con->conindid;
+				newcon->conid = con->oid;
+				newcon->qual = (Node *) fkconstraint;
+
+				tab->constraints = lappend(tab->constraints, newcon);
+			}
+			else
+			{
+				Relation	refrel;
 
-			validateForeignKeyConstraint(constrName, rel, refrel,
-										 con->conindid,
-										 con->oid);
-			table_close(refrel, NoLock);
+				/*
+				 * Triggers are already in place on both tables, so a concurrent
+				 * write that alters the result here is not possible. Normally we
+				 * can run a query here to do the validation, which would only
+				 * require AccessShareLock. In some cases, it is possible that we
+				 * might need to fire triggers to perform the check, so we take a
+				 * lock at RowShareLock level just in case.
+				 */
+				refrel = table_open(con->confrelid, RowShareLock);
+
+				validateForeignKeyConstraint(constrName, rel, refrel,
+											 con->conindid,
+											 con->oid);
+				table_close(refrel, NoLock);
+			}
 
 			/*
 			 * We disallow creating invalid foreign keys to or from
@@ -9844,7 +9873,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
 				/* find_all_inheritors already got lock */
 				childrel = table_open(childoid, NoLock);
 
-				ATExecValidateConstraint(childrel, constrName, false,
+				ATExecValidateConstraint(tab, childrel, constrName, false,
 										 true, lockmode);
 				table_close(childrel, NoLock);
 			}
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 002079601f..a3b8e7c044 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -997,6 +997,12 @@ alter table atacc1
   add column b float8 not null default random(),
   add primary key(a);
 drop table atacc1;
+-- additionally, we've seen issues with foreign key validation not being
+-- properly delayed until after a table rewrite.  Check that works ok.
+create table atacc1 (a int primary key);
+alter table atacc1 add constraint atacc1_fkey foreign key (a) references atacc1 (a) not valid;
+alter table atacc1 validate constraint atacc1_fkey, alter A type bigint;
+drop table atacc1;
 -- something a little more complicated
 create table atacc1 ( test int, test2 int);
 -- add a primary key constraint
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index ec272d78f8..6f9091060b 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -757,6 +757,13 @@ alter table atacc1
   add primary key(a);
 drop table atacc1;
 
+-- additionally, we've seen issues with foreign key validation not being
+-- properly delayed until after a table rewrite.  Check that works ok.
+create table atacc1 (a int primary key);
+alter table atacc1 add constraint atacc1_fkey foreign key (a) references atacc1 (a) not valid;
+alter table atacc1 validate constraint atacc1_fkey, alter A type bigint;
+drop table atacc1;
+
 -- something a little more complicated
 create table atacc1 ( test int, test2 int);
 -- add a primary key constraint
#3Simon Riggs
simon@2ndquadrant.com
In reply to: David Rowley (#2)
Re: ALTER TABLE validate foreign key dependency problem

On Sun, 12 Jul 2020 at 05:51, David Rowley <dgrowleyml@gmail.com> wrote:

I also considered that we could just delay all foreign key validations
until phase 3, but I ended up just doing then only when a rewrite is
pending.

I still wonder if it's best to delay the validation of the foreign key
regardless of if there's a pending table rewrite, but the patch as it
is now only delays if there's a pending rewrite.

Consistency seems the better choice, so I agree we should validate later in
all cases. Does changing that have any other effects?

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
Mission Critical Databases

#4David Rowley
dgrowleyml@gmail.com
In reply to: Simon Riggs (#3)
Re: ALTER TABLE validate foreign key dependency problem

On Mon, 13 Jul 2020 at 08:13, Simon Riggs <simon@2ndquadrant.com> wrote:

On Sun, 12 Jul 2020 at 05:51, David Rowley <dgrowleyml@gmail.com> wrote:

I also considered that we could just delay all foreign key validations
until phase 3, but I ended up just doing then only when a rewrite is
pending.

I still wonder if it's best to delay the validation of the foreign key
regardless of if there's a pending table rewrite, but the patch as it
is now only delays if there's a pending rewrite.

Consistency seems the better choice, so I agree we should validate later in all cases. Does changing that have any other effects?

Thanks for having a look here.

I looked at this again and noticed it wasn't just FOREIGN KEY
constraints. CHECK constraints were being validated at the wrong time
too.

I did end up going with unconditionally moving the validation until
phase 3. I've pushed fixed back to 9.5

David