ALTER TABLE ... REPLACE WITH

Started by Simon Riggsabout 15 years ago63 messages
#1Simon Riggs
simon@2ndQuadrant.com

There are various applications where we want to completely replace the
contents of a table with new/re-calculated data.

It seems fairly obvious to be able to do this like...
1. Prepare new data into "new_table" and build indexes
2. Swap old for new
BEGIN;
DROP TABLE "old_table";
ALTER TABLE "new_table" RENAME to "old_table";
COMMIT;

Step (2) works, but any people queuing to access the table will see
ERROR: could not open relation with OID xxxxx
What we need is a way to atomically replace the contents of a table
without receiving this error. (You can't use views).

What I propose is to write a function/command to allow this to be
explicitly achievable by the server.

ALTER TABLE "old_table"
REPLACE WITH "new_table";

This would do the following:
* Check that *content* definitions of old and new are the same
* Drop all old indexes
* Move new relfilenode into place
* Move all indexes from new to old (the set of indexes may change)
* All triggers, non-index constraints, defaults etc would remain same
* "new_table" is TRUNCATEd.

TRUNCATE already achieves something similar, and is equivalent to
REPLACE WITH an empty table, so we know it is possible. Obviously this
breaks MVCC, but the applications for this don't care.

Of course, as with all things, this can be done with a function and some
dodgy catalog updates. I'd rather avoid that and have this as a full
strength capability on the server, since it has a very wide range of
potential applications of use to all Postgres users.

Similar, though not inspired by EXCHANGE PARTITION in Oracle.

It looks a short project to me, just some checks and a few updates.

Objections?

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#1)
Re: ALTER TABLE ... REPLACE WITH

Simon Riggs <simon@2ndQuadrant.com> writes:

There are various applications where we want to completely replace the
contents of a table with new/re-calculated data.

It seems fairly obvious to be able to do this like...
1. Prepare new data into "new_table" and build indexes
2. Swap old for new
BEGIN;
DROP TABLE "old_table";
ALTER TABLE "new_table" RENAME to "old_table";
COMMIT;

Why not

BEGIN;
TRUNCATE TABLE;
... load new data ...
COMMIT;

What I propose is to write a function/command to allow this to be
explicitly achievable by the server.

ALTER TABLE "old_table"
REPLACE WITH "new_table";

I don't think the cost/benefit ratio of this is anywhere near as good
as you seem to think (ie, you're both underestimating the work involved
and overstating the benefit). I'm also noticing a lack of specification
as to trigger behavior, foreign keys, etc. The apparent intention to
disregard FKs entirely is particularly distressing,

regards, tom lane

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#2)
Re: ALTER TABLE ... REPLACE WITH

On Tue, 2010-12-14 at 13:54 -0500, Tom Lane wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

There are various applications where we want to completely replace the
contents of a table with new/re-calculated data.

It seems fairly obvious to be able to do this like...
1. Prepare new data into "new_table" and build indexes
2. Swap old for new
BEGIN;
DROP TABLE "old_table";
ALTER TABLE "new_table" RENAME to "old_table";
COMMIT;

Why not

BEGIN;
TRUNCATE TABLE;
... load new data ...
COMMIT;

The above is atomic, but not fast.

The intention is to produce an atomic swap with as small a lock window
as possible, to allow it to happen in real operational systems.

At the moment we have a choice of fast or atomic. We need both.

(Note that there are 2 utilities that already do this, but the
operations aren't supported in core Postgres).

What I propose is to write a function/command to allow this to be
explicitly achievable by the server.

ALTER TABLE "old_table"
REPLACE WITH "new_table";

I don't think the cost/benefit ratio of this is anywhere near as good
as you seem to think (ie, you're both underestimating the work involved
and overstating the benefit). I'm also noticing a lack of specification
as to trigger behavior, foreign keys, etc. The apparent intention to
disregard FKs entirely is particularly distressing,

No triggers would be fired. All constraints that exist on "old_table"
must also exist on "new_table". As I said, lots of checks required, no
intention to add back doors.

("Disregard FKs" is the other project, not connected other than both are
operations on tables designed to improve manageability of large tables.)

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#4Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#2)
Re: ALTER TABLE ... REPLACE WITH

On Tue, Dec 14, 2010 at 1:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

BEGIN;
TRUNCATE TABLE;
... load new data ...
COMMIT;

Because then you have to take an AccessExclusiveLock on the target
table, of course.

If we had some kind of TRUNCATE CONCURRENTLY, I think that'd address a
large portion of the use case for the proposed feature.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#4)
Re: ALTER TABLE ... REPLACE WITH

On 12/14/10 11:07 AM, Robert Haas wrote:

Because then you have to take an AccessExclusiveLock on the target
table, of course.

Well, you have to do that for DROP TABLE as well, and I don't see any
way around doing it for REPLACE WITH.

As for the utility of this command: there is no question that I would
use it. I'm not sure I like the syntax (I'd prefer REPLACE TABLE ____
WITH _____), but that's painting the bike shed. While the command may
appear frivolous and unnecessary syntactical ornamentation to some, I
have to say that doing the "table doesy-doe" which this command
addresses is something I have written scripts for on at least 50% of my
professional clients. It keeps coming up.

In order for REPLACE WITH to be really useful, though, we need a command
cloning at table design with *all* constraints, FKs, keys, and indexes.
Currently, I still don't think we have that ... do we?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#6Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#1)
Re: ALTER TABLE ... REPLACE WITH

On 14.12.2010 20:27, Simon Riggs wrote:

There are various applications where we want to completely replace the
contents of a table with new/re-calculated data.

It seems fairly obvious to be able to do this like...
1. Prepare new data into "new_table" and build indexes
2. Swap old for new
BEGIN;
DROP TABLE "old_table";
ALTER TABLE "new_table" RENAME to "old_table";
COMMIT;

Step (2) works, but any people queuing to access the table will see
ERROR: could not open relation with OID xxxxx

Could we make that work without error?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#7Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#5)
Re: ALTER TABLE ... REPLACE WITH

On Tue, Dec 14, 2010 at 2:34 PM, Josh Berkus <josh@agliodbs.com> wrote:

On 12/14/10 11:07 AM, Robert Haas wrote:

Because then you have to take an AccessExclusiveLock on the target
table, of course.

Well, you have to do that for DROP TABLE as well, and I don't see any
way around doing it for REPLACE WITH.

Sure, but in Simon's proposal you can load the data FIRST and then
take a lock just long enough to do the swap. That's very different
from needing to hold the lock during the whole data load.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#6)
Re: ALTER TABLE ... REPLACE WITH

On Tue, 2010-12-14 at 21:35 +0200, Heikki Linnakangas wrote:

On 14.12.2010 20:27, Simon Riggs wrote:

There are various applications where we want to completely replace the
contents of a table with new/re-calculated data.

It seems fairly obvious to be able to do this like...
1. Prepare new data into "new_table" and build indexes
2. Swap old for new
BEGIN;
DROP TABLE "old_table";
ALTER TABLE "new_table" RENAME to "old_table";
COMMIT;

Step (2) works, but any people queuing to access the table will see
ERROR: could not open relation with OID xxxxx

Could we make that work without error?

Possibly, and good thinking, but its effectively the same patch, just
syntax free since we still need to do lots of checking to avoid swapping
oranges with lemons.

I prefer explicit syntax because its easier to be certain that you've
got it right.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#5)
Re: ALTER TABLE ... REPLACE WITH

On Tue, 2010-12-14 at 11:34 -0800, Josh Berkus wrote:

In order for REPLACE WITH to be really useful, though, we need a
command cloning at table design with *all* constraints, FKs, keys, and
indexes. Currently, I still don't think we have that ... do we?

Being able to vary the indexes when we REPLACE is a good feature.

We only need to check that datatypes and constraints match.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#10Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#5)
Re: ALTER TABLE ... REPLACE WITH

On Tue, 2010-12-14 at 11:34 -0800, Josh Berkus wrote:

As for the utility of this command: there is no question that I would
use it. I'm not sure I like the syntax (I'd prefer REPLACE TABLE ____
WITH _____), but that's painting the bike shed.

REPLACE TABLE ying WITH yang

is probably easier to implement than hacking at the ALTER TABLE code
mountain.

While the command may
appear frivolous and unnecessary syntactical ornamentation to some, I
have to say that doing the "table doesy-doe" which this command
addresses is something I have written scripts for on at least 50% of
my professional clients. It keeps coming up.

Yeh.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#11Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#9)
Re: ALTER TABLE ... REPLACE WITH

On 12/14/10 11:43 AM, Simon Riggs wrote:

On Tue, 2010-12-14 at 11:34 -0800, Josh Berkus wrote:

In order for REPLACE WITH to be really useful, though, we need a
command cloning at table design with *all* constraints, FKs, keys, and
indexes. Currently, I still don't think we have that ... do we?

Being able to vary the indexes when we REPLACE is a good feature.

We only need to check that datatypes and constraints match.

No, you're missing my point ... currently we don't have a command which
says "make an identical clone of this table". CREATE TABLE AS allows us
to copy all of the data for the table, but not the full table design.
CREATE TABLE LIKE gives us most of the design (although it still won't
copy FKs) but won't copy the data.

However, for the usual do-si-do case, you need to populate the data
using a query and not clone all the data. What you'd really need is
something like:

CREATE TABLE new_table LIKE old_table ( INCLUDING ALL ) FROM SELECT ...

.. which would create the base tabledef, copy in the data from the
query, and then apply all the constraints, indexes, defaults, etc.

Without some means of doing a clone of the table in a single command,
you've eliminated half the scripting work, but not helped at all with
the other half.

Actually, you know what would be ideal?

REPLACE TABLE old_table WITH SELECT ...

Give it some thought ...

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#12Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#11)
Re: ALTER TABLE ... REPLACE WITH

On Tue, 2010-12-14 at 16:19 -0800, Josh Berkus wrote:

Without some means of doing a clone of the table in a single command,
you've eliminated half the scripting work, but not helped at all with
the other half.

I'm not trying to eliminate scripting work, I'm trying to minimise the
lock window with a reliable and smooth atomic switcheroo.

Actually, you know what would be ideal?

REPLACE TABLE old_table WITH SELECT ...

Give it some thought ...

I have; the above would hold the lock window open while the SELECT runs
and that is explicitly something we are trying to avoid.

Good creative input though, thank you.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#13Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#12)
Re: ALTER TABLE ... REPLACE WITH

I have; the above would hold the lock window open while the SELECT runs
and that is explicitly something we are trying to avoid.

Not necessarily. You could copy into a temp table first, and then swap.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#14Csaba Nagy
ncslists@googlemail.com
In reply to: Robert Haas (#7)
Re: ALTER TABLE ... REPLACE WITH

On Tue, 2010-12-14 at 14:36 -0500, Robert Haas wrote:

Well, you have to do that for DROP TABLE as well, and I don't see any
way around doing it for REPLACE WITH.

Sure, but in Simon's proposal you can load the data FIRST and then
take a lock just long enough to do the swap. That's very different
from needing to hold the lock during the whole data load.

Except Simon's original proposal has this line in it:

* "new_table" is TRUNCATEd.

I guess Simon mixed up "new_table" and "old_table", and the one which
should get truncated is the replaced one and not the replacement,
otherwise it doesn't make sense to me.

BTW, I would have also used such a feature on multiple occasions in the
past and expect I would do in the future too.

Cheers,
Csaba.

#15Simon Riggs
simon@2ndQuadrant.com
In reply to: Csaba Nagy (#14)
Re: ALTER TABLE ... REPLACE WITH

On Wed, 2010-12-15 at 10:54 +0100, Csaba Nagy wrote:

On Tue, 2010-12-14 at 14:36 -0500, Robert Haas wrote:

Well, you have to do that for DROP TABLE as well, and I don't see any
way around doing it for REPLACE WITH.

Sure, but in Simon's proposal you can load the data FIRST and then
take a lock just long enough to do the swap. That's very different
from needing to hold the lock during the whole data load.

Except Simon's original proposal has this line in it:

* "new_table" is TRUNCATEd.

I guess Simon mixed up "new_table" and "old_table", and the one which
should get truncated is the replaced one and not the replacement,
otherwise it doesn't make sense to me.

What I meant was...

REPLACE TABLE target WITH source;

* target's old rows are discarded
* target's new rows are all of the rows from "source".
* source is then truncated, so ends up empty

Perhaps a more useful definition would be

EXCHANGE TABLE target WITH source;

which just swaps the heap and indexes of each table.
You can then use TRUNCATE if you want to actually destroy data.

I will go with that unless we have other objections.

BTW, I would have also used such a feature on multiple occasions in the
past and expect I would do in the future too.

Cheers,
Csaba.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#16Csaba Nagy
ncslists@googlemail.com
In reply to: Simon Riggs (#15)
Re: ALTER TABLE ... REPLACE WITH

On Wed, 2010-12-15 at 10:39 +0000, Simon Riggs wrote:

Perhaps a more useful definition would be

EXCHANGE TABLE target WITH source;

which just swaps the heap and indexes of each table.
You can then use TRUNCATE if you want to actually destroy data.

Yes please, that's exactly what I would have needed in many occasions.

But one problem would be when the replaced table is the _parent_ for a
foreign key relationship. I don't think you can have that constraint
pre-verified on the replacement table and simply replacing the content
could leave the child relations with orphans.

Cheers,
Csaba.

#17Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#15)
Re: ALTER TABLE ... REPLACE WITH

On Wed, Dec 15, 2010 at 5:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Perhaps a more useful definition would be

EXCHANGE TABLE target WITH source;

which just swaps the heap and indexes of each table.
You can then use TRUNCATE if you want to actually destroy data.

I will go with that unless we have other objections.

I still don't see how that's going to work with foreign keys. If
there's a foreign key referencing the old table, there's no way to be
sure that all of those references are still going to be valid with
respect to the new table without a full-table check. And that seems
to defeat the purpose of the feature.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18David Christensen
david@endpoint.com
In reply to: Simon Riggs (#15)
Re: ALTER TABLE ... REPLACE WITH

On Dec 15, 2010, at 4:39 AM, Simon Riggs wrote:

On Wed, 2010-12-15 at 10:54 +0100, Csaba Nagy wrote:

On Tue, 2010-12-14 at 14:36 -0500, Robert Haas wrote:

Well, you have to do that for DROP TABLE as well, and I don't see any
way around doing it for REPLACE WITH.

Sure, but in Simon's proposal you can load the data FIRST and then
take a lock just long enough to do the swap. That's very different
from needing to hold the lock during the whole data load.

Except Simon's original proposal has this line in it:

* "new_table" is TRUNCATEd.

I guess Simon mixed up "new_table" and "old_table", and the one which
should get truncated is the replaced one and not the replacement,
otherwise it doesn't make sense to me.

What I meant was...

REPLACE TABLE target WITH source;

* target's old rows are discarded
* target's new rows are all of the rows from "source".
* source is then truncated, so ends up empty

Perhaps a more useful definition would be

EXCHANGE TABLE target WITH source;

which just swaps the heap and indexes of each table.
You can then use TRUNCATE if you want to actually destroy data.

Are there any considerations with toast tables and the inline line pointers for toasted tuples?

Regards,

David
--
David Christensen
End Point Corporation
david@endpoint.com

#19Simon Riggs
simon@2ndQuadrant.com
In reply to: David Christensen (#18)
Re: ALTER TABLE ... REPLACE WITH

On Wed, 2010-12-15 at 07:43 -0600, David Christensen wrote:

Are there any considerations with toast tables and the inline line pointers for toasted tuples?

Toast tables would be swapped as well. Toast pointers are only
applicable within a relfilenode, so we could not do otherwise.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#20Simon Riggs
simon@2ndQuadrant.com
In reply to: Csaba Nagy (#16)
Re: ALTER TABLE ... REPLACE WITH

On Wed, 2010-12-15 at 12:17 +0100, Csaba Nagy wrote:

But one problem would be when the replaced table is the _parent_ for a
foreign key relationship. I don't think you can have that constraint
pre-verified on the replacement table and simply replacing the content
could leave the child relations with orphans.

Good point.

The only sensible way to handle this is by putting the FK checks into
check pending state (as discussed on a different thread).

We would probably need to disallow FKs with DELETE or UPDATE CASCADE
since it would be difficult to execute those.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#21Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#20)
Re: ALTER TABLE ... REPLACE WITH

On Wed, Dec 15, 2010 at 10:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, 2010-12-15 at 12:17 +0100, Csaba Nagy wrote:

But one problem would be when the replaced table is the _parent_ for a
foreign key relationship. I don't think you can have that constraint
pre-verified on the replacement table and simply replacing the content
could leave the child relations with orphans.

Good point.

The only sensible way to handle this is by putting the FK checks into
check pending state (as discussed on a different thread).

We would probably need to disallow FKs with DELETE or UPDATE CASCADE
since it would be difficult to execute those.

I'm still wondering if TRUNCATE CONCURRENTLY would be a more elegant solution.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#22Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#21)
Re: ALTER TABLE ... REPLACE WITH

Heikki Linnakangas wrote:

On 14.12.2010 20:27, Simon Riggs wrote:

1. Prepare new data into "new_table" and build indexes
2. Swap old for new
BEGIN;
DROP TABLE "old_table";
ALTER TABLE "new_table" RENAME to "old_table";
COMMIT;

Step (2) works, but any people queuing to access the table
will see ERROR: could not open relation with OID xxxxx

Could we make that work without error?

Well, that worked better for us than building up the new
contents in a temporary table and doing the sequence Tom
suggests, but to eliminate the above error we had to do:

BEGIN;
ALTER TABLE "old_table" RENAME TO "dead_table";
ALTER TABLE "new_table" RENAME TO "old_table";
COMMIT;
-- Wait for all references to old OID to expire.
DROP TABLE "dead_table";

We don't put foreign keys on the table we do this with;
it's rebuilt from the related tables weekly....

-Kevin

#23Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#22)
Re: ALTER TABLE ... REPLACE WITH

On Wed, Dec 15, 2010 at 11:30 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Heikki Linnakangas  wrote:

On 14.12.2010 20:27, Simon Riggs wrote:

1. Prepare new data into "new_table" and build indexes
2. Swap old for new
BEGIN;
DROP TABLE "old_table";
ALTER TABLE "new_table" RENAME to "old_table";
COMMIT;

Step (2) works, but any people queuing to access the table
will see ERROR: could not open relation with OID xxxxx

Could we make that work without error?

Well, that worked better for us than building up the new
contents in a temporary table and doing the sequence Tom
suggests, but to eliminate the above error we had to do:

BEGIN;
ALTER TABLE "old_table" RENAME TO "dead_table";
ALTER TABLE "new_table" RENAME TO "old_table";
COMMIT;
-- Wait for all references to old OID to expire.
DROP TABLE "dead_table";

Been there, done that. Didn't buy the post-card.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#24bricklen
bricklen@gmail.com
In reply to: Simon Riggs (#15)
Re: ALTER TABLE ... REPLACE WITH

On Wed, Dec 15, 2010 at 2:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Perhaps a more useful definition would be

EXCHANGE TABLE target WITH source;

which just swaps the heap and indexes of each table.

At the risk of stating the obvious, this would work with partition exchange too?

#25Simon Riggs
simon@2ndQuadrant.com
In reply to: bricklen (#24)
Re: ALTER TABLE ... REPLACE WITH

On Thu, 2010-12-16 at 16:19 -0800, bricklen wrote:

On Wed, Dec 15, 2010 at 2:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Perhaps a more useful definition would be

EXCHANGE TABLE target WITH source;

which just swaps the heap and indexes of each table.

At the risk of stating the obvious, this would work with partition exchange too?

Yes, that is one use case.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#26Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#10)
1 attachment(s)
Re: ALTER TABLE ... REPLACE WITH

On Tue, 2010-12-14 at 19:48 +0000, Simon Riggs wrote:

On Tue, 2010-12-14 at 11:34 -0800, Josh Berkus wrote:

As for the utility of this command: there is no question that I would
use it. I'm not sure I like the syntax (I'd prefer REPLACE TABLE ____
WITH _____), but that's painting the bike shed.

REPLACE TABLE ying WITH yang

is probably easier to implement than hacking at the ALTER TABLE code
mountain.

While the command may
appear frivolous and unnecessary syntactical ornamentation to some, I
have to say that doing the "table doesy-doe" which this command
addresses is something I have written scripts for on at least 50% of
my professional clients. It keeps coming up.

Yeh.

Patch. Needs work.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

exchange.v1.patchtext/x-patch; charset=UTF-8; name=exchange.v1.patchDownload
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f3bd565..671d2c3 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8766,3 +8766,102 @@ AtEOSubXact_on_commit_actions(bool isCommit, SubTransactionId mySubid,
 		}
 	}
 }
+
+/*
+ * ExchangeTable
+ *		Swap the contents of two tables, after many checks.
+ *		We refer to the two tables as yin and yang to avoid confusion.
+ */
+void
+ExchangeTable(RangeVar *yin, RangeVar *yang)
+{
+	Oid			yinrelid, yangrelid;
+	Relation	yinrel, yangrel;
+
+	yinrel = heap_openrv(yin, AccessExclusiveLock);
+	yinrelid = RelationGetRelid(yinrel);
+
+	/* Checks on yin table */
+	if (yinrel->rd_rel->relkind != RELKIND_RELATION)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" is not a table",
+						RelationGetRelationName(yinrel))));
+
+	if (!pg_class_ownercheck(yinrelid, GetUserId()))
+		aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS,
+					   RelationGetRelationName(yinrel));
+
+	if (IsSystemRelation(yinrel))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied: \"%s\" is a system catalog",
+						RelationGetRelationName(yinrel))));
+
+	yangrel = heap_openrv(yang, AccessExclusiveLock);
+	yangrelid = RelationGetRelid(yangrel);
+
+	/* Check on yin and yang are not the same */
+	if (yangrelid == yinrelid)
+		ereport(ERROR,
+					(errcode(ERRCODE_DUPLICATE_TABLE),
+					errmsg("relation \"%s\" cannot be exchanged with itself",
+					RelationGetRelationName(yinrel))));
+
+	/* Checks on yin table */
+	if (yangrel->rd_rel->relkind != RELKIND_RELATION)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" is not a table",
+						RelationGetRelationName(yangrel))));
+
+	if (!pg_class_ownercheck(yangrelid, GetUserId()))
+		aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS,
+					   RelationGetRelationName(yangrel));
+
+	if (IsSystemRelation(yangrel))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied: \"%s\" is a system catalog",
+						RelationGetRelationName(yangrel))));
+
+	/*
+	 * Don't allow exchange on temp tables of other backends ... their local
+	 * buffer manager is not going to cope.
+	 */
+	if (RELATION_IS_OTHER_TEMP(yinrel) ||
+		RELATION_IS_OTHER_TEMP(yangrel))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			  errmsg("cannot truncate temporary tables of other sessions")));
+
+	/*
+	 * Also check for active uses of the relation in the current transaction,
+	 * including open scans and pending AFTER trigger events.
+	 */
+	CheckTableNotInUse(yinrel, "EXCHANGE");
+	CheckTableNotInUse(yangrel, "EXCHANGE");
+
+	/*
+	 * Exchange table contents
+	 *
+	 * Swap heaps, toast tables, toast indexes
+	 * all forks
+	 * all indexes
+	 *
+	 * Checks:
+	 * * table definitions must match
+	 * * constraints must match
+	 * * indexes need not match
+	 * * outbound FKs don't need to match
+	 * * inbound FKs will be set to not validated
+	 *
+	 * No Trigger behaviour
+	 *
+	 * How is it WAL logged? By locks and underlying catalog updates
+	 */
+
+	/* keep our locks until commit */
+	heap_close(yangrel, NoLock);
+	heap_close(yinrel, NoLock);
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 660947c..67bc432 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -201,7 +201,8 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
 		DropGroupStmt DropOpClassStmt DropOpFamilyStmt DropPLangStmt DropStmt
 		DropAssertStmt DropTrigStmt DropRuleStmt DropCastStmt DropRoleStmt
 		DropUserStmt DropdbStmt DropTableSpaceStmt DropFdwStmt
-		DropForeignServerStmt DropUserMappingStmt ExplainStmt FetchStmt
+		DropForeignServerStmt DropUserMappingStmt
+		ExchangeStmt ExplainStmt FetchStmt
 		GrantStmt GrantRoleStmt IndexStmt InsertStmt ListenStmt LoadStmt
 		LockStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
@@ -488,7 +489,7 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
 	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DESC
 	DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP
 
-	EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EXCEPT
+	EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EXCEPT EXCHANGE
 	EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN EXTERNAL EXTRACT
 
 	FALSE_P FAMILY FETCH FIRST_P FLOAT_P FOLLOWING FOR FORCE FOREIGN FORWARD
@@ -730,6 +731,7 @@ stmt :
 			| DropUserStmt
 			| DropUserMappingStmt
 			| DropdbStmt
+			| ExchangeStmt
 			| ExecuteStmt
 			| ExplainStmt
 			| FetchStmt
@@ -6461,6 +6463,21 @@ AlterOwnerStmt: ALTER AGGREGATE func_name aggr_args OWNER TO RoleId
 				}
 		;
 
+/*****************************************************************************
+ *
+ *	EXCHANGE TABLE yin WITH yang
+ *
+ *****************************************************************************/
+
+ExchangeStmt: EXCHANGE TABLE relation_expr TO relation_expr
+				{
+					ExchangeStmt *n = makeNode(ExchangeStmt);
+					n->yin  = $3;
+					n->yang = $5;
+					$$ = (Node *)n;
+				}
+		;
+
 
 /*****************************************************************************
  *
@@ -11367,6 +11384,7 @@ unreserved_keyword:
 			| ENCRYPTED
 			| ENUM_P
 			| ESCAPE
+			| EXCHANGE
 			| EXCLUDE
 			| EXCLUDING
 			| EXCLUSIVE
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 9500037..ed6d65f 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -767,6 +767,15 @@ standard_ProcessUtility(Node *parsetree,
 			}
 			break;
 
+		case T_ExchangeStmt:
+			{
+				ExchangeStmt *stmt = (ExchangeStmt *) parsetree;
+
+				ExchangeTable(stmt->yin, stmt->yang);
+				break;
+			}
+			break;
+
 		case T_AlterDomainStmt:
 			{
 				AlterDomainStmt *stmt = (AlterDomainStmt *) parsetree;
@@ -1872,6 +1881,10 @@ CreateCommandTag(Node *parsetree)
 			}
 			break;
 
+		case T_ExchangeStmt:
+			tag = "EXCHANGE";
+			break;
+
 		case T_AlterDomainStmt:
 			tag = "ALTER DOMAIN";
 			break;
@@ -2456,6 +2469,10 @@ GetCommandLogLevel(Node *parsetree)
 			lev = LOGSTMT_DDL;
 			break;
 
+		case T_ExchangeStmt:
+			lev = LOGSTMT_DDL;
+			break;
+
 		case T_AlterDomainStmt:
 			lev = LOGSTMT_DDL;
 			break;
diff --git a/src/include/commands/tablecmds.h b/src/include/commands/tablecmds.h
index d3deffb..bbb0488 100644
--- a/src/include/commands/tablecmds.h
+++ b/src/include/commands/tablecmds.h
@@ -69,4 +69,6 @@ extern void AtEOSubXact_on_commit_actions(bool isCommit,
 							  SubTransactionId mySubid,
 							  SubTransactionId parentSubid);
 
+extern void ExchangeTable(RangeVar *yin, RangeVar *yang);
+
 #endif   /* TABLECMDS_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 456e8e2..6e869b1 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -279,6 +279,7 @@ typedef enum NodeTag
 	T_DefineStmt,
 	T_DropStmt,
 	T_TruncateStmt,
+	T_ExchangeStmt,
 	T_CommentStmt,
 	T_FetchStmt,
 	T_IndexStmt,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 3d2ae99..a12eb97 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1183,6 +1183,17 @@ typedef struct AlterTableCmd	/* one subcommand of an ALTER TABLE */
 	bool		missing_ok;		/* skip error if missing? */
 } AlterTableCmd;
 
+/* ----------------------
+ *		Exchange Statement
+ * ----------------------
+ */
+typedef struct ExchangeStmt
+{
+	NodeTag		type;
+	ObjectType objectType;		/* OBJECT_TABLE, OBJECT_TYPE, etc */
+	RangeVar   *yin;
+	RangeVar   *yang;
+} ExchangeStmt;
 
 /* ----------------------
  *	Alter Domain
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 578d3cd..38ec11f 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -150,6 +150,7 @@ PG_KEYWORD("exclusive", EXCLUSIVE, UNRESERVED_KEYWORD)
 PG_KEYWORD("execute", EXECUTE, UNRESERVED_KEYWORD)
 PG_KEYWORD("exists", EXISTS, COL_NAME_KEYWORD)
 PG_KEYWORD("explain", EXPLAIN, UNRESERVED_KEYWORD)
+PG_KEYWORD("exchange", EXCHANGE, UNRESERVED_KEYWORD)
 PG_KEYWORD("external", EXTERNAL, UNRESERVED_KEYWORD)
 PG_KEYWORD("extract", EXTRACT, COL_NAME_KEYWORD)
 PG_KEYWORD("false", FALSE_P, RESERVED_KEYWORD)
#27Noah Misch
noah@leadboat.com
In reply to: Simon Riggs (#26)
1 attachment(s)
Re: ALTER TABLE ... REPLACE WITH

Hi Simon,

I'm reviewing this patch for CommitFest 2011-01.

On Sat, Jan 15, 2011 at 10:02:03PM +0000, Simon Riggs wrote:

On Tue, 2010-12-14 at 19:48 +0000, Simon Riggs wrote:

REPLACE TABLE ying WITH yang

Patch. Needs work.

First, I'd like to note that the thread for this patch had *four* "me-too"
responses to the use case. That's extremely unusual; the subject is definitely
compelling to people. It addresses the bad behavior of natural attempts to
atomically swap two tables in the namespace:

psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly
sleep 1 # give prev time to take AccessShareLock

# Do it this way, and the next SELECT gets data from the old table.
#psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' &
# Do it this way, and get: ERROR: could not open relation with OID 41380
psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' &

psql -c 'SELECT * FROM t' # I get 'old' or an error, never 'new'.
psql -c 'DROP TABLE IF EXISTS t, old_t, new_t'

by letting you do this instead:

psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly
sleep 1 # give prev time to take AccessShareLock

psql -c 'EXCHANGE TABLE new_t TO t &

psql -c 'SELECT * FROM t' # I get 'new', finally!
psql -c 'DROP TABLE IF EXISTS t, new_t'

I find Heikki's (4D07C6EC.2030200@enterprisedb.com) suggestion from the thread
interesting: can we just make the first example work? Even granting that the
second syntax may be a useful addition, the existing behavior of the first
example is surely worthless, even actively harmful. I tossed together a
proof-of-concept patch, attached, that makes the first example DTRT. Do you see
any value in going down that road?

As for your patch itself:

+	/*
+	 * Exchange table contents
+	 *
+	 * Swap heaps, toast tables, toast indexes
+	 * all forks
+	 * all indexes

For indexes, would you basically swap yin<->yang in pg_index.indrelid, update
pg_class.relnamespace as needed, and check for naming conflicts (when yin and
yang differ in schema)? Is there more?

+	 *
+	 * Checks:
+	 * * table definitions must match

Is there a particular reason to require this, or is it just a simplification to
avoid updating things to match?

+ * * constraints must match

Wouldn't CHECK constraints have no need to match?

+	 * * indexes need not match
+	 * * outbound FKs don't need to match
+	 * * inbound FKs will be set to not validated

We would need to ensure that inbound FOREIGN KEY constraints still have indexes
suitable to implement them. The easiest thing there might be to internally drop
and recreate the constraint, so we get all that verification.

+	 *
+	 * No Trigger behaviour
+	 *
+	 * How is it WAL logged? By locks and underlying catalog updates
+	 */

I see that the meat of the patch is yet to be written, so this ended up as more
of a design review based on your in-patch comments. Hopefully it's of some
value. I'll go ahead and mark the patch Returned with Feedback.

Thanks,
nm

Attachments:

atomic-openrv-poc.patchtext/plain; charset=us-asciiDownload
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***************
*** 970,995 **** relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
  {
  	Oid			relOid;
  
! 	/*
! 	 * Check for shared-cache-inval messages before trying to open the
! 	 * relation.  This is needed to cover the case where the name identifies a
! 	 * rel that has been dropped and recreated since the start of our
! 	 * transaction: if we don't flush the old syscache entry then we'll latch
! 	 * onto that entry and suffer an error when we do RelationIdGetRelation.
! 	 * Note that relation_open does not need to do this, since a relation's
! 	 * OID never changes.
! 	 *
! 	 * We skip this if asked for NoLock, on the assumption that the caller has
! 	 * already ensured some appropriate lock is held.
! 	 */
! 	if (lockmode != NoLock)
! 		AcceptInvalidationMessages();
! 
! 	/* Look up the appropriate relation using namespace search */
! 	relOid = RangeVarGetRelid(relation, false);
  
  	/* Let relation_open do the rest */
! 	return relation_open(relOid, lockmode);
  }
  
  /* ----------------
--- 970,980 ----
  {
  	Oid			relOid;
  
! 	/* Look up and lock the appropriate relation using namespace search */
! 	relOid = RangeVarLockRelid(relation, lockmode, false);
  
  	/* Let relation_open do the rest */
! 	return relation_open(relOid, NoLock);
  }
  
  /* ----------------
***************
*** 1005,1034 **** try_relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
  {
  	Oid			relOid;
  
! 	/*
! 	 * Check for shared-cache-inval messages before trying to open the
! 	 * relation.  This is needed to cover the case where the name identifies a
! 	 * rel that has been dropped and recreated since the start of our
! 	 * transaction: if we don't flush the old syscache entry then we'll latch
! 	 * onto that entry and suffer an error when we do RelationIdGetRelation.
! 	 * Note that relation_open does not need to do this, since a relation's
! 	 * OID never changes.
! 	 *
! 	 * We skip this if asked for NoLock, on the assumption that the caller has
! 	 * already ensured some appropriate lock is held.
! 	 */
! 	if (lockmode != NoLock)
! 		AcceptInvalidationMessages();
! 
! 	/* Look up the appropriate relation using namespace search */
! 	relOid = RangeVarGetRelid(relation, true);
  
  	/* Return NULL on not-found */
  	if (!OidIsValid(relOid))
  		return NULL;
  
  	/* Let relation_open do the rest */
! 	return relation_open(relOid, lockmode);
  }
  
  /* ----------------
--- 990,1004 ----
  {
  	Oid			relOid;
  
! 	/* Look up and lock the appropriate relation using namespace search */
! 	relOid = RangeVarLockRelid(relation, lockmode, true);
  
  	/* Return NULL on not-found */
  	if (!OidIsValid(relOid))
  		return NULL;
  
  	/* Let relation_open do the rest */
! 	return relation_open(relOid, NoLock);
  }
  
  /* ----------------
*** a/src/backend/catalog/namespace.c
--- b/src/backend/catalog/namespace.c
***************
*** 42,47 ****
--- 42,48 ----
  #include "parser/parse_func.h"
  #include "storage/backendid.h"
  #include "storage/ipc.h"
+ #include "storage/lmgr.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
  #include "utils/guc.h"
***************
*** 282,287 **** RangeVarGetRelid(const RangeVar *relation, bool failOK)
--- 283,340 ----
  }
  
  /*
+  * RangeVarLockRelid
+  *		Like RangeVarGetRelid, but simulatenously acquire the specified lock on
+  *		the relation.  This works properly in the face of concurrent DDL that
+  *		may drop, create or rename relations.
+  *
+  * If the relation is not found and failOK = true, take no lock and return
+  * InvalidOid.  Otherwise, raise an error.
+  */
+ Oid
+ RangeVarLockRelid(const RangeVar *relation, LOCKMODE lockmode,
+ 				  bool failOK)
+ {
+ 	Oid			relOid1,
+ 				relOid2;
+ 
+ 	/*
+ 	 * First attempt.  If the caller requested NoLock, it already acquired an
+ 	 * appropriate lock and has called AcceptInvalidationMessages() since doing
+ 	 * so.  In this case, our first search is always correct, and we degenerate
+ 	 * to behave exactly like RangeVarGetRelid().
+ 	 */
+ 	relOid1 = RangeVarGetRelid(relation, failOK);
+ 	if (lockmode == NoLock)
+ 		return relOid1;
+ 
+ 	/*
+ 	 * By the time we acquire the lock, our RangeVar may denote a different
+ 	 * relation or no relation at all.  In particular, this can happen when the
+ 	 * lock acquisition blocks on a transaction performing DROP or ALTER TABLE
+ 	 * RENAME.  However, once and while we do hold a lock of any level, we can
+ 	 * count on the name correspondence remaining stable.
+ 	 */
+ 	do
+ 	{
+ 		/* Not-found is always final. */
+ 		if (!OidIsValid(relOid1))
+ 			return relOid1;
+ 
+ 		LockRelationOid(relOid1, lockmode);
+ 
+ 		/* Make recent DDL effects visible.  Names are stable; search again. */
+ 		AcceptInvalidationMessages();
+ 		relOid2 = relOid1;
+ 		relOid1 = RangeVarGetRelid(relation, failOK);
+ 
+ 		/* Done when our RangeVar denotes the same relation we locked. */
+ 	} while (relOid1 != relOid2);
+ 
+ 	return relOid1;
+ }
+ 
+ /*
   * RangeVarGetCreationNamespace
   *		Given a RangeVar describing a to-be-created relation,
   *		choose which namespace to create it in.
*** a/src/include/catalog/namespace.h
--- b/src/include/catalog/namespace.h
***************
*** 48,53 **** typedef struct OverrideSearchPath
--- 48,55 ----
  
  
  extern Oid	RangeVarGetRelid(const RangeVar *relation, bool failOK);
+ extern Oid	RangeVarLockRelid(const RangeVar *relation, LOCKMODE lockmode,
+ 				  bool failOK);
  extern Oid	RangeVarGetCreationNamespace(const RangeVar *newRelation);
  extern Oid	RelnameGetRelid(const char *relname);
  extern bool RelationIsVisible(Oid relid);
#28Simon Riggs
simon@2ndQuadrant.com
In reply to: Noah Misch (#27)
Re: ALTER TABLE ... REPLACE WITH

On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote:

I'll go ahead and mark the patch Returned with Feedback.

My understanding of the meaning of that is polite rejection. If you do
that there is no further author comment and we move to July 2011. That
then also rejects your own patch with what you say is an alternative
implementation...

Is that what you wish? That isn't what I wish, either way. I suggest you
mark it Waiting on Author, so we can discuss it further.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#29Noah Misch
noah@leadboat.com
In reply to: Simon Riggs (#28)
Re: ALTER TABLE ... REPLACE WITH

On Thu, Jan 20, 2011 at 12:57:23AM +0000, Simon Riggs wrote:

On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote:

I'll go ahead and mark the patch Returned with Feedback.

My understanding of the meaning of that is polite rejection. If you do
that there is no further author comment and we move to July 2011. That
then also rejects your own patch with what you say is an alternative
implementation...

Is that what you wish? That isn't what I wish, either way. I suggest you
mark it Waiting on Author, so we can discuss it further.

Before I consider my wishes too carefully, I've moved the patch back to Waiting
on Author, for the reason that it seems wrong to force it elsewhere today as
long as the author (you) would like it there. Not that I have some kind of
authority over patch disposition in any case.

I had put it straight to RWF because it seemed clearly not intended to be
applied. No political statement or harm intended, and maybe that determination
was not even correct.

nm

#30Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#28)
Re: ALTER TABLE ... REPLACE WITH

On Wed, Jan 19, 2011 at 7:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote:

I'll go ahead and mark the patch Returned with Feedback.

My understanding of the meaning of that is polite rejection. If you do
that there is no further author comment and we move to July 2011. That
then also rejects your own patch with what you say is an alternative
implementation...

Is that what you wish? That isn't what I wish, either way. I suggest you
mark it Waiting on Author, so we can discuss it further.

Simon,

I have no idea what you're talking about here. It is entirely fitting
and appropriate to reject a patch the guts of which have not been
written four days into the final CommitFest. Doing so does not
somehow reject Noah's patches, which stand or fall on their own
merits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#31Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#30)
Re: ALTER TABLE ... REPLACE WITH

On Wed, Jan 19, 2011 at 08:55:22PM -0500, Robert Haas wrote:

On Wed, Jan 19, 2011 at 7:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote:

I'll go ahead and mark the patch Returned with Feedback.

My understanding of the meaning of that is polite rejection. If you do
that there is no further author comment and we move to July 2011. That
then also rejects your own patch with what you say is an alternative
implementation...

Is that what you wish? That isn't what I wish, either way. I suggest you
mark it Waiting on Author, so we can discuss it further.

Simon,

I have no idea what you're talking about here. It is entirely fitting
and appropriate to reject a patch the guts of which have not been
written four days into the final CommitFest. Doing so does not
somehow reject Noah's patches, which stand or fall on their own
merits.

I think Simon was referring to the proof-of-concept sketch I had included with
my review.

#32Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#31)
Re: ALTER TABLE ... REPLACE WITH

On Wed, Jan 19, 2011 at 8:57 PM, Noah Misch <noah@leadboat.com> wrote:

I think Simon was referring to the proof-of-concept sketch I had included with
my review.

I think it's a bit late to be turning proofs-of-concept into code at
this point, no matter who came up with them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#33Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#32)
Re: ALTER TABLE ... REPLACE WITH

On Wed, 2011-01-19 at 21:01 -0500, Robert Haas wrote:

On Wed, Jan 19, 2011 at 8:57 PM, Noah Misch <noah@leadboat.com> wrote:

I think Simon was referring to the proof-of-concept sketch I had included with
my review.

I think it's a bit late to be turning proofs-of-concept into code at
this point, no matter who came up with them.

Noah's patch is trivial, as are the changes to make mine work fully.
Neither can be achieved barring sensible review.

This topic delivers important functionality. I think it's more important
than simply who gets the credit.

I'm not sure yet where to go, but we have viable options yet for this
release.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#34Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#33)
Re: ALTER TABLE ... REPLACE WITH

On Wed, Jan 19, 2011 at 9:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Noah's patch is trivial, as are the changes to make mine work fully.

I dispute that. In particular:

+	/*
+	 * Exchange table contents
+	 *
+	 * Swap heaps, toast tables, toast indexes
+	 * all forks
+	 * all indexes
+	 *
+	 * Checks:
+	 * * table definitions must match
+	 * * constraints must match
+	 * * indexes need not match
+	 * * outbound FKs don't need to match
+	 * * inbound FKs will be set to not validated
+	 *
+	 * No Trigger behaviour
+	 *
+	 * How is it WAL logged? By locks and underlying catalog updates
+	 */

That's another way of saying "the patch is not anywhere close to being done".

Neither can be achieved barring sensible review.

I think Noah posted a very nice review.

This topic delivers important functionality. I think it's more important
than simply who gets the credit.

This is not about credit. I like credit as much as the next guy, but
this is about the fact that there was a deadline for this CommitFest,
and that deadline is now in the past, and this patch is not in a state
to be reviewed. The CommitFest deadline is not a deadline by which
you much post something; it's a deadline by which you much post
something that is reasonably close to being committable, or at least
reviewable. That's obviously not the case here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#35Simon Riggs
simon@2ndQuadrant.com
In reply to: Noah Misch (#27)
Re: ALTER TABLE ... REPLACE WITH

On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote:

First, I'd like to note that the thread for this patch had *four* "me-too"
responses to the use case. That's extremely unusual; the subject is definitely
compelling to people. It addresses the bad behavior of natural attempts to
atomically swap two tables in the namespace:

psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly
sleep 1 # give prev time to take AccessShareLock

# Do it this way, and the next SELECT gets data from the old table.
#psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' &
# Do it this way, and get: ERROR: could not open relation with OID 41380
psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' &

psql -c 'SELECT * FROM t' # I get 'old' or an error, never 'new'.
psql -c 'DROP TABLE IF EXISTS t, old_t, new_t'

by letting you do this instead:

psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly
sleep 1 # give prev time to take AccessShareLock

psql -c 'EXCHANGE TABLE new_t TO t &

psql -c 'SELECT * FROM t' # I get 'new', finally!
psql -c 'DROP TABLE IF EXISTS t, new_t'

I find Heikki's (4D07C6EC.2030200@enterprisedb.com) suggestion from the thread
interesting: can we just make the first example work? Even granting that the
second syntax may be a useful addition, the existing behavior of the first
example is surely worthless, even actively harmful. I tossed together a
proof-of-concept patch, attached, that makes the first example DTRT. Do you see
any value in going down that road?

As I said previously on the thread you quote, having this happen
implicitly is not a good thing, and IMHO, definitely not "the right
thing".

Heikki's suggestion, and your patch, contain no checking to see whether
the old and new tables are similar. If they are not similar then we have
all the same problems raised by my patch. SQL will suddenly fail because
columns have ceased to exist, FKs suddenly disappear etc..

I don't see how having a patch helps at all. I didn't think it was the
right way before you wrote it and I still disagree now you've written
it.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#36Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#34)
Re: ALTER TABLE ... REPLACE WITH

On Wed, 2011-01-19 at 22:16 -0500, Robert Haas wrote:

That's another way of saying "the patch is not anywhere close to being done".

My patch is materially incomplete. Certainly we may see that as grounds
for rejection, which I would not and could not argue with. It is a
popular feature, so I submitted anyway.

When I said Noah's patch was trivial, I was referring to the amount of
work expended on it so far; no insult intended. I think the amount of
code to finish either is fairly low as well.

If we wish to continue in this release then we must decide how. What I
was trying to indicate in my earlier comments was that my focus is on
achieving the required functionality in this release, or put another
way, I would accept Noah's patch rather than end with nothing.

The main requirement, as I see it, is error checking. We need to do the
same checking however we do it; neither patch currently does it.

If Noah's patch had error checking, then it would at least be safe to
recommend people do that. Then it is a simple matter of whether we think
implicit is OK, or whether it needs an explicit command. My patch does
it explicitly because that was the consensus from the earlier
discussion; I am in favour of the explicit route which is why I wrote
the patch that way, not because I wrote it that way.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#37Noah Misch
noah@leadboat.com
In reply to: Simon Riggs (#35)
Re: ALTER TABLE ... REPLACE WITH

On Thu, Jan 20, 2011 at 10:07:23AM +0000, Simon Riggs wrote:

On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote:

First, I'd like to note that the thread for this patch had *four* "me-too"
responses to the use case. That's extremely unusual; the subject is definitely
compelling to people. It addresses the bad behavior of natural attempts to
atomically swap two tables in the namespace:

psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly
sleep 1 # give prev time to take AccessShareLock

# Do it this way, and the next SELECT gets data from the old table.
#psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' &
# Do it this way, and get: ERROR: could not open relation with OID 41380
psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' &

psql -c 'SELECT * FROM t' # I get 'old' or an error, never 'new'.
psql -c 'DROP TABLE IF EXISTS t, old_t, new_t'

by letting you do this instead:

psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly
sleep 1 # give prev time to take AccessShareLock

psql -c 'EXCHANGE TABLE new_t TO t &

psql -c 'SELECT * FROM t' # I get 'new', finally!
psql -c 'DROP TABLE IF EXISTS t, new_t'

I find Heikki's (4D07C6EC.2030200@enterprisedb.com) suggestion from the thread
interesting: can we just make the first example work? Even granting that the
second syntax may be a useful addition, the existing behavior of the first
example is surely worthless, even actively harmful. I tossed together a
proof-of-concept patch, attached, that makes the first example DTRT. Do you see
any value in going down that road?

As I said previously on the thread you quote, having this happen
implicitly is not a good thing, and IMHO, definitely not "the right
thing".

When DDL has taken AccessExclusiveLock and a query waits for it, it's the Right
Thing for that query to wake up and proceed based on the complete, final state
of that committed DDL. Aside from the waiting itself, the query should behave
as though it started after the DDL completed.

In my example, the SELECT silently reads data from a table named "old_t". What
if that were an INSERT? The data falls in the wrong table.

Heikki's suggestion, and your patch, contain no checking to see whether
the old and new tables are similar. If they are not similar then we have
all the same problems raised by my patch. SQL will suddenly fail because
columns have ceased to exist, FKs suddenly disappear etc..

Indeed, Heikki's suggestion and my patch would not do such verification. I
can't see detecting and blocking some patterns of ALTER TABLE RENAME or DROP
...; CREATE ...; than we allow today. Those need to stay open-ended, with the
user responsible for choosing well. So, what's the right concurrent behavior
around use of those statements? I answer that above.

That said, I see utility in a feature that compares two tables, swaps them if
similar, and fixes up foreign keys. Having such a feature does not justify
wrong concurrent behavior around ALTER TABLE RENAME. Having right concurrent
behavior around ALTER TABLE RENAME does not remove the utility of this feature.
We should do both.

I don't see how having a patch helps at all. I didn't think it was the
right way before you wrote it and I still disagree now you've written
it.

Perhaps it helped me more than anyone else, and I should have kept it to myself.
Heikki's suggestion seemed straightforward, so much so that I couldn't figure
why nobody had done it. That would usually mean I'm missing something. So, I
implemented it in a effort to discover what I had missed, failing to do so.
Then, I sent it with the review in case you might spot what I had missed.
Failure to add some kind of table similarity check was intentional, per above.

nm

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#37)
Re: ALTER TABLE ... REPLACE WITH

Noah Misch <noah@leadboat.com> writes:

Heikki's suggestion seemed straightforward, so much so that I couldn't figure
why nobody had done it. That would usually mean I'm missing something.

If you're willing to substitute an incompatible table, it's not clear
why you don't just do

begin;
drop table t;
alter table t_new rename to t;
commit;

There are some implementation issues with this: concurrent accesses are
likely to end up failing with "relation with OID nnn doesn't exist",
because backends translate the table's name to OID before acquiring
lock. But you'd have to solve those issues anyway to make an ALTER
REPLACE WITH work as transparently as you seem to hope it would.
Unless the idea here is to also have t_new acquire t's OID, and that
is an absolute complete won't-happen if you're not enforcing a pretty
thorough level of compatibility between the two tables.

regards, tom lane

#39Simon Riggs
simon@2ndQuadrant.com
In reply to: Noah Misch (#37)
Re: ALTER TABLE ... REPLACE WITH

On Thu, 2011-01-20 at 13:14 -0500, Noah Misch wrote:

When DDL has taken AccessExclusiveLock and a query waits for it, it's the Right
Thing for that query to wake up and proceed based on the complete, final state
of that committed DDL. Aside from the waiting itself, the query should behave
as though it started after the DDL completed.

In my example, the SELECT silently reads data from a table named "old_t". What
if that were an INSERT? The data falls in the wrong table.

Heikki's suggestion, and your patch, contain no checking to see whether
the old and new tables are similar. If they are not similar then we have
all the same problems raised by my patch. SQL will suddenly fail because
columns have ceased to exist, FKs suddenly disappear etc..

Indeed, Heikki's suggestion and my patch would not do such verification. I
can't see detecting and blocking some patterns of ALTER TABLE RENAME or DROP
...; CREATE ...; than we allow today. Those need to stay open-ended, with the
user responsible for choosing well. So, what's the right concurrent behavior
around use of those statements? I answer that above.

That said, I see utility in a feature that compares two tables, swaps them if
similar, and fixes up foreign keys. Having such a feature does not justify
wrong concurrent behavior around ALTER TABLE RENAME. Having right concurrent
behavior around ALTER TABLE RENAME does not remove the utility of this feature.
We should do both.

I agree that the DDL behaviour is wrong and should be fixed. Thank you
for championing that alternative view.

Swapping based upon names only works and is very flexible, much more so
than EXCHANGE could be.

A separate utility might be worth it, but the feature set of that should
be defined in terms of correctly-working DDL behaviour. It's possible
that no further requirement exists. I remove my own patch from
consideration for this release.

I'll review your patch and commit it, problems or objections excepted. I
haven't looked at it in any detail.

Having said that, writing the patch did nothing to convince me this was
the correct approach. Reviews should be reviews, they are not an
opportunity to provide your own alternate version of a patch. That just
confuses things and creates a competitive, not a cooperative
environment. Authors do need to listen to reviewers, so I hope I'm
demonstrating that here.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#40Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#38)
Re: ALTER TABLE ... REPLACE WITH

On Thu, Jan 20, 2011 at 4:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Noah Misch <noah@leadboat.com> writes:

Heikki's suggestion seemed straightforward, so much so that I couldn't figure
why nobody had done it.  That would usually mean I'm missing something.

If you're willing to substitute an incompatible table, it's not clear
why you don't just do

               begin;
               drop table t;
               alter table t_new rename to t;
               commit;

Because the whole source of this problem is dependency hell.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#41Noah Misch
noah@leadboat.com
In reply to: Simon Riggs (#39)
Re: ALTER TABLE ... REPLACE WITH

On Thu, Jan 20, 2011 at 09:36:11PM +0000, Simon Riggs wrote:

I agree that the DDL behaviour is wrong and should be fixed. Thank you
for championing that alternative view.

Swapping based upon names only works and is very flexible, much more so
than EXCHANGE could be.

A separate utility might be worth it, but the feature set of that should
be defined in terms of correctly-working DDL behaviour. It's possible
that no further requirement exists. I remove my own patch from
consideration for this release.

I'll review your patch and commit it, problems or objections excepted. I
haven't looked at it in any detail.

Thanks. I wouldn't be very surprised if that patch is even the wrong way to
achieve these semantics, but it's great that we're on the same page as to which
semantics they are.

Having said that, writing the patch did nothing to convince me this was
the correct approach. Reviews should be reviews, they are not an
opportunity to provide your own alternate version of a patch. That just
confuses things and creates a competitive, not a cooperative
environment. Authors do need to listen to reviewers, so I hope I'm
demonstrating that here.

Understood. I can see now that posting a second code patch, however framed, in
the same thread creates a presumption of aggression that is difficult to dispel.
I will have a lot to think about before doing that again. Thanks for giving
this discussion, which started poorly due to my actions, a second chance.

nm

#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#40)
Re: ALTER TABLE ... REPLACE WITH

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jan 20, 2011 at 4:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If you're willing to substitute an incompatible table, it's not clear
why you don't just do

begin;
drop table t;
alter table t_new rename to t;
commit;

Because the whole source of this problem is dependency hell.

Well, if you want to preserve dependencies, you can *not* just blindly
substitute an incompatible table. You must ensure that views and
foreign keys referencing the table are still valid. So I'm not sure
where anybody got the idea that an implementation that fails to check
all that is even worth presenting.

regards, tom lane

#43Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#36)
Re: ALTER TABLE ... REPLACE WITH

On Thu, Jan 20, 2011 at 11:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, 2011-01-19 at 22:16 -0500, Robert Haas wrote:

That's another way of saying "the patch is not anywhere close to being done".

My patch is materially incomplete. Certainly we may see that as grounds
for rejection, which I would not and could not argue with. It is a
popular feature, so I submitted anyway.

I wouldn't say rejection per se - but I would definitely say push it out to 9.2.

When I said Noah's patch was trivial, I was referring to the amount of
work expended on it so far; no insult intended. I think the amount of
code to finish either is fairly low as well.

If we wish to continue in this release then we must decide how. What I
was trying to indicate in my earlier comments was that my focus is on
achieving the required functionality in this release, or put another
way, I would accept Noah's patch rather than end with nothing.

The main requirement, as I see it, is error checking. We need to do the
same checking however we do it; neither patch currently does it.

If Noah's patch had error checking, then it would at least be safe to
recommend people do that. Then it is a simple matter of whether we think
implicit is OK, or whether it needs an explicit command. My patch does
it explicitly because that was the consensus from the earlier
discussion; I am in favour of the explicit route which is why I wrote
the patch that way, not because I wrote it that way.

I'm not too sure I understand what you mean in saying that Noah's
patch is "implicit"...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#44Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#41)
Re: ALTER TABLE ... REPLACE WITH

On Thu, Jan 20, 2011 at 6:19 PM, Noah Misch <noah@leadboat.com> wrote:

On Thu, Jan 20, 2011 at 09:36:11PM +0000, Simon Riggs wrote:

I agree that the DDL behaviour is wrong and should be fixed. Thank you
for championing that alternative view.

Swapping based upon names only works and is very flexible, much more so
than EXCHANGE could be.

A separate utility might be worth it, but the feature set of that should
be defined in terms of correctly-working DDL behaviour. It's possible
that no further requirement exists. I remove my own patch from
consideration for this release.

I'll review your patch and commit it, problems or objections excepted. I
haven't looked at it in any detail.

Thanks.  I wouldn't be very surprised if that patch is even the wrong way to
achieve these semantics, but it's great that we're on the same page as to which
semantics they are.

I think Noah's patch is a not a good idea, because it will result in
calling RangeVarGetRelid twice even in the overwhelmingly common case
where no relevant invalidation occurs. That'll add several syscache
lookups per table to very common operations.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#45Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#39)
Re: ALTER TABLE ... REPLACE WITH

On Thu, 2011-01-20 at 21:36 +0000, Simon Riggs wrote:

I'll review your patch and commit it, problems or objections excepted.

Tom's comments elsewhere prevent me from committing.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#46Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#44)
2 attachment(s)
Make relation_openrv atomic wrt DDL

On Thu, Jan 20, 2011 at 09:48:12PM -0500, Robert Haas wrote:

On Thu, Jan 20, 2011 at 6:19 PM, Noah Misch <noah@leadboat.com> wrote:

On Thu, Jan 20, 2011 at 09:36:11PM +0000, Simon Riggs wrote:

I agree that the DDL behaviour is wrong and should be fixed. Thank you
for championing that alternative view.

Swapping based upon names only works and is very flexible, much more so
than EXCHANGE could be.

A separate utility might be worth it, but the feature set of that should
be defined in terms of correctly-working DDL behaviour. It's possible
that no further requirement exists. I remove my own patch from
consideration for this release.

I'll review your patch and commit it, problems or objections excepted. I
haven't looked at it in any detail.

Thanks. ?I wouldn't be very surprised if that patch is even the wrong way to
achieve these semantics, but it's great that we're on the same page as to which
semantics they are.

I think Noah's patch is a not a good idea, because it will result in
calling RangeVarGetRelid twice even in the overwhelmingly common case
where no relevant invalidation occurs. That'll add several syscache
lookups per table to very common operations.

Valid concern.

[Refresher: this was a patch to improve behavior for this test case:

psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly
sleep 1 # give prev time to take AccessShareLock

# Do it this way, and the next SELECT gets data from the old table.
psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' &
# Do it this way, and get: ERROR: could not open relation with OID 41380
#psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' &

psql -c 'SELECT * FROM t' # I get 'old' or an error, never 'new'.
psql -c 'DROP TABLE IF EXISTS t, old_t, new_t'

It did so by rechecking the RangeVar->oid resolution after locking the found
relation, by which time concurrent DDL could no longer be interfering.]

I benchmarked the original patch with this function:

Datum
nmtest(PG_FUNCTION_ARGS)
{
int32 n = PG_GETARG_INT32(0);
int i;
RangeVar *rv = makeRangeVar(NULL, "pg_am", 0);

for (i = 0; i < n; ++i)
{
Relation r = relation_openrv(rv, AccessShareLock);
relation_close(r, AccessShareLock);
}
PG_RETURN_INT32(4);
}

(Releasing the lock before transaction end makes for an unrealistic benchmark,
but so would opening the same relation millions of times in a single
transaction. I'm trying to isolate the cost that would be spread over
millions of transactions opening relations a handful of times. See attached
shar archive for a complete extension wrapping that test function.)

Indeed, the original patch slowed it by about 50%. I improved the patch,
adding a global SharedInvalidMessageCounter to increment as we process
messages. If this counter does not change between the RangeVarGetRelid() call
and the post-lock AcceptInvalidationMessages() call, we can skip the second
RangeVarGetRelid() call. With the updated patch, I get these timings (in ms)
for runs of "SELECT nmtest(10000000)":

master: 19697.642, 20087.477, 19748.995
patched: 19723.716, 19879.538, 20257.671

In other words, no significant difference. Since the patch removes the
no-longer-needed pre-lock call to AcceptInvalidationMessages(), changing to
"relation_close(r, NoLock)" in the test case actually reveals a 15%
performance improvement. Granted, nothing to get excited about in light of
the artificiality.

This semantic improvement would be hard to test with the current pg_regress
suite, so I do not include any test case addition in the patch. If
sufficiently important, it could be done with isolationtester.

Thanks,
nm

Attachments:

atomic-openrv-v2.patchtext/plain; charset=us-asciiDownload
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 01a492e..63537fd 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***************
*** 979,1004 **** relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
  {
  	Oid			relOid;
  
! 	/*
! 	 * Check for shared-cache-inval messages before trying to open the
! 	 * relation.  This is needed to cover the case where the name identifies a
! 	 * rel that has been dropped and recreated since the start of our
! 	 * transaction: if we don't flush the old syscache entry then we'll latch
! 	 * onto that entry and suffer an error when we do RelationIdGetRelation.
! 	 * Note that relation_open does not need to do this, since a relation's
! 	 * OID never changes.
! 	 *
! 	 * We skip this if asked for NoLock, on the assumption that the caller has
! 	 * already ensured some appropriate lock is held.
! 	 */
! 	if (lockmode != NoLock)
! 		AcceptInvalidationMessages();
! 
! 	/* Look up the appropriate relation using namespace search */
! 	relOid = RangeVarGetRelid(relation, false);
  
  	/* Let relation_open do the rest */
! 	return relation_open(relOid, lockmode);
  }
  
  /* ----------------
--- 979,989 ----
  {
  	Oid			relOid;
  
! 	/* Look up and lock the appropriate relation using namespace search */
! 	relOid = RangeVarLockRelid(relation, lockmode, false);
  
  	/* Let relation_open do the rest */
! 	return relation_open(relOid, NoLock);
  }
  
  /* ----------------
***************
*** 1014,1043 **** try_relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
  {
  	Oid			relOid;
  
! 	/*
! 	 * Check for shared-cache-inval messages before trying to open the
! 	 * relation.  This is needed to cover the case where the name identifies a
! 	 * rel that has been dropped and recreated since the start of our
! 	 * transaction: if we don't flush the old syscache entry then we'll latch
! 	 * onto that entry and suffer an error when we do RelationIdGetRelation.
! 	 * Note that relation_open does not need to do this, since a relation's
! 	 * OID never changes.
! 	 *
! 	 * We skip this if asked for NoLock, on the assumption that the caller has
! 	 * already ensured some appropriate lock is held.
! 	 */
! 	if (lockmode != NoLock)
! 		AcceptInvalidationMessages();
! 
! 	/* Look up the appropriate relation using namespace search */
! 	relOid = RangeVarGetRelid(relation, true);
  
  	/* Return NULL on not-found */
  	if (!OidIsValid(relOid))
  		return NULL;
  
  	/* Let relation_open do the rest */
! 	return relation_open(relOid, lockmode);
  }
  
  /* ----------------
--- 999,1013 ----
  {
  	Oid			relOid;
  
! 	/* Look up and lock the appropriate relation using namespace search */
! 	relOid = RangeVarLockRelid(relation, lockmode, true);
  
  	/* Return NULL on not-found */
  	if (!OidIsValid(relOid))
  		return NULL;
  
  	/* Let relation_open do the rest */
! 	return relation_open(relOid, NoLock);
  }
  
  /* ----------------
diff --git a/src/backend/catalog/namespindex 41e9299..771976b 100644
*** a/src/backend/catalog/namespace.c
--- b/src/backend/catalog/namespace.c
***************
*** 44,49 ****
--- 44,51 ----
  #include "parser/parse_func.h"
  #include "storage/backendid.h"
  #include "storage/ipc.h"
+ #include "storage/lmgr.h"
+ #include "storage/sinval.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
  #include "utils/guc.h"
***************
*** 285,290 **** RangeVarGetRelid(const RangeVar *relation, bool failOK)
--- 287,358 ----
  }
  
  /*
+  * RangeVarLockRelid
+  *		Like RangeVarGetRelid, but simulatenously acquire the specified lock on
+  *		the relation.  This works properly in the face of concurrent DDL that
+  *		may drop, create or rename relations.
+  *
+  * If the relation is not found and failOK = true, take no lock and return
+  * InvalidOid.  Otherwise, raise an error.
+  */
+ Oid
+ RangeVarLockRelid(const RangeVar *relation, LOCKMODE lockmode,
+ 				  bool failOK)
+ {
+ 	int			lastCounter;
+ 	Oid			relOid1,
+ 				relOid2;
+ 
+ 	/*
+ 	 * First attempt.  If the caller requested NoLock, it already acquired an
+ 	 * appropriate lock and has called AcceptInvalidationMessages() since
+ 	 * doing so.  In this case, our first search is always correct, and we
+ 	 * degenerate to behave exactly like RangeVarGetRelid().
+ 	 */
+ 	lastCounter = SharedInvalidMessageCounter;
+ 	relOid1 = RangeVarGetRelid(relation, failOK);
+ 	if (lockmode == NoLock)
+ 		return relOid1;
+ 
+ 	/*
+ 	 * By the time we acquire the lock, our RangeVar may denote a different
+ 	 * relation or no relation at all.  In particular, this can happen when
+ 	 * the lock acquisition blocks on a transaction performing DROP or ALTER
+ 	 * TABLE RENAME.  However, once and while we do hold a lock of any level,
+ 	 * we can count on the name of the found relation remaining stable.
+ 	 *
+ 	 * Even so, DDL activity could cause an object in a schema earlier in the
+ 	 * search path to mask our original selection undetected.  No current lock
+ 	 * would prevent that.  We let the user worry about it, such as by taking
+ 	 * additional explicit locks.
+ 	 */
+ 	do
+ 	{
+ 		/* Not-found is always final. */
+ 		if (!OidIsValid(relOid1))
+ 			return relOid1;
+ 
+ 		/*
+ 		 * LockRelationOid also calls AcceptInvalidationMessages() to make
+ 		 * recent DDL effects visible, if needed.  Finding none, we're done.
+ 		 */
+ 		LockRelationOid(relOid1, lockmode);
+ 		if (lastCounter == SharedInvalidMessageCounter)
+ 			break;
+ 		else
+ 			lastCounter = SharedInvalidMessageCounter;
+ 
+ 		/* Some invalidation messages arrived; search again. */
+ 		relOid2 = relOid1;
+ 		relOid1 = RangeVarGetRelid(relation, failOK);
+ 
+ 		/* Done when our RangeVar denotes the same relation we locked. */
+ 	} while (relOid1 != relOid2);
+ 
+ 	return relOid1;
+ }
+ 
+ /*
   * RangeVarGetCreationNamespace
   *		Given a RangeVar describing a to-be-created relation,
   *		choose which namespace to create it in.
diff --git a/src/backend/storage/ipc/sindex 9ab16b1..9b1ec82 100644
*** a/src/backend/storage/ipc/sinval.c
--- b/src/backend/storage/ipc/sinval.c
***************
*** 22,27 ****
--- 22,30 ----
  #include "utils/inval.h"
  
  
+ unsigned SharedInvalidMessageCounter;
+ 
+ 
  /*
   * Because backends sitting idle will not be reading sinval events, we
   * need a way to give an idle backend a swift kick in the rear and make
***************
*** 90,95 **** ReceiveSharedInvalidMessages(
--- 93,99 ----
  	{
  		SharedInvalidationMessage *msg = &messages[nextmsg++];
  
+ 		SharedInvalidMessageCounter++;
  		invalFunction(msg);
  	}
  
***************
*** 106,111 **** ReceiveSharedInvalidMessages(
--- 110,116 ----
  		{
  			/* got a reset message */
  			elog(DEBUG4, "cache state reset");
+ 			SharedInvalidMessageCounter++;
  			resetFunction();
  			break;				/* nothing more to do */
  		}
***************
*** 118,123 **** ReceiveSharedInvalidMessages(
--- 123,129 ----
  		{
  			SharedInvalidationMessage *msg = &messages[nextmsg++];
  
+ 			SharedInvalidMessageCounter++;
  			invalFunction(msg);
  		}
  
diff --git a/src/backend/storage/lmgr/lindex 859b385..90b2ecc 100644
*** a/src/backend/storage/lmgr/lmgr.c
--- b/src/backend/storage/lmgr/lmgr.c
***************
*** 81,90 **** LockRelationOid(Oid relid, LOCKMODE lockmode)
  	/*
  	 * Now that we have the lock, check for invalidation messages, so that we
  	 * will update or flush any stale relcache entry before we try to use it.
! 	 * We can skip this in the not-uncommon case that we already had the same
! 	 * type of lock being requested, since then no one else could have
! 	 * modified the relcache entry in an undesirable way.  (In the case where
! 	 * our own xact modifies the rel, the relcache update happens via
  	 * CommandCounterIncrement, not here.)
  	 */
  	if (res != LOCKACQUIRE_ALREADY_HELD)
--- 81,91 ----
  	/*
  	 * Now that we have the lock, check for invalidation messages, so that we
  	 * will update or flush any stale relcache entry before we try to use it.
! 	 * RangeVarLockRelid() specifically relies on us for this.  We can skip
! 	 * this in the not-uncommon case that we already had the same type of lock
! 	 * being requested, since then no one else could have modified the
! 	 * relcache entry in an undesirable way.  (In the case where our own xact
! 	 * modifies the rel, the relcache update happens via
  	 * CommandCounterIncrement, not here.)
  	 */
  	if (res != LOCKACQUIRE_ALREADY_HELD)
diff --git a/src/include/catalog/namesindex 5360096..7e64952 100644
*** a/src/include/catalog/namespace.h
--- b/src/include/catalog/namespace.h
***************
*** 15,20 ****
--- 15,21 ----
  #define NAMESPACE_H
  
  #include "nodes/primnodes.h"
+ #include "storage/lock.h"
  
  
  /*
***************
*** 48,53 **** typedef struct OverrideSearchPath
--- 49,56 ----
  
  
  extern Oid	RangeVarGetRelid(const RangeVar *relation, bool failOK);
+ extern Oid	RangeVarLockRelid(const RangeVar *relation, LOCKMODE lockmode,
+ 				  bool failOK);
  extern Oid	RangeVarGetCreationNamespace(const RangeVar *newRelation);
  extern Oid	RangeVarGetAndCheckCreationNamespace(const RangeVar *newRelation);
  extern Oid	RelnameGetRelid(const char *relname);
diff --git a/src/include/storage/sinvaindex e9ce025..a90aa2d 100644
*** a/src/include/storage/sinval.h
--- b/src/include/storage/sinval.h
***************
*** 116,121 **** typedef union
--- 116,125 ----
  } SharedInvalidationMessage;
  
  
+ /* Counter of messages processed; may overflow. */
+ extern unsigned SharedInvalidMessageCounter;
+ 
+ 
  extern void SendSharedInvalidMessages(const SharedInvalidationMessage *msgs,
  						  int n);
  extern void ReceiveSharedInvalidMessages(
nmtest-relation_openrv.sharapplication/x-sharDownload
#47Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#46)
Re: Make relation_openrv atomic wrt DDL

On Sun, Jun 12, 2011 at 3:18 PM, Noah Misch <noah@leadboat.com> wrote:

Indeed, the original patch slowed it by about 50%.  I improved the patch,
adding a global SharedInvalidMessageCounter to increment as we process
messages.  If this counter does not change between the RangeVarGetRelid() call
and the post-lock AcceptInvalidationMessages() call, we can skip the second
RangeVarGetRelid() call.  With the updated patch, I get these timings (in ms)
for runs of "SELECT nmtest(10000000)":

master: 19697.642, 20087.477, 19748.995
patched: 19723.716, 19879.538, 20257.671

In other words, no significant difference.  Since the patch removes the
no-longer-needed pre-lock call to AcceptInvalidationMessages(), changing to
"relation_close(r, NoLock)" in the test case actually reveals a 15%
performance improvement.  Granted, nothing to get excited about in light of
the artificiality.

In point of fact, given the not-so-artificial results I just posted on
another thread ("lazy vxid locks") I'm *very* excited about trying to
reduce the cost of AcceptInvalidationMessages(). I haven't reviewed
your patch in detail, but is there a way we can encapsulate the
knowledge of the invalidation system down inside the sinval machinery,
rather than letting the heap code have to know directly about the
counter? Perhaps AIV() could return true or false depending on
whether any invalidation messages were processed, or somesuch.

This semantic improvement would be hard to test with the current pg_regress
suite, so I do not include any test case addition in the patch.  If
sufficiently important, it could be done with isolationtester.

I haven't had a chance to look closely at the isolation tester yet,
but I'm excited about the possibilities for testing this sort of
thing. Not sure whether it's worth including this or not, but it
doesn't seem like a bad idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#48Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#47)
Re: Make relation_openrv atomic wrt DDL

On Sun, Jun 12, 2011 at 06:20:53PM -0400, Robert Haas wrote:

On Sun, Jun 12, 2011 at 3:18 PM, Noah Misch <noah@leadboat.com> wrote:

Indeed, the original patch slowed it by about 50%. ?I improved the patch,
adding a global SharedInvalidMessageCounter to increment as we process
messages. ?If this counter does not change between the RangeVarGetRelid() call
and the post-lock AcceptInvalidationMessages() call, we can skip the second
RangeVarGetRelid() call. ?With the updated patch, I get these timings (in ms)
for runs of "SELECT nmtest(10000000)":

master: 19697.642, 20087.477, 19748.995
patched: 19723.716, 19879.538, 20257.671

In other words, no significant difference. ?Since the patch removes the
no-longer-needed pre-lock call to AcceptInvalidationMessages(), changing to
"relation_close(r, NoLock)" in the test case actually reveals a 15%
performance improvement. ?Granted, nothing to get excited about in light of
the artificiality.

In point of fact, given the not-so-artificial results I just posted on
another thread ("lazy vxid locks") I'm *very* excited about trying to
reduce the cost of AcceptInvalidationMessages().

Quite interesting. A quick look suggests there is room for optimization there.

I haven't reviewed
your patch in detail, but is there a way we can encapsulate the
knowledge of the invalidation system down inside the sinval machinery,
rather than letting the heap code have to know directly about the
counter? Perhaps AIV() could return true or false depending on
whether any invalidation messages were processed, or somesuch.

I actually did it exactly that way originally. The problem was the return value
only applying to the given call, while I wished to answer a question like "Did
any call to AcceptInvalidationMessages() between code point A and code point B
process a message?" Adding AcceptInvalidationMessages() calls to code between A
and B would make the return value test yield a false negative. A global counter
was the best thing I could come up with that avoided this hazard.

#49Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#48)
Re: Make relation_openrv atomic wrt DDL

On Sun, Jun 12, 2011 at 9:23 PM, Noah Misch <noah@leadboat.com> wrote:

I haven't reviewed
your patch in detail, but is there a way we can encapsulate the
knowledge of the invalidation system down inside the sinval machinery,
rather than letting the heap code have to know directly about the
counter?  Perhaps AIV() could return true or false depending on
whether any invalidation messages were processed, or somesuch.

I actually did it exactly that way originally.  The problem was the return value
only applying to the given call, while I wished to answer a question like "Did
any call to AcceptInvalidationMessages() between code point A and code point B
process a message?"  Adding AcceptInvalidationMessages() calls to code between A
and B would make the return value test yield a false negative.  A global counter
was the best thing I could come up with that avoided this hazard.

Oh, interesting point. What if AcceptInvalidationMessages returns the
counter? Maybe with typedef uint32 InvalidationPositionId or
something like that, to make it partially self-documenting, and
greppable.

Taking that a bit further, what if we put that counter in
shared-memory? After writing new messages into the queue, a writer
would bump this count (only one process can be doing this at a time
because SInvalWriteLock is held) and memory-fence. Readers would
memory-fence and then read the count before acquiring the lock. If it
hasn't changed since we last read it, then don't bother acquiring
SInvalReadLock, because no new messages have arrived. Or maybe an
exact multiple of 2^32 messages have arrived, but there's probably
someway to finesse around that issue, like maybe also using some kind
of memory barrier to allow resetState to be checked without the lock.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#50Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#49)
Re: Make relation_openrv atomic wrt DDL

On Sun, Jun 12, 2011 at 10:56:41PM -0400, Robert Haas wrote:

On Sun, Jun 12, 2011 at 9:23 PM, Noah Misch <noah@leadboat.com> wrote:

I haven't reviewed
your patch in detail, but is there a way we can encapsulate the
knowledge of the invalidation system down inside the sinval machinery,
rather than letting the heap code have to know directly about the
counter? ?Perhaps AIV() could return true or false depending on
whether any invalidation messages were processed, or somesuch.

I actually did it exactly that way originally. ?The problem was the return value
only applying to the given call, while I wished to answer a question like "Did
any call to AcceptInvalidationMessages() between code point A and code point B
process a message?" ?Adding AcceptInvalidationMessages() calls to code between A
and B would make the return value test yield a false negative. ?A global counter
was the best thing I could come up with that avoided this hazard.

Oh, interesting point. What if AcceptInvalidationMessages returns the
counter? Maybe with typedef uint32 InvalidationPositionId or
something like that, to make it partially self-documenting, and
greppable.

That might be a start, but it's not a complete replacement for the global
counter. AcceptInvalidationMessages() is actually called in LockRelationOid(),
but the comparison needs to happen a level up in RangeVarLockRelid(). So, we
would be adding encapsulation in one place to lose it in another. Also, in the
uncontended case, the patch only calls AcceptInvalidationMessages() once per
relation_openrv. It compares the counter after that call with a counter as the
last caller left it -- RangeVarLockRelid() doesn't care who that caller was.

Taking that a bit further, what if we put that counter in
shared-memory? After writing new messages into the queue, a writer
would bump this count (only one process can be doing this at a time
because SInvalWriteLock is held) and memory-fence. Readers would
memory-fence and then read the count before acquiring the lock. If it
hasn't changed since we last read it, then don't bother acquiring
SInvalReadLock, because no new messages have arrived. Or maybe an
exact multiple of 2^32 messages have arrived, but there's probably
someway to finesse around that issue, like maybe also using some kind
of memory barrier to allow resetState to be checked without the lock.

This probably would not replace a backend-local counter of processed messages
for RangeVarLockRelid()'s purposes. It's quite possibly a good way to reduce
SInvalReadLock traffic, though.

Exact multiples of 2^32 messages need not be a problem, because the queue is
limited to MAXNUMMESSAGES (4096, currently). I think you will need to pack into
one 32-bit value all data each backend needs to decide whether to proceed with
the full process. Given that queue offsets fit into 13 bits (easily reduced to
12) and resetState is a bit, that seems practical enough at first glance.

nm

#51Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#50)
Re: Make relation_openrv atomic wrt DDL

On Mon, Jun 13, 2011 at 1:12 AM, Noah Misch <noah@leadboat.com> wrote:

That might be a start, but it's not a complete replacement for the global
counter.  AcceptInvalidationMessages() is actually called in LockRelationOid(),
but the comparison needs to happen a level up in RangeVarLockRelid().  So, we
would be adding encapsulation in one place to lose it in another.  Also, in the
uncontended case, the patch only calls AcceptInvalidationMessages() once per
relation_openrv.  It compares the counter after that call with a counter as the
last caller left it -- RangeVarLockRelid() doesn't care who that caller was.

Hmm, OK.

Taking that a bit further, what if we put that counter in
shared-memory?  After writing new messages into the queue, a writer
would bump this count (only one process can be doing this at a time
because SInvalWriteLock is held) and memory-fence.  Readers would
memory-fence and then read the count before acquiring the lock.  If it
hasn't changed since we last read it, then don't bother acquiring
SInvalReadLock, because no new messages have arrived.  Or maybe an
exact multiple of 2^32 messages have arrived, but there's probably
someway to finesse around that issue, like maybe also using some kind
of memory barrier to allow resetState to be checked without the lock.

This probably would not replace a backend-local counter of processed messages
for RangeVarLockRelid()'s purposes.  It's quite possibly a good way to reduce
SInvalReadLock traffic, though.

Exact multiples of 2^32 messages need not be a problem, because the queue is
limited to MAXNUMMESSAGES (4096, currently).  I think you will need to pack into
one 32-bit value all data each backend needs to decide whether to proceed with
the full process.  Given that queue offsets fit into 13 bits (easily reduced to
12) and resetState is a bit, that seems practical enough at first glance.

I was imagining one shared global counter, not one per backend, and
thinking that each backend could do something like:

volatile uint32 *the_global_counter = &global_counter;
uint32 latest_counter;
mfence();
latest_counter = *the_global_counter;
if (latest_counter != previous_value_of_global_counter || myprocstate->isReset)
really_do_it();
previous_value_of_global_counter = latest_counter;

I'm not immediately seeing why that wouldn't work for your purposes as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#52Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#51)
Re: Make relation_openrv atomic wrt DDL

On Mon, Jun 13, 2011 at 08:21:05AM -0400, Robert Haas wrote:

On Mon, Jun 13, 2011 at 1:12 AM, Noah Misch <noah@leadboat.com> wrote:

This probably would not replace a backend-local counter of processed messages
for RangeVarLockRelid()'s purposes. ?It's quite possibly a good way to reduce
SInvalReadLock traffic, though.

I was imagining one shared global counter, not one per backend, and
thinking that each backend could do something like:

volatile uint32 *the_global_counter = &global_counter;
uint32 latest_counter;
mfence();
latest_counter = *the_global_counter;
if (latest_counter != previous_value_of_global_counter || myprocstate->isReset)
really_do_it();
previous_value_of_global_counter = latest_counter;

I'm not immediately seeing why that wouldn't work for your purposes as well.

That takes us back to the problem of answering the (somewhat rephrased) question
"Did any call to AcceptInvalidationMessages() between code point A and code
point B call really_do_it()?" in a way not prone to breaking when new calls to
AcceptInvalidationMessages(), perhaps indirectly, get added. That's what the
local counter achieved. To achieve that, previous_value_of_global_counter would
need to be exposed outside sinval.c. That leaves us with a backend-local
counter updated in a different fashion. I might be missing something...

#53Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#52)
Re: Make relation_openrv atomic wrt DDL

On Mon, Jun 13, 2011 at 4:04 PM, Noah Misch <noah@leadboat.com> wrote:

On Mon, Jun 13, 2011 at 08:21:05AM -0400, Robert Haas wrote:

On Mon, Jun 13, 2011 at 1:12 AM, Noah Misch <noah@leadboat.com> wrote:

This probably would not replace a backend-local counter of processed messages
for RangeVarLockRelid()'s purposes. ?It's quite possibly a good way to reduce
SInvalReadLock traffic, though.

I was imagining one shared global counter, not one per backend, and
thinking that each backend could do something like:

volatile uint32 *the_global_counter = &global_counter;
uint32 latest_counter;
mfence();
latest_counter = *the_global_counter;
if (latest_counter != previous_value_of_global_counter || myprocstate->isReset)
   really_do_it();
previous_value_of_global_counter = latest_counter;

I'm not immediately seeing why that wouldn't work for your purposes as well.

That takes us back to the problem of answering the (somewhat rephrased) question
"Did any call to AcceptInvalidationMessages() between code point A and code
point B call really_do_it()?" in a way not prone to breaking when new calls to
AcceptInvalidationMessages(), perhaps indirectly, get added.  That's what the
local counter achieved.  To achieve that, previous_value_of_global_counter would
need to be exposed outside sinval.c.  That leaves us with a backend-local
counter updated in a different fashion.  I might be missing something...

I see your point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#54Greg Stark
stark@mit.edu
In reply to: Noah Misch (#46)
Re: Make relation_openrv atomic wrt DDL

So I was the victim assigned to review this patch.

The code is pretty much impeccable as usual from Noah. But I have
questions about the semantics of it.

Firstly this bit makes me wonder:

+               /* Not-found is always final. */
+               if (!OidIsValid(relOid1))
+                       return relOid1;

If someone does

BEGIN; DROP TABLE foo; CREATE TABLE foo; COMMIT;

Then what prevents this logic from finding the doomed relation,
blocking until the transaction commits, then finding it's deleted and
returning InvalidOid?
RangeVarGetRelid is just going to complete its index scan of pg_class
and may not come across the newly inserted row.

Am I missing something? I would have expected to have to loop around
and retry if no valid record is found. But this raises the question --
if no lock was acquired then what would have triggered an
AcceptInvalidatationMessages and how would we know we waited long
enough to find out about the newly created table?

As a side note, if there are a long stream of such concurrent DDL then
this code will leave all the old versions locked. This is consistent
with our "hold locks until end of transaction" semantics but it seems
weird for tables that we locked "accidentally" and didn't really end
up using at all. I'm not sure it's really bad though.

#55Noah Misch
noah@leadboat.com
In reply to: Greg Stark (#54)
Re: Make relation_openrv atomic wrt DDL

Hi Greg,

On Sun, Jun 19, 2011 at 11:44:35PM +0100, Greg Stark wrote:

So I was the victim assigned to review this patch.

Thanks for doing so.

The code is pretty much impeccable as usual from Noah. But I have
questions about the semantics of it.

Firstly this bit makes me wonder:

+               /* Not-found is always final. */
+               if (!OidIsValid(relOid1))
+                       return relOid1;

If someone does

BEGIN; DROP TABLE foo; CREATE TABLE foo; COMMIT;

Then what prevents this logic from finding the doomed relation,
blocking until the transaction commits, then finding it's deleted and
returning InvalidOid?
RangeVarGetRelid is just going to complete its index scan of pg_class
and may not come across the newly inserted row.

RangeVarGetRelid() always runs its index scan to completion, and the blocking
happens in LockRelationOid(). You will get a sequence like this:

RangeVarGetRelid("foo") => 20000
LockRelationOid(20000) [... blocks ...]
AcceptInvalidationMessages() [process a message]
RangeVarGetRelid("foo") => 20001
[restart loop]
LockRelationOid(20001)
AcceptInvalidationMessages() [no new messages - done]

RangeVarGetRelid() *is* vulnerable to the problem Simon just reported in the
"ALTER TABLE lock strength reduction patch is unsafe" thread, which arises
when the DDL transaction actually commits in the middle of a concurrent system
table scan. I don't think this patch makes things better or worse in that
regard, but I haven't thought it through in great depth.

Am I missing something? I would have expected to have to loop around
and retry if no valid record is found. But this raises the question --
if no lock was acquired then what would have triggered an
AcceptInvalidatationMessages and how would we know we waited long
enough to find out about the newly created table?

Good question. I think characterizing "long enough" quickly leads to defining
one or more sequence points after which all backends must recognize a new
table as existing. My greenfield definition would be "a command should see
precisely the tables visible to its MVCC snapshot", but that has practical
problems. Let's see what implementation concerns would suggest...

This leads to a case I had not considered explicitly: CREATE TABLE on a name
that has not recently mapped to any table. If the catcache has a negative
entry on the key in question, we will rely on that and miss the new table
until we call AcceptInvalidationMessages() somehow. To hit this, you need a
command that dynamically chooses to query a table that has been created since
the command started running. DROP/CREATE of the same name in a single
transaction can't hit the problem. Consider this test script:

psql -X <<\_EOSQL &
-- Cleanup from last run
DROP TABLE IF EXISTS public.foo;

BEGIN;

-- Create the neg catcache entry.
SAVEPOINT q;
SELECT 1 FROM public.foo;
ROLLBACK to q;

--SET client_min_messages = debug5; -- use with CACHEDEBUG for insight
DO $$
BEGIN
EXECUTE 'SELECT 1 FROM pg_am'; -- prime basic catcache entries
PERFORM pg_sleep(11);
EXECUTE 'SELECT 1 FROM public.foo';
END
$$;
_EOSQL

sleep 1
psql -Xc 'CREATE TABLE public.foo ()'

wait

The first backend fails to see the new table despite its creating transaction
having committed ~10s ago. Starting a transaction, beginning to process a new
client-issued command, or successfully locking any relation prevents the miss.
We could narrow the window in most cases by re-adding a call to
AcceptInvalidationMessages() before RangeVarLockRelid()'s first call to
RangeVarGetRelid(). My current thinking is that it's not worth adding that
cost to every RangeVarLockRelid(). Thus, specify that, minimally, each
client-issued command will see all tables whose names were occupied at the
time the command started. I would add a comment to that effect. Thoughts?

As a side note, if there are a long stream of such concurrent DDL then
this code will leave all the old versions locked. This is consistent
with our "hold locks until end of transaction" semantics but it seems
weird for tables that we locked "accidentally" and didn't really end
up using at all. I'm not sure it's really bad though.

Yes. If that outcome were more common, this would be a good place to try
relaxing the rule.

nm

#56Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#55)
1 attachment(s)
Re: Make relation_openrv atomic wrt DDL

On Sun, Jun 19, 2011 at 11:42 PM, Noah Misch <noah@leadboat.com> wrote:

On Sun, Jun 19, 2011 at 11:44:35PM +0100, Greg Stark wrote:

So I was the victim assigned to review this patch.

Thanks for doing so.

This discussion seems to have died off. Let's see if we can drive
this forward to some conclusion.

I took a look at this patch and found that it had bit-rotted slightly.
I am attaching a rebased version.

Maybe this is a stupid idea, but what about changing the logic so
that, if we get back InvalidOid, we AcceptInvalidationMessages() and
retry if the counter has advanced? ISTM that might cover the example
you mentioned in your last post, where we fail to detect a relation
that has come into existence since our last call to
AcceptInvalidationMessages(). It would cost an extra
AcceptInvalidationMessages() only in the case where we haven't found
the relation, which (a) seems like a good time to worry about whether
we're missing something, since users generally try not to reference
nonexistent tables and (b) should be rare enough to be ignorable from
a performance perspective.

In the department of minor nitpicking, why not use a 64-bit counter
for SharedInvalidMessageCounter? Then we don't have to think very
hard about whether overflow can ever pose a problem.

It strikes me that, even with this patch, there is a fair amount of
room for wonky behavior. For example, as your comment notes, if
search_path = foo, bar, and we've previously referenced "x", getting
"bar.x", the creation of "foo.x" will generate invalidation messages,
but a subsequent reference - within the same transaction - to "x" will
not cause us to read them. It would be nice to
AcceptInvalidationMessages() unconditionally at the beginning of
RangeVarGetRelid() [and then redo as necessary to get a stable
answer], but that might have some performance consequence for
transactions that repeatedly read the same tables.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

atomic-openrv-v3.patchapplication/octet-stream; name=atomic-openrv-v3.patchDownload
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c9b1d5f..a345e39 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -975,26 +975,11 @@ relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
 {
 	Oid			relOid;
 
-	/*
-	 * Check for shared-cache-inval messages before trying to open the
-	 * relation.  This is needed to cover the case where the name identifies a
-	 * rel that has been dropped and recreated since the start of our
-	 * transaction: if we don't flush the old syscache entry then we'll latch
-	 * onto that entry and suffer an error when we do RelationIdGetRelation.
-	 * Note that relation_open does not need to do this, since a relation's
-	 * OID never changes.
-	 *
-	 * We skip this if asked for NoLock, on the assumption that the caller has
-	 * already ensured some appropriate lock is held.
-	 */
-	if (lockmode != NoLock)
-		AcceptInvalidationMessages();
-
-	/* Look up the appropriate relation using namespace search */
-	relOid = RangeVarGetRelid(relation, false);
+	/* Look up and lock the appropriate relation using namespace search */
+	relOid = RangeVarLockRelid(relation, lockmode, false);
 
 	/* Let relation_open do the rest */
-	return relation_open(relOid, lockmode);
+	return relation_open(relOid, NoLock);
 }
 
 /* ----------------
@@ -1012,30 +997,15 @@ relation_openrv_extended(const RangeVar *relation, LOCKMODE lockmode,
 {
 	Oid			relOid;
 
-	/*
-	 * Check for shared-cache-inval messages before trying to open the
-	 * relation.  This is needed to cover the case where the name identifies a
-	 * rel that has been dropped and recreated since the start of our
-	 * transaction: if we don't flush the old syscache entry then we'll latch
-	 * onto that entry and suffer an error when we do RelationIdGetRelation.
-	 * Note that relation_open does not need to do this, since a relation's
-	 * OID never changes.
-	 *
-	 * We skip this if asked for NoLock, on the assumption that the caller has
-	 * already ensured some appropriate lock is held.
-	 */
-	if (lockmode != NoLock)
-		AcceptInvalidationMessages();
-
-	/* Look up the appropriate relation using namespace search */
-	relOid = RangeVarGetRelid(relation, missing_ok);
+	/* Look up and lock the appropriate relation using namespace search */
+	relOid = RangeVarLockRelid(relation, lockmode, missing_ok);
 
 	/* Return NULL on not-found */
 	if (!OidIsValid(relOid))
 		return NULL;
 
 	/* Let relation_open do the rest */
-	return relation_open(relOid, lockmode);
+	return relation_open(relOid, NoLock);
 }
 
 /* ----------------
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index ce795a6..60862a4 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -44,6 +44,8 @@
 #include "parser/parse_func.h"
 #include "storage/backendid.h"
 #include "storage/ipc.h"
+#include "storage/lmgr.h"
+#include "storage/sinval.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -285,6 +287,72 @@ RangeVarGetRelid(const RangeVar *relation, bool failOK)
 }
 
 /*
+ * RangeVarLockRelid
+ *		Like RangeVarGetRelid, but simulatenously acquire the specified lock on
+ *		the relation.  This works properly in the face of concurrent DDL that
+ *		may drop, create or rename relations.
+ *
+ * If the relation is not found and failOK = true, take no lock and return
+ * InvalidOid.  Otherwise, raise an error.
+ */
+Oid
+RangeVarLockRelid(const RangeVar *relation, LOCKMODE lockmode,
+				  bool failOK)
+{
+	int			lastCounter;
+	Oid			relOid1,
+				relOid2;
+
+	/*
+	 * First attempt.  If the caller requested NoLock, it already acquired an
+	 * appropriate lock and has called AcceptInvalidationMessages() since
+	 * doing so.  In this case, our first search is always correct, and we
+	 * degenerate to behave exactly like RangeVarGetRelid().
+	 */
+	lastCounter = SharedInvalidMessageCounter;
+	relOid1 = RangeVarGetRelid(relation, failOK);
+	if (lockmode == NoLock)
+		return relOid1;
+
+	/*
+	 * By the time we acquire the lock, our RangeVar may denote a different
+	 * relation or no relation at all.  In particular, this can happen when
+	 * the lock acquisition blocks on a transaction performing DROP or ALTER
+	 * TABLE RENAME.  However, once and while we do hold a lock of any level,
+	 * we can count on the name of the found relation remaining stable.
+	 *
+	 * Even so, DDL activity could cause an object in a schema earlier in the
+	 * search path to mask our original selection undetected.  No current lock
+	 * would prevent that.  We let the user worry about it, such as by taking
+	 * additional explicit locks.
+	 */
+	do
+	{
+		/* Not-found is always final. */
+		if (!OidIsValid(relOid1))
+			return relOid1;
+
+		/*
+		 * LockRelationOid also calls AcceptInvalidationMessages() to make
+		 * recent DDL effects visible, if needed.  Finding none, we're done.
+		 */
+		LockRelationOid(relOid1, lockmode);
+		if (lastCounter == SharedInvalidMessageCounter)
+			break;
+		else
+			lastCounter = SharedInvalidMessageCounter;
+
+		/* Some invalidation messages arrived; search again. */
+		relOid2 = relOid1;
+		relOid1 = RangeVarGetRelid(relation, failOK);
+
+		/* Done when our RangeVar denotes the same relation we locked. */
+	} while (relOid1 != relOid2);
+
+	return relOid1;
+}
+
+/*
  * RangeVarGetCreationNamespace
  *		Given a RangeVar describing a to-be-created relation,
  *		choose which namespace to create it in.
diff --git a/src/backend/storage/ipc/sinval.c b/src/backend/storage/ipc/sinval.c
index 9ab16b1..9b1ec82 100644
--- a/src/backend/storage/ipc/sinval.c
+++ b/src/backend/storage/ipc/sinval.c
@@ -22,6 +22,9 @@
 #include "utils/inval.h"
 
 
+unsigned SharedInvalidMessageCounter;
+
+
 /*
  * Because backends sitting idle will not be reading sinval events, we
  * need a way to give an idle backend a swift kick in the rear and make
@@ -90,6 +93,7 @@ ReceiveSharedInvalidMessages(
 	{
 		SharedInvalidationMessage *msg = &messages[nextmsg++];
 
+		SharedInvalidMessageCounter++;
 		invalFunction(msg);
 	}
 
@@ -106,6 +110,7 @@ ReceiveSharedInvalidMessages(
 		{
 			/* got a reset message */
 			elog(DEBUG4, "cache state reset");
+			SharedInvalidMessageCounter++;
 			resetFunction();
 			break;				/* nothing more to do */
 		}
@@ -118,6 +123,7 @@ ReceiveSharedInvalidMessages(
 		{
 			SharedInvalidationMessage *msg = &messages[nextmsg++];
 
+			SharedInvalidMessageCounter++;
 			invalFunction(msg);
 		}
 
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 859b385..90b2ecc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -81,10 +81,11 @@ LockRelationOid(Oid relid, LOCKMODE lockmode)
 	/*
 	 * Now that we have the lock, check for invalidation messages, so that we
 	 * will update or flush any stale relcache entry before we try to use it.
-	 * We can skip this in the not-uncommon case that we already had the same
-	 * type of lock being requested, since then no one else could have
-	 * modified the relcache entry in an undesirable way.  (In the case where
-	 * our own xact modifies the rel, the relcache update happens via
+	 * RangeVarLockRelid() specifically relies on us for this.  We can skip
+	 * this in the not-uncommon case that we already had the same type of lock
+	 * being requested, since then no one else could have modified the
+	 * relcache entry in an undesirable way.  (In the case where our own xact
+	 * modifies the rel, the relcache update happens via
 	 * CommandCounterIncrement, not here.)
 	 */
 	if (res != LOCKACQUIRE_ALREADY_HELD)
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 7e1e194..da1ddfd 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -15,6 +15,7 @@
 #define NAMESPACE_H
 
 #include "nodes/primnodes.h"
+#include "storage/lock.h"
 
 
 /*
@@ -48,6 +49,8 @@ typedef struct OverrideSearchPath
 
 
 extern Oid	RangeVarGetRelid(const RangeVar *relation, bool failOK);
+extern Oid	RangeVarLockRelid(const RangeVar *relation, LOCKMODE lockmode,
+				  bool failOK);
 extern Oid	RangeVarGetCreationNamespace(const RangeVar *newRelation);
 extern Oid	RangeVarGetAndCheckCreationNamespace(const RangeVar *newRelation);
 extern void RangeVarAdjustRelationPersistence(RangeVar *newRelation, Oid nspid);
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index e9ce025..a90aa2d 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -116,6 +116,10 @@ typedef union
 } SharedInvalidationMessage;
 
 
+/* Counter of messages processed; may overflow. */
+extern unsigned SharedInvalidMessageCounter;
+
+
 extern void SendSharedInvalidMessages(const SharedInvalidationMessage *msgs,
 						  int n);
 extern void ReceiveSharedInvalidMessages(
#57Noah Misch
noah@2ndQuadrant.com
In reply to: Robert Haas (#56)
Re: Make relation_openrv atomic wrt DDL

On Wed, Jul 06, 2011 at 03:06:40PM -0400, Robert Haas wrote:

On Sun, Jun 19, 2011 at 11:42 PM, Noah Misch <noah@leadboat.com> wrote:

On Sun, Jun 19, 2011 at 11:44:35PM +0100, Greg Stark wrote:

So I was the victim assigned to review this patch.

Thanks for doing so.

This discussion seems to have died off. Let's see if we can drive
this forward to some conclusion.

I took a look at this patch and found that it had bit-rotted slightly.
I am attaching a rebased version.

Thanks.

Maybe this is a stupid idea, but what about changing the logic so
that, if we get back InvalidOid, we AcceptInvalidationMessages() and
retry if the counter has advanced? ISTM that might cover the example
you mentioned in your last post, where we fail to detect a relation
that has come into existence since our last call to
AcceptInvalidationMessages(). It would cost an extra
AcceptInvalidationMessages() only in the case where we haven't found
the relation, which (a) seems like a good time to worry about whether
we're missing something, since users generally try not to reference
nonexistent tables and (b) should be rare enough to be ignorable from
a performance perspective.

Agreed on all points. Good idea. That improves our guarantee from "any
client-issued command will see tables committed before its submission" to
"_any command_ will see tables committed before its _parsing_". In
particular, commands submitted using SPI will no longer be subject to this
source of d�j� vu. I, too, doubt that looking up nonexistent relations is a
performance-critical operation for anyone.

In the department of minor nitpicking, why not use a 64-bit counter
for SharedInvalidMessageCounter? Then we don't have to think very
hard about whether overflow can ever pose a problem.

Overflow is fine because I only ever compare values for equality, and I use an
unsigned int to give defined behavior at overflow. However, the added cost of
a 64-bit counter should be negligible, and future use cases (including
external code) might appreciate it. No strong preference.

It strikes me that, even with this patch, there is a fair amount of
room for wonky behavior. For example, as your comment notes, if
search_path = foo, bar, and we've previously referenced "x", getting
"bar.x", the creation of "foo.x" will generate invalidation messages,
but a subsequent reference - within the same transaction - to "x" will
not cause us to read them. It would be nice to
AcceptInvalidationMessages() unconditionally at the beginning of
RangeVarGetRelid() [and then redo as necessary to get a stable
answer], but that might have some performance consequence for
transactions that repeatedly read the same tables.

A user doing that should "LOCK bar.x" in the transaction that creates "foo.x",
giving a clean cutover. (I thought of documenting that somewhere, but it
seemed a tad esoteric.) In the absence of such a lock, an extra unconditional
call to AcceptInvalidationMessages() narrows the window in which his commands
parse as using the "wrong" table. However, commands that have already parsed
will still use the old table without interruption, with no particular bound on
when they may finish. I've failed to come up with a use case where the
narrower window for parse inconsistencies is valuable but the remaining
exposure is acceptable. There may very well be one I'm missing, though.

While a mere "LOCK bar.x" is sufficient to get a clean cutover with respect to
parsing, it fails to invalidate plans. To really cover all bases, you need
some no-op action that invalidates "bar.x". For actual practical use, I'd
recommend something like:

BEGIN;
ALTER TABLE bar.x RENAME TO x0;
ALTER TABLE bar.x0 RENAME TO x;
CREATE TABLE foo.x ...
COMMIT;

Probably worth making it more intuitive to DTRT here.

Thanks,
nm

#58Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#57)
Re: Make relation_openrv atomic wrt DDL

On Wed, Jul 6, 2011 at 6:32 PM, Noah Misch <noah@2ndquadrant.com> wrote:

Maybe this is a stupid idea, but what about changing the logic so
that, if we get back InvalidOid, we AcceptInvalidationMessages() and
retry if the counter has advanced?  ISTM that might cover the example
you mentioned in your last post, where we fail to detect a relation
that has come into existence since our last call to
AcceptInvalidationMessages().  It would cost an extra
AcceptInvalidationMessages() only in the case where we haven't found
the relation, which (a) seems like a good time to worry about whether
we're missing something, since users generally try not to reference
nonexistent tables and (b) should be rare enough to be ignorable from
a performance perspective.

Agreed on all points.  Good idea.  That improves our guarantee from "any
client-issued command will see tables committed before its submission" to
"_any command_ will see tables committed before its _parsing_".  In
particular, commands submitted using SPI will no longer be subject to this
source of déją vu.  I, too, doubt that looking up nonexistent relations is a
performance-critical operation for anyone.

In the department of minor nitpicking, why not use a 64-bit counter
for SharedInvalidMessageCounter?  Then we don't have to think very
hard about whether overflow can ever pose a problem.

Overflow is fine because I only ever compare values for equality, and I use an
unsigned int to give defined behavior at overflow.  However, the added cost of
a 64-bit counter should be negligible, and future use cases (including
external code) might appreciate it.  No strong preference.

Yeah, that's what I was thinking. I have a feeling we may want to use
this mechanism in other places, including places where it would be
nice to be able to assume that > has sensible semantics.

It strikes me that, even with this patch, there is a fair amount of
room for wonky behavior.  For example, as your comment notes, if
search_path = foo, bar, and we've previously referenced "x", getting
"bar.x", the creation of "foo.x" will generate invalidation messages,
but a subsequent reference - within the same transaction - to "x" will
not cause us to read them.  It would be nice to
AcceptInvalidationMessages() unconditionally at the beginning of
RangeVarGetRelid() [and then redo as necessary to get a stable
answer], but that might have some performance consequence for
transactions that repeatedly read the same tables.

A user doing that should "LOCK bar.x" in the transaction that creates "foo.x",
giving a clean cutover.  (I thought of documenting that somewhere, but it
seemed a tad esoteric.)  In the absence of such a lock, an extra unconditional
call to AcceptInvalidationMessages() narrows the window in which his commands
parse as using the "wrong" table.  However, commands that have already parsed
will still use the old table without interruption, with no particular bound on
when they may finish.  I've failed to come up with a use case where the
narrower window for parse inconsistencies is valuable but the remaining
exposure is acceptable.  There may very well be one I'm missing, though.

While a mere "LOCK bar.x" is sufficient to get a clean cutover with respect to
parsing, it fails to invalidate plans.  To really cover all bases, you need
some no-op action that invalidates "bar.x".  For actual practical use, I'd
recommend something like:

       BEGIN;
       ALTER TABLE bar.x RENAME TO x0;
       ALTER TABLE bar.x0 RENAME TO x;
       CREATE TABLE foo.x ...
       COMMIT;

Probably worth making it more intuitive to DTRT here.

Well, what would be really nice is if it just worked.

Care to submit an updated patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#59Noah Misch
noah@2ndQuadrant.com
In reply to: Robert Haas (#58)
1 attachment(s)
Re: Make relation_openrv atomic wrt DDL

On Wed, Jul 06, 2011 at 08:35:55PM -0400, Robert Haas wrote:

On Wed, Jul 6, 2011 at 6:32 PM, Noah Misch <noah@2ndquadrant.com> wrote:

While a mere "LOCK bar.x" is sufficient to get a clean cutover with respect to
parsing, it fails to invalidate plans. �To really cover all bases, you need
some no-op action that invalidates "bar.x". �For actual practical use, I'd
recommend something like:

� � � �BEGIN;
� � � �ALTER TABLE bar.x RENAME TO x0;
� � � �ALTER TABLE bar.x0 RENAME TO x;
� � � �CREATE TABLE foo.x ...
� � � �COMMIT;

Probably worth making it more intuitive to DTRT here.

Well, what would be really nice is if it just worked.

Yes.

Care to submit an updated patch?

Attached. I made the counter 64 bits wide, handled the nothing-found case per
your idea, and improved a few comments cosmetically. I have not attempted to
improve the search_path interposition case. We can recommend the workaround
above, and doing better looks like an excursion much larger than the one
represented by this patch.

Thanks,
nm

Attachments:

atomic-openrv-v4.patchtext/plain; charset=us-asciiDownload
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c9b1d5f..a345e39 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***************
*** 975,1000 **** relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
  {
  	Oid			relOid;
  
! 	/*
! 	 * Check for shared-cache-inval messages before trying to open the
! 	 * relation.  This is needed to cover the case where the name identifies a
! 	 * rel that has been dropped and recreated since the start of our
! 	 * transaction: if we don't flush the old syscache entry then we'll latch
! 	 * onto that entry and suffer an error when we do RelationIdGetRelation.
! 	 * Note that relation_open does not need to do this, since a relation's
! 	 * OID never changes.
! 	 *
! 	 * We skip this if asked for NoLock, on the assumption that the caller has
! 	 * already ensured some appropriate lock is held.
! 	 */
! 	if (lockmode != NoLock)
! 		AcceptInvalidationMessages();
! 
! 	/* Look up the appropriate relation using namespace search */
! 	relOid = RangeVarGetRelid(relation, false);
  
  	/* Let relation_open do the rest */
! 	return relation_open(relOid, lockmode);
  }
  
  /* ----------------
--- 975,985 ----
  {
  	Oid			relOid;
  
! 	/* Look up and lock the appropriate relation using namespace search */
! 	relOid = RangeVarLockRelid(relation, lockmode, false);
  
  	/* Let relation_open do the rest */
! 	return relation_open(relOid, NoLock);
  }
  
  /* ----------------
***************
*** 1012,1041 **** relation_openrv_extended(const RangeVar *relation, LOCKMODE lockmode,
  {
  	Oid			relOid;
  
! 	/*
! 	 * Check for shared-cache-inval messages before trying to open the
! 	 * relation.  This is needed to cover the case where the name identifies a
! 	 * rel that has been dropped and recreated since the start of our
! 	 * transaction: if we don't flush the old syscache entry then we'll latch
! 	 * onto that entry and suffer an error when we do RelationIdGetRelation.
! 	 * Note that relation_open does not need to do this, since a relation's
! 	 * OID never changes.
! 	 *
! 	 * We skip this if asked for NoLock, on the assumption that the caller has
! 	 * already ensured some appropriate lock is held.
! 	 */
! 	if (lockmode != NoLock)
! 		AcceptInvalidationMessages();
! 
! 	/* Look up the appropriate relation using namespace search */
! 	relOid = RangeVarGetRelid(relation, missing_ok);
  
  	/* Return NULL on not-found */
  	if (!OidIsValid(relOid))
  		return NULL;
  
  	/* Let relation_open do the rest */
! 	return relation_open(relOid, lockmode);
  }
  
  /* ----------------
--- 997,1011 ----
  {
  	Oid			relOid;
  
! 	/* Look up and lock the appropriate relation using namespace search */
! 	relOid = RangeVarLockRelid(relation, lockmode, missing_ok);
  
  	/* Return NULL on not-found */
  	if (!OidIsValid(relOid))
  		return NULL;
  
  	/* Let relation_open do the rest */
! 	return relation_open(relOid, NoLock);
  }
  
  /* ----------------
diff --git a/src/backend/catalog/namespindex ce795a6..f75fcef 100644
*** a/src/backend/catalog/namespace.c
--- b/src/backend/catalog/namespace.c
***************
*** 44,49 ****
--- 44,51 ----
  #include "parser/parse_func.h"
  #include "storage/backendid.h"
  #include "storage/ipc.h"
+ #include "storage/lmgr.h"
+ #include "storage/sinval.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
  #include "utils/guc.h"
***************
*** 285,290 **** RangeVarGetRelid(const RangeVar *relation, bool failOK)
--- 287,372 ----
  }
  
  /*
+  * RangeVarLockRelid
+  *		Like RangeVarGetRelid, but simultaneously acquire the specified lock
+  *		on the relation.  This works properly in the face of concurrent DDL
+  *		that may drop, create or rename relations.
+  *
+  * If the relation is not found and failOK = true, take no lock and return
+  * InvalidOid.  Otherwise, raise an error.
+  */
+ Oid
+ RangeVarLockRelid(const RangeVar *relation, LOCKMODE lockmode,
+ 				  bool failOK)
+ {
+ 	int			lastCounter;
+ 	Oid			relOid1,
+ 				relOid2;
+ 
+ 	/*
+ 	 * If the caller requested NoLock, it already acquired an appropriate lock
+ 	 * with LockRelationOid().  In this case, our first search is always
+ 	 * correct, and we degenerate to RangeVarGetRelid().
+ 	 */
+ 	if (lockmode == NoLock)
+ 		return RangeVarGetRelid(relation, failOK);
+ 
+ 	/*
+ 	 * By the time we acquire the lock, our RangeVar may denote a different
+ 	 * relation or no relation at all.  In particular, this can happen when
+ 	 * the lock acquisition blocks on a transaction performing DROP or ALTER
+ 	 * TABLE RENAME.  However, once and while we do hold a lock of any level,
+ 	 * we can count on the name of the found relation remaining stable.
+ 	 *
+ 	 * Even so, DDL activity could cause an object in a schema earlier in the
+ 	 * search path to mask our original selection undetected.  No current lock
+ 	 * would prevent that.  We let the user worry about it, such as by taking
+ 	 * additional explicit locks.
+ 	 */
+ 
+ 	/*
+ 	 * First attempt.  Always allow it to fail; RangeVarGetRelid() may hit a
+ 	 * negative catcache entry for a recently-created relation.  Find out by
+ 	 * making recent DDL effects visible and checking again.  Attempting to
+ 	 * open nonexistent relations is not a hot path in applications, so the
+ 	 * extra cost is of little concern.
+ 	 */
+ 	lastCounter = SharedInvalidMessageCounter;
+ 	relOid1 = RangeVarGetRelid(relation, true);
+ 	if (!OidIsValid(relOid1))
+ 	{
+ 		AcceptInvalidationMessages();
+ 		lastCounter = SharedInvalidMessageCounter;
+ 		relOid1 = RangeVarGetRelid(relation, failOK);
+ 	}
+ 
+ 	do
+ 	{
+ 		/* After the initial precautions above, not-found is final. */
+ 		if (!OidIsValid(relOid1))
+ 			return relOid1;
+ 
+ 		/*
+ 		 * LockRelationOid() also calls AcceptInvalidationMessages() to make
+ 		 * recent DDL effects visible, if needed.  Finding none, we're done.
+ 		 */
+ 		LockRelationOid(relOid1, lockmode);
+ 		if (lastCounter == SharedInvalidMessageCounter)
+ 			break;
+ 		else
+ 			lastCounter = SharedInvalidMessageCounter;
+ 
+ 		/* Some invalidation messages arrived; search again. */
+ 		relOid2 = relOid1;
+ 		relOid1 = RangeVarGetRelid(relation, failOK);
+ 
+ 		/* Done when our RangeVar denotes the same relation we locked. */
+ 	} while (relOid1 != relOid2);
+ 
+ 	return relOid1;
+ }
+ 
+ /*
   * RangeVarGetCreationNamespace
   *		Given a RangeVar describing a to-be-created relation,
   *		choose which namespace to create it in.
diff --git a/src/backend/storage/ipc/sindex 9ab16b1..8499615 100644
*** a/src/backend/storage/ipc/sinval.c
--- b/src/backend/storage/ipc/sinval.c
***************
*** 22,27 ****
--- 22,30 ----
  #include "utils/inval.h"
  
  
+ uint64 SharedInvalidMessageCounter;
+ 
+ 
  /*
   * Because backends sitting idle will not be reading sinval events, we
   * need a way to give an idle backend a swift kick in the rear and make
***************
*** 90,95 **** ReceiveSharedInvalidMessages(
--- 93,99 ----
  	{
  		SharedInvalidationMessage *msg = &messages[nextmsg++];
  
+ 		SharedInvalidMessageCounter++;
  		invalFunction(msg);
  	}
  
***************
*** 106,111 **** ReceiveSharedInvalidMessages(
--- 110,116 ----
  		{
  			/* got a reset message */
  			elog(DEBUG4, "cache state reset");
+ 			SharedInvalidMessageCounter++;
  			resetFunction();
  			break;				/* nothing more to do */
  		}
***************
*** 118,123 **** ReceiveSharedInvalidMessages(
--- 123,129 ----
  		{
  			SharedInvalidationMessage *msg = &messages[nextmsg++];
  
+ 			SharedInvalidMessageCounter++;
  			invalFunction(msg);
  		}
  
diff --git a/src/backend/storage/lmgr/lindex 859b385..90b2ecc 100644
*** a/src/backend/storage/lmgr/lmgr.c
--- b/src/backend/storage/lmgr/lmgr.c
***************
*** 81,90 **** LockRelationOid(Oid relid, LOCKMODE lockmode)
  	/*
  	 * Now that we have the lock, check for invalidation messages, so that we
  	 * will update or flush any stale relcache entry before we try to use it.
! 	 * We can skip this in the not-uncommon case that we already had the same
! 	 * type of lock being requested, since then no one else could have
! 	 * modified the relcache entry in an undesirable way.  (In the case where
! 	 * our own xact modifies the rel, the relcache update happens via
  	 * CommandCounterIncrement, not here.)
  	 */
  	if (res != LOCKACQUIRE_ALREADY_HELD)
--- 81,91 ----
  	/*
  	 * Now that we have the lock, check for invalidation messages, so that we
  	 * will update or flush any stale relcache entry before we try to use it.
! 	 * RangeVarLockRelid() specifically relies on us for this.  We can skip
! 	 * this in the not-uncommon case that we already had the same type of lock
! 	 * being requested, since then no one else could have modified the
! 	 * relcache entry in an undesirable way.  (In the case where our own xact
! 	 * modifies the rel, the relcache update happens via
  	 * CommandCounterIncrement, not here.)
  	 */
  	if (res != LOCKACQUIRE_ALREADY_HELD)
diff --git a/src/include/catalog/namesindex 7e1e194..da1ddfd 100644
*** a/src/include/catalog/namespace.h
--- b/src/include/catalog/namespace.h
***************
*** 15,20 ****
--- 15,21 ----
  #define NAMESPACE_H
  
  #include "nodes/primnodes.h"
+ #include "storage/lock.h"
  
  
  /*
***************
*** 48,53 **** typedef struct OverrideSearchPath
--- 49,56 ----
  
  
  extern Oid	RangeVarGetRelid(const RangeVar *relation, bool failOK);
+ extern Oid	RangeVarLockRelid(const RangeVar *relation, LOCKMODE lockmode,
+ 				  bool failOK);
  extern Oid	RangeVarGetCreationNamespace(const RangeVar *newRelation);
  extern Oid	RangeVarGetAndCheckCreationNamespace(const RangeVar *newRelation);
  extern void RangeVarAdjustRelationPersistence(RangeVar *newRelation, Oid nspid);
diff --git a/src/include/storage/sinvaindex e9ce025..aba474d 100644
*** a/src/include/storage/sinval.h
--- b/src/include/storage/sinval.h
***************
*** 116,121 **** typedef union
--- 116,125 ----
  } SharedInvalidationMessage;
  
  
+ /* Counter of messages processed; don't worry about overflow. */
+ extern uint64 SharedInvalidMessageCounter;
+ 
+ 
  extern void SendSharedInvalidMessages(const SharedInvalidationMessage *msgs,
  						  int n);
  extern void ReceiveSharedInvalidMessages(
#60Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#59)
1 attachment(s)
Re: Make relation_openrv atomic wrt DDL

On Wed, Jul 6, 2011 at 10:44 PM, Noah Misch <noah@2ndquadrant.com> wrote:

Attached.  I made the counter 64 bits wide, handled the nothing-found case per
your idea, and improved a few comments cosmetically.  I have not attempted to
improve the search_path interposition case.  We can recommend the workaround
above, and doing better looks like an excursion much larger than the one
represented by this patch.

I looked at this some more and started to get uncomfortable with the
whole idea of having RangeVarLockRelid() be a wrapper around
RangeVarGetRelid(). This hazard exists everywhere the latter function
gets called, not just in relation_open(). So it doesn't seem right to
fix the problem only in those places.

So I went through and incorporated the logic proposed for
RangeVarLockRelid() into RangeVarGetRelid() itself, and then went
through and examined all the callers of RangeVarGetRelid(). There are
some, such as has_table_privilege(), where it's really impractical to
take any lock, first because we might have no privileges at all on
that table and second because that could easily lead to a massive
amount of locking for no particular good reason. I believe Tom
suggested that the right fix for these functions is to have them
index-scan the system catalogs using the caller's MVCC snapshot, which
would be right at least for pg_dump. And there are other callers that
cannot acquire the lock as part of RangeVarGetRelid() for a variety of
other reasons. However, having said that, there do appear to be a
number of cases that are can be fixed fairly easily.

So here's a (heavily) updated patch that tries to do that, along with
adding comments to the places where things still need more fixing. In
addition to the problems corrected by your last version, this fixes
LOCK TABLE, ALTER SEQUENCE, ALTER TABLE .. RENAME, the whole-table
variant of REINDEX, CREATE CONSTRAINT TRIGGER (which is flat-out wrong
as it stands, since it acquires *no lock at all* on the table
specified in the FROM clause, never mind the question of doing so
atomically), CREATE RULE, and (partially) DROP TRIGGER and DROP RULE.

Regardless of exactly how we decide to proceed here, it strikes me
that there is a heck of a lot more work that could stand to be done in
this area... :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

atomic-rangevargetrelid.patchapplication/octet-stream; name=atomic-rangevargetrelid.patchDownload
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c9b1d5f..9f1bcf1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -975,26 +975,11 @@ relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
 {
 	Oid			relOid;
 
-	/*
-	 * Check for shared-cache-inval messages before trying to open the
-	 * relation.  This is needed to cover the case where the name identifies a
-	 * rel that has been dropped and recreated since the start of our
-	 * transaction: if we don't flush the old syscache entry then we'll latch
-	 * onto that entry and suffer an error when we do RelationIdGetRelation.
-	 * Note that relation_open does not need to do this, since a relation's
-	 * OID never changes.
-	 *
-	 * We skip this if asked for NoLock, on the assumption that the caller has
-	 * already ensured some appropriate lock is held.
-	 */
-	if (lockmode != NoLock)
-		AcceptInvalidationMessages();
-
-	/* Look up the appropriate relation using namespace search */
-	relOid = RangeVarGetRelid(relation, false);
+	/* Look up and lock the appropriate relation using namespace search */
+	relOid = RangeVarGetRelid(relation, lockmode, false, false);
 
 	/* Let relation_open do the rest */
-	return relation_open(relOid, lockmode);
+	return relation_open(relOid, NoLock);
 }
 
 /* ----------------
@@ -1012,30 +997,15 @@ relation_openrv_extended(const RangeVar *relation, LOCKMODE lockmode,
 {
 	Oid			relOid;
 
-	/*
-	 * Check for shared-cache-inval messages before trying to open the
-	 * relation.  This is needed to cover the case where the name identifies a
-	 * rel that has been dropped and recreated since the start of our
-	 * transaction: if we don't flush the old syscache entry then we'll latch
-	 * onto that entry and suffer an error when we do RelationIdGetRelation.
-	 * Note that relation_open does not need to do this, since a relation's
-	 * OID never changes.
-	 *
-	 * We skip this if asked for NoLock, on the assumption that the caller has
-	 * already ensured some appropriate lock is held.
-	 */
-	if (lockmode != NoLock)
-		AcceptInvalidationMessages();
-
-	/* Look up the appropriate relation using namespace search */
-	relOid = RangeVarGetRelid(relation, missing_ok);
+	/* Look up and lock the appropriate relation using namespace search */
+	relOid = RangeVarGetRelid(relation, lockmode, missing_ok, false);
 
 	/* Return NULL on not-found */
 	if (!OidIsValid(relOid))
 		return NULL;
 
 	/* Let relation_open do the rest */
-	return relation_open(relOid, lockmode);
+	return relation_open(relOid, NoLock);
 }
 
 /* ----------------
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index df32731..9a125bd 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -583,6 +583,11 @@ ExecGrantStmt_oids(InternalGrant *istmt)
  * objectNamesToOids
  *
  * Turn a list of object names of a given type into an Oid list.
+ *
+ * XXX: This function doesn't take any sort of locks on the objects whose
+ * names it looks up.  In the face of concurrent DDL, we might easily latch
+ * onto an old version of an object, causing the GRANT or REVOKE statement
+ * to fail.
  */
 static List *
 objectNamesToOids(GrantObjectType objtype, List *objnames)
@@ -601,7 +606,7 @@ objectNamesToOids(GrantObjectType objtype, List *objnames)
 				RangeVar   *relvar = (RangeVar *) lfirst(cell);
 				Oid			relOid;
 
-				relOid = RangeVarGetRelid(relvar, false);
+				relOid = RangeVarGetRelid(relvar, NoLock, false, false);
 				objects = lappend_oid(objects, relOid);
 			}
 			break;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 39ba486..83aae7a 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2942,7 +2942,8 @@ reindex_relation(Oid relid, int flags)
 
 	/*
 	 * Open and lock the relation.	ShareLock is sufficient since we only need
-	 * to prevent schema and data changes in it.
+	 * to prevent schema and data changes in it.  The lock level used here
+	 * should match ReindexTable().
 	 */
 	rel = heap_open(relid, ShareLock);
 
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index ce795a6..e2260a3 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -44,6 +44,8 @@
 #include "parser/parse_func.h"
 #include "storage/backendid.h"
 #include "storage/ipc.h"
+#include "storage/lmgr.h"
+#include "storage/sinval.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -215,14 +217,20 @@ Datum		pg_is_other_temp_schema(PG_FUNCTION_ARGS);
  *		Given a RangeVar describing an existing relation,
  *		select the proper namespace and look up the relation OID.
  *
- * If the relation is not found, return InvalidOid if failOK = true,
+ * If the relation is not found, return InvalidOid if missing_ok = true,
  * otherwise raise an error.
+ *
+ * If nowait = true, throw an error if we'd have to wait for a lock.
  */
 Oid
-RangeVarGetRelid(const RangeVar *relation, bool failOK)
+RangeVarGetRelid(const RangeVar *relation, LOCKMODE lockmode, bool missing_ok,
+				 bool nowait)
 {
+	uint64		inval_count;
 	Oid			namespaceId;
 	Oid			relId;
+	Oid			oldRelId = InvalidOid;
+	bool		retry = false;
 
 	/*
 	 * We check the catalog name and then ignore it.
@@ -238,37 +246,121 @@ RangeVarGetRelid(const RangeVar *relation, bool failOK)
 	}
 
 	/*
-	 * Some non-default relpersistence value may have been specified.  The
-	 * parser never generates such a RangeVar in simple DML, but it can happen
-	 * in contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY KEY)".  Such
-	 * a command will generate an added CREATE INDEX operation, which must be
-	 * careful to find the temp table, even when pg_temp is not first in the
-	 * search path.
+	 * If lockmode = NoLock, the caller is assumed to already hold some sort
+	 * of heavyweight lock on the target relation.  Otherwise, we're preceding
+	 * here with no lock at all, which means that any answers we get must be
+	 * viewed with a certain degree of suspicion.  In particular, any time we
+	 * AcceptInvalidationMessages(), the anwer might change.  We handle that
+	 * case by retrying the operation until either (1) no more invalidation
+	 * messages show up or (2) the answer doesn't change.
+	 *
+	 * This isn't a perfect defense.  In particular, we fail to defend against
+	 * the case where the first name lookup returns a stale answer which
+	 * happens to refer to a relation on which we already have a lock.
+	 * In that case, no AcceptInvalidationMessages() will occur.  We could
+	 * protect against that case by calling AcceptInvalidationMessages()
+	 * before beginning this loop, but that would add a significant amount
+	 * overhead, so for now we don't.
 	 */
-	if (relation->relpersistence == RELPERSISTENCE_TEMP)
+	for (;;)
 	{
-		if (relation->schemaname)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
-				   errmsg("temporary tables cannot specify a schema name")));
-		if (OidIsValid(myTempNamespace))
-			relId = get_relname_relid(relation->relname, myTempNamespace);
-		else	/* this probably can't happen? */
-			relId = InvalidOid;
-	}
-	else if (relation->schemaname)
-	{
-		/* use exact schema given */
-		namespaceId = LookupExplicitNamespace(relation->schemaname);
-		relId = get_relname_relid(relation->relname, namespaceId);
-	}
-	else
-	{
-		/* search the namespace path */
-		relId = RelnameGetRelid(relation->relname);
+		/*
+		 * Remember this value, so that, after looking up the relation name
+		 * and locking its OID, we can check whether any invalidation messages
+		 * have been processed that might require a do-over.
+		 */
+		inval_count = SharedInvalidMessageCounter;
+
+		/*
+		 * Some non-default relpersistence value may have been specified.  The
+		 * parser never generates such a RangeVar in simple DML, but it can
+		 * happen in contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY
+		 * KEY)".  Such a command will generate an added CREATE INDEX
+		 * operation, which must be careful to find the temp table, even when
+		 * pg_temp is not first in the search path.
+		 */
+		if (relation->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (relation->schemaname)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+					   errmsg("temporary tables cannot specify a schema name")));
+			if (OidIsValid(myTempNamespace))
+				relId = get_relname_relid(relation->relname, myTempNamespace);
+			else	/* this probably can't happen? */
+				relId = InvalidOid;
+		}
+		else if (relation->schemaname)
+		{
+			/* use exact schema given */
+			namespaceId = LookupExplicitNamespace(relation->schemaname);
+			relId = get_relname_relid(relation->relname, namespaceId);
+		}
+		else
+		{
+			/* search the namespace path */
+			relId = RelnameGetRelid(relation->relname);
+		}
+
+		/*
+		 * If no lock requested, we assume the caller knows what they're
+		 * doing.  They should have already acquired a heavyweight lock on
+		 * this relation earlier in the processing of this same statement,
+		 * so it wouldn't be appropriate to AcceptInvalidationMessages()
+		 * here, as that might pull the rug out from under them.
+		 */
+		if (lockmode == NoLock)
+			break;
+
+		/*
+		 * If, upon retry, we get back the same OID we did last time, then
+		 * the invalidation messages we processed did not change the final
+		 * answer.  So we're done.
+		 */
+		if (retry && relId == oldRelId)
+			break;
+
+		/*
+		 * Lock relation.  This will also accept any pending invalidation
+		 * messages.  If we got back InvalidOid, indicating not found, then
+		 * there's nothing to lock, but we accept invalidation messages
+		 * anyway, to flush any negative catcache entries that may be
+		 * lingering.
+		 */
+		if (!OidIsValid(relId))
+			AcceptInvalidationMessages();
+		else if (!nowait)
+			LockRelationOid(relId, lockmode);
+		else if (!ConditionalLockRelationOid(relId, lockmode))
+		{
+			if (relation->schemaname)
+				ereport(ERROR,
+						(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
+						 errmsg("could not obtain lock on relation \"%s.%s\"",
+								relation->schemaname, relation->relname)));
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
+						 errmsg("could not obtain lock on relation \"%s\"",
+								relation->relname)));
+		}
+
+		/*
+		 * If no invalidation message were processed, we're done!
+		 */
+		if (inval_count == SharedInvalidMessageCounter)
+			break;
+
+		/*
+		 * Something may have changed.  Let's repeat the name lookup, to
+		 * make sure this name still references the same relation it did
+		 * previously.
+		 */
+		retry = true;
+		oldRelId = relId;
 	}
 
-	if (!OidIsValid(relId) && !failOK)
+	if (!OidIsValid(relId) && !missing_ok)
 	{
 		if (relation->schemaname)
 			ereport(ERROR,
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 2373d52..61849ef 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -108,7 +108,9 @@ ExecRenameStmt(RenameStmt *stmt)
 
 				CheckRelationOwnership(stmt->relation, true);
 
-				relid = RangeVarGetRelid(stmt->relation, false);
+				/* lock level should match RenameRelation */
+				relid = RangeVarGetRelid(stmt->relation, AccessExclusiveLock,
+										 false, false);
 
 				switch (stmt->renameType)
 				{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b7c021d..5024854 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1531,9 +1531,15 @@ ReindexIndex(RangeVar *indexRelation)
 	Oid			indOid;
 	HeapTuple	tuple;
 
-	indOid = RangeVarGetRelid(indexRelation, false);
+	/*
+	 * XXX: This is not safe in the presence of concurrent DDL. We should
+	 * take AccessExclusiveLock here, but that would violate the rule that
+	 * indexes should only be locked after their parent tables.  For now,
+	 * we live with it.
+	 */
+	indOid = RangeVarGetRelid(indexRelation, NoLock, false, false);
 	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
-	if (!HeapTupleIsValid(tuple))		/* shouldn't happen */
+	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", indOid);
 
 	if (((Form_pg_class) GETSTRUCT(tuple))->relkind != RELKIND_INDEX)
@@ -1562,7 +1568,8 @@ ReindexTable(RangeVar *relation)
 	Oid			heapOid;
 	HeapTuple	tuple;
 
-	heapOid = RangeVarGetRelid(relation, false);
+	/* The lock level used here should match reindex_relation(). */
+	heapOid = RangeVarGetRelid(relation, ShareLock, false, false);
 	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(heapOid));
 	if (!HeapTupleIsValid(tuple))		/* shouldn't happen */
 		elog(ERROR, "cache lookup failed for relation %u", heapOid);
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index b2c6ea5..6cb9cdd 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -24,8 +24,8 @@
 #include "utils/acl.h"
 #include "utils/lsyscache.h"
 
-static void LockTableRecurse(Oid reloid, RangeVar *rv,
-				 LOCKMODE lockmode, bool nowait, bool recurse);
+static void LockTableRecurse(Relation rel, LOCKMODE lockmode, bool nowait,
+				 bool recurse);
 
 
 /*
@@ -37,27 +37,28 @@ LockTableCommand(LockStmt *lockstmt)
 	ListCell   *p;
 
 	/*
+	 * During recovery we only accept these variations: LOCK TABLE foo IN
+	 * ACCESS SHARE MODE LOCK TABLE foo IN ROW SHARE MODE LOCK TABLE foo
+	 * IN ROW EXCLUSIVE MODE This test must match the restrictions defined
+	 * in LockAcquire()
+	 */
+	if (lockstmt->mode > RowExclusiveLock)
+		PreventCommandDuringRecovery("LOCK TABLE");
+
+	/*
 	 * Iterate over the list and process the named relations one at a time
 	 */
 	foreach(p, lockstmt->relations)
 	{
-		RangeVar   *relation = (RangeVar *) lfirst(p);
-		bool		recurse = interpretInhOption(relation->inhOpt);
+		RangeVar   *rv = (RangeVar *) lfirst(p);
+		Relation	rel;
+		bool		recurse = interpretInhOption(rv->inhOpt);
 		Oid			reloid;
 
-		reloid = RangeVarGetRelid(relation, false);
-
-		/*
-		 * During recovery we only accept these variations: LOCK TABLE foo IN
-		 * ACCESS SHARE MODE LOCK TABLE foo IN ROW SHARE MODE LOCK TABLE foo
-		 * IN ROW EXCLUSIVE MODE This test must match the restrictions defined
-		 * in LockAcquire()
-		 */
-		if (lockstmt->mode > RowExclusiveLock)
-			PreventCommandDuringRecovery("LOCK TABLE");
+		reloid = RangeVarGetRelid(rv, lockstmt->mode, false, lockstmt->nowait);
+		rel = relation_open(reloid, NoLock);
 
-		LockTableRecurse(reloid, relation,
-						 lockstmt->mode, lockstmt->nowait, recurse);
+		LockTableRecurse(rel, lockstmt->mode, lockstmt->nowait, recurse);
 	}
 }
 
@@ -69,67 +70,10 @@ LockTableCommand(LockStmt *lockstmt)
  * "rv" is NULL and we should just silently ignore any dropped child rel.
  */
 static void
-LockTableRecurse(Oid reloid, RangeVar *rv,
-				 LOCKMODE lockmode, bool nowait, bool recurse)
+LockTableRecurse(Relation rel, LOCKMODE lockmode, bool nowait, bool recurse)
 {
-	Relation	rel;
 	AclResult	aclresult;
-
-	/*
-	 * Acquire the lock.  We must do this first to protect against concurrent
-	 * drops.  Note that a lock against an already-dropped relation's OID
-	 * won't fail.
-	 */
-	if (nowait)
-	{
-		if (!ConditionalLockRelationOid(reloid, lockmode))
-		{
-			/* try to throw error by name; relation could be deleted... */
-			char	   *relname = rv ? rv->relname : get_rel_name(reloid);
-
-			if (relname)
-				ereport(ERROR,
-						(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
-						 errmsg("could not obtain lock on relation \"%s\"",
-								relname)));
-			else
-				ereport(ERROR,
-						(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
-					  errmsg("could not obtain lock on relation with OID %u",
-							 reloid)));
-		}
-	}
-	else
-		LockRelationOid(reloid, lockmode);
-
-	/*
-	 * Now that we have the lock, check to see if the relation really exists
-	 * or not.
-	 */
-	rel = try_relation_open(reloid, NoLock);
-
-	if (!rel)
-	{
-		/* Release useless lock */
-		UnlockRelationOid(reloid, lockmode);
-
-		/* At top level, throw error; otherwise, ignore this child rel */
-		if (rv)
-		{
-			if (rv->schemaname)
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_TABLE),
-						 errmsg("relation \"%s.%s\" does not exist",
-								rv->schemaname, rv->relname)));
-			else
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_TABLE),
-						 errmsg("relation \"%s\" does not exist",
-								rv->relname)));
-		}
-
-		return;
-	}
+	Oid			reloid = RelationGetRelid(rel);
 
 	/* Verify adequate privilege */
 	if (lockmode == AccessShareLock)
@@ -159,12 +103,48 @@ LockTableRecurse(Oid reloid, RangeVar *rv,
 	{
 		List	   *children = find_inheritance_children(reloid, NoLock);
 		ListCell   *lc;
+		Relation	childrel;
 
 		foreach(lc, children)
 		{
 			Oid			childreloid = lfirst_oid(lc);
 
-			LockTableRecurse(childreloid, NULL, lockmode, nowait, recurse);
+			/*
+			 * Acquire the lock, to protect against concurrent drops.  Note
+			 * that a lock against an already-dropped relation's OID won't
+			 * fail.
+			 */
+			if (!nowait)
+				LockRelationOid(childreloid, lockmode);
+			else if (!ConditionalLockRelationOid(childreloid, lockmode))
+			{
+				/* try to throw error by name; relation could be deleted... */
+				char	   *relname = get_rel_name(childreloid);
+
+				if (relname)
+					ereport(ERROR,
+							(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
+							 errmsg("could not obtain lock on relation \"%s\"",
+								relname)));
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
+					 errmsg("could not obtain lock on relation with OID %u",
+								 reloid)));
+			}
+
+			/*
+			 * Now that we have the lock, check to see if the relation really
+			 * exists or not.
+			 */
+			childrel = try_relation_open(childreloid, NoLock);
+			if (!childrel)
+			{
+				/* Release useless lock */
+				UnlockRelationOid(childreloid, lockmode);
+			}
+
+			LockTableRecurse(childrel, lockmode, nowait, recurse);
 		}
 	}
 
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index be04177..de6e2a3 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -427,8 +427,8 @@ AlterSequence(AlterSeqStmt *stmt)
 	FormData_pg_sequence new;
 	List	   *owned_by;
 
-	/* open and AccessShareLock sequence */
-	relid = RangeVarGetRelid(stmt->sequence, false);
+	/* Open and lock sequence. */
+	relid = RangeVarGetRelid(stmt->sequence, AccessShareLock, false, false);
 	init_sequence(relid, &elm, &seqrel);
 
 	/* allow ALTER to sequence owner only */
@@ -507,7 +507,16 @@ nextval(PG_FUNCTION_ARGS)
 	Oid			relid;
 
 	sequence = makeRangeVarFromNameList(textToQualifiedNameList(seqin));
-	relid = RangeVarGetRelid(sequence, false);
+
+	/*
+	 * XXX: This is not safe in the presence of concurrent DDL, but
+	 * acquiring a lock here is more expensive than letting nextval_internal
+	 * do it, since the latter maintains a cache that keeps us from hitting
+	 * the lock manager more than once per transaction.  It's not clear
+	 * whether the performance penalty is material in practice, but for now,
+	 * we do it this way.
+	 */
+	relid = RangeVarGetRelid(sequence, NoLock, false, false);
 
 	PG_RETURN_INT64(nextval_internal(relid));
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3bc350a..e835416 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -764,8 +764,15 @@ RemoveRelations(DropStmt *drop)
 		 */
 		AcceptInvalidationMessages();
 
-		/* Look up the appropriate relation using namespace search */
-		relOid = RangeVarGetRelid(rel, true);
+		/*
+		 * Look up the appropriate relation using namespace search.
+		 *
+		 * XXX: Doing this without a lock is unsafe in the presence of
+		 * concurrent DDL, but acquiring a lock here might violate the rule
+		 * that a table must be locked before its corresponding index.
+		 * So, for now, we ignore the hazzard.
+		 */
+		relOid = RangeVarGetRelid(rel, NoLock, true, false);
 
 		/* Not there? */
 		if (!OidIsValid(relOid))
@@ -2234,6 +2241,8 @@ RenameRelation(Oid myrelid, const char *newrelname, ObjectType reltype)
 	/*
 	 * Grab an exclusive lock on the target table, index, sequence or view,
 	 * which we will NOT release until end of transaction.
+	 *
+	 * Lock level used here should match ExecRenameStmt
 	 */
 	targetrelation = relation_open(myrelid, AccessExclusiveLock);
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 8072c77..a7b983e 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -201,7 +201,17 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 						RelationGetRelationName(rel))));
 
 	if (stmt->isconstraint && stmt->constrrel != NULL)
-		constrrelid = RangeVarGetRelid(stmt->constrrel, false);
+	{
+		/*
+		 * We must take a lock on the target relation to protect against
+		 * concurrent drop.  It's not clear that AccessShareLock is strong
+		 * enough, but we certainly need at least that much... otherwise,
+		 * we might end up creating a pg_constraint entry referencing a
+		 * nonexistent table.
+		 */
+		constrrelid = RangeVarGetRelid(stmt->constrrel, AccessShareLock, false,
+									   false);
+	}
 
 	/* permission checks */
 	if (!isInternal)
@@ -1027,11 +1037,15 @@ ConvertTriggerToFK(CreateTrigStmt *stmt, Oid funcoid)
  * DropTrigger - drop an individual trigger by name
  */
 void
-DropTrigger(Oid relid, const char *trigname, DropBehavior behavior,
+DropTrigger(RangeVar *relation, const char *trigname, DropBehavior behavior,
 			bool missing_ok)
 {
+	Oid			relid;
 	ObjectAddress object;
 
+	/* lock level should match RemoveTriggerById */
+	relid = RangeVarGetRelid(relation, ShareRowExclusiveLock, false, false);
+
 	object.classId = TriggerRelationId;
 	object.objectId = get_trigger_oid(relid, trigname, missing_ok);
 	object.objectSubId = 0;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 9303967..889737e 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -318,7 +318,16 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
 		/* Process a specific relation */
 		Oid			relid;
 
-		relid = RangeVarGetRelid(vacrel, false);
+		/*
+		 * Since we don't take a lock here, the relation might be gone,
+		 * or the RangeVar might no longer refer to the OID we look up
+		 * here.  In the former case, VACUUM will do nothing; in the
+		 * latter case, it will process the OID we looked up here, rather
+		 * than the new one.  Neither is ideal, but there's little practical
+		 * alternative, since we're going to commit this transaction and
+		 * begin a new one between now and then.
+		 */
+		relid = RangeVarGetRelid(vacrel, NoLock, false, false);
 
 		/* Make a relation list entry for this guy */
 		oldcontext = MemoryContextSwitchTo(vac_context);
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index f2ccf0d..3840c2f 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -273,11 +273,17 @@ searchRangeTable(ParseState *pstate, RangeVar *relation)
 	 * If it's an unqualified name, check for possible CTE matches. A CTE
 	 * hides any real relation matches.  If no CTE, look for a matching
 	 * relation.
+	 *
+	 * NB: It's not critical that RangeVarGetRelid return the correct answer
+	 * here in the face of concurrent DDL.  If it doesn't, the worst case
+	 * scenario is a less-clear error message.  Also, the tables involved in
+	 * the query are already locked, which reduces the number of cases in
+	 * which surprising behavior can occur.  So we do the name lookup unlocked.
 	 */
 	if (!relation->schemaname)
 		cte = scanNameSpaceForCTE(pstate, refname, &ctelevelsup);
 	if (!cte)
-		relId = RangeVarGetRelid(relation, true);
+		relId = RangeVarGetRelid(relation, NoLock, true, false);
 
 	/* Now look for RTEs matching either the relation/CTE or the alias */
 	for (levelsup = 0;
diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c
index ac62cbc..2122032 100644
--- a/src/backend/parser/parse_type.c
+++ b/src/backend/parser/parse_type.c
@@ -108,8 +108,14 @@ LookupTypeName(ParseState *pstate, const TypeName *typeName,
 				break;
 		}
 
-		/* look up the field */
-		relid = RangeVarGetRelid(rel, false);
+		/*
+		 * Look up the field.
+		 *
+		 * XXX: As no lock is taken here, this might fail in the presence
+		 * of concurrent DDL.  But taking a lock would carry a performance
+		 * penalty and would also require a permissions check.
+		 */
+		relid = RangeVarGetRelid(rel, NoLock, false, false);
 		attnum = get_attnum(relid, field);
 		if (attnum == InvalidAttrNumber)
 			ereport(ERROR,
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index bb7d468..7e9e8bd 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -196,11 +196,15 @@ DefineRule(RuleStmt *stmt, const char *queryString)
 	Node	   *whereClause;
 	Oid			relId;
 
-	/* Parse analysis ... */
+	/* Parse analysis. */
 	transformRuleStmt(stmt, queryString, &actions, &whereClause);
 
-	/* ... find the relation ... */
-	relId = RangeVarGetRelid(stmt->relation, false);
+	/*
+	 * Find and lock the relation.  Lock level should match
+	 * DefineQueryRewrite.
+	 */
+	relId = RangeVarGetRelid(stmt->relation, AccessExclusiveLock, false,
+							 false);
 
 	/* ... and execute */
 	DefineQueryRewrite(stmt->rulename,
@@ -235,17 +239,8 @@ DefineQueryRewrite(char *rulename,
 	Query	   *query;
 	bool		RelisBecomingView = false;
 
-	/*
-	 * If we are installing an ON SELECT rule, we had better grab
-	 * AccessExclusiveLock to ensure no SELECTs are currently running on the
-	 * event relation.	For other types of rules, it is sufficient to grab
-	 * ShareRowExclusiveLock to lock out insert/update/delete actions and to
-	 * ensure that we lock out current CREATE RULE statements.
-	 */
-	if (event_type == CMD_SELECT)
-		event_relation = heap_open(event_relid, AccessExclusiveLock);
-	else
-		event_relation = heap_open(event_relid, ShareRowExclusiveLock);
+	/* lock level should match the one used in DefineRule */
+	event_relation = heap_open(event_relid, AccessExclusiveLock);
 
 	/*
 	 * Verify relation is of a type that rules can sensibly be applied to.
diff --git a/src/backend/rewrite/rewriteRemove.c b/src/backend/rewrite/rewriteRemove.c
index b9dc7c5..cec22ac 100644
--- a/src/backend/rewrite/rewriteRemove.c
+++ b/src/backend/rewrite/rewriteRemove.c
@@ -19,6 +19,7 @@
 #include "access/sysattr.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_rewrite.h"
 #include "miscadmin.h"
 #include "rewrite/rewriteRemove.h"
@@ -37,13 +38,18 @@
  * Delete a rule given its name.
  */
 void
-RemoveRewriteRule(Oid owningRel, const char *ruleName, DropBehavior behavior,
-				  bool missing_ok)
+RemoveRewriteRule(RangeVar *relation, const char *ruleName,
+				  DropBehavior behavior, bool missing_ok)
 {
 	HeapTuple	tuple;
 	Oid			eventRelationOid;
+	Oid			owningRel;
 	ObjectAddress object;
 
+	/* should match RemoveRewriteRuleById */
+	owningRel = RangeVarGetRelid(relation, ShareUpdateExclusiveLock,
+								 false, false);
+
 	/*
 	 * Find the tuple for the target rule.
 	 */
diff --git a/src/backend/storage/ipc/sinval.c b/src/backend/storage/ipc/sinval.c
index 9ab16b1..8499615 100644
--- a/src/backend/storage/ipc/sinval.c
+++ b/src/backend/storage/ipc/sinval.c
@@ -22,6 +22,9 @@
 #include "utils/inval.h"
 
 
+uint64 SharedInvalidMessageCounter;
+
+
 /*
  * Because backends sitting idle will not be reading sinval events, we
  * need a way to give an idle backend a swift kick in the rear and make
@@ -90,6 +93,7 @@ ReceiveSharedInvalidMessages(
 	{
 		SharedInvalidationMessage *msg = &messages[nextmsg++];
 
+		SharedInvalidMessageCounter++;
 		invalFunction(msg);
 	}
 
@@ -106,6 +110,7 @@ ReceiveSharedInvalidMessages(
 		{
 			/* got a reset message */
 			elog(DEBUG4, "cache state reset");
+			SharedInvalidMessageCounter++;
 			resetFunction();
 			break;				/* nothing more to do */
 		}
@@ -118,6 +123,7 @@ ReceiveSharedInvalidMessages(
 		{
 			SharedInvalidationMessage *msg = &messages[nextmsg++];
 
+			SharedInvalidMessageCounter++;
 			invalFunction(msg);
 		}
 
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 859b385..3ac098b 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -81,10 +81,11 @@ LockRelationOid(Oid relid, LOCKMODE lockmode)
 	/*
 	 * Now that we have the lock, check for invalidation messages, so that we
 	 * will update or flush any stale relcache entry before we try to use it.
-	 * We can skip this in the not-uncommon case that we already had the same
-	 * type of lock being requested, since then no one else could have
-	 * modified the relcache entry in an undesirable way.  (In the case where
-	 * our own xact modifies the rel, the relcache update happens via
+	 * RangeVarGetRelid() specifically relies on us for this.  We can skip
+	 * this in the not-uncommon case that we already had the same type of lock
+	 * being requested, since then no one else could have modified the
+	 * relcache entry in an undesirable way.  (In the case where our own xact
+	 * modifies the rel, the relcache update happens via
 	 * CommandCounterIncrement, not here.)
 	 */
 	if (res != LOCKACQUIRE_ALREADY_HELD)
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 224e1f3..0698caf 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -77,7 +77,15 @@ CheckRelationOwnership(RangeVar *rel, bool noCatalogs)
 	Oid			relOid;
 	HeapTuple	tuple;
 
-	relOid = RangeVarGetRelid(rel, false);
+	/*
+	 * XXX: This is unsafe in the presence of concurrent DDL, since it is
+	 * called before acquiring any lock on the target relation.  However,
+	 * locking the target relation (especially using something like
+	 * AccessExclusiveLock) before verifying that the user has permissions
+	 * is not appealing either.
+	 */
+	relOid = RangeVarGetRelid(rel, NoLock, false, false);
+
 	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
 	if (!HeapTupleIsValid(tuple))		/* should not happen */
 		elog(ERROR, "cache lookup failed for relation %u", relOid);
@@ -1111,20 +1119,17 @@ standard_ProcessUtility(Node *parsetree,
 		case T_DropPropertyStmt:
 			{
 				DropPropertyStmt *stmt = (DropPropertyStmt *) parsetree;
-				Oid			relId;
-
-				relId = RangeVarGetRelid(stmt->relation, false);
 
 				switch (stmt->removeType)
 				{
 					case OBJECT_RULE:
 						/* RemoveRewriteRule checks permissions */
-						RemoveRewriteRule(relId, stmt->property,
+						RemoveRewriteRule(stmt->relation, stmt->property,
 										  stmt->behavior, stmt->missing_ok);
 						break;
 					case OBJECT_TRIGGER:
 						/* DropTrigger checks permissions */
-						DropTrigger(relId, stmt->property,
+						DropTrigger(stmt->relation, stmt->property,
 									stmt->behavior, stmt->missing_ok);
 						break;
 					default:
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 4a3e241..3fa95e2 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -1939,7 +1939,8 @@ convert_table_name(text *tablename)
 
 	relrv = makeRangeVarFromNameList(textToQualifiedNameList(tablename));
 
-	return RangeVarGetRelid(relrv, false);
+	/* We might not even have permissions on this relation; don't lock it. */
+	return RangeVarGetRelid(relrv, NoLock, false, false);
 }
 
 /*
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 6716c02..0d42a39 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -823,7 +823,9 @@ regclassin(PG_FUNCTION_ARGS)
 	 */
 	names = stringToQualifiedNameList(class_name_or_oid);
 
-	result = RangeVarGetRelid(makeRangeVarFromNameList(names), false);
+	/* We might not even have permissions on this relation; don't lock it. */
+	result = RangeVarGetRelid(makeRangeVarFromNameList(names), NoLock, false,
+							  false);
 
 	PG_RETURN_OID(result);
 }
@@ -1294,7 +1296,9 @@ text_regclass(PG_FUNCTION_ARGS)
 	RangeVar   *rv;
 
 	rv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
-	result = RangeVarGetRelid(rv, false);
+
+	/* We might not even have permissions on this relation; don't lock it. */
+	result = RangeVarGetRelid(rv, NoLock, false, false);
 
 	PG_RETURN_OID(result);
 }
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 49dc9c8..3fd99e0 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -385,8 +385,9 @@ pg_get_viewdef_name(PG_FUNCTION_ARGS)
 	RangeVar   *viewrel;
 	Oid			viewoid;
 
+	/* Look up view name.  Can't lock it - we might not have privileges. */
 	viewrel = makeRangeVarFromNameList(textToQualifiedNameList(viewname));
-	viewoid = RangeVarGetRelid(viewrel, false);
+	viewoid = RangeVarGetRelid(viewrel, NoLock, false, false);
 
 	PG_RETURN_TEXT_P(string_to_text(pg_get_viewdef_worker(viewoid, 0)));
 }
@@ -403,8 +404,10 @@ pg_get_viewdef_name_ext(PG_FUNCTION_ARGS)
 	Oid			viewoid;
 
 	prettyFlags = pretty ? PRETTYFLAG_PAREN | PRETTYFLAG_INDENT : 0;
+
+	/* Look up view name.  Can't lock it - we might not have privileges. */
 	viewrel = makeRangeVarFromNameList(textToQualifiedNameList(viewname));
-	viewoid = RangeVarGetRelid(viewrel, false);
+	viewoid = RangeVarGetRelid(viewrel, NoLock, false, false);
 
 	PG_RETURN_TEXT_P(string_to_text(pg_get_viewdef_worker(viewoid, prettyFlags)));
 }
@@ -1567,9 +1570,9 @@ pg_get_serial_sequence(PG_FUNCTION_ARGS)
 	SysScanDesc scan;
 	HeapTuple	tup;
 
-	/* Get the OID of the table */
+	/* Look up table name.  Can't lock it - we might not have privileges. */
 	tablerv = makeRangeVarFromNameList(textToQualifiedNameList(tablename));
-	tableOid = RangeVarGetRelid(tablerv, false);
+	tableOid = RangeVarGetRelid(tablerv, NoLock, false, false);
 
 	/* Get the number of the column */
 	column = text_to_cstring(columnname);
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 7e1e194..4bcbc20 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -15,6 +15,7 @@
 #define NAMESPACE_H
 
 #include "nodes/primnodes.h"
+#include "storage/lock.h"
 
 
 /*
@@ -47,7 +48,8 @@ typedef struct OverrideSearchPath
 } OverrideSearchPath;
 
 
-extern Oid	RangeVarGetRelid(const RangeVar *relation, bool failOK);
+extern Oid	RangeVarGetRelid(const RangeVar *relation, LOCKMODE lockmode,
+				 bool missing_ok, bool nowait);
 extern Oid	RangeVarGetCreationNamespace(const RangeVar *newRelation);
 extern Oid	RangeVarGetAndCheckCreationNamespace(const RangeVar *newRelation);
 extern void RangeVarAdjustRelationPersistence(RangeVar *newRelation, Oid nspid);
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index ad97871..fe21298 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -112,7 +112,7 @@ extern Oid CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 			  Oid constraintOid, Oid indexOid,
 			  bool isInternal);
 
-extern void DropTrigger(Oid relid, const char *trigname,
+extern void DropTrigger(RangeVar *relation, const char *trigname,
 			DropBehavior behavior, bool missing_ok);
 extern void RemoveTriggerById(Oid trigOid);
 extern Oid	get_trigger_oid(Oid relid, const char *name, bool missing_ok);
diff --git a/src/include/rewrite/rewriteRemove.h b/src/include/rewrite/rewriteRemove.h
index 90df045..b9a63ba 100644
--- a/src/include/rewrite/rewriteRemove.h
+++ b/src/include/rewrite/rewriteRemove.h
@@ -17,7 +17,7 @@
 #include "nodes/parsenodes.h"
 
 
-extern void RemoveRewriteRule(Oid owningRel, const char *ruleName,
+extern void RemoveRewriteRule(RangeVar *relation, const char *ruleName,
 				  DropBehavior behavior, bool missing_ok);
 extern void RemoveRewriteRuleById(Oid ruleOid);
 
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index e9ce025..aba474d 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -116,6 +116,10 @@ typedef union
 } SharedInvalidationMessage;
 
 
+/* Counter of messages processed; don't worry about overflow. */
+extern uint64 SharedInvalidMessageCounter;
+
+
 extern void SendSharedInvalidMessages(const SharedInvalidationMessage *msgs,
 						  int n);
 extern void ReceiveSharedInvalidMessages(
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index afd20b4..d22fa68 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -1708,7 +1708,8 @@ plpgsql_parse_cwordtype(List *idents)
 		relvar = makeRangeVar(strVal(linitial(idents)),
 							  strVal(lsecond(idents)),
 							  -1);
-		classOid = RangeVarGetRelid(relvar, true);
+		/* Can't lock relation - we might not have privileges. */
+		classOid = RangeVarGetRelid(relvar, NoLock, true, false);
 		if (!OidIsValid(classOid))
 			goto done;
 		fldname = strVal(lthird(idents));
@@ -1804,16 +1805,11 @@ plpgsql_parse_cwordrowtype(List *idents)
 	/* Avoid memory leaks in long-term function context */
 	oldCxt = MemoryContextSwitchTo(compile_tmp_cxt);
 
-	/* Lookup the relation */
+	/* Look up relation name.  Can't lock it - we might not have privileges. */
 	relvar = makeRangeVar(strVal(linitial(idents)),
 						  strVal(lsecond(idents)),
 						  -1);
-	classOid = RangeVarGetRelid(relvar, true);
-	if (!OidIsValid(classOid))
-		ereport(ERROR,
-				(errcode(ERRCODE_UNDEFINED_TABLE),
-				 errmsg("relation \"%s.%s\" does not exist",
-						strVal(linitial(idents)), strVal(lsecond(idents)))));
+	classOid = RangeVarGetRelid(relvar, NoLock, false, false);
 
 	MemoryContextSwitchTo(oldCxt);
 
#61Noah Misch
noah@2ndQuadrant.com
In reply to: Robert Haas (#60)
Re: Make relation_openrv atomic wrt DDL

On Thu, Jul 07, 2011 at 11:43:30AM -0400, Robert Haas wrote:

On Wed, Jul 6, 2011 at 10:44 PM, Noah Misch <noah@2ndquadrant.com> wrote:

Attached. �I made the counter 64 bits wide, handled the nothing-found case per
your idea, and improved a few comments cosmetically. �I have not attempted to
improve the search_path interposition case. �We can recommend the workaround
above, and doing better looks like an excursion much larger than the one
represented by this patch.

I looked at this some more and started to get uncomfortable with the
whole idea of having RangeVarLockRelid() be a wrapper around
RangeVarGetRelid(). This hazard exists everywhere the latter function
gets called, not just in relation_open(). So it doesn't seem right to
fix the problem only in those places.

Yes; I wished to focus on the major case for this round, then address the
other callers later. We can do it this way, though.

It does make long-term sense to expose only the lock-taking form, making
otherwise-unaffected callers say NoLock explicitly.

So I went through and incorporated the logic proposed for
RangeVarLockRelid() into RangeVarGetRelid() itself, and then went
through and examined all the callers of RangeVarGetRelid(). There are
some, such as has_table_privilege(), where it's really impractical to
take any lock, first because we might have no privileges at all on
that table and second because that could easily lead to a massive
amount of locking for no particular good reason. I believe Tom
suggested that the right fix for these functions is to have them
index-scan the system catalogs using the caller's MVCC snapshot, which
would be right at least for pg_dump. And there are other callers that
cannot acquire the lock as part of RangeVarGetRelid() for a variety of
other reasons. However, having said that, there do appear to be a
number of cases that are can be fixed fairly easily.

So here's a (heavily) updated patch that tries to do that, along with
adding comments to the places where things still need more fixing. In
addition to the problems corrected by your last version, this fixes
LOCK TABLE, ALTER SEQUENCE, ALTER TABLE .. RENAME, the whole-table
variant of REINDEX, CREATE CONSTRAINT TRIGGER (which is flat-out wrong
as it stands, since it acquires *no lock at all* on the table
specified in the FROM clause, never mind the question of doing so
atomically), CREATE RULE, and (partially) DROP TRIGGER and DROP RULE.

Looks basically sound; see a few code comments below.

Regardless of exactly how we decide to proceed here, it strikes me
that there is a heck of a lot more work that could stand to be done in
this area... :-(

Yes. DDL-DDL concurrency is a much smaller practical concern, but it is a
quality-of-implementation hole.

--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c

@@ -238,37 +246,121 @@ RangeVarGetRelid(const RangeVar *relation, bool failOK)
}

/*
-	 * Some non-default relpersistence value may have been specified.  The
-	 * parser never generates such a RangeVar in simple DML, but it can happen
-	 * in contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY KEY)".  Such
-	 * a command will generate an added CREATE INDEX operation, which must be
-	 * careful to find the temp table, even when pg_temp is not first in the
-	 * search path.
+	 * If lockmode = NoLock, the caller is assumed to already hold some sort
+	 * of heavyweight lock on the target relation.  Otherwise, we're preceding
+	 * here with no lock at all, which means that any answers we get must be
+	 * viewed with a certain degree of suspicion.  In particular, any time we
+	 * AcceptInvalidationMessages(), the anwer might change.  We handle that
+	 * case by retrying the operation until either (1) no more invalidation
+	 * messages show up or (2) the answer doesn't change.

The third sentence is true for all lock levels. The fourth sentence is true
for lock levels _except_ NoLock.

+		/*
+		 * If no lock requested, we assume the caller knows what they're
+		 * doing.  They should have already acquired a heavyweight lock on
+		 * this relation earlier in the processing of this same statement,
+		 * so it wouldn't be appropriate to AcceptInvalidationMessages()
+		 * here, as that might pull the rug out from under them.
+		 */

What sort of rug-pullout do you have in mind? Also, I don't think it matters
if the caller acquired the lock during this _statement_. It just needs to
hold a lock, somehow.

--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c

@@ -69,67 +70,10 @@ LockTableCommand(LockStmt *lockstmt)
* "rv" is NULL and we should just silently ignore any dropped child rel.

This comment refers to a now-removed argument.

*/
static void
-LockTableRecurse(Oid reloid, RangeVar *rv,
-				 LOCKMODE lockmode, bool nowait, bool recurse)
+LockTableRecurse(Relation rel, LOCKMODE lockmode, bool nowait, bool recurse)
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -764,8 +764,15 @@ RemoveRelations(DropStmt *drop)
*/
AcceptInvalidationMessages();
-		/* Look up the appropriate relation using namespace search */
-		relOid = RangeVarGetRelid(rel, true);
+		/*
+		 * Look up the appropriate relation using namespace search.
+		 *
+		 * XXX: Doing this without a lock is unsafe in the presence of
+		 * concurrent DDL, but acquiring a lock here might violate the rule
+		 * that a table must be locked before its corresponding index.
+		 * So, for now, we ignore the hazzard.

Spelling.

--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -196,11 +196,15 @@ DefineRule(RuleStmt *stmt, const char *queryString)
Node	   *whereClause;
Oid			relId;
-	/* Parse analysis ... */
+	/* Parse analysis. */
transformRuleStmt(stmt, queryString, &actions, &whereClause);
-	/* ... find the relation ... */
-	relId = RangeVarGetRelid(stmt->relation, false);
+	/*
+	 * Find and lock the relation.  Lock level should match
+	 * DefineQueryRewrite.
+	 */
+	relId = RangeVarGetRelid(stmt->relation, AccessExclusiveLock, false,
+							 false);

Seems better to just pass the RangeVar to DefineQueryRewrite() ...

/* ... and execute */
DefineQueryRewrite(stmt->rulename,
@@ -235,17 +239,8 @@ DefineQueryRewrite(char *rulename,
Query *query;
bool RelisBecomingView = false;

-	/*
-	 * If we are installing an ON SELECT rule, we had better grab
-	 * AccessExclusiveLock to ensure no SELECTs are currently running on the
-	 * event relation.	For other types of rules, it is sufficient to grab
-	 * ShareRowExclusiveLock to lock out insert/update/delete actions and to
-	 * ensure that we lock out current CREATE RULE statements.
-	 */
-	if (event_type == CMD_SELECT)
-		event_relation = heap_open(event_relid, AccessExclusiveLock);
-	else
-		event_relation = heap_open(event_relid, ShareRowExclusiveLock);
+	/* lock level should match the one used in DefineRule */
+	event_relation = heap_open(event_relid, AccessExclusiveLock);

... also making it cleaner to preserve this optimization.

Incidentally, you've added in many places this pattern of commenting that a
lock level must match another lock level used elsewhere. Would it be better
to migrate away from looking up a relation oid in one function and opening the
relation in another function, instead passing either a Relation or a RangeVar?
You did substitute passing a Relation in a couple of places.

Thanks,
nm

#62Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#61)
1 attachment(s)
Re: Make relation_openrv atomic wrt DDL

On Thu, Jul 7, 2011 at 3:54 PM, Noah Misch <noah@2ndquadrant.com> wrote:

Yes.  DDL-DDL concurrency is a much smaller practical concern, but it is a
quality-of-implementation hole.

Agreed, although I'm not too pleased about the fact that this doesn't
fix nextval(). That seems like a fairly significant hole (but one to
be addressed by a later patch).

The third sentence is true for all lock levels.  The fourth sentence is true
for lock levels _except_ NoLock.

I rewrote this whole blurb. See what you think.

+             /*
+              * If no lock requested, we assume the caller knows what they're
+              * doing.  They should have already acquired a heavyweight lock on
+              * this relation earlier in the processing of this same statement,
+              * so it wouldn't be appropriate to AcceptInvalidationMessages()
+              * here, as that might pull the rug out from under them.
+              */

What sort of rug-pullout do you have in mind?  Also, I don't think it matters
if the caller acquired the lock during this _statement_.  It just needs to
hold a lock, somehow.

What I'm specifically concerned about here is the possibility that we
have code which does RangeVarGetRelid() multiple times and expects to
get the same relation every time. Now, granted, any such places are
bugs. But I am not eager to change the logic here without looking
harder for them (and also for performance reasons).

... also making it cleaner to preserve this optimization.

That optimization is now gone anyway.

Incidentally, you've added in many places this pattern of commenting that a
lock level must match another lock level used elsewhere.  Would it be better
to migrate away from looking up a relation oid in one function and opening the
relation in another function, instead passing either a Relation or a RangeVar?
You did substitute passing a Relation in a couple of places.

Well, I've got these:

- ReindexTable() must match reindex_relation()
- ExecRenameStmt() must match RenameRelation()
- DropTrigger() must match RemoveTriggerById()
- DefineRule() must match DefineQueryRewrite()
- RemoveRewriteRule() must match RemoveRewriteRuleById()

(Whoever came up with these names was clearly a genius...)

RemoveTriggerById() and RemoveRewriteRuleById() are part of the
drop-statement machinery - they can be called either because that
rule/trigger itself gets dropped, or because some other object gets
dropped and it cascades to the rule/trigger. I'm pretty sure I don't
want to go start mucking with that machinery.

On the other hand, RenameRelation() is only called in one place other
than ExecRenameStmt(), and it looks to me like the other caller could
just as well use RenameRelationInternal(). If we change that, then we
can redefine RenameRelation() to take a RangeVar instead of an Oid and
push the rest of the logic from ExecRenameStmt() into it. That would
probably be better all around.

The situation with DefineRule and DefineQueryRewrite is a bit more
nuanced. As you suggest, those could probably be merged into one
function taking both an Oid and a RangeVar. If the Oid is not
InvalidOid, we do heap_open(); else, we do heap_openrv(). That's
slightly ugly, but there are only two callers, so maybe it's not so
bad.

Offhand, I don't see any good way to rearrange reindex_relation()
along those lines. ReindexTable() wants to check permissions, not
just open the relation. And passing a Relation to reindex_relation()
rather than an Oid or RangeVar is no good either; you still end up
with multiple people needing to know what lock level to use.

At any rate, I'm not inventing this problem; there are plenty of
existing instances where this same phenomenon occurs. At least I'm
documenting the ones I'm adding. There's probably room for further
improvement and restructuring of this code, but right at the moment I
feel like the reasonable alternatives are (a) to pass a lock level
that matches what will ultimately be taken later on or (b) pass
NoLock. I'm voting for the former.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

atomic-rangevargetrelid-v2.patchapplication/octet-stream; name=atomic-rangevargetrelid-v2.patchDownload
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c9b1d5f..9f1bcf1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -975,26 +975,11 @@ relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
 {
 	Oid			relOid;
 
-	/*
-	 * Check for shared-cache-inval messages before trying to open the
-	 * relation.  This is needed to cover the case where the name identifies a
-	 * rel that has been dropped and recreated since the start of our
-	 * transaction: if we don't flush the old syscache entry then we'll latch
-	 * onto that entry and suffer an error when we do RelationIdGetRelation.
-	 * Note that relation_open does not need to do this, since a relation's
-	 * OID never changes.
-	 *
-	 * We skip this if asked for NoLock, on the assumption that the caller has
-	 * already ensured some appropriate lock is held.
-	 */
-	if (lockmode != NoLock)
-		AcceptInvalidationMessages();
-
-	/* Look up the appropriate relation using namespace search */
-	relOid = RangeVarGetRelid(relation, false);
+	/* Look up and lock the appropriate relation using namespace search */
+	relOid = RangeVarGetRelid(relation, lockmode, false, false);
 
 	/* Let relation_open do the rest */
-	return relation_open(relOid, lockmode);
+	return relation_open(relOid, NoLock);
 }
 
 /* ----------------
@@ -1012,30 +997,15 @@ relation_openrv_extended(const RangeVar *relation, LOCKMODE lockmode,
 {
 	Oid			relOid;
 
-	/*
-	 * Check for shared-cache-inval messages before trying to open the
-	 * relation.  This is needed to cover the case where the name identifies a
-	 * rel that has been dropped and recreated since the start of our
-	 * transaction: if we don't flush the old syscache entry then we'll latch
-	 * onto that entry and suffer an error when we do RelationIdGetRelation.
-	 * Note that relation_open does not need to do this, since a relation's
-	 * OID never changes.
-	 *
-	 * We skip this if asked for NoLock, on the assumption that the caller has
-	 * already ensured some appropriate lock is held.
-	 */
-	if (lockmode != NoLock)
-		AcceptInvalidationMessages();
-
-	/* Look up the appropriate relation using namespace search */
-	relOid = RangeVarGetRelid(relation, missing_ok);
+	/* Look up and lock the appropriate relation using namespace search */
+	relOid = RangeVarGetRelid(relation, lockmode, missing_ok, false);
 
 	/* Return NULL on not-found */
 	if (!OidIsValid(relOid))
 		return NULL;
 
 	/* Let relation_open do the rest */
-	return relation_open(relOid, lockmode);
+	return relation_open(relOid, NoLock);
 }
 
 /* ----------------
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index df32731..9a125bd 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -583,6 +583,11 @@ ExecGrantStmt_oids(InternalGrant *istmt)
  * objectNamesToOids
  *
  * Turn a list of object names of a given type into an Oid list.
+ *
+ * XXX: This function doesn't take any sort of locks on the objects whose
+ * names it looks up.  In the face of concurrent DDL, we might easily latch
+ * onto an old version of an object, causing the GRANT or REVOKE statement
+ * to fail.
  */
 static List *
 objectNamesToOids(GrantObjectType objtype, List *objnames)
@@ -601,7 +606,7 @@ objectNamesToOids(GrantObjectType objtype, List *objnames)
 				RangeVar   *relvar = (RangeVar *) lfirst(cell);
 				Oid			relOid;
 
-				relOid = RangeVarGetRelid(relvar, false);
+				relOid = RangeVarGetRelid(relvar, NoLock, false, false);
 				objects = lappend_oid(objects, relOid);
 			}
 			break;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 39ba486..83aae7a 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2942,7 +2942,8 @@ reindex_relation(Oid relid, int flags)
 
 	/*
 	 * Open and lock the relation.	ShareLock is sufficient since we only need
-	 * to prevent schema and data changes in it.
+	 * to prevent schema and data changes in it.  The lock level used here
+	 * should match ReindexTable().
 	 */
 	rel = heap_open(relid, ShareLock);
 
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index ce795a6..8d42695 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -44,6 +44,8 @@
 #include "parser/parse_func.h"
 #include "storage/backendid.h"
 #include "storage/ipc.h"
+#include "storage/lmgr.h"
+#include "storage/sinval.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -215,14 +217,20 @@ Datum		pg_is_other_temp_schema(PG_FUNCTION_ARGS);
  *		Given a RangeVar describing an existing relation,
  *		select the proper namespace and look up the relation OID.
  *
- * If the relation is not found, return InvalidOid if failOK = true,
+ * If the relation is not found, return InvalidOid if missing_ok = true,
  * otherwise raise an error.
+ *
+ * If nowait = true, throw an error if we'd have to wait for a lock.
  */
 Oid
-RangeVarGetRelid(const RangeVar *relation, bool failOK)
+RangeVarGetRelid(const RangeVar *relation, LOCKMODE lockmode, bool missing_ok,
+				 bool nowait)
 {
+	uint64		inval_count;
 	Oid			namespaceId;
 	Oid			relId;
+	Oid			oldRelId = InvalidOid;
+	bool		retry = false;
 
 	/*
 	 * We check the catalog name and then ignore it.
@@ -238,37 +246,120 @@ RangeVarGetRelid(const RangeVar *relation, bool failOK)
 	}
 
 	/*
-	 * Some non-default relpersistence value may have been specified.  The
-	 * parser never generates such a RangeVar in simple DML, but it can happen
-	 * in contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY KEY)".  Such
-	 * a command will generate an added CREATE INDEX operation, which must be
-	 * careful to find the temp table, even when pg_temp is not first in the
-	 * search path.
+	 * DDL operations can change the results of a name lookup.  Since all
+	 * such operations will generate invalidation messages, we keep track
+	 * of whether any such messages show up while we're performing the
+	 * operation, and retry until either (1) no more invalidation messages
+	 * show up or (2) the answer doesn't change.
+	 *
+	 * But if lockmode = NoLock, then we assume that either the caller is OK
+	 * with the answer changing under them, or that they already hold some
+	 * appropriate lock, and therefore return the first answer we get without
+	 * checking for invalidation messages.  Also, if the requested lock is
+	 * already held, no LockRelationOid will not AcceptInvalidationMessages,
+	 * so we may fail to notice a change.  We could protect against that case
+	 * by calling AcceptInvalidationMessages() before beginning this loop,
+	 * but that would add a significant amount overhead, so for now we don't.
 	 */
-	if (relation->relpersistence == RELPERSISTENCE_TEMP)
+	for (;;)
 	{
-		if (relation->schemaname)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
-				   errmsg("temporary tables cannot specify a schema name")));
-		if (OidIsValid(myTempNamespace))
-			relId = get_relname_relid(relation->relname, myTempNamespace);
-		else	/* this probably can't happen? */
-			relId = InvalidOid;
-	}
-	else if (relation->schemaname)
-	{
-		/* use exact schema given */
-		namespaceId = LookupExplicitNamespace(relation->schemaname);
-		relId = get_relname_relid(relation->relname, namespaceId);
-	}
-	else
-	{
-		/* search the namespace path */
-		relId = RelnameGetRelid(relation->relname);
+		/*
+		 * Remember this value, so that, after looking up the relation name
+		 * and locking its OID, we can check whether any invalidation messages
+		 * have been processed that might require a do-over.
+		 */
+		inval_count = SharedInvalidMessageCounter;
+
+		/*
+		 * Some non-default relpersistence value may have been specified.  The
+		 * parser never generates such a RangeVar in simple DML, but it can
+		 * happen in contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY
+		 * KEY)".  Such a command will generate an added CREATE INDEX
+		 * operation, which must be careful to find the temp table, even when
+		 * pg_temp is not first in the search path.
+		 */
+		if (relation->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (relation->schemaname)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+					   errmsg("temporary tables cannot specify a schema name")));
+			if (OidIsValid(myTempNamespace))
+				relId = get_relname_relid(relation->relname, myTempNamespace);
+			else	/* this probably can't happen? */
+				relId = InvalidOid;
+		}
+		else if (relation->schemaname)
+		{
+			/* use exact schema given */
+			namespaceId = LookupExplicitNamespace(relation->schemaname);
+			relId = get_relname_relid(relation->relname, namespaceId);
+		}
+		else
+		{
+			/* search the namespace path */
+			relId = RelnameGetRelid(relation->relname);
+		}
+
+		/*
+		 * If no lock requested, we assume the caller knows what they're
+		 * doing.  They should have already acquired a heavyweight lock on
+		 * this relation earlier in the processing of this same statement,
+		 * so it wouldn't be appropriate to AcceptInvalidationMessages()
+		 * here, as that might pull the rug out from under them.
+		 */
+		if (lockmode == NoLock)
+			break;
+
+		/*
+		 * If, upon retry, we get back the same OID we did last time, then
+		 * the invalidation messages we processed did not change the final
+		 * answer.  So we're done.
+		 */
+		if (retry && relId == oldRelId)
+			break;
+
+		/*
+		 * Lock relation.  This will also accept any pending invalidation
+		 * messages.  If we got back InvalidOid, indicating not found, then
+		 * there's nothing to lock, but we accept invalidation messages
+		 * anyway, to flush any negative catcache entries that may be
+		 * lingering.
+		 */
+		if (!OidIsValid(relId))
+			AcceptInvalidationMessages();
+		else if (!nowait)
+			LockRelationOid(relId, lockmode);
+		else if (!ConditionalLockRelationOid(relId, lockmode))
+		{
+			if (relation->schemaname)
+				ereport(ERROR,
+						(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
+						 errmsg("could not obtain lock on relation \"%s.%s\"",
+								relation->schemaname, relation->relname)));
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
+						 errmsg("could not obtain lock on relation \"%s\"",
+								relation->relname)));
+		}
+
+		/*
+		 * If no invalidation message were processed, we're done!
+		 */
+		if (inval_count == SharedInvalidMessageCounter)
+			break;
+
+		/*
+		 * Something may have changed.  Let's repeat the name lookup, to
+		 * make sure this name still references the same relation it did
+		 * previously.
+		 */
+		retry = true;
+		oldRelId = relId;
 	}
 
-	if (!OidIsValid(relId) && !failOK)
+	if (!OidIsValid(relId) && !missing_ok)
 	{
 		if (relation->schemaname)
 			ereport(ERROR,
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 2373d52..61849ef 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -108,7 +108,9 @@ ExecRenameStmt(RenameStmt *stmt)
 
 				CheckRelationOwnership(stmt->relation, true);
 
-				relid = RangeVarGetRelid(stmt->relation, false);
+				/* lock level should match RenameRelation */
+				relid = RangeVarGetRelid(stmt->relation, AccessExclusiveLock,
+										 false, false);
 
 				switch (stmt->renameType)
 				{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b7c021d..5024854 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1531,9 +1531,15 @@ ReindexIndex(RangeVar *indexRelation)
 	Oid			indOid;
 	HeapTuple	tuple;
 
-	indOid = RangeVarGetRelid(indexRelation, false);
+	/*
+	 * XXX: This is not safe in the presence of concurrent DDL. We should
+	 * take AccessExclusiveLock here, but that would violate the rule that
+	 * indexes should only be locked after their parent tables.  For now,
+	 * we live with it.
+	 */
+	indOid = RangeVarGetRelid(indexRelation, NoLock, false, false);
 	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
-	if (!HeapTupleIsValid(tuple))		/* shouldn't happen */
+	if (!HeapTupleIsValid(tuple))
 		elog(ERROR, "cache lookup failed for relation %u", indOid);
 
 	if (((Form_pg_class) GETSTRUCT(tuple))->relkind != RELKIND_INDEX)
@@ -1562,7 +1568,8 @@ ReindexTable(RangeVar *relation)
 	Oid			heapOid;
 	HeapTuple	tuple;
 
-	heapOid = RangeVarGetRelid(relation, false);
+	/* The lock level used here should match reindex_relation(). */
+	heapOid = RangeVarGetRelid(relation, ShareLock, false, false);
 	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(heapOid));
 	if (!HeapTupleIsValid(tuple))		/* shouldn't happen */
 		elog(ERROR, "cache lookup failed for relation %u", heapOid);
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index b2c6ea5..875f868 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -24,8 +24,8 @@
 #include "utils/acl.h"
 #include "utils/lsyscache.h"
 
-static void LockTableRecurse(Oid reloid, RangeVar *rv,
-				 LOCKMODE lockmode, bool nowait, bool recurse);
+static void LockTableRecurse(Relation rel, LOCKMODE lockmode, bool nowait,
+				 bool recurse);
 
 
 /*
@@ -37,99 +37,39 @@ LockTableCommand(LockStmt *lockstmt)
 	ListCell   *p;
 
 	/*
+	 * During recovery we only accept these variations: LOCK TABLE foo IN
+	 * ACCESS SHARE MODE LOCK TABLE foo IN ROW SHARE MODE LOCK TABLE foo
+	 * IN ROW EXCLUSIVE MODE This test must match the restrictions defined
+	 * in LockAcquire()
+	 */
+	if (lockstmt->mode > RowExclusiveLock)
+		PreventCommandDuringRecovery("LOCK TABLE");
+
+	/*
 	 * Iterate over the list and process the named relations one at a time
 	 */
 	foreach(p, lockstmt->relations)
 	{
-		RangeVar   *relation = (RangeVar *) lfirst(p);
-		bool		recurse = interpretInhOption(relation->inhOpt);
+		RangeVar   *rv = (RangeVar *) lfirst(p);
+		Relation	rel;
+		bool		recurse = interpretInhOption(rv->inhOpt);
 		Oid			reloid;
 
-		reloid = RangeVarGetRelid(relation, false);
-
-		/*
-		 * During recovery we only accept these variations: LOCK TABLE foo IN
-		 * ACCESS SHARE MODE LOCK TABLE foo IN ROW SHARE MODE LOCK TABLE foo
-		 * IN ROW EXCLUSIVE MODE This test must match the restrictions defined
-		 * in LockAcquire()
-		 */
-		if (lockstmt->mode > RowExclusiveLock)
-			PreventCommandDuringRecovery("LOCK TABLE");
+		reloid = RangeVarGetRelid(rv, lockstmt->mode, false, lockstmt->nowait);
+		rel = relation_open(reloid, NoLock);
 
-		LockTableRecurse(reloid, relation,
-						 lockstmt->mode, lockstmt->nowait, recurse);
+		LockTableRecurse(rel, lockstmt->mode, lockstmt->nowait, recurse);
 	}
 }
 
 /*
  * Apply LOCK TABLE recursively over an inheritance tree
- *
- * At top level, "rv" is the original command argument; we use it to throw
- * an appropriate error message if the relation isn't there.  Below top level,
- * "rv" is NULL and we should just silently ignore any dropped child rel.
  */
 static void
-LockTableRecurse(Oid reloid, RangeVar *rv,
-				 LOCKMODE lockmode, bool nowait, bool recurse)
+LockTableRecurse(Relation rel, LOCKMODE lockmode, bool nowait, bool recurse)
 {
-	Relation	rel;
 	AclResult	aclresult;
-
-	/*
-	 * Acquire the lock.  We must do this first to protect against concurrent
-	 * drops.  Note that a lock against an already-dropped relation's OID
-	 * won't fail.
-	 */
-	if (nowait)
-	{
-		if (!ConditionalLockRelationOid(reloid, lockmode))
-		{
-			/* try to throw error by name; relation could be deleted... */
-			char	   *relname = rv ? rv->relname : get_rel_name(reloid);
-
-			if (relname)
-				ereport(ERROR,
-						(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
-						 errmsg("could not obtain lock on relation \"%s\"",
-								relname)));
-			else
-				ereport(ERROR,
-						(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
-					  errmsg("could not obtain lock on relation with OID %u",
-							 reloid)));
-		}
-	}
-	else
-		LockRelationOid(reloid, lockmode);
-
-	/*
-	 * Now that we have the lock, check to see if the relation really exists
-	 * or not.
-	 */
-	rel = try_relation_open(reloid, NoLock);
-
-	if (!rel)
-	{
-		/* Release useless lock */
-		UnlockRelationOid(reloid, lockmode);
-
-		/* At top level, throw error; otherwise, ignore this child rel */
-		if (rv)
-		{
-			if (rv->schemaname)
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_TABLE),
-						 errmsg("relation \"%s.%s\" does not exist",
-								rv->schemaname, rv->relname)));
-			else
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_TABLE),
-						 errmsg("relation \"%s\" does not exist",
-								rv->relname)));
-		}
-
-		return;
-	}
+	Oid			reloid = RelationGetRelid(rel);
 
 	/* Verify adequate privilege */
 	if (lockmode == AccessShareLock)
@@ -159,12 +99,48 @@ LockTableRecurse(Oid reloid, RangeVar *rv,
 	{
 		List	   *children = find_inheritance_children(reloid, NoLock);
 		ListCell   *lc;
+		Relation	childrel;
 
 		foreach(lc, children)
 		{
 			Oid			childreloid = lfirst_oid(lc);
 
-			LockTableRecurse(childreloid, NULL, lockmode, nowait, recurse);
+			/*
+			 * Acquire the lock, to protect against concurrent drops.  Note
+			 * that a lock against an already-dropped relation's OID won't
+			 * fail.
+			 */
+			if (!nowait)
+				LockRelationOid(childreloid, lockmode);
+			else if (!ConditionalLockRelationOid(childreloid, lockmode))
+			{
+				/* try to throw error by name; relation could be deleted... */
+				char	   *relname = get_rel_name(childreloid);
+
+				if (relname)
+					ereport(ERROR,
+							(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
+							 errmsg("could not obtain lock on relation \"%s\"",
+								relname)));
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
+					 errmsg("could not obtain lock on relation with OID %u",
+								 reloid)));
+			}
+
+			/*
+			 * Now that we have the lock, check to see if the relation really
+			 * exists or not.
+			 */
+			childrel = try_relation_open(childreloid, NoLock);
+			if (!childrel)
+			{
+				/* Release useless lock */
+				UnlockRelationOid(childreloid, lockmode);
+			}
+
+			LockTableRecurse(childrel, lockmode, nowait, recurse);
 		}
 	}
 
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index be04177..de6e2a3 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -427,8 +427,8 @@ AlterSequence(AlterSeqStmt *stmt)
 	FormData_pg_sequence new;
 	List	   *owned_by;
 
-	/* open and AccessShareLock sequence */
-	relid = RangeVarGetRelid(stmt->sequence, false);
+	/* Open and lock sequence. */
+	relid = RangeVarGetRelid(stmt->sequence, AccessShareLock, false, false);
 	init_sequence(relid, &elm, &seqrel);
 
 	/* allow ALTER to sequence owner only */
@@ -507,7 +507,16 @@ nextval(PG_FUNCTION_ARGS)
 	Oid			relid;
 
 	sequence = makeRangeVarFromNameList(textToQualifiedNameList(seqin));
-	relid = RangeVarGetRelid(sequence, false);
+
+	/*
+	 * XXX: This is not safe in the presence of concurrent DDL, but
+	 * acquiring a lock here is more expensive than letting nextval_internal
+	 * do it, since the latter maintains a cache that keeps us from hitting
+	 * the lock manager more than once per transaction.  It's not clear
+	 * whether the performance penalty is material in practice, but for now,
+	 * we do it this way.
+	 */
+	relid = RangeVarGetRelid(sequence, NoLock, false, false);
 
 	PG_RETURN_INT64(nextval_internal(relid));
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c1af12a..295a1ff 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -764,8 +764,15 @@ RemoveRelations(DropStmt *drop)
 		 */
 		AcceptInvalidationMessages();
 
-		/* Look up the appropriate relation using namespace search */
-		relOid = RangeVarGetRelid(rel, true);
+		/*
+		 * Look up the appropriate relation using namespace search.
+		 *
+		 * XXX: Doing this without a lock is unsafe in the presence of
+		 * concurrent DDL, but acquiring a lock here might violate the rule
+		 * that a table must be locked before its corresponding index.
+		 * So, for now, we ignore the hazard.
+		 */
+		relOid = RangeVarGetRelid(rel, NoLock, true, false);
 
 		/* Not there? */
 		if (!OidIsValid(relOid))
@@ -2234,6 +2241,8 @@ RenameRelation(Oid myrelid, const char *newrelname, ObjectType reltype)
 	/*
 	 * Grab an exclusive lock on the target table, index, sequence or view,
 	 * which we will NOT release until end of transaction.
+	 *
+	 * Lock level used here should match ExecRenameStmt
 	 */
 	targetrelation = relation_open(myrelid, AccessExclusiveLock);
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 9cf8372..cadf3a1 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -194,7 +194,17 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 						RelationGetRelationName(rel))));
 
 	if (stmt->isconstraint && stmt->constrrel != NULL)
-		constrrelid = RangeVarGetRelid(stmt->constrrel, false);
+	{
+		/*
+		 * We must take a lock on the target relation to protect against
+		 * concurrent drop.  It's not clear that AccessShareLock is strong
+		 * enough, but we certainly need at least that much... otherwise,
+		 * we might end up creating a pg_constraint entry referencing a
+		 * nonexistent table.
+		 */
+		constrrelid = RangeVarGetRelid(stmt->constrrel, AccessShareLock, false,
+									   false);
+	}
 
 	/* permission checks */
 	if (!isInternal)
@@ -1020,11 +1030,15 @@ ConvertTriggerToFK(CreateTrigStmt *stmt, Oid funcoid)
  * DropTrigger - drop an individual trigger by name
  */
 void
-DropTrigger(Oid relid, const char *trigname, DropBehavior behavior,
+DropTrigger(RangeVar *relation, const char *trigname, DropBehavior behavior,
 			bool missing_ok)
 {
+	Oid			relid;
 	ObjectAddress object;
 
+	/* lock level should match RemoveTriggerById */
+	relid = RangeVarGetRelid(relation, ShareRowExclusiveLock, false, false);
+
 	object.classId = TriggerRelationId;
 	object.objectId = get_trigger_oid(relid, trigname, missing_ok);
 	object.objectSubId = 0;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 9303967..889737e 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -318,7 +318,16 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
 		/* Process a specific relation */
 		Oid			relid;
 
-		relid = RangeVarGetRelid(vacrel, false);
+		/*
+		 * Since we don't take a lock here, the relation might be gone,
+		 * or the RangeVar might no longer refer to the OID we look up
+		 * here.  In the former case, VACUUM will do nothing; in the
+		 * latter case, it will process the OID we looked up here, rather
+		 * than the new one.  Neither is ideal, but there's little practical
+		 * alternative, since we're going to commit this transaction and
+		 * begin a new one between now and then.
+		 */
+		relid = RangeVarGetRelid(vacrel, NoLock, false, false);
 
 		/* Make a relation list entry for this guy */
 		oldcontext = MemoryContextSwitchTo(vac_context);
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index f2ccf0d..3840c2f 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -273,11 +273,17 @@ searchRangeTable(ParseState *pstate, RangeVar *relation)
 	 * If it's an unqualified name, check for possible CTE matches. A CTE
 	 * hides any real relation matches.  If no CTE, look for a matching
 	 * relation.
+	 *
+	 * NB: It's not critical that RangeVarGetRelid return the correct answer
+	 * here in the face of concurrent DDL.  If it doesn't, the worst case
+	 * scenario is a less-clear error message.  Also, the tables involved in
+	 * the query are already locked, which reduces the number of cases in
+	 * which surprising behavior can occur.  So we do the name lookup unlocked.
 	 */
 	if (!relation->schemaname)
 		cte = scanNameSpaceForCTE(pstate, refname, &ctelevelsup);
 	if (!cte)
-		relId = RangeVarGetRelid(relation, true);
+		relId = RangeVarGetRelid(relation, NoLock, true, false);
 
 	/* Now look for RTEs matching either the relation/CTE or the alias */
 	for (levelsup = 0;
diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c
index ac62cbc..2122032 100644
--- a/src/backend/parser/parse_type.c
+++ b/src/backend/parser/parse_type.c
@@ -108,8 +108,14 @@ LookupTypeName(ParseState *pstate, const TypeName *typeName,
 				break;
 		}
 
-		/* look up the field */
-		relid = RangeVarGetRelid(rel, false);
+		/*
+		 * Look up the field.
+		 *
+		 * XXX: As no lock is taken here, this might fail in the presence
+		 * of concurrent DDL.  But taking a lock would carry a performance
+		 * penalty and would also require a permissions check.
+		 */
+		relid = RangeVarGetRelid(rel, NoLock, false, false);
 		attnum = get_attnum(relid, field);
 		if (attnum == InvalidAttrNumber)
 			ereport(ERROR,
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 60b9881..17db70e 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -196,11 +196,15 @@ DefineRule(RuleStmt *stmt, const char *queryString)
 	Node	   *whereClause;
 	Oid			relId;
 
-	/* Parse analysis ... */
+	/* Parse analysis. */
 	transformRuleStmt(stmt, queryString, &actions, &whereClause);
 
-	/* ... find the relation ... */
-	relId = RangeVarGetRelid(stmt->relation, false);
+	/*
+	 * Find and lock the relation.  Lock level should match
+	 * DefineQueryRewrite.
+	 */
+	relId = RangeVarGetRelid(stmt->relation, AccessExclusiveLock, false,
+							 false);
 
 	/* ... and execute */
 	DefineQueryRewrite(stmt->rulename,
@@ -242,6 +246,8 @@ DefineQueryRewrite(char *rulename,
 	 * grab ShareRowExclusiveLock to lock out insert/update/delete actions and
 	 * to ensure that we lock out current CREATE RULE statements; but because
 	 * of race conditions in access to catalog entries, we can't do that yet.
+	 *
+	 * Note that this lock level should match the one used in DefineRule.
 	 */
 	event_relation = heap_open(event_relid, AccessExclusiveLock);
 
diff --git a/src/backend/rewrite/rewriteRemove.c b/src/backend/rewrite/rewriteRemove.c
index b9dc7c5..cec22ac 100644
--- a/src/backend/rewrite/rewriteRemove.c
+++ b/src/backend/rewrite/rewriteRemove.c
@@ -19,6 +19,7 @@
 #include "access/sysattr.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_rewrite.h"
 #include "miscadmin.h"
 #include "rewrite/rewriteRemove.h"
@@ -37,13 +38,18 @@
  * Delete a rule given its name.
  */
 void
-RemoveRewriteRule(Oid owningRel, const char *ruleName, DropBehavior behavior,
-				  bool missing_ok)
+RemoveRewriteRule(RangeVar *relation, const char *ruleName,
+				  DropBehavior behavior, bool missing_ok)
 {
 	HeapTuple	tuple;
 	Oid			eventRelationOid;
+	Oid			owningRel;
 	ObjectAddress object;
 
+	/* should match RemoveRewriteRuleById */
+	owningRel = RangeVarGetRelid(relation, ShareUpdateExclusiveLock,
+								 false, false);
+
 	/*
 	 * Find the tuple for the target rule.
 	 */
diff --git a/src/backend/storage/ipc/sinval.c b/src/backend/storage/ipc/sinval.c
index 9ab16b1..8499615 100644
--- a/src/backend/storage/ipc/sinval.c
+++ b/src/backend/storage/ipc/sinval.c
@@ -22,6 +22,9 @@
 #include "utils/inval.h"
 
 
+uint64 SharedInvalidMessageCounter;
+
+
 /*
  * Because backends sitting idle will not be reading sinval events, we
  * need a way to give an idle backend a swift kick in the rear and make
@@ -90,6 +93,7 @@ ReceiveSharedInvalidMessages(
 	{
 		SharedInvalidationMessage *msg = &messages[nextmsg++];
 
+		SharedInvalidMessageCounter++;
 		invalFunction(msg);
 	}
 
@@ -106,6 +110,7 @@ ReceiveSharedInvalidMessages(
 		{
 			/* got a reset message */
 			elog(DEBUG4, "cache state reset");
+			SharedInvalidMessageCounter++;
 			resetFunction();
 			break;				/* nothing more to do */
 		}
@@ -118,6 +123,7 @@ ReceiveSharedInvalidMessages(
 		{
 			SharedInvalidationMessage *msg = &messages[nextmsg++];
 
+			SharedInvalidMessageCounter++;
 			invalFunction(msg);
 		}
 
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 859b385..3ac098b 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -81,10 +81,11 @@ LockRelationOid(Oid relid, LOCKMODE lockmode)
 	/*
 	 * Now that we have the lock, check for invalidation messages, so that we
 	 * will update or flush any stale relcache entry before we try to use it.
-	 * We can skip this in the not-uncommon case that we already had the same
-	 * type of lock being requested, since then no one else could have
-	 * modified the relcache entry in an undesirable way.  (In the case where
-	 * our own xact modifies the rel, the relcache update happens via
+	 * RangeVarGetRelid() specifically relies on us for this.  We can skip
+	 * this in the not-uncommon case that we already had the same type of lock
+	 * being requested, since then no one else could have modified the
+	 * relcache entry in an undesirable way.  (In the case where our own xact
+	 * modifies the rel, the relcache update happens via
 	 * CommandCounterIncrement, not here.)
 	 */
 	if (res != LOCKACQUIRE_ALREADY_HELD)
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 224e1f3..0698caf 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -77,7 +77,15 @@ CheckRelationOwnership(RangeVar *rel, bool noCatalogs)
 	Oid			relOid;
 	HeapTuple	tuple;
 
-	relOid = RangeVarGetRelid(rel, false);
+	/*
+	 * XXX: This is unsafe in the presence of concurrent DDL, since it is
+	 * called before acquiring any lock on the target relation.  However,
+	 * locking the target relation (especially using something like
+	 * AccessExclusiveLock) before verifying that the user has permissions
+	 * is not appealing either.
+	 */
+	relOid = RangeVarGetRelid(rel, NoLock, false, false);
+
 	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
 	if (!HeapTupleIsValid(tuple))		/* should not happen */
 		elog(ERROR, "cache lookup failed for relation %u", relOid);
@@ -1111,20 +1119,17 @@ standard_ProcessUtility(Node *parsetree,
 		case T_DropPropertyStmt:
 			{
 				DropPropertyStmt *stmt = (DropPropertyStmt *) parsetree;
-				Oid			relId;
-
-				relId = RangeVarGetRelid(stmt->relation, false);
 
 				switch (stmt->removeType)
 				{
 					case OBJECT_RULE:
 						/* RemoveRewriteRule checks permissions */
-						RemoveRewriteRule(relId, stmt->property,
+						RemoveRewriteRule(stmt->relation, stmt->property,
 										  stmt->behavior, stmt->missing_ok);
 						break;
 					case OBJECT_TRIGGER:
 						/* DropTrigger checks permissions */
-						DropTrigger(relId, stmt->property,
+						DropTrigger(stmt->relation, stmt->property,
 									stmt->behavior, stmt->missing_ok);
 						break;
 					default:
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 4a3e241..3fa95e2 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -1939,7 +1939,8 @@ convert_table_name(text *tablename)
 
 	relrv = makeRangeVarFromNameList(textToQualifiedNameList(tablename));
 
-	return RangeVarGetRelid(relrv, false);
+	/* We might not even have permissions on this relation; don't lock it. */
+	return RangeVarGetRelid(relrv, NoLock, false, false);
 }
 
 /*
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 6716c02..0d42a39 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -823,7 +823,9 @@ regclassin(PG_FUNCTION_ARGS)
 	 */
 	names = stringToQualifiedNameList(class_name_or_oid);
 
-	result = RangeVarGetRelid(makeRangeVarFromNameList(names), false);
+	/* We might not even have permissions on this relation; don't lock it. */
+	result = RangeVarGetRelid(makeRangeVarFromNameList(names), NoLock, false,
+							  false);
 
 	PG_RETURN_OID(result);
 }
@@ -1294,7 +1296,9 @@ text_regclass(PG_FUNCTION_ARGS)
 	RangeVar   *rv;
 
 	rv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
-	result = RangeVarGetRelid(rv, false);
+
+	/* We might not even have permissions on this relation; don't lock it. */
+	result = RangeVarGetRelid(rv, NoLock, false, false);
 
 	PG_RETURN_OID(result);
 }
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 49dc9c8..3fd99e0 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -385,8 +385,9 @@ pg_get_viewdef_name(PG_FUNCTION_ARGS)
 	RangeVar   *viewrel;
 	Oid			viewoid;
 
+	/* Look up view name.  Can't lock it - we might not have privileges. */
 	viewrel = makeRangeVarFromNameList(textToQualifiedNameList(viewname));
-	viewoid = RangeVarGetRelid(viewrel, false);
+	viewoid = RangeVarGetRelid(viewrel, NoLock, false, false);
 
 	PG_RETURN_TEXT_P(string_to_text(pg_get_viewdef_worker(viewoid, 0)));
 }
@@ -403,8 +404,10 @@ pg_get_viewdef_name_ext(PG_FUNCTION_ARGS)
 	Oid			viewoid;
 
 	prettyFlags = pretty ? PRETTYFLAG_PAREN | PRETTYFLAG_INDENT : 0;
+
+	/* Look up view name.  Can't lock it - we might not have privileges. */
 	viewrel = makeRangeVarFromNameList(textToQualifiedNameList(viewname));
-	viewoid = RangeVarGetRelid(viewrel, false);
+	viewoid = RangeVarGetRelid(viewrel, NoLock, false, false);
 
 	PG_RETURN_TEXT_P(string_to_text(pg_get_viewdef_worker(viewoid, prettyFlags)));
 }
@@ -1567,9 +1570,9 @@ pg_get_serial_sequence(PG_FUNCTION_ARGS)
 	SysScanDesc scan;
 	HeapTuple	tup;
 
-	/* Get the OID of the table */
+	/* Look up table name.  Can't lock it - we might not have privileges. */
 	tablerv = makeRangeVarFromNameList(textToQualifiedNameList(tablename));
-	tableOid = RangeVarGetRelid(tablerv, false);
+	tableOid = RangeVarGetRelid(tablerv, NoLock, false, false);
 
 	/* Get the number of the column */
 	column = text_to_cstring(columnname);
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 7e1e194..4bcbc20 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -15,6 +15,7 @@
 #define NAMESPACE_H
 
 #include "nodes/primnodes.h"
+#include "storage/lock.h"
 
 
 /*
@@ -47,7 +48,8 @@ typedef struct OverrideSearchPath
 } OverrideSearchPath;
 
 
-extern Oid	RangeVarGetRelid(const RangeVar *relation, bool failOK);
+extern Oid	RangeVarGetRelid(const RangeVar *relation, LOCKMODE lockmode,
+				 bool missing_ok, bool nowait);
 extern Oid	RangeVarGetCreationNamespace(const RangeVar *newRelation);
 extern Oid	RangeVarGetAndCheckCreationNamespace(const RangeVar *newRelation);
 extern void RangeVarAdjustRelationPersistence(RangeVar *newRelation, Oid nspid);
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index ad97871..fe21298 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -112,7 +112,7 @@ extern Oid CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 			  Oid constraintOid, Oid indexOid,
 			  bool isInternal);
 
-extern void DropTrigger(Oid relid, const char *trigname,
+extern void DropTrigger(RangeVar *relation, const char *trigname,
 			DropBehavior behavior, bool missing_ok);
 extern void RemoveTriggerById(Oid trigOid);
 extern Oid	get_trigger_oid(Oid relid, const char *name, bool missing_ok);
diff --git a/src/include/rewrite/rewriteRemove.h b/src/include/rewrite/rewriteRemove.h
index 90df045..b9a63ba 100644
--- a/src/include/rewrite/rewriteRemove.h
+++ b/src/include/rewrite/rewriteRemove.h
@@ -17,7 +17,7 @@
 #include "nodes/parsenodes.h"
 
 
-extern void RemoveRewriteRule(Oid owningRel, const char *ruleName,
+extern void RemoveRewriteRule(RangeVar *relation, const char *ruleName,
 				  DropBehavior behavior, bool missing_ok);
 extern void RemoveRewriteRuleById(Oid ruleOid);
 
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index e9ce025..aba474d 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -116,6 +116,10 @@ typedef union
 } SharedInvalidationMessage;
 
 
+/* Counter of messages processed; don't worry about overflow. */
+extern uint64 SharedInvalidMessageCounter;
+
+
 extern void SendSharedInvalidMessages(const SharedInvalidationMessage *msgs,
 						  int n);
 extern void ReceiveSharedInvalidMessages(
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index afd20b4..d22fa68 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -1708,7 +1708,8 @@ plpgsql_parse_cwordtype(List *idents)
 		relvar = makeRangeVar(strVal(linitial(idents)),
 							  strVal(lsecond(idents)),
 							  -1);
-		classOid = RangeVarGetRelid(relvar, true);
+		/* Can't lock relation - we might not have privileges. */
+		classOid = RangeVarGetRelid(relvar, NoLock, true, false);
 		if (!OidIsValid(classOid))
 			goto done;
 		fldname = strVal(lthird(idents));
@@ -1804,16 +1805,11 @@ plpgsql_parse_cwordrowtype(List *idents)
 	/* Avoid memory leaks in long-term function context */
 	oldCxt = MemoryContextSwitchTo(compile_tmp_cxt);
 
-	/* Lookup the relation */
+	/* Look up relation name.  Can't lock it - we might not have privileges. */
 	relvar = makeRangeVar(strVal(linitial(idents)),
 						  strVal(lsecond(idents)),
 						  -1);
-	classOid = RangeVarGetRelid(relvar, true);
-	if (!OidIsValid(classOid))
-		ereport(ERROR,
-				(errcode(ERRCODE_UNDEFINED_TABLE),
-				 errmsg("relation \"%s.%s\" does not exist",
-						strVal(linitial(idents)), strVal(lsecond(idents)))));
+	classOid = RangeVarGetRelid(relvar, NoLock, false, false);
 
 	MemoryContextSwitchTo(oldCxt);
 
#63Noah Misch
noah@2ndQuadrant.com
In reply to: Robert Haas (#62)
Re: Make relation_openrv atomic wrt DDL

On Thu, Jul 07, 2011 at 09:30:47PM -0400, Robert Haas wrote:

On Thu, Jul 7, 2011 at 3:54 PM, Noah Misch <noah@2ndquadrant.com> wrote:

Yes. �DDL-DDL concurrency is a much smaller practical concern, but it is a
quality-of-implementation hole.

Agreed, although I'm not too pleased about the fact that this doesn't
fix nextval(). That seems like a fairly significant hole (but one to
be addressed by a later patch).

True.

The third sentence is true for all lock levels. �The fourth sentence is true
for lock levels _except_ NoLock.

I rewrote this whole blurb. See what you think.

Seems better:

+	 * DDL operations can change the results of a name lookup.  Since all
+	 * such operations will generate invalidation messages, we keep track
+	 * of whether any such messages show up while we're performing the
+	 * operation, and retry until either (1) no more invalidation messages
+	 * show up or (2) the answer doesn't change.
+	 *
+	 * But if lockmode = NoLock, then we assume that either the caller is OK
+	 * with the answer changing under them, or that they already hold some
+	 * appropriate lock, and therefore return the first answer we get without
+	 * checking for invalidation messages.  Also, if the requested lock is
+	 * already held, no LockRelationOid will not AcceptInvalidationMessages,

Extra word in that sentence.

+ * so we may fail to notice a change. We could protect against that case

I failed to note it last time, but it might be worth mentioning that failing to
notice a change only happens due to search_path interposition.

+	 * by calling AcceptInvalidationMessages() before beginning this loop,
+	 * but that would add a significant amount overhead, so for now we don't.
+ � � � � � � /*
+ � � � � � � �* If no lock requested, we assume the caller knows what they're
+ � � � � � � �* doing. �They should have already acquired a heavyweight lock on
+ � � � � � � �* this relation earlier in the processing of this same statement,
+ � � � � � � �* so it wouldn't be appropriate to AcceptInvalidationMessages()
+ � � � � � � �* here, as that might pull the rug out from under them.
+ � � � � � � �*/

What sort of rug-pullout do you have in mind? �Also, I don't think it matters
if the caller acquired the lock during this _statement_. �It just needs to
hold a lock, somehow.

What I'm specifically concerned about here is the possibility that we
have code which does RangeVarGetRelid() multiple times and expects to
get the same relation every time. Now, granted, any such places are
bugs. But I am not eager to change the logic here without looking
harder for them (and also for performance reasons).

Yeah, I think a key point is that we're not promising to avoid calling
AcceptInvalidationMessages() to prop up code that relies on it not getting
called. We just know it's not needed in this case, so we save that expense.

... also making it cleaner to preserve this optimization.

That optimization is now gone anyway.

Okay.

Incidentally, you've added in many places this pattern of commenting that a
lock level must match another lock level used elsewhere. �Would it be better
to migrate away from looking up a relation oid in one function and opening the
relation in another function, instead passing either a Relation or a RangeVar?
You did substitute passing a Relation in a couple of places.

[reasons this is a can of worms]

At any rate, I'm not inventing this problem; there are plenty of
existing instances where this same phenomenon occurs. At least I'm
documenting the ones I'm adding. There's probably room for further
improvement and restructuring of this code, but right at the moment I
feel like the reasonable alternatives are (a) to pass a lock level
that matches what will ultimately be taken later on or (b) pass
NoLock. I'm voting for the former.

Works for me.