Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

Started by Nikhil Sontakkeabout 14 years ago25 messages
#1Nikhil Sontakke
nikkhils@gmail.com

Hi,

Consider the following sequence of events:

s1 #> CREATE SCHEMA test_schema;

s1 #> CREATE TABLE test_schema.c1(x int);

Now open another session s2 and via gdb issue a breakpoint on
heap_create_with_catalog() which is called by DefineRelation().

s2 #> CREATE TABLE test_schema.c2(y int);

The above will break on the function. Now issue a drop schema in session s1

s1 #> DROP SCHEMA test_schema CASCADE;
NOTICE: drop cascades to table test_schema.c1
DROP SCHEMA

Continuing in gdb, also completes the creation of c2 table without any
errors. We are now left with a dangling entry in pg_class along with all
the corresponding data files in our data directory. The problem becomes
worse if c2 was created using a TABLESPACE. Now dropping of that tablespace
does not work at all. Am sure we can come up with myriad such other issues.

Am sure other CREATE commands in this namespace will have similar issues
when faced with a concurrent DROP SCHEMA.

We definitely need some interlocking to handle this. For lack of better
APIs, we could do a LockDatabaseObject() call in AccessShareLock mode on
the namespace and release the same on completion of the creation of the
object.

Thoughts?

Regards,
Nikhils

#2Robert Haas
robertmhaas@gmail.com
In reply to: Nikhil Sontakke (#1)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Wed, Nov 9, 2011 at 4:56 AM, Nikhil Sontakke <nikkhils@gmail.com> wrote:

Consider the following sequence of events:

s1 #> CREATE SCHEMA test_schema;

s1 #> CREATE TABLE test_schema.c1(x int);

Now open another session s2 and via gdb issue a breakpoint on
heap_create_with_catalog() which is called by DefineRelation().

s2 #> CREATE TABLE test_schema.c2(y int);

The above will break on the function. Now issue a drop schema in session s1

s1 #> DROP SCHEMA test_schema CASCADE;
NOTICE:  drop cascades to table test_schema.c1
DROP SCHEMA

Continuing in gdb, also completes the creation of c2 table without any
errors. We are now left with a dangling entry in pg_class along with all the
corresponding data files in our data directory. The problem becomes worse if
c2 was created using a TABLESPACE. Now dropping of that tablespace does not
work at all. Am sure we can come up with myriad such other issues.

Am sure other CREATE commands in this namespace will have similar issues
when faced with a concurrent DROP SCHEMA.

We definitely need some interlocking to handle this. For lack of better
APIs, we could do a LockDatabaseObject() call in AccessShareLock mode on the
namespace and release the same on completion of the creation of the object.

Thoughts?

In general, we've been reluctant to add locking on non-table objects
for reasons of overhead. You can, for example, drop a type or
function while a query is running that depends on it (which is not
true for tables). But I think it is sensible to do it for DDL
commands, which shouldn't be frequent enough for the overhead to
matter much. When I rewrote the comment code for 9.1, I added locking
that works just this way, to prevent pg_description entries from being
orphaned; see the end of get_object_address().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Nikhil Sontakke
nikkhils@gmail.com
In reply to: Robert Haas (#2)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

We definitely need some interlocking to handle this. For lack of better
APIs, we could do a LockDatabaseObject() call in AccessShareLock mode on

the

namespace and release the same on completion of the creation of the

object.

Thoughts?

In general, we've been reluctant to add locking on non-table objects
for reasons of overhead. You can, for example, drop a type or
function while a query is running that depends on it (which is not
true for tables). But I think it is sensible to do it for DDL
commands, which shouldn't be frequent enough for the overhead to
matter much.

Agreed. Especially if the race condition has non-trivial downsides as
mentioned in the tablespace case.

When I rewrote the comment code for 9.1, I added locking
that works just this way, to prevent pg_description entries from being
orphaned; see the end of get_object_address().

Yeah thanks, that does the object locking. For pre-9.1 versions, we will
need a similar solution. I encountered the issue on 8.3.x..

Regards,
Nikhils

#4Robert Haas
robertmhaas@gmail.com
In reply to: Nikhil Sontakke (#3)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Wed, Nov 9, 2011 at 9:51 AM, Nikhil Sontakke <nikkhils@gmail.com> wrote:

Yeah thanks, that does the object locking. For pre-9.1 versions, we will
need a similar solution. I encountered the issue on 8.3.x..

I don't think we should back-patch a fix of this type. There is a lot
of cruftiness of this type scattered throughout the code base, and if
we start back-patching all the fixes for it, we're going to end up
destabilizing older branches for little real benefit.

Also, the fix would need to be quite different in older branches. For
example, in the master branch, you can probably fix 90% of the issue
by adjusting dropcmds.c, which now handles drop operations for most
object types. I believe KaiGai Kohei is still working on code which
will allow that code to support drop operations for most of the
remaining object types as well. But in any previous release you would
need scattered fixes all over the code base.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Nikhil Sontakke
nikhil.sontakke@enterprisedb.com
In reply to: Robert Haas (#4)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

Yeah thanks, that does the object locking. For pre-9.1 versions, we will
need a similar solution. I encountered the issue on 8.3.x..

I don't think we should back-patch a fix of this type. There is a lot
of cruftiness of this type scattered throughout the code base, and if
we start back-patching all the fixes for it, we're going to end up
destabilizing older branches for little real benefit.

Ok, understood.

Thanks and Regards,
Nikhils

#6Nikhil Sontakke
nikkhils@gmail.com
In reply to: Nikhil Sontakke (#5)
1 attachment(s)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

Hi,

Ok, understood.

PFA, a patch against git head. We take the AccessShareLock lock on the
schema in DefineRelation now. Note that we do not want to interlock with
other concurrent creations in the schema. We only want to interlock with
deletion activity. So even performance wise this should not be much of an
overhead in case of concurrent DDL operations to the same schema.

Adding this in DefineRelation handles creation of
tables/views/types/sequences. I do not think we need to do stuff in ALTER
commands, because the objects pre-exist and this issue appears to be with
new objects only.

Comments?

Regards,
Nikhils

Attachments:

git_head_lock_schema_ddl.patchapplication/octet-stream; name=git_head_lock_schema_ddl.patchDownload
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c4622c0..c3248f8 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -453,6 +453,20 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
 				 errmsg("cannot create temporary table within security-restricted operation")));
 
 	/*
+	 * Lock the namespace so as to handle concurrent DROP operations. Kinda
+	 * klugy with the oid to namespace name list stuff, but then we can handily
+	 * call the existing get_object_address function which does most of the
+	 * work.
+	 *
+	 * AccessShareLock is enough, because we do not want to interlock with
+	 * other create objects in the same schema, but with concurrent deletion
+	 * only
+	 */
+	(void) get_object_address(OBJECT_SCHEMA,
+						list_make1(makeString(get_namespace_name(namespaceId))),
+						NIL, &rel, AccessShareLock, false);
+
+	/*
 	 * Select tablespace to use.  If not specified, use default tablespace
 	 * (which may in turn default to database's default).
 	 */
@@ -633,7 +647,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
 
 	/*
 	 * Clean up.  We keep lock on new relation (although it shouldn't be
-	 * visible to anyone else anyway, until commit).
+	 * visible to anyone else anyway, until commit). We also retain other locks
+	 * obtained above to guard against concurrent activity
 	 */
 	relation_close(rel, NoLock);
 
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nikhil Sontakke (#6)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

Nikhil Sontakke <nikkhils@gmail.com> writes:

PFA, a patch against git head. We take the AccessShareLock lock on the
schema in DefineRelation now.

Um ... why would we do this only for tables, and not for creations of
other sorts of objects that belong to schemas?

Also, if we are going to believe that this is a serious problem, what
of ALTER ... SET SCHEMA?

Also, the proposed solution is pretty silly on its face, because it has
not removed the race condition only made the window somewhat narrower.
You would have to acquire the lock as part of the initial schema lookup,
not lock the OID after the fact. And could we please not do something
as silly as translate the OID back to a string and then look up that
string a second time?

(To be clear, I don't particularly believe that this is a problem worthy
of spending code space and cycles on. But if it's deemed to be a
problem, I want to see a solution that's actually watertight.)

regards, tom lane

#8Nikhil Sontakke
nikkhils@gmail.com
In reply to: Tom Lane (#7)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

Um ... why would we do this only for tables, and not for creations of
other sorts of objects that belong to schemas?

Right, we need to do it for other objects like functions etc. too.

Also, if we are going to believe that this is a serious problem, what
of ALTER ... SET SCHEMA?

I admit, I hadn't thought of this.

Also, the proposed solution is pretty silly on its face, because it has
not removed the race condition only made the window somewhat narrower.
You would have to acquire the lock as part of the initial schema lookup,
not lock the OID after the fact. And could we please not do something
as silly as translate the OID back to a string and then look up that
string a second time?

The comment mentions that part is a kluge but that we get to re-use the
existing function because of it. The get_object_address function will bail
out anyways if the schema has vanished from down under and it does lock it
up immediately after it's found to be valid.

(To be clear, I don't particularly believe that this is a problem worthy
of spending code space and cycles on. But if it's deemed to be a
problem, I want to see a solution that's actually watertight.)

Got the message.

Regards,
Nikhils

#9Daniel Farina
daniel@heroku.com
In reply to: Nikhil Sontakke (#1)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Wed, Nov 9, 2011 at 1:56 AM, Nikhil Sontakke <nikkhils@gmail.com> wrote:

Hi,

Consider the following sequence of events:

s1 #> CREATE SCHEMA test_schema;

s1 #> CREATE TABLE test_schema.c1(x int);

Now open another session s2 and via gdb issue a breakpoint on
heap_create_with_catalog() which is called by DefineRelation().

s2 #> CREATE TABLE test_schema.c2(y int);

The above will break on the function. Now issue a drop schema in session s1

s1 #> DROP SCHEMA test_schema CASCADE;
NOTICE:  drop cascades to table test_schema.c1
DROP SCHEMA

Continuing in gdb, also completes the creation of c2 table without any
errors. We are now left with a dangling entry in pg_class along with all the
corresponding data files in our data directory. The problem becomes worse if
c2 was created using a TABLESPACE. Now dropping of that tablespace does not
work at all. Am sure we can come up with myriad such other issues.

Hmm. Does this break pg_dump? I have reported a bug whereby dangling
pg_class entries point to a namespace that has since been dropped in
the past (and has been reported many times before that, even).

The bug report is here, whereby I also aggregate other similar bug
reports that have taken place over a very long period of time:

http://archives.postgresql.org/pgsql-bugs/2011-02/msg00185.php

Given that the schema is successfully dropped, yet another table is
created presumably using this already-resolved schema OID, it seems
like it would run into this...

You could run this query, which should return 0, but may not in your case:

select count(distinct typnamespace) from pg_type where not exists
(select 1 from pg_namespace where oid = pg_type.typnamespace);

--
fdr

#10Nikhil Sontakke
nikhil.sontakke@enterprisedb.com
In reply to: Daniel Farina (#9)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

Continuing in gdb, also completes the creation of c2 table without any
errors. We are now left with a dangling entry in pg_class along with all

the

corresponding data files in our data directory. The problem becomes

worse if

c2 was created using a TABLESPACE. Now dropping of that tablespace does

not

work at all. Am sure we can come up with myriad such other issues.

Hmm. Does this break pg_dump? I have reported a bug whereby dangling
pg_class entries point to a namespace that has since been dropped in
the past (and has been reported many times before that, even).

Sure does! I just tried it and got:
pg_dump: schema with OID 16384 does not exist

The bug report is here, whereby I also aggregate other similar bug
reports that have taken place over a very long period of time:

http://archives.postgresql.org/pgsql-bugs/2011-02/msg00185.php

I guess we DO need to pay attention to fix this properly now as there are
some reports with 9.x too. And I have just provided a way to reproduce this
reliably too.

Regards,
Nikhils

#11Robert Haas
robertmhaas@gmail.com
In reply to: Nikhil Sontakke (#6)
1 attachment(s)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Thu, Nov 10, 2011 at 7:00 AM, Nikhil Sontakke <nikkhils@gmail.com> wrote:

PFA, a patch against git head. We take the AccessShareLock lock on the
schema in DefineRelation now. Note that we do not want to interlock with
other concurrent creations in the schema. We only want to interlock with
deletion activity. So even performance wise this should not be much of an
overhead in case of concurrent DDL operations to the same schema.

If all you need to do is lock a schema, you can just call
LockDatabaseObject(NamespaceRelationId, namespace_oid, 0,
AccessShareLock); there's no need to fake up an objectaddress just to
take a lock. But I think that's not really all you need to do,
because somebody could drop the namespace between the time that you
decide what OID to lock and the time you acquire the lock. So I think
you need something like what we did in RangeVarGetRelid(). See
attached patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

lock-namespace.patchapplication/octet-stream; name=lock-namespace.patchDownload
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 6d4d4b1..519794a 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -463,31 +463,63 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 
 /*
  * RangeVarGetAndCheckCreationNamespace
- *		As RangeVarGetCreationNamespace, but with a permissions check.
+ *		As RangeVarGetCreationNamespace, but check permissions and lock the
+ *		target namespace.
  */
 Oid
 RangeVarGetAndCheckCreationNamespace(const RangeVar *newRelation)
 {
-	Oid			namespaceId;
-
-	namespaceId = RangeVarGetCreationNamespace(newRelation);
+	Oid			namespace_oid;
+	Oid			old_namespace_oid = InvalidOid;
+	bool		retry = false;
+	uint64		inval_count = SharedInvalidMessageCounter;
 
-	/*
-	 * Check we have permission to create there. Skip check if bootstrapping,
-	 * since permissions machinery may not be working yet.
-	 */
-	if (!IsBootstrapProcessingMode())
+	while (1)
 	{
 		AclResult	aclresult;
 
-		aclresult = pg_namespace_aclcheck(namespaceId, GetUserId(),
+		namespace_oid = RangeVarGetCreationNamespace(newRelation);
+
+		/*
+		 * In bootstrap mode, permissions machinery may not be working yet;
+		 * and there's no risk of concurrent activity, so we're done.
+		 */
+		if (IsBootstrapProcessingMode())
+			break;
+
+		/*
+		 * If the meaning of the name doesn't change on retry, we're done.
+		 * If it does, we'll have to release the lock we took previously
+		 * and lock the new referrent of the name instead.
+		 */
+		if (retry)
+		{
+			if (namespace_oid == old_namespace_oid)
+				break;
+			else
+				UnlockDatabaseObject(NamespaceRelationId, old_namespace_oid,
+									 0, AccessShareLock);
+		}
+
+		/* Check permissions. */
+		aclresult = pg_namespace_aclcheck(namespace_oid, GetUserId(),
 										  ACL_CREATE);
 		if (aclresult != ACLCHECK_OK)
 			aclcheck_error(aclresult, ACL_KIND_NAMESPACE,
-						   get_namespace_name(namespaceId));
+						   get_namespace_name(namespace_oid));
+
+		/* Lock target namespace against concurrent drop. */
+		LockDatabaseObject(NamespaceRelationId, namespace_oid, 0,
+						   AccessShareLock);
+
+		/* If nothing's changed under us, we are done.  Else, retry. */
+		if (inval_count == SharedInvalidMessageCounter)
+			break;
+		retry = true;
+		old_namespace_oid = namespace_oid;
 	}
 
-	return namespaceId;
+	return namespace_oid;
 }
 
 /*
#12Nikhil Sontakke
nikhil.sontakke@enterprisedb.com
In reply to: Robert Haas (#11)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

If all you need to do is lock a schema, you can just call
LockDatabaseObject(NamespaceRelationId, namespace_oid, 0,
AccessShareLock); there's no need to fake up an objectaddress just to
take a lock. But I think that's not really all you need to do,
because somebody could drop the namespace between the time that you
decide what OID to lock and the time you acquire the lock. So I think
you need something like what we did in RangeVarGetRelid(). See
attached patch.

Thanks Robert. But currently there are very few callers of
RangeVarGetAndCheckCreationNamespace() function. For the sake of
completeness we will have to introduce a call to this function while
creating all other objects too.

Regards,
Nikhils

Show quoted text

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Robert Haas
robertmhaas@gmail.com
In reply to: Nikhil Sontakke (#12)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Mon, Nov 14, 2011 at 11:48 AM, Nikhil Sontakke
<nikhil.sontakke@enterprisedb.com> wrote:

If all you need to do is lock a schema, you can just call
LockDatabaseObject(NamespaceRelationId, namespace_oid, 0,
AccessShareLock); there's no need to fake up an objectaddress just to
take a lock.  But I think that's not really all you need to do,
because somebody could drop the namespace between the time that you
decide what OID to lock and the time you acquire the lock.  So I think
you need something like what we did in RangeVarGetRelid().  See
attached patch.

Thanks Robert. But currently there are very few callers of
RangeVarGetAndCheckCreationNamespace() function. For the sake of
completeness we will have to introduce a call to this function while
creating all other objects too.

Well, RangeVarGetAndCheckCreationNamespace is only (and can only) be
used for relations. To get similar protection for other object types,
we'd need to add a similar logic elsewhere. I haven't looked at where
it would need to go.

In fact, I think that the technique demonstrated here (which was
pioneered by Noah Misch) is actually quite general, and there are
probably a lot of places where we need to be doing it but currently
are not. So it's probably going to take a while to get this
completely nailed down, but we can keep chipping away at it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Nikhil Sontakke
nikkhils@gmail.com
In reply to: Robert Haas (#13)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

So it's probably going to take a while to get this
completely nailed down, but we can keep chipping away at it.

Agreed. So are you planning to commit this change? Or we want some more
objects to be fixed? Last I looked at this, we will need locking to be done
while creating tables, views, types, sequences, functions, indexes,
extensions, constraints, operators stuff, ts stuff, rules, domains, etc.
that can go into schemas.

So might even make sense to write a schema specific function based on your
patch template to cater in general to schema locking during object creation.

Regards,
Nikhils

#15Robert Haas
robertmhaas@gmail.com
In reply to: Nikhil Sontakke (#14)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Mon, Nov 14, 2011 at 12:48 PM, Nikhil Sontakke <nikkhils@gmail.com> wrote:

So it's probably going to take a while to get this
completely nailed down, but we can keep chipping away at it.

Agreed. So are you planning to commit this change? Or we want some more
objects to be fixed? Last I looked at this, we will need locking to be done
while creating tables, views, types, sequences, functions, indexes,
extensions, constraints, operators stuff, ts stuff, rules, domains, etc.
that can go into schemas.

<reads the code>

Well, it looks to me like there are three different places that we
need to nail down: RangeVarGetAndCheckCreationNamespace() is used for
relations (except that a few places call RangeVarGetCreationNamespace
directly, which means my previous patch probably needs some tweaking
before commit), QualifiedNameGetCreationNamespace() is used for pretty
much all other schema-qualified objects, and LookupCreationNamespace()
is used for ALTER BLAH SET SCHEMA (which I think has a problem when
you rename an object into a schema that is concurrently being
dropped).

I'm fairly unhappy with the idea of modifying a function that is
described as doing a "get" or "lookup" to have the side effect of
"locking something". So probably some renaming or refactoring is in
order here. It seems like we're duplicating almost identical logic in
an awful lot of places in namespace.c.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#15)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

Excerpts from Robert Haas's message of lun nov 14 15:56:43 -0300 2011:

Well, it looks to me like there are three different places that we
need to nail down: RangeVarGetAndCheckCreationNamespace() is used for
relations (except that a few places call RangeVarGetCreationNamespace
directly, which means my previous patch probably needs some tweaking
before commit), QualifiedNameGetCreationNamespace() is used for pretty
much all other schema-qualified objects, and LookupCreationNamespace()
is used for ALTER BLAH SET SCHEMA (which I think has a problem when
you rename an object into a schema that is concurrently being
dropped).

I'm fairly unhappy with the idea of modifying a function that is
described as doing a "get" or "lookup" to have the side effect of
"locking something". So probably some renaming or refactoring is in
order here. It seems like we're duplicating almost identical logic in
an awful lot of places in namespace.c.

So RangeVarGetCheckAndLockCreationNamespace(), uh? Pity you can't
stick a comma in there.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#17Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#16)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Mon, Nov 14, 2011 at 2:26 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Robert Haas's message of lun nov 14 15:56:43 -0300 2011:

Well, it looks to me like there are three different places that we
need to nail down: RangeVarGetAndCheckCreationNamespace() is used for
relations (except that a few places call RangeVarGetCreationNamespace
directly, which means my previous patch probably needs some tweaking
before commit), QualifiedNameGetCreationNamespace() is used for pretty
much all other schema-qualified objects, and LookupCreationNamespace()
is used for ALTER BLAH SET SCHEMA (which I think has a problem when
you rename an object into a schema that is concurrently being
dropped).

I'm fairly unhappy with the idea of modifying a function that is
described as doing a "get" or "lookup" to have the side effect of
"locking something".  So probably some renaming or refactoring is in
order here.  It seems like we're duplicating almost identical logic in
an awful lot of places in namespace.c.

So RangeVarGetCheckAndLockCreationNamespace(), uh?  Pity you can't
stick a comma in there.

Yeah, really. :-)

Actually, I think that one could probably stay as-is. "Check" implies
that there's something else going on besides just a lookup, and we
can't go nuts with it. I'm more concerned about
QualifiedNameGetCreationNamespace() and LookupCreationNamespace().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Daniel Farina
daniel@heroku.com
In reply to: Robert Haas (#17)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Mon, Nov 14, 2011 at 12:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Nov 14, 2011 at 2:26 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Robert Haas's message of lun nov 14 15:56:43 -0300 2011:

Well, it looks to me like there are three different places that we
need to nail down: RangeVarGetAndCheckCreationNamespace() is used for
relations (except that a few places call RangeVarGetCreationNamespace
directly, which means my previous patch probably needs some tweaking
before commit), QualifiedNameGetCreationNamespace() is used for pretty
much all other schema-qualified objects, and LookupCreationNamespace()
is used for ALTER BLAH SET SCHEMA (which I think has a problem when
you rename an object into a schema that is concurrently being
dropped).

I'm fairly unhappy with the idea of modifying a function that is
described as doing a "get" or "lookup" to have the side effect of
"locking something".  So probably some renaming or refactoring is in
order here.  It seems like we're duplicating almost identical logic in
an awful lot of places in namespace.c.

So RangeVarGetCheckAndLockCreationNamespace(), uh?  Pity you can't
stick a comma in there.

Yeah, really.  :-)

Actually, I think that one could probably stay as-is.  "Check" implies
that there's something else going on besides just a lookup, and we
can't go nuts with it.  I'm more concerned about
QualifiedNameGetCreationNamespace() and LookupCreationNamespace().

Hmm, just to prod this thread: has any fix for this been committed?
After Nikhil confirmed that this bug could cause pg_dump to not be
able to operate without direct catalog surgery I have encountered this
bug (and treated its symptoms in the same manner) twice. I tried to
do my homework in a backbranch, but am not seeing anything beaming out
at me.

Cheers,

--
fdr

#19Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Farina (#18)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Tue, Nov 29, 2011 at 3:37 AM, Daniel Farina <daniel@heroku.com> wrote:

Hmm, just to prod this thread: has any fix for this been committed?
After Nikhil confirmed that this bug could cause pg_dump to not be
able to operate without direct catalog surgery I have encountered this
bug (and treated its symptoms in the same manner) twice.  I tried to
do my homework in a backbranch, but am not seeing anything beaming out
at me.

I have plans to try to improve this, but it's one of those things that
I care about more than the people who write the checks do, so it
hasn't quite gotten to the top of the priority list yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#19)
1 attachment(s)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Tue, Nov 29, 2011 at 10:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 29, 2011 at 3:37 AM, Daniel Farina <daniel@heroku.com> wrote:

Hmm, just to prod this thread: has any fix for this been committed?
After Nikhil confirmed that this bug could cause pg_dump to not be
able to operate without direct catalog surgery I have encountered this
bug (and treated its symptoms in the same manner) twice.  I tried to
do my homework in a backbranch, but am not seeing anything beaming out
at me.

I have plans to try to improve this, but it's one of those things that
I care about more than the people who write the checks do, so it
hasn't quite gotten to the top of the priority list yet.

All right... I have a patch that I think fixes this, at least so far
as relations are concerned. I rewrote
RangeVarGetAndCheckCreationNamespace() extensively, did surgery on its
callers, and then modified CREATE OR REPLACE VIEW and ALTER <relkind>
.. SET SCHEMA to make use of it rather than doing things as they did
before.

So this patch prevents (1) concurrently dropping a schema in which a
new relation is being created, (2) concurrently dropping a schema into
which an existing relation is being moved, and (3) using CREATE OR
REPLACE VIEW to attempt to obtain a lock on a relation you don't own
(the command would eventually fail, of course, but if there were, say,
an outstanding AccessShareLock on the relation, you'd queue up for the
lock and thus prevent any further locks from being granted, despite
your lack of any rights on the target).

It doesn't fix any of the non-relation object types. That would be
nice to do, but I think it's material for a separate patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

concurrent-drop-schema.patchapplication/octet-stream; name=concurrent-drop-schema.patchDownload
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index a9a64fe..80d6fc7 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -480,31 +480,131 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 
 /*
  * RangeVarGetAndCheckCreationNamespace
- *		As RangeVarGetCreationNamespace, but with a permissions check.
+ *
+ * This function returns the OID of the namespace in which a new relation
+ * with a given name should be created.  If the user does not have CREATE
+ * permission on the target namespace, this function will instead signal
+ * an ERROR.
+ *
+ * If non-NULL, *existing_oid is set to the OID of any existing relation with
+ * the same name which already exists in that namespace, or to InvalidOid if
+ * no such relation exists.
+ *
+ * If lockmode != NoLock, the specified lock mode is acquire on the existing
+ * relation, if any, provided that the current user owns the target relation.
+ * However, if lockmode != NoLock and the user does not own the target
+ * relation, we throw an ERROR, as we must not try to lock relations the
+ * user does not have permissions on.
+ *
+ * As a side effect, this function acquires AccessShareLock on the target
+ * namespace.  Without this, the namespace could be dropped before our
+ * transaction commits, leaving behind relations with relnamespace pointing
+ * to a no-longer-exstant namespace.
+ *
+ * As a further side-effect, if the select namespace is a temporary namespace,
+ * we mark the RangeVar as RELPERSISTENCE_TEMP.
  */
 Oid
-RangeVarGetAndCheckCreationNamespace(const RangeVar *newRelation)
+RangeVarGetAndCheckCreationNamespace(RangeVar *relation,
+									 LOCKMODE lockmode,
+									 Oid *existing_relation_id)
 {
-	Oid			namespaceId;
+	uint64		inval_count;
+	Oid			relid;
+	Oid			oldrelid = InvalidOid;
+	Oid			nspid;
+	Oid			oldnspid = InvalidOid;
+	bool		retry = false;
 
-	namespaceId = RangeVarGetCreationNamespace(newRelation);
+	/*
+	 * We check the catalog name and then ignore it.
+	 */
+	if (relation->catalogname)
+	{
+		if (strcmp(relation->catalogname, get_database_name(MyDatabaseId)) != 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cross-database references are not implemented: \"%s.%s.%s\"",
+							relation->catalogname, relation->schemaname,
+							relation->relname)));
+	}
 
 	/*
-	 * Check we have permission to create there. Skip check if bootstrapping,
-	 * since permissions machinery may not be working yet.
+	 * As in RangeVarGetRelidExtended(), we guard against concurrent DDL
+	 * operations by tracking whether any invalidation messages are processed
+	 * while we're doing the name lookups and acquiring locks.  See comments
+	 * in that function for a more detailed explanation of this logic.
 	 */
-	if (!IsBootstrapProcessingMode())
+	for (;;)
 	{
 		AclResult	aclresult;
 
-		aclresult = pg_namespace_aclcheck(namespaceId, GetUserId(),
-										  ACL_CREATE);
+		inval_count = SharedInvalidMessageCounter;
+
+		/* Look up creation namespace and check for existing relation. */
+		nspid = RangeVarGetCreationNamespace(relation);
+		Assert(OidIsValid(nspid));
+		if (existing_relation_id != NULL)
+			relid = get_relname_relid(relation->relname, nspid);
+		else
+			relid = InvalidOid;
+
+		/*
+		 * In bootstrap processing mode, we don't bother with permissions
+		 * or locking.  Permissions might not be working yet, and locking is
+		 * unnecessary.
+		 */
+		if (IsBootstrapProcessingMode())
+			break;
+
+		/* Check namespace permissions. */
+		aclresult = pg_namespace_aclcheck(nspid, GetUserId(), ACL_CREATE);
 		if (aclresult != ACLCHECK_OK)
 			aclcheck_error(aclresult, ACL_KIND_NAMESPACE,
-						   get_namespace_name(namespaceId));
+						   get_namespace_name(nspid));
+
+		if (retry)
+		{
+			/* If nothing changed, we're done. */
+			if (relid == oldrelid && nspid == oldnspid)
+				break;
+			/* If creation namespace has changed, give up old lock. */
+			if (nspid != oldnspid)
+				UnlockDatabaseObject(NamespaceRelationId, oldnspid, 0,
+									 AccessShareLock);
+			/* If name points to something different, give up old lock. */
+			if (relid != oldrelid && OidIsValid(oldrelid) && lockmode != NoLock)
+				UnlockRelationOid(oldrelid, lockmode);
+		}
+
+		/* Lock namespace. */
+		if (nspid != oldnspid)
+			LockDatabaseObject(NamespaceRelationId, nspid, 0, AccessShareLock);
+
+		/* Lock relation, if required if and we have permission. */
+		if (lockmode != NoLock && OidIsValid(relid))
+		{
+			if (!pg_class_ownercheck(relid, GetUserId()))
+				aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS,
+							   relation->relname);
+			if (relid != oldrelid)
+				LockRelationOid(relid, lockmode);
+		}
+
+		/* If no invalidation message were processed, we're done! */
+		if (inval_count == SharedInvalidMessageCounter)
+			break;
+
+		/* Something may have changed, so recheck our work. */
+		retry = true;
+		oldrelid = relid;
+		oldnspid = nspid;
 	}
 
-	return namespaceId;
+	RangeVarAdjustRelationPersistence(relation, nspid);
+	if (existing_relation_id != NULL)
+		*existing_relation_id = relid;
+	return nspid;
 }
 
 /*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c373016..d0843b2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -451,10 +451,12 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
 
 	/*
 	 * Look up the namespace in which we are supposed to create the relation,
-	 * and check we have permission to create there.
+	 * check we have permission to create there, lock it against concurrent
+	 * drop, and mark stmt->relation as RELPERSISTENCE_TEMP if a temporary
+	 * namespace is selected.
 	 */
-	namespaceId = RangeVarGetAndCheckCreationNamespace(stmt->relation);
-	RangeVarAdjustRelationPersistence(stmt->relation, namespaceId);
+	namespaceId =
+		RangeVarGetAndCheckCreationNamespace(stmt->relation, NoLock, NULL);
 
 	/*
 	 * Security check: disallow creating temp tables from security-restricted
@@ -9417,6 +9419,7 @@ AlterTableNamespace(AlterObjectSchemaStmt *stmt)
 	Oid			oldNspOid;
 	Oid			nspOid;
 	Relation	classRel;
+	RangeVar   *newrv;
 
 	relid = RangeVarGetRelidExtended(stmt->relation, AccessExclusiveLock,
 									 false, false,
@@ -9441,8 +9444,9 @@ AlterTableNamespace(AlterObjectSchemaStmt *stmt)
 						get_rel_name(tableId))));
 	}
 
-	/* get schema OID and check its permissions */
-	nspOid = LookupCreationNamespace(stmt->newschema);
+	/* Get and lock schema OID and check its permissions. */
+	newrv = makeRangeVar(stmt->newschema, RelationGetRelationName(rel), -1);
+	nspOid = RangeVarGetAndCheckCreationNamespace(newrv, NoLock, NULL);
 
 	/* common checks on switching namespaces */
 	CheckSetNamespace(oldNspOid, nspOid, RelationRelationId, relid);
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 0f8af31..0043bf1 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -2005,7 +2005,8 @@ DefineCompositeType(const RangeVar *typevar, List *coldeflist)
 	 * check is here mainly to get a better error message about a "type"
 	 * instead of below about a "relation".
 	 */
-	typeNamespace = RangeVarGetCreationNamespace(createStmt->relation);
+	typeNamespace = RangeVarGetAndCheckCreationNamespace(createStmt->relation,
+														 NoLock, NULL);
 	RangeVarAdjustRelationPersistence(createStmt->relation, typeNamespace);
 	old_type_oid =
 		GetSysCacheOid2(TYPENAMENSP,
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index ff9c449..c3520ae 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -98,10 +98,12 @@ isViewOnTempTable_walker(Node *node, void *context)
  *---------------------------------------------------------------------
  */
 static Oid
-DefineVirtualRelation(const RangeVar *relation, List *tlist, bool replace,
-					  Oid namespaceId, List *options)
+DefineVirtualRelation(RangeVar *relation, List *tlist, bool replace,
+					  List *options)
 {
 	Oid			viewOid;
+	Oid			namespaceId;
+	LOCKMODE	lockmode;
 	CreateStmt *createStmt = makeNode(CreateStmt);
 	List	   *attrList;
 	ListCell   *t;
@@ -159,9 +161,14 @@ DefineVirtualRelation(const RangeVar *relation, List *tlist, bool replace,
 				 errmsg("view must have at least one column")));
 
 	/*
-	 * Check to see if we want to replace an existing view.
+	 * Look up, check permissions on, and lock the creation namespace; also
+	 * check for a preexisting view with the same name.  This will also set
+	 * relation->relpersistence to RELPERSISTENCE_TEMP if the selected
+	 * namespace is temporary.
 	 */
-	viewOid = get_relname_relid(relation->relname, namespaceId);
+	lockmode = replace ? AccessExclusiveLock : NoLock;
+	namespaceId =
+		RangeVarGetAndCheckCreationNamespace(relation, lockmode, &viewOid);
 
 	if (OidIsValid(viewOid) && replace)
 	{
@@ -170,24 +177,16 @@ DefineVirtualRelation(const RangeVar *relation, List *tlist, bool replace,
 		List	   *atcmds = NIL;
 		AlterTableCmd *atcmd;
 
-		/*
-		 * Yes.  Get exclusive lock on the existing view ...
-		 */
-		rel = relation_open(viewOid, AccessExclusiveLock);
+		/* Relation is already locked, but we must build a relcache entry. */
+		rel = relation_open(viewOid, NoLock);
 
-		/*
-		 * Make sure it *is* a view, and do permissions checks.
-		 */
+		/* Make sure it *is* a view. */
 		if (rel->rd_rel->relkind != RELKIND_VIEW)
 			ereport(ERROR,
 					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 					 errmsg("\"%s\" is not a view",
 							RelationGetRelationName(rel))));
 
-		if (!pg_class_ownercheck(viewOid, GetUserId()))
-			aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS,
-						   RelationGetRelationName(rel));
-
 		/* Also check it's not in use already */
 		CheckTableNotInUse(rel, "CREATE OR REPLACE VIEW");
 
@@ -428,7 +427,6 @@ DefineView(ViewStmt *stmt, const char *queryString)
 {
 	Query	   *viewParse;
 	Oid			viewOid;
-	Oid			namespaceId;
 	RangeVar   *view;
 
 	/*
@@ -514,10 +512,6 @@ DefineView(ViewStmt *stmt, const char *queryString)
 						view->relname)));
 	}
 
-	/* Might also need to make it temporary if placed in temp schema. */
-	namespaceId = RangeVarGetCreationNamespace(view);
-	RangeVarAdjustRelationPersistence(view, namespaceId);
-
 	/*
 	 * Create the view relation
 	 *
@@ -525,7 +519,7 @@ DefineView(ViewStmt *stmt, const char *queryString)
 	 * aborted.
 	 */
 	viewOid = DefineVirtualRelation(view, viewParse->targetList,
-									stmt->replace, namespaceId, stmt->options);
+									stmt->replace, stmt->options);
 
 	/*
 	 * The relation we have just created is not visible to any other commands
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 569d0ba..422f737 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2532,11 +2532,13 @@ OpenIntoRel(QueryDesc *queryDesc)
 	}
 
 	/*
-	 * Find namespace to create in, check its permissions
+	 * Find namespace to create in, check its permissions, lock it against
+	 * concurrent drop, and mark into->rel as RELPERSISTENCE_TEMP if the
+	 * selected namespace is temporary.
 	 */
 	intoName = into->rel->relname;
-	namespaceId = RangeVarGetAndCheckCreationNamespace(into->rel);
-	RangeVarAdjustRelationPersistence(into->rel, namespaceId);
+	namespaceId = RangeVarGetAndCheckCreationNamespace(into->rel, NoLock,
+													   NULL);
 
 	/*
 	 * Security check: disallow creating temp tables from security-restricted
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 335bdc6..99157c5 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -146,6 +146,7 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	List	   *save_alist;
 	ListCell   *elements;
 	Oid			namespaceid;
+	Oid			existing_relid;
 
 	/*
 	 * We must not scribble on the passed-in CreateStmt, so copy it.  (This is
@@ -155,30 +156,25 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 
 	/*
 	 * Look up the creation namespace.	This also checks permissions on the
-	 * target namespace, so that we throw any permissions error as early as
-	 * possible.
+	 * target namespace, locks it against concurrent drops, checks for a
+	 * preexisting relation in that namespace with the same name, and updates
+	 * stmt->relation->relpersistence if the select namespace is temporary.
 	 */
-	namespaceid = RangeVarGetAndCheckCreationNamespace(stmt->relation);
-	RangeVarAdjustRelationPersistence(stmt->relation, namespaceid);
+	namespaceid =
+		RangeVarGetAndCheckCreationNamespace(stmt->relation, NoLock,
+											 &existing_relid);
 
 	/*
 	 * If the relation already exists and the user specified "IF NOT EXISTS",
 	 * bail out with a NOTICE.
 	 */
-	if (stmt->if_not_exists)
+	if (stmt->if_not_exists && OidIsValid(existing_relid))
 	{
-		Oid			existing_relid;
-
-		existing_relid = get_relname_relid(stmt->relation->relname,
-										   namespaceid);
-		if (existing_relid != InvalidOid)
-		{
-			ereport(NOTICE,
-					(errcode(ERRCODE_DUPLICATE_TABLE),
-					 errmsg("relation \"%s\" already exists, skipping",
-							stmt->relation->relname)));
-			return NIL;
-		}
+		ereport(NOTICE,
+				(errcode(ERRCODE_DUPLICATE_TABLE),
+				 errmsg("relation \"%s\" already exists, skipping",
+						stmt->relation->relname)));
+		return NIL;
 	}
 
 	/*
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 37b259d..fa3ba5b 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -58,7 +58,9 @@ extern Oid	RangeVarGetRelidExtended(const RangeVar *relation,
 						 RangeVarGetRelidCallback callback,
 						 void *callback_arg);
 extern Oid	RangeVarGetCreationNamespace(const RangeVar *newRelation);
-extern Oid	RangeVarGetAndCheckCreationNamespace(const RangeVar *newRelation);
+extern Oid	RangeVarGetAndCheckCreationNamespace(RangeVar *newRelation,
+									 LOCKMODE lockmode,
+									 Oid *existing_relation_id);
 extern void RangeVarAdjustRelationPersistence(RangeVar *newRelation, Oid nspid);
 extern Oid	RelnameGetRelid(const char *relname);
 extern bool RelationIsVisible(Oid relid);
#21Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Robert Haas (#20)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Fri, Jan 13, 2012 at 2:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 29, 2011 at 10:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I have plans to try to improve this, but it's one of those things that
I care about more than the people who write the checks do, so it
hasn't quite gotten to the top of the priority list yet.

All right... I have a patch that I think fixes this, at least so far
as relations are concerned.  I rewrote
RangeVarGetAndCheckCreationNamespace() extensively, did surgery on its
callers, and then modified CREATE OR REPLACE VIEW and ALTER <relkind>
.. SET SCHEMA to make use of it rather than doing things as they did
before.

So this patch prevents (1) concurrently dropping a schema in which a
new relation is being created, (2) concurrently dropping a schema into
which an existing relation is being moved, and (3) using CREATE OR
REPLACE VIEW to attempt to obtain a lock on a relation you don't own
(the command would eventually fail, of course, but if there were, say,
an outstanding AccessShareLock on the relation, you'd queue up for the
lock and thus prevent any further locks from being granted, despite
your lack of any rights on the target).

The patch looks ok, though I wonder if we could have a way to release
the lock on namespace much before the end of transaction. Since the
lock is held all the time, now the possibility of dead lock is bigger.
Say,

-- tx1
BEGIN;
SELECT * FROM s.x;
DROP SCHEMA t;

-- tx2
BEGIN;
SELECT * FROM t.y;
DROP SCHEMA s;

I know it's a limited situation, though. Maybe the right way would be
to check the namespace at the end of the transaction if any object is
created, rather than locking it.

It doesn't fix any of the non-relation object types.  That would be
nice to do, but I think it's material for a separate patch.

I took a quick look under src/include/catalog and the objects that
have namespace are

- collation
- constraint
- conversion
- extension
- operator
- operator class
- operator family
- function (proc)
- ts_config
- ts_dict
- ts_parser
- ts_template
- (what's missing?)

I agree with you that it's not worth doing everything, but function is
nice to have. I don't mind if we don't have it with the other objects.

Thanks,
--
Hitoshi Harada

#22Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Hitoshi Harada (#21)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Sat, Jan 14, 2012 at 2:25 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:

On Fri, Jan 13, 2012 at 2:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 29, 2011 at 10:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I have plans to try to improve this, but it's one of those things that
I care about more than the people who write the checks do, so it
hasn't quite gotten to the top of the priority list yet.

All right... I have a patch that I think fixes this, at least so far
as relations are concerned.  I rewrote
RangeVarGetAndCheckCreationNamespace() extensively, did surgery on its
callers, and then modified CREATE OR REPLACE VIEW and ALTER <relkind>
.. SET SCHEMA to make use of it rather than doing things as they did
before.

So this patch prevents (1) concurrently dropping a schema in which a
new relation is being created, (2) concurrently dropping a schema into
which an existing relation is being moved, and (3) using CREATE OR
REPLACE VIEW to attempt to obtain a lock on a relation you don't own
(the command would eventually fail, of course, but if there were, say,
an outstanding AccessShareLock on the relation, you'd queue up for the
lock and thus prevent any further locks from being granted, despite
your lack of any rights on the target).

The patch looks ok, though I wonder if we could have a way to release
the lock on namespace much before the end of transaction. Since the
lock is held all the time, now the possibility of dead lock is bigger.
Say,

-- tx1
BEGIN;
SELECT * FROM s.x;
DROP SCHEMA t;

-- tx2
BEGIN;
SELECT * FROM t.y;
DROP SCHEMA s;

Although I wrote

I know it's a limited situation, though. Maybe the right way would be
to check the namespace at the end of the transaction if any object is
created, rather than locking it.

actually what's surprising to me is that even SELECT holds lock on
namespace to the end of transaction. The ideal way is that we lock
only on CREATE or other DDL.

Thanks,
--
Hitoshi Harada

#23Robert Haas
robertmhaas@gmail.com
In reply to: Hitoshi Harada (#21)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Sat, Jan 14, 2012 at 5:25 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:

The patch looks ok, though I wonder if we could have a way to release
the lock on namespace much before the end of transaction.

Well, that wold kind of miss the point, wouldn't it? I mean, the race
is that the process dropping the schema might not see the newly
created object. The only way to prevent that is to hold some sort of
lock until the newly created object is visible, which doesn't happen
until commit.

I know it's a limited situation, though. Maybe the right way would be
to check the namespace at the end of the transaction if any object is
created, rather than locking it.

And then what? If you find that you created an object in a namespace
that's been subsequently dropped, what do you do about that? As far
as I can see, your own really choice would be to roll back the
transaction that uncovers this problem, which is likely to produce
more rollbacks than just letting the deadlock detector sort it out.

It doesn't fix any of the non-relation object types.  That would be
nice to do, but I think it's material for a separate patch.

I took a quick look under src/include/catalog and the objects that
have namespace are

- collation
- constraint
- conversion
- extension
- operator
- operator class
- operator family
- function (proc)
- ts_config
- ts_dict
- ts_parser
- ts_template
- (what's missing?)

I agree with you that it's not worth doing everything, but function is
nice to have. I don't mind if we don't have it with the other objects.

I think the fix for all of them will be very symmetrical; it's just
relations that have a different code path. I don't see the point of
plugging some but not others; that just provides a mixture of good and
bad examples for future hackers to copy, which doesn't seem ideal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#24Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Robert Haas (#23)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Sat, Jan 14, 2012 at 6:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jan 14, 2012 at 5:25 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:

The patch looks ok, though I wonder if we could have a way to release
the lock on namespace much before the end of transaction.

Well, that wold kind of miss the point, wouldn't it?  I mean, the race
is that the process dropping the schema might not see the newly
created object.  The only way to prevent that is to hold some sort of
lock until the newly created object is visible, which doesn't happen
until commit.

I know it's a limited situation, though. Maybe the right way would be
to check the namespace at the end of the transaction if any object is
created, rather than locking it.

And then what?  If you find that you created an object in a namespace
that's been subsequently dropped, what do you do about that?  As far
as I can see, your own really choice would be to roll back the
transaction that uncovers this problem, which is likely to produce
more rollbacks than just letting the deadlock detector sort it out.

Right. I thought this patch introduced lock on schema in SELECT, but
actually we'd been locking schema with SELECT since before the patch.
So the behavior becomes consistent (between SELECT and CREATE) now
with it. And I agree that my insist is pointless after looking more.

Thanks,
--
Hitoshi Harada

#25Robert Haas
robertmhaas@gmail.com
In reply to: Hitoshi Harada (#24)
Re: Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers

On Sun, Jan 15, 2012 at 11:03 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:

On Sat, Jan 14, 2012 at 6:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jan 14, 2012 at 5:25 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:

The patch looks ok, though I wonder if we could have a way to release
the lock on namespace much before the end of transaction.

Well, that wold kind of miss the point, wouldn't it?  I mean, the race
is that the process dropping the schema might not see the newly
created object.  The only way to prevent that is to hold some sort of
lock until the newly created object is visible, which doesn't happen
until commit.

I know it's a limited situation, though. Maybe the right way would be
to check the namespace at the end of the transaction if any object is
created, rather than locking it.

And then what?  If you find that you created an object in a namespace
that's been subsequently dropped, what do you do about that?  As far
as I can see, your own really choice would be to roll back the
transaction that uncovers this problem, which is likely to produce
more rollbacks than just letting the deadlock detector sort it out.

Right. I thought this patch introduced lock on schema in SELECT, but
actually we'd been locking schema with SELECT since before the patch.

No, we lock the table with SELECT, not the schema. This doesn't add
any additional locking for DML: nor is any needed, AFAICS.

So the behavior becomes consistent (between SELECT and CREATE) now
with it. And I agree that my insist is pointless after looking more.

OK. Thanks for your review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company