a modest improvement to get_object_address()

Started by Robert Haasabout 14 years ago11 messages
#1Robert Haas
robertmhaas@gmail.com
1 attachment(s)

I'd like to propose the attached patch, which changes
get_object_address() in a manner similar to what we did in
RangeVarGetRelid() in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7.
The basic idea is that, if we look up an object name, acquire the
corresponding lock, and then find that the object was dropped during
the lock wait, we retry the whole operation instead of emitting a
baffling error message. Example:

rhaas=# create schema x;
CREATE SCHEMA
rhaas=# begin;
BEGIN
rhaas=# drop schema x;
DROP SCHEMA

Then, in another session:

rhaas=# comment on schema x is 'doodle';

Then, in the first session:

rhaas=# commit;
COMMIT

At this point, the first session must error out. The current code
produces this:

ERROR: cache lookup failed for class 2615 object 16386 subobj 0

With the attached patch, you instead get:

ERROR: schema "x" does not exist

...which is obviously quite a bit nicer.

Also, if the concurrent transaction drops and creates the schema
instead of just dropping it, the new code will allow the operation to
succeed (with the expected results) rather than failing.

Objections?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

objectaddress-retry.patchapplication/octet-stream; name=objectaddress-retry.patchDownload
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index ec4c987..94135fd 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -281,6 +281,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
 	/* Some kind of lock must be taken. */
 	Assert(lockmode != NoLock);
 
+retry:
 	switch (objtype)
 	{
 		case OBJECT_INDEX:
@@ -432,10 +433,14 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
 			LockSharedObject(address.classId, address.objectId, 0, lockmode);
 		else
 			LockDatabaseObject(address.classId, address.objectId, 0, lockmode);
-		/* Did it go away while we were waiting for the lock? */
+		/*
+		 * Did it go away while we were waiting for the lock?  If so, look up
+		 * the name again.  This will either throw a better error message than
+		 * we could generate otherwise, or it will find the new object that
+		 * now has the same name as the dropped one.
+		 */
 		if (!object_exists(address))
-			elog(ERROR, "cache lookup failed for class %u object %u subobj %d",
-				 address.classId, address.objectId, address.objectSubId);
+			goto retry;
 	}
 
 	/* Return the object address and the relation. */
#2Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#1)
Re: a modest improvement to get_object_address()

2011/11/9 Robert Haas <robertmhaas@gmail.com>:

I'd like to propose the attached patch, which changes
get_object_address() in a manner similar to what we did in
RangeVarGetRelid() in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7.
 The basic idea is that, if we look up an object name, acquire the
corresponding lock, and then find that the object was dropped during
the lock wait, we retry the whole operation instead of emitting a
baffling error message.  Example:

rhaas=# create schema x;
CREATE SCHEMA
rhaas=# begin;
BEGIN
rhaas=# drop schema x;
DROP SCHEMA

Then, in another session:

rhaas=# comment on schema x is 'doodle';

Then, in the first session:

rhaas=# commit;
COMMIT

At this point, the first session must error out.  The current code
produces this:

ERROR:  cache lookup failed for class 2615 object 16386 subobj 0

With the attached patch, you instead get:

ERROR:  schema "x" does not exist

...which is obviously quite a bit nicer.

Also, if the concurrent transaction drops and creates the schema
instead of just dropping it, the new code will allow the operation to
succeed (with the expected results) rather than failing.

Objections?

Maybe I miss something but:
The ERROR message is misleading: the schema 'x' does exist. And also
why a drop schema would fail and a drop+create would success ?!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

#3Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#2)
Re: a modest improvement to get_object_address()

On Wed, Nov 9, 2011 at 8:37 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

Maybe I miss something but:
The ERROR message is misleading:  the schema 'x' does exist.

No, it doesn't. The concurrent transaction has dropped it.

And also
why a drop schema would fail and a drop+create would success ?!

Because you can't comment on a schema that doesn't exist any more, but
you can comment on one that does.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#1)
Re: a modest improvement to get_object_address()

Robert Haas <robertmhaas@gmail.com> writes:

I'd like to propose the attached patch, which changes
get_object_address() in a manner similar to what we did in
RangeVarGetRelid() in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7.

I would think you need to drop the now-useless lock, and I sure hope
that RangeVarGetRelid does likewise.

regards, tom lane

#5Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: a modest improvement to get_object_address()

On Wed, Nov 9, 2011 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'd like to propose the attached patch, which changes
get_object_address() in a manner similar to what we did in
RangeVarGetRelid() in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7.

I would think you need to drop the now-useless lock, and I sure hope
that RangeVarGetRelid does likewise.

It doesn't currently. The now-useless lock doesn't really hurt
anything, aside from taking up space in the lock table. But we can
certainly make it (and this) do that, if you think it's worth the
extra lines of code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#5)
Re: a modest improvement to get_object_address()

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Nov 9, 2011 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I would think you need to drop the now-useless lock, and I sure hope
that RangeVarGetRelid does likewise.

It doesn't currently. The now-useless lock doesn't really hurt
anything, aside from taking up space in the lock table.

Well, there are corner cases where the object OID gets reused during
the lifetime of the transaction, and then the lock *does* do something
(and what it does would be bad). But taking up extra space in the
finite-size lock table is sufficient reason IMO to drop the lock.
It's not like these are performance-critical code paths.

regards, tom lane

#7Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#3)
Re: a modest improvement to get_object_address()

2011/11/9 Robert Haas <robertmhaas@gmail.com>:

On Wed, Nov 9, 2011 at 8:37 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

Maybe I miss something but:

I read that the error was produced by first session and didn't check
carefuly (it fails silently in 9.0! and 'works' as expected in 9.1)

No objection, but I would like to still be able to diagnose the same
things as in the past, can you make it clear that the schema/object
just disappear ? (else we don't know if the relation just never exists
or was drop while we were waiting)

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

#8Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#6)
Re: a modest improvement to get_object_address()

On Wed, Nov 9, 2011 at 10:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Nov 9, 2011 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I would think you need to drop the now-useless lock, and I sure hope
that RangeVarGetRelid does likewise.

It doesn't currently.  The now-useless lock doesn't really hurt
anything, aside from taking up space in the lock table.

Well, there are corner cases where the object OID gets reused during
the lifetime of the transaction, and then the lock *does* do something
(and what it does would be bad).  But taking up extra space in the
finite-size lock table is sufficient reason IMO to drop the lock.
It's not like these are performance-critical code paths.

OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#7)
Re: a modest improvement to get_object_address()

On Wed, Nov 9, 2011 at 10:50 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

2011/11/9 Robert Haas <robertmhaas@gmail.com>:

On Wed, Nov 9, 2011 at 8:37 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

Maybe I miss something but:

I read that the error was produced by first session and didn't check
carefuly (it fails silently in 9.0! and 'works' as expected in 9.1)

No objection, but I would like to still be able to diagnose the same
things as in the past, can you make it clear that the schema/object
just disappear ? (else we don't know if the relation just never exists
or was drop while we were waiting)

I don't see a clean way to do that, and I'm not convinced it's a good
idea anyway. I think that if we start generating different error
messages based on whether or not a lock wait was involved at some
point in the operation, we're going to drive ourselves nuts. There
are a lot of places where that can happen.

e.g. Suppose that you have a table with a unique index on column a.
Transaction A deletes the tuple where a = 1. Transaction B attempts to
insert a new tuple with a = 1, and blocks. Now if A commits, B will
succeed, but if A rolls back, B will abort. Had transaction A not
existed, B would simply abort at once. But the error message will not
indicate which of the two it was, and I don't thinkit needs to.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#9)
Re: a modest improvement to get_object_address()

Robert Haas <robertmhaas@gmail.com> writes:

e.g. Suppose that you have a table with a unique index on column a.
Transaction A deletes the tuple where a = 1. Transaction B attempts to

That's DML, I agree with you there, no need. In DML we have MVCC.

Back to the problem you raised, it's DDL and we're sitting in between
SnapshotNow and catalog cache entries. Not so comfy. I would guess
that the problem (I confess didn't read carefully enough) happens after
having done a cache lookup when trying to use its result?

Could we check the object still exists as part of the cache lookup, or
would that mean we don't have a cache anymore? Or is the answer related
to consuming invalidation messages before returning a stale entry from
the cache?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#11Robert Haas
robertmhaas@gmail.com
In reply to: Dimitri Fontaine (#10)
Re: a modest improvement to get_object_address()

On Wed, Nov 9, 2011 at 3:40 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:

Back to the problem you raised, it's DDL and we're sitting in between
SnapshotNow and catalog cache entries.  Not so comfy.  I would guess
that the problem (I confess didn't read carefully enough) happens after
having done a cache lookup when trying to use its result?

There's a test case in the original post, but yes, the problem happens
when something changes between the time you do the catcache lookup and
the time you acquire the lock. This is not a new problem; I'm just
trying to give a more intelligible error message - and avoid
unnecessary failures, as in the case where two concurrent DROP IF
EXISTS operations target the same object and one of them unnecessarily
rolls back.

Could we check the object still exists as part of the cache lookup, or
would that mean we don't have a cache anymore?  Or is the answer related
to consuming invalidation messages before returning a stale entry from
the cache?

All of that is way beyond the scope of what I'm doing here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company