ALTER TABLE deadlock with concurrent INSERT

Started by Joe Conwayalmost 15 years ago8 messages
#1Joe Conway
mail@joeconway.com

I'm working with a client on an application upgrade script which
executes a function to conditionally do an:

ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE baz

If this is run while the application is concurrently doing inserts into
foo, we are occasionally seeing deadlocks. Aside from the fact that they
are better off not altering the table amid concurrent inserts, I'm
trying to understand why this is even able to happen. I expect one to
block the other, not a deadlock.

This is 8.4.1 (I know, I know, I have advised strongly that they upgrade
to 8.4.latest).

We have not been able to repeat this forcibly. Here is what the log shows:
------------------------------
2011-02-25 14:38:07 PST [31686]: [1-1] ERROR: deadlock detected
2011-02-25 14:38:07 PST [31686]: [2-1] DETAIL: Process 31686 waits for
AccessExclusiveLock on relation 16896 of database 16386; blocked by
process 31634.
Process 31634 waits for RowExclusiveLock on relation 16902 of
database 16386; blocked by process 31686.
Process 31686: SELECT change_column_type('attribute_summary',
'sequence_number', 'numeric');
Process 31634: insert into attribute_summary (attribute_value,
sequence_number, attribute_id) values ($1, $2, $3)
2011-02-25 14:38:07 PST [31686]: [3-1] HINT: See server log for query
details.
2011-02-25 14:38:07 PST [31686]: [4-1] CONTEXT: SQL statement "ALTER
TABLE attribute_summary ALTER COLUMN sequence_number SET DATA TYPE numeric"
PL/pgSQL function "change_column_type" line 18 at EXECUTE statement
2011-02-25 14:38:07 PST [31686]: [5-1] STATEMENT: SELECT
change_column_type('attribute_summary', 'sequence_number', 'numeric');
------------------------------

Reviewing the release notes, I see some marginally related commits, but
nothing that jumps out to me as a specific fix. Thoughts?

Thanks,

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#1)
Re: ALTER TABLE deadlock with concurrent INSERT

Joe Conway <mail@joeconway.com> writes:

I'm working with a client on an application upgrade script which
executes a function to conditionally do an:

ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE baz

If this is run while the application is concurrently doing inserts into
foo, we are occasionally seeing deadlocks. Aside from the fact that they
are better off not altering the table amid concurrent inserts, I'm
trying to understand why this is even able to happen. I expect one to
block the other, not a deadlock.

Looks like the process trying to do the ALTER has already got some
lower-level lock on the table. It evidently hasn't got
AccessExclusiveLock, but nonetheless has something strong enough to
block an INSERT, such as ShareLock.

regards, tom lane

#3Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#2)
Re: ALTER TABLE deadlock with concurrent INSERT

On 03/02/2011 12:41 PM, Tom Lane wrote:

Looks like the process trying to do the ALTER has already got some
lower-level lock on the table. It evidently hasn't got
AccessExclusiveLock, but nonetheless has something strong enough to
block an INSERT, such as ShareLock.

Hmmm, is it possible that the following might do that, whereas a simple
ALTER TABLE would not?

8<-----------------------------------
BEGIN;

CREATE OR REPLACE FUNCTION change_column_type
(
tablename text,
columnname text,
newtype text
) RETURNS text AS $$
DECLARE
newtypeid oid;
tableoid oid;
curtypeid oid;
BEGIN
SELECT INTO newtypeid oid FROM pg_type WHERE oid =
newtype::regtype::oid;
SELECT INTO tableoid oid FROM pg_class WHERE relname = tablename;
IF NOT FOUND THEN
RETURN 'TABLE NOT FOUND';
END IF;

SELECT INTO curtypeid atttypid FROM pg_attribute WHERE
attrelid = tableoid AND attname::text = columnname;
IF NOT FOUND THEN
RETURN 'COLUMN NOT FOUND';
END IF;

IF curtypeid != newtypeid THEN
EXECUTE 'ALTER TABLE ' || tablename || ' ALTER COLUMN ' ||
columnname || ' SET DATA TYPE ' || newtype;
RETURN 'CHANGE SUCCESSFUL';
ELSE
RETURN 'CHANGE SKIPPED';
END IF;
EXCEPTION
WHEN undefined_object THEN
RETURN 'INVALID TARGET TYPE';
END;
$$ LANGUAGE plpgsql;

SELECT change_column_type('attribute_summary',
'sequence_number',
'numeric');

COMMIT;
8<-----------------------------------

This text is in a file being run from a shell script with something like:

psql dbname < script.sql

The concurrent INSERTs are being done by the main application code
(running on Tomcat).

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

#4Jim Nasby
jim@nasby.net
In reply to: Joe Conway (#3)
Re: ALTER TABLE deadlock with concurrent INSERT

On Mar 2, 2011, at 2:54 PM, Joe Conway wrote:

On 03/02/2011 12:41 PM, Tom Lane wrote:

Looks like the process trying to do the ALTER has already got some
lower-level lock on the table. It evidently hasn't got
AccessExclusiveLock, but nonetheless has something strong enough to
block an INSERT, such as ShareLock.

Hmmm, is it possible that the following might do that, whereas a simple
ALTER TABLE would not?

Impossible to tell without seeing what's in the script... ie: if the script was

BEGIN;
-- Do something to that table that blocks inserts
SELECT change_column_type(...);
COMMIT;

You'd get a deadlock.

The script also has several race conditions:

- Someone could drop the table after you query pg_class
- Someone could alter/drop the column after you query pg_attribute

My suggestion would be to try to grab an exclusive lock on the table as the first line in the function (and then don't do anything cute in the declare section, such as use tablename::regprocedure).

Speaking of which, I would recommend using the regprocedure and regtype casts instead of querying the catalog directly; that way you have working schema support and you're immune from future catalog changes. Unfortunately you'll still have to do things the hard way to find the column (unless we added regcolumn post 8.3), but you might want to use information_schema, or at least see what it's doing there. The query *technically* should include WHERE attnum > 0 (maybe >=) AND NOT attisdropped, though it's probably not a big deal that it isn't since ALTER TABLE will save your bacon there (though, I'd include a comment to that effect to protect anyone who decides to blindly cut and paste that query somewhere else where it does matter...).

8<-----------------------------------
BEGIN;

CREATE OR REPLACE FUNCTION change_column_type
(
tablename text,
columnname text,
newtype text
) RETURNS text AS $$
DECLARE
newtypeid oid;
tableoid oid;
curtypeid oid;
BEGIN
SELECT INTO newtypeid oid FROM pg_type WHERE oid =
newtype::regtype::oid;
SELECT INTO tableoid oid FROM pg_class WHERE relname = tablename;
IF NOT FOUND THEN
RETURN 'TABLE NOT FOUND';
END IF;

SELECT INTO curtypeid atttypid FROM pg_attribute WHERE
attrelid = tableoid AND attname::text = columnname;
IF NOT FOUND THEN
RETURN 'COLUMN NOT FOUND';
END IF;

IF curtypeid != newtypeid THEN
EXECUTE 'ALTER TABLE ' || tablename || ' ALTER COLUMN ' ||
columnname || ' SET DATA TYPE ' || newtype;
RETURN 'CHANGE SUCCESSFUL';
ELSE
RETURN 'CHANGE SKIPPED';
END IF;
EXCEPTION
WHEN undefined_object THEN
RETURN 'INVALID TARGET TYPE';
END;
$$ LANGUAGE plpgsql;

SELECT change_column_type('attribute_summary',
'sequence_number',
'numeric');

COMMIT;
8<-----------------------------------

This text is in a file being run from a shell script with something like:

psql dbname < script.sql

The concurrent INSERTs are being done by the main application code
(running on Tomcat).

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

#5Joe Conway
mail@joeconway.com
In reply to: Jim Nasby (#4)
Re: ALTER TABLE deadlock with concurrent INSERT

On 03/03/2011 03:49 PM, Jim Nasby wrote:

On Mar 2, 2011, at 2:54 PM, Joe Conway wrote:

On 03/02/2011 12:41 PM, Tom Lane wrote:

Looks like the process trying to do the ALTER has already got some
lower-level lock on the table. It evidently hasn't got
AccessExclusiveLock, but nonetheless has something strong enough to
block an INSERT, such as ShareLock.

Hmmm, is it possible that the following might do that, whereas a simple
ALTER TABLE would not?

Impossible to tell without seeing what's in the script... ie: if the script was

BEGIN;
-- Do something to that table that blocks inserts
SELECT change_column_type(...);
COMMIT;

You'd get a deadlock.

The script was exactly the one posted, i.e.
BEGIN;
CREATE FUNCTION change_column_type(...);
SELECT change_column_type(...);
COMMIT;

That's all there is to it. And the function itself has no specific
reference to the table being altered. That's why I'm left scratching my
head ;-)

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

#6Jim Nasby
jim@nasby.net
In reply to: Joe Conway (#5)
Re: ALTER TABLE deadlock with concurrent INSERT

On Mar 3, 2011, at 6:26 PM, Joe Conway wrote:

On 03/03/2011 03:49 PM, Jim Nasby wrote:

On Mar 2, 2011, at 2:54 PM, Joe Conway wrote:

On 03/02/2011 12:41 PM, Tom Lane wrote:

Looks like the process trying to do the ALTER has already got some
lower-level lock on the table. It evidently hasn't got
AccessExclusiveLock, but nonetheless has something strong enough to
block an INSERT, such as ShareLock.

Hmmm, is it possible that the following might do that, whereas a simple
ALTER TABLE would not?

Impossible to tell without seeing what's in the script... ie: if the script was

BEGIN;
-- Do something to that table that blocks inserts
SELECT change_column_type(...);
COMMIT;

You'd get a deadlock.

The script was exactly the one posted, i.e.
BEGIN;
CREATE FUNCTION change_column_type(...);
SELECT change_column_type(...);
COMMIT;

That's all there is to it. And the function itself has no specific
reference to the table being altered. That's why I'm left scratching my
head ;-)

I suggest grabbing a snapshot of pg_locks for the connection that's creating the function, and then do the same for the insert and see what could potentially conflict...
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

#7Noah Misch
noah@leadboat.com
In reply to: Joe Conway (#1)
Re: ALTER TABLE deadlock with concurrent INSERT

On Wed, Mar 02, 2011 at 12:25:16PM -0800, Joe Conway wrote:

I'm working with a client on an application upgrade script which
executes a function to conditionally do an:

ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE baz

If this is run while the application is concurrently doing inserts into
foo, we are occasionally seeing deadlocks. Aside from the fact that they
are better off not altering the table amid concurrent inserts, I'm
trying to understand why this is even able to happen. I expect one to
block the other, not a deadlock.

This is 8.4.1 (I know, I know, I have advised strongly that they upgrade
to 8.4.latest).

We have not been able to repeat this forcibly. Here is what the log shows:
------------------------------
2011-02-25 14:38:07 PST [31686]: [1-1] ERROR: deadlock detected
2011-02-25 14:38:07 PST [31686]: [2-1] DETAIL: Process 31686 waits for
AccessExclusiveLock on relation 16896 of database 16386; blocked by
process 31634.
Process 31634 waits for RowExclusiveLock on relation 16902 of
database 16386; blocked by process 31686.
Process 31686: SELECT change_column_type('attribute_summary',
'sequence_number', 'numeric');
Process 31634: insert into attribute_summary (attribute_value,
sequence_number, attribute_id) values ($1, $2, $3)
2011-02-25 14:38:07 PST [31686]: [3-1] HINT: See server log for query
details.
2011-02-25 14:38:07 PST [31686]: [4-1] CONTEXT: SQL statement "ALTER
TABLE attribute_summary ALTER COLUMN sequence_number SET DATA TYPE numeric"
PL/pgSQL function "change_column_type" line 18 at EXECUTE statement
2011-02-25 14:38:07 PST [31686]: [5-1] STATEMENT: SELECT
change_column_type('attribute_summary', 'sequence_number', 'numeric');
------------------------------

Does relation 16902 (attribute_summary) have a foreign key constraint over the
sequence_number column, in either direction, with relation 16896? That would
explain it:

session 1: ALTER TABLE attribute_summary ... <sleeps after relation_openrv in transformAlterTableStmt>
session 2: SELECT 1 FROM rel16896 LIMIT 0;
session 2: SELECT 1 FROM attribute_summary LIMIT 0; <blocks>
session 1: <wakes up; continues ALTER TABLE: deadlock upon locking rel16896>

Off the cuff, I think you could make sure this never deadlocks with a PL/pgSQL
recipe like this:

LOOP
BEGIN
LOCK TABLE rel16896;
LOCK TABLE attribute_summary NOWAIT;
EXIT;
EXCEPTION WHEN lock_not_available THEN
END;
END LOOP;

Granted, the cure may be worse than the disease.

nm

#8Joe Conway
mail@joeconway.com
In reply to: Noah Misch (#7)
Re: ALTER TABLE deadlock with concurrent INSERT

On 03/03/2011 11:36 PM, Noah Misch wrote:

Does relation 16902 (attribute_summary) have a foreign key constraint over the
sequence_number column, in either direction, with relation 16896? That would
explain it:

session 1: ALTER TABLE attribute_summary ... <sleeps after relation_openrv in transformAlterTableStmt>
session 2: SELECT 1 FROM rel16896 LIMIT 0;
session 2: SELECT 1 FROM attribute_summary LIMIT 0; <blocks>
session 1: <wakes up; continues ALTER TABLE: deadlock upon locking rel16896>

Ah, OK -- then that would explain it as there are foreign keys on that
column. Thanks, hadn't thought about that aspect.

Granted, the cure may be worse than the disease.

Right. I've already advised they shut down the application during the
alter table, which they can do (and in fact already do -- they were
restarting the application just prior to this step, which really makes
no sense anyway).

Thanks,

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support