autovacuum process blocks without reporting a deadlock

Started by Thomas Chilleover 18 years ago13 messagesgeneral
Jump to latest
#1Thomas Chille
thomas.chille@gmail.com

Hi anybody,

I step in just one of our identically customer databases in a kind of
a deadlock with Autovacuum involved.

One Autovacuum process stuck in the middle of the night and it seemed
that it compete with another Select process for an index:

[14398 / 2007-11-21 00:52:04 CET]CONTEXT: SQL statement "UPDATE
hst_timerecording SET id_timerecording_join = NULL WHERE
id_timerecording_join = -1"
PL/pgSQL function "set_id_timerecording_join" line 121 at SQL statement
SQL statement "UPDATE hst_timerecording SET sales_volume =
NULL, sales_volume_commission = NULL WHERE business_day = $1 AND
id_employee = $2 "
PL/pgSQL function "compress_salaries_day" line 168 at SQL statement
SQL statement "SELECT compress_salaries_day( $1 , NULL, NULL)"
PL/pgSQL function "compress" line 460 at perform
[14398 / 2007-11-21 00:52:04 CET]LOCATION: exec_stmt_raise, pl_exec.c:2110
[14391 / 2007-11-21 00:52:14 CET]DEBUG: 00000: index
"hst_timerecording_id_timerecording_idx" now contains 8537 row
versions in 61 pages
[14391 / 2007-11-21 00:52:14 CET]DETAIL: 4454 index row versions were removed.
12 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 3.47 sec.
[14391 / 2007-11-21 00:52:14 CET]LOCATION: lazy_vacuum_index, vacuumlazy.c:736

This are the last log entires for these both processes. Over 9 hours
later, i can see them allready running in the process list :

14391 ? S 0:00 postgres: autovacuum process
backoffice_db
14398 ? S 0:02 postgres: spoon backoffice_db office(39302)
SELECT waiting

This happens every night and a dump restore wont help.

I dont set any explicit locks and so i would expect that no deadlock
could occure? And when, then i would expect that it would be logged
together with the dead locked relations?

Can anyone give a tipp, please?

Version: PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC gcc
(GCC) 4.0.2 (Debian 4.0.2-2)

regards
thomas

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Chille (#1)
Re: autovacuum process blocks without reporting a deadlock

Thomas Chille escribi�:

This are the last log entires for these both processes. Over 9 hours
later, i can see them allready running in the process list :

14391 ? S 0:00 postgres: autovacuum process
backoffice_db
14398 ? S 0:02 postgres: spoon backoffice_db office(39302)
SELECT waiting

Can you paste the relevant lines from pg_locks?

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira s� existe y tu est�s mintiendo" (G. Lama)

#3Thomas Chille
thomas@chille.de
In reply to: Alvaro Herrera (#2)
Re: autovacuum process blocks without reporting a deadlock

i have to wait till monday, then i can provide these lines.

thanks,
thomas

Show quoted text

On Nov 23, 2007 1:32 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Thomas Chille escribió:

This are the last log entires for these both processes. Over 9 hours
later, i can see them allready running in the process list :

14391 ? S 0:00 postgres: autovacuum process
backoffice_db
14398 ? S 0:02 postgres: spoon backoffice_db office(39302)
SELECT waiting

Can you paste the relevant lines from pg_locks?

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Chille (#1)
Re: autovacuum process blocks without reporting a deadlock

"Thomas Chille" <thomas.chille@gmail.com> writes:

One Autovacuum process stuck in the middle of the night and it seemed
that it compete with another Select process for an index:

[14391 / 2007-11-21 00:52:14 CET]DEBUG: 00000: index
"hst_timerecording_id_timerecording_idx" now contains 8537 row
versions in 61 pages

What other indexes does that table have?

regards, tom lane

#5Thomas Chille
thomas@chille.de
In reply to: Tom Lane (#4)
Re: autovacuum process blocks without reporting a deadlock

On Nov 24, 2007 6:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

What other indexes does that table have?

regards, tom lane

Hi,

last night it happend again. In the log-snippet u can see all indexes
of this table:

[9293 / 2007-11-26 21:46:28 CET]CONTEXT: SQL statement "UPDATE
hst_timerecording SET id_timerecording_join = NULL WHERE
id_timerecording_join = -1"
PL/pgSQL function "set_id_timerecording_join" line 121 at SQL statement
SQL statement "UPDATE hst_timerecording SET sales_volume =
NULL, sales_volume_commission = NULL WHERE business_day = $1 AND
id_employee = $2 "
PL/pgSQL function "compress_salaries_day" line 168 at SQL statement
SQL statement "SELECT compress_salaries_day( $1 , NULL, NULL)"
PL/pgSQL function "compress" line 460 at perform
[9293 / 2007-11-26 21:46:28 CET]LOCATION: exec_stmt_raise, pl_exec.c:2110
[9317 / 2007-11-26 21:46:34 CET]DEBUG: 00000: index
"hst_timerecording_business_day_idx" now contains 8640 row versions in
80 pages
[9317 / 2007-11-26 21:46:34 CET]DETAIL: 4469 index row versions were removed.
13 index pages have been deleted, 9 are currently reusable.
CPU 0.00s/0.00u sec elapsed 3.40 sec.
[9317 / 2007-11-26 21:46:34 CET]LOCATION: lazy_vacuum_index, vacuumlazy.c:736
[9317 / 2007-11-26 21:46:39 CET]DEBUG: 00000: index
"hst_timerecording_id_employee_idx" now contains 8640 row versions in
95 pages
[9317 / 2007-11-26 21:46:39 CET]DETAIL: 4469 index row versions were removed.
10 index pages have been deleted, 6 are currently reusable.
CPU 0.00s/0.00u sec elapsed 4.22 sec.
[9317 / 2007-11-26 21:46:39 CET]LOCATION: lazy_vacuum_index, vacuumlazy.c:736
[9317 / 2007-11-26 21:46:43 CET]DEBUG: 00000: index
"hst_timerecording_id_timerecording_idx" now contains 8640 row
versions in 97 pages
[9317 / 2007-11-26 21:46:43 CET]DETAIL: 4469 index row versions were removed.
11 index pages have been deleted, 5 are currently reusable.
CPU 0.00s/0.00u sec elapsed 4.40 sec.
[9317 / 2007-11-26 21:46:43 CET]LOCATION: lazy_vacuum_index, vacuumlazy.c:736

This are again the last log-entries of the blocking processes.
Obviously the three indexes where successful vacuumed?

I think this are the relevant pg_locks entries:

relation 75685778 75686189
9017862 25467 AccessShareLock f
relation 75685778 75686189
9009323 9317 ShareUpdateExclusiveLock
t
relation 75685778 75686189
9009312 9293 AccessShareLock t
relation 75685778 75686189
9009312 9293 RowExclusiveLock t
relation 75685778 75686189
9009312 9293 AccessExclusiveLock f
relation 75685778 75686189
9012978 28370 AccessShareLock f

75686189 is the table hst_timerecording. for me it looks like the
autovacuum is not releasing the blocking ShareUpdateExclusiveLock?

I hope this infos could help.

regards, t
thomas

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Chille (#5)
Re: autovacuum process blocks without reporting a deadlock

Thomas Chille wrote:

I think this are the relevant pg_locks entries:

relation 75685778 75686189
9017862 25467 AccessShareLock f
relation 75685778 75686189
9009323 9317 ShareUpdateExclusiveLock
t
relation 75685778 75686189
9009312 9293 AccessShareLock t
relation 75685778 75686189
9009312 9293 RowExclusiveLock t
relation 75685778 75686189
9009312 9293 AccessExclusiveLock f
relation 75685778 75686189
9012978 28370 AccessShareLock f

75686189 is the table hst_timerecording. for me it looks like the
autovacuum is not releasing the blocking ShareUpdateExclusiveLock?

What are the column headings? I find this difficult to read.

Please post the whole of pg_locks. I may be missing something but I
think we're missing part of the picture here. Autovacuum does not seem
to be locking on anything.

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
One man's impedance mismatch is another man's layer of abstraction.
(Lincoln Yeoh)

#7Thomas Chille
thomas@chille.de
In reply to: Alvaro Herrera (#6)
Re: autovacuum process blocks without reporting a deadlock

On Nov 27, 2007 3:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
hat are the column headings? I find this difficult to read.

Please post the whole of pg_locks. I may be missing something but I
think we're missing part of the picture here. Autovacuum does not seem
to be locking on anything.

Unfortunately i logged the pg_locks-state not well formated. i added
now the heading manually. maybe it is better or i have to wait till
tomorrow morning.

but what i think i can see, is that the process with pid 9317 is
holding a ShareUpdateExclusiveLock

locktype | database | relation | transaction | pid
| mode | granted
relation | 75685778 | 75686189 | 9009323 | 9317 |
ShareUpdateExclusiveLock | t

and process with pid 9293 is waiting to get AccessExclusiveLock on the
same relation:

relation | 75685778 | 75686189 | 9009312 | 9293 |
AccessShareLock | f

... and 9317 is the autvacuumprocess?

afterkiling it, erything is fine.

i attached the full pg_lock.

regards,
thomas

Attachments:

pg_locks.txttext/plain; name=pg_locks.txtDownload
#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Chille (#7)
Re: autovacuum process blocks without reporting a deadlock

Thomas Chille wrote:

On Nov 27, 2007 3:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
hat are the column headings? I find this difficult to read.

Please post the whole of pg_locks. I may be missing something but I
think we're missing part of the picture here. Autovacuum does not seem
to be locking on anything.

Unfortunately i logged the pg_locks-state not well formated. i added
now the heading manually. maybe it is better or i have to wait till
tomorrow morning.

but what i think i can see, is that the process with pid 9317 is
holding a ShareUpdateExclusiveLock

You missed that 9317 is also holding an ExclusiveLock. What are the
vacuum_cost_delay and autovacuum_vacuum_cost_delay setting? I think
what's happening here is that you're being bitten by the bug that made
autovac sleep because of vacuum_delay, with the exclusive lock held
trying to truncate the table.

The solution is to upgrade to 8.1.10.

Also I just noticed you're on 8.1.4. This is a bad idea because of
another ancient autovacuum bug. As soon as you upgrade, allow
connections to template0 (update pg_database, set datallowconn), connect
to it and issue VACUUM FREEZE. Then disallow connections to it again.

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"La experiencia nos dice que el hombre pel� millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelar�an al hombre" (Ijon Tichy)

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#8)
Re: autovacuum process blocks without reporting a deadlock

Alvaro Herrera wrote:

Thomas Chille wrote:

On Nov 27, 2007 3:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
hat are the column headings? I find this difficult to read.

Please post the whole of pg_locks. I may be missing something but I
think we're missing part of the picture here. Autovacuum does not seem
to be locking on anything.

Unfortunately i logged the pg_locks-state not well formated. i added
now the heading manually. maybe it is better or i have to wait till
tomorrow morning.

but what i think i can see, is that the process with pid 9317 is
holding a ShareUpdateExclusiveLock

You missed that 9317 is also holding an ExclusiveLock.

Sorry, I misread it. My analysis is bogus :-) I suggest you upgrade
anyway because there are nasty bugs in the version you are using.

I assume 77862554, 77862552 and 77862553 are indexes on the 75686189
table?

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"La experiencia nos dice que el hombre pel� millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelar�an al hombre" (Ijon Tichy)

#10Thomas Chille
thomas@chille.de
In reply to: Alvaro Herrera (#9)
Re: autovacuum process blocks without reporting a deadlock

yes, u are right.

this are the 3 involved indexes:

hst_timerecording_business_day_idx on hst_timerecording
hst_timerecording_id_employee_idxon hst_timerecording
hst_timerecording_id_timerecording_idxon hst_timerecording

lg t

Show quoted text

On Nov 27, 2007 4:07 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Alvaro Herrera wrote:

Thomas Chille wrote:

On Nov 27, 2007 3:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
hat are the column headings? I find this difficult to read.

Please post the whole of pg_locks. I may be missing something but I
think we're missing part of the picture here. Autovacuum does not seem
to be locking on anything.

Unfortunately i logged the pg_locks-state not well formated. i added
now the heading manually. maybe it is better or i have to wait till
tomorrow morning.

but what i think i can see, is that the process with pid 9317 is
holding a ShareUpdateExclusiveLock

You missed that 9317 is also holding an ExclusiveLock.

Sorry, I misread it. My analysis is bogus :-) I suggest you upgrade
anyway because there are nasty bugs in the version you are using.

I assume 77862554, 77862552 and 77862553 are indexes on the 75686189
table?

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4

"La experiencia nos dice que el hombre peló millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelarían al hombre" (Ijon Tichy)

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Chille (#5)
Re: autovacuum process blocks without reporting a deadlock

"Thomas Chille" <thomas@chille.de> writes:

I think this are the relevant pg_locks entries:

relation 75685778 75686189
9017862 25467 AccessShareLock f
relation 75685778 75686189
9009323 9317 ShareUpdateExclusiveLock
t
relation 75685778 75686189
9009312 9293 AccessShareLock t
relation 75685778 75686189
9009312 9293 RowExclusiveLock t
relation 75685778 75686189
9009312 9293 AccessExclusiveLock f
relation 75685778 75686189
9012978 28370 AccessShareLock f

I don't think the vacuum is the problem here. The problem is process
9293, which for some reason is trying to get AccessExclusiveLock, and is
blocked behind autovac, and everything else is stacking up behind it.

You didn't happen to note what 9293 was doing did you? It's living
fairly dangerously in any case by trying to acquire exclusive lock
when it already holds a bunch of other lower-level locks; that's a
recipe for deadlock if I ever saw one.

regards, tom lane

#12Thomas Chille
thomas@chille.de
In reply to: Tom Lane (#11)
Re: autovacuum process blocks without reporting a deadlock

On Nov 27, 2007 4:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You didn't happen to note what 9293 was doing did you? It's living
fairly dangerously in any case by trying to acquire exclusive lock
when it already holds a bunch of other lower-level locks; that's a
recipe for deadlock if I ever saw one.

regards, tom lane

Ah ok, 9293 is a triggerd process and tries to "ALTER TABLE ...
DISABLE TRIGGER (other trigger)" and so implicitly tries to acquire an
AccessExclusiveLock and runs in a deadlock?

So is better not to use "ALTER TABLE ... " in triggerfunctions,
because there are always existing lower-level locks?

regards,
thomas

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Chille (#12)
Re: autovacuum process blocks without reporting a deadlock

"Thomas Chille" <thomas@chille.de> writes:

Ah ok, 9293 is a triggerd process and tries to "ALTER TABLE ...
DISABLE TRIGGER (other trigger)" and so implicitly tries to acquire an
AccessExclusiveLock and runs in a deadlock?

Well, you're certainly risking deadlock with that; and even if no
actual deadlock happens, waiting for the exclusive lock needed to
do the ALTER can cause the kind of lock queueing you had here.

I'd suggest trying to think of a way to solve your problem that doesn't
need trigger enabling/disabling.

regards, tom lane