BUG #13472: VACUUM ANALYZE hangs on certain tables
The following bug has been logged on the website:
Bug reference: 13472
Logged by: Matthew Seaman
Email address: matthew.seaman@adestra.com
PostgreSQL version: 9.2.10
Operating system: FreeBSD 10.1-RELEASE-p9
Description:
We have two tables which we have not been able to vacuum sucessfully on one
of our database servers. The vacuum process just hangs almost
instantaneously after issuing the 'VACUUM ANALYZE' command.
Here's the result of truss'ing the psql process like so: db-21a:/tmp:# truss
-p 85976 -d -s 128 -f -o /tmp/postgres-vacuum.truss
db-21a:/tmp:# cat postgres-vacuum.truss
85976: -1435316384.769176354 SIGNAL 17 (SIGSTOP)
85976: 3.038535625 recvfrom(10,"Q\0\0\0\^_VACUUM ANALYZE page_tags
;\0",8192,0x0,NULL,0x0) = 32 (0x20)
85976: 3.038826780 open("pg_stat_tmp/pgstat.stat",O_RDONLY,0666) = 21
(0x15)
85976: 3.038891300 fstat(21,{ mode=-rw-------
,inode=7142821,size=862100,blksize=32768 }) = 0 (0x0)
85976: 3.039011646
read(21,"\M^Z\M-<\M-%\^A\M-*\M-}\M-_gh\M-<\^A\0+B\0\0\0\0\0\0#\^U\0\0\0\0\0\0o\M-2\M-\\M^F\0\0\0\0\M-)3;\0\0\0\0\0\M-g\^^`2\0\0\0\0\M-H"...,32768)
= 32768 (0x8000)
85976: 3.040517980 close(21) = 0 (0x0)
85976: 3.040590876
sendto(9,"\^A\0\0\0\^P\0\0\0g\M^H\^Xhh\M-<\^A\0",16,0x0,NULL,0x0) = 16
(0x10)
85976: 3.050990883 select(0,0x0,0x0,0x0,{0.010000 }) = 0 (0x0)
85976: 3.051077076 open("pg_stat_tmp/pgstat.stat",O_RDONLY,0666) = 21
(0x15)
85976: 3.051124309 fstat(21,{ mode=-rw-------
,inode=7142808,size=862100,blksize=32768 }) = 0 (0x0)
85976: 3.051211802 read(21,"\M^Z\M-<\M-%\^A\M-z0
hh\M-<\^A\0+B\0\0\0\0\0\0#\^U\0\0\0\0\0\0o\M-2\M-\\M^F\0\0\0\0\M-)3;\0\0\0\0\0\M-g\^^`2\0\0\0\0\M-H\M-V\M^P"...,32768)
= 32768 (0x8000)
85976: 3.052608356 close(21) = 0 (0x0)
85976: 3.052693536 open("pg_stat_tmp/pgstat.stat",O_RDONLY,0666) = 21
(0x15)
85976: 3.052743522 fstat(21,{ mode=-rw-------
,inode=7142808,size=862100,blksize=32768 }) = 0 (0x0)
85976: 3.052812245 read(21,"\M^Z\M-<\M-%\^A\M-z0
hh\M-<\^A\0+B\0\0\0\0\0\0#\^U\0\0\0\0\0\0o\M-2\M-\\M^F\0\0\0\0\M-)3;\0\0\0\0\0\M-g\^^`2\0\0\0\0\M-H\M-V\M^P"...,32768)
= 32768 (0x8000)
85976: 3.054225196
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\M^D"...,32768)
= 32768 (0x8000)
85976: 3.055429568
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.056594506
read(21,"\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.057768974
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0T\M-\\M-v%\0\0\0\0\0K\^B\0\0\0\0\0\0\^F\b\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.058982546
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0T1\M-x%\0\0\0\0\0\^D\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.060145744
read(21,"\0O\M^V$\M--\\\M-<\^A\0:\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\M-?$\M--\\\M-<\^A\0:\0\0\0\0\0\0\0\M-A8z\^S\M-O\M-7\^A"...,32768)
= 32768 (0x8000)
85976: 3.061483712
read(21,"\0\0\0\M-3\^_\0\0\0\0\0\0$\^]\0\0\0\0\0\0\M-e\M-oC\M--\\\M-<\^A\0:\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0@\M-sC\M--\\\M-<"...,32768)
= 32768 (0x8000)
85976: 3.062796877
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\M-I\M-V\^P\M-0\\"...,32768)
= 32768 (0x8000)
85976: 3.064062535
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.065262827
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.066425202
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0T\M^@\n\0\0\0\0\0\0\^C\0\0\0\0\0\0\0\^C\0\0\0\0\0\0\0\^C\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.067633907
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0T:\^C\0\0\0\0\0\0\^B\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.068792535
read(21,"\^A\08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0>\M-K"\M-.\\\M-<\^A\08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0T\M^[_&"...,32768)
= 32768 (0x8000)
85976: 3.069993310
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\M-h\M^F\M-,\M--\\\M-<\^A\08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\M-k\M^G\M-,\M--\\\M-<\^A\08\0"...,32768)
= 32768 (0x8000)
85976: 3.071352321
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\0\0\0\0\0\0\0\t\0\0\0\0\0\0\0\^E\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.072538236
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\M^PI\M--\M-."...,32768)
= 32768 (0x8000)
85976: 3.073751354
read(21,"\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\0\0\0\0\0\0\0\b\0\0"...,32768)
= 32768 (0x8000)
85976: 3.074951546
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.076144411
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0T\M-v\^P\M^N\0\0\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.077484792
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0T\M-|\^]\M-!\0\0\0\0\0\M^L"...,32768)
= 32768 (0x8000)
85976: 3.078847630
read(21,"\0\0\0\0\0O"\M-$\M-:\\\M-<\^A\0005\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.080215317
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\M-<\M-?}\M-a\\\M-<\^A\0/\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.081569181
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0{\^E\0\0\0\0\0\0B\^D\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 32768 (0x8000)
85976: 3.082836663
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\M-_\M-+\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^RL\0\0\0\0\0\0.\t\0\0\0\0\0\0^\M^N"...,32768)
= 32768 (0x8000)
85976: 3.084285987
read(21,"\0\0\0\0\0v+\^B\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0v+\^B\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\M-v\M-b"...,32768)
= 32768 (0x8000)
85976: 3.085686937
read(21,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,32768)
= 10132 (0x2794)
85976: 3.086923792 close(21) = 0 (0x0)
85976: 3.087002286 lseek(11,0x0,SEEK_END) = 8192 (0x2000)
85976: 3.087083538 lseek(17,0x0,SEEK_END) = 483328 (0x76000)
85976: 3.088391400
madvise(0x80507c000,0x1000,0x5,0xaaaaaaaaaaaaaaab,0x805003eb8,0x802759be0) =
0 (0x0)
85976: 3.088444000
madvise(0x805217000,0x8000,0x5,0xaaaaaaaaaaaaaaab,0x805003eb8,0x802759be0) =
0 (0x0)
85976: 3.088511966
madvise(0x8052a1000,0x58000,0x5,0xaaaaaaaaaaaaaaab,0x805003eb8,0x802759be0)
= 0 (0x0)
85976: 3.088565242
madvise(0x805252000,0x2000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffcd00,0x802759be0)
= 0 (0x0)
85976: 3.088629049
madvise(0x805248000,0xa000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffcd00,0x802759be0)
= 0 (0x0)
85976: 3.088672235
madvise(0x805254000,0x4000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffcd00,0x802759be0)
= 0 (0x0)
85976: 3.088742918
madvise(0x805281000,0x20000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffcd00,0x802759be0)
= 0 (0x0)
85976: 3.088807845
madvise(0x80525f000,0x1c000,0x5,0xaaaaaaaaaaaaaaab,0x805003888,0x802759be0)
= 0 (0x0)
85976: 3.088875758 setitimer(0,{0.000000, 1.000000 },0x0) = 0 (0x0)
85976: 4.089210052 semop(0x3000b,0x7fffffffd550,0x1,0x0,0x20,0x1431e650f0)
ERR#4 'Interrupted system call'
85976: 4.089210052 SIGNAL 14 (SIGALRM)
85976: 4.089414257 sigprocmask(SIG_SETMASK,SIGALRM,0x0) = 0 (0x0)
85976: 4.089493593
semop(0x3000b,0x7fffffffcb28,0x1,0x0,0x6559a0,0x1431e650f0) = 0 (0x0)
85976: 4.089541500
sigreturn(0x7fffffffcb80,0x7fffffffcb80,0x301,0x0,0xfffffffffffffbc0,0x1431e650f0)
ERR#4 'Interrupted system call'
85976: 4.089594057 semop(0x3000b,0x7fffffffd550,0x1,0x0,0x20,0x1431e650f0) =
0 (0x0)
85976: 4.089914562 write(2,"\0\0N\^A\M-XO\^A\0t2015-06-26 11:59:47 BST 85976
pgsql [local] psql amf_more2 2015-06-26 11:38:43 BST [11]LOG: process 85976
st"...,343) = 343 (0x157)
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Hello,m
On 2015-06-26 11:08:30 +0000, matthew.seaman@adestra.com wrote:
We have two tables which we have not been able to vacuum sucessfully on one
of our database servers. The vacuum process just hangs almost
instantaneously after issuing the 'VACUUM ANALYZE' command.
Hm. Is there a chance that your application uses cursors?
Could you send the output of:
SELECT oid::regclass, txid_current(), relfrozenxid, age(relfrozenxid)
FROM pg_class
WHERE oid = 'your relation'::regclass;
My theory is that there's a page that needs to be frozen, but a pin on
the page exists, preventing the cleanup lock from being acquired. Or
rather delaying it long enough that little progress is made.
Greetings,
Andres Freund
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Hi,
On 2015-06-26 15:39:37 +0100, Matthew Seaman wrote:
On 06/26/15 14:53, Andres Freund wrote:
Hello,m
On 2015-06-26 11:08:30 +0000, matthew.seaman@adestra.com wrote:
We have two tables which we have not been able to vacuum sucessfully on one
of our database servers. The vacuum process just hangs almost
instantaneously after issuing the 'VACUUM ANALYZE' command.Hm. Is there a chance that your application uses cursors?
We do use cursors, but not all the time. They're only used for some
ad-hoc data exports that run occasionally. There's no cursors in use on
that database at the moment.
Hm. And there was none when the vacuum was hanging last?
What level of concurrency do you have? Is it possible/likely that there
is a number of accesses to the same page going on all the time?
How large is that table?
Could you send the output of:
SELECT oid::regclass, txid_current(), relfrozenxid, age(relfrozenxid)
FROM pg_class
WHERE oid = 'your relation'::regclass;My theory is that there's a page that needs to be frozen, but a pin on
the page exists, preventing the cleanup lock from being acquired. Or
rather delaying it long enough that little progress is made.Certainly:
amf_more2=# SELECT oid::regclass, txid_current(), relfrozenxid,
age(relfrozenxid)
FROM pg_class
WHERE oid = 'page_tags'::regclass;
oid | txid_current | relfrozenxid | age
-----------+--------------+--------------+-----------
page_tags | 633097656 | 370306842 | 262790814
(1 row)
Hm. So that table is already considerably older than the default
autovacuum_freeze_max_age (200000000). If indeed pins are the problem
they already migh have held up autovacuum for a while.
Is that the value you currently have configured?
One easy way to confirm whether this is the root cause would be to
change autovacuum_freeze_max_age to be significantly higher (just uses a
bit more disk space, but is also more efficient overall); but that
requires a restart which I guess you're not looking forward to doing?
Greetings,
Andres Freund
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Import Notes
Reply to msg id not found: 558D6429.7080400@adestra.com
On 06/26/15 16:21, Matthew Seaman wrote:
Yes. There are in fact two table showing these symptoms, and they both
get quite frequently updated. Periods of activity tend to wax and wane
though, and both are pretty idle at the moment except for autovacuum
processes "autovacuum: VACUUM public.page_tags (to prevent wraparound)"Even so, I'm still seeing the same symptoms: vacuum analyze hangs
My colleague has just informed me that they had been seeing vacuum hang
for around a week before the autovacuum 'to prevent wraparound' was
triggered.
Cheers,
Matthew
Import Notes
Reply to msg id not found: 558D6DEF.3090208@adestra.com
On 2015-06-26 16:21:19 +0100, Matthew Seaman wrote:
Restarting the DB is going to require all sorts of outage notifications
and general palaver, so we'd definitely prefer not to that if we can
possibly avoid it.
Is it feasible to lock the table exclusively for a short time? If so you
can just VACUUM FULL them - given the table sizes that should be fairly
quick.
(The difference between table level exclusive locks and cleanup locks is
that the latter aren't "fair", i.e. a continious stream of pins will
block them, whereas heavyweight locks are queued)
- Andres
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Import Notes
Reply to msg id not found: 558D6DEF.3090208@adestra.com
On 2015/06/26 16:58, Andres Freund wrote:
On 2015-06-26 16:21:19 +0100, Matthew Seaman wrote:
Restarting the DB is going to require all sorts of outage notifications
and general palaver, so we'd definitely prefer not to that if we can
possibly avoid it.Is it feasible to lock the table exclusively for a short time? If so you
can just VACUUM FULL them - given the table sizes that should be fairly
quick.(The difference between table level exclusive locks and cleanup locks is
that the latter aren't "fair", i.e. a continious stream of pins will
block them, whereas heavyweight locks are queued)
We have found the culprit -- a long running query was apparently locking
one of the tables, and killing it has allowed the vacuum to complete.
We had checked pg_locks but for some reason this query wasn't obvious.
Thanks for you help, and I think we can close this one.
Cheers
Matthew (posting from the account I subscribed to the list with, but I
am the same person as matthew.seaman@adestra.com)