vacuum taking an unusually long time

Started by Mason Halealmost 18 years ago9 messagesgeneral
Jump to latest
#1Mason Hale
masonhale@gmail.com

Vacuum operations on several tables are taking much longer than they
previously were.We currently have 3 autovacuum processes that have been
running more than 3 days each.

The tables are large (between 40 and 90GB each).

Postgresql version is 8.3.1
maintenance_work_mem is 512MB (on a 32GB server).

Any ideas what would make vacuum take so long?
What can I do to speed things up?

thanks,
- Mason

#2Scott Marlowe
scott.marlowe@gmail.com
In reply to: Mason Hale (#1)
Re: vacuum taking an unusually long time

On Mon, Jul 14, 2008 at 3:08 PM, Mason Hale <masonhale@gmail.com> wrote:

Vacuum operations on several tables are taking much longer than they
previously were.
We currently have 3 autovacuum processes that have been running more than 3
days each.
The tables are large (between 40 and 90GB each).
Postgresql version is 8.3.1
maintenance_work_mem is 512MB (on a 32GB server).
Any ideas what would make vacuum take so long?
What can I do to speed things up?

Have you adjusted your vacuum / autovacuum cost parameters up? that
will certainly slow down vacuums.

#3Mason Hale
masonhale@gmail.com
In reply to: Scott Marlowe (#2)
Re: vacuum taking an unusually long time

None of these values have changed recently.
The values are:

vacuum_cost_delay = 10ms
vacuum_cost_limit = 200

Are there any other values I should be looking at?

The longest running vacuum has been running more than 6 days at this point.

Thanks,
Mason

On Mon, Jul 14, 2008 at 4:39 PM, Scott Marlowe <scott.marlowe@gmail.com>
wrote:

Show quoted text

On Mon, Jul 14, 2008 at 3:08 PM, Mason Hale <masonhale@gmail.com> wrote:

Vacuum operations on several tables are taking much longer than they
previously were.
We currently have 3 autovacuum processes that have been running more than

3

days each.
The tables are large (between 40 and 90GB each).
Postgresql version is 8.3.1
maintenance_work_mem is 512MB (on a 32GB server).
Any ideas what would make vacuum take so long?
What can I do to speed things up?

Have you adjusted your vacuum / autovacuum cost parameters up? that
will certainly slow down vacuums.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mason Hale (#3)
Re: vacuum taking an unusually long time

"Mason Hale" <masonhale@gmail.com> writes:

The longest running vacuum has been running more than 6 days at this point.

Is it actually *doing* anything, or is it just blocked waiting for
someone else? strace or local equivalent would be the most definitive
way to check.

regards, tom lane

#5Mason Hale
masonhale@gmail.com
In reply to: Tom Lane (#4)
Re: vacuum taking an unusually long time

Here's some of the strace output:
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
semop(9895945, 0x7fff1321db70, 1) = 0
read(72, "\233\7\0\0H\207f2\1\0\1\0`\0\0 \0 \4 \0\0\0\0\0\0\0\0\0\0\0\0"...,
8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "!\5\0\0\370\277\371\247\1\0\1\0\214\0\330\23\0 \4
\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "}\4\0\0h]\333\217\1\0\1\0X\0\310\v\0 \4
\0\0\0\0\0\0\0\0X\235H\5"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "}\4\0\0(\260\333\217\1\0\1\0\\\0\200\v\0 \4
\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "}\4\0\0p\305\333\217\1\0\1\0\\\0p\v\0 \4
\0\0\0\0\370\235\20\4\0\0\0\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "!\5\0\0\10\330\371\247\1\0\1\0P\0\240\10\0 \4
\0\0\0\0\0\0\0\0\360\234 \6"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "\233\7\0\0\340\212f2\1\0\1\0t\0(\35\0 \4
\0\0\0\0\0\0\0\0(\235\250\5"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "\24\10\0\0\230\333\372\207\1\0\1\0`\0\330\n\0 \4
\0\0\0\0X\220@\4\7\0\1\0"...,
8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
semop(9830407, 0x7fff1321db90, 1) = 0
read(72, "\362\n\0\0H\316,r\1\0\1\0\204\0P\r\0 \4
_\17L\21x\235\20\5\5\0\1\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "\177\5\0\0\300\317A\276\1\0\1\0p\0\210\1\0 \4
\0\0\0\0\26\0\1\0\30\235\310\5"..., 8192) = 8192

All those "select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)" commands
sure look suspicious.

Any thoughts?

Mason

On Tue, Jul 15, 2008 at 10:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

"Mason Hale" <masonhale@gmail.com> writes:

The longest running vacuum has been running more than 6 days at this

point.

Is it actually *doing* anything, or is it just blocked waiting for
someone else? strace or local equivalent would be the most definitive
way to check.

regards, tom lane

#6Mason Hale
masonhale@gmail.com
In reply to: Mason Hale (#5)
Re: vacuum taking an unusually long time

On Wed, Jul 16, 2008 at 10:26 AM, Mason Hale <masonhale@gmail.com> wrote:

Here's some of the strace output:
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)

If I read the 'select(2)' man page correctly, it appears this process is
waiting indefinitely for a NULL file descriptor.
That looks pretty stuck to me.

Should I kill this autovacuum process via pg_cancel_backend?
Or is there a better way to "unstick" it?

Any ideas what may have caused it to get into this condition?

- Mason

#7Steve Atkins
steve@blighty.com
In reply to: Mason Hale (#6)
Re: vacuum taking an unusually long time

On Jul 16, 2008, at 4:40 PM, Mason Hale wrote:

On Wed, Jul 16, 2008 at 10:26 AM, Mason Hale <masonhale@gmail.com>
wrote:
Here's some of the strace output:

select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)

If I read the 'select(2)' man page correctly, it appears this
process is waiting indefinitely for a NULL file descriptor.
That looks pretty stuck to me.

No, it's just sleeping. For 10 milliseconds or so.

Should I kill this autovacuum process via pg_cancel_backend?
Or is there a better way to "unstick" it?

Any ideas what may have caused it to get into this condition?

It looks like "read a page, sleep for 80 milliseconds, repeat".

I'd look at your settings for autovacuum_vacuum_cost_limit /
autovacuum_vacuum_cost_delay, and maybe the contents of pg_autovacuum.

Cheers,
Steve

#8Mason Hale
masonhale@gmail.com
In reply to: Steve Atkins (#7)
Re: vacuum taking an unusually long time

On Wed, Jul 16, 2008 at 10:26 AM, Mason Hale <masonhale@gmail.com> wrote:

Here's some of the strace output:

select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)

If I read the 'select(2)' man page correctly, it appears this process is
waiting indefinitely for a NULL file descriptor.
That looks pretty stuck to me.

No, it's just sleeping. For 10 milliseconds or so.

Ah, thanks.

Should I kill this autovacuum process via pg_cancel_backend?

Or is there a better way to "unstick" it?

Any ideas what may have caused it to get into this condition?

It looks like "read a page, sleep for 80 milliseconds, repeat".

I'd look at your settings for autovacuum_vacuum_cost_limit /
autovacuum_vacuum_cost_delay, and maybe the contents of pg_autovacuum.

autovacuum_vacuum_cost_limit = -1
autovacuum_vacuum_cost_delay = 20ms

'select * from pg_autovacuum;' returns zero rows

I killed the one autovacuum process already that I thought was hung (via
pg_cancel_backend).
I can see one other autovacuum process running. The other vacuum process was
started manually.
Should I be concerned that pg_autovacuum is empty when I have an autovacuum
process running?

Mason

Show quoted text

Cheers,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mason Hale (#8)
Re: vacuum taking an unusually long time

"Mason Hale" <masonhale@gmail.com> writes:

It looks like "read a page, sleep for 80 milliseconds, repeat".

That's what it looks like to me too.

I'd look at your settings for autovacuum_vacuum_cost_limit /
autovacuum_vacuum_cost_delay, and maybe the contents of pg_autovacuum.

autovacuum_vacuum_cost_limit = -1
autovacuum_vacuum_cost_delay = 20ms

That process is *clearly* not using those vacuum cost parameters ---
it's evidently using a delay of 80ms and some completely over-aggressive
cost settings that're making it sleep for each single page read.

So you need to find out where those whacked-out values are coming from.
pg_autovacuum might be a likely source. Or maybe you just forgot a
SIGHUP after a recent change to postgresql.conf?

regards, tom lane