vacuum slowness

Started by Bruce Momjianalmost 27 years ago12 messages
#1Bruce Momjian
maillist@candle.pha.pa.us

I just deleted all 50,000 rows from a table that has one int4 and one text
field.

Why does vacuum take so long? If all the rows are superceeded, so no
rows actually have to be moved, should it take so long for vacuum to
run?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: [HACKERS] vacuum slowness

Bruce Momjian <maillist@candle.pha.pa.us> writes:

I just deleted all 50,000 rows from a table that has one int4 and one text
field.

Why does vacuum take so long? If all the rows are superceeded, so no
rows actually have to be moved, should it take so long for vacuum to
run?

Do you have any indexes on the table? I've noticed (and complained in
the past ;-)) that vacuuming a table takes unreasonably long if there
are a lot of dead index entries to be cleaned. It seems faster to drop
and recreate the index in a case like that.

regards, tom lane

#3Vadim Mikheev
vadim@krs.ru
In reply to: Bruce Momjian (#1)
Re: vacuum slowness

Bruce Momjian wrote:

I just deleted all 50,000 rows from a table that has one int4 and one text
field.

Why does vacuum take so long? If all the rows are superceeded, so no
rows actually have to be moved, should it take so long for vacuum to
run?

Indices?

Vadim

#4Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim Mikheev (#3)
Re: [HACKERS] Re: vacuum slowness

Bruce Momjian wrote:

I just deleted all 50,000 rows from a table that has one int4 and one text
field.

Why does vacuum take so long? If all the rows are superceeded, so no
rows actually have to be moved, should it take so long for vacuum to
run?

Indices?

Yes. That seems to be the problem. 45k lines, COPY is fast, DELETE is
fast if there are no indexes. With an index, it takes a long time.
Bummer. Ideas?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#5Wayne Piekarski
wayne@gateway.senet.com.au
In reply to: Bruce Momjian (#4)
Re: [HACKERS] vacuum slowness

Tom Lane writes:

Bruce Momjian <maillist@candle.pha.pa.us> writes:

I just deleted all 50,000 rows from a table that has one int4 and one

text

field.

Why does vacuum take so long? If all the rows are superceeded, so no
rows actually have to be moved, should it take so long for vacuum to
run?

Do you have any indexes on the table? I've noticed (and complained in
the past ;-)) that vacuuming a table takes unreasonably long if there
are a lot of dead index entries to be cleaned. It seems faster to drop
and recreate the index in a case like that.

Hi everyone,

I am working on a large project right now which involves the use of a
table that has thousands of inserts and updates performed each day. (At
the end of the day, about 20000 inserts have occured, and each inserted
row gets modified 2 or 3 times) Vacuums take absolutely ages and
unfortunately the system must run continuously 24 hours per day so I can't
afford to have the table locked for ages while it is being vacuumed.

I've played around with vacuum quite a bit, and I've found that if I do
one huge vacuum every so often, it takes longer than if I do lots of
vacuum's during the day, this way the tables are kept more 'compacted' and
there is less moving around of data required, and so it runs a bit faster.

As the number of days of new data stored increases, the size of the tables
grows to the point where a vacuum can take 10 minutes or so, and this is
unacceptable considering it occurs in a few seconds without indexes. To
get around this, once every day, I grab entries which are in the active
table that are older than two days, and move them into an archive table
which never changes. This way, I can keep the active table small and do
vacuums within a minute or so, allowing me to keep my software from
waiting too long. I'd really like to avoid doing this though, because it
causes complications - lately I've found that vacuuming is becoming a
major hassle which I'd rather not have to do at all :)

What I was wanting to know if there was a way of temporarily disabling
indexes while the vacuum is occuring, and then update it all in one hit
once the update is completely finished. This would be equivalent to
dropping and recreating them, but I don't want to do that in case
something dies during the vacuum and my tables are left without indexes on
them.

Or perhaps telling Postgres to do a partial vacuum, with a time limit set
to say 20 seconds and it will do it in stages over the period of a day.
This way the database can still run and we can keep the dbms cleaned. From
what I understand, the new MVCC support in 6.5 will be able to do vacuum's
in the background, or is this for the future?

Also, I had a look at the src/commands/vacuum.c code, and had a bit of a
read through it. One thing I wasn't sure about is the method it uses to
move the rows around while it is doing the index. Lets say that we have
100 rows, and the first one is deleted and so is empty. Does every single
row get moved back one, or does only one row get moved to fill in the
empty gap?

Is the vacuum code moving tons of rows around the table, causing the
indexes to be updated lots of times and slowing things down?

If someone could give me some hints about how to best handle my tables to
get good vacuum times I would really appreciate it.

btw, keep up the good work everyone, I've been following this mailing list
and developing with Postgres since the days of pre-6.0 and I'm very
impressed with all the great improvements that have been made to Postgres
over the years!

Thanks,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski Tel: (08) 8221 5221
Research & Development Manager Fax: (08) 8221 5220
SE Network Access Pty Ltd Mob: 0407 395 889
222 Grote Street Email: wayne@senet.com.au
Adelaide SA 5000 WWW: http://www.senet.com.au

#6Vadim Mikheev
vadim@krs.ru
In reply to: Bruce Momjian (#4)
Re: [HACKERS] Re: vacuum slowness

Bruce Momjian wrote:

Indices?

Yes. That seems to be the problem. 45k lines, COPY is fast, DELETE is
fast if there are no indexes. With an index, it takes a long time.
Bummer. Ideas?

I hope to implement space re-using and address vacuum slowness
in 6.6

Vadim

#7Clark Evans
clark.evans@manhattanproject.com
In reply to: Bruce Momjian (#4)
Re: [HACKERS] Re: vacuum slowness

Vadim Mikheev wrote:

I hope to implement space re-using and address vacuum slowness in 6.6

Are you intending to keep it so that you could still run PostgreSQL
on top of a WORM (Write once Read Many) device? I'm plannng to
put some databases directly on these new write-only DVD drives
coming out.... I'd want to keep the indexes on a (WMRM) hard drive though.

:) Clark

#8Vadim Mikheev
vadim@krs.ru
In reply to: Bruce Momjian (#4)
Re: [HACKERS] Re: vacuum slowness

Clark Evans wrote:

Vadim Mikheev wrote:

I hope to implement space re-using and address vacuum slowness in 6.6

Are you intending to keep it so that you could still run PostgreSQL
on top of a WORM (Write once Read Many) device? I'm plannng to
put some databases directly on these new write-only DVD drives
coming out.... I'd want to keep the indexes on a (WMRM) hard drive though.

Is it possible to use WORM now?

Vadim

#9Clark Evans
clark.evans@manhattanproject.com
In reply to: Bruce Momjian (#4)
Re: [HACKERS] Re: vacuum slowness

Vadim Mikheev wrote:

Is it possible to use WORM now?

I don't know, but it's on my to-try list. I'm hoping it
will work (got all excited when I was reading the acedemic papers)
This was one of the goals of the database...

It just seems for situations where a high degree of auditability is
needed that running the database on top of a WORM is a fantastic idea.
I'm writing a bookkeeping system, and think it would be a very
valueable reason to move to 'free software'. It's the killer feature
Oracle dosn't have. Well, acedemically it sounds nice. *smirk*

It's all speculation, but fun speculation anyway...

:) Clark

P.S. Perhaps it's not all that great of an idea. I intend to journal
all of the interactions with the database to a CDR, I was just hoping
to get it for free.... *evil grin*

#10Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Vadim Mikheev (#6)
RE: [HACKERS] Re: vacuum slowness

-----Original Message-----
From: owner-pgsql-hackers@postgreSQL.org
[mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Vadim Mikheev
Sent: Thursday, March 18, 1999 2:07 PM
To: Bruce Momjian
Cc: hackers@postgreSQL.org
Subject: Re: [HACKERS] Re: vacuum slowness

Bruce Momjian wrote:

Indices?

Yes. That seems to be the problem. 45k lines, COPY is fast, DELETE is
fast if there are no indexes. With an index, it takes a long time.
Bummer. Ideas?

I hope to implement space re-using and address vacuum slowness
in 6.6

We would be able to vacuum without blocking same-table writers in v6.5 ?
Or would VACUUM block same-table readers as VACUUM does currently ?

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp

#11Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#4)
Re: [HACKERS] Re: vacuum slowness

Is it possible to use WORM now?

I don't know, but it's on my to-try list. I'm hoping it
will work (got all excited when I was reading the acedemic papers)
This was one of the goals of the database...

... which we probably gave up when we removed time travel, quite a
while ago.

- Tom

#12Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Wayne Piekarski (#5)
Re: [HACKERS] vacuum slowness

6.5beta has a much faster vacuumer when indexes are used. Please try
that when you can.

Tom Lane writes:

Bruce Momjian <maillist@candle.pha.pa.us> writes:

I just deleted all 50,000 rows from a table that has one int4 and one

text

field.

Why does vacuum take so long? If all the rows are superceeded, so no
rows actually have to be moved, should it take so long for vacuum to
run?

Do you have any indexes on the table? I've noticed (and complained in
the past ;-)) that vacuuming a table takes unreasonably long if there
are a lot of dead index entries to be cleaned. It seems faster to drop
and recreate the index in a case like that.

Hi everyone,

I am working on a large project right now which involves the use of a
table that has thousands of inserts and updates performed each day. (At
the end of the day, about 20000 inserts have occured, and each inserted
row gets modified 2 or 3 times) Vacuums take absolutely ages and
unfortunately the system must run continuously 24 hours per day so I can't
afford to have the table locked for ages while it is being vacuumed.

I've played around with vacuum quite a bit, and I've found that if I do
one huge vacuum every so often, it takes longer than if I do lots of
vacuum's during the day, this way the tables are kept more 'compacted' and
there is less moving around of data required, and so it runs a bit faster.

As the number of days of new data stored increases, the size of the tables
grows to the point where a vacuum can take 10 minutes or so, and this is
unacceptable considering it occurs in a few seconds without indexes. To
get around this, once every day, I grab entries which are in the active
table that are older than two days, and move them into an archive table
which never changes. This way, I can keep the active table small and do
vacuums within a minute or so, allowing me to keep my software from
waiting too long. I'd really like to avoid doing this though, because it
causes complications - lately I've found that vacuuming is becoming a
major hassle which I'd rather not have to do at all :)

What I was wanting to know if there was a way of temporarily disabling
indexes while the vacuum is occuring, and then update it all in one hit
once the update is completely finished. This would be equivalent to
dropping and recreating them, but I don't want to do that in case
something dies during the vacuum and my tables are left without indexes on
them.

Or perhaps telling Postgres to do a partial vacuum, with a time limit set
to say 20 seconds and it will do it in stages over the period of a day.
This way the database can still run and we can keep the dbms cleaned. From
what I understand, the new MVCC support in 6.5 will be able to do vacuum's
in the background, or is this for the future?

Also, I had a look at the src/commands/vacuum.c code, and had a bit of a
read through it. One thing I wasn't sure about is the method it uses to
move the rows around while it is doing the index. Lets say that we have
100 rows, and the first one is deleted and so is empty. Does every single
row get moved back one, or does only one row get moved to fill in the
empty gap?

Is the vacuum code moving tons of rows around the table, causing the
indexes to be updated lots of times and slowing things down?

If someone could give me some hints about how to best handle my tables to
get good vacuum times I would really appreciate it.

btw, keep up the good work everyone, I've been following this mailing list
and developing with Postgres since the days of pre-6.0 and I'm very
impressed with all the great improvements that have been made to Postgres
over the years!

Thanks,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski Tel: (08) 8221 5221
Research & Development Manager Fax: (08) 8221 5220
SE Network Access Pty Ltd Mob: 0407 395 889
222 Grote Street Email: wayne@senet.com.au
Adelaide SA 5000 WWW: http://www.senet.com.au

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026