db grows and grows

Started by Bjoern Metzdorfalmost 24 years ago29 messagesgeneral
Jump to latest
#1Bjoern Metzdorf
bm@turtle-entertainment.de

Hi,

I have a 3 GB (fs based) large pgdata directory. I regularly do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and creating the db,
reinserting the dump and vacuuming again, my pgdata directory only contains
1 GB. The dump had no errors, all data has been saved and reinserted.

The xlogs/clogs didnt take up 2 GB, so I am wondering what has happened.

Shouldn't the vacuuming take care of this?

A (desired) sideeffect is, that the postmaster runs much faster now. Queries
get executed much faster.

If I compare the relpages from before and after, I see the difference there
also.

Any hints?

Greetings,
Bjoern

#2scott.marlowe
scott.marlowe@ihs.com
In reply to: Bjoern Metzdorf (#1)
Re: db grows and grows

Look through the mailing list archives for a program called "reindexdb"
that will likely fix your problems. There was a whole thread on this not
more than a month or so ago.

Basically, large updates can cause uncontrolled index growth that vacuum
doesn't fix.

On Tue, 18 Jun 2002, Bjoern Metzdorf wrote:

Hi,

I have a 3 GB (fs based) large pgdata directory. I regularly do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and creating the db,
reinserting the dump and vacuuming again, my pgdata directory only contains
1 GB. The dump had no errors, all data has been saved and reinserted.

The xlogs/clogs didnt take up 2 GB, so I am wondering what has happened.

Shouldn't the vacuuming take care of this?

A (desired) sideeffect is, that the postmaster runs much faster now. Queries
get executed much faster.

If I compare the relpages from before and after, I see the difference there
also.

Any hints?

Greetings,
Bjoern

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

--
"Force has no place where there is need of skill.", "Haste in every
business brings failures.", "This is the bitterest pain among men, to have
much knowledge but no power." -- Herodotus

#3Alvaro Herrera
alvherre@atentus.com
In reply to: Bjoern Metzdorf (#1)
Re: db grows and grows

On Tue, 18 Jun 2002, Bjoern Metzdorf wrote:

I have a 3 GB (fs based) large pgdata directory. I regularly do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and creating the db,
reinserting the dump and vacuuming again, my pgdata directory only contains
1 GB. The dump had no errors, all data has been saved and reinserted.

Your indexes are probably growing. In extant versions, VACUUM doesn't
shrink them, so you should do it manually using REINDEX. There's a
contributed script that does it in a reasonably automated way, like the
vacuum script does. Look in the archives.

Shouldn't the vacuuming take care of this?

Yes IMVHO, but at present it doesn't.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)

#4Terry Fielder
terry@greatgulfhomes.com
In reply to: Bjoern Metzdorf (#1)
Re: db grows and grows

I had a similar issue:

Look at your indexes, I'll bet they are growing and not recovering unused
pages.

Consider scheduling the REINDEX command like you do for VACUUM.

Terry Fielder
Network Engineer
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com

Show quoted text

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of
Bjoern Metzdorf
Sent: Tuesday, June 18, 2002 10:28 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] db grows and grows

Hi,

I have a 3 GB (fs based) large pgdata directory. I regularly
do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and
creating the db,
reinserting the dump and vacuuming again, my pgdata directory
only contains
1 GB. The dump had no errors, all data has been saved and reinserted.

The xlogs/clogs didnt take up 2 GB, so I am wondering what
has happened.

Shouldn't the vacuuming take care of this?

A (desired) sideeffect is, that the postmaster runs much
faster now. Queries
get executed much faster.

If I compare the relpages from before and after, I see the
difference there
also.

Any hints?

Greetings,
Bjoern

---------------------------(end of
broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

#5Jason Earl
jason.earl@simplot.com
In reply to: Bjoern Metzdorf (#1)
Re: db grows and grows

"Bjoern Metzdorf" <bm@turtle-entertainment.de> writes:

Hi,

I have a 3 GB (fs based) large pgdata directory. I regularly do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and creating the db,
reinserting the dump and vacuuming again, my pgdata directory only contains
1 GB. The dump had no errors, all data has been saved and reinserted.

The xlogs/clogs didnt take up 2 GB, so I am wondering what has happened.

Shouldn't the vacuuming take care of this?

A (desired) sideeffect is, that the postmaster runs much faster now. Queries
get executed much faster.

If I compare the relpages from before and after, I see the difference there
also.

Any hints?

Greetings,
Bjoern

Chances are good that your indexes are growing out of control. If you
have tables with a lot of turnover that is almost certainly the
problem. Shaun Thomas has written a script that will reindex your
database. It works very well, but it does lock tables, so it might
not be appropriate for your environment.

The script was posted to this list (a search for reindex turned it up
in my local mirror of the mailing list). If you can't find it feel
free to contact me.

Jason

#6Bill Moran
wmoran@potentialtech.com
In reply to: Bjoern Metzdorf (#1)
Re: db grows and grows

Bjoern Metzdorf wrote:

Hi,

I have a 3 GB (fs based) large pgdata directory. I regularly do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and creating the db,
reinserting the dump and vacuuming again, my pgdata directory only contains
1 GB. The dump had no errors, all data has been saved and reinserted.

The xlogs/clogs didnt take up 2 GB, so I am wondering what has happened.

Shouldn't the vacuuming take care of this?

A (desired) sideeffect is, that the postmaster runs much faster now. Queries
get executed much faster.

If I compare the relpages from before and after, I see the difference there
also.

Any hints?

Have you tried "VACUUM FULL"?
Note that this needs exclusive table access, so run it when usage is low
or (preferrably) during scheduled downtime.
Maybe if you could squeeze this in once a week or so ...

--
Bill Moran
Potential Technologies
http://www.potentialtech.com

#7Bjoern Metzdorf
bm@turtle-entertainment.de
In reply to: Bjoern Metzdorf (#1)
Re: db grows and grows

Thanks for all pointers to REINDEX. Seems to work perfectly now.

Greetings,
Bjoern

----- Original Message -----
From: "Bjoern Metzdorf" <bm@turtle-entertainment.de>
To: <pgsql-general@postgresql.org>
Sent: Tuesday, June 18, 2002 4:28 PM
Subject: [GENERAL] db grows and grows

Hi,

I have a 3 GB (fs based) large pgdata directory. I regularly do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and creating the db,
reinserting the dump and vacuuming again, my pgdata directory only

contains

1 GB. The dump had no errors, all data has been saved and reinserted.

The xlogs/clogs didnt take up 2 GB, so I am wondering what has happened.

Shouldn't the vacuuming take care of this?

A (desired) sideeffect is, that the postmaster runs much faster now.

Queries

get executed much faster.

If I compare the relpages from before and after, I see the difference

there

Show quoted text

also.

Any hints?

Greetings,
Bjoern

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

#8Shaun Thomas
sthomas@townnews.com
In reply to: Jason Earl (#5)
Re: db grows and grows

On 18 Jun 2002, Jason Earl wrote:

The script was posted to this list (a search for reindex turned it up
in my local mirror of the mailing list). If you can't find it feel
free to contact me.

Oh come now, I troll the list once in a while. I sent a copy to him.
My script doesn't really lock anything, it must just be a natural
side-effect of the REINDEX command, kinda like the old VACUUM in
Postgres < 7.2. I can see why, though. It looks like it drops the
index and then recreates it. Not good for integrity if you don't lock.

Ah well, it'll get fixed eventually. How about postgres 8.9?

^_^

-- 
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
#9Andrew Sullivan
andrew@libertyrms.info
In reply to: Shaun Thomas (#8)
Re: db grows and grows

On Wed, Jun 19, 2002 at 09:30:23AM -0500, Shaun Thomas wrote:

On 18 Jun 2002, Jason Earl wrote:

My script doesn't really lock anything, it must just be a natural
side-effect of the REINDEX command,

According to

<http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/locking-tables.html&gt;

CREATE INDEX gets a ShareLock. Which will block UPDATE, DELETE and
INSERT. So, not _quite_ as bad as VACUUM FULL, but not something to
do in the middle of a busy day.

A

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110
#10Bjoern Metzdorf
bm@turtle-entertainment.de
In reply to: Jason Earl (#5)
Re: db grows and grows

My script doesn't really lock anything, it must just be a natural
side-effect of the REINDEX command,

According to

<http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/locking-tables.
html>

CREATE INDEX gets a ShareLock. Which will block UPDATE, DELETE and
INSERT. So, not _quite_ as bad as VACUUM FULL, but not something to
do in the middle of a busy day.

During the REINDEX-run I see plenty "SELECT (waiting)" with ps. So I can
confirm this.

Greetings,
Bjoern

#11Andrew Sullivan
andrew@libertyrms.info
In reply to: Bjoern Metzdorf (#10)
Re: db grows and grows

On Wed, Jun 19, 2002 at 09:09:32PM +0200, Bjoern Metzdorf wrote:

During the REINDEX-run I see plenty "SELECT (waiting)" with ps. So I can
confirm this.

But why would a ShareLock cause SELECT to wait? Anyone? ShareLock
isn't supposed to prevent that. It _should_ cause UPDATE to wait,
but not SELECT (or at least, that's what I think the docs say).

A

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110
#12Varun Kacholia
varunk@cse.iitb.ac.in
In reply to: Andrew Sullivan (#11)
Re: db grows and grows

During the REINDEX-run I see plenty "SELECT (waiting)" with ps. So I can
confirm this.

But why would a ShareLock cause SELECT to wait? Anyone? ShareLock
isn't supposed to prevent that. It _should_ cause UPDATE to wait,
but not SELECT (or at least, that's what I think the docs say).

Exaclty i also feel that sharelock shud not make select wait..
(unless pgsql goes bunkers? :-D )

A

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
+1 416 646 3304 x110

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

--
------
Varun
Why doesn't DOS ever say "EXCELLENT command or filename!"

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Sullivan (#11)
Re: db grows and grows

Andrew Sullivan <andrew@libertyrms.info> writes:

On Wed, Jun 19, 2002 at 09:09:32PM +0200, Bjoern Metzdorf wrote:

During the REINDEX-run I see plenty "SELECT (waiting)" with ps. So I can
confirm this.

But why would a ShareLock cause SELECT to wait?

He was doing REINDEX, which grabs an exclusive lock.

The reason CREATE INDEX can run concurrently with SELECTs is that the
SELECTs won't see (and perhaps try to use) the new index, because its
catalog entries aren't committed yet. REINDEX needs an exclusive lock
to prevent other transactions from trying to use the
index-under-reconstruction, since they certainly would see it as
an available index.

While one can think of hacks that might be able to work around that,
it's not clear to me that we should expend development time on making
REINDEX slicker --- fixing the underlying space-management problem in
btree indexes would be a better expenditure of time IMHO. REINDEX
is really intended for disaster recovery, not routine space management.

regards, tom lane

#14Varun Kacholia
varunk@cse.iitb.ac.in
In reply to: Tom Lane (#13)
Re: db grows and grows

While one can think of hacks that might be able to work around that,
it's not clear to me that we should expend development time on making
REINDEX slicker --- fixing the underlying space-management problem in
btree indexes would be a better expenditure of time IMHO. REINDEX
is really intended for disaster recovery, not routine space management.

ah.. well if you say that REINDEX is for recovery then what should be
used for routine space management? some thing is needed if you need
regular and heavy updates and still dont wanna compromise speed.

regards, tom lane

--
------
Varun
If Bill Gates had a penny for every time Windows crashed......Oh wait, he does.

#15Terry Fielder
terry@greatgulfhomes.com
In reply to: Tom Lane (#13)
Re: db grows and grows

If he needs to REINDEX live without locking out selects, then simply DROP
INDEX <indexname>; and then do a CREATE INDEX ... to recreate it. Drop is
fast, and the create does not acquire the exclusive lock.

The only down side of delete/recreate is that with REINDEX one did not need
to know the statement to recreate the index, and the latter does.

Terry Fielder
Network Engineer
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com

Show quoted text

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Tom Lane
Sent: Wednesday, June 19, 2002 5:32 PM
To: Andrew Sullivan
Cc: PostgreSQL general list
Subject: Re: [GENERAL] db grows and grows

Andrew Sullivan <andrew@libertyrms.info> writes:

On Wed, Jun 19, 2002 at 09:09:32PM +0200, Bjoern Metzdorf wrote:

During the REINDEX-run I see plenty "SELECT (waiting)"

with ps. So I can

confirm this.

But why would a ShareLock cause SELECT to wait?

He was doing REINDEX, which grabs an exclusive lock.

The reason CREATE INDEX can run concurrently with SELECTs is that the
SELECTs won't see (and perhaps try to use) the new index, because its
catalog entries aren't committed yet. REINDEX needs an exclusive lock
to prevent other transactions from trying to use the
index-under-reconstruction, since they certainly would see it as
an available index.

While one can think of hacks that might be able to work around that,
it's not clear to me that we should expend development time on making
REINDEX slicker --- fixing the underlying space-management problem in
btree indexes would be a better expenditure of time IMHO. REINDEX
is really intended for disaster recovery, not routine space
management.

regards, tom lane

---------------------------(end of
broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#16Martijn van Oosterhout
kleptog@svana.org
In reply to: Terry Fielder (#15)
Re: db grows and grows

On Wed, Jun 19, 2002 at 06:37:38PM -0400, terry@greatgulfhomes.com wrote:

If he needs to REINDEX live without locking out selects, then simply DROP
INDEX <indexname>; and then do a CREATE INDEX ... to recreate it. Drop is
fast, and the create does not acquire the exclusive lock.

The only down side of delete/recreate is that with REINDEX one did not need
to know the statement to recreate the index, and the latter does.

A while ago I did post a script that did this, only it did the create first,
then the drop and then renamed the new index to the old one. All within a
transaction so other queries wouldn't be left without an index.

It used pg_dump to get the CREATE INDEX command.

Should be in the archive somewhere...
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

There are 10 kinds of people in the world, those that can do binary
arithmetic and those that can't.

#17Andrew Sullivan
andrew@libertyrms.info
In reply to: Martijn van Oosterhout (#16)
Re: db grows and grows

On Thu, Jun 20, 2002 at 09:38:41AM +1000, Martijn van Oosterhout wrote:

A while ago I did post a script that did this, only it did the create first,

Yes, actually, that's why I was confused. I thought it was that
script that was being run.

A

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110
#18Fran Fabrizio
ffabrizio@mmrd.com
In reply to: Tom Lane (#13)
Re: db grows and grows

A while ago I did post a script that did this, only it did the create first,
then the drop and then renamed the new index to the old one. All within a
transaction so other queries wouldn't be left without an index.

It used pg_dump to get the CREATE INDEX command.

Should be in the archive somewhere...

Any hints? I'm not having much luck searching the archives for this
script.

Thanks,
Fran

#19Fran Fabrizio
ffabrizio@mmrd.com
In reply to: Tom Lane (#13)
Re: db grows and grows

Nevermind, I just found it.

http://archives.postgresql.org/pgsql-general/2002-05/msg00242.php

Thanks!

-Fran

Show quoted text

On Wed, Jun 19, 2002 at 06:37:38PM -0400, terry@greatgulfhomes.com wrote:

If he needs to REINDEX live without locking out selects, then simply DROP
INDEX <indexname>; and then do a CREATE INDEX ... to recreate it. Drop is
fast, and the create does not acquire the exclusive lock.

The only down side of delete/recreate is that with REINDEX one did not need
to know the statement to recreate the index, and the latter does.

A while ago I did post a script that did this, only it did the create first,
then the drop and then renamed the new index to the old one. All within a
transaction so other queries wouldn't be left without an index.

It used pg_dump to get the CREATE INDEX command.

Should be in the archive somewhere...

#20Shaun Thomas
sthomas@townnews.com
In reply to: Tom Lane (#13)
Re: db grows and grows

On Wed, 19 Jun 2002, Tom Lane wrote:

REINDEX is really intended for disaster recovery, not routine space
management.

That's nice. Except the fact Postgres is so bad at maintaining its own
indexes makes us have to *use* it as routine space management. Having a
200MB index that's still getting bigger, and really should only be 200k,
tells me that there's no better use for it. Sure, I could wait until
the index takes up my entire disk, and then consider it disaster
recovery, but that's just silly.

Besides that, if REINDEX isn't meant for this, what is? Nothing? So if
nothing is meant to fix this, we're just supposed to let this continue
and only use things for their intended purpose? Sure, writing something
that reads the pg_* tables to get index information and drops/creates
them from scratch is nice, but that would also mean recreating foreign
keys. Basically everything aside from basic database structure would
have to be abandoned while your script was running. Or it would have to
be in one huge transaction or something. Either way, that sounds like a
lot of work I'd have to do to fix something that is inherantly
Postgres's fault.

So, I'll at least continue to use REINDEX. I'll encourage other people
to use REINDEX, and I'll even tell my dog to use REINDEX, because right
now, that's the only adequate tool we have.

Besides, I'm not talking about the "REINDEX DATABASE blah" command that
redoes the system-table indexes. That, I agree, is really only
system-recovery, because I can only run it in standalone mode. But
REINDEX TABLE and REINDEX INDEX do not have that problem, and so I'll
use them. Bob will use them, Fred will use them, and my dog will use
them until something better comes along, or it's no longer necessary.

So enjoy the vacuumdb script everyone, it's likely all you'll get for a
long time, since not even the developers themselves know how to keep
indexes from growing. Maybe over the next few weeks, I'll delve into
the code and see if I can't come up with something. More eyes, right?

-- 
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
#21Bjoern Metzdorf
bm@turtle-entertainment.de
In reply to: Shaun Thomas (#20)
#22Varun Kacholia
varunk@cse.iitb.ac.in
In reply to: Bjoern Metzdorf (#21)
#23Bruce Momjian
bruce@momjian.us
In reply to: Bjoern Metzdorf (#21)
#24scott.marlowe
scott.marlowe@ihs.com
In reply to: Martijn van Oosterhout (#16)
#25Robert Treat
xzilla@users.sourceforge.net
In reply to: Bruce Momjian (#23)
#26Terry Fielder
terry@greatgulfhomes.com
In reply to: Robert Treat (#25)
#27Martijn van Oosterhout
kleptog@svana.org
In reply to: Terry Fielder (#26)
#28Robert Treat
xzilla@users.sourceforge.net
In reply to: Martijn van Oosterhout (#27)
#29Shaun Thomas
sthomas@townnews.com
In reply to: Bjoern Metzdorf (#21)