db grows and grows

Started by Bjoern Metzdorfalmost 24 years ago29 messagesgeneral

bm@turtle-entertainment.de

almost 24 years ago

Hi,

I have a 3 GB (fs based) large pgdata directory. I regularly do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and creating the db,
reinserting the dump and vacuuming again, my pgdata directory only contains
1 GB. The dump had no errors, all data has been saved and reinserted.

The xlogs/clogs didnt take up 2 GB, so I am wondering what has happened.

Shouldn't the vacuuming take care of this?

A (desired) sideeffect is, that the postmaster runs much faster now. Queries
get executed much faster.

If I compare the relpages from before and after, I see the difference there
also.

Any hints?

Greetings,
Bjoern

scott.marlowe

scott.marlowe@ihs.com

almost 24 years ago

In reply to: Bjoern Metzdorf (#1)

Re: db grows and grows

Look through the mailing list archives for a program called "reindexdb"
that will likely fix your problems. There was a whole thread on this not
more than a month or so ago.

Basically, large updates can cause uncontrolled index growth that vacuum
doesn't fix.

On Tue, 18 Jun 2002, Bjoern Metzdorf wrote:

Hi,

I have a 3 GB (fs based) large pgdata directory. I regularly do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and creating the db,
reinserting the dump and vacuuming again, my pgdata directory only contains
1 GB. The dump had no errors, all data has been saved and reinserted.

The xlogs/clogs didnt take up 2 GB, so I am wondering what has happened.

Shouldn't the vacuuming take care of this?

A (desired) sideeffect is, that the postmaster runs much faster now. Queries
get executed much faster.

If I compare the relpages from before and after, I see the difference there
also.

Any hints?

Greetings,
Bjoern

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

--
"Force has no place where there is need of skill.", "Haste in every
business brings failures.", "This is the bitterest pain among men, to have
much knowledge but no power." -- Herodotus

Alvaro Herrera

alvherre@atentus.com

almost 24 years ago

In reply to: Bjoern Metzdorf (#1)

Re: db grows and grows

On Tue, 18 Jun 2002, Bjoern Metzdorf wrote:

I have a 3 GB (fs based) large pgdata directory. I regularly do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and creating the db,
reinserting the dump and vacuuming again, my pgdata directory only contains
1 GB. The dump had no errors, all data has been saved and reinserted.

Your indexes are probably growing. In extant versions, VACUUM doesn't
shrink them, so you should do it manually using REINDEX. There's a
contributed script that does it in a reasonably automated way, like the
vacuum script does. Look in the archives.

Shouldn't the vacuuming take care of this?

Yes IMVHO, but at present it doesn't.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)

Terry Fielder

terry@greatgulfhomes.com

almost 24 years ago

In reply to: Bjoern Metzdorf (#1)

Re: db grows and grows

I had a similar issue:

Look at your indexes, I'll bet they are growing and not recovering unused
pages.

Consider scheduling the REINDEX command like you do for VACUUM.

Terry Fielder
Network Engineer
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com

Show quoted text

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of
Bjoern Metzdorf
Sent: Tuesday, June 18, 2002 10:28 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] db grows and grows

Hi,

I have a 3 GB (fs based) large pgdata directory. I regularly
do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and
creating the db,
reinserting the dump and vacuuming again, my pgdata directory
only contains
1 GB. The dump had no errors, all data has been saved and reinserted.

The xlogs/clogs didnt take up 2 GB, so I am wondering what
has happened.

Shouldn't the vacuuming take care of this?

A (desired) sideeffect is, that the postmaster runs much
faster now. Queries
get executed much faster.

If I compare the relpages from before and after, I see the
difference there
also.

Any hints?

Greetings,
Bjoern

---------------------------(end of
broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Jason Earl

jason.earl@simplot.com

almost 24 years ago

In reply to: Bjoern Metzdorf (#1)

Re: db grows and grows

"Bjoern Metzdorf" <bm@turtle-entertainment.de> writes:

Hi,

I have a 3 GB (fs based) large pgdata directory. I regularly do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and creating the db,
reinserting the dump and vacuuming again, my pgdata directory only contains
1 GB. The dump had no errors, all data has been saved and reinserted.

The xlogs/clogs didnt take up 2 GB, so I am wondering what has happened.

Shouldn't the vacuuming take care of this?

A (desired) sideeffect is, that the postmaster runs much faster now. Queries
get executed much faster.

If I compare the relpages from before and after, I see the difference there
also.

Any hints?

Greetings,
Bjoern

Chances are good that your indexes are growing out of control. If you
have tables with a lot of turnover that is almost certainly the
problem. Shaun Thomas has written a script that will reindex your
database. It works very well, but it does lock tables, so it might
not be appropriate for your environment.

The script was posted to this list (a search for reindex turned it up
in my local mirror of the mailing list). If you can't find it feel
free to contact me.

Jason

Bill Moran

wmoran@potentialtech.com

almost 24 years ago

In reply to: Bjoern Metzdorf (#1)

Re: db grows and grows

Bjoern Metzdorf wrote:

Hi,

I have a 3 GB (fs based) large pgdata directory. I regularly do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and creating the db,
reinserting the dump and vacuuming again, my pgdata directory only contains
1 GB. The dump had no errors, all data has been saved and reinserted.

The xlogs/clogs didnt take up 2 GB, so I am wondering what has happened.

Shouldn't the vacuuming take care of this?

A (desired) sideeffect is, that the postmaster runs much faster now. Queries
get executed much faster.

If I compare the relpages from before and after, I see the difference there
also.

Any hints?

Have you tried "VACUUM FULL"?
Note that this needs exclusive table access, so run it when usage is low
or (preferrably) during scheduled downtime.
Maybe if you could squeeze this in once a week or so ...

--
Bill Moran
Potential Technologies
http://www.potentialtech.com

Bjoern Metzdorf

bm@turtle-entertainment.de

almost 24 years ago

In reply to: Bjoern Metzdorf (#1)

Re: db grows and grows

Thanks for all pointers to REINDEX. Seems to work perfectly now.

Greetings,
Bjoern

----- Original Message -----
From: "Bjoern Metzdorf" <bm@turtle-entertainment.de>
To: <pgsql-general@postgresql.org>
Sent: Tuesday, June 18, 2002 4:28 PM
Subject: [GENERAL] db grows and grows

Hi,

I have a 3 GB (fs based) large pgdata directory. I regularly do vacuums
every 15 minutes and vacuums with analyzing every night.

After dumping the whole db (pg_dump -c db), dropping and creating the db,
reinserting the dump and vacuuming again, my pgdata directory only

contains

1 GB. The dump had no errors, all data has been saved and reinserted.

The xlogs/clogs didnt take up 2 GB, so I am wondering what has happened.

Shouldn't the vacuuming take care of this?

A (desired) sideeffect is, that the postmaster runs much faster now.

Queries

get executed much faster.

If I compare the relpages from before and after, I see the difference

there

Show quoted text

also.

Any hints?

Greetings,
Bjoern

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Shaun Thomas

sthomas@townnews.com

almost 24 years ago

In reply to: Jason Earl (#5)

Re: db grows and grows

On 18 Jun 2002, Jason Earl wrote:

The script was posted to this list (a search for reindex turned it up
in my local mirror of the mailing list). If you can't find it feel
free to contact me.

Oh come now, I troll the list once in a while. I sent a copy to him.
My script doesn't really lock anything, it must just be a natural
side-effect of the REINDEX command, kinda like the old VACUUM in
Postgres < 7.2. I can see why, though. It looks like it drops the
index and then recreates it. Not good for integrity if you don't lock.

Ah well, it'll get fixed eventually. How about postgres 8.9?

^_^

-- 
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

Andrew Sullivan

andrew@libertyrms.info

almost 24 years ago

In reply to: Shaun Thomas (#8)

Re: db grows and grows

On Wed, Jun 19, 2002 at 09:30:23AM -0500, Shaun Thomas wrote:

On 18 Jun 2002, Jason Earl wrote:

My script doesn't really lock anything, it must just be a natural
side-effect of the REINDEX command,

According to

<http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/locking-tables.html>

CREATE INDEX gets a ShareLock. Which will block UPDATE, DELETE and
INSERT. So, not _quite_ as bad as VACUUM FULL, but not something to
do in the middle of a busy day.

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110

#10

Bjoern Metzdorf

bm@turtle-entertainment.de

almost 24 years ago

In reply to: Jason Earl (#5)

Re: db grows and grows

My script doesn't really lock anything, it must just be a natural
side-effect of the REINDEX command,

According to

<http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/locking-tables.
html>

CREATE INDEX gets a ShareLock. Which will block UPDATE, DELETE and
INSERT. So, not _quite_ as bad as VACUUM FULL, but not something to
do in the middle of a busy day.

During the REINDEX-run I see plenty "SELECT (waiting)" with ps. So I can
confirm this.

Greetings,
Bjoern

#11

Andrew Sullivan

andrew@libertyrms.info

almost 24 years ago

In reply to: Bjoern Metzdorf (#10)

Re: db grows and grows

On Wed, Jun 19, 2002 at 09:09:32PM +0200, Bjoern Metzdorf wrote:

During the REINDEX-run I see plenty "SELECT (waiting)" with ps. So I can
confirm this.

But why would a ShareLock cause SELECT to wait? Anyone? ShareLock
isn't supposed to prevent that. It _should_ cause UPDATE to wait,
but not SELECT (or at least, that's what I think the docs say).

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110

#12

Varun Kacholia

varunk@cse.iitb.ac.in

almost 24 years ago

In reply to: Andrew Sullivan (#11)

Re: db grows and grows

During the REINDEX-run I see plenty "SELECT (waiting)" with ps. So I can
confirm this.

But why would a ShareLock cause SELECT to wait? Anyone? ShareLock
isn't supposed to prevent that. It _should_ cause UPDATE to wait,
but not SELECT (or at least, that's what I think the docs say).

Exaclty i also feel that sharelock shud not make select wait..
(unless pgsql goes bunkers? :-D )

A
-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
+1 416 646 3304 x110
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

--
------
Varun
Why doesn't DOS ever say "EXCELLENT command or filename!"

#13

Tom Lane

tgl@sss.pgh.pa.us

almost 24 years ago

In reply to: Andrew Sullivan (#11)

Re: db grows and grows

Andrew Sullivan <andrew@libertyrms.info> writes:

On Wed, Jun 19, 2002 at 09:09:32PM +0200, Bjoern Metzdorf wrote:

During the REINDEX-run I see plenty "SELECT (waiting)" with ps. So I can
confirm this.

But why would a ShareLock cause SELECT to wait?

He was doing REINDEX, which grabs an exclusive lock.

The reason CREATE INDEX can run concurrently with SELECTs is that the
SELECTs won't see (and perhaps try to use) the new index, because its
catalog entries aren't committed yet. REINDEX needs an exclusive lock
to prevent other transactions from trying to use the
index-under-reconstruction, since they certainly would see it as
an available index.

While one can think of hacks that might be able to work around that,
it's not clear to me that we should expend development time on making
REINDEX slicker --- fixing the underlying space-management problem in
btree indexes would be a better expenditure of time IMHO. REINDEX
is really intended for disaster recovery, not routine space management.

regards, tom lane

#14

Varun Kacholia

varunk@cse.iitb.ac.in

almost 24 years ago

In reply to: Tom Lane (#13)

Re: db grows and grows

While one can think of hacks that might be able to work around that,
it's not clear to me that we should expend development time on making
REINDEX slicker --- fixing the underlying space-management problem in
btree indexes would be a better expenditure of time IMHO. REINDEX
is really intended for disaster recovery, not routine space management.

ah.. well if you say that REINDEX is for recovery then what should be
used for routine space management? some thing is needed if you need
regular and heavy updates and still dont wanna compromise speed.

regards, tom lane

--
------
Varun
If Bill Gates had a penny for every time Windows crashed......Oh wait, he does.

#15

Terry Fielder

terry@greatgulfhomes.com

almost 24 years ago

In reply to: Tom Lane (#13)

Re: db grows and grows

If he needs to REINDEX live without locking out selects, then simply DROP
INDEX <indexname>; and then do a CREATE INDEX ... to recreate it. Drop is
fast, and the create does not acquire the exclusive lock.

The only down side of delete/recreate is that with REINDEX one did not need
to know the statement to recreate the index, and the latter does.

Terry Fielder
Network Engineer
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com

Show quoted text

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Tom Lane
Sent: Wednesday, June 19, 2002 5:32 PM
To: Andrew Sullivan
Cc: PostgreSQL general list
Subject: Re: [GENERAL] db grows and grows

Andrew Sullivan <andrew@libertyrms.info> writes:

On Wed, Jun 19, 2002 at 09:09:32PM +0200, Bjoern Metzdorf wrote:

During the REINDEX-run I see plenty "SELECT (waiting)"

with ps. So I can

confirm this.

But why would a ShareLock cause SELECT to wait?

He was doing REINDEX, which grabs an exclusive lock.

The reason CREATE INDEX can run concurrently with SELECTs is that the
SELECTs won't see (and perhaps try to use) the new index, because its
catalog entries aren't committed yet. REINDEX needs an exclusive lock
to prevent other transactions from trying to use the
index-under-reconstruction, since they certainly would see it as
an available index.

While one can think of hacks that might be able to work around that,
it's not clear to me that we should expend development time on making
REINDEX slicker --- fixing the underlying space-management problem in
btree indexes would be a better expenditure of time IMHO. REINDEX
is really intended for disaster recovery, not routine space
management.

regards, tom lane

---------------------------(end of
broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#16

Martijn van Oosterhout

kleptog@svana.org

almost 24 years ago

In reply to: Terry Fielder (#15)

Re: db grows and grows

On Wed, Jun 19, 2002 at 06:37:38PM -0400, terry@greatgulfhomes.com wrote:

If he needs to REINDEX live without locking out selects, then simply DROP
INDEX <indexname>; and then do a CREATE INDEX ... to recreate it. Drop is
fast, and the create does not acquire the exclusive lock.

The only down side of delete/recreate is that with REINDEX one did not need
to know the statement to recreate the index, and the latter does.

A while ago I did post a script that did this, only it did the create first,
then the drop and then renamed the new index to the old one. All within a
transaction so other queries wouldn't be left without an index.

It used pg_dump to get the CREATE INDEX command.

Should be in the archive somewhere...
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

There are 10 kinds of people in the world, those that can do binary
arithmetic and those that can't.

#17

Andrew Sullivan

andrew@libertyrms.info

almost 24 years ago

In reply to: Martijn van Oosterhout (#16)

Re: db grows and grows

On Thu, Jun 20, 2002 at 09:38:41AM +1000, Martijn van Oosterhout wrote:

A while ago I did post a script that did this, only it did the create first,

Yes, actually, that's why I was confused. I thought it was that
script that was being run.

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110

#18

Fran Fabrizio

ffabrizio@mmrd.com

almost 24 years ago

In reply to: Tom Lane (#13)

Re: db grows and grows

A while ago I did post a script that did this, only it did the create first,
then the drop and then renamed the new index to the old one. All within a
transaction so other queries wouldn't be left without an index.

It used pg_dump to get the CREATE INDEX command.

Should be in the archive somewhere...

Any hints? I'm not having much luck searching the archives for this
script.

Thanks,
Fran

#19

Fran Fabrizio

ffabrizio@mmrd.com

almost 24 years ago

In reply to: Tom Lane (#13)

Re: db grows and grows

Nevermind, I just found it.

http://archives.postgresql.org/pgsql-general/2002-05/msg00242.php

Thanks!

-Fran

Show quoted text

On Wed, Jun 19, 2002 at 06:37:38PM -0400, terry@greatgulfhomes.com wrote:

If he needs to REINDEX live without locking out selects, then simply DROP
INDEX <indexname>; and then do a CREATE INDEX ... to recreate it. Drop is
fast, and the create does not acquire the exclusive lock.

The only down side of delete/recreate is that with REINDEX one did not need
to know the statement to recreate the index, and the latter does.

A while ago I did post a script that did this, only it did the create first,
then the drop and then renamed the new index to the old one. All within a
transaction so other queries wouldn't be left without an index.

It used pg_dump to get the CREATE INDEX command.

Should be in the archive somewhere...

#20

Shaun Thomas

sthomas@townnews.com

almost 24 years ago

In reply to: Tom Lane (#13)

Re: db grows and grows

On Wed, 19 Jun 2002, Tom Lane wrote:

REINDEX is really intended for disaster recovery, not routine space
management.

That's nice. Except the fact Postgres is so bad at maintaining its own
indexes makes us have to *use* it as routine space management. Having a
200MB index that's still getting bigger, and really should only be 200k,
tells me that there's no better use for it. Sure, I could wait until
the index takes up my entire disk, and then consider it disaster
recovery, but that's just silly.

Besides that, if REINDEX isn't meant for this, what is? Nothing? So if
nothing is meant to fix this, we're just supposed to let this continue
and only use things for their intended purpose? Sure, writing something
that reads the pg_* tables to get index information and drops/creates
them from scratch is nice, but that would also mean recreating foreign
keys. Basically everything aside from basic database structure would
have to be abandoned while your script was running. Or it would have to
be in one huge transaction or something. Either way, that sounds like a
lot of work I'd have to do to fix something that is inherantly
Postgres's fault.

So, I'll at least continue to use REINDEX. I'll encourage other people
to use REINDEX, and I'll even tell my dog to use REINDEX, because right
now, that's the only adequate tool we have.

Besides, I'm not talking about the "REINDEX DATABASE blah" command that
redoes the system-table indexes. That, I agree, is really only
system-recovery, because I can only run it in standalone mode. But
REINDEX TABLE and REINDEX INDEX do not have that problem, and so I'll
use them. Bob will use them, Fred will use them, and my dog will use
them until something better comes along, or it's no longer necessary.

So enjoy the vacuumdb script everyone, it's likely all you'll get for a
long time, since not even the developers themselves know how to keep
indexes from growing. Maybe over the next few weeks, I'll delve into
the code and see if I can't come up with something. More eyes, right?

-- 
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

#21

Bjoern Metzdorf

bm@turtle-entertainment.de

almost 24 years ago

In reply to: Shaun Thomas (#20)

#22

Varun Kacholia

varunk@cse.iitb.ac.in

almost 24 years ago

In reply to: Bjoern Metzdorf (#21)

#23