bool / vacuum full bug followup

Started by scott.marloweabout 24 years ago37 messagesgeneral

scott.marlowe@ihs.com

about 24 years ago

OK, this is just beyond weird. I swear to GOD that my table size was
growing. I'm on a workstation that doesn't accept connections from
anybody but me.

I was running 7.2 for weeks.
I created a simple table:
create table test (buf text, yn bool);
I run an explain:
explain select * from test where yn=true;
I run an analyze:
analyze;
I update the table:
update test set yn=true where yn=true;
update test set yn=true where yn=true;
update test set yn=true where yn=true;
I do a REGULAR vacuum:
vacuum;

And after that, the table started growing after all vacuum fulls. I can't
reproduce it now. I blew away the table, recreated it, and now it doesn't
do that. I blew away the database, and installed 7.2.1 and it won't do
it. I reinstalled 7.2 and blew away the database and it still won't do
it.

So, I don't know why it WAS doing it, but it isn't now. And of course,
it's my personal workstation, so no log files. :-( I'll see if I can
reproduce this though, as I've seen similar problems pop up once in the
past with a seldom vacuumed database that reached a point where it stopped
freeing dead tuples. Very bothersome.

Tom Lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: scott.marlowe (#1)

Re: Subject: bool / vacuum full bug followup

Scott Marlowe <scott.marlowe@ihs.com> writes:

And after that, the table started growing after all vacuum fulls. I can't
reproduce it now.

Perhaps you had an open transaction laying about? That'd prevent vacuum
from recovering space ...

regards, tom lane

Jeffrey W. Baker

jwbaker@acm.org

about 24 years ago

In reply to: scott.marlowe (#1)

Re: Subject: bool / vacuum full bug followup

On Thu, May 02, 2002 at 04:30:15PM -0600, Scott Marlowe wrote:

OK, this is just beyond weird. I swear to GOD that my table size was
growing. I'm on a workstation that doesn't accept connections from
anybody but me.

I was running 7.2 for weeks.
I created a simple table:
create table test (buf text, yn bool);
I run an explain:
explain select * from test where yn=true;
I run an analyze:
analyze;
I update the table:
update test set yn=true where yn=true;
update test set yn=true where yn=true;
update test set yn=true where yn=true;
I do a REGULAR vacuum:
vacuum;

And after that, the table started growing after all vacuum fulls. I can't
reproduce it now. I blew away the table, recreated it, and now it doesn't
do that. I blew away the database, and installed 7.2.1 and it won't do
it. I reinstalled 7.2 and blew away the database and it still won't do
it.

So, I don't know why it WAS doing it, but it isn't now. And of course,
it's my personal workstation, so no log files. :-( I'll see if I can
reproduce this though, as I've seen similar problems pop up once in the
past with a seldom vacuumed database that reached a point where it stopped
freeing dead tuples. Very bothersome.

Well, if the table has an associated TOAST table, it is possible
that it will suddenly start growing out of control, and nothing you
can do with VACUUM will help. The TOAST system has a free space map
(FSM) with some default number of pages it can track. If you exceed
this number, it will become completely unable to reclaim space. If
this happens you will have to dump and reload the database or take
other drastic action.

Check the VACUUM VERBOSE output to see the pages in your TOAST
tables.

-jwb

Tom Lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Jeffrey W. Baker (#3)

Re: Subject: bool / vacuum full bug followup

Jeffrey Baker <jwbaker@acm.org> writes:

Well, if the table has an associated TOAST table, it is possible
that it will suddenly start growing out of control, and nothing you
can do with VACUUM will help. The TOAST system has a free space map
(FSM) with some default number of pages it can track. If you exceed
this number, it will become completely unable to reclaim space.

That's a gross misstatement.

If you have more pages with free space than the FSM can track, it will
forget about the ones with the least free space (at least until the next
vacuum tries to reload the info). This will very possibly lead to table
growth over time, but it's hardly the same as "completely unable to
reclaim space".

BTW, TOAST has nothing to do with this. Space in main tables and
toast tables is tracked alike.

regards, tom lane

Jeffrey W. Baker

jwbaker@acm.org

about 24 years ago

In reply to: Tom Lane (#4)

Re: Subject: bool / vacuum full bug followup

On Fri, May 03, 2002 at 12:20:44AM -0400, Tom Lane wrote:

Jeffrey Baker <jwbaker@acm.org> writes:

Well, if the table has an associated TOAST table, it is possible
that it will suddenly start growing out of control, and nothing you
can do with VACUUM will help. The TOAST system has a free space map
(FSM) with some default number of pages it can track. If you exceed
this number, it will become completely unable to reclaim space.

That's a gross misstatement.

If you have more pages with free space than the FSM can track, it will
forget about the ones with the least free space (at least until the next
vacuum tries to reload the info). This will very possibly lead to table
growth over time, but it's hardly the same as "completely unable to
reclaim space".

I expected someone to come out with this reply, but my own
monitoring in productions shows otherwise. I'll forward my data
along once I have completed my study.

The short of my observations is that once you have more free pages
than the FSM can track, the table grows linearly over time. This is
for a table with heavy insert and delete activity. This is perhaps
not the design intent but it is what we are seeing.

As I said I intend to send along unbearably detailed info once I
think I have mastered the problem.

BTW, TOAST has nothing to do with this. Space in main tables and
toast tables is tracked alike.

In my observation only TOAST tables have this behavior.

Regards
jwb

Holger Marzen

holger@marzen.de

about 24 years ago

In reply to: scott.marlowe (#1)

Re: Subject: bool / vacuum full bug followup

On Thu, 2 May 2002, Scott Marlowe wrote:

And after that, the table started growing after all vacuum fulls. I can't
reproduce it now. I blew away the table, recreated it, and now it doesn't
do that. I blew away the database, and installed 7.2.1 and it won't do
it. I reinstalled 7.2 and blew away the database and it still won't do
it.

So, I don't know why it WAS doing it, but it isn't now. And of course,
it's my personal workstation, so no log files. :-( I'll see if I can
reproduce this though, as I've seen similar problems pop up once in the
past with a seldom vacuumed database that reached a point where it stopped
freeing dead tuples. Very bothersome.

Looks like something that happens only if counters or OIDs are big.
Possible integer overflow in the vacuum code?

--
PGP/GPG Key-ID:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xB5A1AFE1

scott.marlowe

scott.marlowe@ihs.com

about 24 years ago

In reply to: Tom Lane (#2)

Re: Subject: bool / vacuum full bug followup

On Thu, 2 May 2002, Tom Lane wrote:

Scott Marlowe <scott.marlowe@ihs.com> writes:

And after that, the table started growing after all vacuum fulls. I can't
reproduce it now.

Perhaps you had an open transaction laying about? That'd prevent vacuum
from recovering space ...

I thought of that, but no, I had just rebooted my box, and was using top
to view all the postgres user activities, and my psql session was the only
thing attached to it. If I can get it to do it, I'll pg_ctl stop/pg_ctl
start the database and see if it keeps doing it.

scott.marlowe

scott.marlowe@ihs.com

about 24 years ago

In reply to: Jeffrey W. Baker (#3)

Re: Subject: bool / vacuum full bug followup

On Thu, 2 May 2002, Jeffrey Baker wrote:

On Thu, May 02, 2002 at 04:30:15PM -0600, Scott Marlowe wrote:

OK, this is just beyond weird. I swear to GOD that my table size was
growing. I'm on a workstation that doesn't accept connections from
anybody but me.

SNIPPAGE

And after that, the table started growing after all vacuum fulls. I can't
reproduce it now. I blew away the table, recreated it, and now it doesn't
do that. I blew away the database, and installed 7.2.1 and it won't do
it. I reinstalled 7.2 and blew away the database and it still won't do
it.

Well, if the table has an associated TOAST table, it is possible
that it will suddenly start growing out of control, and nothing you
can do with VACUUM will help. The TOAST system has a free space map
(FSM) with some default number of pages it can track. If you exceed
this number, it will become completely unable to reclaim space. If
this happens you will have to dump and reload the database or take
other drastic action.

Check the VACUUM VERBOSE output to see the pages in your TOAST
tables.

Thanks, I'll do that. The tuples themselves were quite small, a single
ascii character in a text field and a single boole. So I wouldn't expect
them to be toasted.

scott.marlowe

scott.marlowe@ihs.com

about 24 years ago

In reply to: Holger Marzen (#6)

Re: Subject: bool / vacuum full bug followup

On Fri, 3 May 2002, Holger Marzen wrote:

On Thu, 2 May 2002, Scott Marlowe wrote:

And after that, the table started growing after all vacuum fulls. I can't
reproduce it now. I blew away the table, recreated it, and now it doesn't
do that. I blew away the database, and installed 7.2.1 and it won't do
it. I reinstalled 7.2 and blew away the database and it still won't do
it.

So, I don't know why it WAS doing it, but it isn't now. And of course,
it's my personal workstation, so no log files. :-( I'll see if I can
reproduce this though, as I've seen similar problems pop up once in the
past with a seldom vacuumed database that reached a point where it stopped
freeing dead tuples. Very bothersome.

Looks like something that happens only if counters or OIDs are big.
Possible integer overflow in the vacuum code?

Prior to the test I had been running 100,000 transaction pgbench tests
repeatedly. I'll try running a few million transactions first and see if
the problem surfaces.

Thanks for the suggestions.

#10

scott.marlowe

scott.marlowe@ihs.com

about 24 years ago

In reply to: scott.marlowe (#8)

bool / vacuum full bug followup part 2

OK, I remembered what was different. I had an index the first time. So,
it was the index I was watchin grow and never shrink.

Now this is reproduceable for me.

I dropped and recreated my test index

drop index test_yn_dx ;
create index test_yn_dx on test (yn);
vacuum full;

[postgres@css120] oid2name -d scott.marlowe
All tables from database "scott.marlowe":
---------------------------------
16557 = test
126563 = test_yn_dx

cd into the base/oid of my database, then:

du -s 16557 126563
11128 16557
1772 126563

update test set yn=true where yn=true;
UPDATE 50080

[postgres@css120 16556]$ du -s 16557 126563
16704 16557
2948 126563

vacuum;

[postgres@css120 16556]$ du -s 16557 126563
16704 16557
2948 126563

vacuum full;

[postgres@css120 16556]$ du -s 16557 126563
11128 16557
4100 126563

I tried:

vacuum test_yn_dx;
NOTICE: VACUUM: can not process indexes, views or special system tables
NOTICE: VACUUM: table not found
VACUUM

so, then I tried:

reindex index test_yn_dx;
REINDEX

oid2name -d scott.marlowe
All tables from database "scott.marlowe":
---------------------------------
16557 = test
126564 = test_yn_dx

[postgres@css120 16556]$ du -s 16557 126564
11128 16557
1772 126564

And reclaimed the space. Is that the official way, short of dropping and
recreating an index to reclaim its space? Is there a plan to make vacuum
reclaim unused space in indexes? Just wondering.

#11

Tom Lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: scott.marlowe (#10)

Re: Subject: bool / vacuum full bug followup part 2

Scott Marlowe <scott.marlowe@ihs.com> writes:

And reclaimed the space. Is that the official way, short of dropping and
recreating an index to reclaim its space? Is there a plan to make vacuum
reclaim unused space in indexes?

Yes, and yes, but don't hold your breath on the latter part --- that
TODO item has been around for awhile. And it's gotten harder now that
we have lazy VACUUM; that means we need to be able to condense indexes
concurrently with other index operations.

AFAIK there's not a big problem with index growth if the range of index
keys remains reasonably static. The problem comes in if you have a
range of values that keeps growing (eg, you are indexing a SERIAL or
timestamp column). The right end of the btree keeps growing, but
there's no mechanism to collapse out no-longer-used space at the left
end.

regards, tom lane

#12

scott.marlowe

scott.marlowe@ihs.com

about 24 years ago

In reply to: Tom Lane (#11)

Re: Subject: bool / vacuum full bug followup part 2

Well, my keys aren't changing and the index is growing like they are.
I.e. the update statement is:

update table set field1=y where field1=y;

I'm not adding rows, I'm not deleting rows, and I'm not even changing the
value really. I'm sure this is the problem the earlier poster was
complaining about making him dump and restore his whole database every few
weeks.

I'd say the indexes keep growing whether the data they are indexing
changes or not from my observations. Do your own tests if you want, but
so far this looks like a serious issue for joe average DBA. He expects
vacuum [full] to reclaim all his unused space, but indexes, which are
constantly growing, are not reclaimed, and his store fills up. Leading
him to dump / restore instead of just reindexing.

We should at least add something to the administrator docs to say you need
to reindex heavily updated indexes to save space.

Here's my quick and dirty php script to reindex all indexes in all
databases on a given server. It's ugly, but it works.

#!/usr/local/bin/php -q
<?php
$conn0 = pg_connect("dbname=template1 user=postgres");
$res0 = pg_exec($conn0,"select datname from pg_database where datname not
like 'template%'");
$drows = pg_numrows($res0);
for ($j=0;$j<$drows;$j++){
$datname = pg_result($res0,$j,'datname');
print 'database: '.$datname."n";
$conn1 = pg_connect("dbname=$datname user=postgres");
$res1 = pg_exec($conn1,"select indexname from pg_indexes where
indexname not like 'pg_%'");
$rows = pg_numrows($res1);
for ($i=0;$i<$rows;$i++){
$indexname = pg_result($res1,$i,indexname);
$query = "reindex index $indexname";
print 'table: '.$indexname."\n";
pg_exec($conn1,$query);
}
}
?>

On Fri, 3 May 2002, Tom Lane wrote:

Show quoted text

Scott Marlowe <scott.marlowe@ihs.com> writes:

And reclaimed the space. Is that the official way, short of dropping and
recreating an index to reclaim its space? Is there a plan to make vacuum
reclaim unused space in indexes?

Yes, and yes, but don't hold your breath on the latter part --- that
TODO item has been around for awhile. And it's gotten harder now that
we have lazy VACUUM; that means we need to be able to condense indexes
concurrently with other index operations.

AFAIK there's not a big problem with index growth if the range of index
keys remains reasonably static. The problem comes in if you have a
range of values that keeps growing (eg, you are indexing a SERIAL or
timestamp column). The right end of the btree keeps growing, but
there's no mechanism to collapse out no-longer-used space at the left
end.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#13

Tom Lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: scott.marlowe (#12)

Re: Subject: bool / vacuum full bug followup part 2

Scott Marlowe <scott.marlowe@ihs.com> writes:

Well, my keys aren't changing and the index is growing like they are.

Could we see the exact details of your test case?

regards, tom lane

#14

scott.marlowe

scott.marlowe@ihs.com

about 24 years ago

In reply to: Tom Lane (#13)

Re: Subject: bool / vacuum full bug followup part 2

On Fri, 3 May 2002, Tom Lane wrote:

Scott Marlowe <scott.marlowe@ihs.com> writes:

Well, my keys aren't changing and the index is growing like they are.

Could we see the exact details of your test case?

Sure. I think I posted most of it here already...

Here's my table:

Here's the indexes:

scott.marlowe=# \di test_id_dx
List of relations
Name | Type | Owner
------------+-------+---------------
test_id_dx | index | scott.marlowe

scott.marlowe=# \di test_yn_dx
List of relations
Name | Type | Owner
------------+-------+---------------
test_yn_dx | index | scott.marlowe
(1 row)

Here's the php script that creates the test data, it makes $count number
of rows and sets the bool to true or false randomly:

set_time_limit(3600);
$conn = pg_connect("dbname=scott.marlowe host=css120.ihs.com");
$count = 100000;
pg_exec($conn,"begin");
$flag[0]="false";
$flag[1]="true";
for ($i=0;$i<$count;$i++){
if ($i%1000==0) {
pg_exec($conn,"end");
pg_exec($conn,"begin");
}
$letter = chr(rand(65,91));
$tf = rand(0,1);
$query = "insert into test (buf,yn) values ('";
$query.= $letter."',".$flag[$tf].")";
pg_exec($conn,$query);
}
pg_exec($conn,"end");

Here's the files in my database directory, and their size by du in
kbytes after vacuum full;
vacuum;
reindex index test_id_dx;
reindex index test_yn_dx;

[postgres@css120 16556]$ oid2name -d scott.marlowe
All tables from database "scott.marlowe":
---------------------------------
126572 = accounts
126574 = accounts_pkey
126566 = branches
126568 = branches_pkey
126575 = history
126569 = tellers
126571 = tellers_pkey
16557 = test
1126687 = test_id_dx
1126688 = test_yn_dx

[postgres@css120 16556]$ du -s 16557 1126687 1126688
11448 16557 (test)
1772 1126687 (test_id_dx)
1772 1126688 (test_yn_dx)

WHAT I DID:

scott.marlowe=# update test set yn=true where yn=true;
UPDATE 50080

[postgres@css120 16556]$ du -s 16557 1126687 1126688
17176 16557
3516 1126687
2924 1126688

scott.marlowe=# vacuum;
VACUUM

[postgres@css120 16556]$ du -s 16557 1126687 1126688
17176 16557
3516 1126687
2924 1126688

scott.marlowe=# vacuum full;
VACUUM

[postgres@css120 16556]$ du -s 16557 1126687 1126688
11448 16557
3516 1126687
4052 1126688 <-- Notice that the index here just GREW

scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;

[postgres@css120 16556]$ du -s 16557 1126687 1126688
68744 16557
13980 1126687
15660 1126688

scott.marlowe=# vacuum;
VACUUM

[postgres@css120 16556]$ du -s 16557 1126687 1126688
68736 16557
13964 1126687
15652 1126688

scott.marlowe=# vacuum full;
VACUUM

[postgres@css120 16556]$ du -s 16557 1126687 1126688
11448 16557
13964 1126687
16808 1126688

***************
So, now thinking the problem might be just vacuum full, I try plain old
vacuums
***************

scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;

scott.marlowe=# vacuum;
VACUUM

[postgres@css120 16556]$ du -s 16557 1126687 1126688
22908 16557
13964 1126687
20088 1126688

scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;

scott.marlowe=# vacuum;
VACUUM

[postgres@css120 16556]$ du -s 16557 1126687 1126688
22908 16557
13964 1126687
22380 1126688

*****************************
Nope, the index on the bools just keeps growing and growing.
Given a few million updates and it will be bigger than the data it is
supposed to index.

scott.marlowe=# reindex index test_yn_dx;
REINDEX
scott.marlowe=# reindex index test_id_dx;
REINDEX

oid2name -d scott.marlowe |grep dx
1126690 = test_id_dx
1126689 = test_yn_dx
[postgres@css120 16556]$ du -s 16557 1126690 1126689
22908 16557
1772 1126690
1772 1126689

and now they're small again.

It would at least be nice if reindex was operational in a transaction so
it would be safe to use on a live database, since it appears to not be
intended for this purpose really, but for fixing broken indexes. Til then
I'll write a script that asks pg_indexes that drops the index and
recreates it in a transaction to keep my data store svelt and clean.

Thanks for the attention to this. Scott.

#15

Tom Lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: scott.marlowe (#14)

Re: Subject: bool / vacuum full bug followup part 2

Scott Marlowe <scott.marlowe@ihs.com> writes:

On Fri, 3 May 2002, Tom Lane wrote:

Scott Marlowe <scott.marlowe@ihs.com> writes:

Well, my keys aren't changing and the index is growing like they are.

Could we see the exact details of your test case?

Sure. I think I posted most of it here already...

Okay, what I see is that the index on the integer column behaves like I
would expect: you can update, vacuum, update, vacuum, and it doesn't get
bigger. But the index on the boolean column does grow. I believe the
problem is that there are so many equal keys. The reinserted index
entries are always inserted at the end of the range of matching keys,
and so there's no opportunity to re-use space within other pages of the
index. There are only two leaf pages getting the insertions, and so
nothing to do but split them over and over.

What this really points up, of course, is that making a btree index on
a boolean column is a pretty foolish thing to do. I'm not particularly
unhappy about the performance being bad with respect to space usage,
because the fact of the matter is that performance is going to be bad
by any measure.

regards, tom lane

#16

Jeffrey W. Baker

jwbaker@acm.org

about 24 years ago

In reply to: Tom Lane (#11)

Re: Subject: bool / vacuum full bug followup part 2

On Fri, May 03, 2002 at 03:47:54PM -0400, Tom Lane wrote:

Scott Marlowe <scott.marlowe@ihs.com> writes:

And reclaimed the space. Is that the official way, short of dropping and
recreating an index to reclaim its space? Is there a plan to make vacuum
reclaim unused space in indexes?

Yes, and yes, but don't hold your breath on the latter part --- that
TODO item has been around for awhile. And it's gotten harder now that
we have lazy VACUUM; that means we need to be able to condense indexes
concurrently with other index operations.

AFAIK there's not a big problem with index growth if the range of index
keys remains reasonably static. The problem comes in if you have a
range of values that keeps growing (eg, you are indexing a SERIAL or
timestamp column). The right end of the btree keeps growing, but
there's no mechanism to collapse out no-longer-used space at the left
end.

Wouldn't that explain the complaints I have about my toast tables
always growing? Because each toast table has an index, and the
above paragraph makes it sound like indexes on serial values grow
all the time, that would imply that table that where tuples live for
windows of time will always be growing.

Or did I read that incorrectly?

-jwb

#17

Jeffrey W. Baker

jwbaker@acm.org

about 24 years ago

In reply to: Jeffrey W. Baker (#16)

Re: Subject: bool / vacuum full bug followup part 2

On Sat, May 04, 2002 at 10:48:47AM -0700, Jeffrey Baker wrote:

On Fri, May 03, 2002 at 03:47:54PM -0400, Tom Lane wrote:

Scott Marlowe <scott.marlowe@ihs.com> writes:

And reclaimed the space. Is that the official way, short of dropping and
recreating an index to reclaim its space? Is there a plan to make vacuum
reclaim unused space in indexes?

Yes, and yes, but don't hold your breath on the latter part --- that
TODO item has been around for awhile. And it's gotten harder now that
we have lazy VACUUM; that means we need to be able to condense indexes
concurrently with other index operations.

AFAIK there's not a big problem with index growth if the range of index
keys remains reasonably static. The problem comes in if you have a
range of values that keeps growing (eg, you are indexing a SERIAL or
timestamp column). The right end of the btree keeps growing, but
there's no mechanism to collapse out no-longer-used space at the left
end.

Wouldn't that explain the complaints I have about my toast tables
always growing? Because each toast table has an index, and the
above paragraph makes it sound like indexes on serial values grow
all the time, that would imply that table that where tuples live for
windows of time will always be growing.

Or did I read that incorrectly?

Indeed, I did not. Part of the space leak I am seeing is from this:

Start.
Insert 20,000 tuples of 13KB each.
Delete 20,000 tuples.
Vacuum full.
Goto Start.

Toast index grows by ~535 pages or 4.2MB[1] per cycle, even though
vacuum is able to truncate the main relation and the toast relation
to zero pages.

-jwb

1: This implies a page size of 16KB in the index. I expected it to
be smaller.

#18

Tom Lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Jeffrey W. Baker (#16)

Re: Subject: bool / vacuum full bug followup part 2

Jeffrey Baker <jwbaker@acm.org> writes:

On Fri, May 03, 2002 at 03:47:54PM -0400, Tom Lane wrote:

AFAIK there's not a big problem with index growth if the range of index
keys remains reasonably static. The problem comes in if you have a
range of values that keeps growing (eg, you are indexing a SERIAL or
timestamp column). The right end of the btree keeps growing, but
there's no mechanism to collapse out no-longer-used space at the left
end.

Wouldn't that explain the complaints I have about my toast tables
always growing?

It'd explain the indexes growing --- the index key is an OID, which will
keep increasing as you store new toasted values. I thought you'd been
complaining about the tables themselves, though.

regards, tom lane

#19

Jeffrey W. Baker

jwbaker@acm.org

about 24 years ago

In reply to: Tom Lane (#18)

Re: Subject: bool / vacuum full bug followup part 2

On Sat, May 04, 2002 at 06:06:38PM -0400, Tom Lane wrote:

Jeffrey Baker <jwbaker@acm.org> writes:

On Fri, May 03, 2002 at 03:47:54PM -0400, Tom Lane wrote:

AFAIK there's not a big problem with index growth if the range of index
keys remains reasonably static. The problem comes in if you have a
range of values that keeps growing (eg, you are indexing a SERIAL or
timestamp column). The right end of the btree keeps growing, but
there's no mechanism to collapse out no-longer-used space at the left
end.

Wouldn't that explain the complaints I have about my toast tables
always growing?

It'd explain the indexes growing --- the index key is an OID, which will
keep increasing as you store new toasted values. I thought you'd been
complaining about the tables themselves, though.

You're right, I am. But in my quest to operate Pg properly I am
trying to nail down everything that causes its disk usage to
increase. I just had a look at my prod. database and the toast
tables are much larger than their indices, so it is probably
irrelevant.

-jwb

#20

scott.marlowe

scott.marlowe@ihs.com

about 24 years ago

In reply to: Tom Lane (#15)

Re: Subject: bool / vacuum full bug followup part 2

On Fri, 3 May 2002, Tom Lane wrote:

Scott Marlowe <scott.marlowe@ihs.com> writes:

On Fri, 3 May 2002, Tom Lane wrote:

Scott Marlowe <scott.marlowe@ihs.com> writes:

Well, my keys aren't changing and the index is growing like they are.

Could we see the exact details of your test case?

Sure. I think I posted most of it here already...

Okay, what I see is that the index on the integer column behaves like I
would expect: you can update, vacuum, update, vacuum, and it doesn't get
bigger.

Yes, it does get bigger, but only with use, not vacuum full.

It doesn't look like the index on the text column is getting reused
either. Is that because I'm update a lot of rows with a single update
statement? would it be reused if I was changing one row at a time and
commiting it?

I'll test that theory, but I'm gonna bet right now that it won't.

But the index on the boolean column does grow.

Not only that, but it grows from a vacuum full. I would expect it to at
least stay the same.

I believe the
problem is that there are so many equal keys. The reinserted index
entries are always inserted at the end of the range of matching keys,
and so there's no opportunity to re-use space within other pages of the
index. There are only two leaf pages getting the insertions, and so
nothing to do but split them over and over.

What this really points up, of course, is that making a btree index on
a boolean column is a pretty foolish thing to do.

Since postgresql 7.2.1 refuses to make an index of any kind other than
btree, what is the answer? no indexes? While single column indexes may
seem wasteful, remember that the boole may be stored in a table that has
very large tuples, and a sequential scan of such a table could be quite
slow, or there may be a situation where a tiny percentage of the booles
are one setting while most are the other, like an approval system for
online content. Either way, creating an index shouldn't result in a
database directory that grows to 100X it's original size, gets slower, and
doesn't give back space to vacuum;

scott.marlowe=# create index test_yn_dx on test using hash (yn);
ERROR: data type boolean has no default operator class for access method
"hash"
You must specify an operator class for the index or define a
default operator class for the data type
scott.marlowe=# create index test_yn_dx on test using rtree (yn);
ERROR: data type boolean has no default operator class for access method
"rtree"
You must specify an operator class for the index or define a
default operator class for the data type
scott.marlowe=# create index test_yn_dx on test using gist (yn);
ERROR: data type boolean has no default operator class for access method
"gist"
You must specify an operator class for the index or define a
default operator class for the data type
scott.marlowe=# create index test_yn_dx on test using btree (yn);
CREATE
scott.marlowe=# drop index tesT_yn_dx;
DROP

I'm not particularly
unhappy about the performance being bad with respect to space usage,
because the fact of the matter is that performance is going to be bad
by any measure.

If the database gets slow, that's forgiveable. If it runs out of space in
the middle of the day because indexes just grow and grow, that's not. It
makes postgresql look like a toy database.

A growing index that vacuum doesn't shrink is a serious issue for people
who expect to reclaim lost space with vacuum. We at least need to let
people know of this behavior in the admin docs, as right now they (the
docs) seem to imply that vacuum frees up all unused space. For indexes,
this isn't true, and people who are getting started don't need this kind
of gotcha waiting to kill a production database 2 or 6 months into use.

Is it maybe at least possible to make reindex either transaction safe or
have an option that pretty much drops and recreates the index in a
transactionally safe mode or something?

I never knew about this problem until now, and I've found that I had
indexes that were 180Megs that reindexed to 48k in size. My database had
been getting slower and slower, and now it flies again. This was true of
ALL types of indexes, on ints, text, everything. The indexes were HUGE.

#21

Tom Lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: scott.marlowe (#20)

#22