pg_class reltuples/relpages not updated by autovacuum/vacuum

Started by Florian Helmberger · almost 15 years ago · 48 messages · pgsql-hackers
#1Florian Helmberger
fh@25th-floor.com

Hi.

I'm running a production database with PostgreSQL 9.0.3 (64-bit) on
Debian 5.0.4 and have an issue with a TOAST table and far too frequent
autovacuum runs.

I think I've pinned the problem down to the values pg_class holds for
the affected TOAST table:

relpages | 433596
reltuples | 1868538

These values are significantly too low. Interestingly, the autovacuum
log output reports the correct values:

pages: 0 removed, 34788136 remain
tuples: 932487 removed, 69599038 remain

but these aren't stored in pg_class after each run.

Currently, there are no long running transactions and/or dumps running.

I've confirmed that PostgreSQL is using the values stored in pg_class
for its calculations: it starts autovacuum for the table at around
375k dead rows (threshold 50, scale_factor 0.2, both default).

Additionally, I've done a manual VACUUM ANALYZE of both the parent table
and the TOAST table, which didn't help either.

Other databases with the same hardware, PostgreSQL and OS versions don't
have this issue.

Currently I've worked around the issue by disabling autovacuum for the
TOAST table and doing manual VACUUM ANALYZE once a week.

Any clue how to get PostgreSQL to store the correct values?

Side note: while trying to debug this I've noticed that the TOAST
chunks on 32-bit systems have the documented size of 2000 bytes, while
on 64-bit systems they have 1996 bytes. Is this normal/on purpose?

Regards,
Florian Helmberger

--

Florian Helmberger --------------------

25th-floor - Operating Custom Solutions
de Pretis & Helmberger KG

Gluckgasse 2/6, 1010 Wien, Austria

Mail: fh@25th-floor.com
Web : http://www.25th-floor.com
Tel.: +43 1 / 512 82 89 - 60
Fax : +43 1 / 512 82 89 - 76
Mob.: +43 699 / 109 24 24 5
---------------------------------------

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Helmberger (#1)
Re: pg_class reltuples/relpages not updated by autovacuum/vacuum

Florian Helmberger <fh@25th-floor.com> writes:

I'm running a production database with PostgreSQL 9.0.3 (64-bit) on
Debian 5.0.4 and have an issue with a TOAST table and far too frequent
autovacuum runs.

I think I've pinned the problem down to the values pg_class holds for
the affected TOAST table:

relpages | 433596
reltuples | 1868538

These values are significantly too low. Interestingly, the autovacuum
log output reports the correct values:

pages: 0 removed, 34788136 remain
tuples: 932487 removed, 69599038 remain

but these aren't stored in pg_class after each run.

That's exceedingly weird. Do the pg_stat_all_tables columns update
after autovacuums on that table?

regards, tom lane

#3Florian Helmberger
fh@25th-floor.com
In reply to: Tom Lane (#2)
Re: pg_class reltuples/relpages not updated by autovacuum/vacuum

On 25.05.11 04:47, Tom Lane wrote:

Florian Helmberger<fh@25th-floor.com> writes:

I'm running a production database with PostgreSQL 9.0.3 (64-bit) on
Debian 5.0.4 and have an issue with a TOAST table and far too frequent
autovacuum runs.

I think I've pinned the problem down to the values pg_class holds for
the affected TOAST table:

relpages | 433596
reltuples | 1868538

These values are significantly too low. Interestingly, the autovacuum
log output reports the correct values:

pages: 0 removed, 34788136 remain
tuples: 932487 removed, 69599038 remain

but these aren't stored in pg_class after each run.

That's exceedingly weird. Do the pg_stat_all_tables columns update
after autovacuums on that table?

Hi Tom,

Yes they do:

-[ RECORD 1 ]----+------------------------------
relid | 16391
schemaname | pg_toast
relname | pg_toast_16386
seq_scan | 0
seq_tup_read | 0
idx_scan | 298820512
idx_tup_fetch | 1812697121
n_tup_ins | 60907628
n_tup_upd | 0
n_tup_del | 56710637
n_tup_hot_upd | 0
n_live_tup | 4196999
n_dead_tup | 20746580
last_vacuum | 2011-05-21 06:33:49.869459+02
last_autovacuum | 2011-05-15 18:40:49.746234+02
last_analyze | NULL
last_autoanalyze | NULL

That was the last autovacuum run before I disabled it (via a storage
parameter on the main table) and switched to manual vacuums once per week.

I've also rechecked the "sister" database (same hardware, OS/PostgreSQL
version and database schema), which is working as intended.

Regards,
Florian


#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Helmberger (#3)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Florian Helmberger <fh@25th-floor.com> writes:

On 25.05.11 04:47, Tom Lane wrote:

Florian Helmberger<fh@25th-floor.com> writes:

I'm running a production database with PostgreSQL 9.0.3 (64-bit) on
Debian 5.0.4 and have an issue with a TOAST table and far too frequent
autovacuum runs.

I think I've pinned the problem down to the values pg_class holds for
the affected TOAST table:

relpages | 433596
reltuples | 1868538

These values are significantly too low. Interestingly, the autovacuum
log output reports the correct values:

pages: 0 removed, 34788136 remain
tuples: 932487 removed, 69599038 remain

but these aren't stored in pg_class after each run.

That's exceedingly weird. Do the pg_stat_all_tables columns update
after autovacuums on that table?

Yes they do:

I think I see what must be going on here: that toast table must contain
a long run of all-visible-according-to-the-VM pages (which is hardly a
surprising situation). This results in VACUUM choosing not to update
the pg_class entry:

    /*
     * Update statistics in pg_class.  But only if we didn't skip any pages;
     * the tuple count only includes tuples from the pages we've visited, and
     * we haven't frozen tuples in unvisited pages either.  The page count is
     * accurate in any case, but because we use the reltuples / relpages ratio
     * in the planner, it's better to not update relpages either if we can't
     * update reltuples.
     */
    if (vacrelstats->scanned_all)
        vac_update_relstats(onerel,
                            vacrelstats->rel_pages, vacrelstats->rel_tuples,
                            vacrelstats->hasindex,
                            FreezeLimit);

For an ordinary table this wouldn't be fatal because we'll still do an
ANALYZE from time to time, and that will update the entries with new
(approximate) values. But we never run ANALYZE on toast tables.

And this would *still* be okay, because as noted in the comment, the
planner only depends on the ratio being roughly correct, not on either
individual value being current. But autovacuum didn't get the memo;
it thinks it can use reltuples to make decisions.

I can see two basic approaches we might take here:

1. Modify autovacuum to use something from the stats collector, rather
than reltuples, to make its decisions. I'm not too clear on why
reltuples is being used there anyway; is there some good algorithmic or
statistical reason why AV should be looking at a number from the last
vacuum?

2. Revise the vacuum code so that it doesn't skip updating the pg_class
entries. We could have it count the number of pages it skipped, rather
than just keeping a bool, and then scale up the rel_tuples count to be
approximately right by assuming the skipped pages have tuple density
similar to the scanned ones.

Thoughts?

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Helmberger (#1)
Re: pg_class reltuples/relpages not updated by autovacuum/vacuum

Florian Helmberger <fh@25th-floor.com> writes:

I think I've pinned the problem down to the values pg_class holds for
the affected TOAST table:
relpages | 433596
reltuples | 1868538
These values are significantly too low. Interestingly, the autovacuum
log output reports the correct values:
pages: 0 removed, 34788136 remain
tuples: 932487 removed, 69599038 remain
but these aren't stored in pg_class after each run.

I've moved discussion of this to pgsql-hackers, since this appears to be
an actual bug.

Side note: while trying to debug this I've noticed that the TOAST
chunks on 32-bit systems have the documented size of 2000 bytes, while
on 64-bit systems they have 1996 bytes. Is this normal/on purpose?

I don't have the exact numbers in my head, but the TOAST chunk size does
depend on a MAXALIGN calculation, so it being different between 32- and
64-bit isn't surprising.

regards, tom lane

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#4)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Excerpts from Tom Lane's message of Wed May 25 11:47:52 -0400 2011:

I think I see what must be going on here: that toast table must contain
a long run of all-visible-according-to-the-VM pages (which is hardly a
surprising situation). This results in VACUUM choosing not to update
the pg_class entry:

    /*
     * Update statistics in pg_class.  But only if we didn't skip any pages;
     * the tuple count only includes tuples from the pages we've visited, and
     * we haven't frozen tuples in unvisited pages either.  The page count is
     * accurate in any case, but because we use the reltuples / relpages ratio
     * in the planner, it's better to not update relpages either if we can't
     * update reltuples.
     */
    if (vacrelstats->scanned_all)
        vac_update_relstats(onerel,
                            vacrelstats->rel_pages, vacrelstats->rel_tuples,
                            vacrelstats->hasindex,
                            FreezeLimit);

For an ordinary table this wouldn't be fatal because we'll still do an
ANALYZE from time to time, and that will update the entries with new
(approximate) values. But we never run ANALYZE on toast tables.

Ouch.

I can see two basic approaches we might take here:

1. Modify autovacuum to use something from the stats collector, rather
than reltuples, to make its decisions. I'm not too clear on why
reltuples is being used there anyway; is there some good algorithmic or
statistical reason why AV should be looking at a number from the last
vacuum?

It uses reltuples simply because it was what the original contrib code
was using. Since pgstat was considerably weaker at the time, reltuples
might have been the only thing available. It's certainly the case that
pgstat has improved a lot since autovacuum got in, and some things have
been revised but not this one.

2. Revise the vacuum code so that it doesn't skip updating the pg_class
entries. We could have it count the number of pages it skipped, rather
than just keeping a bool, and then scale up the rel_tuples count to be
approximately right by assuming the skipped pages have tuple density
similar to the scanned ones.

Hmm, interesting idea. This would be done only for toast tables, or all
tables?

At this point I only wonder why we store tuples & pages rather than just
live tuple density.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

On Wed, May 25, 2011 at 11:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

2. Revise the vacuum code so that it doesn't skip updating the pg_class
entries.  We could have it count the number of pages it skipped, rather
than just keeping a bool, and then scale up the rel_tuples count to be
approximately right by assuming the skipped pages have tuple density
similar to the scanned ones.

This approach doesn't seem like a good idea to me. The skipped
portions of the table are the ones that haven't been modified in a
while, so this embeds the assumption that the tuples in the hot
and cold portions of the table are the same size. That might be true,
but it isn't hard to think of cases where it might not be. There
could also very easily be sampling error, because it's easy to imagine
a situation where 99% of the table is getting skipped. Any error that
creeps into the estimate is going to get scaled up along with the
estimate itself.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Tom Lane's message of Wed May 25 11:47:52 -0400 2011:

I can see two basic approaches we might take here:

1. Modify autovacuum to use something from the stats collector, rather
than reltuples, to make its decisions. I'm not too clear on why
reltuples is being used there anyway; is there some good algorithmic or
statistical reason why AV should be looking at a number from the last
vacuum?

It uses reltuples simply because it was what the original contrib code
was using. Since pgstat was considerably weaker at the time, reltuples
might have been the only thing available. It's certainly the case that
pgstat has improved a lot since autovacuum got in, and some things have
been revised but not this one.

On reflection I'm hesitant to do this, especially for a backpatched bug
fix, because it would be changing the feedback loop behavior for
autovacuum scheduling. That could have surprising consequences.

2. Revise the vacuum code so that it doesn't skip updating the pg_class
entries. We could have it count the number of pages it skipped, rather
than just keeping a bool, and then scale up the rel_tuples count to be
approximately right by assuming the skipped pages have tuple density
similar to the scanned ones.

Hmm, interesting idea. This would be done only for toast tables, or all
tables?

I'm thinking just do it for all. The fact that these numbers don't
necessarily update after a vacuum is certainly surprising in and of
itself, and it did not work that way before the VM patch went in.
I'm concerned about other stuff besides AV not dealing well with
obsolete values.

At this point I only wonder why we store tuples & pages rather than just
live tuple density.

It's just for backwards compatibility. I've thought about doing that in
the past, but I don't know what client-side code might be looking at
relpages/reltuples. It's not like collapsing them into one field would
save much, anyway.

regards, tom lane

#9Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#8)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't know what client-side code might be looking at
relpages/reltuples.

I know that I find reltuples useful for getting an "accurate enough"
sense of rows in a table (or set of tables) without resorting to
count(*). I'd be OK with any two of pages, tuples, and density; but
dropping to one would make databases harder to manage, IMV.

Personally, I don't have much code that uses those columns;
eliminating an existing column wouldn't involve much pain for me as
long as it could still be derived.

-Kevin

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, May 25, 2011 at 11:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

2. Revise the vacuum code so that it doesn't skip updating the pg_class
entries.  We could have it count the number of pages it skipped, rather
than just keeping a bool, and then scale up the rel_tuples count to be
approximately right by assuming the skipped pages have tuple density
similar to the scanned ones.

This approach doesn't seem like a good idea to me. The skipped
portions of the table are the ones that haven't been modified in a
while, so this is or embeds an assumption that the tuples in the hot
and cold portions of the table are the same size. That might be true,
but it isn't hard to think of cases where it might not be. There
could also very easily be sampling error, because it's easy to imagine
a situation where 99% of the table is getting skipped.

Yeah, I had been thinking about the latter point. We could be
conservative and just keep the reported tuple density the same (ie,
update relpages to the new correct value, while setting reltuples so
that the density ratio doesn't change). But that has its own problems
when the table contents *do* change. What I'm currently imagining is
to do a smoothed moving average, where we factor in the new density
estimate with a weight dependent on the percentage of the table we did
scan. That is, the calculation goes something like

old_density = old_reltuples / old_relpages
new_density = counted_tuples / scanned_pages
reliability = scanned_pages / new_relpages
updated_density = old_density + (new_density - old_density) * reliability
new_reltuples = updated_density * new_relpages

We could slow the moving-average convergence even further when
reliability is small; for instance if you were really paranoid you might
want to use the square of reliability in line 4. That might be
overdoing it, though.

regards, tom lane

#11Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#10)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

On Wed, May 25, 2011 at 12:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, I had been thinking about the latter point.  We could be
conservative and just keep the reported tuple density the same (ie,
update relpages to the new correct value, while setting reltuples so
that the density ratio doesn't change).  But that has its own problems
when the table contents *do* change.  What I'm currently imagining is
to do a smoothed moving average, where we factor in the new density
estimate with a weight dependent on the percentage of the table we did
scan.  That is, the calculation goes something like

old_density = old_reltuples / old_relpages
new_density = counted_tuples / scanned_pages
reliability = scanned_pages / new_relpages
updated_density = old_density + (new_density - old_density) * reliability
new_reltuples = updated_density * new_relpages

We could slow the moving-average convergence even further when
reliability is small; for instance if you were really paranoid you might
want to use the square of reliability in line 4.  That might be
overdoing it, though.

I don't know. That's maybe better, but I'd be willing to wager that
in some cases it will just slow down the rate at which we converge to
a completely incorrect value, while in other cases it'll fail to
update the data when it really has changed.

I am wondering, though, why we're not just inserting a special-purpose
hack for TOAST tables. Your email seems to indicate that regular
tables are already handled well enough, and certainly if we only whack
around the TOAST behavior it's much less likely to fry anything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kevin Grittner (#9)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Excerpts from Kevin Grittner's message of Wed May 25 12:37:24 -0400 2011:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't know what client-side code might be looking at
relpages/reltuples.

I know that I find reltuples useful for getting an "accurate enough"
sense of rows in a table (or set of tables) without resorting to
count(*). I'd be OK with any two of pages, tuples, and density; but
dropping to one would make databases harder to manage, IMV.

Personally, I don't have much code that uses those columns;
eliminating an existing column wouldn't involve much pain for me as
long as it could still be derived.

Well, we only actually need to store one number, because you can figure
out a much more precise number-of-pages figure with pg_relation_size()
divided by configured page size.

(We feel free to hack around catalogs in other places, and we regularly
break pgAdmin and the like when we drop columns -- people just live with
it and update their tools. I don't think it's such a big deal in this
particular case.)


#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#11)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, May 25, 2011 at 12:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, I had been thinking about the latter point.  We could be
conservative and just keep the reported tuple density the same (ie,
update relpages to the new correct value, while setting reltuples so
that the density ratio doesn't change).  But that has its own problems
when the table contents *do* change.  What I'm currently imagining is
to do a smoothed moving average, where we factor in the new density
estimate with a weight dependent on the percentage of the table we did
scan.  That is, the calculation goes something like

old_density = old_reltuples / old_relpages
new_density = counted_tuples / scanned_pages
reliability = scanned_pages / new_relpages
updated_density = old_density + (new_density - old_density) * reliability
new_reltuples = updated_density * new_relpages

We could slow the moving-average convergence even further when
reliability is small; for instance if you were really paranoid you might
want to use the square of reliability in line 4.  That might be
overdoing it, though.

I don't know. That's maybe better, but I'd be willing to wager that
in some cases it will just slow down the rate at which we converge to
a completely incorrect value, while in other cases it'll fail to
update the data when it really has changed.

[ shrug... ] When you don't have complete information, it's *always*
the case that you will sometimes make a mistake. That's not
justification for paralysis, especially not when the existing code is
demonstrably broken.

What occurs to me after thinking a bit more is that the existing tuple
density is likely to be only an estimate, too (one coming from the last
ANALYZE, which could very well have scanned even less of the table than
VACUUM did). So what I now think is that both VACUUM and ANALYZE ought
to use a calculation of the above form to arrive at a new value for
pg_class.reltuples. In both cases it would be pretty easy to track the
number of pages we looked at while counting tuples, so the same raw
information is available.

I am wondering, though, why we're not just inserting a special-purpose
hack for TOAST tables.

Because the problem is not specific to TOAST tables. As things
currently stand, we will accept the word of an ANALYZE as gospel even if
it scanned 1% of the table, and completely ignore the results from a
VACUUM even if it scanned 99% of the table. This is not sane.

regards, tom lane

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#11)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Excerpts from Robert Haas's message of Wed May 25 12:54:28 -0400 2011:

I don't know. That's maybe better, but I'd be willing to wager that
in some cases it will just slow down the rate at which we converge to
a completely incorrect value, while in other cases it'll fail to
update the data when it really has changed.

For regular tables I don't think there's a real problem because it'll
get fixed next time a full scan happens anyway. For toast tables, I
think the set of operations is limited enough that this is easy to prove
correct (or fixed so that it is) -- no HOT updates (in fact no updates
at all), etc.

BTW one thing we haven't fixed at all is how do HOT updates affect
vacuuming behavior ...

I am wondering, though, why we're not just inserting a special-purpose
hack for TOAST tables. Your email seems to indicate that regular
tables are already handled well enough, and certainly if we only whack
around the TOAST behavior it's much less likely to fry anything.

Well, having two paths one of which is uncommonly used means that it
will get all the bugs. As with the xl_commit WAL record comment from
Simon, I'd rather stick with having a single one.


#15Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#13)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

On Wed, May 25, 2011 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

[ shrug... ]  When you don't have complete information, it's *always*
the case that you will sometimes make a mistake.  That's not
justification for paralysis, especially not when the existing code is
demonstrably broken.

What occurs to me after thinking a bit more is that the existing tuple
density is likely to be only an estimate, too (one coming from the last
ANALYZE, which could very well have scanned even less of the table than
VACUUM did).  So what I now think is that both VACUUM and ANALYZE ought
to use a calculation of the above form to arrive at a new value for
pg_class.reltuples.  In both cases it would be pretty easy to track the
number of pages we looked at while counting tuples, so the same raw
information is available.

I am wondering, though, why we're not just inserting a special-purpose
hack for TOAST tables.

Because the problem is not specific to TOAST tables.  As things
currently stand, we will accept the word of an ANALYZE as gospel even if
it scanned 1% of the table, and completely ignore the results from a
VACUUM even if it scanned 99% of the table.  This is not sane.

I agree that if VACUUM scanned 99% of the table, it's probably fine to
use its numbers. It's also fine to use the numbers from ANALYZE,
because those pages are chosen randomly. What bothers me is the idea
of using a small *non-random* sample, and I'm not sure that
incorporating possibly-bogus results slowly is any better than
incorporating them quickly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Alvaro Herrera (#12)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011/5/25 Alvaro Herrera <alvherre@commandprompt.com>:

Excerpts from Kevin Grittner's message of Wed May 25 12:37:24 -0400 2011:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't know what client-side code might be looking at
relpages/reltuples.

I know that I find reltuples useful for getting an "accurate enough"
sense of rows in a table (or set of tables) without resorting to
count(*).  I'd be OK with any two of pages, tuples, and density; but
dropping to one would make databases harder to manage, IMV.

Personally, I don't have much code that uses those columns;
eliminating an existing column wouldn't involve much pain for me as
long as it could still be derived.

Well, we only actually need to store one number, because you can figure
out a much more precise number-of-pages figure with pg_relation_size()
divided by configured page size.

(We feel free to hack around catalogs in other places, and we regularly
break pgAdmin and the like when we drop columns -- people just live with
it and update their tools.  I don't think it's such a big deal in this
particular case.)

I may be missing something, but we need the relation size in costsize.c
even if we have a reldensity (or we need reltuples). Otherwise, what
values should be used to estimate the relation size? (pg_relation_size()
goes down to the kernel/fs to ask for stat.st_size; is that really what
we want?)


--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#17Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#15)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

On Wed, May 25, 2011 at 1:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I agree that if VACUUM scanned 99% of the table, it's probably fine to
use its numbers.  It's also fine to use the numbers from ANALYZE,
because those pages are chosen randomly.  What bothers me is the idea
of using a small *non-random* sample, and I'm not sure that
incorporating possibly-bogus results slowly is any better than
incorporating them quickly.

In particular, unless I'm misremembering, VACUUM *always* scans the
first few pages of the table, until it sees enough consecutive
all-visible bits that it decides to start skipping. If I'm right
about that, then those pages could easily end up being overweighted
when VACUUM does the counting; especially if ANALYZE or an actual
full-table vacuum aren't allowed to snap the count back to reality.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Cédric Villemain (#16)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Excerpts from Cédric Villemain's message of Wed May 25 13:24:01 -0400 2011:

Well, we only actually need to store one number, because you can figure
out a much more precise number-of-pages figure with pg_relation_size()
divided by configured page size.

I may be missing something, but we need the relation size in costsize.c
even if we have a reldensity (or we need reltuples). Otherwise, what
values should be used to estimate the relation size? (pg_relation_size()
goes down to the kernel/fs to ask for stat.st_size; is that really what
we want?)

Actually yes, the planner does go to the kernel to determine the current
relation size, and then multiplies by the density computed from catalog
data to arrive at a reasonably accurate current tuple count.


#19Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Alvaro Herrera (#18)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011/5/25 Alvaro Herrera <alvherre@commandprompt.com>:

Excerpts from Cédric Villemain's message of Wed May 25 13:24:01 -0400 2011:

Well, we only actually need to store one number, because you can figure
out a much more precise number-of-pages figure with pg_relation_size()
divided by configured page size.

I may be missing something, but we need the relation size in costsize.c
even if we have a reldensity (or we need reltuples). Otherwise, what
values should be used to estimate the relation size? (pg_relation_size()
goes down to the kernel/fs to ask for stat.st_size; is that really what
we want?)

Actually yes, the planner does go to kernel to determine the current
relation size, and then multiplies by density as computed from catalog
data to figure out current reasonably accurate number of tuples.

Okay! I just read that part. Interesting.
(If I'm reading the code correctly, we find the last segment and then
seek to the end of that segment to get the size.)

Makes more sense, suddenly :)



#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#15)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, May 25, 2011 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Because the problem is not specific to TOAST tables.  As things
currently stand, we will accept the word of an ANALYZE as gospel even if
it scanned 1% of the table, and completely ignore the results from a
VACUUM even if it scanned 99% of the table.  This is not sane.

I agree that if VACUUM scanned 99% of the table, it's probably fine to
use its numbers. It's also fine to use the numbers from ANALYZE,
because those pages are chosen randomly. What bothers me is the idea
of using a small *non-random* sample, and I'm not sure that
incorporating possibly-bogus results slowly is any better than
incorporating them quickly.

The above is simply fuzzy thinking. The fact that ANALYZE looked at a
random subset of pages is *no guarantee whatsoever* that its results are
highly accurate. They might be more trustworthy than VACUUM's nonrandom
sample of a similar number of pages, but it doesn't hold even a little
bit of water to claim that we should believe ANALYZE completely and
VACUUM not at all even when the latter has looked at a significantly
larger sample of pages.

In any case, your line of thought doesn't help us for fixing the problem
with toast tables, because we aren't going to start doing ANALYZEs on
toast tables.

The bottom line here is that making use of stats we have is a lot better
than not making use of them, even if they aren't entirely trustworthy.

regards, tom lane

#21Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#20)
#22Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#10)
#23Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#22)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#25)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#26)
#28Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#28)
#30Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#29)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#27)
#34Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#31)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#34)
#36Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#35)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#36)
#39Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#38)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#39)
#41Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Tom Lane (#40)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Cédric Villemain (#41)
#43Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#42)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#43)
#45Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#44)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#45)
#47Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#46)
#48Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Tom Lane (#42)