pg_class reltuples/relpages not updated by autovacuum/vacuum

Started by Florian Helmberger · almost 15 years ago · 48 messages · pgsql-hackers
#1Florian Helmberger
fh@25th-floor.com

Hi.

I'm running a production database with PostgreSQL 9.0.3 (64-bit) on
Debian 5.0.4 and have an issue with a TOAST table and far too frequent
autovacuum runs.

I think I've pinned the problem down to the values pg_class holds for
the affected TOAST table:

relpages | 433596
reltuples | 1868538

These values are significantly too low. Interestingly, the autovacuum
log output reports the correct values:

pages: 0 removed, 34788136 remain
tuples: 932487 removed, 69599038 remain

but these aren't stored in pg_class after each run.

Currently, there are no long running transactions and/or dumps running.

I've confirmed that PostgreSQL is using the values stored in pg_class
for its calculations: it starts autovacuum for the table at around
375k dead rows (threshold 50, scale_factor 0.2, both default).

Additionally, I've done a manual VACUUM ANALYZE of both the parent table
and the TOAST table, which didn't help either.

Other databases with the same hardware, PostgreSQL and OS versions don't
have this issue.

Currently I've worked around the issue by disabling autovacuum for the
TOAST table and doing manual VACUUM ANALYZE once a week.

Any clue how to get PostgreSQL to store the correct values?

Side note: while trying to debug this I've noticed that the TOAST
chunks on 32-bit systems have the documented size of 2000 bytes, while
on 64-bit systems they have 1996 bytes. Is this normal/on purpose?

Regards,
Florian Helmberger

--

Florian Helmberger --------------------

25th-floor - Operating Custom Solutions
de Pretis & Helmberger KG

Gluckgasse 2/6, 1010 Wien, Austria

Mail: fh@25th-floor.com
Web : http://www.25th-floor.com
Tel.: +43 1 / 512 82 89 - 60
Fax : +43 1 / 512 82 89 - 76
Mob.: +43 699 / 109 24 24 5
---------------------------------------

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Helmberger (#1)
Re: pg_class reltuples/relpages not updated by autovacuum/vacuum

Florian Helmberger <fh@25th-floor.com> writes:

I'm running a production database with PostgreSQL 9.0.3 (64-bit) on
Debian 5.0.4 and have an issue with a TOAST table and far too frequent
autovacuum runs.

I think I've pinned the problem down to the values pg_class holds for
the affected TOAST table:

relpages | 433596
reltuples | 1868538

These values are significantly too low. Interestingly, the autovacuum
log output reports the correct values:

pages: 0 removed, 34788136 remain
tuples: 932487 removed, 69599038 remain

but these aren't stored in pg_class after each run.

That's exceedingly weird. Do the pg_stat_all_tables columns update
after autovacuums on that table?

regards, tom lane

#3Florian Helmberger
fh@25th-floor.com
In reply to: Tom Lane (#2)
Re: pg_class reltuples/relpages not updated by autovacuum/vacuum

On 25.05.11 04:47, Tom Lane wrote:

Florian Helmberger<fh@25th-floor.com> writes:

I'm running a production database with PostgreSQL 9.0.3 (64-bit) on
Debian 5.0.4 and have an issue with a TOAST table and far too frequent
autovacuum runs.

I think I've pinned the problem down to the values pg_class holds for
the affected TOAST table:

relpages | 433596
reltuples | 1868538

These values are significantly too low. Interestingly, the autovacuum
log output reports the correct values:

pages: 0 removed, 34788136 remain
tuples: 932487 removed, 69599038 remain

but these aren't stored in pg_class after each run.

That's exceedingly weird. Do the pg_stat_all_tables columns update
after autovacuums on that table?

Hi Tom,

Yes they do:

-[ RECORD 1 ]----+------------------------------
relid | 16391
schemaname | pg_toast
relname | pg_toast_16386
seq_scan | 0
seq_tup_read | 0
idx_scan | 298820512
idx_tup_fetch | 1812697121
n_tup_ins | 60907628
n_tup_upd | 0
n_tup_del | 56710637
n_tup_hot_upd | 0
n_live_tup | 4196999
n_dead_tup | 20746580
last_vacuum | 2011-05-21 06:33:49.869459+02
last_autovacuum | 2011-05-15 18:40:49.746234+02
last_analyze | NULL
last_autoanalyze | NULL

That was the last autovacuum run before I disabled it (via a storage
parameter on the main table) and switched to manual vacuums once per week.

I've also rechecked the "sister" database (same hardware, OS/PostgreSQL
version and database schema), which is working as intended.

Regards,
Florian


#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Helmberger (#3)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Florian Helmberger <fh@25th-floor.com> writes:

On 25.05.11 04:47, Tom Lane wrote:

Florian Helmberger<fh@25th-floor.com> writes:

I'm running a production database with PostgreSQL 9.0.3 (64-bit) on
Debian 5.0.4 and have an issue with a TOAST table and far too frequent
autovacuum runs.

I think I've pinned the problem down to the values pg_class holds for
the affected TOAST table:

relpages | 433596
reltuples | 1868538

These values are significantly too low. Interestingly, the autovacuum
log output reports the correct values:

pages: 0 removed, 34788136 remain
tuples: 932487 removed, 69599038 remain

but these aren't stored in pg_class after each run.

That's exceedingly weird. Do the pg_stat_all_tables columns update
after autovacuums on that table?

Yes they do:

I think I see what must be going on here: that toast table must contain
a long run of all-visible-according-to-the-VM pages (which is hardly a
surprising situation). This results in VACUUM choosing not to update
the pg_class entry:

    /*
     * Update statistics in pg_class.  But only if we didn't skip any pages;
     * the tuple count only includes tuples from the pages we've visited, and
     * we haven't frozen tuples in unvisited pages either.  The page count is
     * accurate in any case, but because we use the reltuples / relpages ratio
     * in the planner, it's better to not update relpages either if we can't
     * update reltuples.
     */
    if (vacrelstats->scanned_all)
        vac_update_relstats(onerel,
                            vacrelstats->rel_pages, vacrelstats->rel_tuples,
                            vacrelstats->hasindex,
                            FreezeLimit);

For an ordinary table this wouldn't be fatal because we'll still do an
ANALYZE from time to time, and that will update the entries with new
(approximate) values. But we never run ANALYZE on toast tables.

And this would *still* be okay, because as noted in the comment, the
planner only depends on the ratio being roughly correct, not on either
individual value being current. But autovacuum didn't get the memo;
it thinks it can use reltuples to make decisions.

I can see two basic approaches we might take here:

1. Modify autovacuum to use something from the stats collector, rather
than reltuples, to make its decisions. I'm not too clear on why
reltuples is being used there anyway; is there some good algorithmic or
statistical reason why AV should be looking at a number from the last
vacuum?

2. Revise the vacuum code so that it doesn't skip updating the pg_class
entries. We could have it count the number of pages it skipped, rather
than just keeping a bool, and then scale up the rel_tuples count to be
approximately right by assuming the skipped pages have tuple density
similar to the scanned ones.

Thoughts?

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Helmberger (#1)
Re: pg_class reltuples/relpages not updated by autovacuum/vacuum

Florian Helmberger <fh@25th-floor.com> writes:

I think I've pinned the problem down to the values pg_class holds for
the affected TOAST table:
relpages | 433596
reltuples | 1868538
These values are significantly too low. Interestingly, the autovacuum
log output reports the correct values:
pages: 0 removed, 34788136 remain
tuples: 932487 removed, 69599038 remain
but these aren't stored in pg_class after each run.

I've moved discussion of this to pgsql-hackers, since this appears to be
an actual bug.

Side note: while trying to debug this I've noticed that the TOAST
chunks on 32-bit systems have the documented size of 2000 bytes, while
on 64-bit systems they have 1996 bytes. Is this normal/on purpose?

I don't have the exact numbers in my head, but the TOAST chunk size does
depend on a MAXALIGN calculation, so it being different between 32- and
64-bit isn't surprising.

regards, tom lane

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#4)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Excerpts from Tom Lane's message of Wed May 25 11:47:52 -0400 2011:

I think I see what must be going on here: that toast table must contain
a long run of all-visible-according-to-the-VM pages (which is hardly a
surprising situation). This results in VACUUM choosing not to update
the pg_class entry:

    /*
     * Update statistics in pg_class.  But only if we didn't skip any pages;
     * the tuple count only includes tuples from the pages we've visited, and
     * we haven't frozen tuples in unvisited pages either.  The page count is
     * accurate in any case, but because we use the reltuples / relpages ratio
     * in the planner, it's better to not update relpages either if we can't
     * update reltuples.
     */
    if (vacrelstats->scanned_all)
        vac_update_relstats(onerel,
                            vacrelstats->rel_pages, vacrelstats->rel_tuples,
                            vacrelstats->hasindex,
                            FreezeLimit);

For an ordinary table this wouldn't be fatal because we'll still do an
ANALYZE from time to time, and that will update the entries with new
(approximate) values. But we never run ANALYZE on toast tables.

Ouch.

I can see two basic approaches we might take here:

1. Modify autovacuum to use something from the stats collector, rather
than reltuples, to make its decisions. I'm not too clear on why
reltuples is being used there anyway; is there some good algorithmic or
statistical reason why AV should be looking at a number from the last
vacuum?

It uses reltuples simply because it was what the original contrib code
was using. Since pgstat was considerably weaker at the time, reltuples
might have been the only thing available. It's certainly the case that
pgstat has improved a lot since autovacuum got in, and some things have
been revised but not this one.

2. Revise the vacuum code so that it doesn't skip updating the pg_class
entries. We could have it count the number of pages it skipped, rather
than just keeping a bool, and then scale up the rel_tuples count to be
approximately right by assuming the skipped pages have tuple density
similar to the scanned ones.

Hmm, interesting idea. This would be done only for toast tables, or all
tables?

At this point I only wonder why we store tuples & pages rather than just
live tuple density.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

On Wed, May 25, 2011 at 11:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

2. Revise the vacuum code so that it doesn't skip updating the pg_class
entries.  We could have it count the number of pages it skipped, rather
than just keeping a bool, and then scale up the rel_tuples count to be
approximately right by assuming the skipped pages have tuple density
similar to the scanned ones.

This approach doesn't seem like a good idea to me. The skipped
portions of the table are the ones that haven't been modified in a
while, so this embeds the assumption that the tuples in the hot
and cold portions of the table are the same size. That might be true,
but it isn't hard to think of cases where it might not be. There
could also very easily be sampling error, because it's easy to imagine
a situation where 99% of the table is getting skipped. Any error that
creeps into the estimate is going to get scaled up along with the
estimate itself.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Tom Lane's message of Wed May 25 11:47:52 -0400 2011:

I can see two basic approaches we might take here:

1. Modify autovacuum to use something from the stats collector, rather
than reltuples, to make its decisions. I'm not too clear on why
reltuples is being used there anyway; is there some good algorithmic or
statistical reason why AV should be looking at a number from the last
vacuum?

It uses reltuples simply because it was what the original contrib code
was using. Since pgstat was considerably weaker at the time, reltuples
might have been the only thing available. It's certainly the case that
pgstat has improved a lot since autovacuum got in, and some things have
been revised but not this one.

On reflection I'm hesitant to do this, especially for a backpatched bug
fix, because it would be changing the feedback loop behavior for
autovacuum scheduling. That could have surprising consequences.

2. Revise the vacuum code so that it doesn't skip updating the pg_class
entries. We could have it count the number of pages it skipped, rather
than just keeping a bool, and then scale up the rel_tuples count to be
approximately right by assuming the skipped pages have tuple density
similar to the scanned ones.

Hmm, interesting idea. This would be done only for toast tables, or all
tables?

I'm thinking just do it for all. The fact that these numbers don't
necessarily update after a vacuum is certainly surprising in and of
itself, and it did not work that way before the VM patch went in.
I'm concerned about other stuff besides AV not dealing well with
obsolete values.

At this point I only wonder why we store tuples & pages rather than just
live tuple density.

It's just for backwards compatibility. I've thought about doing that in
the past, but I don't know what client-side code might be looking at
relpages/reltuples. It's not like collapsing them into one field would
save much, anyway.

regards, tom lane

#9Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#8)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't know what client-side code might be looking at
relpages/reltuples.

I know that I find reltuples useful for getting an "accurate enough"
sense of rows in a table (or set of tables) without resorting to
count(*). I'd be OK with any two of pages, tuples, and density; but
dropping to one would make databases harder to manage, IMV.

Personally, I don't have much code that uses those columns;
eliminating an existing column wouldn't involve much pain for me as
long as it could still be derived.

-Kevin

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, May 25, 2011 at 11:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

2. Revise the vacuum code so that it doesn't skip updating the pg_class
entries.  We could have it count the number of pages it skipped, rather
than just keeping a bool, and then scale up the rel_tuples count to be
approximately right by assuming the skipped pages have tuple density
similar to the scanned ones.

This approach doesn't seem like a good idea to me. The skipped
portions of the table are the ones that haven't been modified in a
while, so this is or embeds an assumption that the tuples in the hot
and cold portions of the table are the same size. That might be true,
but it isn't hard to think of cases where it might not be. There
could also very easily be sampling error, because it's easy to imagine
a situation where 99% of the table is getting skipped.

Yeah, I had been thinking about the latter point. We could be
conservative and just keep the reported tuple density the same (ie,
update relpages to the new correct value, while setting reltuples so
that the density ratio doesn't change). But that has its own problems
when the table contents *do* change. What I'm currently imagining is
to do a smoothed moving average, where we factor in the new density
estimate with a weight dependent on the percentage of the table we did
scan. That is, the calculation goes something like

old_density = old_reltuples / old_relpages
new_density = counted_tuples / scanned_pages
reliability = scanned_pages / new_relpages
updated_density = old_density + (new_density - old_density) * reliability
new_reltuples = updated_density * new_relpages

We could slow the moving-average convergence even further when
reliability is small; for instance if you were really paranoid you might
want to use the square of reliability in line 4. That might be
overdoing it, though.

regards, tom lane

#11Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#10)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

On Wed, May 25, 2011 at 12:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, I had been thinking about the latter point.  We could be
conservative and just keep the reported tuple density the same (ie,
update relpages to the new correct value, while setting reltuples so
that the density ratio doesn't change).  But that has its own problems
when the table contents *do* change.  What I'm currently imagining is
to do a smoothed moving average, where we factor in the new density
estimate with a weight dependent on the percentage of the table we did
scan.  That is, the calculation goes something like

old_density = old_reltuples / old_relpages
new_density = counted_tuples / scanned_pages
reliability = scanned_pages / new_relpages
updated_density = old_density + (new_density - old_density) * reliability
new_reltuples = updated_density * new_relpages

We could slow the moving-average convergence even further when
reliability is small; for instance if you were really paranoid you might
want to use the square of reliability in line 4.  That might be
overdoing it, though.

I don't know. That's maybe better, but I'd be willing to wager that
in some cases it will just slow down the rate at which we converge to
a completely incorrect value, while in other cases it'll fail to
update the data when it really has changed.

I am wondering, though, why we're not just inserting a special-purpose
hack for TOAST tables. Your email seems to indicate that regular
tables are already handled well enough, and certainly if we only whack
around the TOAST behavior it's much less likely to fry anything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kevin Grittner (#9)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Excerpts from Kevin Grittner's message of Wed May 25 12:37:24 -0400 2011:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't know what client-side code might be looking at
relpages/reltuples.

I know that I find reltuples useful for getting an "accurate enough"
sense of rows in a table (or set of tables) without resorting to
count(*). I'd be OK with any two of pages, tuples, and density; but
dropping to one would make databases harder to manage, IMV.

Personally, I don't have much code that uses those columns;
eliminating an existing column wouldn't involve much pain for me as
long as it could still be derived.

Well, we only actually need to store one number, because you can figure
out a much more precise number-of-pages figure with pg_relation_size()
divided by configured page size.

(We feel free to hack around catalogs in other places, and we regularly
break pgAdmin and the like when we drop columns -- people just live with
it and update their tools. I don't think it's such a big deal in this
particular case.)


#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#11)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, May 25, 2011 at 12:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, I had been thinking about the latter point.  We could be
conservative and just keep the reported tuple density the same (ie,
update relpages to the new correct value, while setting reltuples so
that the density ratio doesn't change).  But that has its own problems
when the table contents *do* change.  What I'm currently imagining is
to do a smoothed moving average, where we factor in the new density
estimate with a weight dependent on the percentage of the table we did
scan.  That is, the calculation goes something like

old_density = old_reltuples / old_relpages
new_density = counted_tuples / scanned_pages
reliability = scanned_pages / new_relpages
updated_density = old_density + (new_density - old_density) * reliability
new_reltuples = updated_density * new_relpages

We could slow the moving-average convergence even further when
reliability is small; for instance if you were really paranoid you might
want to use the square of reliability in line 4.  That might be
overdoing it, though.

I don't know. That's maybe better, but I'd be willing to wager that
in some cases it will just slow down the rate at which we converge to
a completely incorrect value, while in other cases it'll fail to
update the data when it really has changed.

[ shrug... ] When you don't have complete information, it's *always*
the case that you will sometimes make a mistake. That's not
justification for paralysis, especially not when the existing code is
demonstrably broken.

What occurs to me after thinking a bit more is that the existing tuple
density is likely to be only an estimate, too (one coming from the last
ANALYZE, which could very well have scanned even less of the table than
VACUUM did). So what I now think is that both VACUUM and ANALYZE ought
to use a calculation of the above form to arrive at a new value for
pg_class.reltuples. In both cases it would be pretty easy to track the
number of pages we looked at while counting tuples, so the same raw
information is available.

I am wondering, though, why we're not just inserting a special-purpose
hack for TOAST tables.

Because the problem is not specific to TOAST tables. As things
currently stand, we will accept the word of an ANALYZE as gospel even if
it scanned 1% of the table, and completely ignore the results from a
VACUUM even if it scanned 99% of the table. This is not sane.

regards, tom lane

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#11)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Excerpts from Robert Haas's message of Wed May 25 12:54:28 -0400 2011:

I don't know. That's maybe better, but I'd be willing to wager that
in some cases it will just slow down the rate at which we converge to
a completely incorrect value, while in other cases it'll fail to
update the data when it really has changed.

For regular tables I don't think there's a real problem because it'll
get fixed next time a full scan happens anyway. For toast tables, I
think the set of operations is limited enough that this is easy to prove
correct (or fixed so that it is) -- no HOT updates (in fact no updates
at all), etc.

BTW one thing we haven't fixed at all is how do HOT updates affect
vacuuming behavior ...

I am wondering, though, why we're not just inserting a special-purpose
hack for TOAST tables. Your email seems to indicate that regular
tables are already handled well enough, and certainly if we only whack
around the TOAST behavior it's much less likely to fry anything.

Well, having two paths one of which is uncommonly used means that it
will get all the bugs. As with the xl_commit WAL record comment from
Simon, I'd rather stick with having a single one.


#15Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#13)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

On Wed, May 25, 2011 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

[ shrug... ]  When you don't have complete information, it's *always*
the case that you will sometimes make a mistake.  That's not
justification for paralysis, especially not when the existing code is
demonstrably broken.

What occurs to me after thinking a bit more is that the existing tuple
density is likely to be only an estimate, too (one coming from the last
ANALYZE, which could very well have scanned even less of the table than
VACUUM did).  So what I now think is that both VACUUM and ANALYZE ought
to use a calculation of the above form to arrive at a new value for
pg_class.reltuples.  In both cases it would be pretty easy to track the
number of pages we looked at while counting tuples, so the same raw
information is available.

I am wondering, though, why we're not just inserting a special-purpose
hack for TOAST tables.

Because the problem is not specific to TOAST tables.  As things
currently stand, we will accept the word of an ANALYZE as gospel even if
it scanned 1% of the table, and completely ignore the results from a
VACUUM even if it scanned 99% of the table.  This is not sane.

I agree that if VACUUM scanned 99% of the table, it's probably fine to
use its numbers. It's also fine to use the numbers from ANALYZE,
because those pages are chosen randomly. What bothers me is the idea
of using a small *non-random* sample, and I'm not sure that
incorporating possibly-bogus results slowly is any better than
incorporating them quickly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Alvaro Herrera (#12)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011/5/25 Alvaro Herrera <alvherre@commandprompt.com>:

Excerpts from Kevin Grittner's message of Wed May 25 12:37:24 -0400 2011:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't know what client-side code might be looking at
relpages/reltuples.

I know that I find reltuples useful for getting an "accurate enough"
sense of rows in a table (or set of tables) without resorting to
count(*).  I'd be OK with any two of pages, tuples, and density; but
dropping to one would make databases harder to manage, IMV.

Personally, I don't have much code that uses those columns;
eliminating an existing column wouldn't involve much pain for me as
long as it could still be derived.

Well, we only actually need to store one number, because you can figure
out a much more precise number-of-pages figure with pg_relation_size()
divided by configured page size.

(We feel free to hack around catalogs in other places, and we regularly
break pgAdmin and the like when we drop columns -- people just live with
it and update their tools.  I don't think it's such a big deal in this
particular case.)

I may be missing something, but we need the relation size in costsize.c
even if we have a reldensity (or we need reltuples). Otherwise, what
values should be used to estimate the relation size? (pg_relation_size()
goes down to the kernel/fs to ask for stat.st_size; is that really what
we want?)


--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#17Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#15)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

On Wed, May 25, 2011 at 1:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I agree that if VACUUM scanned 99% of the table, it's probably fine to
use its numbers.  It's also fine to use the numbers from ANALYZE,
because those pages are chosen randomly.  What bothers me is the idea
of using a small *non-random* sample, and I'm not sure that
incorporating possibly-bogus results slowly is any better than
incorporating them quickly.

In particular, unless I'm misremembering, VACUUM *always* scans the
first few pages of the table, until it sees enough consecutive
all-visible bits that it decides to start skipping. If I'm right
about that, then those pages could easily end up being overweighted
when VACUUM does the counting; especially if ANALYZE or an actual
full-table vacuum aren't allowed to snap the count back to reality.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Cédric Villemain (#16)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Excerpts from Cédric Villemain's message of Wed May 25 13:24:01 -0400 2011:

Well, we only actually need to store one number, because you can figure
out a much more precise number-of-pages figure with pg_relation_size()
divided by configured page size.

I may be missing something, but we need the relation size in costsize.c
even if we have a reldensity (or we need reltuples). Otherwise, what
values should be used to estimate the relation size? (pg_relation_size()
goes down to the kernel/fs to ask for stat.st_size; is that really what
we want?)

Actually yes, the planner does go to the kernel to determine the current
relation size, and then multiplies by the density computed from catalog
data to arrive at a reasonably accurate current tuple count.


#19Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Alvaro Herrera (#18)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011/5/25 Alvaro Herrera <alvherre@commandprompt.com>:

Excerpts from Cédric Villemain's message of Wed May 25 13:24:01 -0400 2011:

Well, we only actually need to store one number, because you can figure
out a much more precise number-of-pages figure with pg_relation_size()
divided by configured page size.

I may be missing something, but we need the relation size in costsize.c
even if we have a reldensity (or we need reltuples). Otherwise, what
values should be used to estimate the relation size? (pg_relation_size()
goes down to the kernel/fs to ask for stat.st_size; is that really what
we want?)

Actually yes, the planner does go to kernel to determine the current
relation size, and then multiplies by density as computed from catalog
data to figure out current reasonably accurate number of tuples.

Okay! I just read that part. Interesting.
(If I'm reading the code correctly, we find the last segment and then
seek to the end of that segment to get the size.)

Makes more sense, suddenly :)



#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#15)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, May 25, 2011 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Because the problem is not specific to TOAST tables.  As things
currently stand, we will accept the word of an ANALYZE as gospel even if
it scanned 1% of the table, and completely ignore the results from a
VACUUM even if it scanned 99% of the table.  This is not sane.

I agree that if VACUUM scanned 99% of the table, it's probably fine to
use its numbers. It's also fine to use the numbers from ANALYZE,
because those pages are chosen randomly. What bothers me is the idea
of using a small *non-random* sample, and I'm not sure that
incorporating possibly-bogus results slowly is any better than
incorporating them quickly.

The above is simply fuzzy thinking. The fact that ANALYZE looked at a
random subset of pages is *no guarantee whatsoever* that its results are
highly accurate. They might be more trustworthy than VACUUM's nonrandom
sample of a similar number of pages, but it doesn't hold even a little
bit of water to claim that we should believe ANALYZE completely and
VACUUM not at all even when the latter has looked at a significantly
larger sample of pages.

In any case, your line of thought doesn't help us for fixing the problem
with toast tables, because we aren't going to start doing ANALYZEs on
toast tables.

The bottom line here is that making use of stats we have is a lot better
than not making use of them, even if they aren't entirely trustworthy.

regards, tom lane

#21Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#20)
#22Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#10)
#23Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#22)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#25)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#26)
#28Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#28)
#30Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#29)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#27)
#34Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#31)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#34)
#36Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#35)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#36)
#39Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#38)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#39)
#41Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Tom Lane (#40)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Cédric Villemain (#41)
#43Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#42)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#43)
#45Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#44)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#45)
#47Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#46)
#48Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Tom Lane (#42)