VACUUM and ANALYZE disagreeing on what reltuples means

Started by Tomas Vondra over 8 years ago · 26 messages
#1 Tomas Vondra
tomas.vondra@2ndquadrant.com

Hi,

It seems to me that VACUUM and ANALYZE somewhat disagree on what exactly
reltuples means. VACUUM seems to be thinking that

reltuples = live + dead

while ANALYZE apparently believes that

reltuples = live

This causes somewhat bizarre changes in the value, depending on which of
those commands was executed last.

To demonstrate the issue, let's create a simple table with 1M rows,
delete 10% of the rows, and then do a bunch of VACUUM / ANALYZE runs,
checking reltuples, n_live_tup and n_dead_tup in the catalogs.

I've disabled autovacuum so that it won't interfere with this, and
there's another transaction blocking VACUUM from actually cleaning any
dead tuples.
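
For completeness, the setup looked roughly like this (a sketch, not
necessarily the exact commands, but something along these lines):

-- disable autovacuum (cluster-wide here; per-table works too, once
-- the table exists)
ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();

-- in a second session, hold an old snapshot so VACUUM can't remove
-- the deleted tuples (keep this transaction open for the whole test)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM pg_class;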

test=# create table t as
select i from generate_series(1,1000000) s(i);

test=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';

 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
     1e+06 |    1000000 |          0

So, that's nice. Now let's delete 10% of rows, and run VACUUM and
ANALYZE a few times.

test=# delete from t where random() < 0.1;

test=# vacuum t;

test=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';

 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
     1e+06 |     900413 |      99587

test=# analyze t;

 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    900413 |     900413 |      99587

test=# vacuum t;

 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
     1e+06 |     900413 |      99587

So, analyze and vacuum disagree.

To further confuse the poor DBA, VACUUM always simply ignores the old
values while ANALYZE combines the old and new values on large tables
(and converges to the "correct" value after a few steps). This table is
small (less than 30k pages), so ANALYZE does not do that.

This is quite annoying, because people tend to look at reltuples while
investigating bloat (e.g. because the check_postgres query mentioned on
our wiki [1] uses reltuples in the formula).

[1]: https://wiki.postgresql.org/wiki/Show_database_bloat

And when the cleanup is blocked for some reason (as in the example
above), VACUUM tends to run much more often (because it can't clean up
anything). So reltuples tends to be set to the higher value, which I'd
argue is the wrong value for estimating bloat.
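
To illustrate the connection, a heavily simplified bloat estimate of
that kind might look like this (the 8192-byte block size and 40-byte
average row width are just placeholder assumptions; the real
check_postgres query derives the width from statistics):

SELECT relname,
       relpages,
       ceil(reltuples * 40 / 8192) AS expected_pages,
       relpages - ceil(reltuples * 40 / 8192) AS bloat_pages
  FROM pg_class
 WHERE relname = 't';

The higher the reltuples value, the more pages the formula expects, and
the less bloat it reports - so counting the non-removable dead tuples
in reltuples makes the table look less bloated than it actually is.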

I haven't looked at the code yet, but I've confirmed this happens both
on 9.6 and 10. I haven't checked older versions, but I guess those are
affected too.

The question is - which of the reltuples definitions is the right one?
I've always assumed that "reltuples = live + dead" but perhaps not?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#1)
Re: VACUUM and ANALYZE disagreeing on what reltuples means

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

It seems to me that VACUUM and ANALYZE somewhat disagree on what exactly
reltuples means. VACUUM seems to be thinking that
reltuples = live + dead
while ANALYZE apparently believes that
reltuples = live

The question is - which of the reltuples definitions is the right one?
I've always assumed that "reltuples = live + dead" but perhaps not?

I think the planner basically assumes that reltuples is the live tuple
count, so maybe we'd better change VACUUM to get in step.

regards, tom lane

#3 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#2)
1 attachment(s)
Re: VACUUM and ANALYZE disagreeing on what reltuples means

On 7/25/17 12:55 AM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

It seems to me that VACUUM and ANALYZE somewhat disagree on what
exactly reltuples means. VACUUM seems to be thinking that reltuples
= live + dead while ANALYZE apparently believes that reltuples =
live

The question is - which of the reltuples definitions is the right
one? I've always assumed that "reltuples = live + dead" but perhaps
not?

I think the planner basically assumes that reltuples is the live
tuple count, so maybe we'd better change VACUUM to get in step.

Attached is a patch that (I think) does just that. The disagreement was
caused by VACUUM treating recently dead tuples as live, while ANALYZE
treats both of those as dead.

At first I was worried that this will negatively affect plans in the
long-running transaction, as it will get underestimates (due to
reltuples not including rows it can see). But that's a problem we
already have anyway, you just need to run ANALYZE in the other session.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

vacuum-reltuples-fix.patch (text/x-patch)
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index c5c194a..574ca70 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1261,7 +1261,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
 														 nblocks,
 												 vacrelstats->tupcount_pages,
-														 num_tuples);
+														 num_tuples - nkeep);
 
 	/*
 	 * Release any remaining pin on visibility map page.
#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#3)
Re: VACUUM and ANALYZE disagreeing on what reltuples means

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 7/25/17 12:55 AM, Tom Lane wrote:

I think the planner basically assumes that reltuples is the live
tuple count, so maybe we'd better change VACUUM to get in step.

Attached is a patch that (I think) does just that. The disagreement was
caused by VACUUM treating recently dead tuples as live, while ANALYZE
treats both of those as dead.

At first I was worried that this will negatively affect plans in the
long-running transaction, as it will get underestimates (due to
reltuples not including rows it can see). But that's a problem we
already have anyway, you just need to run ANALYZE in the other session.

This definitely will have some impact on plans, at least in cases where
there's a significant number of unvacuumable dead tuples. So I think
it's a bit late for v10, and I wouldn't want to back-patch at all.
Please add to the next commitfest.

regards, tom lane

#5 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#4)
Re: VACUUM and ANALYZE disagreeing on what reltuples means

On 7/25/17 5:04 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 7/25/17 12:55 AM, Tom Lane wrote:

I think the planner basically assumes that reltuples is the live
tuple count, so maybe we'd better change VACUUM to get in step.

Attached is a patch that (I think) does just that. The disagreement
was caused by VACUUM treating recently dead tuples as live, while
ANALYZE treats both of those as dead.

At first I was worried that this will negatively affect plans in
the long-running transaction, as it will get underestimates (due
to reltuples not including rows it can see). But that's a problem
we already have anyway, you just need to run ANALYZE in the other
session.

This definitely will have some impact on plans, at least in cases
where there's a significant number of unvacuumable dead tuples. So I
think it's a bit late for v10, and I wouldn't want to back-patch at
all. Please add to the next commitfest.

I dare to disagree here, for two reasons.

Firstly, the impact *is* already there, it only takes running ANALYZE.
Or VACUUM ANALYZE. In both those cases we already end up with
reltuples=n_live_tup.

Secondly, I personally strongly prefer stable predictable behavior over
intermittent oscillations between two values. That's a major PITA on
production, both to investigate and fix.

So people already have this issue, although it only strikes randomly.
And there's no way to fix it (well, except for fixing the cleanup, but
that may not be possible).

It is true we tend to run VACUUM more often than ANALYZE, particularly
in situations where the cleanup can't proceed - ANALYZE will do its
work and VACUUM will be triggered over and over again, so it "wins"
this way. But I'm not sure that's something we should rely on.

FWIW I personally see this as a fairly annoying bug, and would vote to
backpatch it, although I understand people might object. But I don't
quite see a reason not to fix this in v10.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6 Noah Misch
noah@leadboat.com
In reply to: Tomas Vondra (#5)
Re: VACUUM and ANALYZE disagreeing on what reltuples means

On Tue, Jul 25, 2017 at 07:02:28PM +0200, Tomas Vondra wrote:

On 7/25/17 5:04 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

Attached is a patch that (I think) does just that. The disagreement
was caused by VACUUM treating recently dead tuples as live, while
ANALYZE treats both of those as dead.

At first I was worried that this will negatively affect plans in
the long-running transaction, as it will get underestimates (due
to reltuples not including rows it can see). But that's a problem
we already have anyway, you just need to run ANALYZE in the other
session.

This definitely will have some impact on plans, at least in cases
where there's a significant number of unvacuumable dead tuples. So I
think it's a bit late for v10, and I wouldn't want to back-patch at
all. Please add to the next commitfest.

I dare to disagree here, for two reasons.

Firstly, the impact *is* already there, it only takes running ANALYZE. Or
VACUUM ANALYZE. In both those cases we already end up with
reltuples=n_live_tup.

Secondly, I personally strongly prefer stable predictable behavior over
intermittent oscillations between two values. That's a major PITA on
production, both to investigate and fix.

FWIW I personally see this as a fairly annoying bug, and would vote to
backpatch it, although I understand people might object.

I tend to agree. If you have a setup that somehow never uses ANALYZE or never
uses VACUUM on a particular table, you might prefer today's (accidental)
behavior. However, the discrepancy arises only on a table that gets dead
tuples, and a table that gets dead tuples will see both VACUUM and ANALYZE
soon enough. It does seem like quite a stretch to imagine someone wanting
plans to depend on which of VACUUM or ANALYZE ran most recently.

#7 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Tomas Vondra (#3)
Re: VACUUM and ANALYZE disagreeing on what reltuples means

On Tue, Jul 25, 2017 at 9:33 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 7/25/17 12:55 AM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

It seems to me that VACUUM and ANALYZE somewhat disagree on what
exactly reltuples means. VACUUM seems to be thinking that reltuples
= live + dead while ANALYZE apparently believes that reltuples =
live

The question is - which of the reltuples definitions is the right

one? I've always assumed that "reltuples = live + dead" but perhaps
not?

I think the planner basically assumes that reltuples is the live
tuple count, so maybe we'd better change VACUUM to get in step.

Attached is a patch that (I think) does just that. The disagreement was
caused by VACUUM treating recently dead tuples as live, while ANALYZE
treats both of those as dead.

At first I was worried that this will negatively affect plans in the
long-running transaction, as it will get underestimates (due to reltuples
not including rows it can see). But that's a problem we already have
anyway, you just need to run ANALYZE in the other session.

Thanks for the patch.
From the mail, I understand that this patch changes how the VACUUM
command updates the reltuples value in the catalog, so that it counts
only the properly visible tuples, similar to the ANALYZE command.

- num_tuples);
+ num_tuples - nkeep);

With the above correction, there is a problem in reporting the number
of live tuples to the stats.

postgres=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899818 |     799636 |     100182
(1 row)

The live tuple count is decremented by the dead tuple count a second
time before being sent to the stats collector in lazy_vacuum_rel(),

/* report results to the stats collector, too */
new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;

The fix needs a correction here as well. Alternatively, the adjustment
could be made in lazy_vacuum_rel() itself, before updating the catalog
table, the same way it is done for the stats.

While testing this patch, I found another problem that is not related
to it. When the VACUUM command is executed multiple times on a table
with no dead rows, the reltuples value slowly decreases.

postgres=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899674 |     899674 |          0
(1 row)

postgres=# vacuum t;
VACUUM
postgres=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899622 |     899622 |          0
(1 row)

postgres=# vacuum t;
VACUUM
postgres=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899570 |     899570 |          0
(1 row)

In lazy_scan_heap(), we force a scan of the last page of the relation,
to avoid taking the access exclusive lock in lazy_truncate_heap() when
there are tuples in the last page. Because of this, the scanned_pages
value will never be 0, so vac_estimate_reltuples will estimate the
tuple count from the tuples on that last page alone. This estimation
keeps reducing the reltuples value.

I am wondering whether this problem really happens in real-world
scenarios often enough to warrant a fix?

Regards,
Hari Babu
Fujitsu Australia

#8 Daniel Gustafsson
daniel@yesql.se
In reply to: Haribabu Kommi (#7)
Re: VACUUM and ANALYZE disagreeing on what reltuples means

On 06 Sep 2017, at 09:45, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

On Tue, Jul 25, 2017 at 9:33 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 7/25/17 12:55 AM, Tom Lane wrote:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
It seems to me that VACUUM and ANALYZE somewhat disagree on what
exactly reltuples means. VACUUM seems to be thinking that reltuples
= live + dead while ANALYZE apparently believes that reltuples =
live

The question is - which of the reltuples definitions is the right
one? I've always assumed that "reltuples = live + dead" but perhaps
not?

I think the planner basically assumes that reltuples is the live
tuple count, so maybe we'd better change VACUUM to get in step.

Attached is a patch that (I think) does just that. The disagreement was caused by VACUUM treating recently dead tuples as live, while ANALYZE treats both of those as dead.

At first I was worried that this will negatively affect plans in the long-running transaction, as it will get underestimates (due to reltuples not including rows it can see). But that's a problem we already have anyway, you just need to run ANALYZE in the other session.

Thanks for the patch.
From the mail, I understand that this patch tries to improve the
reltuples value update in the catalog table by the vacuum command
to consider the proper visible tuples similar like analyze command.

-														 num_tuples);
+														 num_tuples - nkeep);

With the above correction, there is a problem in reporting the number
of live tuples to the stats.

postgres=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';
reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
899818 | 799636 | 100182
(1 row)

The live tuples data value is again decremented with dead tuples
value before sending them to stats in function lazy_vacuum_rel(),

/* report results to the stats collector, too */
new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;

The fix needs a correction here also. Or change the correction in
lazy_vacuum_rel() function itself before updating catalog table similar
like stats.

This patch is marked Waiting for Author; have you had a chance to look
at it and address the comments in the above review?

cheers ./daniel

#9 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Haribabu Kommi (#7)
1 attachment(s)
Re: VACUUM and ANALYZE disagreeing on what reltuples means

On 09/06/2017 09:45 AM, Haribabu Kommi wrote:

On Tue, Jul 25, 2017 at 9:33 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

On 7/25/17 12:55 AM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

It seems to me that VACUUM and ANALYZE somewhat disagree on what
exactly reltuples means. VACUUM seems to be thinking that reltuples
= live + dead while ANALYZE apparently believes that reltuples = live

The question is - which of the reltuples definitions is the right
one? I've always assumed that "reltuples = live + dead" but perhaps
not?

I think the planner basically assumes that reltuples is the live
tuple count, so maybe we'd better change VACUUM to get in step.

Attached is a patch that (I think) does just that. The disagreement
was caused by VACUUM treating recently dead tuples as live, while
ANALYZE treats both of those as dead.

At first I was worried that this will negatively affect plans in the
long-running transaction, as it will get underestimates (due to
reltuples not including rows it can see). But that's a problem we
already have anyway, you just need to run ANALYZE in the other session.

Thanks for the patch.
From the mail, I understand that this patch tries to improve the
reltuples value update in the catalog table by the vacuum command
to consider the proper visible tuples similar like analyze command.

-num_tuples);
+num_tuples - nkeep);

With the above correction, there is a problem in reporting the number
of live tuples to the stats.

postgres=# select reltuples, n_live_tup, n_dead_tup
              from pg_stat_user_tables join pg_class using (relname)
             where relname = 't';
 reltuples | n_live_tup | n_dead_tup 
-----------+------------+------------
    899818 |     799636 |     100182
(1 row)

The live tuples data value is again decremented with dead tuples
value before sending them to stats in function lazy_vacuum_rel(),

/* report results to the stats collector, too */
new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;

The fix needs a correction here also. Or change the correction in 
lazy_vacuum_rel() function itself before updating catalog table similar
like stats.

Ah, I hadn't noticed that for some reason - you're right, we estimate
reltuples based on (num_tuples - nkeep), so it doesn't make much sense
to subtract nkeep again. Attached is v2 of the fix.

I've removed the subtraction from lazy_vacuum_rel(), leaving just

new_live_tuples = new_rel_tuples;

and now it behaves as expected (no second subtraction). That means we
can get rid of new_live_tuples altogether (and the protection against
negative values), and use new_rel_tuples directly.

Which pretty much means that in this case

(pg_class.reltuples == pg_stat_user_tables.n_live_tup)

but I guess that's fine, based on the initial discussion in this thread.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

vacuum-reltuples-fix-v2.patch (text/x-patch)
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 45b1859..1886f0d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -198,7 +198,6 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	BlockNumber new_rel_pages;
 	double		new_rel_tuples;
 	BlockNumber new_rel_allvisible;
-	double		new_live_tuples;
 	TransactionId new_frozen_xid;
 	MultiXactId new_min_multi;
 
@@ -335,13 +334,9 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 						false);
 
 	/* report results to the stats collector, too */
-	new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;
-	if (new_live_tuples < 0)
-		new_live_tuples = 0;	/* just in case */
-
 	pgstat_report_vacuum(RelationGetRelid(onerel),
 						 onerel->rd_rel->relisshared,
-						 new_live_tuples,
+						 new_rel_tuples,
 						 vacrelstats->new_dead_tuples);
 	pgstat_progress_end_command();
 
@@ -1267,7 +1262,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
 														 nblocks,
 														 vacrelstats->tupcount_pages,
-														 num_tuples);
+														 num_tuples - nkeep);
 
 	/*
 	 * Release any remaining pin on visibility map page.
#10 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Haribabu Kommi (#7)
Re: VACUUM and ANALYZE disagreeing on what reltuples means

Hi,

Apologies, I forgot to respond to the second part of your message.

On 09/06/2017 09:45 AM, Haribabu Kommi wrote:

While testing this patch, I found another problem that is not related to
this patch. When the vacuum command is executed mutiple times on
a table with no dead rows, the number of reltuples value is slowly
reducing.

postgres=# select reltuples, n_live_tup, n_dead_tup
              from pg_stat_user_tables join pg_class using (relname)
             where relname = 't';
 reltuples | n_live_tup | n_dead_tup 
-----------+------------+------------
    899674 |     899674 |          0
(1 row)

postgres=# vacuum t;
VACUUM
postgres=# select reltuples, n_live_tup, n_dead_tup
              from pg_stat_user_tables join pg_class using (relname)
             where relname = 't';
 reltuples | n_live_tup | n_dead_tup 
-----------+------------+------------
    899622 |     899622 |          0
(1 row)

postgres=# vacuum t;
VACUUM
postgres=# select reltuples, n_live_tup, n_dead_tup
              from pg_stat_user_tables join pg_class using (relname)
             where relname = 't';
 reltuples | n_live_tup | n_dead_tup 
-----------+------------+------------
    899570 |     899570 |          0
(1 row)

In lazy_scan_heap() function, we force to scan the last page of the
relation to avoid the access exclusive lock in lazy_truncate_heap if
there are tuples in the last page. Because of this reason, the
scanned_pages value will never be 0, so the vac_estimate_reltuples
function will estimate the tuples based on the number of tuples from
the last page of the relation. This estimation is leading to reduce
the number of retuples.

Hmmm, that's annoying. Perhaps we should not update the values in this
case, then? I mean, if we only scan the last page, how reliable are the
derived values?

For the record - AFAICS this issue is unrelated to the patch (i.e. it's
not introduced by it).

I am thinking whether this problem really happen in real world
scenarios to produce a fix?

Not sure.

Each VACUUM run decrements the estimate only a little bit, so you'd
have to run the vacuum many times to actually be bitten by it. For
people relying on autovacuum that won't happen, as it only runs on
tables with a certain number of dead tuples.

So you'd have to be running VACUUM in a loop from a script (but not
VACUUM ANALYZE, because that works fine), or something like that.

That being said, fixing a bug is always a good thing I guess.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Tomas Vondra (#9)
Re: VACUUM and ANALYZE disagreeing on what reltuples means

On Mon, Sep 25, 2017 at 4:39 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 09/06/2017 09:45 AM, Haribabu Kommi wrote:

On Tue, Jul 25, 2017 at 9:33 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

On 7/25/17 12:55 AM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

It seems to me that VACUUM and ANALYZE somewhat disagree on what
exactly reltuples means. VACUUM seems to be thinking that reltuples
= live + dead while ANALYZE apparently believes that reltuples = live

The question is - which of the reltuples definitions is the right
one? I've always assumed that "reltuples = live + dead" but perhaps
not?

I think the planner basically assumes that reltuples is the live
tuple count, so maybe we'd better change VACUUM to get in step.

Attached is a patch that (I think) does just that. The disagreement
was caused by VACUUM treating recently dead tuples as live, while
ANALYZE treats both of those as dead.

At first I was worried that this will negatively affect plans in the
long-running transaction, as it will get underestimates (due to
reltuples not including rows it can see). But that's a problem we
already have anyway, you just need to run ANALYZE in the other
session.

Thanks for the patch.
From the mail, I understand that this patch tries to improve the
reltuples value update in the catalog table by the vacuum command
to consider the proper visible tuples similar like analyze command.

-num_tuples);
+num_tuples - nkeep);

With the above correction, there is a problem in reporting the number
of live tuples to the stats.

postgres=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';
reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
899818 | 799636 | 100182
(1 row)

The live tuples data value is again decremented with dead tuples
value before sending them to stats in function lazy_vacuum_rel(),

/* report results to the stats collector, too */
new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;

The fix needs a correction here also. Or change the correction in
lazy_vacuum_rel() function itself before updating catalog table similar
like stats.

Ah, haven't noticed that for some reason - you're right, we estimate the
reltuples based on (num_tuples - nkeep), so it doesn't make much sense
to subtract nkeep again. Attached is v2 of the fix.

I've removed the subtraction from lazy_vacuum_rel(), leaving just

new_live_tuples = new_rel_tuples;

and now it behaves as expected (no second subtraction). That means we
can get rid of new_live_tuples altogether (and the protection against
negative values), and use new_rel_tuples directly.

Which pretty much means that in this case

(pg_class.reltuples == pg_stat_user_tables.n_live_tup)

but I guess that's fine, based on the initial discussion in this thread.

The changes are fine, and the patch now reports the proper live tuple
count in both pg_class and the stats. The other issue, of repeated
VACUUM operations leading to a decreasing live tuple count, is not
related to this patch and can be handled separately.

I changed the patch status to ready for committer.

Regards,
Hari Babu
Fujitsu Australia

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Haribabu Kommi (#11)
1 attachment(s)
Re: VACUUM and ANALYZE disagreeing on what reltuples means

Haribabu Kommi <kommi.haribabu@gmail.com> writes:

The changes are fine and now it reports proper live tuples in both
pg_class and stats. The other issue of continuous vacuum operation
leading to decrease of number of live tuples is not related to this
patch and that can be handled separately.

I did not like this patch too much, because it did nothing to fix
the underlying problem of confusion between "live tuples" and "total
tuples"; in fact, it made that worse, because now the comment on
LVRelStats.new_rel_tuples is a lie. And there's at least one usage
of that field value where I think we do want total tuples not live
tuples: where we pass it to index AM cleanup functions. Indexes
don't really care whether heap entries are live or not, only that
they're there. Plus the VACUUM VERBOSE log message that uses the
field is supposed to be reporting total tuples not live tuples.

I hacked up the patch trying to make that better, finding along the
way that contrib/pgstattuple shared in the confusion about what
it was trying to count. Results attached.

However, I'm not sure we're there yet, because there remains a fairly
nasty discrepancy even once we've gotten everyone onto the same page
about reltuples counting just live tuples: VACUUM and ANALYZE have
different definitions of what's "live". In particular they do not treat
INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples the same. Should we
try to do something about that? If so, what? It looks like ANALYZE's
behavior is pretty tightly constrained, judging by the comments in
acquire_sample_rows.

Another problem is that it looks like CREATE INDEX will set reltuples
to the total number of heap entries it chose to index, because that
is what IndexBuildHeapScan counts. Maybe we should adjust that?

regards, tom lane

Attachments:

vacuum-reltuples-fix-v3.patch (text/x-diff; charset=us-ascii)
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 5bf0613..68211c6 100644
*** a/contrib/pgstattuple/pgstatapprox.c
--- b/contrib/pgstattuple/pgstatapprox.c
*************** statapprox_heap(Relation rel, output_typ
*** 68,74 ****
  	Buffer		vmbuffer = InvalidBuffer;
  	BufferAccessStrategy bstrategy;
  	TransactionId OldestXmin;
- 	uint64		misc_count = 0;
  
  	OldestXmin = GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM);
  	bstrategy = GetAccessStrategy(BAS_BULKREAD);
--- 68,73 ----
*************** statapprox_heap(Relation rel, output_typ
*** 153,177 ****
  			tuple.t_tableOid = RelationGetRelid(rel);
  
  			/*
! 			 * We count live and dead tuples, but we also need to add up
! 			 * others in order to feed vac_estimate_reltuples.
  			 */
  			switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
  			{
- 				case HEAPTUPLE_RECENTLY_DEAD:
- 					misc_count++;
- 					/* Fall through */
- 				case HEAPTUPLE_DEAD:
- 					stat->dead_tuple_len += tuple.t_len;
- 					stat->dead_tuple_count++;
- 					break;
  				case HEAPTUPLE_LIVE:
  					stat->tuple_len += tuple.t_len;
  					stat->tuple_count++;
  					break;
! 				case HEAPTUPLE_INSERT_IN_PROGRESS:
! 				case HEAPTUPLE_DELETE_IN_PROGRESS:
! 					misc_count++;
  					break;
  				default:
  					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
--- 152,172 ----
  			tuple.t_tableOid = RelationGetRelid(rel);
  
  			/*
! 			 * We follow VACUUM's lead in counting INSERT_IN_PROGRESS and
! 			 * DELETE_IN_PROGRESS tuples as "live".
  			 */
  			switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
  			{
  				case HEAPTUPLE_LIVE:
+ 				case HEAPTUPLE_INSERT_IN_PROGRESS:
+ 				case HEAPTUPLE_DELETE_IN_PROGRESS:
  					stat->tuple_len += tuple.t_len;
  					stat->tuple_count++;
  					break;
! 				case HEAPTUPLE_DEAD:
! 				case HEAPTUPLE_RECENTLY_DEAD:
! 					stat->dead_tuple_len += tuple.t_len;
! 					stat->dead_tuple_count++;
  					break;
  				default:
  					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
*************** statapprox_heap(Relation rel, output_typ
*** 184,191 ****
  
  	stat->table_len = (uint64) nblocks * BLCKSZ;
  
  	stat->tuple_count = vac_estimate_reltuples(rel, false, nblocks, scanned,
! 											   stat->tuple_count + misc_count);
  
  	/*
  	 * Calculate percentages if the relation has one or more pages.
--- 179,190 ----
  
  	stat->table_len = (uint64) nblocks * BLCKSZ;
  
+ 	/*
+ 	 * Extrapolate the live-tuple count to the whole table in the same way
+ 	 * that VACUUM does.  (XXX shouldn't we also extrapolate tuple_len?)
+ 	 */
  	stat->tuple_count = vac_estimate_reltuples(rel, false, nblocks, scanned,
! 											   stat->tuple_count);
  
  	/*
  	 * Calculate percentages if the relation has one or more pages.
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ef60a58..87bba31 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
*************** SCRAM-SHA-256$<replaceable>&lt;iteration
*** 1739,1746 ****
        <entry><type>float4</type></entry>
        <entry></entry>
        <entry>
!        Number of rows in the table.  This is only an estimate used by the
!        planner.  It is updated by <command>VACUUM</command>,
         <command>ANALYZE</command>, and a few DDL commands such as
         <command>CREATE INDEX</command>.
        </entry>
--- 1739,1746 ----
        <entry><type>float4</type></entry>
        <entry></entry>
        <entry>
!        Number of live rows in the table.  This is only an estimate used by
!        the planner.  It is updated by <command>VACUUM</command>,
         <command>ANALYZE</command>, and a few DDL commands such as
         <command>CREATE INDEX</command>.
        </entry>
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index cbd6e9b..067034d 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*************** vacuum_set_xid_limits(Relation rel,
*** 771,776 ****
--- 771,779 ----
   *		we take the old value of pg_class.reltuples as a measurement of the
   *		tuple density in the unscanned pages.
   *
+  *		Note: scanned_tuples should count only *live* tuples, since
+  *		pg_class.reltuples is defined that way.
+  *
   *		This routine is shared by VACUUM and ANALYZE.
   */
  double
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 6587db7..ed14db2 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*************** typedef struct LVRelStats
*** 115,122 ****
  	BlockNumber frozenskipped_pages;	/* # of frozen pages we skipped */
  	BlockNumber tupcount_pages; /* pages whose tuples we counted */
  	double		scanned_tuples; /* counts only tuples on tupcount_pages */
! 	double		old_rel_tuples; /* previous value of pg_class.reltuples */
  	double		new_rel_tuples; /* new estimated total # of tuples */
  	double		new_dead_tuples;	/* new estimated total # of dead tuples */
  	BlockNumber pages_removed;
  	double		tuples_deleted;
--- 115,123 ----
  	BlockNumber frozenskipped_pages;	/* # of frozen pages we skipped */
  	BlockNumber tupcount_pages; /* pages whose tuples we counted */
  	double		scanned_tuples; /* counts only tuples on tupcount_pages */
! 	double		old_live_tuples;	/* previous value of pg_class.reltuples */
  	double		new_rel_tuples; /* new estimated total # of tuples */
+ 	double		new_live_tuples;	/* new estimated total # of live tuples */
  	double		new_dead_tuples;	/* new estimated total # of dead tuples */
  	BlockNumber pages_removed;
  	double		tuples_deleted;
*************** lazy_vacuum_rel(Relation onerel, int opt
*** 196,202 ****
  	TransactionId xidFullScanLimit;
  	MultiXactId mxactFullScanLimit;
  	BlockNumber new_rel_pages;
- 	double		new_rel_tuples;
  	BlockNumber new_rel_allvisible;
  	double		new_live_tuples;
  	TransactionId new_frozen_xid;
--- 197,202 ----
*************** lazy_vacuum_rel(Relation onerel, int opt
*** 245,251 ****
  	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
  
  	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
! 	vacrelstats->old_rel_tuples = onerel->rd_rel->reltuples;
  	vacrelstats->num_index_scans = 0;
  	vacrelstats->pages_removed = 0;
  	vacrelstats->lock_waiter_detected = false;
--- 245,251 ----
  	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
  
  	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
! 	vacrelstats->old_live_tuples = onerel->rd_rel->reltuples;
  	vacrelstats->num_index_scans = 0;
  	vacrelstats->pages_removed = 0;
  	vacrelstats->lock_waiter_detected = false;
*************** lazy_vacuum_rel(Relation onerel, int opt
*** 311,321 ****
  	 * since then we don't know for certain that all tuples have a newer xmin.
  	 */
  	new_rel_pages = vacrelstats->rel_pages;
! 	new_rel_tuples = vacrelstats->new_rel_tuples;
  	if (vacrelstats->tupcount_pages == 0 && new_rel_pages > 0)
  	{
  		new_rel_pages = vacrelstats->old_rel_pages;
! 		new_rel_tuples = vacrelstats->old_rel_tuples;
  	}
  
  	visibilitymap_count(onerel, &new_rel_allvisible, NULL);
--- 311,321 ----
  	 * since then we don't know for certain that all tuples have a newer xmin.
  	 */
  	new_rel_pages = vacrelstats->rel_pages;
! 	new_live_tuples = vacrelstats->new_live_tuples;
  	if (vacrelstats->tupcount_pages == 0 && new_rel_pages > 0)
  	{
  		new_rel_pages = vacrelstats->old_rel_pages;
! 		new_live_tuples = vacrelstats->old_live_tuples;
  	}
  
  	visibilitymap_count(onerel, &new_rel_allvisible, NULL);
*************** lazy_vacuum_rel(Relation onerel, int opt
*** 327,333 ****
  
  	vac_update_relstats(onerel,
  						new_rel_pages,
! 						new_rel_tuples,
  						new_rel_allvisible,
  						vacrelstats->hasindex,
  						new_frozen_xid,
--- 327,333 ----
  
  	vac_update_relstats(onerel,
  						new_rel_pages,
! 						new_live_tuples,
  						new_rel_allvisible,
  						vacrelstats->hasindex,
  						new_frozen_xid,
*************** lazy_vacuum_rel(Relation onerel, int opt
*** 335,344 ****
  						false);
  
  	/* report results to the stats collector, too */
- 	new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;
- 	if (new_live_tuples < 0)
- 		new_live_tuples = 0;	/* just in case */
- 
  	pgstat_report_vacuum(RelationGetRelid(onerel),
  						 onerel->rd_rel->relisshared,
  						 new_live_tuples,
--- 335,340 ----
*************** lazy_scan_heap(Relation onerel, int opti
*** 1275,1284 ****
  	vacrelstats->new_dead_tuples = nkeep;
  
  	/* now we can compute the new value for pg_class.reltuples */
! 	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
! 														 nblocks,
! 														 vacrelstats->tupcount_pages,
! 														 num_tuples);
  
  	/*
  	 * Release any remaining pin on visibility map page.
--- 1271,1284 ----
  	vacrelstats->new_dead_tuples = nkeep;
  
  	/* now we can compute the new value for pg_class.reltuples */
! 	vacrelstats->new_live_tuples = vac_estimate_reltuples(onerel, false,
! 														  nblocks,
! 														  vacrelstats->tupcount_pages,
! 														  num_tuples - nkeep);
! 
! 	/* indexes probably care more about the total number of heap entries */
! 	vacrelstats->new_rel_tuples =
! 		vacrelstats->new_live_tuples + vacrelstats->new_dead_tuples;
  
  	/*
  	 * Release any remaining pin on visibility map page.
*************** lazy_vacuum_index(Relation indrel,
*** 1614,1620 ****
  	ivinfo.analyze_only = false;
  	ivinfo.estimated_count = true;
  	ivinfo.message_level = elevel;
! 	ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples;
  	ivinfo.strategy = vac_strategy;
  
  	/* Do bulk deletion */
--- 1614,1621 ----
  	ivinfo.analyze_only = false;
  	ivinfo.estimated_count = true;
  	ivinfo.message_level = elevel;
! 	/* We can only provide an approximate value of num_heap_tuples here */
! 	ivinfo.num_heap_tuples = vacrelstats->old_live_tuples;
  	ivinfo.strategy = vac_strategy;
  
  	/* Do bulk deletion */
*************** lazy_cleanup_index(Relation indrel,
*** 1645,1650 ****
--- 1646,1652 ----
  	ivinfo.analyze_only = false;
  	ivinfo.estimated_count = (vacrelstats->tupcount_pages < vacrelstats->rel_pages);
  	ivinfo.message_level = elevel;
+ 	/* Now we can provide a better estimate of total # of surviving tuples */
  	ivinfo.num_heap_tuples = vacrelstats->new_rel_tuples;
  	ivinfo.strategy = vac_strategy;
  
#13 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#12)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

Hi,

On 11/02/2017 08:15 PM, Tom Lane wrote:

Haribabu Kommi <kommi.haribabu@gmail.com> writes:

The changes are fine and now it reports proper live tuples in both
pg_class and stats. The other issue of continuous vacuum operation
leading to decrease of number of live tuples is not related to this
patch and that can be handled separately.

I did not like this patch too much, because it did nothing to fix
the underlying problem of confusion between "live tuples" and "total
tuples"; in fact, it made that worse, because now the comment on
LVRelStats.new_rel_tuples is a lie. And there's at least one usage
of that field value where I think we do want total tuples not live
tuples: where we pass it to index AM cleanup functions. Indexes
don't really care whether heap entries are live or not, only that
they're there. Plus the VACUUM VERBOSE log message that uses the
field is supposed to be reporting total tuples not live tuples.

I hacked up the patch trying to make that better, finding along the
way that contrib/pgstattuple shared in the confusion about what
it was trying to count. Results attached.

Thanks for looking into this. I agree your patch is more consistent and
generally cleaner.

However, I'm not sure we're there yet, because there remains a fairly
nasty discrepancy even once we've gotten everyone onto the same page
about reltuples counting just live tuples: VACUUM and ANALYZE have
different definitions of what's "live". In particular they do not treat
INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples the same. Should we
try to do something about that? If so, what? It looks like ANALYZE's
behavior is pretty tightly constrained, judging by the comments in
acquire_sample_rows.

Yeah :-(

For the record (and for people following this thread who are too lazy
to open analyze.c and search for the comments), the problem is that
acquire_sample_rows does not count HEAPTUPLE_INSERT_IN_PROGRESS as live
(and HEAPTUPLE_DELETE_IN_PROGRESS as dead) because it assumes the
transaction will send the stats at the end.

Which is a correct assumption, but it means that when there is a
long-running transaction that inserted/deleted many rows, the reltuples
value will oscillate during VACUUM / ANALYZE runs.
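
Just to illustrate, the oscillation is easy to trigger by hand (a
sketch with made-up row counts):

-- session 1: leave a large INSERT uncommitted
BEGIN;
INSERT INTO t SELECT i FROM generate_series(1, 100000) s(i);
-- ... keep the transaction open ...

-- session 2: the two commands classify the in-progress rows
-- differently, so reltuples flips depending on which ran last
VACUUM t;
SELECT reltuples FROM pg_class WHERE relname = 't';
ANALYZE t;
SELECT reltuples FROM pg_class WHERE relname = 't';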

ISTM we need to unify those definitions, probably so that VACUUM adopts
what acquire_sample_rows does. I mean, if ANALYZE assumes that the stats
will be updated at the end of transaction, why shouldn't VACUUM do the
same thing?

The one reason I can think of is that VACUUM is generally expected to
run longer than ANALYZE, so the assumption that large transactions
complete later is not quite reliable.

Or is there some other reason why VACUUM can't treat the in-progress
tuples the same way?

Another problem is that it looks like CREATE INDEX will set reltuples
to the total number of heap entries it chose to index, because that
is what IndexBuildHeapScan counts. Maybe we should adjust that?

You mean by only counting live tuples in IndexBuildHeapRangeScan,
following whatever definition we end up using in VACUUM/ANALYZE? Seems
like a good idea I guess, although CREATE INDEX is not as frequent as
the other commands.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#13)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 11/02/2017 08:15 PM, Tom Lane wrote:

However, I'm not sure we're there yet, because there remains a fairly
nasty discrepancy even once we've gotten everyone onto the same page
about reltuples counting just live tuples: VACUUM and ANALYZE have
different definitions of what's "live". In particular they do not treat
INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples the same. Should we
try to do something about that? If so, what? It looks like ANALYZE's
behavior is pretty tightly constrained, judging by the comments in
acquire_sample_rows.

ISTM we need to unify those definitions, probably so that VACUUM adopts
what acquire_sample_rows does. I mean, if ANALYZE assumes that the stats
will be updated at the end of transaction, why shouldn't VACUUM do the
same thing?

That was the way I was leaning. I haven't thought very hard about the
implications, but as long as the change in VACUUM's behavior extends
only to the live-tuple count it reports, it seems like adjusting it
couldn't break anything too badly.

Another problem is that it looks like CREATE INDEX will set reltuples
to the total number of heap entries it chose to index, because that
is what IndexBuildHeapScan counts. Maybe we should adjust that?

You mean by only counting live tuples in IndexBuildHeapRangeScan,
following whatever definition we end up using in VACUUM/ANALYZE?

Right. One issue is that, as I mentioned, the index AMs probably want to
think about total-tuples-indexed not live-tuples; so for their purposes,
what IndexBuildHeapScan currently counts is the right thing. We need to
look and see if any AMs are actually using that value rather than just
silently passing it back. If they are, we might need to go to the trouble
of computing/returning two values.

regards, tom lane

#15 Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#14)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

BTW see bug #14863 which is related to this:
/messages/by-id/CAEBTBzu5j_E1K1jb9OKwTZj98MPeM7V81-vadp5adRm=NhJnwA@mail.gmail.com

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16 Michael Paquier
michael.paquier@gmail.com
In reply to: Tomas Vondra (#13)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

On Sun, Nov 19, 2017 at 6:40 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Thanks for looking into this. I agree your patch is more consistent and
generally cleaner.

This is classified as a bug fix, and is marked as waiting on author. I
am moving it to the next CF as work continues.
--
Michael

#17 Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#16)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

Tomas,

* Michael Paquier (michael.paquier@gmail.com) wrote:

On Sun, Nov 19, 2017 at 6:40 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Thanks for looking into this. I agree your patch is more consistent and
generally cleaner.

This is classified as a bug fix, and is marked as waiting on author. I
am moving it to next CF as work continues.

This is still in waiting-on-author state; it'd be great to get your
thoughts on where this patch is and what the next steps are to move it
forward. If you need additional feedback or there needs to be more
discussion (perhaps with Tom) then maybe this should be in needs-review
instead, but it'd be great if you could review the discussion and patch
again and at least provide an update on it (last update from you was in
November).

Thanks!

Stephen

#18 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Stephen Frost (#17)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

On 01/05/2018 05:25 AM, Stephen Frost wrote:

Tomas,

* Michael Paquier (michael.paquier@gmail.com) wrote:

On Sun, Nov 19, 2017 at 6:40 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Thanks for looking into this. I agree your patch is more consistent and
generally cleaner.

This is classified as a bug fix, and is marked as waiting on author. I
am moving it to next CF as work continues.

This is still in waiting-on-author state; it'd be great to get your
thoughts on where this patch is and what the next steps are to move
it forward. If you need additional feedback or there needs to be
more discussion (perhaps with Tom) then maybe this should be in
needs-review instead, but it'd be great if you could review the
discussion and patch again and at least provide an update on it (last
update from you was in November).

Hmmm, I'm not sure, TBH.

As I already mentioned, Tom's updated patch is better than what I posted
initially, and I agree with his approach to the remaining issues he
pointed out. But I somehow assumed that he's already looking into that.

Tom, do you plan to look into this patch soon, or should I?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#18)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

As I already mentioned, Tom's updated patch is better than what I posted
initially, and I agree with his approach to the remaining issues he
pointed out. But I somehow assumed that he's already looking into that.
Tom, do you plan to look into this patch soon, or should I?

No, I thought you were going to run with those ideas. I have a lot of
other stuff on my plate ...

regards, tom lane

#20 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#19)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

On 01/08/2018 08:39 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

As I already mentioned, Tom's updated patch is better than what I
posted initially, and I agree with his approach to the remaining
issues he pointed out. But I somehow assumed that he's already
looking into that. Tom, do you plan to look into this patch soon,
or should I?

No, I thought you were going to run with those ideas. I have a lot
of other stuff on my plate ...

OK, will do.

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21 David Steele
david@pgmasters.net
In reply to: Tomas Vondra (#20)
Re: Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

Hi Tomas,

On 1/8/18 3:28 PM, Tomas Vondra wrote:

On 01/08/2018 08:39 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

As I already mentioned, Tom's updated patch is better than what I
posted initially, and I agree with his approach to the remaining
issues he pointed out. But I somehow assumed that he's already
looking into that. Tom, do you plan to look into this patch soon,
or should I?

No, I thought you were going to run with those ideas. I have a lot
of other stuff on my plate ...

OK, will do.

What is the status of this patch? It's been waiting on author since
last November, though I can see there was some confusion over who was
working on it.

Given that it's a bug fix it would be good to see a patch for this CF,
or very soon after.

Thanks,
--
-David
david@pgmasters.net

#22 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Steele (#21)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

On 03/05/2018 04:12 PM, David Steele wrote:

Hi Tomas,

On 1/8/18 3:28 PM, Tomas Vondra wrote:

On 01/08/2018 08:39 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

As I already mentioned, Tom's updated patch is better than what I
posted initially, and I agree with his approach to the remaining
issues he pointed out. But I somehow assumed that he's already
looking into that. Tom, do you plan to look into this patch soon,
or should I?

No, I thought you were going to run with those ideas. I have a lot
of other stuff on my plate ...

OK, will do.

What is the status of this patch? It's been waiting on author since
last November, though I can see there was some confusion over who was
working on it.

Given that it's a bug fix it would be good to see a patch for this CF,
or very soon after.

Yeah, it's the next thing on my TODO.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#23 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#22)
2 attachment(s)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

Hi,

So here is an updated version of the patch/fix, addressing the remaining
issues in v3 posted by Tom in November.

The 0001 part is actually a bugfix in the bloom and spgist index AMs,
which did something like this:

reltuples = IndexBuildHeapScan(...)

result->heap_tuples = result->index_tuples = reltuples;

That is, these two AMs simply used the number of heap rows for the
index. That does not work for partial indexes, of course, where the
correct index reltuples value is likely much lower.

0001 fixes this by tracking the number of actually indexed rows in the
build states, just like in the other index AMs.

A VACUUM or ANALYZE will fix the estimate, of course, but for tables
that are not changing very much it may take quite a while. So I think
this is something we definitely need to back-patch.
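
As an illustration of the 0001 problem (a sketch using the contrib
bloom extension; the names are made up and the exact pre-fix numbers
will vary):

CREATE EXTENSION IF NOT EXISTS bloom;

CREATE TABLE big AS SELECT i FROM generate_series(1,1000000) s(i);

-- partial index, covering only ~1000 of the 1000000 heap rows
CREATE INDEX big_bloom_idx ON big USING bloom (i) WHERE i < 1000;

-- before the 0001 fix, the index's reltuples could end up close to
-- the heap row count instead of the number of rows actually indexed
SELECT relname, reltuples
  FROM pg_class
 WHERE relname IN ('big', 'big_bloom_idx');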

The 0002 part is the main part, unifying the definition of reltuples on
three main places:

a) acquire_sample_rows (ANALYZE)
b) lazy_scan_heap (VACUUM)
c) IndexBuildHeapRangeScan (CREATE INDEX)

As the ANALYZE case seems the most constrained, the other two places
were updated to use the same criteria for which rows to include in the
reltuples estimate:

* HEAPTUPLE_LIVE
* HEAPTUPLE_INSERT_IN_PROGRESS (same transaction)
* HEAPTUPLE_DELETE_IN_PROGRESS (not the same transaction)

This resolves the issue with oscillating reltuples estimates, produced
by VACUUM and ANALYZE (with many non-removable dead tuples).
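
With the patch applied, repeating the test from the first message
should give the same estimate regardless of which command ran last
(a sketch; the value will still wobble a bit due to ANALYZE sampling):

DELETE FROM t WHERE random() < 0.1;   -- with the cleanup still blocked
VACUUM t;
SELECT reltuples FROM pg_class WHERE relname = 't';
ANALYZE t;
SELECT reltuples FROM pg_class WHERE relname = 't';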

I've checked all IndexBuildHeapRangeScan callers, and none of them is
using the reltuples estimate for anything except for passing it to
index_update_stats. Aside from the bug fixed in 0001, of course.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Compute-index_tuples-correctly-in-bloom-and-spgist.patch.gz (application/gzip)
0002-Unify-the-definition-of-reltuples-in-VACUUM-ANALYZE-.patch.gz (application/gzip)
#24 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#23)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

0001 fixes this by tracking the number of actually indexed rows in the
build states, just like in the other index AMs.
A VACUUM or ANALYZE will fix the estimate, of course, but for tables
that are not changing very much it may take quite a while. So I think
this is something we definitely need to back-patch.

Agreed, and done. I noticed a second bug in contrib/bloom, too: it'd
forget to write out the last index page if that contained only one
tuple, because the new-page code path in bloomBuildCallback failed to
increment the "count" field after clearing it.

The 0002 part is the main part, unifying the definition of reltuples on
three main places:

On to this part ...

regards, tom lane

#25 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#24)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

I wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

The 0002 part is the main part, unifying the definition of reltuples on
three main places:

On to this part ...

I've pushed 0002 with several corrections: it did not seem to me that
you'd correctly propagated what ANALYZE is doing into CREATE INDEX or
pgstattuple. Also, since one of the things VACUUM does with num_tuples
is to report it as "the total number of non-removable tuples", I thought
we'd better leave that calculation alone. I made the added code count
live tuples in a new variable live_tuples, instead.

regards, tom lane

#26 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#25)
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

On 03/22/2018 08:51 PM, Tom Lane wrote:

I wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

The 0002 part is the main part, unifying the definition of reltuples on
three main places:

On to this part ...

I've pushed 0002 with several corrections: it did not seem to me that
you'd correctly propagated what ANALYZE is doing into CREATE INDEX or
pgstattuple. Also, since one of the things VACUUM does with num_tuples
is to report it as "the total number of non-removable tuples", I thought
we'd better leave that calculation alone. I made the added code count
live tuples in a new variable live_tuples, instead.

OK, makes sense. Thanks.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services