Seems like there is an issue with reltuples showing twice the number of rows

Started by Ranjith Ramachandraabout 8 years ago4 messagesgeneral
Jump to latest
#1Ranjith Ramachandra
ranjith@mammoth.io

I am relying on reltuples on my web app to get fast row counts.

This was recommended by this article to get fast approx row counts:
https://wiki.postgresql.org/wiki/Count_estimate

However for some table I am getting twice as many values when I try to do
this. I did some more research and came up with this query.

select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname =
'main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d';

it returns

reltuples | n_live_tup | n_dead_tup
-------------+------------+------------
2.7209e+06 | 1360448 | 1360448

If I run analyze main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d

and I run the same query again,

reltuples | n_live_tup | n_dead_tup
-------------+------------+------------
1.36045e+06 | 1360448 | 1360448

But after some time the value goes back to being double the value. This is
causing me a lot of problems since this inaccuracy does not make any sense
to me.

Any help would be appreciated.

FYI, also asked the same question on stackoverflow since I am new to
postgres mail lists.

https://stackoverflow.com/questions/49625259/postgres-reltuples-seems-to-return-twice-the-number-of-values

--
~Ranjith

#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ranjith Ramachandra (#1)
Re: Seems like there is an issue with reltuples showing twice the number of rows

On 04/03/2018 11:14 AM, Ranjith Ramachandra wrote:

I am relying on reltuples on my web app to get fast row counts.

This was recommended by this article to get fast approx row
counts: https://wiki.postgresql.org/wiki/Count_estimate

However for some table I am getting twice as many values when I try to
do this. I did some more research and came up with this query.

select reltuples, n_live_tup, n_dead_tup
              from pg_stat_user_tables join pg_class using (relname)
             where relname =
'main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d';

it returns

 reltuples  | n_live_tup | n_dead_tup
-------------+------------+------------
2.7209e+06 |    1360448 |    1360448

If I run analyze main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d

and I run the same query again,

  reltuples  | n_live_tup | n_dead_tup
-------------+------------+------------
 1.36045e+06 |    1360448 |    1360448

But after some time the value goes back to being double the value. This
is causing me a lot of problems since this inaccuracy does not make any
sense to me.

There was a difference between VACUUM and ANALYZE in handling recently
dead rows (essentially deleted rows that can't be removed yet), causing
similar changes to reltuples. Essentially if you do VACUUM and ANALYZE,
it may set reltuples to rather different estimates. That is fixed now
and should be in the next minor release.

Hard to say if this is a case of that, but judging by the number of dead
tuples chances are it is.

It's probably better to use n_live_tup instead, though. I'd say that's
closer to the "live tuples" definition.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#2)
Re: Seems like there is an issue with reltuples showing twice the number of rows

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 04/03/2018 11:14 AM, Ranjith Ramachandra wrote:

it returns
 reltuples  | n_live_tup | n_dead_tup
-------------+------------+------------
2.7209e+06 |    1360448 |    1360448

If I run analyze main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d
and I run the same query again,
  reltuples  | n_live_tup | n_dead_tup
-------------+------------+------------
 1.36045e+06 |    1360448 |    1360448

But after some time the value goes back to being double the value.

There was a difference between VACUUM and ANALYZE in handling recently
dead rows (essentially deleted rows that can't be removed yet), causing
similar changes to reltuples. Essentially if you do VACUUM and ANALYZE,
it may set reltuples to rather different estimates. That is fixed now
and should be in the next minor release.

No, I think this is the *other* thing we fixed recently: VACUUM thinks
it should set reltuples to total tuples (live + dead) whereas ANALYZE
counts only live tuples. We did not risk back-patching that.

The question I'd ask about this case is why is there persistently 100%
bloat? Those dead tuples should've gotten reclaimed by autovacuum.
Perhaps an open prepared transaction, or some such?

It's probably better to use n_live_tup instead, though. I'd say that's
closer to the "live tuples" definition.

Yeah, you might be better off looking at that, particularly since it
updates on-the-fly not just after a vacuum or analyze.

regards, tom lane

#4Ranjith Ramachandra
ranjith@mammoth.io
In reply to: Tom Lane (#3)
Re: Seems like there is an issue with reltuples showing twice the number of rows

List,

OP here. Thank you for replying. Confirms my diagnosis that it might have
to do with analyze vaccum.

Some debug info.

1. Loaded a CSV to fill the table with data.
2. performed analyse vacuum on this table after uploading.
3. I do not see any reason for dead rows because I have not updated data in
this table. But I may not understand dead rows correctly.
4. I can reproduce this problem on multiple machines with 9.6.8 postres
installed.

Can not wait for next minor update since my prod would get updated. My
current strategy is to use n_live_tup. On my local it seems to work fine.

Thinking about it, I could even develop another mechanism for keeping a
track of row counts by manually scanning the row count with a background
process, a hard count with count(*) too.

But happy provide debug any other info if needed. Will reply within 24
hours max.

This is what I had found earlier before I contacted the list. Relevant?

1. /messages/by-id/20180312231417.484d64c0@engels
2.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=81b9b5ce490a645bde8df203ec4a3b2903d88f31
3.
/messages/by-id/151956654251.6915.675951950408204404.pgcf@coridan.postgresql.org

On Tue 3 Apr, 2018, 19:49 Tom Lane, <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 04/03/2018 11:14 AM, Ranjith Ramachandra wrote:

it returns
reltuples | n_live_tup | n_dead_tup
-------------+------------+------------
2.7209e+06 | 1360448 | 1360448

If I run analyze main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d
and I run the same query again,
reltuples | n_live_tup | n_dead_tup
-------------+------------+------------
1.36045e+06 | 1360448 | 1360448

But after some time the value goes back to being double the value.

There was a difference between VACUUM and ANALYZE in handling recently
dead rows (essentially deleted rows that can't be removed yet), causing
similar changes to reltuples. Essentially if you do VACUUM and ANALYZE,
it may set reltuples to rather different estimates. That is fixed now
and should be in the next minor release.

No, I think this is the *other* thing we fixed recently: VACUUM thinks
it should set reltuples to total tuples (live + dead) whereas ANALYZE
counts only live tuples. We did not risk back-patching that.

The question I'd ask about this case is why is there persistently 100%
bloat? Those dead tuples should've gotten reclaimed by autovacuum.
Perhaps an open prepared transaction, or some such?

It's probably better to use n_live_tup instead, though. I'd say that's
closer to the "live tuples" definition.

Yeah, you might be better off looking at that, particularly since it
updates on-the-fly not just after a vacuum or analyze.

regards, tom lane