BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

Started by PG Bug reporting formover 8 years ago6 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15005
Logged by: David Gould
Email address: daveg@sonic.net
PostgreSQL version: 10.1
Operating system: Linux
Description:

ANALYZE can make pg_class.reltuples wildly inaccurate compared to the
actual
row counts for tables that are larger than the default_statistics_target.

Example from one of a clients production instances:

# analyze verbose pg_attribute;
INFO: analyzing "pg_catalog.pg_attribute"
INFO: "pg_attribute": scanned 30000 of 24519424 pages, containing 6475 live
rows and 83 dead rows; 6475 rows in sample, 800983035 estimated total
rows.

This is a large complex database -- pg_attribute actually has about five
million rows and needs about one hundred thouand pages. However it has
become extremely bloated and is taking 25 million pages (192GB!), about
250
times too much. This happened despite aggressive autovacuum settings and a
periodic bloat monitoring script. Since pg_class.reltuples was 800
million,
our bloat monitoring script did not detect that this table was bloated and
autovacuum did not think it needed vacuuming.

When reltuples is very large compared to the actual row count it causes
problems:

- Bad input to the query planner.
- Prevents autovacuum from processing large bloated tables because
autovacuum_scale_factor * reltuples is large enough the threshold is
rarely
reached.
- Decieves bloat checking tools that rely on the relationship of relpages
to reltuples*average_row_size.

-dg

#2daveg
daveg@sonic.net
In reply to: PG Bug reporting form (#1)
Re: BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

Bug reference: 15005
Logged by: David Gould
Email address: daveg@sonic.net

ANALYZE can make pg_class.reltuples wildly inaccurate compared to the actual
row counts for tables that are larger than the default_statistics_target.

I've tracked down how this happens and created a reproduction script and a
patch. Attached:

- vac_estimate_reltuples.patch Patch against master
- README.txt Instructions for testing
- reltuples_analyze_bug.sql Reproduction script
- analyze_counts.awk Helper for viewing results of test
- test_standard.txt Test output for unpatched postgresql 10.1
- test_patched.txt Test output with patch

Analysis:
---------

Analyze and vacuum calculate the new value for pg_class.reltuples in
vacuum.c:vac_estimate_reltuples():

old_density = old_rel_tuples / old_rel_pages;
new_density = scanned_tuples / scanned_pages;
multiplier = (double) scanned_pages / (double) total_pages;
updated_density = old_density + (new_density - old_density) * multiplier;
return floor(updated_density * total_pages + 0.5);

The comments talk about the difference between VACUUM and ANALYZE and explain
that VACUUM probably only scanned changed pages so the density of the scanned
pages is not representative of the rest of the unchanged table. Hence the new
overall density of the table should be adjusted proportionaly to the scanned
pages vs total pages. Which makes sense. However despite the comment noteing
that ANALYZE and VACUUM are different, the code actually does the same
calculation for both.

The problem is that it dilutes the impact of ANALYZE on reltuples for large
tables:

- For a table of 3000000 pages an analyze can only change the reltuples
value by 1%.
- When combined with changes in relpages due to bloat the new computed
reltuples can end up far from reality.

Reproducing the reltuples analyze estimate bug.
-----------------------------------------------

The script "reltuples_analyze_bug.sql" creates a table that is large
compared to the analyze sample size and then repeatedly updates about
10% of it followed by an analyze each iteration. The bug is that the
calculation analyze uses to update pg_class.reltuples will tend to
increase each time even though the actual rowcount does not change.

To run:

Given a postgresql 10.x server with >= 1GB of shared buffers:

createdb test
psql --no-psqlrc -f reltuples_analyze_bug.sql test > test_standard.out 2>&1
awk -f analyze_counts.awk test_standard.out

To verify the fix, restart postgres with a patched binary and repeat
the above.

Here are the results with an unpatched server:

After 10 interations of:
update 10% of rows;
analyze

reltuples has almost doubled.

/ estimated rows / / pages / /sampled rows/
relname current proposed total scanned live dead
reltuples_test 10000001 10000055 153847 3000 195000 0
reltuples_test 10981367 9951346 169231 3000 176410 18590
reltuples_test 11948112 10039979 184615 3000 163150 31850
reltuples_test 12900718 10070666 200000 3000 151060 43940
reltuples_test 13835185 9739305 215384 3000 135655 59345
reltuples_test 14758916 9864947 230768 3000 128245 66755
reltuples_test 15674572 10138631 246153 3000 123565 71435
reltuples_test 16576847 9910944 261537 3000 113685 81315
reltuples_test 17470388 10019961 276922 3000 108550 86450
reltuples_test 18356707 10234607 292306 3000 105040 89960
reltuples_test 19228409 9639927 307690 3000 93990 101010

--
David Gould daveg@sonic.net
If simplicity worked, the world would be overrun with insects.

Attachments:

vac_estimate_reltuples.patchtext/x-patchDownload+27-31
README.txttext/plainDownload
reltuples_analyze_bug.sqlapplication/sqlDownload
analyze_counts.awkapplication/x-awkDownload
test_standard.txttext/plainDownload
test_patched.txttext/plainDownload
#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: daveg (#2)
Re: BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

David Gould wrote:

I've tracked down how this happens and created a reproduction script and a
patch. Attached:

- vac_estimate_reltuples.patch Patch against master
- README.txt Instructions for testing
- reltuples_analyze_bug.sql Reproduction script
- analyze_counts.awk Helper for viewing results of test
- test_standard.txt Test output for unpatched postgresql 10.1
- test_patched.txt Test output with patch

Great work. Please make sure to register this as a bug fix in the
next commitfest.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#3)
Re: BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

David Gould wrote:

I've tracked down how this happens and created a reproduction script and a
patch. Attached:

Great work. Please make sure to register this as a bug fix in the
next commitfest.

There's already an existing discussion about what seems to be the same
issue, or at least a closely related one:

/messages/by-id/16db4468-edfa-830a-f921-39a50498e77e@2ndquadrant.com

regards, tom lane

#5daveg
daveg@sonic.net
In reply to: Tom Lane (#4)
Re: BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

On Thu, 11 Jan 2018 10:12:16 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

David Gould wrote:

I've tracked down how this happens and created a reproduction script and a
patch. Attached:

Great work. Please make sure to register this as a bug fix in the
next commitfest.

There's already an existing discussion about what seems to be the same
issue, or at least a closely related one:

/messages/by-id/16db4468-edfa-830a-f921-39a50498e77e@2ndquadrant.com

Yes. I reviewed that thread, while I was researching. It is a related issue
but not the same. That one adds n_tup_dead to the reluples estimate after
vacuums. It amplifies the effect of this one which prevents analyze from
fixing reltuples.

-dg

--
David Gould daveg@sonic.net
If simplicity worked, the world would be overrun with insects.

#6daveg
daveg@sonic.net
In reply to: Alvaro Herrera (#3)
Re: BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

On Thu, 11 Jan 2018 11:39:22 -0300
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

David Gould wrote:

I've tracked down how this happens and created a reproduction script and a
patch. Attached:

- vac_estimate_reltuples.patch Patch against master
- README.txt Instructions for testing
- reltuples_analyze_bug.sql Reproduction script
- analyze_counts.awk Helper for viewing results of test
- test_standard.txt Test output for unpatched postgresql 10.1
- test_patched.txt Test output with patch

Great work. Please make sure to register this as a bug fix in the
next commitfest.

Done last week. See:

/messages/by-id/20180117164916.3fdcf2e9@engels/

I think this should be considered for back-patching, the bug exists in all
versions back to 9.4 (and probably earlier, but I did not test those).

-dg

--
David Gould daveg@sonic.net
If simplicity worked, the world would be overrun with insects.