Should heapam_estimate_rel_size consider fillfactor?
Hi,
While testing some stuff, I noticed heapam_estimate_rel_size (or rather
table_block_relation_estimate_size, which it calls) ignores fillfactor, so
that for a table without statistics it ends up with a reltuples estimate
much higher than reality. For example, with fillfactor=10 the estimate is
about 10x higher.
I ran into this while doing some tests with hash indexes, where I use
fillfactor to make the table look bigger (as if the tuples were wider),
and I ran into this:
drop table hash_test;
create table hash_test (a int, b text) with (fillfactor=10);
insert into hash_test select 1 + ((i - 1) / 10000), md5(i::text)
from generate_series(1, 1000000) s(i);
-- analyze hash_test;
create index hash_test_idx on hash_test using hash (a);
select pg_size_pretty(pg_relation_size('hash_test_idx'));
If you run it like this (without the analyze), the index will be 339MB.
With the analyze, it's 47MB.
This only happens if there are no relpages/reltuples statistics yet, in
which case table_block_relation_estimate_size estimates density from
tuple width etc.
So it seems the easiest "fix" is to do ANALYZE before creating the index
(and not after it, as I had in my scripts). But I wonder how many people
fail to realize this - it sure took me a while to notice the indexes were
too large, and even longer to figure out what was causing it. I wouldn't be
very surprised if many people had bloated hash indexes after bulk loads.
So maybe we should make table_block_relation_estimate_size smarter to
also consider the fillfactor in the "no statistics" branch, per the
attached patch.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
relsize-fillfactor-fix.patch (text/x-patch, +9/-1)
So maybe we should make table_block_relation_estimate_size smarter to
also consider the fillfactor in the "no statistics" branch, per the
attached patch.
I like this a lot. The reasoning is obvious, the fix is simple, it doesn't
upset any make-check-world tests, and in order to get a performance
regression we'd need a table whose fillfactor has been changed after the
data was loaded but before an analyze happens, and that's a narrow enough
case to accept.
My only nitpick is to swap
(usable_bytes_per_page * fillfactor / 100) / tuple_width
with
(usable_bytes_per_page * fillfactor) / (tuple_width * 100)
as this will eliminate the extra remainder truncation, and it also gets the
arguments "in order" algebraically.
Hi,
On 2023-06-11 14:41:27 +0200, Tomas Vondra wrote:
While testing some stuff, I noticed heapam_estimate_rel_size (or rather
table_block_relation_estimate_size, which it calls) ignores fillfactor, so
that for a table without statistics it ends up with a reltuples estimate
much higher than reality. For example, with fillfactor=10 the estimate is
about 10x higher.
I ran into this while doing some tests with hash indexes, where I use
fillfactor to make the table look bigger (as if the tuples were wider),
and I ran into this:
drop table hash_test;
create table hash_test (a int, b text) with (fillfactor=10);
insert into hash_test select 1 + ((i - 1) / 10000), md5(i::text)
from generate_series(1, 1000000) s(i);
-- analyze hash_test;
create index hash_test_idx on hash_test using hash (a);
select pg_size_pretty(pg_relation_size('hash_test_idx'));
If you run it like this (without the analyze), the index will be 339MB.
With the analyze, it's 47MB.
This only happens if there are no relpages/reltuples statistics yet, in
which case table_block_relation_estimate_size estimates density from
tuple width etc.
So it seems the easiest "fix" is to do ANALYZE before creating the index
(and not after it, as I had in my scripts). But I wonder how many people
fail to realize this - it sure took me a while to notice the indexes were
too large, and even longer to figure out what was causing it. I wouldn't be
very surprised if many people had bloated hash indexes after bulk loads.
So maybe we should make table_block_relation_estimate_size smarter to
also consider the fillfactor in the "no statistics" branch, per the
attached patch.
Seems like a good idea - I can't think of a reason why we shouldn't do so.
Greetings,
Andres Freund
On 14.06.23 20:41, Corey Huinker wrote:
So maybe we should make table_block_relation_estimate_size smarter to
also consider the fillfactor in the "no statistics" branch, per the
attached patch.
I like this a lot. The reasoning is obvious, the fix is simple, it
doesn't upset any make-check-world tests, and in order to get a
performance regression we'd need a table whose fillfactor has been
changed after the data was loaded but before an analyze happens, and
that's a narrow enough case to accept.
My only nitpick is to swap
(usable_bytes_per_page * fillfactor / 100) / tuple_width
with
(usable_bytes_per_page * fillfactor) / (tuple_width * 100)
as this will eliminate the extra remainder truncation, and it also gets
the arguments "in order" algebraically.
The fillfactor is in percent, so it makes sense to divide it by 100
first before doing any further arithmetic with it. I find your version
of this to be more puzzling without any explanation. You could do
fillfactor/100.0 to avoid the integer division, but then, the comment
above that says "integer division is intentional here", without any
further explanation. I think a bit more explanation of all the
subtleties here would be good in any case.
On 7/3/23 08:46, Peter Eisentraut wrote:
On 14.06.23 20:41, Corey Huinker wrote:
So maybe we should make table_block_relation_estimate_size smarter to
also consider the fillfactor in the "no statistics" branch, per the
attached patch.
I like this a lot. The reasoning is obvious, the fix is simple, it
doesn't upset any make-check-world tests, and in order to get a
performance regression we'd need a table whose fillfactor has been
changed after the data was loaded but before an analyze happens, and
that's a narrow enough case to accept.
My only nitpick is to swap
(usable_bytes_per_page * fillfactor / 100) / tuple_width
with
(usable_bytes_per_page * fillfactor) / (tuple_width * 100)
as this will eliminate the extra remainder truncation, and it also
gets the arguments "in order" algebraically.
The fillfactor is in percent, so it makes sense to divide it by 100
first before doing any further arithmetic with it. I find your version
of this to be more puzzling without any explanation. You could do
fillfactor/100.0 to avoid the integer division, but then, the comment
above that says "integer division is intentional here", without any
further explanation. I think a bit more explanation of all the
subtleties here would be good in any case.
Yeah, I guess the formula should be doing (fillfactor / 100.0).
The "integer division is intentional here" comment is unrelated to this
change, it refers to the division by "tuple_width" (and yeah, it'd be
nice to have it explain why it's intentional).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 7/3/23 09:00, Tomas Vondra wrote:
On 7/3/23 08:46, Peter Eisentraut wrote:
On 14.06.23 20:41, Corey Huinker wrote:
So maybe we should make table_block_relation_estimate_size smarter to
also consider the fillfactor in the "no statistics" branch, per the
attached patch.
I like this a lot. The reasoning is obvious, the fix is simple, it
doesn't upset any make-check-world tests, and in order to get a
performance regression we'd need a table whose fillfactor has been
changed after the data was loaded but before an analyze happens, and
that's a narrow enough case to accept.
My only nitpick is to swap
(usable_bytes_per_page * fillfactor / 100) / tuple_width
with
(usable_bytes_per_page * fillfactor) / (tuple_width * 100)
as this will eliminate the extra remainder truncation, and it also
gets the arguments "in order" algebraically.
The fillfactor is in percent, so it makes sense to divide it by 100
first before doing any further arithmetic with it. I find your version
of this to be more puzzling without any explanation. You could do
fillfactor/100.0 to avoid the integer division, but then, the comment
above that says "integer division is intentional here", without any
further explanation. I think a bit more explanation of all the
subtleties here would be good in any case.
Yeah, I guess the formula should be doing (fillfactor / 100.0).
The "integer division is intentional here" comment is unrelated to this
change, it refers to the division by "tuple_width" (and yeah, it'd be
nice to have it explain why it's intentional).
FWIW the reason why the integer division is intentional is most likely
that we want "floor" semantics - if there's 10.23 rows per page, that
really means 10 rows per page.
I doubt it makes a huge difference in this particular place, considering
we're calculating the estimate from somewhat unreliable values, and then
use it for rough estimate of relation size.
But from this POV, I think it's more correct to do it "my" way:
density = (usable_bytes_per_page * fillfactor / 100) / tuple_width;
because that's doing *two* separate integer divisions, with floor
semantics. First we calculate "usable bytes" (rounded down), then
average number of rows per page (also rounded down).
Corey's formula would do just one integer division. I don't think it
makes a huge difference, though. I mean, it's just an estimate and so we
can't expect to be 100% accurate.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 7/3/23 11:40, Tomas Vondra wrote:
...
FWIW the reason why the integer division is intentional is most likely
that we want "floor" semantics - if there's 10.23 rows per page, that
really means 10 rows per page.
I doubt it makes a huge difference in this particular place, considering
we're calculating the estimate from somewhat unreliable values, and then
use it for rough estimate of relation size.
But from this POV, I think it's more correct to do it "my" way:
density = (usable_bytes_per_page * fillfactor / 100) / tuple_width;
because that's doing *two* separate integer divisions, with floor
semantics. First we calculate "usable bytes" (rounded down), then
average number of rows per page (also rounded down).
Corey's formula would do just one integer division. I don't think it
makes a huge difference, though. I mean, it's just an estimate and so we
can't expect to be 100% accurate.
Pushed, using the formula with two divisions (as in the original patch).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 03/07/2023 20:54, Tomas Vondra wrote:
Pushed, using the formula with two divisions (as in the original patch).
I ran into an issue with this, in the case of a small fillfactor and
wide tuple width:
On v16:
postgres=# create table t (data char(900)) with (fillfactor = 10,
autovacuum_enabled=off);
CREATE TABLE
postgres=# insert into t select g from generate_series(1, 1000) g;
INSERT 0 1000
postgres=# explain select count(*) from t;
QUERY PLAN
-------------------------------------------------------------
Aggregate (cost=1025.00..1025.01 rows=1 width=8)
-> Seq Scan on t (cost=0.00..1020.00 rows=2000 width=0)
(2 rows)
On v17:
QUERY PLAN
----------------------------------------------------------
Aggregate (cost=1000.00..1000.01 rows=1 width=8)
-> Seq Scan on t (cost=0.00..1000.00 rows=1 width=0)
(2 rows)
The new estimate is 1 row, which is bad. Didn't change the plan in this
case, but I originally saw this in a test with more rows, and the
planner would not choose a parallel scan for the query because of that.
The calculation in table_block_relation_estimate_size() in this case is:
tuple_width=3604
overhead_bytes_per_tuple=28
fillfactor=10
usable_bytes_per_page=8168
density = (usable_bytes_per_page * fillfactor / 100) / tuple_width
which gets rounded down to 0.
The straightforward fix is to clamp it to 1. The executor will always
place at least one tuple on a page, regardless of fillfactor.
--
Heikki Linnakangas
Neon (https://neon.tech)
On 1/29/25 19:56, Heikki Linnakangas wrote:
On 03/07/2023 20:54, Tomas Vondra wrote:
Pushed, using the formula with two divisions (as in the original patch).
I ran into an issue with this, in the case of a small fillfactor and
wide tuple width:
On v16:
postgres=# create table t (data char(900)) with (fillfactor = 10,
autovacuum_enabled=off);
CREATE TABLE
postgres=# insert into t select g from generate_series(1, 1000) g;
INSERT 0 1000
postgres=# explain select count(*) from t;
QUERY PLAN
-------------------------------------------------------------
Aggregate (cost=1025.00..1025.01 rows=1 width=8)
-> Seq Scan on t (cost=0.00..1020.00 rows=2000 width=0)
(2 rows)
On v17:
QUERY PLAN
----------------------------------------------------------
Aggregate (cost=1000.00..1000.01 rows=1 width=8)
-> Seq Scan on t (cost=0.00..1000.00 rows=1 width=0)
(2 rows)
The new estimate is 1 row, which is bad. Didn't change the plan in this
case, but I originally saw this in a test with more rows, and the
planner would not choose a parallel scan for the query because of that.
The calculation in table_block_relation_estimate_size() in this case is:
tuple_width=3604
overhead_bytes_per_tuple=28
fillfactor=10
usable_bytes_per_page=8168
density = (usable_bytes_per_page * fillfactor / 100) / tuple_width
which gets rounded down to 0.
The straightforward fix is to clamp it to 1. The executor will always
place at least one tuple on a page, regardless of fillfactor.
Thanks for the report. And yeah, clamping it to 1 seems like the right
fix for this. I wonder if it's worth inventing some sort of test for
this; it shouldn't be too hard, I guess.
In any case, I'll take care of the fix/backpatch soon.
regards
--
Tomas Vondra
On 2/4/25 16:02, Tomas Vondra wrote:
...
Thanks for the report. And yeah, clamping it to 1 seems like the right
fix for this. I wonder if it's worth inventing some sort of test for
this, shouldn't be too hard I guess.
In any case, I'll take care of the fix/backpatch soon.
Here's a proposed fix, including a simple test in the stats suite.
regards
--
Tomas Vondra
Attachments:
fillfactor-fix.patch (text/x-patch, +76/-0)
On 2/4/25 17:54, Tomas Vondra wrote:
On 2/4/25 16:02, Tomas Vondra wrote:
...
Thanks for the report. And yeah, clamping it to 1 seems like the right
fix for this. I wonder if it's worth inventing some sort of test for
this, shouldn't be too hard I guess.
In any case, I'll take care of the fix/backpatch soon.
Here's a proposed fix, including a simple test in the stats suite.
Here's a slightly improved fix, with a proper commit message and
simplified test case. I plan to push this once the releases get out (I
know it was stamped already, but ...).
regards
--
Tomas Vondra
Attachments:
0001-Correct-relation-size-estimate-with-low-fillfactor.patch (text/x-patch, +36/-1)
On 18/02/2025 14:01, Tomas Vondra wrote:
On 2/4/25 17:54, Tomas Vondra wrote:
On 2/4/25 16:02, Tomas Vondra wrote:
...
Thanks for the report. And yeah, clamping it to 1 seems like the right
fix for this. I wonder if it's worth inventing some sort of test for
this, shouldn't be too hard I guess.
In any case, I'll take care of the fix/backpatch soon.
Here's a proposed fix, including a simple test in the stats suite.
Here's a slightly improved fix, with a proper commit message and
simplified test case. I plan to push this once the releases get out (I
know it was stamped already, but ...).
Looks good, thanks!
--
Heikki Linnakangas
Neon (https://neon.tech)
On 2/18/25 13:29, Heikki Linnakangas wrote:
On 18/02/2025 14:01, Tomas Vondra wrote:
On 2/4/25 17:54, Tomas Vondra wrote:
On 2/4/25 16:02, Tomas Vondra wrote:
...
Thanks for the report. And yeah, clamping it to 1 seems like the right
fix for this. I wonder if it's worth inventing some sort of test for
this, shouldn't be too hard I guess.
In any case, I'll take care of the fix/backpatch soon.
Here's a proposed fix, including a simple test in the stats suite.
Here's a slightly improved fix, with a proper commit message and
simplified test case. I plan to push this once the releases get out (I
know it was stamped already, but ...).
Looks good, thanks!
Pushed. Thanks!
--
Tomas Vondra