Row estimates for empty tables
I realize I've never quite known this; where does the planner get the row estimates for an empty table? Example:
psql (11.8)
Type "help" for help.
xof=# CREATE TABLE t (i integer, t text, j integer);
CREATE TABLE
xof=# VACUUM ANALYZE t;
VACUUM
xof=# EXPLAIN ANALYZE SELECT * FROM t;
QUERY PLAN
------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..22.00 rows=1200 width=40) (actual time=0.015..0.015 rows=0 loops=1)
Planning Time: 5.014 ms
Execution Time: 0.094 ms
(3 rows)
xof=# INSERT INTO t values(1, 'this', 2);
INSERT 0 1
xof=# EXPLAIN ANALYZE SELECT * FROM t;
QUERY PLAN
------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..22.00 rows=1200 width=40) (actual time=0.010..0.011 rows=1 loops=1)
Planning Time: 0.039 ms
Execution Time: 0.021 ms
(3 rows)
xof=# VACUUM ANALYZE t;
VACUUM
xof=# EXPLAIN ANALYZE SELECT * FROM t;
QUERY PLAN
--------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..1.01 rows=1 width=13) (actual time=0.008..0.008 rows=1 loops=1)
Planning Time: 0.069 ms
Execution Time: 0.019 ms
(3 rows)
xof=# DELETE FROM t;
DELETE 1
xof=# VACUUM ANALYZE t;
VACUUM
xof=# EXPLAIN ANALYZE SELECT * FROM t;
QUERY PLAN
------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..29.90 rows=1990 width=13) (actual time=0.004..0.004 rows=0 loops=1)
Planning Time: 0.034 ms
Execution Time: 0.015 ms
(3 rows)
--
-- Christophe Pettus
xof@thebuild.com
On Fri, 24 Jul 2020 at 16:01, Christophe Pettus <xof@thebuild.com> wrote:
I realize I've never quite known this; where does the planner get the row estimates for an empty table? Example:
We just assume there are 10 pages if the relation has not yet been
vacuumed or analyzed. The row estimates you see are the number of
times 1 tuple is likely to fit onto a single page, multiplied by the
assumed 10 pages. If you had made your table wider, the planner
would have assumed fewer rows.
There's a comment that justifies the 10 pages, which, as of master, is
in table_block_relation_estimate_size(). It'll be somewhere else in
PG 12.
* HACK: if the relation has never yet been vacuumed, use a minimum size
* estimate of 10 pages. The idea here is to avoid assuming a
* newly-created table is really small, even if it currently is, because
* that may not be true once some data gets loaded into it. Once a vacuum
* or analyze cycle has been done on it, it's more reasonable to believe
* the size is somewhat stable.
*
* (Note that this is only an issue if the plan gets cached and used again
* after the table has been filled. What we're trying to avoid is using a
* nestloop-type plan on a table that has grown substantially since the
* plan was made. Normally, autovacuum/autoanalyze will occur once enough
* inserts have happened and cause cached-plan invalidation; but that
* doesn't happen instantaneously, and it won't happen at all for cases
* such as temporary tables.)
*
* We approximate "never vacuumed" by "has relpages = 0", which means this
* will also fire on genuinely empty relations. Not great, but
* fortunately that's a seldom-seen case in the real world, and it
* shouldn't degrade the quality of the plan too much anyway to err in
* this direction.
*
* If the table has inheritance children, we don't apply this heuristic.
* Totally empty parent tables are quite common, so we should be willing
* to believe that they are empty.
The code that decides whether the table has been vacuumed simply
assumes it has not whenever pg_class.relpages == 0. So even if you were
to manually vacuum the table, the code here would still think it's not
yet been vacuumed.
David
David Rowley <dgrowleyml@gmail.com> writes:
On Fri, 24 Jul 2020 at 16:01, Christophe Pettus <xof@thebuild.com> wrote:
I realize I've never quite known this; where does the planner get the row estimates for an empty table? Example:
We just assume there are 10 pages if the relation has not yet been
vacuumed or analyzed. The row estimates you see are the number of
times 1 tuple is likely to fit onto a single page multiplied by the
assumed 10 pages. If you had made your table wider then the planner
would have assumed fewer rows
Yeah. Also note that since we have no ANALYZE stats in this scenario,
the row width estimate is going to be backed into via some guesses
based on column data types. (It's fine for fixed-width types, much
less fine for var-width.)
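For illustration, the width=40 in the earlier plans is consistent with the two fixed-width integers plus a default guess for the never-analyzed text column (assumption: the 32-byte default used when no stats are available):

```sql
-- 4 bytes per integer column, plus an assumed 32-byte default guess
-- for a var-width text column with no ANALYZE stats.
SELECT 4 + 32 + 4 AS guessed_width;  -- 40, matching width=40 above
```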
There's certainly not a lot besides tradition to justify the exact
numbers used in this case. However, we do have a good deal of
practical experience to justify the principle of "never assume a
table is empty, or even contains just one row, unless you're really
sure of that". Otherwise you tend to end up with nestloop joins that
will perform horrifically if you were wrong. The other join types
are notably less brittle.
regards, tom lane
On Jul 24, 2020, at 06:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
There's certainly not a lot besides tradition to justify the exact
numbers used in this case.
Since we already special-case parent tables for partition sets, would a storage parameter that lets you either tell the planner "no, really, zero is reasonable here" or sets a minimum number of rows to plan for be reasonable? I happened to get bit by this tracking down an issue where several tables in a large query had zero rows, and the planner's assumption of a few pages worth caused some sub-optimal plans. The performance hit wasn't huge, but they were being joined to some *very* large tables, and the differences added up.
--
-- Christophe Pettus
xof@thebuild.com
On Fri, 24 Jul 2020 at 16:38, Christophe Pettus <xof@thebuild.com> wrote:
On Jul 24, 2020, at 06:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
There's certainly not a lot besides tradition to justify the exact
numbers used in this case.
Since we already special-case parent tables for partition sets, would a
storage parameter that lets you either tell the planner "no, really, zero
is reasonable here" or sets a minimum number of rows to plan for be
reasonable? I happened to get bit by this tracking down an issue where
several tables in a large query had zero rows, and the planner's assumption
of a few pages worth caused some sub-optimal plans. The performance hit
wasn't huge, but they were being joined to some *very* large tables, and
the differences added up.
I did this patch ten years ago. The GoodData application
(https://www.gooddata.com/) uses Postgres a lot, and this application stores
some results in tables (as a guard against repeated calculations). Many of
these tables have zero or one row.
Although we ran an ANALYZE over all tables, the queries on empty tables
had very bad plans, and I had to fix it with this patch. Another company uses
a fake one-row table, so there is no possibility of having a really empty
table.
It is an issue for special, not typical applications (this situation is
typical for some OLAP patterns) - it is not too common - but some clean
solution (instead of hacking Postgres) would be nice.
Regards
Pavel
--
-- Christophe Pettus
xof@thebuild.com
On Jul 24, 2020, at 12:14, Pavel Stehule <pavel.stehule@gmail.com> wrote:
this application stores some results in tables (as guard against repeated calculations). Lot of these tables have zero or one row.
Yes, that's the situation we encountered, too. It's not very common (and even less common, I would assume, that it results in a bad plan), but it did in this case.
--
-- Christophe Pettus
xof@thebuild.com
Pavel Stehule <pavel.stehule@gmail.com> writes:
On Fri, 24 Jul 2020 at 16:38, Christophe Pettus <xof@thebuild.com> wrote:
Since we already special-case parent tables for partition sets, would a
storage parameter that lets you either tell the planner "no, really, zero
is reasonable here" or sets a minimum number of rows to plan for be
reasonable?
It is an issue for special, not typical applications (this situation is
typical for some OLAP patterns) - it is not too often - but some clean
solution (instead hacking postgres) can be nice.
The core issue here is "how do we know whether the table is likely to stay
empty?". I can think of a couple of more or less klugy solutions:
1. Arrange to send out a relcache inval when adding the first page to
a table, and then remove the planner hack for disbelieving relpages = 0.
I fear this'd be a mess from a system structural standpoint, but it might
work fairly transparently.
2. Establish the convention that vacuuming or analyzing an empty table
is what you do to tell the system that this state is going to persist.
That's more or less what the existing comments in plancat.c envision,
but we never made a definition for how the occurrence of that event
would be recorded in the catalogs, other than setting relpages > 0.
Rather than adding another pg_class column, I'm tempted to say that
vacuum/analyze should set relpages to a minimum of 1, even if the
relation has zero pages. That does get the job done:
regression=# create table foo(f1 text);
CREATE TABLE
regression=# explain select * from foo;
QUERY PLAN
--------------------------------------------------------
Seq Scan on foo (cost=0.00..23.60 rows=1360 width=32)
(1 row)
regression=# vacuum foo; -- doesn't help
VACUUM
regression=# explain select * from foo;
QUERY PLAN
--------------------------------------------------------
Seq Scan on foo (cost=0.00..23.60 rows=1360 width=32)
(1 row)
regression=# update pg_class set relpages = 1 where relname = 'foo';
UPDATE 1
regression=# explain select * from foo;
QUERY PLAN
----------------------------------------------------
Seq Scan on foo (cost=0.00..0.00 rows=1 width=32)
(1 row)
(We're still estimating one row, but that's as a result of different
decisions that I'm not nearly as willing to compromise on...)
regards, tom lane
On Jul 24, 2020, at 14:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Rather than adding another pg_class column, I'm tempted to say that
vacuum/analyze should set relpages to a minimum of 1, even if the
relation has zero pages.
If there's not an issue about relpages != actual pages on disk, that certainly seems straightforward, and no *more* hacky than the current situation.
--
-- Christophe Pettus
xof@thebuild.com
[ redirecting to -hackers ]
I wrote:
The core issue here is "how do we know whether the table is likely to stay
empty?". I can think of a couple of more or less klugy solutions:
1. Arrange to send out a relcache inval when adding the first page to
a table, and then remove the planner hack for disbelieving relpages = 0.
I fear this'd be a mess from a system structural standpoint, but it might
work fairly transparently.
I experimented with doing this. It's not hard to code, if you don't mind
having RelationGetBufferForTuple calling CacheInvalidateRelcache. I'm not
sure whether that code path might cause any long-term problems, but it
seems to work OK right now. However, this solution causes massive
"failures" in the regression tests as a result of plans changing. I'm
sure that's partly because we use so many small tables in the tests.
Nonetheless, it's not promising from the standpoint of not causing
unexpected problems in the real world.
2. Establish the convention that vacuuming or analyzing an empty table
is what you do to tell the system that this state is going to persist.
That's more or less what the existing comments in plancat.c envision,
but we never made a definition for how the occurrence of that event
would be recorded in the catalogs, other than setting relpages > 0.
Rather than adding another pg_class column, I'm tempted to say that
vacuum/analyze should set relpages to a minimum of 1, even if the
relation has zero pages.
I also tried this, and it seems a lot more promising: no existing
regression test cases change. So perhaps we should do the attached
or something like it.
regards, tom lane
Attachments:
set-relpages-nonzero-after-vacuum-or-analyze.patch (text/x-diff, +26/-5)
On Sat, 25 Jul 2020 at 10:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
1. Arrange to send out a relcache inval when adding the first page to
a table, and then remove the planner hack for disbelieving relpages = 0.
I fear this'd be a mess from a system structural standpoint, but it might
work fairly transparently.
I experimented with doing this. It's not hard to code, if you don't mind
having RelationGetBufferForTuple calling CacheInvalidateRelcache. I'm not
sure whether that code path might cause any long-term problems, but it
seems to work OK right now. However, this solution causes massive
"failures" in the regression tests as a result of plans changing. I'm
sure that's partly because we use so many small tables in the tests.
Nonetheless, it's not promising from the standpoint of not causing
unexpected problems in the real world.
I guess all these changes would be the planner moving towards a plan
that suits having fewer rows for the given table better. If so, that
does seem quite scary as we already have enough problems from the
planner choosing poor plans when it thinks there are fewer rows than
there actually are. Don't we need to keep something like the 10-page
estimate there so safer plans are produced before auto-vacuum gets in
and gathers some proper stats?
I think if anything we'd want to move in the direction of producing
more cautious plans when the estimated number of rows is low. Perhaps
especially so when the planner opts to perform a non-parameterized
nested loop join because a RelOptInfo with, say, 3 correlated
(unbeknown to the planner) base restrict quals is thought to produce
just 1 row, but actually produces many more.
2. Establish the convention that vacuuming or analyzing an empty table
is what you do to tell the system that this state is going to persist.
That's more or less what the existing comments in plancat.c envision,
but we never made a definition for how the occurrence of that event
would be recorded in the catalogs, other than setting relpages > 0.
Rather than adding another pg_class column, I'm tempted to say that
vacuum/analyze should set relpages to a minimum of 1, even if the
relation has zero pages.
I also tried this, and it seems a lot more promising: no existing
regression test cases change. So perhaps we should do the attached
or something like it.
This sounds like a more plausible solution. At least this way there's
an escape hatch for people who suffer due to this.
David
On Sat, 25 Jul 2020 at 00:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[ redirecting to -hackers ]
I wrote:
The core issue here is "how do we know whether the table is likely to
stay
empty?". I can think of a couple of more or less klugy solutions:
For these special cases it is probably possible to ensure an ANALYZE before
any SELECT. When the table is created, it is analyzed, and after that it is
published and used for SELECTs. Usually such a table is never modified again.
Because it is a special case, a not-too-sophisticated solution is acceptable.
But a built-in solution could be designed to be more general.
1. Arrange to send out a relcache inval when adding the first page to
a table, and then remove the planner hack for disbelieving relpages = 0.
I fear this'd be a mess from a system structural standpoint, but it might
work fairly transparently.
I experimented with doing this. It's not hard to code, if you don't mind
having RelationGetBufferForTuple calling CacheInvalidateRelcache. I'm not
sure whether that code path might cause any long-term problems, but it
seems to work OK right now. However, this solution causes massive
"failures" in the regression tests as a result of plans changing. I'm
sure that's partly because we use so many small tables in the tests.
Nonetheless, it's not promising from the standpoint of not causing
unexpected problems in the real world.
2. Establish the convention that vacuuming or analyzing an empty table
is what you do to tell the system that this state is going to persist.
That's more or less what the existing comments in plancat.c envision,
but we never made a definition for how the occurrence of that event
would be recorded in the catalogs, other than setting relpages > 0.
Rather than adding another pg_class column, I'm tempted to say that
vacuum/analyze should set relpages to a minimum of 1, even if the
relation has zero pages.
I also tried this, and it seems a lot more promising: no existing
regression test cases change. So perhaps we should do the attached
or something like it.
I am sending a patch that has been used in GoodData for years.
I am not sure if the company uses 0 or 1, but I can ask.
Regards
Pavel
regards, tom lane
Attachments:
fakepages.patch (text/x-patch, +16/-2)
Pavel Stehule <pavel.stehule@gmail.com> writes:
I am sending a patch that has been used in GoodData for years.
I'm quite unexcited about that. I'd be the first to agree that the
ten-pages estimate is a hack, but it's not an improvement to ask users
to think of a better value ... especially not as a one-size-fits-
all-relations GUC setting.
I did have an idea that I think is better than my previous one:
rather than lying about the value of relpages, let's represent the
case where we don't know the tuple density by setting reltuples = -1
initially. This leads to a patch that's a good bit more invasive than
the quick-hack solution, but I think it's a lot cleaner on the whole.
A possible objection is that this changes the FDW API slightly, as
GetForeignRelSize callbacks now need to deal with rel->tuples possibly
being -1. We could avoid an API break if we made plancat.c clamp
that value to zero; but then FDWs still couldn't tell the difference
between the "empty" and "never analyzed" cases, and I think this is
just as much of an issue for them as for the core code.
I'll add this to the upcoming CF.
regards, tom lane
Attachments:
use-minus-one-for-unknown-reltuples-1.patch (text/x-diff, +76/-68)
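As a sketch of what this change makes visible on a release that contains it (the reltuples = -1 convention landed in PG 14), a never-vacuumed table becomes distinguishable from a known-empty one directly in pg_class:

```sql
-- Hypothetical table name t.
SELECT relname, relpages, reltuples FROM pg_class WHERE relname = 't';
-- reltuples = -1 → tuple density unknown (never vacuumed/analyzed)
-- reltuples = 0  → vacuumed/analyzed and genuinely empty
```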
On Sun, 23 Aug 2020 at 23:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
I am sending a patch that has been used in GoodData for years.
I'm quite unexcited about that. I'd be the first to agree that the
ten-pages estimate is a hack, but it's not an improvement to ask users
to think of a better value ... especially not as a one-size-fits-
all-relations GUC setting.
This patch is just a workaround that has worked well for 10 years (but for
one special use case) - nothing more. Without this patch that application
could not work at all.
I did have an idea that I think is better than my previous one:
rather than lying about the value of relpages, let's represent the
case where we don't know the tuple density by setting reltuples = -1
initially. This leads to a patch that's a good bit more invasive than
the quick-hack solution, but I think it's a lot cleaner on the whole.
A possible objection is that this changes the FDW API slightly, as
GetForeignRelSize callbacks now need to deal with rel->tuples possibly
being -1. We could avoid an API break if we made plancat.c clamp
that value to zero; but then FDWs still couldn't tell the difference
between the "empty" and "never analyzed" cases, and I think this is
just as much of an issue for them as for the core code.
I'll add this to the upcoming CF.
I'll check it
Regards
Pavel
regards, tom lane
On Fri, Jul 24, 2020 at 09:14:04PM +0200, Pavel Stehule wrote:
On Fri, 24 Jul 2020 at 16:38, Christophe Pettus <xof@thebuild.com> wrote:
Since we already special-case parent tables for partition sets, would a
storage parameter that lets you either tell the planner "no, really, zero
is reasonable here" or sets a minimum number of rows to plan for be
reasonable? I happened to get bit by this tracking down an issue where
several tables in a large query had zero rows, and the planner's assumption
of a few pages worth caused some sub-optimal plans. The performance hit
wasn't huge, but they were being joined to some *very* large tables, and
the differences added up.
I did this patch ten years ago. GoodData application
https://www.gooddata.com/ uses Postgres lot, and this application stores
some results in tables (as guard against repeated calculations). Lot of
these tables have zero or one row.
Although we ran an ANALYZE over all tables - the queries on empty tables
had very bad plans, and I had to fix it by this patch. Another company uses
a fake one row in table - so there is no possibility to have a really empty
table.
It is an issue for special, not typical applications (this situation is
typical for some OLAP patterns) - it is not too often - but some clean
solution (instead hacking postgres) can be nice.
On Mon, Aug 24, 2020 at 09:43:49PM +0200, Pavel Stehule wrote:
This patch is just a workaround that works well 10 years (but for one
special use case) - nothing more. Without this patch that application
cannot work ever.
My own workaround was here:
/messages/by-id/20200427181034.GA28974@telsasoft.com
|... 1) create an child table: CREATE TABLE x_child() INHERITS(x)
|and, 2) change the query to use "select from ONLY".
|
|(1) allows the planner to believe that the table really is empty, a conclusion
|it otherwise avoids, and (2) avoids descending into the child (for which the
|planner would likewise avoid the conclusion that it's actually empty).
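The quoted workaround, sketched out (x is the hypothetical parent table from the quote):

```sql
-- 1) Attach an empty child, so x is treated as an inheritance parent;
--    the planner is then willing to believe the parent itself is empty.
CREATE TABLE x_child () INHERITS (x);
-- 2) Query the parent alone, so the child is never scanned.
SELECT * FROM ONLY x;
```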
--
Justin
On Mon, 24 Aug 2020 at 21:43, Pavel Stehule <pavel.stehule@gmail.com> wrote:
On Sun, 23 Aug 2020 at 23:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
I am sending a patch that has been used in GoodData for years.
I'm quite unexcited about that. I'd be the first to agree that the
ten-pages estimate is a hack, but it's not an improvement to ask users
to think of a better value ... especially not as a one-size-fits-
all-relations GUC setting.
This patch is just a workaround that works well 10 years (but for one
special use case) - nothing more. Without this patch that application
cannot work ever.
I did have an idea that I think is better than my previous one:
rather than lying about the value of relpages, let's represent the
case where we don't know the tuple density by setting reltuples = -1
initially. This leads to a patch that's a good bit more invasive than
the quick-hack solution, but I think it's a lot cleaner on the whole.
A possible objection is that this changes the FDW API slightly, as
GetForeignRelSize callbacks now need to deal with rel->tuples possibly
being -1. We could avoid an API break if we made plancat.c clamp
that value to zero; but then FDWs still couldn't tell the difference
between the "empty" and "never analyzed" cases, and I think this is
just as much of an issue for them as for the core code.
I'll add this to the upcoming CF.
I'll check it
I think it can work. It is a good enough solution for people who need a
different behaviour with minimal impact on people who don't need a change.
Regards
Pavel
Regards
Pavel
regards, tom lane
On Tue, 25 Aug 2020 at 09:32, Pavel Stehule <pavel.stehule@gmail.com> wrote:
On Mon, 24 Aug 2020 at 21:43, Pavel Stehule <pavel.stehule@gmail.com> wrote:
On Sun, 23 Aug 2020 at 23:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
I am sending a patch that has been used in GoodData for years.
I'm quite unexcited about that. I'd be the first to agree that the
ten-pages estimate is a hack, but it's not an improvement to ask users
to think of a better value ... especially not as a one-size-fits-
all-relations GUC setting.
This patch is just a workaround that works well 10 years (but for one
special use case) - nothing more. Without this patch that application
cannot work ever.
I did have an idea that I think is better than my previous one:
rather than lying about the value of relpages, let's represent the
case where we don't know the tuple density by setting reltuples = -1
initially. This leads to a patch that's a good bit more invasive than
the quick-hack solution, but I think it's a lot cleaner on the whole.
A possible objection is that this changes the FDW API slightly, as
GetForeignRelSize callbacks now need to deal with rel->tuples possibly
being -1. We could avoid an API break if we made plancat.c clamp
that value to zero; but then FDWs still couldn't tell the difference
between the "empty" and "never analyzed" cases, and I think this is
just as much of an issue for them as for the core code.
I'll add this to the upcoming CF.
I'll check it
I think it can work. It is a good enough solution for people who need a
different behaviour with minimal impact on people who don't need a change.
All tests passed.
I'll mark this patch as ready for commit
Regards
Pavel
Regards
Pavel
Regards
Pavel
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
I'll mark this patch as ready for commit
Pushed, thanks for looking.
regards, tom lane