How does the planner determine plan_rows ?
Hi,
I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found
the results have different row numbers, but the tables are all empty.
=# CREATE TABLE t1(id INT, data INT);
=# EXPLAIN ANALYZE SELECT * FROM t1;
Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual
time=0.003..0.003 rows=0 loops=1)
=# CREATE TABLE t2(data VARCHAR);
=# EXPLAIN ANALYZE SELECT * FROM t2;
Seq Scan on t2 (cost=0.00..23.60 rows=1360 width=32) (actual
time=0.002..0.002 rows=0 loops=1)
=# CREATE TABLE t3(id INT, data VARCHAR);
=# EXPLAIN ANALYZE SELECT * FROM t3;
Seq Scan on t3 (cost=0.00..22.70 rows=1270 width=36) (actual
time=0.001..0.001 rows=0 loops=1)
I found this behavior unexpected. I'm still trying to find out how/where the planner
determines the plan_rows. Any help will be appreciated!
Thank you,
Donald Dong
"Donald" == Donald Dong <xdong@csumb.edu> writes:
Donald> Hi,
Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on
Donald> `SELECT * `. I found the results have different row numbers,
Donald> but the tables are all empty.
Empty tables are something of a special case, because the planner
doesn't assume that they will _stay_ empty, and using an estimate of 0
or 1 rows would tend to create a distorted plan that would likely blow
up in runtime as soon as you insert a second row.
The place to look for info would be estimate_rel_size in
optimizer/util/plancat.c, from which you can see that empty tables get
a default size estimate of 10 pages. Thus:
Donald> =# CREATE TABLE t1(id INT, data INT);
Donald> =# EXPLAIN ANALYZE SELECT * FROM t1;
Donald> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual
Donald> time=0.003..0.003 rows=0 loops=1)
An (int,int) tuple takes about 36 bytes, so you can get about 226 of
them on a page, so 10 pages is 2260 rows.
Donald> =# CREATE TABLE t2(data VARCHAR);
Donald> =# EXPLAIN ANALYZE SELECT * FROM t2;
Donald> Seq Scan on t2 (cost=0.00..23.60 rows=1360 width=32) (actual
Donald> time=0.002..0.002 rows=0 loops=1)
Size of a varchar with no specified length isn't known, so the planner
determines an average length of 32 by the time-honoured method of rectal
extraction (see get_typavgwidth in lsyscache.c), making 136 rows per
page.
--
Andrew (irc:RhodiumToad)
Thank you for the great explanation!
Show quoted text
On Jan 10, 2019, at 7:48 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
"Donald" == Donald Dong <xdong@csumb.edu> writes:
Donald> Hi,
Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on
Donald> `SELECT * `. I found the results have different row numbers,
Donald> but the tables are all empty.Empty tables are something of a special case, because the planner
doesn't assume that they will _stay_ empty, and using an estimate of 0
or 1 rows would tend to create a distorted plan that would likely blow
up in runtime as soon as you insert a second row.The place to look for info would be estimate_rel_size in
optimizer/util/plancat.c, from which you can see that empty tables get
a default size estimate of 10 pages. Thus:Donald> =# CREATE TABLE t1(id INT, data INT);
Donald> =# EXPLAIN ANALYZE SELECT * FROM t1;
Donald> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual
Donald> time=0.003..0.003 rows=0 loops=1)An (int,int) tuple takes about 36 bytes, so you can get about 226 of
them on a page, so 10 pages is 2260 rows.Donald> =# CREATE TABLE t2(data VARCHAR);
Donald> =# EXPLAIN ANALYZE SELECT * FROM t2;
Donald> Seq Scan on t2 (cost=0.00..23.60 rows=1360 width=32) (actual
Donald> time=0.002..0.002 rows=0 loops=1)Size of a varchar with no specified length isn't known, so the planner
determines an average length of 32 by the time-honoured method of rectal
extraction (see get_typavgwidth in lsyscache.c), making 136 rows per
page.--
Andrew (irc:RhodiumToad)
Donald Dong <xdong@csumb.edu> writes:
I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found
the results have different row numbers, but the tables are all empty.
This isn't a terribly interesting case, since you've neither loaded
any data nor vacuumed/analyzed the table, but ...
I found this behavior unexpected. I'm still trying to find out how/where the planner
determines the plan_rows.
... estimate_rel_size() in plancat.c is where to look to find out
about the planner's default estimates when it's lacking hard data.
regards, tom lane
On Jan 10, 2019, at 8:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... estimate_rel_size() in plancat.c is where to look to find out
about the planner's default estimates when it's lacking hard data.
Thank you! Now I see how the planner uses the rows to estimate the cost and
generates the best_plan.
To me, tracing the function calls is not a simple task. I'm using cscope, and I
use printf when I'm not entirely sure. I was considering to use gbd, but I'm
having issues referencing the source code in gdb.
I'm very interested to learn how the professionals explore the codebase!
On Thu, Jan 10, 2019 at 11:41:51PM -0800, Donald Dong wrote:
On Jan 10, 2019, at 8:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... estimate_rel_size() in plancat.c is where to look to find out
about the planner's default estimates when it's lacking hard data.Thank you! Now I see how the planner uses the rows to estimate the cost and
generates the best_plan.To me, tracing the function calls is not a simple task. I'm using cscope, and I
use printf when I'm not entirely sure. I was considering to use gbd, but I'm
having issues referencing the source code in gdb.I'm very interested to learn how the professionals explore the codebase!
Uh, the developer FAQ has some info on this:
https://wiki.postgresql.org/wiki/Developer_FAQ
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +