How does the planner determine plan_rows ?

Started by Donald Dongabout 7 years ago6 messages

xdong@csumb.edu

about 7 years ago

Hi,

I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found
the results have different row numbers, but the tables are all empty.

=# CREATE TABLE t1(id INT, data INT);
=# EXPLAIN ANALYZE SELECT * FROM t1;
Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual
time=0.003..0.003 rows=0 loops=1)

=# CREATE TABLE t2(data VARCHAR);
=# EXPLAIN ANALYZE SELECT * FROM t2;
Seq Scan on t2 (cost=0.00..23.60 rows=1360 width=32) (actual
time=0.002..0.002 rows=0 loops=1)

=# CREATE TABLE t3(id INT, data VARCHAR);
=# EXPLAIN ANALYZE SELECT * FROM t3;
Seq Scan on t3 (cost=0.00..22.70 rows=1270 width=36) (actual
time=0.001..0.001 rows=0 loops=1)

I found this behavior unexpected. I'm still trying to find out how/where the planner
determines the plan_rows. Any help will be appreciated!

Thank you,
Donald Dong

andrew@tao11.riddles.org.uk

about 7 years ago

In reply to: Donald Dong (#1)

Re: How does the planner determine plan_rows ?

"Donald" == Donald Dong <xdong@csumb.edu> writes:

Donald> Hi,
Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on
Donald> `SELECT * `. I found the results have different row numbers,
Donald> but the tables are all empty.

Empty tables are something of a special case, because the planner
doesn't assume that they will _stay_ empty, and using an estimate of 0
or 1 rows would tend to create a distorted plan that would likely blow
up in runtime as soon as you insert a second row.

The place to look for info would be estimate_rel_size in
optimizer/util/plancat.c, from which you can see that empty tables get
a default size estimate of 10 pages. Thus:

Donald> =# CREATE TABLE t1(id INT, data INT);
Donald> =# EXPLAIN ANALYZE SELECT * FROM t1;
Donald> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual
Donald> time=0.003..0.003 rows=0 loops=1)

An (int,int) tuple takes about 36 bytes, so you can get about 226 of
them on a page, so 10 pages is 2260 rows.

Donald> =# CREATE TABLE t2(data VARCHAR);
Donald> =# EXPLAIN ANALYZE SELECT * FROM t2;
Donald> Seq Scan on t2 (cost=0.00..23.60 rows=1360 width=32) (actual
Donald> time=0.002..0.002 rows=0 loops=1)

Size of a varchar with no specified length isn't known, so the planner
determines an average length of 32 by the time-honoured method of rectal
extraction (see get_typavgwidth in lsyscache.c), making 136 rows per
page.

--
Andrew (irc:RhodiumToad)

xdong@csumb.edu

about 7 years ago

In reply to: Andrew Gierth (#2)

Re: How does the planner determine plan_rows ?

Thank you for the great explanation!

Show quoted text

On Jan 10, 2019, at 7:48 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

"Donald" == Donald Dong <xdong@csumb.edu> writes:

Donald> Hi,
Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on
Donald> `SELECT * `. I found the results have different row numbers,
Donald> but the tables are all empty.

Empty tables are something of a special case, because the planner
doesn't assume that they will _stay_ empty, and using an estimate of 0
or 1 rows would tend to create a distorted plan that would likely blow
up in runtime as soon as you insert a second row.

The place to look for info would be estimate_rel_size in
optimizer/util/plancat.c, from which you can see that empty tables get
a default size estimate of 10 pages. Thus:

Donald> =# CREATE TABLE t1(id INT, data INT);
Donald> =# EXPLAIN ANALYZE SELECT * FROM t1;
Donald> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual
Donald> time=0.003..0.003 rows=0 loops=1)

An (int,int) tuple takes about 36 bytes, so you can get about 226 of
them on a page, so 10 pages is 2260 rows.

Donald> =# CREATE TABLE t2(data VARCHAR);
Donald> =# EXPLAIN ANALYZE SELECT * FROM t2;
Donald> Seq Scan on t2 (cost=0.00..23.60 rows=1360 width=32) (actual
Donald> time=0.002..0.002 rows=0 loops=1)

Size of a varchar with no specified length isn't known, so the planner
determines an average length of 32 by the time-honoured method of rectal
extraction (see get_typavgwidth in lsyscache.c), making 136 rows per
page.

--
Andrew (irc:RhodiumToad)

tgl@sss.pgh.pa.us

about 7 years ago

In reply to: Donald Dong (#1)

Re: How does the planner determine plan_rows ?

Donald Dong <xdong@csumb.edu> writes:

I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found
the results have different row numbers, but the tables are all empty.

This isn't a terribly interesting case, since you've neither loaded
any data nor vacuumed/analyzed the table, but ...

I found this behavior unexpected. I'm still trying to find out how/where the planner
determines the plan_rows.

... estimate_rel_size() in plancat.c is where to look to find out
about the planner's default estimates when it's lacking hard data.

regards, tom lane

xdong@csumb.edu

about 7 years ago

In reply to: Tom Lane (#4)

Re: How does the planner determine plan_rows ?

On Jan 10, 2019, at 8:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

... estimate_rel_size() in plancat.c is where to look to find out
about the planner's default estimates when it's lacking hard data.

Thank you! Now I see how the planner uses the rows to estimate the cost and
generates the best_plan.

To me, tracing the function calls is not a simple task. I'm using cscope, and I
use printf when I'm not entirely sure. I was considering to use gbd, but I'm
having issues referencing the source code in gdb.

I'm very interested to learn how the professionals explore the codebase!

bruce@momjian.us

almost 7 years ago

In reply to: Donald Dong (#5)

Re: How does the planner determine plan_rows ?

On Thu, Jan 10, 2019 at 11:41:51PM -0800, Donald Dong wrote:

On Jan 10, 2019, at 8:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

... estimate_rel_size() in plancat.c is where to look to find out
about the planner's default estimates when it's lacking hard data.

Thank you! Now I see how the planner uses the rows to estimate the cost and
generates the best_plan.

To me, tracing the function calls is not a simple task. I'm using cscope, and I
use printf when I'm not entirely sure. I was considering to use gbd, but I'm
having issues referencing the source code in gdb.

I'm very interested to learn how the professionals explore the codebase!

Uh, the developer FAQ has some info on this:

https://wiki.postgresql.org/wiki/Developer_FAQ

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +