Materialization is slower than non-materialized
During my routine work, I observed that execution of a plan whose inner NLJ node is a Materialize node (on top of a SeqScan) is slower than the same plan with a plain SeqScan as the inner node. This happens only if work_mem is not big enough to hold all tuples in memory.
To make the test easy and fast, I set work_mem to 256kB. The result is as below:
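The timings below should be reproducible with a setup along these lines (the table and column names come from the plans; the generate_series ranges are my inference from the 10000-row estimates and the 49995000 result):

```python
# A sketch (as generated SQL) of a setup consistent with the plans below:
# two 10000-row integer tables, giving 10000*9999/2 = 49995000 pairs
# satisfying tbl.id1 < tbl2.id1.
setup_sql = """
CREATE TABLE tbl  (id1 int);
CREATE TABLE tbl2 (id1 int);
INSERT INTO tbl  SELECT generate_series(1, 10000);
INSERT INTO tbl2 SELECT generate_series(1, 10000);
ANALYZE tbl;
ANALYZE tbl2;
SET work_mem = '256kB';  -- small enough to force Materialize to spill
"""

# Sanity check of the expected count, without needing a server:
expected = sum(k - 1 for k in range(1, 10001))
print(expected)  # 49995000, matching the query result below
```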
=========With Material off=============
postgres=# set enable_material to off;
SET
Time: 0.225 ms
postgres=# select count(tbl.id1) from tbl, tbl2 where tbl.id1<tbl2.id1;
count
----------
49995000
(1 row)
Time: 26674.299 ms
postgres=# explain select count(tbl.id1) from tbl, tbl2 where tbl.id1<tbl2.id1;
                              QUERY PLAN
----------------------------------------------------------------------
 Aggregate  (cost=2783478.33..2783478.34 rows=1 width=4)
   ->  Nested Loop  (cost=0.00..2700145.00 rows=33333333 width=4)
         Join Filter: (tbl.id1 < tbl2.id1)
         ->  Seq Scan on tbl  (cost=0.00..145.00 rows=10000 width=4)
         ->  Seq Scan on tbl2  (cost=0.00..145.00 rows=10000 width=4)
 Planning time: 0.120 ms
(6 rows)
=========With Material on=============
postgres=# set enable_material to on;
SET
Time: 0.222 ms
postgres=# select count(tbl.id1) from tbl, tbl2 where tbl.id1<tbl2.id1;
count
----------
49995000
(1 row)
Time: 32839.627 ms
postgres=# explain select count(tbl.id1) from tbl, tbl2 where tbl.id1<tbl2.id1;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=1983648.33..1983648.34 rows=1 width=4)
   ->  Nested Loop  (cost=0.00..1900315.00 rows=33333333 width=4)
         Join Filter: (tbl.id1 < tbl2.id1)
         ->  Seq Scan on tbl  (cost=0.00..145.00 rows=10000 width=4)
         ->  Materialize  (cost=0.00..235.00 rows=10000 width=4)
               ->  Seq Scan on tbl2  (cost=0.00..145.00 rows=10000 width=4)
 Planning time: 0.140 ms
(7 rows)
As per my analysis, the above result is consistent with our current design:

Materialization Node:
Cost calculation at plan time:
If the result spills over to disk during materialization, the cost of that disk access is included in the node's total cost.
Actual execution:
The result is actually fetched from disk on every re-scan.

Scan Node:
Cost calculation at plan time:
The cost of a re-scan of a SeqScan node is taken to be the same as the first scan, which always assumes the records are fetched from disk, so the disk access cost is added (as we don't really know how much memory will be available for caching during execution).
Actual execution:
After the first scan, once all records are loaded into memory (provided shared_buffers is big enough), re-scans read the records from memory only and are therefore much faster.

So at planning time the cost of the Materialize node is lower than that of the SeqScan node, but at execution time the SeqScan is faster because it fetches tuples from memory on re-scan.
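The gap between the two plans' estimates can be reconstructed from the default cost GUCs. This is a back-of-the-envelope sketch: the 45 heap pages are implied by the 145.00 scan cost, the 40 spill pages and the per-tuple Materialize charges are approximations I chose to match the EXPLAIN numbers, not the exact costsize.c formulas:

```python
# PostgreSQL default planner cost parameters.
seq_page_cost = 1.0
cpu_tuple_cost = 0.01
cpu_operator_cost = 0.0025

rows = 10_000    # rows per table (from EXPLAIN)
pages = 45       # heap pages per table: 45*1.0 + 10000*0.01 = 145
spill_pages = 40 # assumed pages Materialize spills once 256kB work_mem overflows

seqscan = pages * seq_page_cost + rows * cpu_tuple_cost        # 145.0
# Join CPU: 10^8 candidate pairs, each charged the filter and tuple cost.
join_cpu = rows * rows * (cpu_tuple_cost + cpu_operator_cost)  # ~1,250,000

# Without Materialize: the inner SeqScan is re-costed at full price
# for each of the 10000 outer rows.
no_mat = seqscan + rows * seqscan + join_cpu                   # ~2,700,145

# With Materialize: the first pass pays the scan, the spill write, and
# some per-tuple overhead; each re-scan only re-reads the spilled pages.
mat_first = seqscan + spill_pages * seq_page_cost \
    + 2 * cpu_operator_cost * rows                             # 235.0
mat_rescan = spill_pages * seq_page_cost \
    + cpu_operator_cost * rows                                 # 65.0
with_mat = seqscan + mat_first + (rows - 1) * mat_rescan + join_cpu  # ~1,900,315

print(round(no_mat), round(with_mat))
```

So on paper a Materialize re-scan costs about 65 against 145 for a SeqScan re-scan, even though at run time the 45-page table sits entirely in shared_buffers and the SeqScan re-scan does no real I/O.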
I am not sure whether we should consider this a problem, but I wanted to share it, since users generally expect materialization to be faster than non-materialized execution.
Please provide your opinion. If we can do something about this, I can take up the work.
Thanks and Regards,
Kumar Rajeev Rastogi
------------------------------------------------------------------------------------------------------------------------------
This e-mail and its attachments contain confidential information from HUAWEI, which
is intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by
phone or email immediately and delete it!
On Mon, Mar 23, 2015 at 6:01 AM, Rajeev rastogi
<rajeev.rastogi@huawei.com> wrote:
> The cost of a re-scan of a SeqScan node is taken to be the same as the first
> scan, which always assumes the records are fetched from disk, so the disk
> access cost is added (as we don't really know how much memory will be
> available for caching during execution).
That's a general problem not limited to materialize nodes. We might
choose to do a heap-sort rather than a quick-sort, but it may turn out
that the "tapes" we create end up in the OS buffer cache instead of on
physical storage; in fact, it's probably the common case. Scans are
costed using seq_page_cost and random_page_cost, but most of the time
the "random" page cost will not be the cost of a head seek, because
we'll find the data in the OS page cache. Some of the time it really
will be a head seek, but we have no idea whether that will happen in
any given case. The autovacuum cost delays have this problem too: a
"miss" in shared buffers may really be a hit in the OS page cache, but
we don't know.
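The point about random_page_cost can be put in numbers with a toy model: if some fraction of "random" reads are actually served from the OS page cache, the effective per-page cost is a blend the planner cannot see. The 0.9 hit rate and the near-zero cost for a cached read are my illustrative assumptions, not measured values:

```python
# Toy model: effective cost of a "random" page read when most reads
# hit the OS page cache instead of causing a head seek.
random_page_cost = 4.0   # planner's default charge for a random read
cached_page_cost = 0.01  # assumed near-free cost for an OS-cache hit
hit_rate = 0.9           # assumed fraction of reads served from cache

effective = hit_rate * cached_page_cost + (1 - hit_rate) * random_page_cost
print(effective)  # ~0.41, far below the 4.0 the planner charges
```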
> [Huawei confidentiality disclaimer snipped]
This kind of disclaimer is inappropriate on a public mailing list.
Don't send confidential information to public mailing lists. You
probably don't have any legal right to control what happens to it
after that, regardless of what you put in your email.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 23 March 2015 21:39, Robert Haas wrote:
> That's a general problem not limited to materialize nodes. We might
> choose to do a heap-sort rather than a quick-sort, but it may turn out
> that the "tapes" we create end up in the OS buffer cache instead of on
> physical storage; in fact, it's probably the common case. [snip]
Yes, I agree.
> This kind of disclaimer is inappropriate on a public mailing list.
> Don't send confidential information to public mailing lists. You
> probably don't have any legal right to control what happens to it after
> that, regardless of what you put in your email.
Sorry for this. We generally delete this legal notice before sending mail to the community, but somehow missed doing so this time.
Thanks and Regards,
Kumar Rajeev Rastogi.