Parallel Index-only scan

Started by Rafia Sabihover 9 years ago23 messageshackers

rafia.sabih@enterprisedb.com

over 9 years ago

Hello all,
This is to propose a patch for enabling parallel index-only scans. With the
patch of parallel index scan around [1]/messages/by-id/CAOGQiiOneen9WEppO6V_myKpQ97CrjBQJ0Pv7ri0rxmMYvLcTg@mail.gmail.com -- Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/, adding the mechanism for
parallelising index-only scans makes sense. Without this mechanism for the
queries preferring index-only scans, it is likely that at higher scale
parallel index or parallel seq scan serve as cheaper alternative to
index-only scans and then query performance might suffer because of the
added processing of index or seq scans.

Performance
-----------------
Consider the performance of a simple query on TPC-H schema,

explain analyse select count(*) from lineitem where l_shipdate < date
'1995-01-03';

Without parallel index-only scan, parallel seq scan got picked and it took
around 23 seconds for the query to execute,

Finalize Aggregate (cost=651586.63..651586.64 rows=1 width=8) (actual
time=22853.872..22853.873 rows=1 loops=1)
-> Gather (cost=651586.21..651586.62 rows=4 width=8) (actual
time=22853.684..22853.864 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial Aggregate (cost=650586.21..650586.22 rows=1 width=8)
(actual time=22850.489..22850.489 rows=1 loops=5)
-> Parallel Seq Scan on lineitem (cost=0.00..618021.73
rows=13025795 width=0) (actual time=0.035..20553.495 rows=10342437 loops=5)
Filter: (l_shipdate < '1995-01-03'::date)
Rows Removed by Filter: 13656485
Planning time: 0.225 ms
Execution time: 22855.196 ms

However, with parallel index-only scan, it took only 8.5 seconds,

Finalize Aggregate (cost=568883.69..568883.70 rows=1 width=8) (actual

time=8548.993..8548.993 rows=1 loops=1)
-> Gather (cost=568883.27..568883.68 rows=4 width=8) (actual
time=8548.789..8548.976 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial Aggregate (cost=567883.27..567883.28 rows=1 width=8)
(actual time=8541.929..8541.929 rows=1 loops=5)
-> Parallel Index Only Scan using idx_l_shipdate on
lineitem (cost=0.57..535318.78 rows=13025795 width=0) (actual
time=0.113..5866.729 rows=10342437 loops=5)
Index Cond: (l_shipdate < '1995-01-03'::date)
Heap Fetches: 0
Planning time: 0.266 ms
Execution time: 8569.735 ms

The effect of parallel index-only scan can be seen more in some more
complex queries where parallelism is enabled till top of the tree, e.g,
following query takes 118 ms on head,

explain analyse select sum(l_extendedprice * l_discount) as revenue,
avg(o_orderkey) from orders, lineitem where o_orderkey < 10000 and
o_orderkey = l_orderkey group by l_shipmode order by revenue

Sort (cost=24396.44..24396.45 rows=7 width=75) (actual

time=118.823..118.825 rows=7 loops=1)
Sort Key: (sum((lineitem.l_extendedprice * lineitem.l_discount)))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=24396.23..24396.34 rows=7 width=75) (actual
time=118.749..118.786 rows=7 loops=1)
Group Key: lineitem.l_shipmode
-> Nested Loop (cost=1.13..24293.11 rows=10312 width=27)
(actual time=0.096..73.198 rows=9965 loops=1)
-> Index Only Scan using orders_pkey on orders
(cost=0.56..46.48 rows=2578 width=4) (actual time=0.072..1.663 rows=2503
loops=1)
Index Cond: (o_orderkey < 10000)
Heap Fetches: 0
-> Index Scan using idx_lineitem_orderkey on lineitem
(cost=0.57..6.45 rows=296 width=31) (actual time=0.018..0.023 rows=4
loops=2503)
Index Cond: (l_orderkey = orders.o_orderkey)
Planning time: 1.062 ms
Execution time: 118.977 ms

With parallel index-only scan, the performance improves to 40 ms,

Sort (cost=7191.33..7191.35 rows=7 width=75) (actual time=40.475..40.476

rows=7 loops=1)
Sort Key: (sum((lineitem.l_extendedprice * lineitem.l_discount)))
Sort Method: quicksort Memory: 25kB
-> Finalize GroupAggregate (cost=7190.78..7191.23 rows=7 width=75)
(actual time=40.168..40.451 rows=7 loops=1)
Group Key: lineitem.l_shipmode
-> Sort (cost=7190.78..7190.85 rows=28 width=75) (actual
time=40.105..40.127 rows=35 loops=1)
Sort Key: lineitem.l_shipmode
Sort Method: quicksort Memory: 29kB
-> Gather (cost=7187.22..7190.11 rows=28 width=75)
(actual time=39.344..39.983 rows=35 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial HashAggregate (cost=6187.22..6187.31
rows=7 width=75) (actual time=25.981..26.011 rows=7 loops=5)
Group Key: lineitem.l_shipmode
-> Nested Loop (cost=1.13..6084.10 rows=10312
width=27) (actual time=0.139..16.352 rows=1993 loops=5)
-> Parallel Index Only Scan using
orders_pkey on orders (cost=0.56..27.14 rows=644 width=4) (actual
time=0.082..0.366 rows=501 loops=5)
Index Cond: (o_orderkey < 10000)
Heap Fetches: 0
-> Index Scan using
idx_lineitem_orderkey on lineitem (cost=0.57..6.45 rows=296 width=31)
(actual time=0.020..0.025 rows=4 loops=2503)
Index Cond: (l_orderkey =
orders.o_orderkey)
Planning time: 1.170 ms
Execution time: 40.898 ms

Test configuration and machine detail:
--------------------------------------------------
TPC-H: scale facto : 20
work_mem : 64MB
shared buffer : 1GB
max_parallel_workers_per_gather : 4
Machine : IBM POWER, 4 socket
machine with 512 GB RAM.
random_page_cost = seq_page_cost = 0.1
We kept random and seq page cost same since warm cache environment was
ensured, thus, keeping a value of random_page_cost that is higher than
seq_page_cost doesn't makes much sense in this setup.
Also, some additional indexes were build, for lineitem table l_shipdate,
l_shipmode, and l_returnflag, on orders table index is added for o_comment
and o_orderdate individually.

The point to note here is that with parallel index-only scan, not only the
heap fetches are saved but also the aggregates/sort can be pushed down to
workers and thus the total time of query can be improved. Clearly, the
performance of queries improved significantly with this new operator and
considering the changes required after parallel index scan patches is less
if evaluated against the improvement in performance it offers.

Attached file:
------------------
1. parallel_index_only_v1.patch

This patch is to be applied over parallel index scan [1]/messages/by-id/CAOGQiiOneen9WEppO6V_myKpQ97CrjBQJ0Pv7ri0rxmMYvLcTg@mail.gmail.com -- Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/.

* Thanks to my colleagues Dilip Kumar and Amit Kapila for their support and
feedback.

[1]: /messages/by-id/CAOGQiiOneen9WEppO6V_myKpQ97CrjBQJ0Pv7ri0rxmMYvLcTg@mail.gmail.com -- Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/
/messages/by-id/CAOGQiiOneen9WEppO6V_myKpQ97CrjBQJ0Pv7ri0rxmMYvLcTg@mail.gmail.com
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Rafia Sabih

rafia.sabih@enterprisedb.com

over 9 years ago

In reply to: Rafia Sabih (#1)

Re: Parallel Index-only scan

Please find the attached file for the latest version of patch.

parallel_index_only_v2.patch
<http://postgresql.nabble.com/file/n5936010/parallel_index_only_v2.patch>

--
View this message in context: http://postgresql.nabble.com/Parallel-Index-only-scan-tp5934352p5936010.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Rafia Sabih (#2)

Re: Parallel Index-only scan

On Fri, Dec 23, 2016 at 12:04 AM, rafia.sabih
<rafia.sabih@enterprisedb.com> wrote:

Please find the attached file for the latest version of patch.

parallel_index_only_v2.patch
<http://postgresql.nabble.com/file/n5936010/parallel_index_only_v2.patch>

We want to have the patch actually attached, not just stored on nabble.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alvaro Herrera

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Robert Haas (#3)

Re: Parallel Index-only scan

Robert Haas wrote:

On Fri, Dec 23, 2016 at 12:04 AM, rafia.sabih
<rafia.sabih@enterprisedb.com> wrote:

Please find the attached file for the latest version of patch.

parallel_index_only_v2.patch
<http://postgresql.nabble.com/file/n5936010/parallel_index_only_v2.patch>

We want to have the patch actually attached, not just stored on nabble.

I think the problem is that Nabble removes the attachment before
resending the email to the list. The OP should be posting directly
instead of via Nabble.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Robert Haas (#3)

Re: Parallel Index-only scan

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Dec 23, 2016 at 12:04 AM, rafia.sabih
<rafia.sabih@enterprisedb.com> wrote:

Please find the attached file for the latest version of patch.
parallel_index_only_v2.patch
<http://postgresql.nabble.com/file/n5936010/parallel_index_only_v2.patch>

We want to have the patch actually attached, not just stored on nabble.

Or in words of one syllable: please do not use nabble to post to the
community mailing lists. Their habit of stripping attachments is not
the only evil thing about them (although it's the only one I remember
details of ATM).

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Tom Lane (#5)

Re: Parallel Index-only scan

On Fri, Dec 23, 2016 at 3:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Or in words of one syllable: please do not use nabble to post to the
community mailing lists.

Many of those words have two syllables, and one has four.

Anyhow, I think three castigating emails from committers in a span of
62 minutes is probably enough for the OP to get the point, unless
someone else REALLY feels the need to pile on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Rafia Sabih

rafia.sabih@enterprisedb.com

over 9 years ago

In reply to: Robert Haas (#6)

Re: Parallel Index-only scan

Extremely sorry for the inconvenience caused, please find the attached file
for the latest version of the patch.

On Sat, Dec 24, 2016 at 1:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Dec 23, 2016 at 3:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Or in words of one syllable: please do not use nabble to post to the
community mailing lists.

Many of those words have two syllables, and one has four.

Anyhow, I think three castigating emails from committers in a span of
62 minutes is probably enough for the OP to get the point, unless
someone else REALLY feels the need to pile on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Rafia Sabih

rafia.sabih@enterprisedb.com

over 9 years ago

In reply to: Rafia Sabih (#7)

Re: Parallel Index-only scan

Rebased patch of parallel-index only scan based on the latest version of
parallel index scan [1]/messages/by-id/CAA4eK1LiNi7_Z1%25 2BPCV4y06o_v%3DZdZ1UThE%2BW9JhthX4B8uifnA%40mail.gmail.com is attached.

[1]: /messages/by-id/CAA4eK1LiNi7_Z1%25 2BPCV4y06o_v%3DZdZ1UThE%2BW9JhthX4B8uifnA%40mail.gmail.com
2BPCV4y06o_v%3DZdZ1UThE%2BW9JhthX4B8uifnA%40mail.gmail.com

On Sat, Dec 24, 2016 at 7:55 PM, Rafia Sabih <rafia.sabih@enterprisedb.com>
wrote:

Extremely sorry for the inconvenience caused, please find the attached
file for the latest version of the patch.

On Sat, Dec 24, 2016 at 1:41 AM, Robert Haas <robertmhaas@gmail.com>
wrote:

On Fri, Dec 23, 2016 at 3:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Or in words of one syllable: please do not use nabble to post to the
community mailing lists.

Many of those words have two syllables, and one has four.

Anyhow, I think three castigating emails from committers in a span of
62 minutes is probably enough for the OP to get the point, unless
someone else REALLY feels the need to pile on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Amit Kapila

amit.kapila16@gmail.com

over 9 years ago

In reply to: Rafia Sabih (#8)

Re: Parallel Index-only scan

On Wed, Dec 28, 2016 at 12:18 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

Rebased patch of parallel-index only scan based on the latest version of
parallel index scan [1] is attached.

[1]
/messages/by-id/CAA4eK1LiNi7_Z1+PCV4y06o_v=ZdZ1UThE+W9JhthX4B8uifnA@mail.gmail.com

The link for the latest version is wrong, the correct link is:
/messages/by-id/CAA4eK1KthrAvNjmB2cWuUHz+p3ZTTtbD7o2KUw49PC8HK4r1Wg@mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Rahila Syed

rahilasyed90@gmail.com

over 9 years ago

In reply to: Amit Kapila (#9)

Re: Parallel Index-only scan

Hello,

On applying the patch on latest master branch and running regression tests
following failure occurs.
I applied it on latest parallel index scan patches as given in the link
above.

***
/home/rahila/postgres/postgres/src/test/regress/expected/select_parallel.out
2017-01-03 14:06:29.122022780 +0530
---
/home/rahila/postgres/postgres/src/test/regress/results/select_parallel.out
2017-01-12 14:35:56.652712622 +0530
***************
*** 92,103 ****
explain (costs off)
select sum(parallel_restricted(unique1)) from tenk1
group by(parallel_restricted(unique1));
! QUERY PLAN
! ----------------------------------------------------
HashAggregate
Group Key: parallel_restricted(unique1)
! -> Index Only Scan using tenk1_unique1 on tenk1
! (3 rows)

  set force_parallel_mode=1;
  explain (costs off)
--- 92,105 ----
  explain (costs off)
    select  sum(parallel_restricted(unique1)) from tenk1
    group by(parallel_restricted(unique1));
!                             QUERY PLAN
! -------------------------------------------------------------------
   HashAggregate
     Group Key: parallel_restricted(unique1)
!    ->  Gather
!          Workers Planned: 4
!          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
! (5 rows)

IIUC, parallel operation being performed here is fine as parallel
restricted function occurs above Gather node
and just the expected output needs to be changed.

Thank you,
Rahila Syed

#11

Rafia Sabih

rafia.sabih@enterprisedb.com

over 9 years ago

In reply to: Rahila Syed (#10)

Re: Parallel Index-only scan

On Thu, Jan 12, 2017 at 5:39 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:

Hello,

On applying the patch on latest master branch and running regression tests
following failure occurs.
I applied it on latest parallel index scan patches as given in the link
above.

***
/home/rahila/postgres/postgres/src/test/regress/expected/select_parallel.out
2017-01-03 14:06:29.122022780 +0530
---
/home/rahila/postgres/postgres/src/test/regress/results/select_parallel.out
2017-01-12 14:35:56.652712622 +0530
***************
*** 92,103 ****
explain (costs off)
select sum(parallel_restricted(unique1)) from tenk1
group by(parallel_restricted(unique1));
! QUERY PLAN
! ----------------------------------------------------
HashAggregate
Group Key: parallel_restricted(unique1)
! -> Index Only Scan using tenk1_unique1 on tenk1
! (3 rows)
set force_parallel_mode=1;
explain (costs off)
--- 92,105 ----
explain (costs off)
select  sum(parallel_restricted(unique1)) from tenk1
group by(parallel_restricted(unique1));
!                             QUERY PLAN
! -------------------------------------------------------------------
HashAggregate
Group Key: parallel_restricted(unique1)
!    ->  Gather
!          Workers Planned: 4
!          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
! (5 rows)
IIUC, parallel operation being performed here is fine as parallel restricted
function occurs above Gather node
and just the expected output needs to be changed.

True, fixed it, please find the attached file for the latest version.

Thank you,
Rahila Syed

Thanks a lot for the review Rahila.
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

#12

Rafia Sabih

rafia.sabih@enterprisedb.com

over 9 years ago

In reply to: Rafia Sabih (#11)

Re: Parallel Index-only scan

On Fri, Jan 13, 2017 at 2:19 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

On Thu, Jan 12, 2017 at 5:39 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
Hello,

On applying the patch on latest master branch and running regression tests
following failure occurs.
I applied it on latest parallel index scan patches as given in the link
above.

***
/home/rahila/postgres/postgres/src/test/regress/expected/select_parallel.out
2017-01-03 14:06:29.122022780 +0530
---
/home/rahila/postgres/postgres/src/test/regress/results/select_parallel.out
2017-01-12 14:35:56.652712622 +0530
***************
*** 92,103 ****
explain (costs off)
select sum(parallel_restricted(unique1)) from tenk1
group by(parallel_restricted(unique1));
! QUERY PLAN
! ----------------------------------------------------
HashAggregate
Group Key: parallel_restricted(unique1)
! -> Index Only Scan using tenk1_unique1 on tenk1
! (3 rows)
set force_parallel_mode=1;
explain (costs off)
--- 92,105 ----
explain (costs off)
select  sum(parallel_restricted(unique1)) from tenk1
group by(parallel_restricted(unique1));
!                             QUERY PLAN
! -------------------------------------------------------------------
HashAggregate
Group Key: parallel_restricted(unique1)
!    ->  Gather
!          Workers Planned: 4
!          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
! (5 rows)
IIUC, parallel operation being performed here is fine as parallel restricted
function occurs above Gather node
and just the expected output needs to be changed.
True, fixed it, please find the attached file for the latest version.

Please find the attached file rebased patch of parallel index-only
scan on the latest Parallel index-scan patch [1]/messages/by-id/CAA4eK1L-gb0Fum3mQN4c5PWJXNE7xs7pzwMDWsrDYLucKqvJ2A@mail.gmail.com -- Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/.

[1]: /messages/by-id/CAA4eK1L-gb0Fum3mQN4c5PWJXNE7xs7pzwMDWsrDYLucKqvJ2A@mail.gmail.com -- Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

#13

tushar

tushar.ahuja@enterprisedb.com

over 9 years ago

In reply to: Rafia Sabih (#12)

Re: Parallel Index-only scan

On 01/19/2017 05:37 PM, Rafia Sabih wrote:

Please find the attached file rebased patch of parallel index-only
scan on the latest Parallel index-scan patch [1].

We did some testing of this feature and written few testcases. PFA the
sql script(along with the expected .out files)

In addition we have generated the LCOV (code coverage) report and
compared the files which are changed for this.
You can see the numbers for "with_patch" V/s "with_patch+TestCases"
(.pdf file is attached)

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

#14

Michael Paquier

michael@paquier.xyz

over 9 years ago

In reply to: Rafia Sabih (#12)

Re: Parallel Index-only scan

On Thu, Jan 19, 2017 at 9:07 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

Please find the attached file rebased patch of parallel index-only
scan on the latest Parallel index-scan patch [1].

Moved to CF 2017-03.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Rafia Sabih (#12)

Re: Parallel Index-only scan

On Thu, Jan 19, 2017 at 7:07 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

Please find the attached file rebased patch of parallel index-only
scan on the latest Parallel index-scan patch [1].

This again needs minor rebasing but basically looks fine. It's a
pretty straightforward extension of the parallel index scan work.

Please make sure that this is pgindent-clean - i.e. that when you
pgindent the files that it touches, pgindent doesn't change anything
of the same parts of the file that you've changed in the patch. Also,
I believe Amit may have made some adjustments to the logic in
nodeIndexScan.c; if so, it would be good to make sure that the
nodeIndexOnlyScan.c changes match what was done there. In particular,
he's got this:

if (reset_parallel_scan && node->iss_ScanDesc->parallel_scan)
index_parallelrescan(node->iss_ScanDesc);

And you've got this:

+               if (reset_parallel_scan)
+                       index_parallelrescan(node->ioss_ScanDesc);

There might be some other inconsistencies as well that I didn't notice
on a quick look.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Rahila Syed

rahilasyed90@gmail.com

over 9 years ago

In reply to: Robert Haas (#15)

Re: Parallel Index-only scan

I reviewed the patch. Overall it looks fine to me.

One comment,

-       if (index->amcanparallel &&
-           !index_only_scan &&
+       if ((index->amcanparallel ||
+           index_only_scan) &&

Why do we need to check for index_only_scan in the above condition. IIUC,
index->amcanparallel is necessary for
parallel index scan to be chosen. We cannot chose parallel index only scan
if index_only_scan is happening
without worrying about index->amcanparallel value. So OR-ing
index->amcanparallel with index_only_scan is probably not
correct.

Thank you,
Rahila Syed

On Thu, Feb 16, 2017 at 1:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Show quoted text

On Thu, Jan 19, 2017 at 7:07 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

Please find the attached file rebased patch of parallel index-only
scan on the latest Parallel index-scan patch [1].

This again needs minor rebasing but basically looks fine. It's a
pretty straightforward extension of the parallel index scan work.

Please make sure that this is pgindent-clean - i.e. that when you
pgindent the files that it touches, pgindent doesn't change anything
of the same parts of the file that you've changed in the patch. Also,
I believe Amit may have made some adjustments to the logic in
nodeIndexScan.c; if so, it would be good to make sure that the
nodeIndexOnlyScan.c changes match what was done there. In particular,
he's got this:

if (reset_parallel_scan && node->iss_ScanDesc->parallel_
scan)
index_parallelrescan(node->iss_ScanDesc);

And you've got this:
+               if (reset_parallel_scan)
+                       index_parallelrescan(node->ioss_ScanDesc);
There might be some other inconsistencies as well that I didn't notice
on a quick look.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17

Rafia Sabih

rafia.sabih@enterprisedb.com

over 9 years ago

In reply to: Rahila Syed (#16)

Re: Parallel Index-only scan

On Thu, Feb 16, 2017 at 1:26 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:

I reviewed the patch. Overall it looks fine to me.

One comment,
-       if (index->amcanparallel &&
-           !index_only_scan &&
+       if ((index->amcanparallel ||
+           index_only_scan) &&
Why do we need to check for index_only_scan in the above condition. IIUC,
index->amcanparallel is necessary for
parallel index scan to be chosen. We cannot chose parallel index only scan
if index_only_scan is happening
without worrying about index->amcanparallel value. So OR-ing
index->amcanparallel with index_only_scan is probably not
correct.

True, we do not need this, only removing !index_only_scan should work.

Fixed

On Thu, Feb 16, 2017 at 1:06 AM, Robert Haas <robertmhaas@gmail.com>
wrote:
This again needs minor rebasing but basically looks fine. It's a
pretty straightforward extension of the parallel index scan work.

Please make sure that this is pgindent-clean - i.e. that when you
pgindent the files that it touches, pgindent doesn't change anything
of the same parts of the file that you've changed in the patch. Also,
I believe Amit may have made some adjustments to the logic in
nodeIndexScan.c; if so, it would be good to make sure that the
nodeIndexOnlyScan.c changes match what was done there. In particular,
he's got this:

if (reset_parallel_scan && node->iss_ScanDesc->parallel_s
can)
index_parallelrescan(node->iss_ScanDesc);

And you've got this:
+               if (reset_parallel_scan)
+                       index_parallelrescan(node->ioss_ScanDesc);

Fixed.
Please find the attached patch for rebased and cleaner version.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

#18

Rafia Sabih

rafia.sabih@enterprisedb.com

over 9 years ago

In reply to: Rafia Sabih (#17)

Re: Parallel Index-only scan

On Thu, Feb 16, 2017 at 3:40 PM, Rafia Sabih <rafia.sabih@enterprisedb.com>
wrote:

On Thu, Feb 16, 2017 at 1:26 PM, Rahila Syed <rahilasyed90@gmail.com>
wrote:
I reviewed the patch. Overall it looks fine to me.

One comment,
-       if (index->amcanparallel &&
-           !index_only_scan &&
+       if ((index->amcanparallel ||
+           index_only_scan) &&
Why do we need to check for index_only_scan in the above condition. IIUC,
index->amcanparallel is necessary for
parallel index scan to be chosen. We cannot chose parallel index only
scan if index_only_scan is happening
without worrying about index->amcanparallel value. So OR-ing
index->amcanparallel with index_only_scan is probably not
correct.

True, we do not need this, only removing !index_only_scan should work.
Fixed
On Thu, Feb 16, 2017 at 1:06 AM, Robert Haas <robertmhaas@gmail.com>
wrote:
This again needs minor rebasing but basically looks fine. It's a
pretty straightforward extension of the parallel index scan work.

Please make sure that this is pgindent-clean - i.e. that when you
pgindent the files that it touches, pgindent doesn't change anything
of the same parts of the file that you've changed in the patch. Also,
I believe Amit may have made some adjustments to the logic in
nodeIndexScan.c; if so, it would be good to make sure that the
nodeIndexOnlyScan.c changes match what was done there. In particular,
he's got this:

if (reset_parallel_scan && node->iss_ScanDesc->parallel_s
can)
index_parallelrescan(node->iss_ScanDesc);

And you've got this:
+               if (reset_parallel_scan)
+                       index_parallelrescan(node->ioss_ScanDesc);
Fixed.
Please find the attached patch for rebased and cleaner version.

Please find the attached patch with a minor comment update.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

#19

Amit Kapila

amit.kapila16@gmail.com

over 9 years ago

In reply to: Rafia Sabih (#18)

Re: Parallel Index-only scan

On Thu, Feb 16, 2017 at 3:57 PM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

On Thu, Feb 16, 2017 at 3:40 PM, Rafia Sabih <rafia.sabih@enterprisedb.com>
wrote:

Please find the attached patch for rebased and cleaner version.

Please find the attached patch with a minor comment update.

Few comments:

1.
+ * ioss_PscanLen   This is needed for parallel index scan
  * ----------------
  */
 typedef struct IndexOnlyScanState
@@ -1427,6 +1428,7 @@ typedef struct IndexOnlyScanState
  IndexScanDesc ioss_ScanDesc;
  Buffer ioss_VMBuffer;
  long ioss_HeapFetches;
+ Size ioss_PscanLen; /* This is needed for parallel index scan */

No need to mention same comment at multiple places. I think you keep
it on top of structure. The explanation could be some thing like
"size of parallel index only scan descriptor"

2.
+ node->ioss_ScanDesc->xs_want_itup = true;

I think wherever you are initializing xs_want_itup, you should
initialize ioss_VMBuffer as well. Is there a reason for not doing so?

3.
explain (costs off)
  select  sum(parallel_restricted(unique1)) from tenk1
  group by(parallel_restricted(unique1));
-                     QUERY PLAN
-----------------------------------------------------
+                            QUERY PLAN
+-------------------------------------------------------------------
  HashAggregate
    Group Key: parallel_restricted(unique1)
-   ->  Index Only Scan using tenk1_unique1 on tenk1
-(3 rows)
+   ->  Gather
+         Workers Planned: 4
+         ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
+(5 rows)

It doesn't look good that you want to test parallel index only scan
for a test case that wants to test restricted function. The comments
atop test looks odd. I suggest add a separate test (both explain and
actual execution of query) for parallel index only scan as we have for
parallel plans for other queries and see if we can keep the output of
existing test same.

4.
ExecReScanIndexOnlyScan(IndexOnlyScanState *node)
{
..
+ /*
+ * if we are here to just update the scan keys, then don't reset parallel
+ * scan
+ */
+ if (node->ioss_NumRuntimeKeys != 0 && !node->ioss_RuntimeKeysReady)
+ reset_parallel_scan = false;
..
}

I think here you can update the comment to indicate that for detailed
reason refer ExecReScanIndexScan.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Rafia Sabih

rafia.sabih@enterprisedb.com

over 9 years ago

In reply to: Amit Kapila (#19)

Re: Parallel Index-only scan

On Thu, Feb 16, 2017 at 9:25 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

Few comments:
1.
+ * ioss_PscanLen   This is needed for parallel index scan
* ----------------
*/
typedef struct IndexOnlyScanState
@@ -1427,6 +1428,7 @@ typedef struct IndexOnlyScanState
IndexScanDesc ioss_ScanDesc;
Buffer ioss_VMBuffer;
long ioss_HeapFetches;
+ Size ioss_PscanLen; /* This is needed for parallel index scan */
No need to mention same comment at multiple places. I think you keep
it on top of structure. The explanation could be some thing like
"size of parallel index only scan descriptor"

Fixed.

2.
+ node->ioss_ScanDesc->xs_want_itup = true;

I think wherever you are initializing xs_want_itup, you should
initialize ioss_VMBuffer as well. Is there a reason for not doing so?

Done.

3.
explain (costs off)
select  sum(parallel_restricted(unique1)) from tenk1
group by(parallel_restricted(unique1));
-                     QUERY PLAN
-----------------------------------------------------
+                            QUERY PLAN
+-------------------------------------------------------------------
HashAggregate
Group Key: parallel_restricted(unique1)
-   ->  Index Only Scan using tenk1_unique1 on tenk1
-(3 rows)
+   ->  Gather
+         Workers Planned: 4
+         ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
+(5 rows)
It doesn't look good that you want to test parallel index only scan
for a test case that wants to test restricted function. The comments
atop test looks odd. I suggest add a separate test (both explain and
actual execution of query) for parallel index only scan as we have for
parallel plans for other queries and see if we can keep the output of
existing test same.

Agree, but actually the objective of this test-case is met even with this
plan. To restrict parallel index-only scan here, modification in query or
other parameters would be required. However, for the proper code-coverage
and otherwise I have added test-case for parallel index-only scan.

4.
ExecReScanIndexOnlyScan(IndexOnlyScanState *node)
{
..
+ /*
+ * if we are here to just update the scan keys, then don't reset parallel
+ * scan
+ */
+ if (node->ioss_NumRuntimeKeys != 0 && !node->ioss_RuntimeKeysReady)
+ reset_parallel_scan = false;
..
}

I think here you can update the comment to indicate that for detailed
reason refer ExecReScanIndexScan.

Done.
Please find the attached patch for the revised version.
Just an FYI, in my recent tests on TPC-H 300 scale factor, Q16 showed
improved execution time from 830 seconds to 730 seconds with this patch
when used with parallel merge-join patch [1]/messages/by-id/CAFiTN-tX3EzDw7zYvi97eNADG9PH-nmhLa24Y3uWdzy_Y4SkfQ@mail.gmail.com Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/.

[1]: /messages/by-id/CAFiTN-tX3EzDw7zYvi97eNADG9PH-nmhLa24Y3uWdzy_Y4SkfQ@mail.gmail.com Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/
/messages/by-id/CAFiTN-tX3EzDw7zYvi97eNADG9PH-nmhLa24Y3uWdzy_Y4SkfQ@mail.gmail.com
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/