Parallel bitmap heap scan

Started by Dilip Kumar · over 9 years ago · 128 messages · pgsql-hackers
#1 Dilip Kumar
dilipbalaut@gmail.com

Hi Hackers,

I would like to propose a parallel bitmap heap scan feature. While
running the TPCH benchmark, I observed that many of the TPCH queries
use a bitmap scan (see TPCH_plan.tar.gz attached below). With that in
mind, we expect many queries to benefit from a parallel bitmap scan.

Robert has also pointed out the same thing in his blog post on parallel query:
http://rhaas.blogspot.in/2016/04/postgresql-96-with-parallel-query-vs.html

Currently, a bitmap heap plan looks like this:
------------------------------------------------------
Bitmap Heap Scan
-> Bitmap Index Scan

After this patch :
---------------------
Parallel Bitmap Heap Scan
-> Bitmap Index Scan

As part of this work I have implemented parallel processing in
BitmapHeapScan node. BitmapIndexScan is still non parallel.

Brief design idea:
-----------------------
#1. Shared TIDBitmap creation and initialization
The first worker to see the parallel bitmap state as PBM_INITIAL
becomes the leader and sets the state to PBM_INPROGRESS. All other
workers, seeing the state as PBM_INPROGRESS, wait for the leader to
complete the TIDBitmap.

#2. At this point the TIDBitmap is ready and all workers are awake.

#3. Bitmap processing (iterate and process the pages).
In this phase each worker iterates over the page and chunk arrays and
selects heap pages one by one. If prefetch is enabled, there are two
iterators. Since multiple workers iterate over the same page and chunk
arrays, we need a shared iterator: each worker grabs a spinlock and
iterates while holding it, so that each worker gets a different page
to process.

Note: For more design detail, please refer to the comment above the
BitmapHeapNext API in the "parallel-bitmap-heap-scan-v1.patch" file.

Attached patch details:
------------------------------
1. parallel-bitmap-heap-scan-v1.patch: This is the main patch, which
makes the bitmap heap scan node parallel-aware.

2. dht-return-dsa-v1.patch: This patch provides a new API with which
we can scan a full DHT [1] and get back dsa_pointers (relative
pointers). The dsa_pointer values can be shared with other processes.
We need this because, after the TIDBitmap is created, only one worker
processes the whole TIDBitmap and converts it to the page and chunk
arrays. So we need to store generic pointers, so that later each
worker can convert them to local pointers before it starts processing.

My patch depends on the following patches:
------------------------------------------------------
1. conditional_variable
/messages/by-id/CAEepm=0zshYwB6wDeJCkrRJeoBM=jPYBe+-k_VtKRU_8zMLEfA@mail.gmail.com

2. dsa_area
/messages/by-id/CAEepm=024p-MeAsDmG=R3+tR4EGhuGJs_+rjFKF0eRoSTmMJnA@mail.gmail.com

3. Creating a DSA area to provide work space for parallel execution
/messages/by-id/CAEepm=0HmRefi1+xDJ99Gj5APHr8Qr05KZtAxrMj8b+ay3o6sA@mail.gmail.com

4. Hash table in dynamic shared memory (DHT) [1]
/messages/by-id/CAEepm=0VrMt3s_REDhQv6z1pHL7FETOD7Rt9V2MQ3r-2ss2ccA@mail.gmail.com

Order in which patches should be applied:
--------------------------------------------------------
1. conditional_variable
2. dsa_area
3. Creating a DSA area to provide work space for parallel execution
4. Hash table in dynamic shared memory.
5. dht-return-dsa-v1.patch
6. parallel-bitmap-heap-scan-v1.patch

Performance Results:
-----------------------------
Summary:
1. After this patch, I observed that 4 queries currently get a
significant improvement (Q4, Q6, Q14, Q15).
- Q4 converts from a parallel seq scan to a parallel bitmap heap scan.
- The other queries convert from a regular bitmap heap scan to a
parallel bitmap heap scan.
2. The benefit is more visible at lower worker counts (up to 4); after
that, some of the queries select ParallelSeqScan over
ParallelBitmapScan. I think this is expected, because so far we have
only made the BitmapHeap node parallel, whereas ParallelSeqScan is
completely parallel, so at higher worker counts ParallelSeqScan is the
better choice.
3. Detailed results are attached in TPCH_PBMS.pdf.
4. Explain analyze output is attached in TPCH_plan.tar.gz (for all
changed queries at 2 workers).

TPCH query plan changed example (TPCH Q6):
----------------------------------------------------------------
On Head:
-------------

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1558475.95..1558475.96 rows=1 width=32) (actual
time=40921.437..40921.438 rows=1 loops=1)
-> Aggregate (cost=1558475.95..1558475.96 rows=1 width=32)
(actual time=40921.435..40921.435 rows=1 loops=1)
-> Bitmap Heap Scan on lineitem (cost=291783.32..1552956.39
rows=1103911 width=12) (actual time=7032.075..38997.369 rows=1140434
loops=1)
Recheck Cond: ((l_shipdate >= '1994-01-01'::date) AND
(l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone) AND
(l_discount >= 0.01) AND (l_discount <= 0.03) AND (l_quantity <
'24'::numeric))
Rows Removed by Index Recheck: 25284232
Heap Blocks: exact=134904 lossy=530579
-> Bitmap Index Scan on idx_lineitem_shipdate
(cost=0.00..291507.35 rows=1103911 width=0) (actual
time=6951.408..6951.408 rows=1140434 loops=1)
Index Cond: ((l_shipdate >= '1994-01-01'::date)
AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone)
AND (l_discount >= 0.01) AND (l_discount <= 0.03) AND (l_quantity <
'24'::numeric))
Planning time: 1.126 ms
Execution time: 40922.569 ms
(10 rows)

After Patch:
----------------

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1541767.60..1541767.61 rows=1 width=32) (actual
time=21895.008..21895.009 rows=1 loops=1)
-> Finalize Aggregate (cost=1541767.60..1541767.61 rows=1
width=32) (actual time=21895.006..21895.006 rows=1 loops=1)
-> Gather (cost=1541767.38..1541767.59 rows=2 width=32)
(actual time=21894.341..21894.970 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=1540767.38..1540767.39
rows=1 width=32) (actual time=21890.990..21890.990 rows=1 loops=3)
-> Parallel Bitmap Heap Scan on lineitem
(cost=291783.32..1538467.56 rows=459963 width=12) (actual
time=8517.126..21215.469 rows=380145 loops=3)
Recheck Cond: ((l_shipdate >=
'1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp
without time zone) AND (l_discount >= 0.01) AND (l_discount <= 0.03)
AND (l_quantity < '24'::numeric))
Rows Removed by Index Recheck: 8427921
Heap Blocks: exact=47761 lossy=187096
-> Bitmap Index Scan on
idx_lineitem_shipdate (cost=0.00..291507.35 rows=1103911 width=0)
(actual time=8307.291..8307.291 rows=1140434 loops=1)
Index Cond: ((l_shipdate >=
'1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp
without time zone) AND (l_discount >= 0.01) AND (l_discount <= 0.03)
AND (l_quantity < '24'::numeric))
Planning time: 1.173 ms
Execution time: 21915.931 ms
(14 rows)

* Thanks to Robert Haas and Amit Kapila for helping with the design
review (off list) and for many valuable inputs.
* Thanks to Thomas Munro for the DSA and DHT work on which my patch is based.
* Thanks to Rafia Sabih for helping with the performance tests.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

benchmark_machine_info.txt
dht-return-dsa-v1.patch (+44 -21)
parallel-bitmap-heap-scan-v1.patch (+1369 -252)
TPCH_PBMS.pdf
TPCH_Plan.tar.gz
#2 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#1)
Re: Parallel bitmap heap scan

There is a major change in tidbitmap.c after the efficient hash table
commit [1], so my patch needed to be rebased.

Only parallel-bitmap-heap-scan needs the rebase; all other patches can
be applied on head as is.
The rebased version (v2) of parallel-bitmap-heap-scan is attached.

[1]: http://git.postgresql.org/pg/commitdiff/75ae538bc3168bf44475240d4e0487ee2f3bb376


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

parallel-bitmap-heap-scan-v2.patch (+1307 -220)
#3 Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#2)
Re: Parallel bitmap heap scan

On Mon, Oct 17, 2016 at 1:23 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

There is a major change in tidbitmap.c after the efficient hash table
commit [1], so my patch needed to be rebased.

Only parallel-bitmap-heap-scan needs the rebase; all other patches can
be applied on head as is.
The rebased version (v2) of parallel-bitmap-heap-scan is attached.

But what's the impact on performance? Presumably parallel bitmap heap
scan was already slower than the non-parallel version, and that commit
presumably widens the gap. Seems like something to worry about...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4 Andres Freund
andres@anarazel.de
In reply to: Dilip Kumar (#1)
Re: Parallel bitmap heap scan

Hi,

On 2016-10-07 11:46:40 +0530, Dilip Kumar wrote:

Brief design idea:
-----------------------
#1. Shared TIDBitmap creation and initialization
The first worker to see the parallel bitmap state as PBM_INITIAL
becomes the leader and sets the state to PBM_INPROGRESS. All other
workers, seeing the state as PBM_INPROGRESS, wait for the leader to
complete the TIDBitmap.

#2. At this point the TIDBitmap is ready and all workers are awake.

#3. Bitmap processing (iterate and process the pages).
In this phase each worker iterates over the page and chunk arrays and
selects heap pages one by one. If prefetch is enabled, there are two
iterators. Since multiple workers iterate over the same page and chunk
arrays, we need a shared iterator: each worker grabs a spinlock and
iterates while holding it, so that each worker gets a different page
to process.

I don't quite understand why the bitmap has to be parallel at all. As
far as I understand your approach as described here, the only thing that
needs to be shared are the iteration arrays. Since they never need to
be resized and such, it seems to make a lot more sense to just add an
API to share those, instead of the whole underlying hash.

Greetings,

Andres Freund


#5 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#3)
Re: Parallel bitmap heap scan

On Tue, Oct 18, 2016 at 1:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:

But what's the impact on performance? Presumably parallel bitmap heap
scan was already slower than the non-parallel version, and that commit
presumably widens the gap. Seems like something to worry about...

I have checked the performance on my local machine, and there is no
impact on the gap.
If you look at the explain analyze output of all the queries that
benefited from parallel bitmap heap scan, the BitmapIndex node takes
very little time compared to the BitmapHeap node.

Actual execution time on head (before the efficient hash table patch), in ms:

        BitmapHeap node   BitmapIndex node
Q6      38997             6951
Q14     14516             569
Q15     28530             1442

Out of the 4 queries, Q4 is converted from a parallel seq scan to a
parallel bitmap scan, so there is no impact. For Q14 and Q15, the time
spent in the BitmapIndex node is < 5% of the time spent in the
BitmapHeap node. For Q6 it's 20%, but I did not see much impact on
this on my local machine. However, I will take complete performance
readings and post the data from my actual performance machine.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


#6 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andres Freund (#4)
Re: Parallel bitmap heap scan

On Tue, Oct 18, 2016 at 1:45 AM, Andres Freund <andres@anarazel.de> wrote:

I don't quite understand why the bitmap has to be parallel at all. As
far as I understand your approach as described here, the only thing that
needs to be shared are the iteration arrays. Since they never need to
be resized and such, it seems to make a lot more sense to just add an
API to share those, instead of the whole underlying hash.

You are right that we only share the iteration arrays. But the point
is that each entry of an iteration array is just a pointer to a hash
entry. So either we need to build the hash in shared memory (my
current approach), or we need to copy each hash element to a shared
location (which I think is going to be expensive).

Let me know if I am missing something.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


#7 Andres Freund
andres@anarazel.de
In reply to: Dilip Kumar (#5)
Re: Parallel bitmap heap scan

On 2016-10-19 09:43:10 +0530, Dilip Kumar wrote:

On Tue, Oct 18, 2016 at 1:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:

But what's the impact on performance? Presumably parallel bitmap heap
scan was already slower than the non-parallel version, and that commit
presumably widens the gap. Seems like something to worry about...

I have checked the performance in my local machine and there is no
impact on the gap.

Try measuring with something more heavy on bitmap scan time
itself. E.g.
SELECT SUM(l_extendedprice) FROM lineitem WHERE (l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date);
or similar. The tpch queries don't actually spend that much time in the
bitmapscan itself - the parallelization of the rest of the query is what
matters...

Greetings,

Andres Freund


#8 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andres Freund (#7)
Re: Parallel bitmap heap scan

On Wed, Oct 19, 2016 at 12:39 PM, Andres Freund <andres@anarazel.de> wrote:

Try measuring with something more heavy on bitmap scan time
itself. E.g.
SELECT SUM(l_extendedprice) FROM lineitem WHERE (l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date);
or similar. The tpch queries don't actually spend that much time in the
bitmapscan itself - the parallelization of the rest of the query is what
matters...

Yeah, I agree.

I have tested with this query; with the exact filter condition it was
choosing a parallel seq scan, so I modified the filter a bit and
tested.

Tested with all default configuration on my local machine. I will
generate more such test cases and do detailed testing on my
performance machine.

Explain Analyze results:
---------------------------------
On Head:
------------
postgres=# explain analyze SELECT SUM(l_extendedprice) FROM lineitem
WHERE (l_shipdate >= '1995-01-01'::date) AND (l_shipdate <=
'1996-03-31'::date);

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=848805.90..848805.91 rows=1 width=32) (actual
time=12440.165..12440.166 rows=1 loops=1)
-> Bitmap Heap Scan on lineitem (cost=143372.40..834833.25
rows=5589057 width=8) (actual time=1106.217..11183.722 rows=5678841
loops=1)
Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND
(l_shipdate <= '1996-03-31'::date))
Rows Removed by Index Recheck: 20678739
Heap Blocks: exact=51196 lossy=528664
-> Bitmap Index Scan on idx_lineitem_shipdate
(cost=0.00..141975.13 rows=5589057 width=0) (actual
time=1093.376..1093.376 rows=5678841 loops=1)
Index Cond: ((l_shipdate >= '1995-01-01'::date) AND
(l_shipdate <= '1996-03-31'::date))
Planning time: 0.185 ms
Execution time: 12440.819 ms
(9 rows)

After Patch:
---------------
postgres=# explain analyze SELECT SUM(l_extendedprice) FROM lineitem
WHERE (l_shipdate >= '1995-01-01'::date) AND (l_shipdate <=
'1996-03-31'::date);

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
---------
Finalize Aggregate (cost=792751.16..792751.17 rows=1 width=32)
(actual time=6660.157..6660.157 rows=1 loops=1)
-> Gather (cost=792750.94..792751.15 rows=2 width=32) (actual
time=6659.378..6660.117 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=791750.94..791750.95 rows=1
width=32) (actual time=6655.941..6655.941 rows=1 loops=3)
-> Parallel Bitmap Heap Scan on lineitem
(cost=143372.40..785929.00 rows=2328774 width=8) (actual
time=1980.797..6204.232 rows=1892947 loops=
3)
Recheck Cond: ((l_shipdate >= '1995-01-01'::date)
AND (l_shipdate <= '1996-03-31'::date))
Rows Removed by Index Recheck: 6930269
Heap Blocks: exact=17090 lossy=176443
-> Bitmap Index Scan on idx_lineitem_shipdate
(cost=0.00..141975.13 rows=5589057 width=0) (actual
time=1933.454..1933.454 rows=5678841
loops=1)
Index Cond: ((l_shipdate >=
'1995-01-01'::date) AND (l_shipdate <= '1996-03-31'::date))
Planning time: 0.207 ms
Execution time: 6669.195 ms
(13 rows)

Summary:
-> With the patch, overall execution is 2 times faster compared to head.
-> Bitmap creation with the patch is a bit slower compared to head,
and that's because of DHT vs. the efficient hash table.

I found one defect in the v2 patch that I introduced during the last
rebase. That is fixed in v3.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

parallel-bitmap-heap-scan-v3.patch (+1307 -220)
#9 Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#8)
Re: Parallel bitmap heap scan

On Wed, Oct 19, 2016 at 9:23 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Oct 19, 2016 at 12:39 PM, Andres Freund <andres@anarazel.de> wrote:

Try measuring with something more heavy on bitmap scan time
itself. E.g.
SELECT SUM(l_extendedprice) FROM lineitem WHERE (l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date);
or similar. The tpch queries don't actually spend that much time in the
bitmapscan itself - the parallelization of the rest of the query is what
matters...

Yeah, I agree.

I have tested with this query, with exact filter condition it was
taking parallel sequence scan, so I have modified the filter a bit and
tested.

Tested with all default configuration in my local machine. I think I
will generate more such test cases and do detail testing in my
performance machine.


Summary:
-> With patch overall execution is 2 time faster compared to head.
-> Bitmap creation with patch is bit slower compared to head and thats
because of DHT vs efficient hash table.

I think the impact of the slowness due to the Bitmap Index Scan is not
very visible here, as the time it takes is small compared to the
overall time. However, I think there is an advantage to using DHT, as
that will allow us to build the hash table with multiple workers using
a parallel index scan in the future.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#10 Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Amit Kapila (#9)
Re: Parallel bitmap heap scan

On 19 October 2016 at 09:47, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Oct 18, 2016 at 1:45 AM, Andres Freund <andres@anarazel.de> wrote:

I don't quite understand why the bitmap has to be parallel at all. As
far as I understand your approach as described here, the only thing that
needs to be shared are the iteration arrays. Since they never need to
be resized and such, it seems to make a lot more sense to just add an
API to share those, instead of the whole underlying hash.

You are right that we only share the iteration arrays. But the point
is that each entry of an iteration array is just a pointer to a hash
entry. So either we need to build the hash in shared memory (my
current approach), or we need to copy each hash element to a shared
location (I think this is going to be expensive).

While the discussion about the implementation for creating the shared
tidbitmap is going on, I am meanwhile starting with a review of the
bitmap heap scan part, i.e. nodeBitmapHeapscan.c, since this looks
mostly independent of the tidbitmap implementation.

Brief design idea:
-----------------------
#1. Shared TIDBitmap creation and initialization
The first worker to see the parallel bitmap state as PBM_INITIAL
becomes the leader and sets the state to PBM_INPROGRESS. All other
workers, seeing the state as PBM_INPROGRESS, wait for the leader to
complete the TIDBitmap.

#2. At this point the TIDBitmap is ready and all workers are awake.

As far as correctness is concerned, the logic where the first worker
becomes leader while others synchronously wait, looks good. Workers
get allocated right from the beginning even though they would stay
idle for some percentage of time (5-20% ?) , but I guess there is
nothing we can do about it with the current parallel query
infrastructure.

In pbms_is_leader() , I didn't clearly understand the significance of
the for-loop. If it is a worker, it can call
ConditionVariablePrepareToSleep() followed by
ConditionVariableSleep(). Once it comes out of
ConditionVariableSleep(), isn't it guaranteed that the leader has
finished the bitmap ? If yes, then it looks like it is not necessary
to again iterate and go back through the pbminfo->state checking.
Also, with this, variable queuedSelf also might not be needed. But I
might be missing something here. Not sure what happens if worker calls
ConditionVariable[Prepare]Sleep() but leader has already called
ConditionVariableBroadcast(). Does the for loop have something to do
with this ? But this can happen even with the current for-loop, it
seems.

#3. Bitmap processing (iterate and process the pages).
In this phase each worker iterates over the page and chunk arrays and
selects heap pages one by one. If prefetching is enabled there will be
two iterators. Since multiple workers are iterating over the same page
and chunk arrays we need a shared iterator, so we grab a spinlock and
iterate while holding it, so that each worker gets a different page to
process.

tbm_iterate() call under SpinLock :
For parallel tbm iteration, tbm_iterate() is called while SpinLock is
held. Generally we try to keep code inside Spinlock call limited to a
few lines, and that too without occurrence of a function call.
Although tbm_iterate() code itself looks safe under a spinlock, I was
checking if we can squeeze SpinlockAcquire() and SpinLockRelease()
closer to each other. One thought is : in tbm_iterate(), acquire the
SpinLock before the while loop that iterates over lossy chunks. Then,
if both chunk and per-page data remain, release spinlock just before
returning (the first return stmt). And then just before scanning
bitmap of an exact page, i.e. just after "if (iterator->spageptr <
tbm->npages)", save the page handle, increment iterator->spageptr,
release Spinlock, and then use the saved page handle to iterate over
the page bitmap.
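A rough standalone model of that narrower critical section may make
the idea concrete. Here a pthread mutex stands in for the PostgreSQL
spinlock, and the struct and function names are hypothetical, not
taken from the patch:

```c
#include <pthread.h>

/* Hypothetical model of the shared iterator state; field names follow
 * TBMIterator, but this struct and the helper below are illustrative
 * only, not the patch's code. */
typedef struct SharedIterator
{
    pthread_mutex_t mutex;      /* stand-in for the spinlock */
    int             spageptr;   /* next exact page to hand out */
    int             npages;     /* total exact pages in the bitmap */
} SharedIterator;

/* Claim the next page index under the lock, then let the caller scan
 * that page's bitmap outside the lock.  Each caller gets a different
 * page; returns -1 when the pages are exhausted. */
static int
shared_iterator_next(SharedIterator *it)
{
    int         page = -1;

    pthread_mutex_lock(&it->mutex);     /* SpinLockAcquire() */
    if (it->spageptr < it->npages)
        page = it->spageptr++;
    pthread_mutex_unlock(&it->mutex);   /* SpinLockRelease() */

    return page;                /* per-page work happens lock-free */
}
```

The point is that only the pointer increment sits inside the critical
section; the per-page processing happens after the lock is released.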

prefetch_pages() call under Spinlock :
Here again, prefetch_pages() is called while pbminfo->prefetch_mutex
Spinlock is held. Effectively, heavy functions like PrefetchBuffer()
would get called while under the Spinlock. These can even ereport().
One option is to use mutex lock for this purpose. But I think that
would slow things down. Moreover, the complete set of prefetch pages
would be scanned by a single worker, and others might wait for this
one. Instead, what I am thinking is: grab the pbminfo->prefetch_mutex
Spinlock only while incrementing pbminfo->prefetch_pages. The rest
part viz : iterating over the prefetch pages, and doing the
PrefetchBuffer() need not be synchronised using this
pgbinfo->prefetch_mutex Spinlock. pbms_parallel_iterate() already has
its own iterator spinlock. Only thing is, workers may not do the
actual PrefetchBuffer() sequentially. One of them might shoot ahead
and prefetch 3-4 pages while the other is lagging with the
sequentially lesser page number; but I believe this is fine, as long
as they all prefetch all the required blocks.
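A standalone sketch of that split — hold the mutex only to claim the
next block, issue the expensive prefetch with no lock held — might
look like this; PrefetchState, prefetch_block() and the field names
are hypothetical stand-ins, not the patch's structures:

```c
#include <pthread.h>

/* Hypothetical model of the shared prefetch bookkeeping. */
typedef struct PrefetchState
{
    pthread_mutex_t mutex;      /* stand-in for prefetch_mutex */
    int             prefetch_pages;   /* blocks claimed so far */
    int             prefetch_target;  /* how far ahead to prefetch */
} PrefetchState;

/* Stand-in for the expensive PrefetchBuffer() call that must NOT run
 * under the spinlock. */
static void prefetch_block(int blockno) { (void) blockno; }

static void
prefetch_pages_unlocked(PrefetchState *ps)
{
    for (;;)
    {
        int         blockno;

        /* Short critical section: just claim the next block number. */
        pthread_mutex_lock(&ps->mutex);
        if (ps->prefetch_pages >= ps->prefetch_target)
        {
            pthread_mutex_unlock(&ps->mutex);
            break;
        }
        blockno = ps->prefetch_pages++;
        pthread_mutex_unlock(&ps->mutex);

        /* Expensive work with no lock held; workers may issue these
         * out of order, which is fine for prefetching. */
        prefetch_block(blockno);
    }
}
```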

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Khandekar (#10)
Re: Parallel bitmap heap scan

On Fri, Nov 18, 2016 at 9:59 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Thanks for the review..

In pbms_is_leader() , I didn't clearly understand the significance of
the for-loop. If it is a worker, it can call
ConditionVariablePrepareToSleep() followed by
ConditionVariableSleep(). Once it comes out of
ConditionVariableSleep(), isn't it guaranteed that the leader has
finished the bitmap ? If yes, then it looks like it is not necessary
to again iterate and go back through the pbminfo->state checking.
Also, with this, variable queuedSelf also might not be needed. But I
might be missing something here.

I have taken this logic from the example posted on the condition
variable thread [1]:

for (;;)
{
    ConditionVariablePrepareToSleep(cv);
    if (condition for which we are waiting is satisfied)
        break;
    ConditionVariableSleep();
}
ConditionVariableCancelSleep();

[1]: /messages/by-id/CA+Tgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr=C56Xng@mail.gmail.com

So it appears to me that even after we come out of
ConditionVariableSleep(), we need to re-verify our condition and only
then break.

Not sure what happens if worker calls

ConditionVariable[Prepare]Sleep() but leader has already called
ConditionVariableBroadcast(). Does the for loop have something to do
with this ? But this can happen even with the current for-loop, it
seems.

Even if the leader has already called ConditionVariableBroadcast,
after ConditionVariablePrepareToSleep we check the condition again
before calling ConditionVariableSleep, and that condition check is
done under SpinLockAcquire(&pbminfo->state_mutex).

However, I think there is one problem in my code (perhaps the same one
you are pointing at): after ConditionVariablePrepareToSleep, if
pbminfo->state is already PBM_FINISHED, I am not resetting needWait to
false, which can lead to a problem.
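The corrected wait logic might look like the following standalone
model, where a pthread mutex/condvar stand in for the spinlock and
ConditionVariable; ParallelBitmapModel and pbms_is_leader_model() are
hypothetical names for illustration only:

```c
#include <pthread.h>
#include <stdbool.h>

typedef enum { PBM_INITIAL, PBM_INPROGRESS, PBM_FINISHED } PBMState;

/* Hypothetical model of the shared parallel bitmap state. */
typedef struct ParallelBitmapModel
{
    pthread_mutex_t state_mutex;    /* stand-in for the spinlock */
    pthread_cond_t  cv;             /* stand-in for the CV */
    PBMState        state;
} ParallelBitmapModel;

/* Returns true if the caller becomes the leader; otherwise waits
 * until the leader has set PBM_FINISHED.  The key fix: after checking
 * the state, if it is already PBM_FINISHED we must clear needWait
 * instead of going to sleep. */
static bool
pbms_is_leader_model(ParallelBitmapModel *pbm)
{
    bool        leader = false;
    bool        needWait = true;

    pthread_mutex_lock(&pbm->state_mutex);
    while (needWait)
    {
        if (pbm->state == PBM_INITIAL)
        {
            pbm->state = PBM_INPROGRESS;    /* we build the bitmap */
            leader = true;
            needWait = false;
        }
        else if (pbm->state == PBM_FINISHED)
            needWait = false;   /* bitmap already built: don't sleep */
        else
            pthread_cond_wait(&pbm->cv, &pbm->state_mutex);
    }
    pthread_mutex_unlock(&pbm->state_mutex);

    return leader;
}
```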

#3. Bitmap processing (iterate and process the pages).
In this phase each worker iterates over the page and chunk arrays and
selects heap pages one by one. If prefetching is enabled there will be
two iterators. Since multiple workers are iterating over the same page
and chunk arrays we need a shared iterator, so we grab a spinlock and
iterate while holding it, so that each worker gets a different page to
process.

tbm_iterate() call under SpinLock :
For parallel tbm iteration, tbm_iterate() is called while SpinLock is
held. Generally we try to keep code inside Spinlock call limited to a
few lines, and that too without occurrence of a function call.
Although tbm_iterate() code itself looks safe under a spinlock, I was
checking if we can squeeze SpinlockAcquire() and SpinLockRelease()
closer to each other. One thought is : in tbm_iterate(), acquire the
SpinLock before the while loop that iterates over lossy chunks. Then,
if both chunk and per-page data remain, release spinlock just before
returning (the first return stmt). And then just before scanning
bitmap of an exact page, i.e. just after "if (iterator->spageptr <
tbm->npages)", save the page handle, increment iterator->spageptr,
release Spinlock, and then use the saved page handle to iterate over
the page bitmap.

The main reason for keeping the spinlock out of this function was to
avoid changes inside it; also, this function takes a local iterator as
input, which has no reference to the spinlock. But that can be
changed: we can pass the shared iterator to it.

I will think about this logic and try to address it in the next
version.

prefetch_pages() call under Spinlock :
Here again, prefetch_pages() is called while pbminfo->prefetch_mutex
Spinlock is held. Effectively, heavy functions like PrefetchBuffer()
would get called while under the Spinlock. These can even ereport().
One option is to use mutex lock for this purpose. But I think that
would slow things down. Moreover, the complete set of prefetch pages
would be scanned by a single worker, and others might wait for this
one. Instead, what I am thinking is: grab the pbminfo->prefetch_mutex
Spinlock only while incrementing pbminfo->prefetch_pages. The rest
part viz : iterating over the prefetch pages, and doing the
PrefetchBuffer() need not be synchronised using this
pgbinfo->prefetch_mutex Spinlock. pbms_parallel_iterate() already has
its own iterator spinlock. Only thing is, workers may not do the
actual PrefetchBuffer() sequentially. One of them might shoot ahead
and prefetch 3-4 pages while the other is lagging with the
sequentially lesser page number; but I believe this is fine, as long
as they all prefetch all the required blocks.

I agree with your point, will try to fix this as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


#12 Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#8)
Re: Parallel bitmap heap scan

On Wed, Oct 19, 2016 at 11:53 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

I found one defect in v2 patch, that I induced during last rebasing.
That is fixed in v3.

So, I had a brief look at this tonight. This is not a full review,
but just some things I noticed:

+ * Update snpashot info in heap scan descriptor.

Typo. Also, why should we have a function for this at all? And if we
do have a function for this, why should it have "bm" in the name when
it's stored in heapam.c?

+ *    [PARALLEL BITMAP HEAP SCAN ALGORITHM]
+ *
+ *    #1. Shared TIDBitmap creation and initialization
+ *        a) First worker to see the state as parallel bitmap info as
+ *        PBM_INITIAL become leader and set the state to PBM_INPROGRESS
+ *        All other workers see the state as PBM_INPROGRESS will wait for
+ *        leader to complete the TIDBitmap.
+ *
+ *        Leader Worker Processing:
+ *        (Leader is responsible for creating shared TIDBitmap and create
+ *        shared page and chunk array from TIDBitmap.)
+ *            1) Create TIDBitmap using DHT.
+ *            2) Begin Iterate: convert hash table into shared page and chunk
+ *            array.
+ *            3) Restore local TIDBitmap variable information into
+ *            ParallelBitmapInfo so that other worker can see those.
+ *            4) set state to PBM_FINISHED.
+ *            5) Wake up other workers.
+ *
+ *        Other Worker Processing:
+ *            1) Wait until leader create shared TIDBitmap and shared page
+ *            and chunk array.
+ *            2) Attach to shared page table, copy TIDBitmap from
+ *            ParallelBitmapInfo to local TIDBitmap, we copy this to local
+ *            TIDBitmap so that next level processing can read information
+ *            same as in non parallel case and we can avoid extra changes
+ *            in code.
+ *
+ *    # At this level TIDBitmap is ready and all workers are awake #
+ *
+ *    #2. Bitmap processing (Iterate and process the pages).
+ *        . In this phase each worker will iterate over page and
chunk array and
+ *        select heap pages one by one. If prefetch is enable then there will
+ *        be two iterator.
+ *        . Since multiple worker are iterating over same page and chunk array
+ *        we need to have a shared iterator, so we grab a spin lock and iterate
+ *        within a lock.

The formatting of this comment is completely haphazard. "Leader
worker" is not a term that has any meaning. A given backend involved
in a parallel query is either a leader or a worker, not both.

+    /* reset parallel bitmap scan info, if present */
+    if (node->parallel_bitmap)
+    {
+        ParallelBitmapInfo *pbminfo = node->parallel_bitmap;
+
+        pbminfo->state = PBM_INITIAL;
+        pbminfo->tbmiterator.schunkbit = 0;
+        pbminfo->tbmiterator.spageptr = 0;
+        pbminfo->tbmiterator.schunkptr = 0;
+        pbminfo->prefetch_iterator.schunkbit = 0;
+        pbminfo->prefetch_iterator.spageptr = 0;
+        pbminfo->prefetch_iterator.schunkptr = 0;
+        pbminfo->prefetch_pages = 0;
+        pbminfo->prefetch_target = -1;
+    }

This is obviously not going to work in the face of concurrent
activity. When we did Parallel Seq Scan, we didn't make any changes
to the rescan code at all. I think we are assuming that there's no
way to cause a rescan of the driving table of a parallel query; if
that's wrong, we'll need some fix, but this isn't it. I would just
leave this out.

+static bool
+pbms_is_leader(ParallelBitmapInfo *pbminfo)

I think you should see if you can use Thomas Munro's barrier stuff for
this instead.

+    SerializeSnapshot(estate->es_snapshot, pbminfo->phs_snapshot_data);
+
+    shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pbminfo);
+
+    node->parallel_bitmap = pbminfo;
+    snapshot = RestoreSnapshot(pbminfo->phs_snapshot_data);
+
+    heap_bm_update_snapshot(node->ss.ss_currentScanDesc, snapshot);

This doesn't make any sense. You serialize the snapshot from the
estate, then restore it, then shove it into the scan descriptor. But
presumably that's already the snapshot the scan descriptor is using.
The workers need to do this, perhaps, but not the leader!

+ dht_parameters params = {0};

Not PostgreSQL style.

+#define TBM_IS_SHARED(tbm) (tbm)->shared

Seems pointless.

+ bool shared; /* need to build shared tbm if set*/

Style.

+ params.tranche_id = LWLockNewTrancheId();

You absolutely, positively cannot burn through tranche IDs like this.

+    if (tbm->shared_pagetable)
+        dht_detach(tbm->shared_pagetable);

Hmm, would we leak references if we errored out?

@@ -47,7 +47,6 @@ typedef enum

static List *translate_sub_tlist(List *tlist, int relid);

-
/*****************************************************************************
* MISC. PATH UTILITIES
*****************************************************************************/

Useless whitespace change.

@@ -23,7 +23,6 @@
#include "utils/relcache.h"
#include "utils/snapshot.h"

-
/* "options" flag bits for heap_insert */
#define HEAP_INSERT_SKIP_WAL 0x0001
#define HEAP_INSERT_SKIP_FSM 0x0002

Useless whitespace change.

WAIT_EVENT_MQ_RECEIVE,
WAIT_EVENT_MQ_SEND,
WAIT_EVENT_PARALLEL_FINISH,
+ WAIT_EVENT_PARALLEL_BITMAP_SCAN,
WAIT_EVENT_SAFE_SNAPSHOT,
WAIT_EVENT_SYNC_REP

Missing a documentation update.

In general, the amount of change in nodeBitmapHeapScan.c seems larger
than I would have expected. My copy of that file has 655 lines; this
patch adds 544 additional lines. I think/hope that some of that can
be simplified away.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#13 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#12)
Re: Parallel bitmap heap scan

On Wed, Nov 23, 2016 at 7:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:

So, I had a brief look at this tonight. This is not a full review,
but just some things I noticed:

Thanks for the review..

+ * Update snpashot info in heap scan descriptor.

Typo. Also, why should we have a function for this at all? And if we
do have a function for this, why should it have "bm" in the name when
it's stored in heapam.c?

We are updating the snapshot in HeapScanDesc; that's why I used a
function and kept it in heapam.c, like the other function
heap_beginscan_bm. But I agree we can change its name: "bm" is not
needed, since this function doesn't do anything specific to bitmap
scans. I will change this.

+ *    [PARALLEL BITMAP HEAP SCAN ALGORITHM]
+ *
+ *    #1. Shared TIDBitmap creation and initialization
+ *        a) First worker to see the state as parallel bitmap info as
+ *        PBM_INITIAL become leader and set the state to PBM_INPROGRESS
+ *        All other workers see the state as PBM_INPROGRESS will wait for
+ *        leader to complete the TIDBitmap.
+ *
+ *        Leader Worker Processing:
+ *        (Leader is responsible for creating shared TIDBitmap and create
+ *        shared page and chunk array from TIDBitmap.)
+ *            1) Create TIDBitmap using DHT.
+ *            2) Begin Iterate: convert hash table into shared page and chunk
+ *            array.
+ *            3) Restore local TIDBitmap variable information into
+ *            ParallelBitmapInfo so that other worker can see those.
+ *            4) set state to PBM_FINISHED.
+ *            5) Wake up other workers.
+ *
+ *        Other Worker Processing:
+ *            1) Wait until leader create shared TIDBitmap and shared page
+ *            and chunk array.
+ *            2) Attach to shared page table, copy TIDBitmap from
+ *            ParallelBitmapInfo to local TIDBitmap, we copy this to local
+ *            TIDBitmap so that next level processing can read information
+ *            same as in non parallel case and we can avoid extra changes
+ *            in code.
+ *
+ *    # At this level TIDBitmap is ready and all workers are awake #
+ *
+ *    #2. Bitmap processing (Iterate and process the pages).
+ *        . In this phase each worker will iterate over page and
chunk array and
+ *        select heap pages one by one. If prefetch is enable then there will
+ *        be two iterator.
+ *        . Since multiple worker are iterating over same page and chunk array
+ *        we need to have a shared iterator, so we grab a spin lock and iterate
+ *        within a lock.

The formatting of this comment is completely haphazard.

I will fix this..

"Leader

worker" is not a term that has any meaning. A given backend involved
in a parallel query is either a leader or a worker, not both.

I agree this is confusing, but we can't simply call it a leader,
because IMHO the leader is the backend that actually spawns all the
workers and gathers the results. What I meant by "leader worker" is
the worker that takes responsibility for building the tidbitmap, which
can be any worker; hence the name. Let me think about what we could
call it instead.

+    /* reset parallel bitmap scan info, if present */
+    if (node->parallel_bitmap)
+    {
+        ParallelBitmapInfo *pbminfo = node->parallel_bitmap;
+
+        pbminfo->state = PBM_INITIAL;
+        pbminfo->tbmiterator.schunkbit = 0;
+        pbminfo->tbmiterator.spageptr = 0;
+        pbminfo->tbmiterator.schunkptr = 0;
+        pbminfo->prefetch_iterator.schunkbit = 0;
+        pbminfo->prefetch_iterator.spageptr = 0;
+        pbminfo->prefetch_iterator.schunkptr = 0;
+        pbminfo->prefetch_pages = 0;
+        pbminfo->prefetch_target = -1;
+    }

This is obviously not going to work in the face of concurrent
activity. When we did Parallel Seq Scan, we didn't make any changes
to the rescan code at all. I think we are assuming that there's no
way to cause a rescan of the driving table of a parallel query; if
that's wrong, we'll need some fix, but this isn't it. I would just
leave this out.

We have a similar change in the heap_rescan function:

if (scan->rs_parallel != NULL)
{
    parallel_scan = scan->rs_parallel;
    SpinLockAcquire(&parallel_scan->phs_mutex);
    parallel_scan->phs_cblock = parallel_scan->phs_startblock;
    SpinLockRelease(&parallel_scan->phs_mutex);
}

And this is not for the driving table; it's required for the case
when the Gather is the inner node of a join. Consider the example
below. I know it's a bad plan, but we can produce it :)

Outer Node                Inner Node
SeqScan(t1)     NLJ     (Gather -> Parallel SeqScan (t2))

The reason is that during ExecReScanGather we don't recreate a new
DSM; instead we just reinitialize the existing one
(ExecParallelReinitialize).

+static bool
+pbms_is_leader(ParallelBitmapInfo *pbminfo)

I think you should see if you can use Thomas Munro's barrier stuff for
this instead.

Okay, I will think over it.

+    SerializeSnapshot(estate->es_snapshot, pbminfo->phs_snapshot_data);
+
+    shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pbminfo);
+
+    node->parallel_bitmap = pbminfo;
+    snapshot = RestoreSnapshot(pbminfo->phs_snapshot_data);
+
+    heap_bm_update_snapshot(node->ss.ss_currentScanDesc, snapshot);

This doesn't make any sense. You serialize the snapshot from the
estate, then restore it, then shove it into the scan descriptor. But
presumably that's already the snapshot the scan descriptor is using.
The workers need to do this, perhaps, but not the leader!

This is wrong, will fix this.

+ dht_parameters params = {0};

Not PostgreSQL style.

I will fix..

+#define TBM_IS_SHARED(tbm) (tbm)->shared

Seems pointless.

Ok..

+ bool shared; /* need to build shared tbm if set*/

Style.

Ok.

+ params.tranche_id = LWLockNewTrancheId();

You absolutely, positively cannot burn through tranche IDs like this.

+    if (tbm->shared_pagetable)
+        dht_detach(tbm->shared_pagetable);

Hmm, would we leak references if we errored out?

I will check this part. Anyway, in my next version I am working on
making the patch independent of DHT, so this part will be removed.

@@ -47,7 +47,6 @@ typedef enum

static List *translate_sub_tlist(List *tlist, int relid);

-
/*****************************************************************************
* MISC. PATH UTILITIES
*****************************************************************************/

Useless whitespace change.

@@ -23,7 +23,6 @@
#include "utils/relcache.h"
#include "utils/snapshot.h"

-
/* "options" flag bits for heap_insert */
#define HEAP_INSERT_SKIP_WAL 0x0001
#define HEAP_INSERT_SKIP_FSM 0x0002

Useless whitespace change.

WAIT_EVENT_MQ_RECEIVE,
WAIT_EVENT_MQ_SEND,
WAIT_EVENT_PARALLEL_FINISH,
+ WAIT_EVENT_PARALLEL_BITMAP_SCAN,
WAIT_EVENT_SAFE_SNAPSHOT,
WAIT_EVENT_SYNC_REP

Missing a documentation update.

I will fix these, in next version.

In general, the amount of change in nodeBitmapHeapScan.c seems larger
than I would have expected. My copy of that file has 655 lines; this
patch adds 544 additional lines. I think/hope that some of that can
be simplified away.

I will work on this.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


#14 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#13)
Re: Parallel bitmap heap scan

On Wed, Nov 23, 2016 at 12:31 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

I tried to address these comments in my new version. All comments are
fixed except the ones below.

+ *
+ *    #2. Bitmap processing (Iterate and process the pages).
+ *        . In this phase each worker will iterate over page and
chunk array and
+ *        select heap pages one by one. If prefetch is enable then there will
+ *        be two iterator.
+ *        . Since multiple worker are iterating over same page and chunk array
+ *        we need to have a shared iterator, so we grab a spin lock and iterate
+ *        within a lock.

The formatting of this comment is completely haphazard.

I will fix this..

I have changed the formatting and also moved the algorithm description
inside the function body. I am not sure whether this meets your
expectations or whether we should change it further.

+    /* reset parallel bitmap scan info, if present */
+    if (node->parallel_bitmap)
+    {
+        ParallelBitmapInfo *pbminfo = node->parallel_bitmap;
+
+        pbminfo->state = PBM_INITIAL;
+        pbminfo->tbmiterator.schunkbit = 0;
+        pbminfo->tbmiterator.spageptr = 0;
+        pbminfo->tbmiterator.schunkptr = 0;
+        pbminfo->prefetch_iterator.schunkbit = 0;
+        pbminfo->prefetch_iterator.spageptr = 0;
+        pbminfo->prefetch_iterator.schunkptr = 0;
+        pbminfo->prefetch_pages = 0;
+        pbminfo->prefetch_target = -1;
+    }

This is obviously not going to work in the face of concurrent
activity. When we did Parallel Seq Scan, we didn't make any changes
to the rescan code at all. I think we are assuming that there's no
way to cause a rescan of the driving table of a parallel query; if
that's wrong, we'll need some fix, but this isn't it. I would just
leave this out.

We have a similar change in the heap_rescan function:

if (scan->rs_parallel != NULL)
{
    parallel_scan = scan->rs_parallel;
    SpinLockAcquire(&parallel_scan->phs_mutex);
    parallel_scan->phs_cblock = parallel_scan->phs_startblock;
    SpinLockRelease(&parallel_scan->phs_mutex);
}

And this is not for the driving table; it's required for the case
when the Gather is the inner node of a join. Consider the example
below. I know it's a bad plan, but we can produce it :)

Outer Node                Inner Node
SeqScan(t1)     NLJ     (Gather -> Parallel SeqScan (t2))

The reason is that during ExecReScanGather we don't recreate a new
DSM; instead we just reinitialize the existing one
(ExecParallelReinitialize).

This is not fixed; the reason is explained above.

+static bool
+pbms_is_leader(ParallelBitmapInfo *pbminfo)

I think you should see if you can use Thomas Munro's barrier stuff for
this instead.

Okay, I will think over it.

IMHO, a barrier is used when multiple workers are doing some work
together in phase 1, and all of them have to complete phase 1 before
moving on; we put in a barrier so that everyone crosses it before
starting the next phase.

But here the case is different: only one worker needs to finish phase
1, and as soon as it's done, everyone can start phase 2. We have no
requirement that all workers cross a certain barrier; even if some
workers have not started yet, the others can do their work.

In general, the amount of change in nodeBitmapHeapScan.c seems larger
than I would have expected. My copy of that file has 655 lines; this
patch adds 544 additional lines. I think/hope that some of that can
be simplified away.

I will work on this.

I have removed some functions that were not actually required and
merged their code into the main function. That reduced the patch by
almost 100 lines.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


#15 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#11)
Re: Parallel bitmap heap scan

On Tue, Nov 22, 2016 at 9:05 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Nov 18, 2016 at 9:59 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Thanks for the review..

I have worked on these comments..

In pbms_is_leader() , I didn't clearly understand the significance of
the for-loop. If it is a worker, it can call
ConditionVariablePrepareToSleep() followed by
ConditionVariableSleep(). Once it comes out of
ConditionVariableSleep(), isn't it guaranteed that the leader has
finished the bitmap ? If yes, then it looks like it is not necessary
to again iterate and go back through the pbminfo->state checking.
Also, with this, variable queuedSelf also might not be needed. But I
might be missing something here.

I have taken this logic from the example posted on the condition
variable thread [1]:

for (;;)
{
    ConditionVariablePrepareToSleep(cv);
    if (condition for which we are waiting is satisfied)
        break;
    ConditionVariableSleep();
}
ConditionVariableCancelSleep();

[1] /messages/by-id/CA+Tgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr=C56Xng@mail.gmail.com

So it appears to me that even after we come out of
ConditionVariableSleep(), we need to re-verify our condition and only
then break.

Not sure what happens if worker calls

ConditionVariable[Prepare]Sleep() but leader has already called
ConditionVariableBroadcast(). Does the for loop have something to do
with this ? But this can happen even with the current for-loop, it
seems.

Even if the leader has already called ConditionVariableBroadcast,
after ConditionVariablePrepareToSleep we check the condition again
before calling ConditionVariableSleep, and that condition check is
done under SpinLockAcquire(&pbminfo->state_mutex).

However, I think there is one problem in my code (perhaps the same one
you are pointing at): after ConditionVariablePrepareToSleep, if
pbminfo->state is already PBM_FINISHED, I am not resetting needWait to
false, which can lead to a problem.

I have fixed the defect I mentioned above.

#3. Bitmap processing (iterate and process the pages).
In this phase each worker iterates over the page and chunk arrays and
selects heap pages one by one. If prefetching is enabled there will be
two iterators. Since multiple workers are iterating over the same page
and chunk arrays we need a shared iterator, so we grab a spinlock and
iterate while holding it, so that each worker gets a different page to
process.

tbm_iterate() call under SpinLock :
For parallel tbm iteration, tbm_iterate() is called while SpinLock is
held. Generally we try to keep code inside Spinlock call limited to a
few lines, and that too without occurrence of a function call.
Although tbm_iterate() code itself looks safe under a spinlock, I was
checking if we can squeeze SpinlockAcquire() and SpinLockRelease()
closer to each other. One thought is : in tbm_iterate(), acquire the
SpinLock before the while loop that iterates over lossy chunks. Then,
if both chunk and per-page data remain, release spinlock just before
returning (the first return stmt). And then just before scanning
bitmap of an exact page, i.e. just after "if (iterator->spageptr <
tbm->npages)", save the page handle, increment iterator->spageptr,
release Spinlock, and then use the saved page handle to iterate over
the page bitmap.

The main reason for keeping the spinlock out of this function was to
avoid changes inside it; also, this function takes a local iterator as
input, which has no reference to the spinlock. But that can be
changed: we can pass the shared iterator to it.

I will think about this logic and try to address it in the next
version.

This issue is still not addressed. The logic inside tbm_iterate uses
the same variables, like spageptr, in multiple places; IMHO this
complete logic needs to be done under one spinlock.

prefetch_pages() call under Spinlock :
Here again, prefetch_pages() is called while pbminfo->prefetch_mutex
Spinlock is held. Effectively, heavy functions like PrefetchBuffer()
would get called while under the Spinlock. These can even ereport().
One option is to use mutex lock for this purpose. But I think that
would slow things down. Moreover, the complete set of prefetch pages
would be scanned by a single worker, and others might wait for this
one. Instead, what I am thinking is: grab the pbminfo->prefetch_mutex
Spinlock only while incrementing pbminfo->prefetch_pages. The rest
part viz : iterating over the prefetch pages, and doing the
PrefetchBuffer() need not be synchronised using this
pgbinfo->prefetch_mutex Spinlock. pbms_parallel_iterate() already has
its own iterator spinlock. Only thing is, workers may not do the
actual PrefetchBuffer() sequentially. One of them might shoot ahead
and prefetch 3-4 pages while the other is lagging with the
sequentially lesser page number; but I believe this is fine, as long
as they all prefetch all the required blocks.

I agree with your point, will try to fix this as well.

I have fixed this part.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


#16 Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#14)
Re: Parallel bitmap heap scan

On Sat, Nov 26, 2016 at 7:40 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

IMHO, a barrier is used when multiple workers are doing some work
together in phase 1, and all of them have to complete phase 1 before
moving on; we put in a barrier so that everyone crosses it before
starting the next phase.

But here the case is different: only one worker needs to finish phase
1, and as soon as it's done, everyone can start phase 2. We have no
requirement that all workers cross a certain barrier; even if some
workers have not started yet, the others can do their work.

I think the Barrier stuff has a process for choosing one worker to
conduct a particular phase. So it seems like if the Barrier API is
well-designed, you should be able to use it to decide who will conduct
the index scan, and then when that's done everyone can proceed to
scanning the heap. If that can't work for some reason, Thomas should
probably adjust his API so it does. He's presenting that as a
generally-useful primitive...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#17 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#16)
Re: Parallel bitmap heap scan

On Sun, Nov 27, 2016 at 3:15 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I think the Barrier stuff has a process for choosing one worker to
conduct a particular phase. So it seems like if the Barrier API is
well-designed, you should be able to use it to decide who will conduct
the index scan, and then when that's done everyone can proceed to
scanning the heap. If that can't work for some reason, Thomas should
probably adjust his API so it does. He's presenting that as a
generally-useful primitive...

If I understand the barrier API correctly, it has two parts:
1. BarrierInit 2. BarrierWait.

1. In BarrierInit we define how many workers (say nworkers) must reach
the barrier before anyone is allowed past BarrierWait.

2. BarrierWait actually makes the calling process wait until
BarrierWait has been called nworkers times.

So I am not very clear on this. If we call BarrierInit with nworkers=1,
the first question is when we should call BarrierWait, because as soon
as we call BarrierWait the count reaches 1 and everyone is allowed to
proceed; so obviously it should be called once the bitmap is ready.

The second question is, if it's called only after the bitmap is ready,
what about the other processes; how are they supposed to wait until the
bitmap is ready? If they wait using BarrierWait, that again makes the
count 1 and everyone is allowed to proceed, which doesn't seem correct.

Correct me if I am missing something?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#18Thomas Munro
thomas.munro@gmail.com
In reply to: Dilip Kumar (#17)
Re: Parallel bitmap heap scan

On Sun, Nov 27, 2016 at 3:34 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sun, Nov 27, 2016 at 3:15 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I think the Barrier stuff has a process for choosing one worker to
conduct a particular phase. So it seems like if the Barrier API is
well-designed, you should be able to use it to decide who will conduct
the index scan, and then when that's done everyone can proceed to
scanning the heap. If that can't work for some reason, Thomas should
probably adjust his API so it does. He's presenting that as a
generally-useful primitive...

If I understand the barrier API correctly, it has two parts:
1. BarrierInit 2. BarrierWait.

1. In BarrierInit we define how many workers (say nworkers) must reach
the barrier before anyone is allowed past BarrierWait.

2. BarrierWait actually makes the calling process wait until
BarrierWait has been called nworkers times.

So I am not very clear on this. If we call BarrierInit with nworkers=1,
the first question is when we should call BarrierWait, because as soon
as we call BarrierWait the count reaches 1 and everyone is allowed to
proceed; so obviously it should be called once the bitmap is ready.

The second question is, if it's called only after the bitmap is ready,
what about the other processes; how are they supposed to wait until the
bitmap is ready? If they wait using BarrierWait, that again makes the
count 1 and everyone is allowed to proceed, which doesn't seem correct.

Correct me if I am missing something?

I'm not sure if it's the right tool for this job or not and haven't
studied this patch yet. I will. But here is one way to use barrier.c
for something like this, based on the description above. It's
slightly more complicated than you said because you don't know whether
the leader is going to participate or how many of the planned workers
will actually be able to start up, so there would be no way to provide
that 'participants' argument to BarrierInit and any given participant
might already have missed some of the 'BarrierWait' calls by the time
it starts running, so merely calling BarrierWait the right number of
times isn't enough to stay in sync. So instead you do this:

#define PBS_PHASE_INIT 0
#define PBS_PHASE_BUILDING 1
#define PBS_PHASE_SCANNING 2

Initialise the barrier with BarrierInit(&something->barrier, 0), which
says that you don't know how many participants there will be.

Somewhere in each participant you need to do this exactly once:

BarrierAttach(&something->barrier);

I think you need to track whether you've called that yet and do so on
demand in your ExecBitmapHeap function. You can't just do it in
ExecBitmapHeapInitializeWorker because the leader needs to do it too,
but *only* if it runs the plan. Then you need something like this:

switch (BarrierPhase(&something->barrier))
{
    case PBS_PHASE_INIT:
        if (BarrierWait(&something->barrier, WAIT_EVENT_PBS_PHASE_INIT))
        {
            /* Serial phase that will run in only one chosen participant. */
            build_the_bitmap();
        }
        /* Fall through. */

    case PBS_PHASE_BUILDING:
        BarrierWait(&something->barrier, WAIT_EVENT_PBS_PHASE_BUILDING);
        /* Fall through. */

    case PBS_PHASE_SCANNING:
        scan_the_bitmap_and_emit_one_tuple();
}

When a new participant arrives here, if it finds that we're still in
the INIT phase, then it enters an election to see if it can build the
bitmap; one lucky participant wins and does that, while any other
participants twiddle their thumbs at the next BarrierWait call. If a
new participant finds that we're already in the BUILDING phase when it
arrives, then it has missed that election and just has to wait for the
building to be completed. Once they all agree that building has
finished, we move onto scanning. If a new arrival finds that we're in
SCANNING phase, then it happily scans and emits tuples. Does that
make sense?

Not sure exactly how to coordinate rescans yet, but probably with
BarrierWaitSet(&something->barrier, PBS_PHASE_INIT).

--
Thomas Munro
http://www.enterprisedb.com

#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Thomas Munro (#18)
Re: Parallel bitmap heap scan

On Mon, Nov 28, 2016 at 8:11 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

When a new participant arrives here, if it finds that we're still in
the INIT phase, then it enters an election to see if it can build the
bitmap; one lucky participant wins and does that, while any other
participants twiddle their thumbs at the next BarrierWait call. If a
new participant finds that we're already in the BUILDING phase when it
arrives, then it has missed that election and just has to wait for the
building to be completed. Once they all agree that building has
finished, we move onto scanning. If a new arrival finds that we're in
SCANNING phase, then it happily scans and emits tuples. Does that
make sense?

Not sure exactly how to coordinate rescans yet, but probably with
BarrierWaitSet(&something->barrier, PBS_PHASE_INIT).

Do you think that using barriers will simplify the patch as compared
to using condition variables? If so, it would make sense to use
barriers.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#20Thomas Munro
thomas.munro@gmail.com
In reply to: Amit Kapila (#19)
Re: Parallel bitmap heap scan

On Mon, Nov 28, 2016 at 3:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Do you think that using barrier's will simplify the patch as compared
to using condition variables because in that case, it will make sense
to use barriers?

It would work, but I suppose you might call it overkill. If they were
cooperating to build the bitmap in parallel then a barrier might look
more tempting, because then they'd all be waiting for each other to
agree that they've all finished doing that and are ready to scan.
When they're all just waiting for one guy to flip a single bit, then
it's debatable whether a barrier is any simpler than a condition
variable + a spinlock + a bit!

--
Thomas Munro
http://www.enterprisedb.com

#21Dilip Kumar
dilipbalaut@gmail.com
In reply to: Thomas Munro (#20)
#22Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#1)
#23Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Dilip Kumar (#22)
#24Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#22)
#25Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#24)
#26Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#25)
#27Andres Freund
andres@anarazel.de
In reply to: Dilip Kumar (#26)
#28Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andres Freund (#27)
#29Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#28)
#30Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Dilip Kumar (#15)
#31Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Khandekar (#30)
#32tushar
tushar.ahuja@enterprisedb.com
In reply to: Dilip Kumar (#31)
#33Dilip Kumar
dilipbalaut@gmail.com
In reply to: tushar (#32)
#34tushar
tushar.ahuja@enterprisedb.com
In reply to: Dilip Kumar (#33)
#35Dilip Kumar
dilipbalaut@gmail.com
In reply to: tushar (#34)
#36tushar
tushar.ahuja@enterprisedb.com
In reply to: Dilip Kumar (#35)
#37tushar
tushar.ahuja@enterprisedb.com
In reply to: tushar (#36)
#38Dilip Kumar
dilipbalaut@gmail.com
In reply to: tushar (#37)
#39tushar
tushar.ahuja@enterprisedb.com
In reply to: Dilip Kumar (#38)
#40Rafia Sabih
rafia.sabih@enterprisedb.com
In reply to: tushar (#39)
#41Dilip Kumar
dilipbalaut@gmail.com
In reply to: Rafia Sabih (#40)
#42tushar
tushar.ahuja@enterprisedb.com
In reply to: tushar (#39)
#43Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#41)
#44Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#43)
#45Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#44)
#46Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#45)
#47Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Dilip Kumar (#46)
#48Dilip Kumar
dilipbalaut@gmail.com
In reply to: Haribabu Kommi (#47)
#49Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#48)
#50Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#49)
#51Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Dilip Kumar (#49)
#52Michael Paquier
michael@paquier.xyz
In reply to: Haribabu Kommi (#51)
#53Dilip Kumar
dilipbalaut@gmail.com
In reply to: Haribabu Kommi (#51)
#54Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#53)
#55Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#54)
#56Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#54)
#57Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#56)
#58Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#57)
#59Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#57)
#60Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#58)
#61Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#58)
#62Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#59)
#63Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#62)
#64Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#63)
#65Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#56)
#66Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#65)
#67Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#64)
#68Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#64)
#69Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#68)
#70Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#67)
#71Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#66)
#72Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#70)
#73Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#72)
#74Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#73)
#75Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#65)
#76Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#75)
#77Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#76)
#78Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Dilip Kumar (#77)
#79Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#77)
#80Dilip Kumar
dilipbalaut@gmail.com
In reply to: Haribabu Kommi (#78)
#81Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#80)
#82Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#81)
#83Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#82)
#84Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#81)
#85Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#84)
#86Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#85)
#87Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#86)
#88Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#87)
#89Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#87)
#90Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#89)
#91Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#90)
#92Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#91)
#93Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#92)
#94Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#91)
#95Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#92)
#96Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#93)
#97Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#96)
#98Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#95)
#99Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#98)
#100Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#99)
#101Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#100)
#102Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#99)
#103Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#102)
#104Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#103)
#105Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#104)
#106Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#105)
#107Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#102)
#108Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#106)
#109Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#108)
#110Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#109)
#111Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#110)
#112Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#111)
#113Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#112)
#114Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#113)
#115Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#113)
#116Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#115)
#117Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#116)
#118Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#117)
#119Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#118)
#120Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#119)
#121Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#120)
#122Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#121)
#123Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#122)
#124Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#123)
#125Rafia Sabih
rafia.sabih@enterprisedb.com
In reply to: Robert Haas (#124)
#126Dilip Kumar
dilipbalaut@gmail.com
In reply to: Rafia Sabih (#125)
#127Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#126)
#128Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#127)