Question Regarding Merge Append and Parallel Execution of Index Scans on Partitioned Table

Started by Ayush Vatsa7 months ago6 messages
#1Ayush Vatsa
ayushvatsa1810@gmail.com

Hello Hackers,
I had a question regarding the execution of the following query plan on a
partitioned table with vector similarity search:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=398.95..420.97 rows=100 width=12) (actual time=1.750..1.912
rows=100 loops=1)
Output: vector_items.id, ((vector_items.embedding <->
'[0.5,0.5,0.5,0.5,0.5]'::vector(5)))
Buffers: shared hit=2130
-> Merge Append (cost=398.95..22420.04 rows=100000 width=12) (actual
time=1.749..1.899 rows=100 loops=1)
Sort Key: ((vector_items.embedding <->
'[0.5,0.5,0.5,0.5,0.5]'::vector(5)))
Buffers: shared hit=2130
-> Index Scan using vector_items_p0_embedding_idx on
public.vector_items_p0 vector_items_1 (cost=99.65..5250.52 rows=25126
width=12) (actual time=0.461..0.495 rows=26 loops=1)
Output: vector_items_1.id, (vector_items_1.embedding <->
'[0.5,0.5,0.5,0.5,0.5]'::vector(5))
Order By: (vector_items_1.embedding <->
'[0.5,0.5,0.5,0.5,0.5]'::vector(5))
Buffers: shared hit=525
-> Index Scan using vector_items_p1_embedding_idx on
public.vector_items_p1 vector_items_2 (cost=99.68..5223.56 rows=24978
width=12) (actual time=0.420..0.460 rows=29 loops=1)
Output: vector_items_2.id, (vector_items_2.embedding <->
'[0.5,0.5,0.5,0.5,0.5]'::vector(5))
Order By: (vector_items_2.embedding <->
'[0.5,0.5,0.5,0.5,0.5]'::vector(5))
Buffers: shared hit=494
-> Index Scan using vector_items_p2_embedding_idx on
public.vector_items_p2 vector_items_3 (cost=99.80..5227.42 rows=24971
width=12) (actual time=0.422..0.454 rows=27 loops=1)
Output: vector_items_3.id, (vector_items_3.embedding <->
'[0.5,0.5,0.5,0.5,0.5]'::vector(5))
Order By: (vector_items_3.embedding <->
'[0.5,0.5,0.5,0.5,0.5]'::vector(5))
Buffers: shared hit=550
-> Index Scan using vector_items_p3_embedding_idx on
public.vector_items_p3 vector_items_4 (cost=99.77..5218.50 rows=24925
width=12) (actual time=0.442..0.470 rows=21 loops=1)
Output: vector_items_4.id, (vector_items_4.embedding <->
'[0.5,0.5,0.5,0.5,0.5]'::vector(5))
Order By: (vector_items_4.embedding <->
'[0.5,0.5,0.5,0.5,0.5]'::vector(5))
Buffers: shared hit=561
Planning:
Buffers: shared hit=38
Planning Time: 0.384 ms
Execution Time: 1.975 ms

My main question is:

-

Are these Index Scans executed *sequentially* (one after the other as
the Merge Append requests tuples)?
-

Or are they possibly executed in *parallel*, in advance, or concurrently
in some way?

I have configured PostgreSQL to allow more parallel workers than the number
of partitions, but I'm still seeing the same plan without any parallel
execution across the partitioned index scans.

I understand that Merge Append requires the first tuple from all child
nodes before producing its own first tuple, which can make parallelism
difficult. But given that:

-

Each child index scan already returns sorted output
-

There are *more workers available than the number of partitions*

... wouldn't it theoretically be possible to execute these index scans in
parallel (one per worker), with the Merge Append node just merging
pre-sorted streams?

Would appreciate any insight into how execution and planning behave in such
a case, and whether there are ways to influence or improve this behavior.
Thanks
Ayush Vatsa

#2David Rowley
dgrowleyml@gmail.com
In reply to: Ayush Vatsa (#1)
Re: Question Regarding Merge Append and Parallel Execution of Index Scans on Partitioned Table

On Thu, 5 Jun 2025 at 07:31, Ayush Vatsa <ayushvatsa1810@gmail.com> wrote:

Are these Index Scans executed sequentially (one after the other as the Merge Append requests tuples)?

It's a fairly simple answer: Merge Append does not support parallelism.

Or are they possibly executed in parallel, in advance, or concurrently in some way?

It's probably not impossible to do something in a future version of
PostgreSQL. We have Gather Merge, which handles executing some
sub-nodes and making sure the results get output in the correct order,
so it doesn't seem impossible that something could be done for Merge
Append. It just does not exist yet.

David

#3Ayush Vatsa
ayushvatsa1810@gmail.com
In reply to: David Rowley (#2)
Re: Question Regarding Merge Append and Parallel Execution of Index Scans on Partitioned Table

Merge Append does not support parallelism.

Thanks for the confirmation.

We have Gather Merge, which handles executing some

sub-nodes and making sure the results get output in the correct order
A small follow-up question - Gather merge won't gather and merge the
output from child in sorted order, but will always need an explicit Sort
node beneath it to do so. Correct?

-------------------
Thanks,
Ayush Vatsa
SDE AWS

#4David Rowley
dgrowleyml@gmail.com
In reply to: Ayush Vatsa (#3)
Re: Question Regarding Merge Append and Parallel Execution of Index Scans on Partitioned Table

On Fri, 6 Jun 2025 at 01:47, Ayush Vatsa <ayushvatsa1810@gmail.com> wrote:

A small follow-up question - Gather merge won't gather and merge the
output from child in sorted order, but will always need an explicit Sort
node beneath it to do so. Correct?

Incorrect. The input node to the Gather Merge needs to be sorted by
something, and the output of the Gather Merge will be sorted by the
same thing. Gather Merge handles putting the tuples that were gathered
by parallel workers back into the order they're meant to be in.

The Gather Merge's input node could be anything that provides sorted
output. Index Scan, Merge Join, Nested Loop, Group Aggregate, etc.
Have a look at [1]https://github.com/postgres/postgres/blob/master/src/test/regress/expected/select_parallel.out#L713

David

[1]: https://github.com/postgres/postgres/blob/master/src/test/regress/expected/select_parallel.out#L713

#5Ayush Vatsa
ayushvatsa1810@gmail.com
In reply to: David Rowley (#4)
Re: Question Regarding Merge Append and Parallel Execution of Index Scans on Partitioned Table

The input node to the Gather Merge needs to be sorted by
something, and the output of the Gather Merge will be sorted by the
same thing.

Ok now I got that. Thanks for the clarification.

Last small question:
As of now parallelism in merge append is not supported, but it could
be something we can consider implementing in the future.

That said, I’m wondering if this might not be necessary, given that
Gather Merge already accomplishes similar functionality. Would
love to hear your thoughts on whether there’s a distinct advantage to
adding parallelism to Merge Append or if Gather Merge sufficiently
covers all the use cases.

-----------------
Thanks,
Ayush Vatsa
SDE AWS

#6David Rowley
dgrowleyml@gmail.com
In reply to: Ayush Vatsa (#5)
Re: Question Regarding Merge Append and Parallel Execution of Index Scans on Partitioned Table

On Fri, 6 Jun 2025 at 07:49, Ayush Vatsa <ayushvatsa1810@gmail.com> wrote:

That said, I’m wondering if this might not be necessary, given that
Gather Merge already accomplishes similar functionality. Would
love to hear your thoughts on whether there’s a distinct advantage to
adding parallelism to Merge Append or if Gather Merge sufficiently
covers all the use cases.

Gather Merge supports only a single subnode. Merge Append supports any number.

David