RFC/PoC: GUC option to enable tuple queue autoflush for parallel workers

Started by Francesco Degrassi, over 1 year ago, 2 messages
#1 Francesco Degrassi
francesco.degrassi@optionfactory.net

Hi all. A brief overview of our use case follows.

We are developing a foreign data wrapper that supports parallel scans
and predicate pushdown; given the types of queries we run, foreign
scans can take a very long time and often return very few rows.

As the scan can be very long and slow, we'd like to provide partial
results to the user as rows are being returned. We found two problems
with that:
1. The leader backend would not poll the parallel workers' queues until
it had itself found a row to return; we worked around this by setting
`parallel_leader_participation` to off.
2. Parallel workers' tuple queues are buffered, and are not flushed
until a certain fill threshold is reached; as our queries yield few
result rows, these rows would often only be returned at the end of the
(very long) scan.
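
To make problem 2 concrete, here is a minimal toy model of a buffered
queue. It is not the real `shm_mq` code: `ToyQueue`, `toy_send`,
`toy_flush` and the "more than a quarter of the ring" threshold are
assumptions made for illustration, and ring wrap-around and blocking
are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT the real shm_mq implementation. */
typedef struct ToyQueue
{
    size_t ring_size;   /* capacity of the ring buffer, in bytes */
    size_t pending;     /* bytes written by the worker, not yet published */
    size_t visible;     /* bytes the leader can actually see */
} ToyQueue;

static void
toy_queue_init(ToyQueue *q, size_t ring_size)
{
    assert(q != NULL && ring_size > 0);
    q->ring_size = ring_size;
    q->pending = 0;
    q->visible = 0;
}

/* Publish everything written so far to the reader. */
static void
toy_flush(ToyQueue *q)
{
    assert(q != NULL);
    q->visible += q->pending;
    q->pending = 0;
}

/*
 * Write one row.  The row is published only when force_flush is set or
 * when the unpublished bytes exceed the assumed threshold of a quarter
 * of the ring; otherwise it stays invisible to the leader.
 */
static void
toy_send(ToyQueue *q, size_t nbytes, bool force_flush)
{
    assert(q != NULL);
    q->pending += nbytes;
    if (force_flush || q->pending > q->ring_size / 4)
        toy_flush(q);
}
```

In this model, a worker that sends a handful of 100-byte rows into a
64 kB ring with `force_flush` false never crosses the threshold, so
`visible` stays at 0 until something flushes the queue at the end of
the scan; with `force_flush` true, every row is visible at once.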

The proposal is to add a `parallel_tuplequeue_autoflush` GUC (bool,
default false) that would force every row returned by a parallel
worker to be flushed to the leader immediately; this was already the
behaviour before v15, so the GUC simply allows opting back into it.

This would be achieved by adding an `auto_flush` field to
`TQueueDestReceiver`, so that `tqueueReceiveSlot` would set
`force_flush` when calling `shm_mq_send`.
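
As a rough sketch of that wiring (not the attached patch), the flag
could travel from the GUC to the send call as below. All `Toy*` and
`toy_*` names are hypothetical stand-ins, and `toy_mq_send` only
records whether a flush was requested; it does not reproduce the real
`shm_mq_send` signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed GUC (bool, default false). */
static bool toy_parallel_tuplequeue_autoflush = false;

/* Stand-in for a queue handle; it only counts calls. */
typedef struct ToyMq
{
    int sends;          /* total rows sent */
    int forced_sends;   /* rows sent with force_flush set */
} ToyMq;

/* Stand-in for the send call; records whether a flush was forced. */
static void
toy_mq_send(ToyMq *mq, size_t nbytes, const void *data, bool force_flush)
{
    assert(mq != NULL && data != NULL && nbytes > 0);
    mq->sends++;
    if (force_flush)
        mq->forced_sends++;
}

/* Stand-in for the receiver, carrying the proposed auto_flush field. */
typedef struct ToyTQueueDestReceiver
{
    ToyMq *queue;
    bool   auto_flush;
} ToyTQueueDestReceiver;

/* Assumption: the GUC is sampled once, when the receiver is set up. */
static void
toy_create_receiver(ToyTQueueDestReceiver *self, ToyMq *queue)
{
    assert(self != NULL && queue != NULL);
    self->queue = queue;
    self->auto_flush = toy_parallel_tuplequeue_autoflush;
}

/* Per-row path: forward the receiver's flag as force_flush. */
static void
toy_receive_slot(ToyTQueueDestReceiver *self, const void *tuple, size_t len)
{
    assert(self != NULL);
    toy_mq_send(self->queue, len, tuple, self->auto_flush);
}
```

Sampling the setting when the receiver is created, rather than on every
row, is a design choice of this sketch: it keeps the per-row path to a
single field read, at the cost of ignoring changes made mid-query.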

The attached patch, tested on master @ 1ab67c9dfaadda, is a tentative
PoC implementation.
Based on feedback, we're available to work on a complete and properly
documented patch.

Thanks in advance for your consideration.

Regards,
Francesco

Attachments:

parallel_tuplequeue_autoflush.patch (text/x-patch; charset=US-ASCII), +25 -8
#2 Francesco Degrassi
francesco.degrassi@optionfactory.net
In reply to: Francesco Degrassi (#1)
Re: RFC/PoC: GUC option to enable tuple queue autoflush for parallel workers

Hello, I hope bumping this thread is not frowned upon.
Any chance we can get any feedback?

Thanks and best regards

Francesco

On Thu, 26 Sept 2024 at 16:15, Francesco Degrassi
<francesco.degrassi@optionfactory.net> wrote:

[...]