Documentation refinement for Parallel Scans

Started by Zhang Mingli, about 3 years ago, 3 messages

zmlpostgres@gmail.com

about 3 years ago

Hi,

I found that the documentation on parallel scans may no longer be accurate.

As stated in parallel.sgml:

```
In a parallel sequential scan, the table's blocks will be divided among the cooperating processes. Blocks are handed out one at a time, so that access to the table remains sequential.
```

To my understanding, this was correct before commit 56788d2156, because each time a worker asked for more work it was handed a single block.
As the comment inside table_block_parallelscan_nextpage says:
```
Earlier versions of this would allocate the next highest block number to the next worker to call this function.
```
Since commit 56788d2156, each parallel worker instead tries to get a range of blocks (a "chunk").
Access to the table remains sequential within each worker process, but not across all workers or the parallel query as a whole.
Shall we update the documentation?
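
To make the difference concrete, here is a minimal simulation of the two handout strategies. It is only a sketch, not PostgreSQL code: `hand_out_blocks` is a hypothetical helper, the chunk size is fixed rather than computed, and round-robin turn-taking stands in for whichever worker happens to ask next.

```python
from collections import defaultdict


def hand_out_blocks(nblocks, nworkers, chunk_size):
    """Simulate workers taking turns to claim ranges of `chunk_size` blocks.

    Hypothetical helper for illustration only; real workers ask for more
    blocks whenever they finish, not in strict round-robin order.
    """
    next_block = 0
    claimed = defaultdict(list)  # worker id -> block numbers in claim order
    worker = 0
    while next_block < nblocks:
        end = min(next_block + chunk_size, nblocks)
        claimed[worker].extend(range(next_block, end))
        next_block = end
        worker = (worker + 1) % nworkers  # assumption: strict turn-taking
    return dict(claimed)


# Old behaviour: one block per request, so the workers interleave.
print(hand_out_blocks(nblocks=8, nworkers=2, chunk_size=1))
# {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}

# New behaviour: each request returns a range of consecutive blocks.
print(hand_out_blocks(nblocks=8, nworkers=2, chunk_size=4))
# {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
```

With a chunk size of 1, the scan as a whole walks the table in block order; with larger chunks, each worker reads consecutive blocks, but the workers are at different places in the table at the same time.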

Regards,
Zhang Mingli


David Rowley

dgrowleyml@gmail.com

about 3 years ago

In reply to: Zhang Mingli (#1)

1 attachment(s)

Re: Documentation refinement for Parallel Scans

On Thu, 20 Oct 2022 at 16:03, Zhang Mingli <zmlpostgres@gmail.com> wrote:

> As said in parallel.sgml:
>
> In a parallel sequential scan, the table's blocks will be divided among the cooperating processes. Blocks are handed out one at a time, so that access to the table remains sequential.
>
> Shall we update the documents?

Yeah, 56788d215 should have updated that. Seems I didn't expect that
level of detail in the docs. I've attached a patch to address this.

I didn't feel the need to go into too much detail about how the sizes
of the ranges are calculated. I tried to be brief, but I think I did
leave enough in there so that a reader will know that we don't just
make the range length <nblocks> / <nworkers>.
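
For illustration, the following sketch shows one way range lengths can be chosen without using <nblocks> / <nworkers>: start from a power-of-two size based on the table size, then halve it as the scan nears the end so that workers finish at about the same time. This is not the code from table_block_parallelscan_nextpage; the function `range_sizes` and all of its parameters and default values are assumptions made up for this example.

```python
def range_sizes(nblocks, target_chunks=2048, max_chunk=8192, rampdown_chunks=64):
    """Return the sizes of successive block ranges for a table of `nblocks`.

    Illustrative only: the name, parameters and defaults are assumptions,
    not values taken from the PostgreSQL source.
    """
    # Pick a power-of-two starting size that yields roughly `target_chunks`
    # ranges, independent of how many workers take part.
    size = 1
    while size * 2 <= max(nblocks // target_chunks, 1):
        size *= 2
    size = min(size, max_chunk)

    sizes = []
    remaining = nblocks
    while remaining > 0:
        # Near the end of the table, halve the range size so that no worker
        # is left with one large range while the others sit idle.
        while size > 1 and remaining <= size * rampdown_chunks:
            size //= 2
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


# Small numbers chosen so the ramp-down is easy to see.
print(range_sizes(nblocks=100, target_chunks=4, max_chunk=16, rampdown_chunks=2))
```

The point of the sketch is that the number of workers never appears in the calculation: a worker that finishes its range early simply asks for another one.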

I'll push this soon if nobody has any other wording suggestions.

Thanks for the report.

David

Attachments:

fix_parallel_seqscan_docs.patch (text/plain; charset=US-ASCII)

diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index c37fb67065..e556786e2b 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -272,8 +272,9 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
     <listitem>
       <para>
         In a <emphasis>parallel sequential scan</emphasis>, the table's blocks will
-        be divided among the cooperating processes.  Blocks are handed out one
-        at a time, so that access to the table remains sequential.
+        be divided into ranges and shared among the cooperating processes.  Each
+        worker process will complete the scanning of its given range of blocks before
+        requesting an additional range of blocks.
       </para>
     </listitem>
     <listitem>

David Rowley

dgrowleyml@gmail.com

about 3 years ago

In reply to: David Rowley (#2)

Re: Documentation refinement for Parallel Scans

On Thu, 20 Oct 2022 at 19:33, David Rowley <dgrowleyml@gmail.com> wrote:

> I'll push this soon if nobody has any other wording suggestions.

Pushed.

Thanks for the report.

David