pgsql: Use pre-fetching for ANALYZE

Started by Stephen Frostalmost 5 years ago4 messages

sfrost@snowman.net

almost 5 years ago

Use pre-fetching for ANALYZE

When we have posix_fadvise() available, we can improve the performance
of an ANALYZE by quite a bit by using it to inform the kernel of the
blocks that we're going to be asking for. Similar to bitmap index
scans, the number of buffers pre-fetched is based off of the
maintenance_io_concurrency setting (for the particular tablespace or,
if not set, globally, via get_tablespace_maintenance_io_concurrency()).

Reviewed-By: Heikki Linnakangas, Tomas Vondra
Discussion: /messages/by-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0@VI1PR0701MB6960.eurprd07.prod.outlook.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/c6fc50cb40285141fad401321ae21becbaea1c59

Modified Files
--------------
src/backend/commands/analyze.c | 73 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 71 insertions(+), 2 deletions(-)

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Stephen Frost (#1)

Re: pgsql: Use pre-fetching for ANALYZE

Hi,

On 2021-03-16 18:48:08 +0000, Stephen Frost wrote:

Use pre-fetching for ANALYZE

When we have posix_fadvise() available, we can improve the performance
of an ANALYZE by quite a bit by using it to inform the kernel of the
blocks that we're going to be asking for. Similar to bitmap index
scans, the number of buffers pre-fetched is based off of the
maintenance_io_concurrency setting (for the particular tablespace or,
if not set, globally, via get_tablespace_maintenance_io_concurrency()).

I just looked at this as part of debugging a crash / hang in the AIO patch.

The code does:

block_accepted = table_scan_analyze_next_block(scan, targblock, vac_strategy);

#ifdef USE_PREFETCH

/*
* When pre-fetching, after we get a block, tell the kernel about the
* next one we will want, if there's any left.
*
* We want to do this even if the table_scan_analyze_next_block() call
* above decides against analyzing the block it picked.
*/
if (prefetch_maximum && prefetch_targblock != InvalidBlockNumber)
PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_targblock);
#endif

I.e. we lock a buffer and *then* we prefetch another buffer. That seems like a
quite bad idea to me. Why are we doing IO while holding a content lock, if we
can avoid it?

Greetings,

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Andres Freund (#2)

Re: pgsql: Use pre-fetching for ANALYZE

Hi,

On 2022-06-02 19:30:16 -0700, Andres Freund wrote:

On 2021-03-16 18:48:08 +0000, Stephen Frost wrote:

Use pre-fetching for ANALYZE

When we have posix_fadvise() available, we can improve the performance
of an ANALYZE by quite a bit by using it to inform the kernel of the
blocks that we're going to be asking for. Similar to bitmap index
scans, the number of buffers pre-fetched is based off of the
maintenance_io_concurrency setting (for the particular tablespace or,
if not set, globally, via get_tablespace_maintenance_io_concurrency()).

I just looked at this as part of debugging a crash / hang in the AIO patch.

The code does:

block_accepted = table_scan_analyze_next_block(scan, targblock, vac_strategy);

#ifdef USE_PREFETCH

/*
* When pre-fetching, after we get a block, tell the kernel about the
* next one we will want, if there's any left.
*
* We want to do this even if the table_scan_analyze_next_block() call
* above decides against analyzing the block it picked.
*/
if (prefetch_maximum && prefetch_targblock != InvalidBlockNumber)
PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_targblock);
#endif

I.e. we lock a buffer and *then* we prefetch another buffer. That seems like a
quite bad idea to me. Why are we doing IO while holding a content lock, if we
can avoid it?

It also seems decidedly not great from a layering POV to do the IO in
analyze.c. There's no guarantee that the tableam maps blocks in a way that's
compatible with PrefetchBuffer(). Yes, the bitmap heap scan code does
something similar, but a) that is opt in by the AM, b) there's a comment
saying it's quite crufty and should be fixed.

Greetings,

Andres Freund

Stephen Frost

sfrost@snowman.net

over 3 years ago

In reply to: Andres Freund (#3)

Re: pgsql: Use pre-fetching for ANALYZE

Greetings,

* Andres Freund (andres@anarazel.de) wrote:

On 2022-06-02 19:30:16 -0700, Andres Freund wrote:

On 2021-03-16 18:48:08 +0000, Stephen Frost wrote:

Use pre-fetching for ANALYZE

When we have posix_fadvise() available, we can improve the performance
of an ANALYZE by quite a bit by using it to inform the kernel of the
blocks that we're going to be asking for. Similar to bitmap index
scans, the number of buffers pre-fetched is based off of the
maintenance_io_concurrency setting (for the particular tablespace or,
if not set, globally, via get_tablespace_maintenance_io_concurrency()).

I just looked at this as part of debugging a crash / hang in the AIO patch.

The code does:

block_accepted = table_scan_analyze_next_block(scan, targblock, vac_strategy);

#ifdef USE_PREFETCH

/*
* When pre-fetching, after we get a block, tell the kernel about the
* next one we will want, if there's any left.
*
* We want to do this even if the table_scan_analyze_next_block() call
* above decides against analyzing the block it picked.
*/
if (prefetch_maximum && prefetch_targblock != InvalidBlockNumber)
PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_targblock);
#endif

I.e. we lock a buffer and *then* we prefetch another buffer. That seems like a
quite bad idea to me. Why are we doing IO while holding a content lock, if we
can avoid it?

At the end, we're doing a posix_fadvise() which is a kernel call but
hopefully wouldn't do actual IO when we call it. Still, agreed that
it'd be better to do that without holding locks and no objection to
making such a change.

It also seems decidedly not great from a layering POV to do the IO in
analyze.c. There's no guarantee that the tableam maps blocks in a way that's
compatible with PrefetchBuffer(). Yes, the bitmap heap scan code does
something similar, but a) that is opt in by the AM, b) there's a comment
saying it's quite crufty and should be fixed.

Certainly open to suggestions. Are you thinking it'd make sense to add
a 'prefetch_block' method to TableAmRoutine? Or did you have another
thought?

Thanks!

Stephen