pgsql: Prefetch data referenced by the WAL, take II.
Prefetch data referenced by the WAL, take II.
Introduce a new GUC recovery_prefetch. When enabled, look ahead in the
WAL and try to initiate asynchronous reading of referenced data blocks
that are not yet cached in our buffer pool. For now, this is done with
posix_fadvise(), which has several caveats. Since not all OSes have
that system call, "try" is provided so that it can be enabled where
available. Better mechanisms for asynchronous I/O are possible in later
work.
Set to "try" for now for test coverage. Default setting to be finalized
before release.
The GUC wal_decode_buffer_size limits the distance we can look ahead in
bytes of decoded data.
The existing GUC maintenance_io_concurrency is used to limit the number
of concurrent I/Os allowed, based on pessimistic heuristics used to
infer that I/Os have begun and completed. We'll also not look more than
maintenance_io_concurrency * 4 block references ahead.
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (earlier version)
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (earlier version)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (earlier version)
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> (earlier version)
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> (earlier version)
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> (earlier version)
Discussion: /messages/by-id/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/5dc0418fab281d017a61a5756240467af982bdfd
Modified Files
--------------
doc/src/sgml/config.sgml | 64 ++
doc/src/sgml/monitoring.sgml | 86 +-
doc/src/sgml/wal.sgml | 12 +
src/backend/access/transam/Makefile | 1 +
src/backend/access/transam/xlog.c | 2 +
src/backend/access/transam/xlogprefetcher.c | 1082 +++++++++++++++++++++++++
src/backend/access/transam/xlogreader.c | 27 +-
src/backend/access/transam/xlogrecovery.c | 179 ++--
src/backend/access/transam/xlogutils.c | 27 +-
src/backend/catalog/system_views.sql | 14 +
src/backend/storage/buffer/bufmgr.c | 4 +
src/backend/storage/freespace/freespace.c | 3 +-
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/smgr/md.c | 6 +-
src/backend/utils/adt/pgstatfuncs.c | 5 +-
src/backend/utils/misc/guc.c | 55 +-
src/backend/utils/misc/postgresql.conf.sample | 6 +
src/include/access/xlog.h | 1 +
src/include/access/xlogprefetcher.h | 53 ++
src/include/access/xlogreader.h | 8 +
src/include/access/xlogutils.h | 3 +-
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_proc.dat | 7 +
src/include/utils/guc.h | 4 +
src/include/utils/guc_tables.h | 1 +
src/test/regress/expected/rules.out | 11 +
src/tools/pgindent/typedefs.list | 6 +
27 files changed, 1595 insertions(+), 77 deletions(-)
Hi,
On 2022-04-07 07:44:20 +0000, Thomas Munro wrote:
Prefetch data referenced by the WAL, take II.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2022-04-07%2008%3A17%3A27
thinks that xlogpretcher.h doesn't include enough...
I had added those checks to CI, but apparently somehow screwed that up.
Or maybe it's the scripts that are screwed up? Because I don't see any error
checking in headerscheck/cpluspluscheck. And indeed, locally they show the
errors, but exit with 0.
Greetings,
Andres Freund
On 4/7/22 04:36, Andres Freund wrote:
Hi,
On 2022-04-07 07:44:20 +0000, Thomas Munro wrote:
Prefetch data referenced by the WAL, take II.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2022-04-07%2008%3A17%3A27
thinks that xlogpretcher.h doesn't include enough...I had added those checks to CI, but apparently somehow screwed that up.
Or maybe it's the scripts that are screwed up? Because I don't see any error
checking in headerscheck/cpluspluscheck. And indeed, locally they show the
errors, but exit with 0.
Yeah, you can't rely on the exit status, if they produce output that
should be regarded as a failure (that's what crake does).
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
On 4/7/22 04:36, Andres Freund wrote:
Or maybe it's the scripts that are screwed up? Because I don't see any error
checking in headerscheck/cpluspluscheck. And indeed, locally they show the
errors, but exit with 0.
Yeah, you can't rely on the exit status, if they produce output that
should be regarded as a failure (that's what crake does).
Yeah, those scripts were only intended to be run by hand. If someone
feels like upgrading them to also produce a useful exit status,
I'm for it.
regards, tom lane
On 2022-Apr-07, Thomas Munro wrote:
Prefetch data referenced by the WAL, take II.
I propose a small wording change in guc.c,
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9fbbfb1be5..9803741708 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2840,7 +2840,7 @@ static struct config_int ConfigureNamesInt[] =
{
{"wal_decode_buffer_size", PGC_POSTMASTER, WAL_RECOVERY,
gettext_noop("Maximum buffer size for reading ahead in the WAL during recovery."),
- gettext_noop("This controls the maximum distance we can read ahead in the WAL to prefetch referenced blocks."),
+ gettext_noop("This controls the maximum distance we can read ahead in the WAL to prefetch data blocks referenced therein."),
GUC_UNIT_BYTE
},
&wal_decode_buffer_size,
"referenced blocks" seems otherwise a bit unclear to me. Other wording
suggestions welcome. I first thought of "...to prefetch referenced data
blocks", which is probably OK too.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Siempre hay que alimentar a los dioses, aunque la tierra esté seca" (Orual)
On Sun, Sep 4, 2022 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Apr-07, Thomas Munro wrote:
I propose a small wording change in guc.c,diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 9fbbfb1be5..9803741708 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -2840,7 +2840,7 @@ static struct config_int ConfigureNamesInt[] = { {"wal_decode_buffer_size", PGC_POSTMASTER, WAL_RECOVERY, gettext_noop("Maximum buffer size for reading ahead in the WAL during recovery."), - gettext_noop("This controls the maximum distance we can read ahead in the WAL to prefetch referenced blocks."), + gettext_noop("This controls the maximum distance we can read ahead in the WAL to prefetch data blocks referenced therein."), GUC_UNIT_BYTE }, &wal_decode_buffer_size,"referenced blocks" seems otherwise a bit unclear to me. Other wording
suggestions welcome. I first thought of "...to prefetch referenced data
blocks", which is probably OK too.
I'd go for your second suggestion. 'therein' doesn't convey much (we
already said 'in the WAL', and therein is just a slightly formal and
somehow more Germanic way of saying 'in it' which is kinda
duplicative; maybe 'by it' is what we want but that's still kinda
implied already), whereas adding 'data' makes it slightly clearer that
it's blocks from relations and not, say, the WAL itself.
Also, hmm, it's not really the 'Maximum buffer size', it's the 'Buffer size'.
On 2022-Sep-13, Thomas Munro wrote:
I'd go for your second suggestion. 'therein' doesn't convey much (we
already said 'in the WAL', and therein is just a slightly formal and
somehow more Germanic way of saying 'in it' which is kinda
duplicative; maybe 'by it' is what we want but that's still kinda
implied already), whereas adding 'data' makes it slightly clearer that
it's blocks from relations and not, say, the WAL itself.Also, hmm, it's not really the 'Maximum buffer size', it's the 'Buffer size'.
Sounds reasonable. Pushed these changes. Thank you!
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)