WAL-logging facility for pgstats kinds

Started by Michael Paquierover 1 year ago11 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,

While brainstorming on the contents of the thread I have posted a
couple of days ago, I have been looking at what could be done so as
pgstats and WAL-logging could work together. This was point 2) from
this message:
/messages/by-id/Z2tblEmfuOfZy4zx@paquier.xyz

While considering the point that not all stats data is worth
replicating, I have fallen down to the following properties that are
nice to have across the board:
- A pgstats kind should be able to WAL-log some data that is should be
able to decide. Including versioning of the data.
- The data kind should be able to decide how this data is handled at
recovery (different persistency depending on standby vs crash
recovery, for one).
- Having to add one RMGR for each stats kind is not going to scale,
especially as we can control the redo path using a callback part of
PgStat_KindInfo. For custom kinds, this enforces some validation
based on if the stats library has been really loaded at startup with
shared_preload_library.
- It is nice for each out-of-core stats kind to not have to load and
define a custom RMGR, duplicating what this central facility is doing.
- The persistency of the stats data is better across crashes: this
approach makes the cleanup of orphaned pgstats entries easier as this
can be enforced at replay by each stats kind, if necessary, and it
is also possible to force the creation of stats data.
- For variable-numbered stats, the WAL-logging can be timed with the
timing where any pending stats are flushed, if there is some.

Table stats to prevent autovacuum from committing Seppuku for tables
without stats after a crash is the first use case I can think about,
where we'd want to split and expose some DML stats. Scans stats
should remain untouched on standbys, for example. There may be other
stats data that is worth replicating, all these could use this new
facility.

Attached is the result of this investigation, where I have finished by
implementing a new backend facility where pgstats kinds can WAL-log
data at will using a new RMGR called "PgStat". The most interesting
part is pgstat_xlog_data() in a new file called pgstat_xlog.c, that
stats kinds can use as an entry point to WAL-log data structures with
a XLogRegisterData() managed by a new record structure called
xl_pgstat_data.

For clarity, the patch set has been split into several pieces, I hope
rather edible:
- 0001, a fix I've posted on a different thread [1]/messages/by-id/Z23zcE4w1djukkva@paquier.xyz -- Michael, used in patch
0004 to test this new facility.
- 0002, a refactoring piece to be able to expose stats kind data into
the frontend (for pg_waldump).
- 0003 is the core backend piece, introducing the WAL-logging routine
and the new RMGR.
- 0004, that provides tests for the new backend facility via a custom
stats kind. Requires 0001.

I am going to attach it to the next CF.

Comments or opinions are welcome.

[1]: /messages/by-id/Z23zcE4w1djukkva@paquier.xyz -- Michael
--
Michael

Attachments:

v1-0001-injection_points-Tweak-variable-numbered-stats-to.patchtext/x-diff; charset=us-asciiDownload+8-8
v1-0002-Move-information-about-pgstats-kinds-into-its-own.patchtext/x-diff; charset=us-asciiDownload+73-57
v1-0003-Add-RMGR-and-WAL-logging-API-for-pgstats.patchtext/x-diff; charset=us-asciiDownload+185-5
v1-0004-injection_points-Add-option-and-tests-for-WAL-log.patchtext/x-diff; charset=us-asciiDownload+95-3
#2Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#1)
Re: WAL-logging facility for pgstats kinds

On Fri, Dec 27, 2024 at 12:32:25PM +0900, Michael Paquier wrote:

For clarity, the patch set has been split into several pieces, I hope
rather edible:
- 0001, a fix I've posted on a different thread [1], used in patch
0004 to test this new facility.
- 0002, a refactoring piece to be able to expose stats kind data into
the frontend (for pg_waldump).
- 0003 is the core backend piece, introducing the WAL-logging routine
and the new RMGR.
- 0004, that provides tests for the new backend facility via a custom
stats kind. Requires 0001.

I am going to attach it to the next CF.

0001 has been applied as of b757abefc041. Here is a rebased patch set
for the rest.
--
Michael

Attachments:

v2-0002-Move-information-about-pgstats-kinds-into-its-own.patchtext/x-diff; charset=us-asciiDownload+73-57
v2-0003-Add-RMGR-and-WAL-logging-API-for-pgstats.patchtext/x-diff; charset=us-asciiDownload+185-5
v2-0004-injection_points-Add-option-and-tests-for-WAL-log.patchtext/x-diff; charset=us-asciiDownload+95-3
#3Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#2)
Re: WAL-logging facility for pgstats kinds

Hi,

On Tue, Dec 31, 2024 at 09:52:31AM +0900, Michael Paquier wrote:

On Fri, Dec 27, 2024 at 12:32:25PM +0900, Michael Paquier wrote:

For clarity, the patch set has been split into several pieces, I hope
rather edible:
- 0001, a fix I've posted on a different thread [1], used in patch
0004 to test this new facility.
- 0002, a refactoring piece to be able to expose stats kind data into
the frontend (for pg_waldump).
- 0003 is the core backend piece, introducing the WAL-logging routine
and the new RMGR.
- 0004, that provides tests for the new backend facility via a custom
stats kind. Requires 0001.

I am going to attach it to the next CF.

0001 has been applied as of b757abefc041. Here is a rebased patch set
for the rest.

Thanks for the patch!

A few random comments:

About v2-0002:

=== 1

+ * Copyright (c) 2001-2024, PostgreSQL Global Development Group

As pgstat_kind.h is a new file, s/Copyright (c) 2001-2024/Copyright (c) 2025/
maybe?

No more comments, as v2-0002 is "just" moving some stuff from pgstat.h to
pgstat_kind.h.

About v2-0003:

=== 2

Same as === 1 for pgstatdesc.c, pgstat_xlog.c and pgstat_xlog.h.

=== 3

+void
+pgstat_desc(StringInfo buf, XLogReaderState *record)
+{

Looks like logicalmsg_desc(), fine by me, except:

+ /* Write data payload as a series of hex bytes */

s/data payload/stats data/ maybe?

=== 4

+const char *
+pgstat_identify(uint8 info)

Looks like logicalmsg_identify(), fine by me.

=== 5

+typedef struct xl_pgstat_data
+{
+       PgStat_Kind stats_kind;
+       size_t          data_size;              /* size of the data */
+       /* data payload, worth data_size */
+       char            data[FLEXIBLE_ARRAY_MEMBER];
+} xl_pgstat_data;

Yeah, snapshotConflictHorizon and isCatalogRel are not needed here.

=== 6

+       stats_kind = xlrec->stats_kind;
+       data_size = xlrec->data_size;
+       data = xlrec->data;
+
+       kind_info = pgstat_get_kind_info(stats_kind);
+
+       if (kind_info == NULL)
+               elog(FATAL, "pgstat_redo: invalid stats data found: kind=%u",
+                        stats_kind);
+
+       if (kind_info->redo_cb == NULL)
+               elog(FATAL, "pgstat_redo: no redo callback found: kind=%s",
+                        kind_info->name);

What about moving the data_size and data assignments after the kind_info
and kind_info->redo_cb checks?

Also,

s/invalid stats data/invalid stats kind/ maybe?

About v2-0004:

=== 7

+ PgStat_StatInjRecord record_data = {0};

PgStat_StatInjRecord does not contain any padding but as it acts as a template,
better to use memset() instead? (to cover the cases where the record contains
padding).

=== 8

+       record_data.objid = entry_ref->shared_entry->key.objid;
+       record_data.entry = shfuncent->stats;

So it makes === 7 useless as here we are setting all the fields. But I think
it's good to keep === 7 as it acts as a template.

=== 9

+ if (!RecoveryInProgress() && inj_stats_wal_enabled)

s/!RecoveryInProgress()/XLogInsertAllowed()/ maybe?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#4Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#1)
Re: WAL-logging facility for pgstats kinds

Hi,

On 2024-12-27 12:32:25 +0900, Michael Paquier wrote:

While brainstorming on the contents of the thread I have posted a
couple of days ago, I have been looking at what could be done so as
pgstats and WAL-logging could work together. This was point 2) from
this message:
/messages/by-id/Z2tblEmfuOfZy4zx@paquier.xyz

While considering the point that not all stats data is worth
replicating, I have fallen down to the following properties that are
nice to have across the board:
- A pgstats kind should be able to WAL-log some data that is should be
able to decide. Including versioning of the data.

I don't really think that's right. For cumulative stats to make sense on both
a primary and a replica, or a primary after a crash, they need to cover things
that *already* are WAL logged.

E.g. for pg_stat_all_tables.n_{tup_{ins,upd,del,hot_upd},live_tup,dead_tup,
...}, - which are important for autovacuum scheduling - we should use the WAL
records covering those changes to re-assemble the stats during WAL replay.

The only reason that that's not trivial is that we don't have access to the
relation oid during crash recovery. Which in turn is why I think we should
change relation level stats to be keyed by relfilenode, rather than relation
oid.

I can't think of a real case where we would want to WAL log the stats
themselves, rather than re-emitting stats during replay based on the WAL
record of the "underlying object".

Do you have counter-examples?

Greetings,

Andres Freund

#5Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#4)
Re: WAL-logging facility for pgstats kinds

On Thu, Jan 02, 2025 at 08:08:42PM -0500, Andres Freund wrote:

I can't think of a real case where we would want to WAL log the stats
themselves, rather than re-emitting stats during replay based on the WAL
record of the "underlying object".

Do you have counter-examples?

I'm not sure if the rebuild based on the WAL records is simpler than
logging a snapshot of them that the startup process could digest.

Anyway, I've also wanted to be able to replicate stats for historical
tracking for stats logged through hooks, and a second case was
injection point stats. All these would require registering a custom
rmgr on top of a custom stats kind, and centralizing the logic eases a
lot some of the sanity checks they'd require as the redo callback of a
stats kind can just be attached to its PgStat_KindInfo.

I know that Lucas has been playing a bit with the area, and perhaps
he has more cases in mind where the replication of stats data could be
relevant. So I am adding him in CC. Perhaps I could be wrong, of
course.
--
Michael

#6Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#3)
Re: WAL-logging facility for pgstats kinds

On Tue, Dec 31, 2024 at 11:29:26AM +0000, Bertrand Drouvot wrote:

=== 1

+ * Copyright (c) 2001-2024, PostgreSQL Global Development Group

As pgstat_kind.h is a new file, s/Copyright (c) 2001-2024/Copyright (c) 2025/
maybe?

Fixed.

No more comments, as v2-0002 is "just" moving some stuff from pgstat.h to
pgstat_kind.h.

Thanks. This split is independent of the main topic of the thread,
and I think that it just makes the declarations cleaner, so I'm
planning to apply this one. I was also planning to write some custom
tool to manipulate the pgstats file. That will help a bit.

For the rest of the patch set, let's see how the discussion moves on..

Same as === 1 for pgstatdesc.c, pgstat_xlog.c and pgstat_xlog.h.

Fixed.

+void
+pgstat_desc(StringInfo buf, XLogReaderState *record)
+{

Looks like logicalmsg_desc(), fine by me, except:

Yes, there's not much you can do here, except if we invent one day a
way to load external libraries in the frontend.

s/data payload/stats data/ maybe?
s/invalid stats data/invalid stats kind/ maybe?

Okay.

+ PgStat_StatInjRecord record_data = {0};

PgStat_StatInjRecord does not contain any padding but as it acts as a template,
better to use memset() instead? (to cover the cases where the record contains
padding).

Dammit, forgot about that. Good point about the padding, especially
for the template bits.

+ if (!RecoveryInProgress() && inj_stats_wal_enabled)

s/!RecoveryInProgress()/XLogInsertAllowed()/ maybe?

I'd rather use RecoveryInProgress() here even if XLogInsertAllowed()
is a synonym of that, minus the update of LocalXLogInsertAllowed for
the local process.
--
Michael

Attachments:

v3-0002-Move-information-about-pgstats-kinds-into-its-own.patchtext/x-diff; charset=us-asciiDownload+73-57
v3-0003-Add-RMGR-and-WAL-logging-API-for-pgstats.patchtext/x-diff; charset=us-asciiDownload+181-5
v3-0004-injection_points-Add-option-and-tests-for-WAL-log.patchtext/x-diff; charset=us-asciiDownload+97-3
#7Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#6)
Re: WAL-logging facility for pgstats kinds

On Fri, Jan 10, 2025 at 01:46:53PM +0900, Michael Paquier wrote:

I'd rather use RecoveryInProgress() here even if XLogInsertAllowed()
is a synonym of that, minus the update of LocalXLogInsertAllowed for
the local process.

I've applied v2-0002 for the new header as it is useful on its own.
Rebased to avoid the wrath of the CF bot, as v3.
--
Michael

Attachments:

v3-0003-Add-RMGR-and-WAL-logging-API-for-pgstats.patchtext/x-diff; charset=us-asciiDownload+181-5
v3-0004-injection_points-Add-option-and-tests-for-WAL-log.patchtext/x-diff; charset=us-asciiDownload+97-3
#8Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#7)
Re: WAL-logging facility for pgstats kinds

Hi,

On 2025-01-14 12:54:36 +0900, Michael Paquier wrote:

On Fri, Jan 10, 2025 at 01:46:53PM +0900, Michael Paquier wrote:

I'd rather use RecoveryInProgress() here even if XLogInsertAllowed()
is a synonym of that, minus the update of LocalXLogInsertAllowed for
the local process.

I've applied v2-0002 for the new header as it is useful on its own.
Rebased to avoid the wrath of the CF bot, as v3.

Because I saw this being moved to the new CF: I continue to *strenuously*
object to this design. As outlined upthread, I think it's going into the
completely wrong direction.

- Andres

#9Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#8)
Re: WAL-logging facility for pgstats kinds

On Mon, Feb 10, 2025 at 11:43:30AM -0500, Andres Freund wrote:

Because I saw this being moved to the new CF: I continue to *strenuously*
object to this design. As outlined upthread, I think it's going into the
completely wrong direction.

Right. FWIW, I'm not sure that we can absolutely just discard a
possiblity like what this patch is doing, but I see your point that it
may not fit into the final picture, depending on what we finish with.
I'll go discard that for now, keeping it aside in case.
--
Michael

#10Cédric Villemain
cedric.villemain@data-bene.io
In reply to: Michael Paquier (#9)
Re: WAL-logging facility for pgstats kinds

On 12/02/2025 01:50, Michael Paquier wrote:

On Mon, Feb 10, 2025 at 11:43:30AM -0500, Andres Freund wrote:

Because I saw this being moved to the new CF: I continue to *strenuously*
object to this design. As outlined upthread, I think it's going into the
completely wrong direction.

Right. FWIW, I'm not sure that we can absolutely just discard a
possiblity like what this patch is doing, but I see your point that it
may not fit into the final picture, depending on what we finish with.
I'll go discard that for now, keeping it aside in case.

I am developing (WIP) an extension to add a facility similar to
pg_replication_slots, but for statistics or metrics. On one side, it
would allow the registration of external "stats plugins," and on the
other side, it would enable the consumption of whatever these plugins
return. I believe a similar idea may have been proposed in the
past-perhaps involving opening a dedicated port for accessing
stats/metrics-but I haven't yet searched for related public talks or
archives.

This approach makes it easy to envision a background worker on a replica
that connects to a stats slot and processes the received stats/metrics
as needed.

Additionally, there is significant potential for external applications
to leverage the PostgreSQL stats system to collect metrics instead of
relying on relational tables, promising a highly efficient solution. In
such cases, providing a mechanism to replicate this information could be
very beneficial for end-users.

Though I also support Andres' position to avoid abusing WAL in this
context, I am slightly more flexible: this exception would apply only if
there is a clear separation between stats essential for PostgreSQL (or
its extensions) to function correctly during or after a switchover, and
those (stats or) metrics that are irrelevant to PostgreSQL itself.

---
Cédric Villemain +33 6 20 30 22 52
https://www.Data-Bene.io
PostgreSQL Support, Expertise, Training, R&D

#11Michael Paquier
michael@paquier.xyz
In reply to: Cédric Villemain (#10)
Re: WAL-logging facility for pgstats kinds

On Sat, Feb 15, 2025 at 10:31:33AM +0100, Cédric Villemain wrote:

Additionally, there is significant potential for external applications to
leverage the PostgreSQL stats system to collect metrics instead of relying
on relational tables, promising a highly efficient solution. In such cases,
providing a mechanism to replicate this information could be very beneficial
for end-users.

Hmm. You mean with the extra addition of a hook around
pgstat[_shmem].c so as it would be possible to take some custom action
when pushing the stats to the pgstats dshash or for the fixed-numbered
stats?

Though I also support Andres' position to avoid abusing WAL in this context,
I am slightly more flexible: this exception would apply only if there is a
clear separation between stats essential for PostgreSQL (or its extensions)
to function correctly during or after a switchover, and those (stats or)
metrics that are irrelevant to PostgreSQL itself.

All these parts are doable even today outside of core with a custom
RMGR, so it's not like there are no solutions in place. The
introduction of a pgstats callback that can attached to a stats kind
is able to enforce a stricter rule in the backend, at least, without
the need to consume a bunch of RMGR IDs for that. (It is possible to
have one RMGR used across many extensions, clumsy but doable.)
--
Michael