Pluggable cumulative statistics

Started by Michael Paquieralmost 2 years ago43 messageshackers

michael@paquier.xyz

almost 2 years ago

Hi all,

While looking at ways to make pg_stat_statements more scalable and
dynamically manageable (no more PGC_POSTMASTER for the max number of
entries), which came out as using a dshash, Andres has mentioned me
off-list (on twitter/X) that we'd better plug in it to the shmem
pgstats facility, moving the text file that holds the query strings
into memory (with size restrictions for the query strings, for
example). This has challenges on its own (query ID is 8 bytes
incompatible with the dboid/objid hash key used by pgstats, discard of
entries when maximum). Anyway, this won't happen if we don't do one
of these two things:
1) Move pg_stat_statements into core, adapting pgstats for its
requirements.
2) Make the shmem pgstats pluggable so as it is possible for extensions
to register their own stats kinds.

1) may have its advantages, still I am not sure if we want to do that.
And 2) is actually something that can be used for more things than
just pg_stat_statements, because people love extensions and
statistics (spoiler: I do). The idea is simple: any extension
defining a custom stats kind would be able to rely on all the in-core
facilities we use for the existing in-core kinds:
a) Snapshotting and caching of the stats, via stats_fetch_consistency.
b) Native handling and persistency of the custom stats data.
c) Reuse stats after a crash, pointing at this comment in xlog.c:
* TODO: With a bit of extra work we could just start with a pgstat file
* associated with the checkpoint redo location we're starting from.
This means that we always remove the stats after a crash. That's
something I have a patch for, not for this thread, but the idea is
that custom stats would also benefit from this property.

The implementation is based on the following ideas:

* A structure in shared memory that tracks the IDs of the custom stats
kinds with their names. These are incremented starting from
PGSTAT_KIND_LAST.

* Processes use a local array cache that keeps tracks of all the
custom PgStat_KindInfos, indexed by (kind_id - PGSTAT_KIND_LAST).

* The kind IDs may change across restarts, meaning that any stats data
associated to a custom kind is stored with the *name* of the custom
stats kind. Depending on the discussion happening here, I'd be open
to use the same concept as custom RMGRs, where custom kind IDs are
"reserved", fixed in time, and tracked in the Postgres wiki. It is
cheaper to store the stats this way, as well, while managing conflicts
across extensions available in the community ecosystem.

* Custom stats can be added without shared_preload_libraries,
loading them from a shmem startup hook with shared_preload_libraries
is also possible.

* The shmem pgstats defines two types of statistics: the ones in a
dshash and what's called a "fixed" type like for archiver, WAL, etc.
pointing to areas of shared memory. All the fixed types are linked to
structures for snapshotting and shmem tracking. As a matter of
simplification and because I could not really see a case where I'd
want to plug in a fixed stats kind, the patch forbids this case. This
case could be allowed, but I'd rather refactor the structures of
pgstat_internal.h so as we don't have traces of the "fixed" stats
structures in so many areas.

* Making custom stats data persistent is an interesting problem, and
there are a couple of approaches I've considered:
** Allow custom kinds to define callbacks to read and write data from
a source they'd want, like their own file through a fd. This has the
disadvantage to remove the benefit of c) above.
** Store everything in the existing stats file, adding one type of
entry like 'S' and 'N' with a "custom" type, where the *name* of the
custom stats kind is stored instead of its ID computed from shared
memory.
A mix of both? The patch attached has used the second approach. If
the process reading/writing the stats does not know about the custom
stats data, the data is discarded.

* pgstat.c has a big array called pgstat_kind_infos to define all the
existing stats kinds. Perhaps the code should be refactored to use
this new API? That would make the code more consistent with what we
do for resource managers, for one, while moving the KindInfos into
their own file. With that in mind, storing the kind ID in KindInfos
feels intuitive.

While thinking about a use case to show what these APIs can do, I have
decided to add statistics to the existing module injection_points
rather than implement a new test module, gathering data about them and
have tests that could use this data (like tracking the number of times
a point is taken). This is simple enough that it can be used as a
template, as well. There is a TAP test checking the data persistence
across restarts, so I did not mess up this part much, hopefully.

Please find attached a patch set implementing these ideas:
- 0001 switches PgStat_Kind from an enum to a uint32, for the internal
counters.
- 0002 is some cleanup for the hardcoded S, N and E in pgstat.c.
- 0003 introduces the backend-side APIs, with the shmem table counter
and the routine to give code paths a way to register their own stats
kind (see pgstat_add_kind).
- 0004 implements an example of how to use these APIs, see
injection_stats.c in src/test/modules/injection_points/.
- 0005 adds some docs.
- 0006 is an idea of how to make this custom stats data persistent.

This will hopefully spark a discussion, and I was looking for answers
regarding these questions:
- Should the pgstat_kind_infos array in pgstat.c be refactored to use
something similar to pgstat_add_kind?
- How should the persistence of the custom stats be achieved?
Callbacks to give custom stats kinds a way to write/read their data,
push everything into a single file, or support both?
- Should this do like custom RMGRs and assign to each stats kinds ID
that are set in stone rather than dynamic ones?

Thanks for reading.
--
Michael

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Michael Paquier (#1)

Re: Pluggable cumulative statistics

On Thu, Jun 13, 2024 at 04:59:50PM +0900, Michael Paquier wrote:

- How should the persistence of the custom stats be achieved?
Callbacks to give custom stats kinds a way to write/read their data,
push everything into a single file, or support both?
- Should this do like custom RMGRs and assign to each stats kinds ID
that are set in stone rather than dynamic ones?

These two questions have been itching me in terms of how it would work
for extension developers, after noticing that custom RMGRs are used
more than I thought:
https://wiki.postgresql.org/wiki/CustomWALResourceManagers

The result is proving to be nicer, shorter by 300 lines in total and
much simpler when it comes to think about the way stats are flushed
because it is possible to achieve the same result as the first patch
set without manipulating any of the code paths doing the read and
write of the pgstats file.

In terms of implementation, pgstat.c's KindInfo data is divided into
two parts, for efficiency:
- The exiting in-core stats with designated initializers, renamed as
built-in stats kinds.
- The custom stats kinds are saved in TopMemoryContext, and can only
be registered with shared_preload_libraries. The patch reserves a set
of 128 harcoded slots for all the custom kinds making the lookups for
the KindInfos quite cheap. Upon registration, a custom stats kind
needs to assign a unique ID, with uniqueness on the names and IDs
checked at registration.

The backend code does ID -> information lookups in the hotter paths,
meaning that the code only checks if an ID is built-in or custom, then
redirects to the correct array where the information is stored.
There is one code path that does a name -> information lookup for the
undocumented SQL function pg_stat_have_stats() used in the tests,
which is a bit less efficient now, but that does not strike me as an
issue.

modules/injection_points/ works as previously as a template to show
how to use these APIs, with tests for the whole.

With that in mind, the patch set is more pleasant to the eye, and the
attached v2 consists of:
- 0001 and 0002 are some cleanups, same as previously to prepare for
the backend-side APIs.
- 0003 adds the backend support to plug-in custom stats.
- 0004 includes documentation.
- 0005 is an example of how to use them, with a TAP test providing
coverage.

Note that the patch I've proposed to make stats persistent at
checkpoint so as we don't discard everything after a crash is able to
work with the custom stats proposed on this thread:
https://commitfest.postgresql.org/48/5047/
--
Michael

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

almost 2 years ago

In reply to: Michael Paquier (#1)

Re: Pluggable cumulative statistics

Hi,

On Thu, Jun 13, 2024 at 04:59:50PM +0900, Michael Paquier wrote:

Hi all,

2) Make the shmem pgstats pluggable so as it is possible for extensions
to register their own stats kinds.

Thanks for the patch! I like the idea of having custom stats (it has also been
somehow mentioned in [1]/messages/by-id/20220818195124.c7ipzf6c5v7vxymc@awork3.anarazel.de).

2) is actually something that can be used for more things than
just pg_stat_statements, because people love extensions and
statistics (spoiler: I do).

* Making custom stats data persistent is an interesting problem, and
there are a couple of approaches I've considered:
** Allow custom kinds to define callbacks to read and write data from
a source they'd want, like their own file through a fd. This has the
disadvantage to remove the benefit of c) above.
** Store everything in the existing stats file, adding one type of
entry like 'S' and 'N' with a "custom" type, where the *name* of the
custom stats kind is stored instead of its ID computed from shared
memory.

What about having 2 files?

- One is the existing stats file
- One "predefined" for all the custom stats (so what you've done minus the
in-core stats). This one would not be configurable and the extensions will
not need to know about it.

Would that remove the benefit from c) that you mentioned up-thread?

[1]: /messages/by-id/20220818195124.c7ipzf6c5v7vxymc@awork3.anarazel.de

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

almost 2 years ago

In reply to: Michael Paquier (#2)

Re: Pluggable cumulative statistics

Hi,

On Thu, Jun 20, 2024 at 09:46:30AM +0900, Michael Paquier wrote:

On Thu, Jun 13, 2024 at 04:59:50PM +0900, Michael Paquier wrote:

- How should the persistence of the custom stats be achieved?
Callbacks to give custom stats kinds a way to write/read their data,
push everything into a single file, or support both?
- Should this do like custom RMGRs and assign to each stats kinds ID
that are set in stone rather than dynamic ones?

These two questions have been itching me in terms of how it would work
for extension developers, after noticing that custom RMGRs are used
more than I thought:
https://wiki.postgresql.org/wiki/CustomWALResourceManagers

The result is proving to be nicer, shorter by 300 lines in total and
much simpler when it comes to think about the way stats are flushed
because it is possible to achieve the same result as the first patch
set without manipulating any of the code paths doing the read and
write of the pgstats file.

I think it makes sense to follow the same "behavior" as the custom
wal resource managers. That, indeed, looks much more simpler than v1.

In terms of implementation, pgstat.c's KindInfo data is divided into
two parts, for efficiency:
- The exiting in-core stats with designated initializers, renamed as
built-in stats kinds.
- The custom stats kinds are saved in TopMemoryContext,

Agree that a backend lifetime memory area is fine for that purpose.

and can only
be registered with shared_preload_libraries. The patch reserves a set
of 128 harcoded slots for all the custom kinds making the lookups for
the KindInfos quite cheap.

+                       MemoryContextAllocZero(TopMemoryContext,
+                                                          sizeof(PgStat_KindInfo *) * PGSTAT_KIND_CUSTOM_SIZE);

and that's only 8 * PGSTAT_KIND_CUSTOM_SIZE bytes in total.

I had a quick look at the patches (have in mind to do more):

With that in mind, the patch set is more pleasant to the eye, and the
attached v2 consists of:
- 0001 and 0002 are some cleanups, same as previously to prepare for
the backend-side APIs.

0001 and 0002 look pretty straightforward at a quick look.

- 0003 adds the backend support to plug-in custom stats.

1 ===

It looks to me that there is a mix of "in core" and "built-in" to name the
non custom stats. Maybe it's worth to just use one?

As I can see (and as you said above) this is mainly inspired by the custom
resource manager and 2 === and 3 === are probably copy/paste consequences.

2 ===

+       if (pgstat_kind_custom_infos[idx] != NULL &&
+               pgstat_kind_custom_infos[idx]->name != NULL)
+               ereport(ERROR,
+                               (errmsg("failed to register custom cumulative statistics \"%s\" with ID %u", kind_info->name, kind),
+                                errdetail("Custom resource manager \"%s\" already registered with the same ID.",
+                                                  pgstat_kind_custom_infos[idx]->name)));

s/Custom resource manager/Custom cumulative statistics/

3 ===

+       ereport(LOG,
+                       (errmsg("registered custom resource manager \"%s\" with ID %u",
+                                       kind_info->name, kind)));

s/custom resource manager/custom cumulative statistics/

- 0004 includes documentation.

Did not look yet.

- 0005 is an example of how to use them, with a TAP test providing
coverage.

Did not look yet.

As I said, I've in mind to do a more in depth review. I've noted the above while
doing a quick read of the patches so thought it makes sense to share them
now while at it.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Bertrand Drouvot (#3)

Re: Pluggable cumulative statistics

On Thu, Jun 20, 2024 at 01:05:42PM +0000, Bertrand Drouvot wrote:

On Thu, Jun 13, 2024 at 04:59:50PM +0900, Michael Paquier wrote:

* Making custom stats data persistent is an interesting problem, and
there are a couple of approaches I've considered:
** Allow custom kinds to define callbacks to read and write data from
a source they'd want, like their own file through a fd. This has the
disadvantage to remove the benefit of c) above.
** Store everything in the existing stats file, adding one type of
entry like 'S' and 'N' with a "custom" type, where the *name* of the
custom stats kind is stored instead of its ID computed from shared
memory.

What about having 2 files?

- One is the existing stats file
- One "predefined" for all the custom stats (so what you've done minus the
in-core stats). This one would not be configurable and the extensions will
not need to know about it.

Another thing that can be done here is to add a few callbacks to
control how an entry should be written out when the dshash is scanned
or read when the dshash is populated depending on the KindInfo.
That's not really complicated to do as the populate part could have a
cleanup phase if an error is found. I just did not do it yet because
this patch set is already covering a lot, just to get the basics in.

Would that remove the benefit from c) that you mentioned up-thread?

Yes, that can be slightly annoying. Splitting the stats across
multiple files would mean that each stats file would have to store the
redo LSN. That's not really complicated to implement, but really easy
to miss. Perhaps folks implementing their own stats kinds would be
aware anyway because we are going to need a callback to initialize the
file to write if we do that, and the redo LSN should be provided in
input of it. Giving more control to extension developers here would
be OK for me, especially since they could use their own format for
their output file(s).
--
Michael

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Bertrand Drouvot (#4)

Re: Pluggable cumulative statistics

On Thu, Jun 20, 2024 at 02:27:14PM +0000, Bertrand Drouvot wrote:

On Thu, Jun 20, 2024 at 09:46:30AM +0900, Michael Paquier wrote:
I think it makes sense to follow the same "behavior" as the custom
wal resource managers. That, indeed, looks much more simpler than v1.

Thanks for the feedback.

and can only
be registered with shared_preload_libraries. The patch reserves a set
of 128 harcoded slots for all the custom kinds making the lookups for
the KindInfos quite cheap.
+                       MemoryContextAllocZero(TopMemoryContext,
+                                                          sizeof(PgStat_KindInfo *) * PGSTAT_KIND_CUSTOM_SIZE);
and that's only 8 * PGSTAT_KIND_CUSTOM_SIZE bytes in total.

Enlarging that does not worry me much. Just not too much.

With that in mind, the patch set is more pleasant to the eye, and the
attached v2 consists of:
- 0001 and 0002 are some cleanups, same as previously to prepare for
the backend-side APIs.

0001 and 0002 look pretty straightforward at a quick look.

0002 is quite independentn. Still, 0001 depends a bit on the rest.
Anyway, the Kind is already 4 bytes and it cleans up some APIs that
used int for the Kind, so enforcing signedness is just cleaner IMO.

- 0003 adds the backend support to plug-in custom stats.

1 ===

It looks to me that there is a mix of "in core" and "built-in" to name the
non custom stats. Maybe it's worth to just use one?

Right. Perhaps better to remove "in core" and stick to "builtin", as
I've used the latter for the variables and such.

As I can see (and as you said above) this is mainly inspired by the custom
resource manager and 2 === and 3 === are probably copy/paste consequences.

2 ===

+       if (pgstat_kind_custom_infos[idx] != NULL &&
+               pgstat_kind_custom_infos[idx]->name != NULL)
+               ereport(ERROR,
+                               (errmsg("failed to register custom cumulative statistics \"%s\" with ID %u", kind_info->name, kind),
+                                errdetail("Custom resource manager \"%s\" already registered with the same ID.",
+                                                  pgstat_kind_custom_infos[idx]->name)));

s/Custom resource manager/Custom cumulative statistics/

3 ===

+       ereport(LOG,
+                       (errmsg("registered custom resource manager \"%s\" with ID %u",
+                                       kind_info->name, kind)));

s/custom resource manager/custom cumulative statistics/

Oops. Will fix.
--
Michael

Kyotaro Horiguchi

horikyota.ntt@gmail.com

almost 2 years ago

In reply to: Michael Paquier (#1)

Re: Pluggable cumulative statistics

At Thu, 13 Jun 2024 16:59:50 +0900, Michael Paquier <michael@paquier.xyz> wrote in

* The kind IDs may change across restarts, meaning that any stats data
associated to a custom kind is stored with the *name* of the custom
stats kind. Depending on the discussion happening here, I'd be open
to use the same concept as custom RMGRs, where custom kind IDs are
"reserved", fixed in time, and tracked in the Postgres wiki. It is
cheaper to store the stats this way, as well, while managing conflicts
across extensions available in the community ecosystem.

I prefer to avoid having a central database if possible.

If we don't intend to move stats data alone out of a cluster for use
in another one, can't we store the relationship between stats names
and numeric IDs (or index numbers) in a separate file, which is loaded
just before and synced just after extension preloading finishes?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Kyotaro Horiguchi (#7)

Re: Pluggable cumulative statistics

On Fri, Jun 21, 2024 at 01:09:10PM +0900, Kyotaro Horiguchi wrote:

At Thu, 13 Jun 2024 16:59:50 +0900, Michael Paquier <michael@paquier.xyz> wrote in

* The kind IDs may change across restarts, meaning that any stats data
associated to a custom kind is stored with the *name* of the custom
stats kind. Depending on the discussion happening here, I'd be open
to use the same concept as custom RMGRs, where custom kind IDs are
"reserved", fixed in time, and tracked in the Postgres wiki. It is
cheaper to store the stats this way, as well, while managing conflicts
across extensions available in the community ecosystem.

I prefer to avoid having a central database if possible.

I was thinking the same originally, but the experience with custom
RMGRs has made me change my mind. There are more of these than I
thought originally:
https://wiki.postgresql.org/wiki/CustomWALResourceManagers

If we don't intend to move stats data alone out of a cluster for use
in another one, can't we store the relationship between stats names
and numeric IDs (or index numbers) in a separate file, which is loaded
just before and synced just after extension preloading finishes?

Yeah, I've implemented a prototype that does exactly something like
that with a restriction on the stats name to NAMEDATALEN, except that
I've added the kind ID <-> kind name mapping at the beginning of the
main stats file. At the end, it still felt weird and over-engineered
to me, like the v1 prototype of upthread, because we finish with a
strange mix when reloading the dshash where the builtin ID are handled
with fixed values, with more code paths required when doing the
serialize callback dance for stats kinds like replication slots,
because the custom kinds need to update their hash keys to the new
values based on the ID/name mapping stored at the beginning of the
file itself.

The equation complicates itself a bit more once you'd try to add more
ways to write some stats kinds to other places, depending on what a
custom kind wants to achieve. I can see the benefits of both
approaches, still fixing the IDs in time leads to a lot of simplicity
in this infra, which is very appealing on its own before tackling the
next issues where I would rely on the proposed APIs.
--
Michael

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Michael Paquier (#6)

Re: Pluggable cumulative statistics

On Fri, Jun 21, 2024 at 08:13:15AM +0900, Michael Paquier wrote:

On Thu, Jun 20, 2024 at 02:27:14PM +0000, Bertrand Drouvot wrote:

On Thu, Jun 20, 2024 at 09:46:30AM +0900, Michael Paquier wrote:
I think it makes sense to follow the same "behavior" as the custom
wal resource managers. That, indeed, looks much more simpler than v1.

Thanks for the feedback.

While looking at a different patch from Tristan in this area at [1]/messages/by-id/ZoNytpoHOzHGBLYi@paquier.xyz -- Michael, I
still got annoyed that this patch set was not able to support the case
of custom fixed-numbered stats, so as it is possible to plug in
pgstats things similar to the archiver, the checkpointer, WAL, etc.
These are plugged in shared memory, and are handled with copies in the
stats snapshots. After a good night of sleep, I have come up with a
good solution for that, among the following lines:
- PgStat_ShmemControl holds an array of void* indexed by
PGSTAT_NUM_KINDS, pointing to shared memory areas allocated for each
fixed-numbered stats. Each entry is allocated a size corresponding to
PgStat_KindInfo->shared_size.
- PgStat_Snapshot holds an array of void* also indexed by
PGSTAT_NUM_KINDS, pointing to the fixed stats stored in the
snapshots. These have a size of PgStat_KindInfo->shared_data_len, set
up when stats are initialized at process startup, so this reflects
everywhere.
- Fixed numbered stats now set shared_size, and we use this number to
determine the size to allocate for each fixed-numbered stats in shmem.
- A callback is added to initialize the shared memory assigned to each
fixed-numbered stats, consisting of LWLock initializations for the
current types of stats. So this initialization step is moved out of
pgstat.c into each stats kind file.

All that has been done in the rebased patch set as of 0001, which is
kind of a nice cleanup overall because it removes all the dependencies
to the fixed-numbered stats structures from the "main" pgstats code in
pgstat.c and pgstat_shmem.c.

The remaining patches consist of:
- 0002, Switch PgStat_Kind to a uint32. Cleanup.
- 0003 introduces the pluggable stats facility. Feeding on the
refactoring for the fixed-numbered stats in 0001, it is actually
possible to get support for these in the pluggable APIs by just
removing the restriction in the registration path. This extends the
void* arrays to store references that cover the range of custom kind
IDs.
- 0004 has some docs.
- 0005 includes an example of implementation for variable-numbered
stats with the injection_points module.
- 0006 is new for this thread, implementing an example for
fixed-numbered stats, using again the injection_points module. This
stuff gathers stats about the number of times points are run, attached
and detached. Perhaps that's useful in itself, I don't know, but it
provides the coverage I want for this facility.

While on it, I have applied one of the cleanup patches as
9fd02525793f.

[1]: /messages/by-id/ZoNytpoHOzHGBLYi@paquier.xyz -- Michael
--
Michael

#10

Andrei Lepikhov

lepihov@gmail.com

almost 2 years ago

In reply to: Michael Paquier (#1)

Re: Pluggable cumulative statistics

On 6/13/24 14:59, Michael Paquier wrote:

This will hopefully spark a discussion, and I was looking for answers
regarding these questions:
- Should the pgstat_kind_infos array in pgstat.c be refactored to use
something similar to pgstat_add_kind?
- How should the persistence of the custom stats be achieved?
Callbacks to give custom stats kinds a way to write/read their data,
push everything into a single file, or support both?
- Should this do like custom RMGRs and assign to each stats kinds ID
that are set in stone rather than dynamic ones?

It is a feature my extensions (which usually change planning behaviour)
definitely need. It is a problem to show the user if the extension does
something or not because TPS smooths the execution time of a single
query and performance cliffs.
BTW, we have 'labelled DSM segments', which allowed extensions to be
'lightweight' - not necessarily be loaded on startup, stay backend-local
and utilise shared resources. It was a tremendous win for me. Is it
possible to design this extension in the same way?

--
regards, Andrei Lepikhov

#11

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Andrei Lepikhov (#10)

Re: Pluggable cumulative statistics

On Thu, Jul 04, 2024 at 10:11:02AM +0700, Andrei Lepikhov wrote:

It is a feature my extensions (which usually change planning behaviour)
definitely need. It is a problem to show the user if the extension does
something or not because TPS smooths the execution time of a single query
and performance cliffs.

Yeah, I can get that. pgstat.c is quite good regarding that as it
delays stats flushes until commit by holding pending entries (see the
pgStatPending business for variable-size stats). Custom stats kinds
registered would just rely on these facilities, including snapshot
APIs, etc.

BTW, we have 'labelled DSM segments', which allowed extensions to be
'lightweight' - not necessarily be loaded on startup, stay backend-local and
utilise shared resources. It was a tremendous win for me.

Is it possible to design this extension in the same way?

I am not sure how this would be useful when it comes to cumulative
statistics, TBH. These stats are global by design, and especially
since these most likely need to be flushed at shutdown (as of HEAD)
and read at startup, the simplest way to achieve that to let the
checkpointer and the startup process know about them is to restrict
the registration of custom stats types via _PG_init when loading
shared libraries. That's what we do for custom WAL RMGRs, for
example.

I would not be against a new flag in KindInfo to state that a given
stats type should not be flushed, as much as a set of callbacks that
offers the possibility to redirect some stats kinds to somewhere else
than pgstat.stat, like pg_stat_statements. That would be a separate
patch than what's proposed here.
--
Michael

#12

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

almost 2 years ago

In reply to: Michael Paquier (#9)

Re: Pluggable cumulative statistics

Hi,

On Wed, Jul 03, 2024 at 06:47:15PM +0900, Michael Paquier wrote:

While looking at a different patch from Tristan in this area at [1], I
still got annoyed that this patch set was not able to support the case
of custom fixed-numbered stats, so as it is possible to plug in
pgstats things similar to the archiver, the checkpointer, WAL, etc.
These are plugged in shared memory, and are handled with copies in the
stats snapshots. After a good night of sleep, I have come up with a
good solution for that,

Great!

among the following lines:
- PgStat_ShmemControl holds an array of void* indexed by
PGSTAT_NUM_KINDS, pointing to shared memory areas allocated for each
fixed-numbered stats. Each entry is allocated a size corresponding to
PgStat_KindInfo->shared_size.

That makes sense to me, and that's just a 96 bytes overhead (8 * PGSTAT_NUM_KINDS)
as compared to now.

- PgStat_Snapshot holds an array of void* also indexed by
PGSTAT_NUM_KINDS, pointing to the fixed stats stored in the
snapshots.

Same, that's just a 96 bytes overhead (8 * PGSTAT_NUM_KINDS) as compared to now.

These have a size of PgStat_KindInfo->shared_data_len, set
up when stats are initialized at process startup, so this reflects
everywhere.

Yeah.

- Fixed numbered stats now set shared_size, and we use this number to
determine the size to allocate for each fixed-numbered stats in shmem.
- A callback is added to initialize the shared memory assigned to each
fixed-numbered stats, consisting of LWLock initializations for the
current types of stats. So this initialization step is moved out of
pgstat.c into each stats kind file.

That looks a reasonable approach to me.

All that has been done in the rebased patch set as of 0001, which is
kind of a nice cleanup overall because it removes all the dependencies
to the fixed-numbered stats structures from the "main" pgstats code in
pgstat.c and pgstat_shmem.c.

Looking at 0001:

1 ===

In the commit message:

- Fixed numbered stats now set shared_size, so as

Is something missing in that sentence?

2 ===

@@ -425,14 +427,12 @@ typedef struct PgStat_ShmemControl
pg_atomic_uint64 gc_request_count;

        /*
-        * Stats data for fixed-numbered objects.
+        * Stats data for fixed-numbered objects, indexed by PgStat_Kind.
+        *
+        * Each entry has a size of PgStat_KindInfo->shared_size.
         */
-       PgStatShared_Archiver archiver;
-       PgStatShared_BgWriter bgwriter;
-       PgStatShared_Checkpointer checkpointer;
-       PgStatShared_IO io;
-       PgStatShared_SLRU slru;
-       PgStatShared_Wal wal;
+       void       *fixed_data[PGSTAT_NUM_KINDS];

Can we move from PGSTAT_NUM_KINDS to the exact number of fixed stats? (add
a new define PGSTAT_NUM_FIXED_KINDS for example). That's not a big deal but we
are allocating some space for pointers that we won't use. Would need to change
the "indexing" logic though.

3 ===

Same as 2 === but for PgStat_Snapshot.

4 ===

+static void pgstat_init_snapshot(void);

what about pgstat_init_snapshot_fixed? (as it is for fixed-numbered statistics
only).

5 ===

+ /* Write various stats structs with fixed number of objects */

s/Write various stats/Write the stats/? (not coming from your patch but they
all were listed before though).

6 ===

+       for (int kind = PGSTAT_KIND_FIRST_VALID; kind <= PGSTAT_KIND_LAST; kind++)
+       {
+               char       *ptr;
+               const PgStat_KindInfo *info = pgstat_get_kind_info(kind);
+
+               if (!info->fixed_amount)
+                       continue;

Nit: Move the "ptr" declaration into an extra else? (useless to declare it
if it's not a fixed number stat)

7 ===

+               /* prepare snapshot data and write it */
+               pgstat_build_snapshot_fixed(kind);

What about changing pgstat_build_snapshot_fixed() to accept a PgStat_KindInfo
parameter (instead of the current PgStat_Kind one)? Reason is that
pgstat_get_kind_info() is already called/known in pgstat_snapshot_fixed(),
pgstat_build_snapshot() and pgstat_write_statsfile(). That would avoid
pgstat_build_snapshot_fixed() to retrieve (again) the kind_info.

8 ===

/*
* Reads in existing statistics file into the shared stats hash.

This comment above pgstat_read_statsfile() is not correct, fixed stats
are not going to the hash (was there before your patch though).

9 ===

+pgstat_archiver_init_shmem_cb(void *stats)
+{
+       PgStatShared_Archiver *stats_shmem = (PgStatShared_Archiver *) stats;
+
+       LWLockInitialize(&stats_shmem->lock, LWTRANCHE_PGSTATS_DATA);

Nit: Almost all the pgstat_XXX_init_shmem_cb() look very similar, I wonder if we
could use a macro to avoid code duplication.

10 ===

Remark not related to this patch: I think we could get rid of the shared_data_off
for the fixed stats (by moving the "stats" part at the header of their dedicated
struct). That would mean having things like:

"
typedef struct PgStatShared_Archiver
{
PgStat_ArchiverStats stats;
/* lock protects ->reset_offset as well as stats->stat_reset_timestamp */
LWLock lock;
uint32 changecount;
PgStat_ArchiverStats reset_offset;
} PgStatShared_Archiver;
"

Not sure that's worth it though.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#13

Andres Freund

andres@anarazel.de

almost 2 years ago

In reply to: Michael Paquier (#9)

Re: Pluggable cumulative statistics

Hi,

On 2024-07-03 18:47:15 +0900, Michael Paquier wrote:

While looking at a different patch from Tristan in this area at [1], I
still got annoyed that this patch set was not able to support the case
of custom fixed-numbered stats, so as it is possible to plug in
pgstats things similar to the archiver, the checkpointer, WAL, etc.
These are plugged in shared memory, and are handled with copies in the
stats snapshots. After a good night of sleep, I have come up with a
good solution for that, among the following lines:

- PgStat_ShmemControl holds an array of void* indexed by
PGSTAT_NUM_KINDS, pointing to shared memory areas allocated for each
fixed-numbered stats. Each entry is allocated a size corresponding to
PgStat_KindInfo->shared_size.

I am dubious this is a good idea. The more indirection you add, the more
expensive it gets to count stuff, the more likely it is that we end up with
backend-local "caching" in front of the stats system.

IOW, I am against making builtin stats pay the price for pluggable
fixed-numbered stats.

It also substantially reduces type-safety, making it harder to refactor. Note
that you had to add static casts in a good number of additional places.

Greetings,

Andres Freund

#14

Andres Freund

andres@anarazel.de

almost 2 years ago

In reply to: Michael Paquier (#1)

Re: Pluggable cumulative statistics

Hi,

On 2024-06-13 16:59:50 +0900, Michael Paquier wrote:

* Making custom stats data persistent is an interesting problem, and
there are a couple of approaches I've considered:
** Allow custom kinds to define callbacks to read and write data from
a source they'd want, like their own file through a fd. This has the
disadvantage to remove the benefit of c) above.

I am *strongly* against this. That'll make it much harder to do stuff like not
resetting stats after crashes and just generally will make it harder to
improve the stats facility further.

I think that pluggable users of the stats facility should only have control
over how data is stored via quite generic means.

Greetings,

Andres Freund

#15

Andres Freund

andres@anarazel.de

almost 2 years ago

In reply to: Andres Freund (#14)

Re: Pluggable cumulative statistics

Hi,

On 2024-07-04 14:00:47 -0700, Andres Freund wrote:

On 2024-06-13 16:59:50 +0900, Michael Paquier wrote:

* Making custom stats data persistent is an interesting problem, and
there are a couple of approaches I've considered:
** Allow custom kinds to define callbacks to read and write data from
a source they'd want, like their own file through a fd. This has the
disadvantage to remove the benefit of c) above.

I am *strongly* against this. That'll make it much harder to do stuff like not
resetting stats after crashes and just generally will make it harder to
improve the stats facility further.

I think that pluggable users of the stats facility should only have control
over how data is stored via quite generic means.

I forgot to say: In general I am highly supportive of this effort and thankful
to Michael for tackling it. The above was just about that one aspect.

Greetings,

Andres Freund

#16

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Andres Freund (#14)

Re: Pluggable cumulative statistics

On Thu, Jul 04, 2024 at 02:00:47PM -0700, Andres Freund wrote:

On 2024-06-13 16:59:50 +0900, Michael Paquier wrote:

* Making custom stats data persistent is an interesting problem, and
there are a couple of approaches I've considered:
** Allow custom kinds to define callbacks to read and write data from
a source they'd want, like their own file through a fd. This has the
disadvantage to remove the benefit of c) above.

I am *strongly* against this. That'll make it much harder to do stuff like not
resetting stats after crashes and just generally will make it harder to
improve the stats facility further.

I think that pluggable users of the stats facility should only have control
over how data is stored via quite generic means.

I'm pretty much on the same line here, I think. If the redo logic is
changed, then any stats kinds pushing their stats into their own file
would need to copy/paste the same logic as the main file. And that's
more error prone.

I can get why some people would get that they don't want some stats
kinds to never be flushed at shutdown or even read at startup. Adding
more callbacks in this area is a separate discussion.
--
Michael

#17

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Andres Freund (#15)

Re: Pluggable cumulative statistics

On Thu, Jul 04, 2024 at 02:08:25PM -0700, Andres Freund wrote:

I forgot to say: In general I am highly supportive of this effort and thankful
to Michael for tackling it. The above was just about that one aspect.

Thanks. Let's discuss how people want this stuff to be shaped, and
how much we want to cover. Better to do it one small step at a time.
--
Michael

#18

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Andres Freund (#13)

Re: Pluggable cumulative statistics

On Thu, Jul 04, 2024 at 01:56:52PM -0700, Andres Freund wrote:

On 2024-07-03 18:47:15 +0900, Michael Paquier wrote:

- PgStat_ShmemControl holds an array of void* indexed by
PGSTAT_NUM_KINDS, pointing to shared memory areas allocated for each
fixed-numbered stats. Each entry is allocated a size corresponding to
PgStat_KindInfo->shared_size.

I am dubious this is a good idea. The more indirection you add, the more
expensive it gets to count stuff, the more likely it is that we end up with
backend-local "caching" in front of the stats system.

IOW, I am against making builtin stats pay the price for pluggable
fixed-numbered stats.

Okay, noted. So, if I get that right, you would prefer an approach
where we add an extra member in the snapshot and shmem control area
dedicated only to the custom kind IDs, indexed based on the range
of the custom kind IDs, leaving the built-in fixed structures in
PgStat_ShmemControl and PgStat_Snapshot?

I was feeling a bit uncomfortable with the extra redirection for the
built-in fixed kinds, still the temptation of making that more generic
was here, so..

Having the custom fixed types point to their own array in the snapshot
and ShmemControl adds a couple more null-ness checks depending on if
you're dealing with a builtin or custom ID range. That's mostly the
path in charge of retrieving the KindInfos.

It also substantially reduces type-safety, making it harder to refactor. Note
that you had to add static casts in a good number of additional places.

Not sure on this one, because that's the same issue as
variable-numbered stats, no? The central dshash only knows about the
size of the shared stats entries for each kind, with an offset to the
stats data that gets copied to the snapshots. So I don't quite get
the worry here.

Separately from that, I think that read/write of the fixed-numbered
stats would gain in clarity if we update them to be closer to the
variable-numbers by storing entries with a specific character ('F' in
0001). If we keep track of the fixed-numbered structures in
PgStat_Snapshot, that means adding an extra field in PgStat_KindInfo
to point to the offset in PgStat_Snapshot for the write part. Note
that the addition of the init_shmem callback simplifies shmem init,
and it is also required for the fixed-numbered pluggable part.
--
Michael

#19

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Bertrand Drouvot (#12)

Re: Pluggable cumulative statistics

On Thu, Jul 04, 2024 at 11:30:17AM +0000, Bertrand Drouvot wrote:

On Wed, Jul 03, 2024 at 06:47:15PM +0900, Michael Paquier wrote:

among the following lines:
- PgStat_ShmemControl holds an array of void* indexed by
PGSTAT_NUM_KINDS, pointing to shared memory areas allocated for each
fixed-numbered stats. Each entry is allocated a size corresponding to
PgStat_KindInfo->shared_size.

That makes sense to me, and that's just a 96 bytes overhead (8 * PGSTAT_NUM_KINDS)
as compared to now.

pgstat_io.c is by far the largest chunk.

- PgStat_Snapshot holds an array of void* also indexed by
PGSTAT_NUM_KINDS, pointing to the fixed stats stored in the
snapshots.

Same, that's just a 96 bytes overhead (8 * PGSTAT_NUM_KINDS) as compared to now.

Still Andres does not seem to like that much, well ;)

Looking at 0001:

1 ===

In the commit message:

- Fixed numbered stats now set shared_size, so as

Is something missing in that sentence?

Right. This is missing a piece.

-       PgStatShared_Archiver archiver;
-       PgStatShared_BgWriter bgwriter;
-       PgStatShared_Checkpointer checkpointer;
-       PgStatShared_IO io;
-       PgStatShared_SLRU slru;
-       PgStatShared_Wal wal;
+       void       *fixed_data[PGSTAT_NUM_KINDS];
Can we move from PGSTAT_NUM_KINDS to the exact number of fixed stats? (add
a new define PGSTAT_NUM_FIXED_KINDS for example). That's not a big deal but we
are allocating some space for pointers that we won't use. Would need to change
the "indexing" logic though.

3 ===

Same as 2 === but for PgStat_Snapshot.

True for both. Based on the first inputs I got from Andres, the
built-in fixed stats structures would be kept as they are now, and we
could just add an extra member here for the custom fixed stats. That
still results in a few bytes wasted as not all custom stats want fixed
stats, but that's much cheaper.

4 ===

+static void pgstat_init_snapshot(void);

what about pgstat_init_snapshot_fixed? (as it is for fixed-numbered statistics
only).

Sure.

5 ===

+ /* Write various stats structs with fixed number of objects */

s/Write various stats/Write the stats/? (not coming from your patch but they
all were listed before though).

Yes, there are a few more things about that.

6 ===

+       for (int kind = PGSTAT_KIND_FIRST_VALID; kind <= PGSTAT_KIND_LAST; kind++)
+       {
+               char       *ptr;
+               const PgStat_KindInfo *info = pgstat_get_kind_info(kind);
+
+               if (!info->fixed_amount)
+                       continue;

Nit: Move the "ptr" declaration into an extra else? (useless to declare it
if it's not a fixed number stat)

Comes down to one's taste. I think that this is OK as-is, but that's
my taste.

7 ===
+               /* prepare snapshot data and write it */
+               pgstat_build_snapshot_fixed(kind);
What about changing pgstat_build_snapshot_fixed() to accept a PgStat_KindInfo
parameter (instead of the current PgStat_Kind one)? Reason is that
pgstat_get_kind_info() is already called/known in pgstat_snapshot_fixed(),
pgstat_build_snapshot() and pgstat_write_statsfile(). That would avoid
pgstat_build_snapshot_fixed() to retrieve (again) the kind_info.

pgstat_snapshot_fixed() only calls pgstat_get_kind_info() with
assertions enabled. Perhaps we could do that, just that it does not
seem that critical to me.

8 ===

/*
* Reads in existing statistics file into the shared stats hash.

This comment above pgstat_read_statsfile() is not correct, fixed stats
are not going to the hash (was there before your patch though).

Good catch. Let's adjust that separately.

9 ===
+pgstat_archiver_init_shmem_cb(void *stats)
+{
+       PgStatShared_Archiver *stats_shmem = (PgStatShared_Archiver *) stats;
+
+       LWLockInitialize(&stats_shmem->lock, LWTRANCHE_PGSTATS_DATA);
Nit: Almost all the pgstat_XXX_init_shmem_cb() look very similar, I wonder if we
could use a macro to avoid code duplication.

They are very similar, still can do different things like pgstat_io.
I am not sure that the macro would bring more readability.

Remark not related to this patch: I think we could get rid of the shared_data_off
for the fixed stats (by moving the "stats" part at the header of their dedicated
struct). That would mean having things like:

"
typedef struct PgStatShared_Archiver
{
PgStat_ArchiverStats stats;
/* lock protects ->reset_offset as well as stats->stat_reset_timestamp */
LWLock lock;
uint32 changecount;
PgStat_ArchiverStats reset_offset;
} PgStatShared_Archiver;
"

I'm not really convinced that it is a good idea to force the ordering
of the members in the shared structures for the fixed-numbered stats,
requiring these "stats" fields to always be first.
--
Michael

#20

Dmitry Dolgov

9erthalion6@gmail.com

almost 2 years ago

In reply to: Michael Paquier (#8)

Re: Pluggable cumulative statistics

On Fri, Jun 21, 2024 at 01:28:11PM +0900, Michael Paquier wrote:
On Fri, Jun 21, 2024 at 01:09:10PM +0900, Kyotaro Horiguchi wrote:

At Thu, 13 Jun 2024 16:59:50 +0900, Michael Paquier <michael@paquier.xyz> wrote in

* The kind IDs may change across restarts, meaning that any stats data
associated to a custom kind is stored with the *name* of the custom
stats kind. Depending on the discussion happening here, I'd be open
to use the same concept as custom RMGRs, where custom kind IDs are
"reserved", fixed in time, and tracked in the Postgres wiki. It is
cheaper to store the stats this way, as well, while managing conflicts
across extensions available in the community ecosystem.

I prefer to avoid having a central database if possible.

I was thinking the same originally, but the experience with custom
RMGRs has made me change my mind. There are more of these than I
thought originally:
https://wiki.postgresql.org/wiki/CustomWALResourceManagers

From what I understand, coordinating custom RmgrIds via a wiki page was
made under the assumption that implementing a table AM with custom WAL
requires significant efforts, which limits the demand for ids. This
might not be same for custom stats -- I've got an impression it's easier
to create one, and there could be multiple kinds of stats per an
extension (one per component), right? This would mean more kind Ids to
manage and more efforts required to do that.

I agree though that it makes sense to start this way, it's just simpler.
But maybe it's worth thinking about some other solution in the long
term, taking the over-engineered prototype as a sign that more
refactoring is needed.

#21

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Dmitry Dolgov (#20)

#22

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Michael Paquier (#19)

#23

Bertrand Drouvot