RFC: pg_stat_logmsg

Started by Joe Conwayover 2 years ago13 messages

mail@joeconway.com

over 2 years ago

1 attachment(s)

Greetings,

Attached please find a tarball (rather than a patch) for a proposed new
contrib extension, pg_stat_logmsg.

The basic idea is to mirror how pg_stat_statements works, except the
logged messages keyed by filename, lineno, and elevel are saved with a
aggregate count. The format string is displayed (similar to a query
jumble) for context, along with function name and sqlerrcode.

I threw this together rather quickly over the past couple of days
between meetings, so not claiming that it is committable (and lacks
documentation and regression tests as well), but I would love to get
feedback on:

1/ the general concept
2/ the pg_stat_statement-like implementation
3/ contrib vs core vs external project

Some samples and data:

`make installcheck` with the extension loaded:
8<------------------
# All 215 tests passed.

real 2m24.854s
user 0m0.086s
sys 0m0.283s
8<------------------

`make installcheck` without the extension loaded:
8<------------------

# All 215 tests passed.

real 2m26.765s
user 0m0.076s
sys 0m0.293s
8<------------------

Sample output after running make installcheck a couple times (plus a few
manually generated ERRORs):

8<------------------
test=# select sum(count) from pg_stat_logmsg where elevel > 20;
sum
-------
10554
(1 row)

8<------------------

Part of the thinking is that people with fleets of postgres instances
can use this to scan for various errors that they care about.
Additionally it would be useful to look for "should not happen" errors.

I will register this in the July CF and will appreciate feedback.

Thanks!

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Pavel Stehule

pavel.stehule@gmail.com

over 2 years ago

In reply to: Joe Conway (#1)

Re: RFC: pg_stat_logmsg

so 1. 7. 2023 v 1:57 odesílatel Joe Conway <mail@joeconway.com> napsal:

Greetings,

Attached please find a tarball (rather than a patch) for a proposed new
contrib extension, pg_stat_logmsg.

The basic idea is to mirror how pg_stat_statements works, except the
logged messages keyed by filename, lineno, and elevel are saved with a
aggregate count. The format string is displayed (similar to a query
jumble) for context, along with function name and sqlerrcode.

I threw this together rather quickly over the past couple of days
between meetings, so not claiming that it is committable (and lacks
documentation and regression tests as well), but I would love to get
feedback on:

1/ the general concept
2/ the pg_stat_statement-like implementation
3/ contrib vs core vs external project

Some samples and data:

`make installcheck` with the extension loaded:
8<------------------
# All 215 tests passed.

real 2m24.854s
user 0m0.086s
sys 0m0.283s
8<------------------

`make installcheck` without the extension loaded:
8<------------------

# All 215 tests passed.

real 2m26.765s
user 0m0.076s
sys 0m0.293s
8<------------------

Sample output after running make installcheck a couple times (plus a few
manually generated ERRORs):

8<------------------
test=# select sum(count) from pg_stat_logmsg where elevel > 20;
sum
-------
10554
(1 row)

test=# \x
Expanded display is on.
test=# select * from pg_stat_logmsg where elevel > 20 order by count desc;
-[ RECORD 1 ]-------------------------------
filename | aclchk.c
lineno | 2811
elevel | 21
funcname | aclcheck_error
sqlerrcode | 42501
message | permission denied for schema %s
count | 578
-[ RECORD 2 ]-------------------------------
filename | scan.l
lineno | 1241
elevel | 21
funcname | scanner_yyerror
sqlerrcode | 42601
message | %s at or near "%s"
count | 265
...

test=# select * from pg_stat_logmsg where elevel > 20 and sqlerrcode =
'XX000';
-[ RECORD 1 ]---------------------------------------
filename | tid.c
lineno | 352
elevel | 21
funcname | currtid_for_view
sqlerrcode | XX000
message | ctid isn't of type TID
count | 2
-[ RECORD 2 ]---------------------------------------
filename | pg_locale.c
lineno | 2493
elevel | 21
funcname | pg_ucol_open
sqlerrcode | XX000
message | could not open collator for locale "%s": %s
count | 2
...

8<------------------

Part of the thinking is that people with fleets of postgres instances
can use this to scan for various errors that they care about.
Additionally it would be useful to look for "should not happen" errors.

I will register this in the July CF and will appreciate feedback.

This can be a very interesting feature. I like it.

Regards

Pavel

Show quoted text

Thanks!

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Joe Conway

mail@joeconway.com

over 2 years ago

In reply to: Pavel Stehule (#2)

1 attachment(s)

Re: RFC: pg_stat_logmsg

On 6/30/23 23:20, Pavel Stehule wrote:

so 1. 7. 2023 v 1:57 odesílatel Joe Conway <mail@joeconway.com
<mailto:mail@joeconway.com>> napsal:
Part of the thinking is that people with fleets of postgres instances
can use this to scan for various errors that they care about.
Additionally it would be useful to look for "should not happen" errors.

I will register this in the July CF and will appreciate feedback.

This can be a very interesting feature. I like it.

Thanks!

FWIW, I just modified it to provide the localized text of the elevel
rather than the internal number. I also localized the message format string:

8<------------------------------
psql (16beta2)
Type "help" for help.

New tarball attached.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Masahiko Sawada

sawada.mshk@gmail.com

over 2 years ago

In reply to: Joe Conway (#1)

Re: RFC: pg_stat_logmsg

Hi,

On Sat, Jul 1, 2023 at 8:57 AM Joe Conway <mail@joeconway.com> wrote:

Greetings,

Attached please find a tarball (rather than a patch) for a proposed new
contrib extension, pg_stat_logmsg.

The basic idea is to mirror how pg_stat_statements works, except the
logged messages keyed by filename, lineno, and elevel are saved with a
aggregate count. The format string is displayed (similar to a query
jumble) for context, along with function name and sqlerrcode.

Part of the thinking is that people with fleets of postgres instances
can use this to scan for various errors that they care about.
Additionally it would be useful to look for "should not happen" errors.

Interesting idea and use cases.

I'm concerned that displaying the format string could not be helpful
in some cases. For example, when raising an ERROR while reading WAL
records, we typically write the error message stored in
record->errormsg_buf:

in ReadRecord():
if (errormsg)
ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
(errmsg_internal("%s", errormsg) /* already
translated */ ));

In this case, the error message stored in pg_stat_logmsg is just '%s'.
The filename and line number columns might help identify the actual
error but it requires users to read the source code and cannot know
the actual error message.

A similar case is where we construct the error message on the fly. For
example, in LogRecoveryConflict() the string of the recovery conflict
description comes from get_recovery_conflict_desc():

in LogRecoveryConflict():
ereport(LOG,
errmsg("recovery still waiting after %ld.%03d ms: %s",
msecs, usecs, get_recovery_conflict_desc(reason)),
nprocs > 0 ? errdetail_log_plural("Conflicting process: %s.",
"Conflicting processes: %s.",
nprocs, buf.data) : 0);

The user might want to search the error message by the actual conflict
reason, but cannot. In this case, I'd like to see the actual error
message (I'd like to normalize the number part, though).

That being said, using the format string for the error messages like
"ERROR: relation "nonexist_table" does not exist" makes sense to me
since we can avoid having too many similar entries.

So I feel that we might need to figure out what part of the log
message should be normalized like pg_stat_statement does for query
strings.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Gurjeet Singh

gurjeet@singh.im

over 2 years ago

In reply to: Masahiko Sawada (#4)

Re: RFC: pg_stat_logmsg

On Thu, Jul 6, 2023 at 12:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Jul 1, 2023 at 8:57 AM Joe Conway <mail@joeconway.com> wrote:

The basic idea is to mirror how pg_stat_statements works, except the
logged messages keyed by filename, lineno, and elevel are saved with a
aggregate count. The format string is displayed (similar to a query
jumble) for context, along with function name and sqlerrcode.

Part of the thinking is that people with fleets of postgres instances
can use this to scan for various errors that they care about.
Additionally it would be useful to look for "should not happen" errors.

I'm concerned that displaying the format string could not be helpful
in some cases. For example, when raising an ERROR while reading WAL
records, we typically write the error message stored in
record->errormsg_buf:

in ReadRecord():
if (errormsg)
ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
(errmsg_internal("%s", errormsg) /* already
translated */ ));

In this case, the error message stored in pg_stat_logmsg is just '%s'.
The filename and line number columns might help identify the actual
error but it requires users to read the source code and cannot know
the actual error message.

I believe that the number of such error messages, the ones with very
little descriptive content, is very low in Postgres code. These kinds
of messages are not prevalent, and must be used when code hits obscure
conditions, like seen in your example above. These are the kinds of
errors that Joe is referring to as "should not happen". In these
cases, even if the error message was descriptive, the user will very
likely have to dive deep into code to find out the real cause.

I feel that the unique combination of file name and line number is
useful information, even in cases where the format string not very
descriptive. So I believe the extension's behaviour in this regard is
acceptable.

In cases where the extension's output is not descriptive enough, the
user can use the filename:lineno pair to look for fully formed error
messages in the actual log files; they may have to make appropriate
changes to log_* parameters, though.

If we still feel strongly about getting the actual message for these
cases, perhaps we can develop a heuristic to identify such messages,
and use either full or a prefix of the 'message' field, instead of
'message_id' field. The heuristic could be: strlen(message_id) is too
short, or that message_id is all/overwhelmingly format specifiers
(e.g. '%s: %d').

The core benefit of this extension is to make it easy for the user to
discover error messages affecting their workload. The user may be
interested in knowing the most common log messages in their server
logs, so that they can work on reducing those errors or warnings. Or
they may be interested in knowing when their workload hits
unexpected/serious error messages, even if they're infrequent. The
data exposed by pg_stat_logmsg view would serve as a starting point,
but I'm guessing it would not be sufficient for their investigation.

On Fri, Jun 30, 2023 at 4:57 PM Joe Conway <mail@joeconway.com> wrote:

I would love to get
feedback on:

1/ the general concept
2/ the pg_stat_statement-like implementation
3/ contrib vs core vs external project

+1 for the idea; a monitoring system trained at this view can bubble
up problems to users that may otherwise go unnoticed. Piggybacking on,
or using pg_stat_statments implementation as a model seems fine. I
believe making this available as a contrib module hits the right
balance; not all installations may want this, hence in-core, always
collecting data, seems undesirable; if it's an external project, it
won't be discoverable, I see that as a very high bar for someone to
trust it and begin to use it.

Best regards,
Gurjeet
http://Gurje.et

Joe Conway

mail@joeconway.com

over 2 years ago

In reply to: Gurjeet Singh (#5)

Re: RFC: pg_stat_logmsg

On 7/7/23 01:38, Gurjeet Singh wrote:

On Thu, Jul 6, 2023 at 12:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Jul 1, 2023 at 8:57 AM Joe Conway <mail@joeconway.com> wrote:

The basic idea is to mirror how pg_stat_statements works, except the
logged messages keyed by filename, lineno, and elevel are saved with a
aggregate count. The format string is displayed (similar to a query
jumble) for context, along with function name and sqlerrcode.

Part of the thinking is that people with fleets of postgres instances
can use this to scan for various errors that they care about.
Additionally it would be useful to look for "should not happen" errors.

I'm concerned that displaying the format string could not be helpful
in some cases. For example, when raising an ERROR while reading WAL
records, we typically write the error message stored in
record->errormsg_buf:

in ReadRecord():
if (errormsg)
ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
(errmsg_internal("%s", errormsg) /* already
translated */ ));

In this case, the error message stored in pg_stat_logmsg is just '%s'.
The filename and line number columns might help identify the actual
error but it requires users to read the source code and cannot know
the actual error message.

I believe that the number of such error messages, the ones with very
little descriptive content, is very low in Postgres code. These kinds
of messages are not prevalent, and must be used when code hits obscure
conditions, like seen in your example above. These are the kinds of
errors that Joe is referring to as "should not happen". In these
cases, even if the error message was descriptive, the user will very
likely have to dive deep into code to find out the real cause.

Agreed. As an example, after running `make installcheck`

8<-----------------
select sum(count) from pg_stat_logmsg;
sum
------
6005
(1 row)

select message,sum(count)
from pg_stat_logmsg
where length(message) < 30
and elevel in ('ERROR','FATAL','PANIC')
and message like '%\%s%' escape '\'
group by message
order by length(message);
message | sum
-------------------------------+-----
%s | 107
"%s" is a view | 9
"%s" is a table | 4
%s is a procedure | 1
invalid size: "%s" | 13
%s at or near "%s" | 131
%s at end of input | 3
...
8<-----------------

I only see three message formats there that are generic enough to be of
concern (the first and last two shown -- FWIW I did not see any more of
them as the fmt string gets longer)

So out of 6005 log events, 241 fit this concern.

Perhaps given the small number of message format strings affected, it
would make sense to special case those, but I am not sure it is worth
the effort, at least for version 1.

I feel that the unique combination of file name and line number is
useful information, even in cases where the format string not very
descriptive. So I believe the extension's behaviour in this regard is
acceptable.

In cases where the extension's output is not descriptive enough, the
user can use the filename:lineno pair to look for fully formed error
messages in the actual log files; they may have to make appropriate
changes to log_* parameters, though.

Right

If we still feel strongly about getting the actual message for these
cases, perhaps we can develop a heuristic to identify such messages,
and use either full or a prefix of the 'message' field, instead of
'message_id' field. The heuristic could be: strlen(message_id) is too
short, or that message_id is all/overwhelmingly format specifiers
(e.g. '%s: %d').

Based on the above analysis (though granted, not all inclusive), it
seems like just special casing the specific message format strings of
interest would work.

The core benefit of this extension is to make it easy for the user to
discover error messages affecting their workload. The user may be
interested in knowing the most common log messages in their server
logs, so that they can work on reducing those errors or warnings. Or
they may be interested in knowing when their workload hits
unexpected/serious error messages, even if they're infrequent. The
data exposed by pg_stat_logmsg view would serve as a starting point,
but I'm guessing it would not be sufficient for their investigation.

Yes, exactly.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Joe Conway

mail@joeconway.com

over 2 years ago

In reply to: Joe Conway (#6)

1 attachment(s)

Re: RFC: pg_stat_logmsg

On 7/9/23 14:13, Joe Conway wrote:

On 7/7/23 01:38, Gurjeet Singh wrote:

In this case, the error message stored in pg_stat_logmsg is just '%s'.
The filename and line number columns might help identify the actual
error but it requires users to read the source code and cannot know
the actual error message.

I believe that the number of such error messages, the ones with very
little descriptive content, is very low in Postgres code. These kinds
of messages are not prevalent, and must be used when code hits obscure
conditions, like seen in your example above. These are the kinds of
errors that Joe is referring to as "should not happen". In these
cases, even if the error message was descriptive, the user will very
likely have to dive deep into code to find out the real cause.

Agreed. As an example, after running `make installcheck`

8<-----------------
select sum(count) from pg_stat_logmsg;
sum
------
6005
(1 row)

select message,sum(count)
from pg_stat_logmsg
where length(message) < 30
and elevel in ('ERROR','FATAL','PANIC')
and message like '%\%s%' escape '\'
group by message
order by length(message);
message | sum
-------------------------------+-----
%s | 107
"%s" is a view | 9
"%s" is a table | 4
%s is a procedure | 1
invalid size: "%s" | 13
%s at or near "%s" | 131
%s at end of input | 3
...
8<-----------------

I only see three message formats there that are generic enough to be of
concern (the first and last two shown -- FWIW I did not see any more of
them as the fmt string gets longer)

So out of 6005 log events, 241 fit this concern.

Perhaps given the small number of message format strings affected, it
would make sense to special case those, but I am not sure it is worth
the effort, at least for version 1.

Attached is an update, this time as a patch against 17devel. It is not
quite complete, but getting close IMHO.

Changes:
--------
1. Now is integrated into contrib as a "Additional Supplied Extension"

2. Documentation added

3. I had a verbal conversation with Andres and he reminded me that the
original idea for this was to collect data across fleets of pg hosts so
we could look for how often "should never happen" errors actually
happen. As such, we need to think in terms of aggregating the info from
many hosts, potentially with many localized languages for the messages.
So I converted the "message" column back to an untranslated string, and
added a "translated_message" column which is localized.

An alternative approach might be to provide a separate function that
accepts the message string and returns the translation. Thoughts on that?

4. In the same vein, I added a pgversion column since the filename and
line number for the same log message could vary across major or even
minor releases.

Not done:
---------
1. The main thing lacking at this point is a regression test.

2. No special casing for message == "%s". I am still not convinced it is
worthwhile to do so.

Comments gratefully welcomed.

Thanks,

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

20230912-000.difftext/x-patch; charset=UTF-8; name=20230912-000.diffDownload

diff --git a/contrib/Makefile b/contrib/Makefile
index da4e231..9144cb7 100644
*** a/contrib/Makefile
--- b/contrib/Makefile
*************** SUBDIRS = \
*** 34,39 ****
--- 34,40 ----
  		pg_buffercache	\
  		pg_freespacemap \
  		pg_prewarm	\
+ 		pg_stat_logmsg \
  		pg_stat_statements \
  		pg_surgery	\
  		pg_trgm		\
diff --git a/contrib/meson.build b/contrib/meson.build
index 84d4e18..4e8ac39 100644
*** a/contrib/meson.build
--- b/contrib/meson.build
*************** subdir('pgcrypto')
*** 44,49 ****
--- 44,50 ----
  subdir('pg_freespacemap')
  subdir('pg_prewarm')
  subdir('pgrowlocks')
+ subdir('pg_stat_logmsg')
  subdir('pg_stat_statements')
  subdir('pgstattuple')
  subdir('pg_surgery')
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index ab7e38b..4dfc5dd 100644
*** a/doc/src/sgml/contrib.sgml
--- b/doc/src/sgml/contrib.sgml
*************** CREATE EXTENSION <replaceable>extension_
*** 157,162 ****
--- 157,163 ----
   &pgfreespacemap;
   &pgprewarm;
   &pgrowlocks;
+  &pgstatlogmsg;
   &pgstatstatements;
   &pgstattuple;
   &pgsurgery;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 4c63a7e..acede66 100644
*** a/doc/src/sgml/filelist.sgml
--- b/doc/src/sgml/filelist.sgml
***************
*** 145,150 ****
--- 145,151 ----
  <!ENTITY pgfreespacemap  SYSTEM "pgfreespacemap.sgml">
  <!ENTITY pgprewarm       SYSTEM "pgprewarm.sgml">
  <!ENTITY pgrowlocks      SYSTEM "pgrowlocks.sgml">
+ <!ENTITY pgstatlogmsg    SYSTEM "pgstatlogmsg.sgml">
  <!ENTITY pgstatstatements SYSTEM "pgstatstatements.sgml">
  <!ENTITY pgstattuple     SYSTEM "pgstattuple.sgml">
  <!ENTITY pgsurgery       SYSTEM "pgsurgery.sgml">
diff --git a/contrib/pg_stat_logmsg/.gitignore b/contrib/pg_stat_logmsg/.gitignore
index ...5dcb3ff .
*** a/contrib/pg_stat_logmsg/.gitignore
--- b/contrib/pg_stat_logmsg/.gitignore
***************
*** 0 ****
--- 1,4 ----
+ # Generated subdirectories
+ /log/
+ /results/
+ /tmp_check/
diff --git a/contrib/pg_stat_logmsg/LICENSE b/contrib/pg_stat_logmsg/LICENSE
index ...998f814 .
*** a/contrib/pg_stat_logmsg/LICENSE
--- b/contrib/pg_stat_logmsg/LICENSE
***************
*** 0 ****
--- 1,4 ----
+ This code is released under the PostgreSQL licence, as given at
+ http://www.postgresql.org/about/licence/
+ 
+ Copyright is novated to the PostgreSQL Global Development Group.
diff --git a/contrib/pg_stat_logmsg/Makefile b/contrib/pg_stat_logmsg/Makefile
index ...73e2f0e .
*** a/contrib/pg_stat_logmsg/Makefile
--- b/contrib/pg_stat_logmsg/Makefile
***************
*** 0 ****
--- 1,29 ----
+ # contrib/pg_stat_logmsg/Makefile
+ 
+ MODULE_big = pg_stat_logmsg
+ OBJS = \
+ 	$(WIN32RES) \
+ 	pg_stat_logmsg.o
+ 
+ EXTENSION = pg_stat_logmsg
+ DATA = pg_stat_logmsg--1.0.sql
+ PGFILEDESC = "pg_stat_logmsg - statistics of log messages"
+ 
+ LDFLAGS_SL += $(filter -lm, $(LIBS))
+ 
+ REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_logmsg/pg_stat_logmsg.conf
+ REGRESS = select
+ # Disabled because these tests require "shared_preload_libraries=pg_stat_logmsg",
+ # which typical installcheck users do not have (e.g. buildfarm clients).
+ NO_INSTALLCHECK = 1
+ 
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/pg_stat_logmsg
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
diff --git a/contrib/pg_stat_logmsg/README.md b/contrib/pg_stat_logmsg/README.md
index ...b16ef1f .
*** a/contrib/pg_stat_logmsg/README.md
--- b/contrib/pg_stat_logmsg/README.md
***************
*** 0 ****
--- 1,137 ----
+ # PostgreSQL pg_stat_logmsg Extension Module
+ 
+ ## Overview
+ This extension adds support to track statistics of log messages into PostgreSQL. It behaves in a very similar manner to pg_stat_stayements, except counts are aggregated by log message file, line number, and level. The SQL error code and log message format string are also provided for context.
+ 
+ ## Syntax
+ 
+ 
+ 
+ ## Configuration
+ * Add pg_stat_logmsg to shared_preload_libraries in postgresql.conf.
+ ```
+ shared_preload_libraries = 'pg_stat_logmsg'
+ ```
+ * The following custom parameters may be set:
+ ```
+ # Sets the maximum number of messages tracked by pg_stat_logmsg.
+ pg_stat_logmsg.max = 5000
+ 
+ ```
+ 
+ ## Installation
+ 
+ ### Requirements
+ 
+ * PostgreSQL version 11 or newer
+ 
+ ### Compile and Install
+ 
+ Clone PostgreSQL repository:
+ 
+ ```bash
+ $> git clone https://github.com/postgres/postgres.git
+ ```
+ 
+ Checkout REL_16_STABLE (for example) branch:
+ 
+ ```bash
+ $> git checkout REL_16_STABLE
+ ```
+ 
+ Make PostgreSQL:
+ 
+ ```bash
+ $> ./configure
+ $> make install -s
+ ```
+ 
+ Change to the contrib directory:
+ 
+ ```bash
+ $> cd contrib
+ ```
+ 
+ Clone ```pg_stat_logmsg``` extension:
+ 
+ ```bash
+ $> git clone https://github.com/pg-ext/pg_stat_logmsg
+ ```
+ 
+ Change to ```pg_stat_logmsg``` directory:
+ 
+ ```bash
+ $> cd pg_stat_logmsg
+ ```
+ 
+ Build ```pg_stat_logmsg```:
+ 
+ ```bash
+ $> make
+ ```
+ 
+ Install ```pg_stat_logmsg```:
+ 
+ ```bash
+ $> make install
+ ```
+ 
+ #### Using PGXS
+ 
+ If an instance of PostgreSQL is already installed, then PGXS can be utilized to build and install ```pg_stat_logmsg```.  Ensure that PostgreSQL binaries are available via the ```$PATH``` environment variable then use the following commands.
+ 
+ ```bash
+ $> make USE_PGXS=1
+ $> make USE_PGXS=1 install
+ ```
+ 
+ ### Configure
+ 
+ The following bash commands should configure your system to utilize pg_stat_logmsg. Replace all paths as appropriate. It may be prudent to visually inspect the files afterward to ensure the changes took place.
+ 
+ ###### Initialize PostgreSQL (if needed):
+ 
+ ```bash
+ $> initdb -D /path/to/data/directory
+ ```
+ 
+ ###### Create Target Database (if needed):
+ 
+ ```bash
+ $> createdb <database>
+ ```
+ 
+ ###### Install ```pg_stat_logmsg``` functions:
+ 
+ Edit postgresql.conf and add ```pg_stat_logmsg``` to the shared_preload_libraries line, and change custom settings as mentioned above.
+ 
+ Finally, restart PostgreSQL (method may vary):
+ 
+ ```
+ $> service postgresql restart
+ ```
+ Install the extension into your database:
+ 
+ ```bash
+ psql <database>
+ CREATE EXTENSION pg_stat_logmsg;
+ ```
+ 
+ ###### Test basic functionality:
+ 
+ Examples:
+ ```
+ select * from pg_stat_logmsg;
+ 
+ 
+ 
+ 
+ ```
+ 
+ ## Notes
+ 
+ ##  Licensing
+ 
+ Please see the [LICENSE](./LICENSE) file.
+ 
+ 
diff --git a/contrib/pg_stat_logmsg/meson.build b/contrib/pg_stat_logmsg/meson.build
index ...25fda63 .
*** a/contrib/pg_stat_logmsg/meson.build
--- b/contrib/pg_stat_logmsg/meson.build
***************
*** 0 ****
--- 1,41 ----
+ # Copyright (c) 2022-2023, PostgreSQL Global Development Group
+ 
+ pg_stat_logmsg_sources = files(
+   'pg_stat_logmsg.c',
+ )
+ 
+ if host_system == 'windows'
+   pg_stat_logmsg_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+     '--NAME', 'pg_stat_logmsg',
+     '--FILEDESC', 'pg_stat_logmsg - execution statistics of log messages',])
+ endif
+ 
+ pg_stat_logmsg = shared_module('pg_stat_logmsg',
+   pg_stat_logmsg_sources,
+   kwargs: contrib_mod_args + {
+     'dependencies': contrib_mod_args['dependencies'],
+   },
+ )
+ contrib_targets += pg_stat_logmsg
+ 
+ install_data(
+   'pg_stat_logmsg.control',
+   'pg_stat_logmsg--1.0.sql',
+   kwargs: contrib_data_args,
+ )
+ 
+ tests += {
+   'name': 'pg_stat_logmsg',
+   'sd': meson.current_source_dir(),
+   'bd': meson.current_build_dir(),
+   'regress': {
+     'sql': [
+       'select',
+     ],
+     'regress_args': ['--temp-config', files('pg_stat_logmsg.conf')],
+     # Disabled because these tests require
+     # "shared_preload_libraries=pg_stat_logmsg", which typical
+     # runningcheck users do not have (e.g. buildfarm clients).
+     'runningcheck': false,
+   },
+ }
diff --git a/contrib/pg_stat_logmsg/pg_stat_logmsg--1.0.sql b/contrib/pg_stat_logmsg/pg_stat_logmsg--1.0.sql
index ...4ccb657 .
*** a/contrib/pg_stat_logmsg/pg_stat_logmsg--1.0.sql
--- b/contrib/pg_stat_logmsg/pg_stat_logmsg--1.0.sql
***************
*** 0 ****
--- 1,49 ----
+ /* contrib/pg_stat_logmsg/pg_stat_logmsg--1.0.sql */
+ 
+ -- complain if script is sourced in psql, rather than via CREATE EXTENSION
+ \echo Use "CREATE EXTENSION pg_stat_logmsg" to load this file. \quit
+ 
+ -- Register functions.
+ CREATE FUNCTION pg_stat_logmsg_reset()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C PARALLEL SAFE;
+ 
+ CREATE FUNCTION pg_stat_logmsg(
+     OUT pgversion text,
+     OUT filename text,
+     OUT lineno int,
+     OUT elevel text,
+     OUT funcname text,
+     OUT sqlerrcode text,
+     OUT message text,
+     OUT translated_message text,
+     OUT count bigint
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME', 'pg_stat_logmsg'
+ LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+ 
+ -- Register a view on the function for ease of use.
+ CREATE VIEW pg_stat_logmsg AS
+   SELECT * FROM pg_stat_logmsg();
+ 
+ GRANT SELECT ON pg_stat_logmsg TO PUBLIC;
+ 
+ -- Don't want this to be available to non-superusers.
+ REVOKE ALL ON FUNCTION pg_stat_logmsg_reset() FROM PUBLIC;
+ 
+ --- Define pg_stat_logmsg_info
+ CREATE FUNCTION pg_stat_logmsg_info(
+     OUT dealloc bigint,
+     OUT stats_reset timestamp with time zone
+ )
+ RETURNS record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+ 
+ CREATE VIEW pg_stat_logmsg_info AS
+   SELECT * FROM pg_stat_logmsg_info();
+ 
+ GRANT SELECT ON pg_stat_logmsg_info TO PUBLIC;
+ 
diff --git a/contrib/pg_stat_logmsg/pg_stat_logmsg.c b/contrib/pg_stat_logmsg/pg_stat_logmsg.c
index ...4692527 .
*** a/contrib/pg_stat_logmsg/pg_stat_logmsg.c
--- b/contrib/pg_stat_logmsg/pg_stat_logmsg.c
***************
*** 0 ****
--- 1,1822 ----
+ /*-------------------------------------------------------------------------
+  *
+  * pg_stat_logmsg.c
+  *		Track log messages across a whole database cluster.
+  *
+  * Starting code cribbed from pg_stat_statements and modified to suit
+  *
+  * Copyright (c) 2023, PostgreSQL Global Development Group
+  *
+  * IDENTIFICATION
+  *	  contrib/pg_stat_logmsg/pg_stat_logmsg.c
+  *
+  *-------------------------------------------------------------------------
+  */
+ #include "postgres.h"
+ 
+ #include <math.h>
+ #include <sys/stat.h>
+ #include <unistd.h>
+ 
+ #include "access/htup_details.h"
+ #include "access/parallel.h"
+ #include "access/xact.h"
+ #include "catalog/pg_authid.h"
+ #if PG_VERSION_NUM >= 130000
+ #include "common/hashfn.h"
+ #else
+ #include "utils/hashutils.h"
+ #endif /* PG_VERSION_NUM >= 130000 */
+ #include "funcapi.h"
+ #include "mb/pg_wchar.h"
+ #include "miscadmin.h"
+ #include "pgstat.h"
+ #include "storage/fd.h"
+ #include "storage/ipc.h"
+ #include "storage/lwlock.h"
+ #include "storage/shmem.h"
+ #include "storage/spin.h"
+ #include "tcop/utility.h"
+ #include "utils/acl.h"
+ #include "utils/builtins.h"
+ #include "utils/memutils.h"
+ #include "utils/timestamp.h"
+ 
+ PG_MODULE_MAGIC;
+ 
+ /* Location of permanent stats file (valid when database is shut down) */
+ #define PGLM_DUMP_FILE	PGSTAT_STAT_PERMANENT_DIRECTORY "/pg_stat_logmsg.stat"
+ 
+ /*
+  * Location of external logmsg text file.
+  */
+ #define PGLM_TEXT_FILE	PG_STAT_TMP_DIR "/pglm_logmsg_texts.stat"
+ 
+ /* Magic number identifying the stats file format */
+ static const uint32 PGLM_FILE_HEADER = 0x20230625;
+ 
+ /* PostgreSQL major version number, changes in which invalidate all entries */
+ static const uint32 PGLM_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
+ 
+ #define ASSUMED_MEDIAN_INIT		(10.0)	/* initial assumed median usage */
+ #define ASSUMED_LENGTH_INIT		1024	/* initial assumed mean query length */
+ #define USAGE_DECREASE_FACTOR	(0.99)	/* decreased every entry_dealloc */
+ #define USAGE_DEALLOC_PERCENT	5		/* free this % of entries at once */
+ #define USAGE_INIT				(1.0)	/* including initial planning */
+ #define MAXFILENAMELEN			64
+ #define MAXFUNCNAMELEN			64
+ 
+ /*
+  * Hashtable key that defines the identity of a hashtable entry.
+  *
+  * If you add a new key to this struct, make sure to teach pglm_store() to
+  * zero the padding bytes.  Otherwise, things will break, because pglm_hash is
+  * created using HASH_BLOBS, and thus tag_hash is used to hash this.
+  */
+ typedef struct pglmHashKey
+ {
+ 	char	filename[MAXFILENAMELEN];	/* filename */
+ 	int		lineno;						/* line number */
+ 	int		elevel;						/* log level */
+ } pglmHashKey;
+ 
+ /*
+  * The actual stats counters kept within pglmEntry.
+  */
+ typedef struct Counters
+ {
+ 	int64	msgcnt;		/* # of times planned/executed */
+ 	double	usage;		/* usage factor */
+ } Counters;
+ 
+ /*
+  * Global statistics for pg_stat_logmsg
+  */
+ typedef struct pglmGlobalStats
+ {
+ 	int64		dealloc;		/* # of times entries were deallocated */
+ 	TimestampTz stats_reset;	/* timestamp with all stats reset */
+ } pglmGlobalStats;
+ 
+ /*
+  * Statistics per statement
+  *
+  * Note: in event of a failure in garbage collection of the logmsg text file,
+  * we reset logmsg_offset to zero and logmsg_len to -1.  This will be seen as
+  * an invalid state by logmsg_fetch().
+  */
+ typedef struct pglmEntry
+ {
+ 	pglmHashKey key;						/* hash key of entry - MUST BE FIRST */
+ 	Counters	counters;					/* the statistics for this logmsg */
+ 	Size		logmsg_offset;				/* logmsg text offset in external file */
+ 	int			logmsg_len;					/* # of valid bytes in logmsg string, or -1 */
+ 	int			encoding;					/* logmsg text encoding */
+ 	int			sqlerrcode;					/* encoded ERRSTATE */
+ 	char		funcname[MAXFUNCNAMELEN];	/* edata function name */
+ 	slock_t		mutex;						/* protects the counters only */
+ } pglmEntry;
+ 
+ /*
+  * Global shared state
+  */
+ typedef struct pglmSharedState
+ {
+ 	LWLock	   *lock;			/* protects hashtable search/modification */
+ 	double		cur_median_usage;	/* current median usage in hashtable */
+ 	Size		mean_logmsg_len; /* current mean entry text length */
+ 	slock_t		mutex;			/* protects following fields only: */
+ 	Size		extent;			/* current extent of query file */
+ 	int			n_writers;		/* number of active writers to query file */
+ 	int			gc_count;		/* query file garbage collection cycle count */
+ 	pglmGlobalStats stats;		/* global statistics for pglm */
+ } pglmSharedState;
+ 
+ /*---- Local variables ----*/
+ 
+ /* anti recursion */
+ static bool is_in_log_hook = false;
+ 
+ /* Saved hook values in case of unload */
+ #if PG_VERSION_NUM >= 150000
+ static shmem_request_hook_type prev_shmem_request_hook = NULL;
+ #endif /* PG_VERSION_NUM >= 150000 */
+ static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+ static emit_log_hook_type prev_emit_log_hook = NULL;
+ 
+ /* Links to shared memory state */
+ static pglmSharedState *pglm = NULL;
+ static HTAB *pglm_hash = NULL;
+ 
+ /*---- GUC variables ----*/
+ static int	pglm_max = 5000;	/* max # statements to track */
+ 
+ #define record_gc_msgfmts() \
+ 	do { \
+ 		volatile pglmSharedState *s = (volatile pglmSharedState *) pglm; \
+ 		SpinLockAcquire(&s->mutex); \
+ 		s->gc_count++; \
+ 		SpinLockRelease(&s->mutex); \
+ 	} while(0)
+ 
+ /*---- Function declarations ----*/
+ void _PG_init(void);
+ PG_FUNCTION_INFO_V1(pg_stat_logmsg_reset);
+ PG_FUNCTION_INFO_V1(pg_stat_logmsg);
+ PG_FUNCTION_INFO_V1(pg_stat_logmsg_info);
+ 
+ #if PG_VERSION_NUM >= 150000
+ static void pglm_shmem_request(void);
+ #endif /* PG_VERSION_NUM >= 150000 */
+ static void pglm_shmem_startup(void);
+ static void pglm_shmem_shutdown(int code, Datum arg);
+ static void pglm_log_hook(ErrorData *edata);
+ static void pglm_store(ErrorData *edata);
+ static void pg_stat_logmsg_internal(FunctionCallInfo fcinfo);
+ static Size pglm_memsize(void);
+ static pglmEntry *entry_alloc(pglmHashKey *key, Size logmsg_offset,
+ 							  int logmsg_len, int encoding,
+ 							  int sqlerrcode, char *funcname);
+ static void entry_dealloc(void);
+ static bool logmsg_store(ErrorData *edata, Size *logmsg_offset,
+ 						 int *logmsg_len, int *gc_count);
+ static char *logmsg_load_file(Size *buffer_size);
+ static char *logmsg_fetch(Size logmsg_offset, int logmsg_len,
+ 						 char *buffer, Size buffer_size);
+ static bool need_gc_logmsgs(void);
+ static void gc_logmsgs(void);
+ static void entry_reset(char *filename, int lineno, int elevel);
+ static void logmsg_xact_callback(XactEvent event, void *arg);
+ #if PG_VERSION_NUM < 150000
+ static void InitMaterializedSRF(FunctionCallInfo fcinfo, bits32 flags);
+ static const char *error_severity(int elevel);
+ #endif /* PG_VERSION_NUM < 150000 */
+ 
+ /*
+  * Module load callback
+  */
+ void
+ _PG_init(void)
+ {
+ 	/*
+ 	 * In order to create our shared memory area, we have to be loaded via
+ 	 * shared_preload_libraries.  If not, fall out without hooking into any of
+ 	 * the main system.  (We don't throw error here because it seems useful to
+ 	 * allow the pg_stat_logmsg functions to be created even when the
+ 	 * module isn't active.  The functions must protect themselves against
+ 	 * being called then, however.)
+ 	 */
+ 	if (!process_shared_preload_libraries_in_progress)
+ 		return;
+ 
+ 	/*
+ 	 * Define (or redefine) custom GUC variables.
+ 	 */
+ 	DefineCustomIntVariable("pg_stat_logmsg.max",
+ 							"Sets the maximum number of messages tracked by pg_stat_logmsg.",
+ 							NULL,
+ 							&pglm_max,
+ 							5000,
+ 							100,
+ 							INT_MAX / 2,
+ 							PGC_POSTMASTER,
+ 							0,
+ 							NULL,
+ 							NULL,
+ 							NULL);
+ 
+ #if PG_VERSION_NUM >= 150000
+ 	MarkGUCPrefixReserved("pg_stat_logmsg");
+ 
+ 	/*
+ 	 * Install hooks.
+ 	 */
+ 	prev_shmem_request_hook = shmem_request_hook;
+ 	shmem_request_hook = pglm_shmem_request;
+ #else
+ 	EmitWarningsOnPlaceholders("pg_stat_logmsg");
+ 
+ 	/*
+ 	 * Request additional shared resources.  (These are no-ops if we're not in
+ 	 * the postmaster process.)  We'll allocate or attach to the shared
+ 	 * resources in pgss_shmem_startup().
+ 	 */
+ 	RequestAddinShmemSpace(pglm_memsize());
+ 	RequestNamedLWLockTranche("pg_stat_logmsg", 1);
+ 
+ 	/*
+ 	 * Install hooks.
+ 	 */
+ #endif /* PG_VERSION_NUM >= 150000 */
+ 
+ 	prev_shmem_startup_hook = shmem_startup_hook;
+ 	shmem_startup_hook = pglm_shmem_startup;
+ 
+ 	prev_emit_log_hook = emit_log_hook;
+ 	emit_log_hook = pglm_log_hook;
+ 
+ 	/* Arrange to restore state in case of ERROR */
+ 	RegisterXactCallback(logmsg_xact_callback, NULL);
+ }
+ 
+ #if PG_VERSION_NUM >= 150000
+ /*
+  * shmem_request hook: request additional shared resources.  We'll allocate or
+  * attach to the shared resources in pglm_shmem_startup().
+  */
+ static void
+ pglm_shmem_request(void)
+ {
+ 	if (prev_shmem_request_hook)
+ 		prev_shmem_request_hook();
+ 
+ 	RequestAddinShmemSpace(pglm_memsize());
+ 	RequestNamedLWLockTranche("pg_stat_logmsg", 1);
+ }
+ #endif /* PG_VERSION_NUM >= 150000 */
+ 
+ /*
+  * shmem_startup hook: allocate or attach to shared memory,
+  * then load any pre-existing statistics from file.
+  * Also create and load the query-texts file, which is expected to exist
+  * (even if empty) while the module is enabled.
+  */
+ static void
+ pglm_shmem_startup(void)
+ {
+ 	bool		found;
+ 	HASHCTL		info;
+ 	FILE	   *file = NULL;
+ 	FILE	   *lmfile = NULL;
+ 	uint32		header;
+ 	int32		num;
+ 	int32		pgver;
+ 	int32		i;
+ 	int			buffer_size;
+ 	char	   *buffer = NULL;
+ 
+ 	if (prev_shmem_startup_hook)
+ 		prev_shmem_startup_hook();
+ 
+ 	/* reset in case this is a restart within the postmaster */
+ 	pglm = NULL;
+ 	pglm_hash = NULL;
+ 
+ 	/*
+ 	 * Create or attach to the shared memory state, including hash table
+ 	 */
+ 	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+ 
+ 	pglm = ShmemInitStruct("pg_stat_logmsg",
+ 						   sizeof(pglmSharedState),
+ 						   &found);
+ 
+ 	if (!found)
+ 	{
+ 		/* First time through ... */
+ 		pglm->lock = &(GetNamedLWLockTranche("pg_stat_logmsg"))->lock;
+ 		pglm->cur_median_usage = ASSUMED_MEDIAN_INIT;
+ 		pglm->mean_logmsg_len = ASSUMED_LENGTH_INIT;
+ 		SpinLockInit(&pglm->mutex);
+ 		pglm->extent = 0;
+ 		pglm->n_writers = 0;
+ 		pglm->gc_count = 0;
+ 		pglm->stats.dealloc = 0;
+ 		pglm->stats.stats_reset = GetCurrentTimestamp();
+ 	}
+ 
+ 	info.keysize = sizeof(pglmHashKey);
+ 	info.entrysize = sizeof(pglmEntry);
+ 	pglm_hash = ShmemInitHash("pg_stat_logmsg hash",
+ 							  pglm_max, pglm_max,
+ 							  &info,
+ 							  HASH_ELEM | HASH_BLOBS);
+ 
+ 	LWLockRelease(AddinShmemInitLock);
+ 
+ 	/*
+ 	 * If we're in the postmaster (or a standalone backend...), set up a shmem
+ 	 * exit hook to dump the statistics to disk.
+ 	 */
+ 	if (!IsUnderPostmaster)
+ 		on_shmem_exit(pglm_shmem_shutdown, (Datum) 0);
+ 
+ 	/*
+ 	 * Done if some other process already completed our initialization.
+ 	 */
+ 	if (found)
+ 		return;
+ 
+ 	/*
+ 	 * Note: we don't bother with locks here, because there should be no other
+ 	 * processes running when this code is reached.
+ 	 */
+ 
+ 	/* Unlink logmsg text file possibly left over from crash */
+ 	unlink(PGLM_TEXT_FILE);
+ 
+ 	/* Allocate new logmsg text temp file */
+ 	lmfile = AllocateFile(PGLM_TEXT_FILE, PG_BINARY_W);
+ 	if (lmfile == NULL)
+ 		goto write_error;
+ 
+ 	/*
+ 	 * Attempt to load old statistics from the dump file.
+ 	 */
+ 	file = AllocateFile(PGLM_DUMP_FILE, PG_BINARY_R);
+ 	if (file == NULL)
+ 	{
+ 		if (errno != ENOENT)
+ 			goto read_error;
+ 		/* No existing persisted stats file, so we're done */
+ 		FreeFile(lmfile);
+ 		return;
+ 	}
+ 
+ 	buffer_size = 2048;
+ 	buffer = (char *) palloc(buffer_size);
+ 
+ 	if (fread(&header, sizeof(uint32), 1, file) != 1 ||
+ 		fread(&pgver, sizeof(uint32), 1, file) != 1 ||
+ 		fread(&num, sizeof(int32), 1, file) != 1)
+ 		goto read_error;
+ 
+ 	if (header != PGLM_FILE_HEADER ||
+ 		pgver != PGLM_PG_MAJOR_VERSION)
+ 		goto data_error;
+ 
+ 	for (i = 0; i < num; i++)
+ 	{
+ 		pglmEntry	temp;
+ 		pglmEntry  *entry;
+ 		Size		logmsg_offset;
+ 
+ 		if (fread(&temp, sizeof(pglmEntry), 1, file) != 1)
+ 			goto read_error;
+ 
+ 		/* Encoding is the only field we can easily sanity-check */
+ 		if (!PG_VALID_BE_ENCODING(temp.encoding))
+ 			goto data_error;
+ 
+ 		/* Resize buffer as needed */
+ 		if (temp.logmsg_len >= buffer_size)
+ 		{
+ 			buffer_size = Max(buffer_size * 2, temp.logmsg_len + 1);
+ 			buffer = repalloc(buffer, buffer_size);
+ 		}
+ 
+ 		if (fread(buffer, 1, temp.logmsg_len + 1, file) != temp.logmsg_len + 1)
+ 			goto read_error;
+ 
+ 		/* Should have a trailing null, but let's make sure */
+ 		buffer[temp.logmsg_len] = '\0';
+ 
+ 		/* Store the logmsg text */
+ 		logmsg_offset = pglm->extent;
+ 		if (fwrite(buffer, 1, temp.logmsg_len + 1, lmfile) != temp.logmsg_len + 1)
+ 			goto write_error;
+ 		pglm->extent += temp.logmsg_len + 1;
+ 
+ 		/* make the hashtable entry (discards old entries if too many) */
+ 		entry = entry_alloc(&temp.key, logmsg_offset, temp.logmsg_len,
+ 							temp.encoding, temp.sqlerrcode, temp.funcname);
+ 
+ 		/* copy in the actual stats */
+ 		entry->counters = temp.counters;
+ 	}
+ 
+ 	/* Read global statistics for pg_stat_logmsg */
+ 	if (fread(&pglm->stats, sizeof(pglmGlobalStats), 1, file) != 1)
+ 		goto read_error;
+ 
+ 	pfree(buffer);
+ 	FreeFile(file);
+ 	FreeFile(lmfile);
+ 
+ 	/*
+ 	 * Remove the persisted stats file so it's not included in
+ 	 * backups/replication standbys, etc.  A new file will be written on next
+ 	 * shutdown.
+ 	 *
+ 	 * Note: it's okay if the PGLM_TEXT_FILE is included in a basebackup,
+ 	 * because we remove that file on startup; it acts inversely to
+ 	 * PGLM_DUMP_FILE, in that it is only supposed to be around when the
+ 	 * server is running, whereas PGLM_DUMP_FILE is only supposed to be around
+ 	 * when the server is not running.  Leaving the file creates no danger of
+ 	 * a newly restored database having a spurious record of execution costs,
+ 	 * which is what we're really concerned about here.
+ 	 */
+ 	unlink(PGLM_DUMP_FILE);
+ 
+ 	return;
+ 
+ read_error:
+ 	ereport(LOG,
+ 			(errcode_for_file_access(),
+ 			 errmsg("could not read file \"%s\": %m",
+ 					PGLM_DUMP_FILE)));
+ 	goto fail;
+ data_error:
+ 	ereport(LOG,
+ 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 			 errmsg("ignoring invalid data in file \"%s\"",
+ 					PGLM_DUMP_FILE)));
+ 	goto fail;
+ write_error:
+ 	ereport(LOG,
+ 			(errcode_for_file_access(),
+ 			 errmsg("could not write file \"%s\": %m",
+ 					PGLM_TEXT_FILE)));
+ fail:
+ 	if (buffer)
+ 		pfree(buffer);
+ 	if (file)
+ 		FreeFile(file);
+ 	if (lmfile)
+ 		FreeFile(lmfile);
+ 	/* If possible, throw away the bogus file; ignore any error */
+ 	unlink(PGLM_DUMP_FILE);
+ 
+ 	/*
+ 	 * Don't unlink PGLM_TEXT_FILE here; it should always be around while the
+ 	 * server is running with pg_stat_logmsg enabled
+ 	 */
+ }
+ 
+ /*
+  * shmem_shutdown hook: Dump statistics into file.
+  *
+  * Note: we don't bother with acquiring lock, because there should be no
+  * other processes running when this is called.
+  */
+ static void
+ pglm_shmem_shutdown(int code, Datum arg)
+ {
+ 	FILE	   *file;
+ 	char	   *mlbuffer = NULL;
+ 	Size		mlbuffer_size = 0;
+ 	HASH_SEQ_STATUS hash_seq;
+ 	int32		num_entries;
+ 	pglmEntry  *entry;
+ 
+ 	/* Don't try to dump during a crash. */
+ 	if (code)
+ 		return;
+ 
+ 	/* Safety check ... shouldn't get here unless shmem is set up. */
+ 	if (!pglm || !pglm_hash)
+ 		return;
+ 
+ 	file = AllocateFile(PGLM_DUMP_FILE ".tmp", PG_BINARY_W);
+ 	if (file == NULL)
+ 		goto error;
+ 
+ 	if (fwrite(&PGLM_FILE_HEADER, sizeof(uint32), 1, file) != 1)
+ 		goto error;
+ 	if (fwrite(&PGLM_PG_MAJOR_VERSION, sizeof(uint32), 1, file) != 1)
+ 		goto error;
+ 	num_entries = hash_get_num_entries(pglm_hash);
+ 	if (fwrite(&num_entries, sizeof(int32), 1, file) != 1)
+ 		goto error;
+ 
+ 	mlbuffer = logmsg_load_file(&mlbuffer_size);
+ 	if (mlbuffer == NULL)
+ 		goto error;
+ 
+ 	/*
+ 	 * When serializing to disk, we store logmsg texts immediately after their
+ 	 * entry data.  Any orphaned logmsg texts are thereby excluded.
+ 	 */
+ 	hash_seq_init(&hash_seq, pglm_hash);
+ 	while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ 	{
+ 		int			len = entry->logmsg_len;
+ 		char	   *qstr = logmsg_fetch(entry->logmsg_offset, len,
+ 									   mlbuffer, mlbuffer_size);
+ 
+ 		if (qstr == NULL)
+ 			continue;			/* Ignore any entries with bogus texts */
+ 
+ 		if (fwrite(entry, sizeof(pglmEntry), 1, file) != 1 ||
+ 			fwrite(qstr, 1, len + 1, file) != len + 1)
+ 		{
+ 			/* note: we assume hash_seq_term won't change errno */
+ 			hash_seq_term(&hash_seq);
+ 			goto error;
+ 		}
+ 	}
+ 
+ 	/* Dump global statistics for pg_stat_logmsg */
+ 	if (fwrite(&pglm->stats, sizeof(pglmGlobalStats), 1, file) != 1)
+ 		goto error;
+ 
+ 	free(mlbuffer);
+ 	mlbuffer = NULL;
+ 
+ 	if (FreeFile(file))
+ 	{
+ 		file = NULL;
+ 		goto error;
+ 	}
+ 
+ 	/*
+ 	 * Rename file into place, so we atomically replace any old one.
+ 	 */
+ 	(void) durable_rename(PGLM_DUMP_FILE ".tmp", PGLM_DUMP_FILE, LOG);
+ 
+ 	/* Unlink logmsg-texts file; it's not needed while shutdown */
+ 	unlink(PGLM_TEXT_FILE);
+ 
+ 	return;
+ 
+ error:
+ 	ereport(LOG,
+ 			(errcode_for_file_access(),
+ 			 errmsg("could not write file \"%s\": %m",
+ 					PGLM_DUMP_FILE ".tmp")));
+ 	free(mlbuffer);
+ 	if (file)
+ 		FreeFile(file);
+ 	unlink(PGLM_DUMP_FILE ".tmp");
+ 	unlink(PGLM_TEXT_FILE);
+ }
+ 
+ /*
+  * log hook: store results if needed
+  */
+ static void
+ pglm_log_hook(ErrorData *edata)
+ {
+ 	if (!is_in_log_hook)
+ 	{
+ 		is_in_log_hook = true;
+ 		pglm_store(edata);
+ 		is_in_log_hook = false;
+ 	}
+ 
+ 	if (prev_emit_log_hook)
+ 		prev_emit_log_hook(edata);
+ }
+ 
+ /*
+  * Store statistics for a log message.
+  */
+ static void
+ pglm_store(ErrorData *edata)
+ {
+ 	pglmHashKey key;
+ 	pglmEntry  *entry;
+ 	int			encoding = GetDatabaseEncoding();
+ 	size_t		filename_len;
+ 
+ 	Assert(edata != NULL);
+ 
+ 	/* Safety check... */
+ 	if (!pglm || !pglm_hash)
+ 		return;
+ 
+ 	/* Set up key for hashtable search */
+ 
+ 	/* memset() is required when pglmHashKey is without padding only */
+ 	memset(&key, 0, sizeof(pglmHashKey));
+ 
+ 	/* make persistent copies of strings */
+ 	if (edata->filename)
+ 	{
+ 		filename_len = strlen(edata->filename);
+ 		memcpy(key.filename, edata->filename,
+ 			   filename_len < MAXFILENAMELEN ?
+ 							  filename_len : MAXFILENAMELEN );
+ 	}
+ 
+ 	key.lineno = edata->lineno;
+ 	key.elevel = edata->elevel;
+ 
+ 	/* Lookup the hash table entry with shared lock. */
+ 	LWLockAcquire(pglm->lock, LW_SHARED);
+ 
+ 	entry = (pglmEntry *) hash_search(pglm_hash, &key, HASH_FIND, NULL);
+ 
+ 	/* Create new entry, if not present */
+ 	if (!entry)
+ 	{
+ 		Size		logmsg_offset;
+ 		int			logmsg_len; 
+ 		int			gc_count;
+ 		bool		stored;
+ 		bool		do_gc;
+ 
+ 		/* Append new logmsg text to file with only shared lock held */
+ 		stored = logmsg_store(edata, &logmsg_offset, &logmsg_len, &gc_count);
+ 
+ 		/*
+ 		 * Determine whether we need to garbage collect external logmsg texts
+ 		 * while the shared lock is still held.  This micro-optimization
+ 		 * avoids taking the time to decide this while holding exclusive lock.
+ 		 */
+ 		do_gc = need_gc_logmsgs();
+ 
+ 		/* Need exclusive lock to make a new hashtable entry - promote */
+ 		LWLockRelease(pglm->lock);
+ 		LWLockAcquire(pglm->lock, LW_EXCLUSIVE);
+ 
+ 		/*
+ 		 * A garbage collection may have occurred while we weren't holding the
+ 		 * lock.  In the unlikely event that this happens, the logmsg text we
+ 		 * stored above will have been garbage collected, so write it again.
+ 		 * This should be infrequent enough that doing it while holding
+ 		 * exclusive lock isn't a performance problem.
+ 		 */
+ 		if (!stored || pglm->gc_count != gc_count)
+ 			stored = logmsg_store(edata, &logmsg_offset, &logmsg_len, NULL);
+ 
+ 		/* If we failed to write to the text file, give up */
+ 		if (!stored)
+ 			goto done;
+ 
+ 		/* OK to create a new hashtable entry */
+ 		entry = entry_alloc(&key, logmsg_offset, logmsg_len,
+ 							encoding, edata->sqlerrcode,
+ 							(char *) edata->funcname);
+ 
+ 		/* If needed, perform garbage collection while exclusive lock held */
+ 		if (do_gc)
+ 			gc_logmsgs();
+ 	}
+ 
+ 	/*
+ 	 * Grab the spinlock while updating the counters (see comment about
+ 	 * locking rules at the head of the file)
+ 	 */
+ 	{
+ 		volatile pglmEntry *e = (volatile pglmEntry *) entry;
+ 
+ 		SpinLockAcquire(&e->mutex);
+ 		e->counters.msgcnt += 1;
+ 		SpinLockRelease(&e->mutex);
+ 	}
+ 
+ done:
+ 	LWLockRelease(pglm->lock);
+ }
+ 
+ /*
+  * Reset statement statistics.
+  */
+ Datum
+ pg_stat_logmsg_reset(PG_FUNCTION_ARGS)
+ {
+ 	entry_reset(NULL, 0, 0);
+ 
+ 	PG_RETURN_VOID();
+ }
+ 
+ /* Number of output arguments (columns) */
+ #define PG_STAT_LOGMSG_COLS			9
+ 
+ Datum
+ pg_stat_logmsg(PG_FUNCTION_ARGS)
+ {
+ 	pg_stat_logmsg_internal(fcinfo);
+ 
+ 	return (Datum) 0;
+ }
+ 
+ /* Common code for all versions of pg_stat_statements() */
+ static void
+ pg_stat_logmsg_internal(FunctionCallInfo fcinfo)
+ {
+ 	ReturnSetInfo  *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ 	Oid				userid = GetUserId();
+ 	char		   *mlbuffer = NULL;
+ 	Size			mlbuffer_size = 0;
+ 	Size			extent = 0;
+ 	int				gc_count = 0;
+ 	HASH_SEQ_STATUS hash_seq;
+ 	pglmEntry	   *entry;
+ 	int				n_writers;
+ 
+ 	/*
+ 	 * Superusers or roles with the privileges of pg_read_all_stats members
+ 	 * are allowed
+ 	 */
+ #if PG_VERSION_NUM >= 140000
+ #define PGLM_STATSROLE		ROLE_PG_READ_ALL_STATS
+ #else
+ #define PGLM_STATSROLE		DEFAULT_ROLE_READ_ALL_STATS
+ #endif /* PG_VERSION_NUM >= 140000 */
+ 	if (!has_privs_of_role(userid, PGLM_STATSROLE))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ 				 errmsg("must be member of %s role",
+ 				 GetUserNameFromId(PGLM_STATSROLE, false))));
+ 
+ 	/* hash table must exist already */
+ 	if (!pglm || !pglm_hash)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ 				 errmsg("pg_stat_logmsg must be loaded via shared_preload_libraries")));
+ 
+ 	InitMaterializedSRF(fcinfo, 0);
+ 
+ 	/* Check we have the expected number of output arguments. */
+ 	if (rsinfo->setDesc->natts != PG_STAT_LOGMSG_COLS)
+ 		elog(ERROR, "incorrect number of output arguments");
+ 
+ 	/*
+ 	 * We'd like to load the logmsg text file (if needed) while not holding any
+ 	 * lock on pglm->lock.  In the worst case we'll have to do this again
+ 	 * after we have the lock, but it's unlikely enough to make this a win
+ 	 * despite occasional duplicated work.  We need to reload if anybody
+ 	 * writes to the file (either a retail logmsg_store(), or a garbage
+ 	 * collection) between this point and where we've gotten shared lock.  If
+ 	 * a logmsg_store is actually in progress when we look, we might as well
+ 	 * skip the speculative load entirely.
+ 	 */
+ 
+ 	/* Take the mutex so we can examine variables */
+ 	{
+ 		volatile pglmSharedState *s = (volatile pglmSharedState *) pglm;
+ 
+ 		SpinLockAcquire(&s->mutex);
+ 		extent = s->extent;
+ 		n_writers = s->n_writers;
+ 		gc_count = s->gc_count;
+ 		SpinLockRelease(&s->mutex);
+ 	}
+ 
+ 	/* No point in loading file now if there are active writers */
+ 	if (n_writers == 0)
+ 		mlbuffer = logmsg_load_file(&mlbuffer_size);
+ 
+ 	/*
+ 	 * Get shared lock, load or reload the logmsg text file if we must, and
+ 	 * iterate over the hashtable entries.
+ 	 *
+ 	 * With a large hash table, we might be holding the lock rather longer
+ 	 * than one could wish.  However, this only blocks creation of new hash
+ 	 * table entries, and the larger the hash table the less likely that is to
+ 	 * be needed.  So we can hope this is okay.  Perhaps someday we'll decide
+ 	 * we need to partition the hash table to limit the time spent holding any
+ 	 * one lock.
+ 	 */
+ 	LWLockAcquire(pglm->lock, LW_SHARED);
+ 
+ 	/*
+ 	 * Here it is safe to examine extent and gc_count without taking the
+ 	 * mutex.  Note that although other processes might change
+ 	 * pglm->extent just after we look at it, the strings they then write
+ 	 * into the file cannot yet be referenced in the hashtable, so we
+ 	 * don't care whether we see them or not.
+ 	 *
+ 	 * If logmsg_load_file fails, we just press on; we'll return NULL for
+ 	 * every logmsg text.
+ 	 */
+ 	if (mlbuffer == NULL ||
+ 		pglm->extent != extent ||
+ 		pglm->gc_count != gc_count)
+ 	{
+ 		free(mlbuffer);
+ 		mlbuffer = logmsg_load_file(&mlbuffer_size);
+ 	}
+ 
+ 	hash_seq_init(&hash_seq, pglm_hash);
+ 	while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ 	{
+ 		Datum		values[PG_STAT_LOGMSG_COLS];
+ 		bool		nulls[PG_STAT_LOGMSG_COLS];
+ 		int			i = 0;
+ 		Counters	tmp;
+ 		char	   *mstr;
+ 
+ 		memset(values, 0, sizeof(values));
+ 		memset(nulls, 0, sizeof(nulls));
+ 
+ 		values[i++] = CStringGetTextDatum(PG_VERSION);
+ 		values[i++] = CStringGetTextDatum(entry->key.filename);
+ 		values[i++] = Int32GetDatum(entry->key.lineno);
+ 		values[i++] = CStringGetTextDatum(error_severity(entry->key.elevel));
+ 
+ 		if (entry->funcname[0])
+ 			values[i++] = CStringGetTextDatum(entry->funcname);
+ 		else
+ 			nulls[i++] = true;
+ 
+ 		values[i++] = CStringGetTextDatum(unpack_sql_state(entry->sqlerrcode));
+ 
+ 		mstr = logmsg_fetch(entry->logmsg_offset,
+ 							entry->logmsg_len,
+ 							mlbuffer,
+ 							mlbuffer_size);
+ 
+ 		if (mstr)
+ 		{
+ 			char	   *enc;
+ 
+ 			enc = pg_any_to_server(mstr,
+ 								   entry->logmsg_len,
+ 								   entry->encoding);
+ 			/* first the original format string */
+ 			values[i++] = CStringGetTextDatum(enc);
+ 			/* then the localized string */
+ 			values[i++] = CStringGetTextDatum(_(enc));
+ 
+ 			if (enc != mstr)
+ 				pfree(enc);
+ 		}
+ 		else
+ 		{
+ 			/* Just return a null if we fail to find the text */
+ 			nulls[i++] = true;
+ 			nulls[i++] = true;
+ 		}
+ 
+ 		/* copy counters to a local variable to keep locking time short */
+ 		{
+ 			volatile pglmEntry *e = (volatile pglmEntry *) entry;
+ 
+ 			SpinLockAcquire(&e->mutex);
+ 			tmp = e->counters;
+ 			SpinLockRelease(&e->mutex);
+ 		}
+ 
+ 		values[i++] = Int64GetDatumFast(tmp.msgcnt);
+ 
+ 		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ 	}
+ 
+ 	LWLockRelease(pglm->lock);
+ 
+ 	free(mlbuffer);
+ }
+ 
+ /* Number of output arguments (columns) for pg_stat_logmsg_info */
+ #define PG_STAT_LOGMSG_INFO_COLS	2
+ 
+ /*
+  * Return statistics of pg_stat_logmsg.
+  */
+ Datum
+ pg_stat_logmsg_info(PG_FUNCTION_ARGS)
+ {
+ 	pglmGlobalStats stats;
+ 	TupleDesc	tupdesc;
+ 	Datum		values[PG_STAT_LOGMSG_INFO_COLS] = {0};
+ 	bool		nulls[PG_STAT_LOGMSG_INFO_COLS] = {0};
+ 
+ 	if (!pglm || !pglm_hash)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ 				 errmsg("pg_stat_logmsg must be loaded via shared_preload_libraries")));
+ 
+ 	/* Build a tuple descriptor for our result type */
+ 	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ 		elog(ERROR, "return type must be a row type");
+ 
+ 	/* Read global statistics for pg_stat_logmsg */
+ 	{
+ 		volatile pglmSharedState *s = (volatile pglmSharedState *) pglm;
+ 
+ 		SpinLockAcquire(&s->mutex);
+ 		stats = s->stats;
+ 		SpinLockRelease(&s->mutex);
+ 	}
+ 
+ 	values[0] = Int64GetDatum(stats.dealloc);
+ 	values[1] = TimestampTzGetDatum(stats.stats_reset);
+ 
+ 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
+ }
+ 
+ /*
+  * Estimate shared memory space needed.
+  */
+ static Size
+ pglm_memsize(void)
+ {
+ 	Size		size;
+ 
+ 	size = MAXALIGN(sizeof(pglmSharedState));
+ 	size = add_size(size, hash_estimate_size(pglm_max, sizeof(pglmEntry)));
+ 
+ 	return size;
+ }
+ 
+ /*
+  * Allocate a new hashtable entry.
+  * caller must hold an exclusive lock on pglm->lock
+  *
+  * Note: despite needing exclusive lock, it's not an error for the target
+  * entry to already exist.  This is because pglm_store releases and
+  * reacquires lock after failing to find a match; so someone else could
+  * have made the entry while we waited to get exclusive lock.
+  */
+ static pglmEntry *
+ entry_alloc(pglmHashKey *key,
+ 			Size logmsg_offset,
+ 			int logmsg_len,
+ 			int encoding,
+ 			int sqlerrcode,
+ 			char *efuncname)
+ {
+ 	pglmEntry  *entry;
+ 	bool		found;
+ 
+ 	/* Make space if needed */
+ 	while (hash_get_num_entries(pglm_hash) >= pglm_max)
+ 		entry_dealloc();
+ 
+ 	/* Find or create an entry with desired hash code */
+ 	entry = (pglmEntry *) hash_search(pglm_hash, key, HASH_ENTER, &found);
+ 
+ 	if (!found)
+ 	{
+ 		/* New entry, initialize it */
+ 
+ 		/* reset the statistics */
+ 		memset(&entry->counters, 0, sizeof(Counters));
+ 		/* set the appropriate initial usage count */
+ 		entry->counters.usage = USAGE_INIT;
+ 		/* re-initialize the mutex each time ... we assume no one using it */
+ 		SpinLockInit(&entry->mutex);
+ 		/* ... and don't forget the logmsg text metadata */
+ 		Assert(logmsg_len >= 0);
+ 		entry->logmsg_offset = logmsg_offset;
+ 		entry->logmsg_len = logmsg_len;
+ 		entry->encoding = encoding;
+ 		entry->sqlerrcode = sqlerrcode;
+ 
+ 		if (efuncname)
+ 		{
+ 			size_t		funcname_len = strlen(efuncname);
+ 
+ 			memcpy(entry->funcname, efuncname,
+ 				   funcname_len < MAXFUNCNAMELEN ?
+ 								  funcname_len : MAXFUNCNAMELEN );
+ 		}
+ 	}
+ 
+ 	return entry;
+ }
+ 
+ /*
+  * qsort comparator for sorting into increasing usage order
+  */
+ static int
+ entry_cmp(const void *lhs, const void *rhs)
+ {
+ 	double		l_usage = (*(pglmEntry *const *) lhs)->counters.usage;
+ 	double		r_usage = (*(pglmEntry *const *) rhs)->counters.usage;
+ 
+ 	if (l_usage < r_usage)
+ 		return -1;
+ 	else if (l_usage > r_usage)
+ 		return +1;
+ 	else
+ 		return 0;
+ }
+ 
+ /*
+  * Deallocate least-used entries.
+  *
+  * Caller must hold an exclusive lock on pglm->lock.
+  */
+ static void
+ entry_dealloc(void)
+ {
+ 	HASH_SEQ_STATUS hash_seq;
+ 	pglmEntry **entries;
+ 	pglmEntry  *entry;
+ 	int			nvictims;
+ 	int			i;
+ 	Size		tottextlen;
+ 	int			nvalidtexts;
+ 
+ 	/*
+ 	 * Sort entries by usage and deallocate USAGE_DEALLOC_PERCENT of them.
+ 	 * While we're scanning the table, apply the decay factor to the usage
+ 	 * values, and update the mean query length.
+ 	 *
+ 	 * Note that the mean query length is almost immediately obsolete, since
+ 	 * we compute it before not after discarding the least-used entries.
+ 	 * Hopefully, that doesn't affect the mean too much; it doesn't seem worth
+ 	 * making two passes to get a more current result.  Likewise, the new
+ 	 * cur_median_usage includes the entries we're about to zap.
+ 	 */
+ 
+ 	entries = palloc(hash_get_num_entries(pglm_hash) * sizeof(pglmEntry *));
+ 
+ 	i = 0;
+ 	tottextlen = 0;
+ 	nvalidtexts = 0;
+ 
+ 	hash_seq_init(&hash_seq, pglm_hash);
+ 	while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ 	{
+ 		entries[i++] = entry;
+ 		entry->counters.usage *= USAGE_DECREASE_FACTOR;
+ 		/* In the mean length computation, ignore dropped texts. */
+ 		if (entry->logmsg_len >= 0)
+ 		{
+ 			tottextlen += entry->logmsg_len + 1;
+ 			nvalidtexts++;
+ 		}
+ 	}
+ 
+ 	/* Sort into increasing order by usage */
+ 	qsort(entries, i, sizeof(pglmEntry *), entry_cmp);
+ 
+ 	/* Record the (approximate) median usage */
+ 	if (i > 0)
+ 		pglm->cur_median_usage = entries[i / 2]->counters.usage;
+ 	/* Record the mean query length */
+ 	if (nvalidtexts > 0)
+ 		pglm->mean_logmsg_len = tottextlen / nvalidtexts;
+ 	else
+ 		pglm->mean_logmsg_len = ASSUMED_LENGTH_INIT;
+ 
+ 	/* Now zap an appropriate fraction of lowest-usage entries */
+ 	nvictims = Max(10, i * USAGE_DEALLOC_PERCENT / 100);
+ 	nvictims = Min(nvictims, i);
+ 
+ 	for (i = 0; i < nvictims; i++)
+ 		hash_search(pglm_hash, &entries[i]->key, HASH_REMOVE, NULL);
+ 
+ 	pfree(entries);
+ 
+ 	/* Increment the number of times entries are deallocated */
+ 	{
+ 		volatile pglmSharedState *s = (volatile pglmSharedState *) pglm;
+ 
+ 		SpinLockAcquire(&s->mutex);
+ 		s->stats.dealloc += 1;
+ 		SpinLockRelease(&s->mutex);
+ 	}
+ }
+ 
+ /*
+  * Given a logmsg string (not necessarily null-terminated), allocate a new
+  * entry in the external logmsg text file and store the string there.
+  *
+  * If successful, returns true, and stores the new entry's offset in the file
+  * into *logmsg_offset.  Also, if gc_count isn't NULL, *gc_count is set to the
+  * number of garbage collections that have occurred so far.
+  *
+  * On failure, returns false.
+  *
+  * At least a shared lock on pglm->lock must be held by the caller, so as
+  * to prevent a concurrent garbage collection.  Share-lock-holding callers
+  * should pass a gc_count pointer to obtain the number of garbage collections,
+  * so that they can recheck the count after obtaining exclusive lock to
+  * detect whether a garbage collection occurred (and removed this entry).
+  */
+ static bool
+ logmsg_store(ErrorData *edata, Size *logmsg_offset,
+ 			 int *logmsg_len, int *gc_count)
+ {
+ 	Size		off;
+ 	int			fd;
+ 
+ 	if (edata->message_id)
+ 		*logmsg_len = strlen(edata->message_id);
+ 
+ 	/*
+ 	 * We use a spinlock to protect extent/n_writers/gc_count, so that
+ 	 * multiple processes may execute this function concurrently.
+ 	 */
+ 	{
+ 		volatile pglmSharedState *s = (volatile pglmSharedState *) pglm;
+ 
+ 		SpinLockAcquire(&s->mutex);
+ 		off = s->extent;
+ 		s->extent += *logmsg_len + 1;
+ 		s->n_writers++;
+ 		if (gc_count)
+ 			*gc_count = s->gc_count;
+ 		SpinLockRelease(&s->mutex);
+ 	}
+ 
+ 	*logmsg_offset = off;
+ 
+ 	/*
+ 	 * Don't allow the file to grow larger than what logmsg_load_file can
+ 	 * (theoretically) handle.  This has been seen to be reachable on 32-bit
+ 	 * platforms.
+ 	 */
+ 	if (unlikely(*logmsg_len >= MaxAllocHugeSize - off))
+ 	{
+ 		errno = EFBIG;			/* not quite right, but it'll do */
+ 		fd = -1;
+ 		goto error;
+ 	}
+ 
+ 	/* Now write the data into the successfully-reserved part of the file */
+ 	fd = OpenTransientFile(PGLM_TEXT_FILE, O_RDWR | O_CREAT | PG_BINARY);
+ 	if (fd < 0)
+ 		goto error;
+ 
+ #if PG_VERSION_NUM >= 120000
+ 	if (pg_pwrite(fd, edata->message_id, *logmsg_len, off) != *logmsg_len)
+ 		goto error;
+ 	if (pg_pwrite(fd, "\0", 1, off + *logmsg_len) != 1)
+ 		goto error;
+ #else
+ 	if (lseek(fd, off, SEEK_SET) != off)
+ 		goto error;
+ 
+ 	if (write(fd, edata->message_id, *logmsg_len) != *logmsg_len)
+ 		goto error;
+ 	if (write(fd, "\0", 1) != 1)
+ 		goto error;
+ #endif /* PG_VERSION_NUM >= 120000 */
+ 
+ 	CloseTransientFile(fd);
+ 
+ 	/* Mark our write complete */
+ 	{
+ 		volatile pglmSharedState *s = (volatile pglmSharedState *) pglm;
+ 
+ 		SpinLockAcquire(&s->mutex);
+ 		s->n_writers--;
+ 		SpinLockRelease(&s->mutex);
+ 	}
+ 
+ 	return true;
+ 
+ error:
+ 	ereport(LOG,
+ 			(errcode_for_file_access(),
+ 			 errmsg("could not write file \"%s\": %m",
+ 					PGLM_TEXT_FILE)));
+ 
+ 	if (fd >= 0)
+ 		CloseTransientFile(fd);
+ 
+ 	/* Mark our write complete */
+ 	{
+ 		volatile pglmSharedState *s = (volatile pglmSharedState *) pglm;
+ 
+ 		SpinLockAcquire(&s->mutex);
+ 		s->n_writers--;
+ 		SpinLockRelease(&s->mutex);
+ 	}
+ 
+ 	return false;
+ }
+ 
+ /*
+  * Read the external logmsg text file into a malloc'd buffer.
+  *
+  * Returns NULL (without throwing an error) if unable to read, eg
+  * file not there or insufficient memory.
+  *
+  * On success, the buffer size is also returned into *buffer_size.
+  *
+  * This can be called without any lock on pglm->lock, but in that case
+  * the caller is responsible for verifying that the result is sane.
+  */
+ static char *
+ logmsg_load_file(Size *buffer_size)
+ {
+ 	char	   *buf;
+ 	int			fd;
+ 	struct stat stat;
+ 	Size		nread;
+ 
+ 	fd = OpenTransientFile(PGLM_TEXT_FILE, O_RDONLY | PG_BINARY);
+ 	if (fd < 0)
+ 	{
+ 		if (errno != ENOENT)
+ 			ereport(LOG,
+ 					(errcode_for_file_access(),
+ 					 errmsg("could not read file \"%s\": %m",
+ 							PGLM_TEXT_FILE)));
+ 		return NULL;
+ 	}
+ 
+ 	/* Get file length */
+ 	if (fstat(fd, &stat))
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not stat file \"%s\": %m",
+ 						PGLM_TEXT_FILE)));
+ 		CloseTransientFile(fd);
+ 		return NULL;
+ 	}
+ 
+ 	/* Allocate buffer; beware that off_t might be wider than size_t */
+ 	if (stat.st_size <= MaxAllocHugeSize)
+ 		buf = (char *) malloc(stat.st_size);
+ 	else
+ 		buf = NULL;
+ 
+ 	if (buf == NULL)
+ 	{
+ 		ereport(LOG,
+ 				(errcode(ERRCODE_OUT_OF_MEMORY),
+ 				 errmsg("out of memory"),
+ 				 errdetail("Could not allocate enough memory to read file \"%s\".",
+ 						   PGLM_TEXT_FILE)));
+ 		CloseTransientFile(fd);
+ 		return NULL;
+ 	}
+ 
+ 	/*
+ 	 * OK, slurp in the file.  Windows fails if we try to read more than
+ 	 * INT_MAX bytes at once, and other platforms might not like that either,
+ 	 * so read a very large file in 1GB segments.
+ 	 */
+ 	nread = 0;
+ 	while (nread < stat.st_size)
+ 	{
+ 		int			toread = Min(1024 * 1024 * 1024, stat.st_size - nread);
+ 
+ 		/*
+ 		 * If we get a short read and errno doesn't get set, the reason is
+ 		 * probably that garbage collection truncated the file since we did
+ 		 * the fstat(), so we don't log a complaint --- but we don't return
+ 		 * the data, either, since it's most likely corrupt due to concurrent
+ 		 * writes from garbage collection.
+ 		 */
+ 		errno = 0;
+ 		if (read(fd, buf + nread, toread) != toread)
+ 		{
+ 			if (errno)
+ 				ereport(LOG,
+ 						(errcode_for_file_access(),
+ 						 errmsg("could not read file \"%s\": %m",
+ 								PGLM_TEXT_FILE)));
+ 			free(buf);
+ 			CloseTransientFile(fd);
+ 			return NULL;
+ 		}
+ 		nread += toread;
+ 	}
+ 
+ 	if (CloseTransientFile(fd) != 0)
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not close file \"%s\": %m", PGLM_TEXT_FILE)));
+ 
+ 	*buffer_size = nread;
+ 	return buf;
+ }
+ 
+ /*
+  * Locate a logmsg text in the file image previously read by logmsg_load_file().
+  *
+  * We validate the given offset/length, and return NULL if bogus.  Otherwise,
+  * the result points to a null-terminated string within the buffer.
+  */
+ static char *
+ logmsg_fetch(Size logmsg_offset, int logmsg_len,
+ 			char *buffer, Size buffer_size)
+ {
+ 	/* File read failed? */
+ 	if (buffer == NULL)
+ 		return NULL;
+ 	/* Bogus offset/length? */
+ 	if (logmsg_len < 0 ||
+ 		logmsg_offset + logmsg_len >= buffer_size)
+ 		return NULL;
+ 	/* As a further sanity check, make sure there's a trailing null */
+ 	if (buffer[logmsg_offset + logmsg_len] != '\0')
+ 		return NULL;
+ 	/* Looks OK */
+ 	return buffer + logmsg_offset;
+ }
+ 
+ /*
+  * Do we need to garbage-collect the external logmsg text file?
+  *
+  * Caller should hold at least a shared lock on pglm->lock.
+  */
+ static bool
+ need_gc_logmsgs(void)
+ {
+ 	Size		extent;
+ 
+ 	/* Read shared extent pointer */
+ 	{
+ 		volatile pglmSharedState *s = (volatile pglmSharedState *) pglm;
+ 
+ 		SpinLockAcquire(&s->mutex);
+ 		extent = s->extent;
+ 		SpinLockRelease(&s->mutex);
+ 	}
+ 
+ 	/*
+ 	 * Don't proceed if file does not exceed 512 bytes per possible entry.
+ 	 *
+ 	 * Here and in the next test, 32-bit machines have overflow hazards if
+ 	 * pglm_max and/or mean_logmsg_len are large.  Force the multiplications
+ 	 * and comparisons to be done in uint64 arithmetic to forestall trouble.
+ 	 */
+ 	if ((uint64) extent < (uint64) 512 * pglm_max)
+ 		return false;
+ 
+ 	/*
+ 	 * Don't proceed if file is less than about 50% bloat.  Nothing can or
+ 	 * should be done in the event of unusually large logmsg texts accounting
+ 	 * for file's large size.  We go to the trouble of maintaining the mean
+ 	 * query length in order to prevent garbage collection from thrashing
+ 	 * uselessly.
+ 	 */
+ 	if ((uint64) extent < (uint64) pglm->mean_logmsg_len * pglm_max * 2)
+ 		return false;
+ 
+ 	return true;
+ }
+ 
+ /*
+  * Garbage-collect orphaned logmsg texts in external file.
+  *
+  * This won't be called often in the typical case, since it's likely that
+  * there won't be too much churn, and besides, a similar compaction process
+  * occurs when serializing to disk at shutdown or as part of resetting.
+  * Despite this, it seems prudent to plan for the edge case where the file
+  * becomes unreasonably large, with no other method of compaction likely to
+  * occur in the foreseeable future.
+  *
+  * The caller must hold an exclusive lock on pglm->lock.
+  *
+  * At the first sign of trouble we unlink the logmsg text file to get a clean
+  * slate (although existing statistics are retained), rather than risk
+  * thrashing by allowing the same problem case to recur indefinitely.
+  */
+ static void
+ gc_logmsgs(void)
+ {
+ 	char	   *mlbuffer;
+ 	Size		mlbuffer_size;
+ 	FILE	   *lmfile = NULL;
+ 	HASH_SEQ_STATUS hash_seq;
+ 	pglmEntry  *entry;
+ 	Size		extent;
+ 	int			nentries;
+ 
+ 	/*
+ 	 * When called from pglm_store, some other session might have proceeded
+ 	 * with garbage collection in the no-lock-held interim of lock strength
+ 	 * escalation.  Check once more that this is actually necessary.
+ 	 */
+ 	if (!need_gc_logmsgs())
+ 		return;
+ 
+ 	/*
+ 	 * Load the old texts file.  If we fail (out of memory, for instance),
+ 	 * invalidate logmsg texts.  Hopefully this is rare.  It might seem better
+ 	 * to leave things alone on an OOM failure, but the problem is that the
+ 	 * file is only going to get bigger; hoping for a future non-OOM result is
+ 	 * risky and can easily lead to complete denial of service.
+ 	 */
+ 	mlbuffer = logmsg_load_file(&mlbuffer_size);
+ 	if (mlbuffer == NULL)
+ 		goto gc_fail;
+ 
+ 	/*
+ 	 * We overwrite the logmsg texts file in place, so as to reduce the risk of
+ 	 * an out-of-disk-space failure.  Since the file is guaranteed not to get
+ 	 * larger, this should always work on traditional filesystems; though we
+ 	 * could still lose on copy-on-write filesystems.
+ 	 */
+ 	lmfile = AllocateFile(PGLM_TEXT_FILE, PG_BINARY_W);
+ 	if (lmfile == NULL)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not write file \"%s\": %m",
+ 						PGLM_TEXT_FILE)));
+ 		goto gc_fail;
+ 	}
+ 
+ 	extent = 0;
+ 	nentries = 0;
+ 
+ 	hash_seq_init(&hash_seq, pglm_hash);
+ 	while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ 	{
+ 		int			logmsg_len = entry->logmsg_len;
+ 		char	   *msg = logmsg_fetch(entry->logmsg_offset,
+ 									   logmsg_len,
+ 									   mlbuffer,
+ 									   mlbuffer_size);
+ 
+ 		if (msg == NULL)
+ 		{
+ 			/* Trouble ... drop the text */
+ 			entry->logmsg_offset = 0;
+ 			entry->logmsg_len = -1;
+ 			/* entry will not be counted in mean logmsg length computation */
+ 			continue;
+ 		}
+ 
+ 		if (fwrite(msg, 1, logmsg_len + 1, lmfile) != logmsg_len + 1)
+ 		{
+ 			ereport(LOG,
+ 					(errcode_for_file_access(),
+ 					 errmsg("could not write file \"%s\": %m",
+ 							PGLM_TEXT_FILE)));
+ 			hash_seq_term(&hash_seq);
+ 			goto gc_fail;
+ 		}
+ 
+ 		entry->logmsg_offset = extent;
+ 		extent += logmsg_len + 1;
+ 		nentries++;
+ 	}
+ 
+ 	/*
+ 	 * Truncate away any now-unused space.  If this fails for some odd reason,
+ 	 * we log it, but there's no need to fail.
+ 	 */
+ 	if (ftruncate(fileno(lmfile), extent) != 0)
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not truncate file \"%s\": %m",
+ 						PGLM_TEXT_FILE)));
+ 
+ 	if (FreeFile(lmfile))
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not write file \"%s\": %m",
+ 						PGLM_TEXT_FILE)));
+ 		lmfile = NULL;
+ 		goto gc_fail;
+ 	}
+ 
+ 	elog(DEBUG1, "pglm gc of queries file shrunk size from %zu to %zu",
+ 		 pglm->extent, extent);
+ 
+ 	/* Reset the shared extent pointer */
+ 	pglm->extent = extent;
+ 
+ 	/*
+ 	 * Also update the mean logmsg length, to be sure that need_gc_logmsgs()
+ 	 * won't still think we have a problem.
+ 	 */
+ 	if (nentries > 0)
+ 		pglm->mean_logmsg_len = extent / nentries;
+ 	else
+ 		pglm->mean_logmsg_len = ASSUMED_LENGTH_INIT;
+ 
+ 	free(mlbuffer);
+ 
+ 	/*
+ 	 * OK, count a garbage collection cycle.  (Note: even though we have
+ 	 * exclusive lock on pglm->lock, we must take pglm->mutex for this, since
+ 	 * other processes may examine gc_count while holding only the mutex.
+ 	 * Also, we have to advance the count *after* we've rewritten the file,
+ 	 * else other processes might not realize they read a stale file.)
+ 	 */
+ 	record_gc_msgfmts();
+ 
+ 	return;
+ 
+ gc_fail:
+ 	/* clean up resources */
+ 	if (lmfile)
+ 		FreeFile(lmfile);
+ 	free(mlbuffer);
+ 
+ 	/*
+ 	 * Since the contents of the external file are now uncertain, mark all
+ 	 * hashtable entries as having invalid texts.
+ 	 */
+ 	hash_seq_init(&hash_seq, pglm_hash);
+ 	while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ 	{
+ 		entry->logmsg_offset = 0;
+ 		entry->logmsg_len = -1;
+ 	}
+ 
+ 	/*
+ 	 * Destroy the logmsg text file and create a new, empty one
+ 	 */
+ 	(void) unlink(PGLM_TEXT_FILE);
+ 	lmfile = AllocateFile(PGLM_TEXT_FILE, PG_BINARY_W);
+ 	if (lmfile == NULL)
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not recreate file \"%s\": %m",
+ 						PGLM_TEXT_FILE)));
+ 	else
+ 		FreeFile(lmfile);
+ 
+ 	/* Reset the shared extent pointer */
+ 	pglm->extent = 0;
+ 
+ 	/* Reset mean_logmsg_len to match the new state */
+ 	pglm->mean_logmsg_len = ASSUMED_LENGTH_INIT;
+ 
+ 	/*
+ 	 * Bump the GC count even though we failed.
+ 	 *
+ 	 * This is needed to make concurrent readers of file without any lock on
+ 	 * pglm->lock notice existence of new version of file.  Once readers
+ 	 * subsequently observe a change in GC count with pglm->lock held, that
+ 	 * forces a safe reopen of file.  Writers also require that we bump here,
+ 	 * of course.  (As required by locking protocol, readers and writers don't
+ 	 * trust earlier file contents until gc_count is found unchanged after
+ 	 * pglm->lock acquired in shared or exclusive mode respectively.)
+ 	 */
+ 	record_gc_msgfmts();
+ }
+ 
+ /*
+  * Release entries corresponding to parameters passed.
+  */
+ static void
+ entry_reset(char *filename, int lineno, int elevel)
+ {
+ 	HASH_SEQ_STATUS hash_seq;
+ 	pglmEntry  *entry;
+ 	FILE	   *lmfile;
+ 	long		num_entries;
+ 	long		num_remove = 0;
+ 	pglmHashKey key;
+ 
+ 	if (!pglm || !pglm_hash)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ 				 errmsg("pg_stat_logmsg must be loaded via shared_preload_libraries")));
+ 
+ 	LWLockAcquire(pglm->lock, LW_EXCLUSIVE);
+ 	num_entries = hash_get_num_entries(pglm_hash);
+ 
+ 	if (filename != NULL && lineno != 0 && elevel != 0)
+ 	{
+ 		/* If all the parameters are available, use the fast path. */
+ 		memset(&key, 0, sizeof(pglmHashKey));
+ 
+ 		if (filename)
+ 		{
+ 			size_t		filename_len = strlen(filename);
+ 
+ 			memcpy(key.filename, filename,
+ 				   filename_len < MAXFILENAMELEN ?
+ 								  filename_len : MAXFILENAMELEN );
+ 		}
+ 		else
+ 			memset(key.filename, '\0', MAXFILENAMELEN);
+ 
+ 		key.lineno = lineno;
+ 		key.elevel = elevel;
+ 
+ 		/* Remove the key if exists */
+ 		entry = (pglmEntry *) hash_search(pglm_hash, &key, HASH_REMOVE, NULL);
+ 		if (entry)				/* found */
+ 			num_remove++;
+ 	}
+ 	else if (filename != NULL || lineno != 0 || elevel != 0)
+ 	{
+ 		/* Remove entries corresponding to valid parameters. */
+ 		hash_seq_init(&hash_seq, pglm_hash);
+ 		while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ 		{
+ 			if ((!filename || entry->key.filename == filename) &&
+ 				(!lineno || entry->key.lineno == lineno) &&
+ 				(!elevel || entry->key.elevel == elevel))
+ 			{
+ 				hash_search(pglm_hash, &entry->key, HASH_REMOVE, NULL);
+ 				num_remove++;
+ 			}
+ 		}
+ 	}
+ 	else
+ 	{
+ 		/* Remove all entries. */
+ 		hash_seq_init(&hash_seq, pglm_hash);
+ 		while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ 		{
+ 			hash_search(pglm_hash, &entry->key, HASH_REMOVE, NULL);
+ 			num_remove++;
+ 		}
+ 	}
+ 
+ 	/* All entries are removed? */
+ 	if (num_entries != num_remove)
+ 		goto release_lock;
+ 
+ 	/*
+ 	 * Reset global statistics for pg_stat_logmsg since all entries are
+ 	 * removed.
+ 	 */
+ 	{
+ 		volatile pglmSharedState *s = (volatile pglmSharedState *) pglm;
+ 		TimestampTz stats_reset = GetCurrentTimestamp();
+ 
+ 		SpinLockAcquire(&s->mutex);
+ 		s->stats.dealloc = 0;
+ 		s->stats.stats_reset = stats_reset;
+ 		SpinLockRelease(&s->mutex);
+ 	}
+ 
+ 	/*
+ 	 * Write new empty logmsg file, perhaps even creating a new one to recover
+ 	 * if the file was missing.
+ 	 */
+ 	lmfile = AllocateFile(PGLM_TEXT_FILE, PG_BINARY_W);
+ 	if (lmfile == NULL)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not create file \"%s\": %m",
+ 						PGLM_TEXT_FILE)));
+ 		goto done;
+ 	}
+ 
+ 	/* If ftruncate fails, log it, but it's not a fatal problem */
+ 	if (ftruncate(fileno(lmfile), 0) != 0)
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not truncate file \"%s\": %m",
+ 						PGLM_TEXT_FILE)));
+ 
+ 	FreeFile(lmfile);
+ 
+ done:
+ 	pglm->extent = 0;
+ 	/* This counts as a logmsg text garbage collection for our purposes */
+ 	record_gc_msgfmts();
+ 
+ release_lock:
+ 	LWLockRelease(pglm->lock);
+ }
+ 
+ /*
+  * logmsg_xact_callback --- cleanup at main-transaction end.
+  */
+ static void
+ logmsg_xact_callback(XactEvent event, void *arg)
+ {
+ 	is_in_log_hook = false;
+ }
+ 
+ #if PG_VERSION_NUM < 150000
+ /* flag bits for InitMaterializedSRF() */
+ #define MAT_SRF_USE_EXPECTED_DESC	0x01	/* use expectedDesc as tupdesc. */
+ #define MAT_SRF_BLESS				0x02	/* "Bless" a tuple descriptor with
+ 											 * BlessTupleDesc(). */
+ 
+ /*
+  * InitMaterializedSRF
+  *
+  * Helper function to build the state of a set-returning function used
+  * in the context of a single call with materialize mode.  This code
+  * includes sanity checks on ReturnSetInfo, creates the Tuplestore and
+  * the TupleDesc used with the function and stores them into the
+  * function's ReturnSetInfo.
+  *
+  * "flags" can be set to MAT_SRF_USE_EXPECTED_DESC, to use the tuple
+  * descriptor coming from expectedDesc, which is the tuple descriptor
+  * expected by the caller.  MAT_SRF_BLESS can be set to complete the
+  * information associated to the tuple descriptor, which is necessary
+  * in some cases where the tuple descriptor comes from a transient
+  * RECORD datatype.
+  */
+ static void
+ InitMaterializedSRF(FunctionCallInfo fcinfo, bits32 flags)
+ {
+ 	bool		random_access;
+ 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ 	Tuplestorestate *tupstore;
+ 	MemoryContext old_context,
+ 				per_query_ctx;
+ 	TupleDesc	stored_tupdesc;
+ 
+ 	/* check to see if caller supports returning a tuplestore */
+ 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("set-valued function called in context that cannot accept a set")));
+ 	if (!(rsinfo->allowedModes & SFRM_Materialize) ||
+ 		((flags & MAT_SRF_USE_EXPECTED_DESC) != 0 && rsinfo->expectedDesc == NULL))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("materialize mode required, but it is not allowed in this context")));
+ 
+ 	/*
+ 	 * Store the tuplestore and the tuple descriptor in ReturnSetInfo.  This
+ 	 * must be done in the per-query memory context.
+ 	 */
+ 	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ 	old_context = MemoryContextSwitchTo(per_query_ctx);
+ 
+ 	/* build a tuple descriptor for our result type */
+ 	if ((flags & MAT_SRF_USE_EXPECTED_DESC) != 0)
+ 		stored_tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
+ 	else
+ 	{
+ 		if (get_call_result_type(fcinfo, NULL, &stored_tupdesc) != TYPEFUNC_COMPOSITE)
+ 			elog(ERROR, "return type must be a row type");
+ 	}
+ 
+ 	/* If requested, bless the tuple descriptor */
+ 	if ((flags & MAT_SRF_BLESS) != 0)
+ 		BlessTupleDesc(stored_tupdesc);
+ 
+ 	random_access = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ 
+ 	tupstore = tuplestore_begin_heap(random_access, false, work_mem);
+ 	rsinfo->returnMode = SFRM_Materialize;
+ 	rsinfo->setResult = tupstore;
+ 	rsinfo->setDesc = stored_tupdesc;
+ 	MemoryContextSwitchTo(old_context);
+ }
+ 
+ /*
+  * error_severity --- get string representing elevel
+  *
+  * The string is not localized here, but we mark the strings for translation
+  * so that callers can invoke _() on the result.
+  */
+ static const char *
+ error_severity(int elevel)
+ {
+ 	const char *prefix;
+ 
+ 	switch (elevel)
+ 	{
+ 		case DEBUG1:
+ 		case DEBUG2:
+ 		case DEBUG3:
+ 		case DEBUG4:
+ 		case DEBUG5:
+ 			prefix = gettext_noop("DEBUG");
+ 			break;
+ 		case LOG:
+ 		case LOG_SERVER_ONLY:
+ 			prefix = gettext_noop("LOG");
+ 			break;
+ 		case INFO:
+ 			prefix = gettext_noop("INFO");
+ 			break;
+ 		case NOTICE:
+ 			prefix = gettext_noop("NOTICE");
+ 			break;
+ 		case WARNING:
+ #if PG_VERSION_NUM >= 140000
+ 		case WARNING_CLIENT_ONLY:
+ #endif /* PG_VERSION_NUM >= 140000 */
+ 			prefix = gettext_noop("WARNING");
+ 			break;
+ 		case ERROR:
+ 			prefix = gettext_noop("ERROR");
+ 			break;
+ 		case FATAL:
+ 			prefix = gettext_noop("FATAL");
+ 			break;
+ 		case PANIC:
+ 			prefix = gettext_noop("PANIC");
+ 			break;
+ 		default:
+ 			prefix = "???";
+ 			break;
+ 	}
+ 
+ 	return prefix;
+ }
+ #endif /* PG_VERSION_NUM < 150000 */
diff --git a/contrib/pg_stat_logmsg/pg_stat_logmsg.conf b/contrib/pg_stat_logmsg/pg_stat_logmsg.conf
index ...ab6336c .
*** a/contrib/pg_stat_logmsg/pg_stat_logmsg.conf
--- b/contrib/pg_stat_logmsg/pg_stat_logmsg.conf
***************
*** 0 ****
--- 1 ----
+ shared_preload_libraries = 'pg_stat_logmsg'
diff --git a/contrib/pg_stat_logmsg/pg_stat_logmsg.control b/contrib/pg_stat_logmsg/pg_stat_logmsg.control
index ...800b550 .
*** a/contrib/pg_stat_logmsg/pg_stat_logmsg.control
--- b/contrib/pg_stat_logmsg/pg_stat_logmsg.control
***************
*** 0 ****
--- 1,5 ----
+ # pg_stat_logmsg extension
+ comment = 'track statistics of log messages'
+ default_version = '1.0'
+ module_pathname = '$libdir/pg_stat_logmsg'
+ relocatable = true
diff --git a/doc/src/sgml/pgstatlogmsg.sgml b/doc/src/sgml/pgstatlogmsg.sgml
index ...3f52beb .
*** a/doc/src/sgml/pgstatlogmsg.sgml
--- b/doc/src/sgml/pgstatlogmsg.sgml
***************
*** 0 ****
--- 1,595 ----
+ <!-- doc/src/sgml/pgstatlogmsg.sgml -->
+ 
+ <sect1 id="pgstatlogmsg" xreflabel="pg_stat_logmsg">
+  <title>pg_stat_logmsg &mdash; track counts of normalized server log messages</title>
+ 
+  <indexterm zone="pgstatlogmsg">
+   <primary>pg_stat_logmsg</primary>
+  </indexterm>
+ 
+  <para>
+   The <filename>pg_stat_logmsg</filename> module provides a means for
+   tracking the aggregated counts of normalized server log message strings.
+  </para>
+ 
+  <para>
+   The module must be loaded by adding <literal>pg_stat_logmsg</literal> to
+   <xref linkend="guc-shared-preload-libraries"/> in
+   <filename>postgresql.conf</filename>, because it requires additional shared memory.
+   This means that a server restart is needed to add or remove the module.
+  </para>
+ 
+  <para>
+    When <filename>pg_stat_logmsg</filename> is active, it tracks
+    log message counts across all databases of the server. To access and manipulate
+    these statistics, the module provides views
+    <structname>pg_stat_logmsg</structname> and
+    <structname>pg_stat_logmsg_info</structname>,
+    and the utility functions <function>pg_stat_logmsg_reset</function> and
+    <function>pg_stat_logmsg</function>.  These are not available globally but
+    can be enabled for a specific database with
+    <command>CREATE EXTENSION pg_stat_logmsg</command>.
+  </para>
+ 
+  <sect2 id="pgstatlogmsg-pg-stat-logmsg">
+   <title>The <structname>pg_stat_logmsg</structname> View</title>
+ 
+   <para>
+    The counts gathered by the module are made available via a
+    view named <structname>pg_stat_logmsg</structname>.  This view
+    contains one row for each distinct combination of source code
+    filename, source code line number, and log severity level
+    (up to the maximum number of distinct messages that the module
+    can track). The columns of the view are shown in
+    <xref linkend="pgstatlogmsg-columns"/>.
+   </para>
+ 
+   <table id="pgstatlogmsg-columns">
+    <title><structname>pg_stat_logmsg</structname> Columns</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        Column Type
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+ 
+     <tbody>
+ 
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>pgversion</structfield> <type>text</type>
+       </para>
+       <para>
+        <productname>PostgreSQL</productname> version number. Useful for
+        correlating source code filename and line number with the actual
+        entry source code.
+       </para></entry>
+      </row>
+ 
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>filename</structfield> <type>text</type>
+       </para>
+       <para>
+        Source code filename where the log message originates.
+       </para></entry>
+      </row>
+ 
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>lineno</structfield> <type>int</type>
+       </para>
+       <para>
+        Source code line number where the log message originates.
+       </para></entry>
+      </row>
+ 
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>elevel</structfield> <type>text</type>
+       </para>
+       <para>
+        Log message severity level as called at runtime.
+       </para></entry>
+      </row>
+ 
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>funcname</structfield> <type>text</type>
+       </para>
+       <para>
+        Source code function name where the log message originates.
+       </para></entry>
+      </row>
+ 
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>sqlerrcode</structfield> <type>text</type>
+       </para>
+       <para>
+        SQL error code of the log message. See <xref linkend="errcodes-appendix"/>.
+       </para></entry>
+      </row>
+ 
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>message</structfield> <type>text</type>
+       </para>
+       <para>
+        Normalized log message. Basically the log message text with the
+        replaceable format parameters rather than the actual runtime
+        data.
+       </para></entry>
+      </row>
+ 
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>translated_message</structfield> <type>text</type>
+       </para>
+       <para>
+        Same as <structfield>message</structfield>, except localized.
+       </para></entry>
+      </row>
+ 
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>count</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of times this log message has occurred.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+ 
+   <para>
+    For security reasons, only superusers and roles with privileges of the
+    <literal>pg_read_all_stats</literal> role are allowed to view this data.
+   </para>
+ 
+   <para>
+    Log messages are combined into a single
+    <structname>pg_stat_logmsg</structname> entry whenever they
+    originate from identical source code <structfield>filename</structfield>
+    and line number (<structfield>lineno</structfield>),
+    and they have the same runtime error severity level
+    (<structfield>elevel</structfield>).
+   </para>
+ 
+   <para>
+    As a rule of thumb, <structfield>filename</structfield>,
+    <structfield>lineno</structfield>, and <structfield>elevel</structfield>
+    values can be assumed to be stable and comparable only so long as
+    the underlying server version stays exactly the same.
+   </para>
+ 
+   <para>
+    The representative log messages are kept in an external disk file, and do
+    not consume shared memory. Therefore, even very lengthy log messages can
+    be stored successfully.  However, if many long log messages are
+    accumulated, the external file might grow unmanageably large.  As a
+    recovery method if that happens, <filename>pg_stat_logmsg</filename> may
+    choose to discard the log messages, whereupon all existing entries in
+    the <structname>pg_stat_logmsg</structname> view will show
+    null <structfield>message</structfield> fields, though the counts
+    associated with each row are preserved. If this happens, consider
+    reducing <varname>pg_stat_logmsg.max</varname> to prevent
+    recurrences.
+   </para>
+  </sect2>
+ 
+  <sect2 id="pgstatlogmsg-pg-stat-logmsg-info">
+   <title>The <structname>pg_stat_logmsg_info</structname> View</title>
+ 
+   <indexterm>
+    <primary>pg_stat_logmsg_info</primary>
+   </indexterm>
+ 
+   <para>
+    The statistics of the <filename>pg_stat_logmsg</filename> module
+    itself are tracked and made available via a view named
+    <structname>pg_stat_logmsg_info</structname>.  This view contains
+    only a single row.  The columns of the view are shown in
+    <xref linkend="pgstatlogmsginfo-columns"/>.
+   </para>
+ 
+   <table id="pgstatlogmsginfo-columns">
+    <title><structname>pg_stat_logmsg_info</structname> Columns</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        Column Type
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+ 
+     <tbody>
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>dealloc</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Total number of times <structname>pg_stat_logmsg</structname>
+        entries about the least-executed statements were deallocated
+        because more distinct statements than
+        <varname>pg_stat_logmsg.max</varname> were observed
+       </para></entry>
+      </row>
+      <row>
+       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+       </para>
+       <para>
+        Time at which all statistics in the
+        <structname>pg_stat_logmsg</structname> view were last reset.
+       </para></entry>
+      </row>
+ 
+     </tbody>
+    </tgroup>
+   </table>
+  </sect2>
+ 
+  <sect2 id="pgstatlogmsg-funcs">
+   <title>Functions</title>
+ 
+   <variablelist>
+    <varlistentry>
+     <term>
+      <function>pg_stat_logmsg_reset() returns void</function>
+      <indexterm>
+       <primary>pg_stat_logmsg_reset</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+       <function>pg_stat_logmsg_reset</function> discards all statistics
+       gathered so far by <filename>pg_stat_logmsg</filename>. It will
+       also reset the statistics in the
+       <structname>pg_stat_logmsg_info</structname> view.
+       By default, this function can only be executed by superusers.
+       Access may be granted to others using <command>GRANT</command>.
+      </para>
+     </listitem>
+    </varlistentry>
+ 
+    <varlistentry>
+     <term>
+      <function>pg_stat_logmsg() returns setof record</function>
+      <indexterm>
+       <primary>pg_stat_logmsg</primary>
+       <secondary>function</secondary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+       The <structname>pg_stat_logmsg</structname> view is defined in
+       terms of a function also named <function>pg_stat_logmsg</function>.
+       It is possible for clients to call
+       the <function>pg_stat_logmsg</function> function directly.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </sect2>
+ 
+  <sect2 id="pgstatlogmsg-config-params">
+   <title>Configuration Parameters</title>
+ 
+   <variablelist>
+    <varlistentry>
+     <term>
+      <varname>pg_stat_logmsg.max</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>pg_stat_logmsg.max</varname> configuration parameter</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+       <varname>pg_stat_logmsg.max</varname> is the maximum number of
+       log messages tracked by the module (i.e., the maximum number of rows
+       in the <structname>pg_stat_logmsg</structname> view). If more distinct
+       log messages than that are observed, information about the least-recorded
+       log message is discarded.  The number of times such information was
+       discarded can be seen in the
+       <structname>pg_stat_logmsg_info</structname> view.
+       The default value is 5000.
+       This parameter can only be set at server start.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <para>
+    The module requires additional shared memory proportional to
+    <varname>pg_stat_logmsg.max</varname>.  Note that this
+    memory is consumed whenever the module is loaded.
+   </para>
+ 
+   <para>
+    These parameters must be set in <filename>postgresql.conf</filename>.
+    Typical usage might be:
+ 
+ <programlisting>
+ # postgresql.conf
+ shared_preload_libraries = 'pg_stat_logmsg'
+ 
+ pg_stat_logmsg.max = 10000
+ </programlisting>
+   </para>
+  </sect2>
+ 
+  <sect2 id="pgstatlogmsg-sample-output">
+   <title>Sample Output</title>
+ 
+ <screen>
+ $ make installcheck
+ $ psql logmsg
+ logmsg=# \x
+ logmsg=# -- most common messages
+ logmsg=# SELECT * FROM pg_stat_logmsg
+          ORDER BY count DESC LIMIT 5;
+ -[ RECORD 1 ]------+----------------------------------------
+ pgversion          | 17devel
+ filename           | aclchk.c
+ lineno             | 2807
+ elevel             | ERROR
+ funcname           | aclcheck_error
+ sqlerrcode         | 42501
+ message            | permission denied for schema %s
+ translated_message | permission denied for schema %s
+ count              | 282
+ -[ RECORD 2 ]------+----------------------------------------
+ pgversion          | 17devel
+ filename           | mcxt.c
+ lineno             | 913
+ elevel             | LOG
+ funcname           | MemoryContextStatsPrint
+ sqlerrcode         | 00000
+ message            | level: %d; %s: %s%s
+ translated_message | level: %d; %s: %s%s
+ count              | 255
+ -[ RECORD 3 ]------+----------------------------------------
+ pgversion          | 17devel
+ filename           | pl_exec.c
+ lineno             | 3909
+ elevel             | WARNING
+ funcname           | exec_stmt_raise
+ sqlerrcode         | 01000
+ message            | %s
+ translated_message | %s
+ count              | 241
+ -[ RECORD 4 ]------+----------------------------------------
+ pgversion          | 17devel
+ filename           | scan.l
+ lineno             | 1241
+ elevel             | ERROR
+ funcname           | scanner_yyerror
+ sqlerrcode         | 42601
+ message            | %s at or near "%s"
+ translated_message | %s at or near "%s"
+ count              | 133
+ -[ RECORD 5 ]------+----------------------------------------
+ pgversion          | 17devel
+ filename           | datetime.c
+ lineno             | 4002
+ elevel             | ERROR
+ funcname           | DateTimeParseError
+ sqlerrcode         | 22015
+ message            | interval field value out of range: "%s"
+ translated_message | interval field value out of range: "%s"
+ count              | 120
+ 
+ logmsg=# -- internal errors
+ logmsg=# SELECT * FROM pg_stat_logmsg
+          WHERE sqlerrcode = 'XX000'
+          ORDER BY count DESC LIMIT 5;
+ -[ RECORD 1 ]------+-------------------------------------------------------
+ pgversion          | 17devel
+ filename           | user.c
+ lineno             | 2130
+ elevel             | ERROR
+ funcname           | check_role_membership_authorization
+ sqlerrcode         | XX000
+ message            | role "%s" cannot have explicit members
+ translated_message | role "%s" cannot have explicit members
+ count              | 2
+ -[ RECORD 2 ]------+-------------------------------------------------------
+ pgversion          | 17devel
+ filename           | tablecmds.c
+ lineno             | 11216
+ elevel             | ERROR
+ funcname           | ATExecAlterConstraint
+ sqlerrcode         | XX000
+ message            | cannot alter constraint "%s" on relation "%s"
+ translated_message | cannot alter constraint "%s" on relation "%s"
+ count              | 2
+ -[ RECORD 3 ]------+-------------------------------------------------------
+ pgversion          | 17devel
+ filename           | tid.c
+ lineno             | 358
+ elevel             | ERROR
+ funcname           | currtid_for_view
+ sqlerrcode         | XX000
+ message            | currtid cannot handle views with no CTID
+ translated_message | currtid cannot handle views with no CTID
+ count              | 1
+ -[ RECORD 4 ]------+-------------------------------------------------------
+ pgversion          | 17devel
+ filename           | tid.c
+ lineno             | 315
+ elevel             | ERROR
+ funcname           | currtid_internal
+ sqlerrcode         | XX000
+ message            | cannot look at latest visible tid for relation "%s.%s"
+ translated_message | cannot look at latest visible tid for relation "%s.%s"
+ count              | 1
+ -[ RECORD 5 ]------+-------------------------------------------------------
+ pgversion          | 17devel
+ filename           | pg_locale.c
+ lineno             | 2904
+ elevel             | ERROR
+ funcname           | icu_validate_locale
+ sqlerrcode         | XX000
+ message            | ICU locale "%s" has unknown language "%s"
+ translated_message | ICU locale "%s" has unknown language "%s"
+ count              | 1
+ 
+ logmsg=# SELECT sqlerrcode, sum(count) AS total_count
+          FROM pg_stat_logmsg
+          GROUP BY sqlerrcode ORDER BY 1;
+  sqlerrcode | total_count 
+ ------------+-------------
+  00000      |         295
+  01000      |         318
+  01006      |           1
+  01007      |           5
+  08006      |           2
+  08P01      |           1
+  09000      |           5
+  0A000      |         400
+  0LP01      |           9
+  0Z002      |           2
+  21000      |           9
+  22000      |          11
+  22001      |          17
+  22003      |         184
+  22004      |          25
+  22005      |           1
+  22007      |          91
+  22008      |          95
+  22009      |           2
+  2200C      |           1
+  2200F      |           1
+  2200H      |           2
+  2200L      |           5
+  2200M      |           2
+  2200N      |          16
+  2200S      |           2
+  2200T      |           1
+  22011      |           6
+  22012      |          54
+  22013      |           6
+  22014      |           1
+  22015      |         120
+  22016      |           1
+  2201B      |           7
+  2201E      |          21
+  2201F      |          11
+  2201G      |          14
+  2201W      |           1
+  22021      |           1
+  22023      |         332
+  22025      |           3
+  22026      |           5
+  2202E      |          33
+  2202G      |           1
+  2202H      |           5
+  22030      |          14
+  22031      |           8
+  22033      |           7
+  22036      |          13
+  22038      |           9
+  22039      |           3
+  2203A      |           8
+  2203B      |           2
+  2203C      |           4
+  22P02      |         357
+  22P04      |          13
+  22P05      |           8
+  22P06      |          10
+  23502      |          69
+  23503      |         132
+  23505      |          70
+  23514      |         220
+  23P01      |          13
+  24000      |           9
+  25001      |          21
+  25006      |          16
+  25P01      |          33
+  25P02      |          14
+  27000      |           3
+  2BP01      |          86
+  2F003      |           1
+  2F005      |           1
+  34000      |           5
+  39P03      |           1
+  3B001      |           1
+  3D000      |           4
+  3F000      |          16
+  42501      |         525
+  42601      |         387
+  42602      |           8
+  42611      |           7
+  42701      |          18
+  42702      |           7
+  42703      |         140
+  42704      |         154
+  42710      |          67
+  42712      |          12
+  42723      |           4
+  42725      |          14
+  42803      |          38
+  42804      |         171
+  42809      |         135
+  42830      |           2
+  42846      |           4
+  42883      |         138
+  428C9      |          21
+  42P01      |         128
+  42P02      |           1
+  42P05      |           1
+  42P06      |           1
+  42P07      |          20
+  42P09      |           2
+  42P10      |          85
+  42P13      |          79
+  42P15      |          15
+  42P16      |         119
+  42P17      |          98
+  42P18      |           1
+  42P19      |          19
+  42P20      |          14
+  42P21      |          17
+  42P22      |           8
+  44000      |          31
+  54000      |           3
+  54001      |           5
+  54011      |           3
+  55000      |         143
+  55006      |           8
+  55P04      |           5
+  58P01      |           5
+  F0000      |           3
+  P0001      |          42
+  P0002      |           5
+  P0003      |          11
+  P0004      |           4
+  XX000      |          14
+ (126 rows)
+ 
+ </screen>
+  </sect2>
+ 
+  <sect2 id="pgstatlogmsg-authors">
+   <title>Authors</title>
+ 
+   <para>
+    Joe Conway <email>mail@joeconway.com</email>.
+   </para>
+  </sect2>
+ 
+ </sect1>

Tomas Vondra

tomas.vondra@enterprisedb.com

over 1 year ago

In reply to: Joe Conway (#7)

Re: RFC: pg_stat_logmsg

Hi,

I noticed this patch hasn't moved since September 2023, so I wonder
what's the main blocker / what is needed to move this?

As for the feature, I've never done a fleet-wide analysis, so if this is
one of the main use cases, I'm not really sure I can judge if this is a
good tool for that. It seems like it might be a convenient way to do
that, but does that require we add the module to contrib?

As for the code, I wonder if the instability of line numbers could be a
problem - these can change (a little bit) between minor releases, so
after an upgrade we'll read the dump file with line numbers from the old
release, and then start adding entries with new line numbers. Do we need
to handle this in some way?

This might be partially solved by eviction of entries from the old
release - we apply decay, so after a while their usage will be 0. But
what if there's no pressure for space, we'll not actually evict them.
And it'll be confusing to have a mix of old/new line numbers.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Michael Paquier

michael@paquier.xyz

over 1 year ago

In reply to: Tomas Vondra (#8)

Re: RFC: pg_stat_logmsg

On Wed, Jul 17, 2024 at 12:14:36AM +0200, Tomas Vondra wrote:

I noticed this patch hasn't moved since September 2023, so I wonder
what's the main blocker / what is needed to move this?

+ /* Location of permanent stats file (valid when database is shut down) */
+ #define PGLM_DUMP_FILE	PGSTAT_STAT_PERMANENT_DIRECTORY
"/pg_stat_logmsg.stat

Perhaps this does not count as a valid reason, but does it really make
sense to implement things this way, knowing that this could be changed
to rely on a potential pluggable pgstats? I mean this one I've
proposed:
/messages/by-id/Zmqm9j5EO0I4W8dx@paquier.xyz

One potential implementation is to stick that to be fixed-numbered,
because there is a maximum cap to the number of entries proposed by
the patch, while keeping the whole in memory.

+ logmsg_store(ErrorData *edata, Size *logmsg_offset,
+ 			 int *logmsg_len, int *gc_count)

The patch shares a lot of perks with pg_stat_statements that don't
scale well. I'm wondering if it is a good idea to duplicate these
properties in a second, different, module, like the external file can
could be written out on a periodic basis depending on the workload.
I am not saying that the other thread is a magic solution for
everything (not looked yet at how this would plug with the cap in
entries that pg_stat_statements wants), just one option on the table.

As for the code, I wonder if the instability of line numbers could be a
problem - these can change (a little bit) between minor releases, so
after an upgrade we'll read the dump file with line numbers from the old
release, and then start adding entries with new line numbers. Do we need
to handle this in some way?

Indeed. Perhaps a PostgreSQL version number assigned to each entry to
know from which binary an entry comes from? This would cost a couple
of extra bytes for each entry still that would be the best information
possible to match that with a correct code tree. If it comes to that,
even getting down to a commit SHA1 could be useful to provide the
lowest level of granularity. Another thing would be to give up on the
line number, stick to the uniqueness in the stats entries with the
errcode and the file name, but that won't help for things like
tablecmds.c.

This might be partially solved by eviction of entries from the old
release - we apply decay, so after a while their usage will be 0. But
what if there's no pressure for space, we'll not actually evict them.
And it'll be confusing to have a mix of old/new line numbers.

Once we know that these stats not going to be relevant anymore as of a
minor upgrade flow, resetting them could be the move that makes the
most sense, leaving the reset to the provider doing the upgrades,
while taking a snapshot of the past data before the reset? I find the
whole problem tricky to define, TBH.
--
Michael

#10

Joe Conway

mail@joeconway.com

over 1 year ago

In reply to: Tomas Vondra (#8)

Re: RFC: pg_stat_logmsg

On 7/16/24 18:14, Tomas Vondra wrote:

I noticed this patch hasn't moved since September 2023, so I wonder
what's the main blocker / what is needed to move this?

Mainly me finding time I'm afraid.

As for the feature, I've never done a fleet-wide analysis, so if this is
one of the main use cases, I'm not really sure I can judge if this is a
good tool for that. It seems like it might be a convenient way to do
that, but does that require we add the module to contrib?

I had an offlist chat with Andres about this IIRC and he suggested he
thought it ought to be just built in to the backend as part of the
statistics subsystem. Lately though I have been toying with the idea of
keeping it as an extension and basing it off Michael Paquier's work for
Pluggable cumulative statistics.

As for the code, I wonder if the instability of line numbers could be a
problem - these can change (a little bit) between minor releases, so
after an upgrade we'll read the dump file with line numbers from the old
release, and then start adding entries with new line numbers. Do we need
to handle this in some way?

Hmm, yeah, I had been planning to include postgres version as part of
the output, but maybe it would need to be part of the key.

This might be partially solved by eviction of entries from the old
release - we apply decay, so after a while their usage will be 0. But
what if there's no pressure for space, we'll not actually evict them.
And it'll be confusing to have a mix of old/new line numbers.

Agreed.

I am going to try hard to get back to this sooner rather than later, but
realistically that might be in time for the September commitfest rather
than during this one.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#11

Joe Conway

mail@joeconway.com

over 1 year ago

In reply to: Michael Paquier (#9)

Re: RFC: pg_stat_logmsg

On 7/16/24 19:08, Michael Paquier wrote:

On Wed, Jul 17, 2024 at 12:14:36AM +0200, Tomas Vondra wrote:

I noticed this patch hasn't moved since September 2023, so I wonder
what's the main blocker / what is needed to move this?
+ /* Location of permanent stats file (valid when database is shut down) */
+ #define PGLM_DUMP_FILE	PGSTAT_STAT_PERMANENT_DIRECTORY
"/pg_stat_logmsg.stat
Perhaps this does not count as a valid reason, but does it really make
sense to implement things this way, knowing that this could be changed
to rely on a potential pluggable pgstats? I mean this one I've
proposed:
/messages/by-id/Zmqm9j5EO0I4W8dx@paquier.xyz

Yep, see my adjacent reply to Tomas.

One potential implementation is to stick that to be fixed-numbered,
because there is a maximum cap to the number of entries proposed by
the patch, while keeping the whole in memory.
+ logmsg_store(ErrorData *edata, Size *logmsg_offset,
+ 			 int *logmsg_len, int *gc_count)
The patch shares a lot of perks with pg_stat_statements that don't
scale well. I'm wondering if it is a good idea to duplicate these
properties in a second, different, module, like the external file can
could be written out on a periodic basis depending on the workload.
I am not saying that the other thread is a magic solution for
everything (not looked yet at how this would plug with the cap in
entries that pg_stat_statements wants), just one option on the table.

As for the code, I wonder if the instability of line numbers could be a
problem - these can change (a little bit) between minor releases, so
after an upgrade we'll read the dump file with line numbers from the old
release, and then start adding entries with new line numbers. Do we need
to handle this in some way?

Indeed. Perhaps a PostgreSQL version number assigned to each entry to
know from which binary an entry comes from? This would cost a couple
of extra bytes for each entry still that would be the best information
possible to match that with a correct code tree. If it comes to that,
even getting down to a commit SHA1 could be useful to provide the
lowest level of granularity. Another thing would be to give up on the
line number, stick to the uniqueness in the stats entries with the
errcode and the file name, but that won't help for things like
tablecmds.c.

I think including version in the key makes most sense. Also do we even
have a mechanism to grab the commit sha in running code?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#12

Michael Paquier

michael@paquier.xyz

over 1 year ago

In reply to: Joe Conway (#11)

Re: RFC: pg_stat_logmsg

On Wed, Jul 17, 2024 at 07:48:15AM -0400, Joe Conway wrote:

I think including version in the key makes most sense. Also do we even have
a mechanism to grab the commit sha in running code?

Not directly, still that's doable.

The closest thing I would consider here is to get the output of
something like 'git rev-parse --short HEAD` and attach it to
PG_VERSION with --with-extra-version. I do that in my local builds
because I always want to know from which commit I am building
something. Then, PG_VERSION could be stored with the entries while
hashing the stats key with the version string, the error code, the
source file name and/or the line number for uniqueness. 32 bytes of
room would be most likely enough when it comes to the PG_VERSION data
stored in the stats entries.
--
Michael

#13

Michael Paquier

michael@paquier.xyz

over 1 year ago

In reply to: Joe Conway (#10)

Re: RFC: pg_stat_logmsg

On Wed, Jul 17, 2024 at 07:43:13AM -0400, Joe Conway wrote:

On 7/16/24 18:14, Tomas Vondra wrote:

As for the feature, I've never done a fleet-wide analysis, so if this is
one of the main use cases, I'm not really sure I can judge if this is a
good tool for that. It seems like it might be a convenient way to do
that, but does that require we add the module to contrib?

I had an offlist chat with Andres about this IIRC and he suggested he
thought it ought to be just built in to the backend as part of the
statistics subsystem. Lately though I have been toying with the idea of
keeping it as an extension and basing it off Michael Paquier's work for
Pluggable cumulative statistics.

This may live better as a contrib/ module, serving as well as an extra
template for what can be done with the pluggable stats. Adding that
in core is of course OK for me if that's the consensus. The APIs for
pluggable stats are really the same as what you would store in core,
minus the system functions you'd want to add in the catalog .dat
files, of course.

I'd like to get it this part done by the end of this commit fest to
have room with pg_stat_statements for this release, but well, we'll
see. As far as I can see everybody who commented on the thread seems
kind of OK with the idea to fix the stats kinds IDs in time, like
custom RMGRs. That's just simpler implementation-wise, but I'm also
looking for more opinions.

Hmm, yeah, I had been planning to include postgres version as part of the
output, but maybe it would need to be part of the key.

Seems to me that you should do both, then: add PG_VERSION to the
entries, and hash the keys with it for uniqueness. You could also
have a reset function that performs a removal of the stats for
anything else than the current PG_VERSION, for example.
--
Michael