Add errdetail() with PID and UID about source of termination signal
Hi all,
From time to time we have scenario where somebody has some backend killed by
some_script_somewhere(TM) in a multi-department company scenario and
to me it was
always unnecessary time intensive to identify the origin of the signal.
eBPF/bpftrace can be used to find the origin, but I think that PG could
simply log which PID/UID (e.g. root or pg-user) raised the signal. So Linux
has SA_SIGINFO, we could use that to provide mentioned things. As expected,
search of course returned that it was discussed earlier here [1]https://hackorum.dev/topics/32019 11 years ago,
but there was no patch back then, so attached is an attempt to do just that.
No GUC, and yes it only displays it on Linux. I think FreeBSD also has
this, but I
haven't tried it there (or I haven't tried other OSes without it -
proper autoconf/meson
sa_sigaction SA_SIGINFO detection is probably missing with proper
#ifdefs ), but I
would first like to learn if that would be a welcomed feature or not.
-J.
Attachments:
v1-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchDownload+90-18
Hi Jakub
On 18/02/2026 08:32, Jakub Wartak wrote:
I would first like to learn if that would be a welcomed feature or not.
+1
I think it's a very useful feature (only tested on Linux)
FATAL: terminating connection due to administrator command
DETAIL: signal sent by PID 1592705, UID 1000.
I'm wondering if there is a standard style for displaying such values in
DETAIL. For instance, the checkpoint LOG is formatted like this:
LOG: checkpoint complete: ... write=0.044 s, sync=0.071 s, ...
I'm not sure if it applies for DETAIL, but at least it's what the
example at the error style guide[1] suggests:
Detail: Failed syscall was shmget(key=%d, size=%u, 0%o).
Thanks!
Best, Jim
1 - https://www.postgresql.org/docs/current/error-style-guide.html
On Wed, Feb 18, 2026 at 5:08 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
Hi Jakub
On 18/02/2026 08:32, Jakub Wartak wrote:
I would first like to learn if that would be a welcomed feature or not.
+1
I think it's a very useful feature (only tested on Linux)
FATAL: terminating connection due to administrator command
DETAIL: signal sent by PID 1592705, UID 1000.
Hi Jim, thanks for feedback :)
I'm wondering if there is a standard style for displaying such values in
DETAIL. For instance, the checkpoint LOG is formatted like this:LOG: checkpoint complete: ... write=0.044 s, sync=0.071 s, ...
I'm not sure if it applies for DETAIL, but at least it's what the
example at the error style guide[1] suggests:Detail: Failed syscall was shmget(key=%d, size=%u, 0%o).
After using `grep -hr errdetail src/ | sed -E 's/^\s+//g' | sort |
uniq` I doubt there is any
real standard, but one can find there:
errdetail("The server process with PID %d is among those with the
oldest transactions.", minPid)
errdetail("The source process with PID %d is not running anymore.",
One could say that all those DETAIL log messages should start with an
uppercase letter, yet it didn't look good to me when above
"terminating connection ..."
started itself with a lowercase letter "t", and then next-line DETAIL
we would start
with an uppercase
"Signal..".
but, I'm open to any better proposal...
-J.
On Feb 23, 2026, at 21:28, Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
On Wed, Feb 18, 2026 at 5:08 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
Hi Jakub
On 18/02/2026 08:32, Jakub Wartak wrote:
I would first like to learn if that would be a welcomed feature or not.
+1
I think it's a very useful feature (only tested on Linux)
FATAL: terminating connection due to administrator command
DETAIL: signal sent by PID 1592705, UID 1000.Hi Jim, thanks for feedback :)
I'm wondering if there is a standard style for displaying such values in
DETAIL. For instance, the checkpoint LOG is formatted like this:LOG: checkpoint complete: ... write=0.044 s, sync=0.071 s, ...
I'm not sure if it applies for DETAIL, but at least it's what the
example at the error style guide[1] suggests:Detail: Failed syscall was shmget(key=%d, size=%u, 0%o).
After using `grep -hr errdetail src/ | sed -E 's/^\s+//g' | sort |
uniq` I doubt there is any
real standard, but one can find there:errdetail("The server process with PID %d is among those with the
oldest transactions.", minPid)
errdetail("The source process with PID %d is not running anymore.",One could say that all those DETAIL log messages should start with an
uppercase letter, yet it didn't look good to me when above
"terminating connection ..."
started itself with a lowercase letter "t", and then next-line DETAIL
we would start
with an uppercase
"Signal..".but, I'm open to any better proposal...
-J.
There is guidance in the documentation regarding error message style: https://www.postgresql.org/docs/current/error-style-guide.html
```
Detail and hint messages: Use complete sentences, and end each with a period. Capitalize the first word of sentences. Put two spaces after the period if another sentence follows (for English text; might be inappropriate in other languages).
```
I also noticed that some existing DETAIL and HINT messages do not fully follow this guideline. But I believe new code should adhere to the documented style as much as possible. In particular, DETAIL and HINT messages should begin with a capital letter and follow the complete-sentence convention.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Tue, Feb 24, 2026 at 9:40 AM Chao Li <li.evan.chao@gmail.com> wrote:
[..]
There is guidance in the documentation regarding error message style: https://www.postgresql.org/docs/current/error-style-guide.html
```
Detail and hint messages: Use complete sentences, and end each with a period. Capitalize the first word of sentences. Put two spaces after the period if another sentence follows (for English text; might be inappropriate in other languages).
```I also noticed that some existing DETAIL and HINT messages do not fully follow this guideline. But I believe new code should adhere to the documented style as much as possible. In particular, DETAIL and HINT messages should begin with a capital letter and follow the complete-sentence convention.
Hi, v2 attached, WIP, the only known remaining issue to me is that
windows might fail to compile as it probably doesn't have concept of
uid_t... I'm wondering what to do there.
1. Rebased due to recent change to remove bgwriter_die()
2. Added auto-detection of SA_SIGINFO to meson and autoconf
3. Changed "Signal" to be in upper case now (I don't like it, but it
follows guideline now)
4. Tested on FreeBSD 14.3 - it works there too now..
5. ...and fixed some clang warnings by changing %d -> %lld in
errdetail due to apparent pid_t differences on platforms.
-J.
Attachments:
v2-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchDownload+151-17
On 2026-02-24 Tu 5:05 AM, Jakub Wartak wrote:
On Tue, Feb 24, 2026 at 9:40 AM Chao Li <li.evan.chao@gmail.com> wrote:
[..]There is guidance in the documentation regarding error message style: https://www.postgresql.org/docs/current/error-style-guide.html
```
Detail and hint messages: Use complete sentences, and end each with a period. Capitalize the first word of sentences. Put two spaces after the period if another sentence follows (for English text; might be inappropriate in other languages).
```I also noticed that some existing DETAIL and HINT messages do not fully follow this guideline. But I believe new code should adhere to the documented style as much as possible. In particular, DETAIL and HINT messages should begin with a capital letter and follow the complete-sentence convention.
Hi, v2 attached, WIP, the only known remaining issue to me is that
windows might fail to compile as it probably doesn't have concept of
uid_t... I'm wondering what to do there.
I don't think it will arise, as Windows doesn't have the siginfo stuff,
AFAIK.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Tue, Feb 24, 2026 at 5:15 PM Andrew Dunstan <andrew@dunslane.net> wrote:
On 2026-02-24 Tu 5:05 AM, Jakub Wartak wrote:
On Tue, Feb 24, 2026 at 9:40 AM Chao Li <li.evan.chao@gmail.com> wrote:
[..]There is guidance in the documentation regarding error message style: https://www.postgresql.org/docs/current/error-style-guide.html
```
Detail and hint messages: Use complete sentences, and end each with a period. Capitalize the first word of sentences. Put two spaces after the period if another sentence follows (for English text; might be inappropriate in other languages).
```I also noticed that some existing DETAIL and HINT messages do not fully follow this guideline. But I believe new code should adhere to the documented style as much as possible. In particular, DETAIL and HINT messages should begin with a capital letter and follow the complete-sentence convention.
Hi, v2 attached, WIP, the only known remaining issue to me is that
windows might fail to compile as it probably doesn't have concept of
uid_t... I'm wondering what to do there.I don't think it will arise, as Windows doesn't have the siginfo stuff,
AFAIK.
Hi Andrew. Ok, so V3 is attached with just one change: uid_t/pid_t
changed to uint32 to make win32 happy.
Perhaps one question is whether this should be toggleable with GUC or not.
-J.
Attachments:
v3-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchDownload+151-17
On Feb 25, 2026, at 16:26, Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
On Tue, Feb 24, 2026 at 5:15 PM Andrew Dunstan <andrew@dunslane.net> wrote:
On 2026-02-24 Tu 5:05 AM, Jakub Wartak wrote:
On Tue, Feb 24, 2026 at 9:40 AM Chao Li <li.evan.chao@gmail.com> wrote:
[..]There is guidance in the documentation regarding error message style: https://www.postgresql.org/docs/current/error-style-guide.html
```
Detail and hint messages: Use complete sentences, and end each with a period. Capitalize the first word of sentences. Put two spaces after the period if another sentence follows (for English text; might be inappropriate in other languages).
```I also noticed that some existing DETAIL and HINT messages do not fully follow this guideline. But I believe new code should adhere to the documented style as much as possible. In particular, DETAIL and HINT messages should begin with a capital letter and follow the complete-sentence convention.
Hi, v2 attached, WIP, the only known remaining issue to me is that
windows might fail to compile as it probably doesn't have concept of
uid_t... I'm wondering what to do there.I don't think it will arise, as Windows doesn't have the siginfo stuff,
AFAIK.Hi Andrew. Ok, so V3 is attached with just one change: uid_t/pid_t
changed to uint32 to make win32 happy.Perhaps one question is whether this should be toggleable with GUC or not.
-J.
<v3-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patch>
A few comments for v3:
1 - syncrep.c
```
@@ -302,7 +302,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
ereport(WARNING,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
- errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
+ proc_die_sender_pid == 0 ? 0 :
+ errdetail("Signal sent by PID %lld, UID %lld.",
+ (long long)proc_die_sender_pid, (long long)proc_die_sender_uid)
+ ));
```
Here errdetail is used twice. I guess the second conditional one should be errhint.
2 - syncrep.c
```
- errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
+ proc_die_sender_pid == 0 ? 0 :
+ errdetail("Signal sent by PID %lld, UID %lld.",
+ (long long)proc_die_sender_pid, (long long)proc_die_sender_uid)
+ ));
```
Same as comment 1.
3
```
+volatile uint32 proc_die_sender_pid = 0;
+volatile uint32 proc_die_sender_uid = 0;
```
These two globals are only written in the signal handler, I think they should be sig_atomic_t to ensure atomic writes.
4
```
- errmsg("terminating walreceiver process due to administrator command")));
+ errmsg("terminating walreceiver process due to administrator command"),
+ proc_die_sender_pid == 0 ? 0 :
+ errdetail("Signal sent by PID %lld, UID %lld.",
+ (long long)proc_die_sender_pid, (long long)proc_die_sender_uid)
+ ));
```
Why do we need to format pid and uid in “long long” format? I searched over the source tree, the current postmaster.c just formats pid as int (%d):
```
/* in parent, successful fork */
ereport(DEBUG2,
(errmsg_internal("forked new %s, pid=%d socket=%d",
GetBackendTypeDesc(bn->bkend_type),
(int) pid, (int) client_sock->sock)));
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Wed, Feb 25, 2026 at 10:15 AM Chao Li <li.evan.chao@gmail.com> wrote:
Hi Chao, thanks for review.
A few comments for v3:
1 - syncrep.c
[..]
+ proc_die_sender_pid == 0 ? 0 : + errdetail("Signal sent by PID %lld, UID %lld.", + (long long)proc_die_sender_pid, (long long)proc_die_sender_uid) + )); ```Here errdetail is used twice. I guess the second conditional one should be errhint.
2 - syncrep.c
[..]
Same as comment 1.
You are right, apparently I copy/pasted code from src/backend/tcop/postgres.c
way too fast... fixed.
3 ``` +volatile uint32 proc_die_sender_pid = 0; +volatile uint32 proc_die_sender_uid = 0; ```These two globals are only written in the signal handler, I think they should be sig_atomic_t to ensure atomic writes.
Well the problem that sig_atomic_t is int and we need at least uint32 and
I couldn't find better way. I think that 4 bytes writes will be mostly
always atomic (for 64-bits it would depend on
PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY)
Yet I moved those little below, so it's more aligned to the other uses.
4 ``` - errmsg("terminating walreceiver process due to administrator command"))); + errmsg("terminating walreceiver process due to administrator command"), + proc_die_sender_pid == 0 ? 0 : + errdetail("Signal sent by PID %lld, UID %lld.", + (long long)proc_die_sender_pid, (long long)proc_die_sender_uid) + )); ```Why do we need to format pid and uid in “long long” format? I searched over the source tree, the current postmaster.c just formats pid as int (%d):
```
/* in parent, successful fork */
ereport(DEBUG2,
(errmsg_internal("forked new %s, pid=%d socket=%d",
GetBackendTypeDesc(bn->bkend_type),
(int) pid, (int) client_sock->sock)));
```
Yes, I think I was kind of lost when thinking about it (v1 had sig_atomic_t,
later had pid_t, I read somewhere about 64-bit pids, and so on) vs
platform-agnostic hell of putting that into printf). Possible I was
overthinking it
and I have reverted it to just using %d with that uint32. BTW I've also found:
elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
v4 attached. Thanks again for the review.
-J.
Attachments:
v4-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchDownload+151-17
On Feb 25, 2026, at 18:45, Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
On Wed, Feb 25, 2026 at 10:15 AM Chao Li <li.evan.chao@gmail.com> wrote:
Hi Chao, thanks for review.
A few comments for v3:
1 - syncrep.c
[..]
+ proc_die_sender_pid == 0 ? 0 : + errdetail("Signal sent by PID %lld, UID %lld.", + (long long)proc_die_sender_pid, (long long)proc_die_sender_uid) + )); ```Here errdetail is used twice. I guess the second conditional one should be errhint.
2 - syncrep.c
[..]
Same as comment 1.
You are right, apparently I copy/pasted code from src/backend/tcop/postgres.c
way too fast... fixed.3 ``` +volatile uint32 proc_die_sender_pid = 0; +volatile uint32 proc_die_sender_uid = 0; ```These two globals are only written in the signal handler, I think they should be sig_atomic_t to ensure atomic writes.
Well the problem that sig_atomic_t is int and we need at least uint32 and
I couldn't find better way. I think that 4 bytes writes will be mostly
always atomic (for 64-bits it would depend on
PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY)Yet I moved those little below, so it's more aligned to the other uses.
4 ``` - errmsg("terminating walreceiver process due to administrator command"))); + errmsg("terminating walreceiver process due to administrator command"), + proc_die_sender_pid == 0 ? 0 : + errdetail("Signal sent by PID %lld, UID %lld.", + (long long)proc_die_sender_pid, (long long)proc_die_sender_uid) + )); ```Why do we need to format pid and uid in “long long” format? I searched over the source tree, the current postmaster.c just formats pid as int (%d):
```
/* in parent, successful fork */
ereport(DEBUG2,
(errmsg_internal("forked new %s, pid=%d socket=%d",
GetBackendTypeDesc(bn->bkend_type),
(int) pid, (int) client_sock->sock)));
```Yes, I think I was kind of lost when thinking about it (v1 had sig_atomic_t,
later had pid_t, I read somewhere about 64-bit pids, and so on) vs
platform-agnostic hell of putting that into printf). Possible I was
overthinking it
and I have reverted it to just using %d with that uint32. BTW I've also found:
elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);v4 attached. Thanks again for the review.
-J.
<v4-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patch>
I just reviewed v4 again and got a few more comments:
1. This patch only set the global proc_die_sender_pid/uid to 0 at startup, then assign values to them upon receiving SIGTERM, and never reset them, which assumes a process must die upon SIGTERM. Is the assumption true? I guess not. If a process receives SIGTERM and not die immediately, then die for other reason, then it may report a misleading PID and UID. So, I think we may need to reset proc_die_sender_pid/uid somewhere. For example, in ProcessInterrupts(), copy them to local variables and reset them to 0, then use the local variables for ereport().
2.
```
@@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
ereport(WARNING,
(errmsg("canceling wait for synchronous replication due to user request"),
- errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
+ proc_die_sender_pid == 0 ? 0 :
+ errhint("Signal sent by PID %d, UID %d.",
+ proc_die_sender_pid, proc_die_sender_uid)
+ ));
```
syncrpe.c uses errhint to print PID and UID, and postgres.c uses errdetail. We should keep consistency, maybe all use errhint.
3.
```
@@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
QueryCancelPending = false;
ereport(WARNING,
(errmsg("canceling wait for synchronous replication due to user request"),
- errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
+ proc_die_sender_pid == 0 ? 0 :
+ errhint("Signal sent by PID %d, UID %d.",
+ proc_die_sender_pid, proc_die_sender_uid)
+ ));
SyncRepCancelWait();
break;
}
```
I don’t think the query cancel case relates to SIGTERM, so we don’t need to log PID and UID here.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, Feb 26, 2026 at 4:09 AM Chao Li <li.evan.chao@gmail.com> wrote:
Hi Chao,
I just reviewed v4 again and got a few more comments:
1. This patch only set the global proc_die_sender_pid/uid to 0 at startup, then assign values to them upon receiving SIGTERM, and never reset them, which assumes a process must die upon SIGTERM. Is the assumption true? I guess not. If a process receives SIGTERM and not die immediately, then die for other reason, then it may report a misleading PID and UID.
Hmm, I'm not sure I follow. If we receive SIGTERM and not die immediately
(for whatever reason), then two scenarios can happen as far as I'm concerned:
* another SIGTERM comes in from the same or different uid/pid and it wll be
reported properly
* different SIGKILL, but in this case we won't report UID/PID at all
am I missing something or do You have any particular scenario in mind?
The flow will be wrapper_handler()->die()->SetLatch()->..->directyl to
err reporting facilities.
2.
syncrpe.c uses errhint to print PID and UID, and postgres.c uses errdetail. We should keep consistency, maybe all use errhint.
Right, let's make it that way.
3. ``` @@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit) QueryCancelPending = false; ereport(WARNING, (errmsg("canceling wait for synchronous replication due to user request"), - errdetail("The transaction has already committed locally, but might not have been replicated to the standby."))); + errdetail("The transaction has already committed locally, but might not have been replicated to the standby."), + proc_die_sender_pid == 0 ? 0 : + errhint("Signal sent by PID %d, UID %d.", + proc_die_sender_pid, proc_die_sender_uid) + )); SyncRepCancelWait(); break; } ```I don’t think the query cancel case relates to SIGTERM, so we don’t need to log PID and UID here.
Right, it was superfluous.
v5 attached.
-J.