Add errdetail() with PID and UID about source of termination signal
Hi all,
From time to time we have scenario where somebody has some backend killed by
some_script_somewhere(TM) in a multi-department company scenario and
to me it was
always unnecessary time intensive to identify the origin of the signal.
eBPF/bpftrace can be used to find the origin, but I think that PG could
simply log which PID/UID (e.g. root or pg-user) raised the signal. So Linux
has SA_SIGINFO, we could use that to provide mentioned things. As expected,
search of course returned that it was discussed earlier here [1]https://hackorum.dev/topics/32019 11 years ago,
but there was no patch back then, so attached is an attempt to do just that.
No GUC, and yes it only displays it on Linux. I think FreeBSD also has
this, but I
haven't tried it there (or I haven't tried other OSes without it -
proper autoconf/meson
sa_sigaction SA_SIGINFO detection is probably missing with proper
#ifdefs ), but I
would first like to learn if that would be a welcomed feature or not.
-J.
Attachments:
v1-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchDownload+90-18
Hi Jakub
On 18/02/2026 08:32, Jakub Wartak wrote:
I would first like to learn if that would be a welcomed feature or not.
+1
I think it's a very useful feature (only tested on Linux)
FATAL: terminating connection due to administrator command
DETAIL: signal sent by PID 1592705, UID 1000.
I'm wondering if there is a standard style for displaying such values in
DETAIL. For instance, the checkpoint LOG is formatted like this:
LOG: checkpoint complete: ... write=0.044 s, sync=0.071 s, ...
I'm not sure if it applies for DETAIL, but at least it's what the
example at the error style guide[1] suggests:
Detail: Failed syscall was shmget(key=%d, size=%u, 0%o).
Thanks!
Best, Jim
1 - https://www.postgresql.org/docs/current/error-style-guide.html
On Wed, Feb 18, 2026 at 5:08 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
Hi Jakub
On 18/02/2026 08:32, Jakub Wartak wrote:
I would first like to learn if that would be a welcomed feature or not.
+1
I think it's a very useful feature (only tested on Linux)
FATAL: terminating connection due to administrator command
DETAIL: signal sent by PID 1592705, UID 1000.
Hi Jim, thanks for feedback :)
I'm wondering if there is a standard style for displaying such values in
DETAIL. For instance, the checkpoint LOG is formatted like this:LOG: checkpoint complete: ... write=0.044 s, sync=0.071 s, ...
I'm not sure if it applies for DETAIL, but at least it's what the
example at the error style guide[1] suggests:Detail: Failed syscall was shmget(key=%d, size=%u, 0%o).
After using `grep -hr errdetail src/ | sed -E 's/^\s+//g' | sort |
uniq` I doubt there is any
real standard, but one can find there:
errdetail("The server process with PID %d is among those with the
oldest transactions.", minPid)
errdetail("The source process with PID %d is not running anymore.",
One could say that all those DETAIL log messages should start with an
uppercase letter, yet it didn't look good to me when above
"terminating connection ..."
started itself with a lowercase letter "t", and then next-line DETAIL
we would start
with an uppercase
"Signal..".
but, I'm open to any better proposal...
-J.
On Feb 23, 2026, at 21:28, Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
On Wed, Feb 18, 2026 at 5:08 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
Hi Jakub
On 18/02/2026 08:32, Jakub Wartak wrote:
I would first like to learn if that would be a welcomed feature or not.
+1
I think it's a very useful feature (only tested on Linux)
FATAL: terminating connection due to administrator command
DETAIL: signal sent by PID 1592705, UID 1000.Hi Jim, thanks for feedback :)
I'm wondering if there is a standard style for displaying such values in
DETAIL. For instance, the checkpoint LOG is formatted like this:LOG: checkpoint complete: ... write=0.044 s, sync=0.071 s, ...
I'm not sure if it applies for DETAIL, but at least it's what the
example at the error style guide[1] suggests:Detail: Failed syscall was shmget(key=%d, size=%u, 0%o).
After using `grep -hr errdetail src/ | sed -E 's/^\s+//g' | sort |
uniq` I doubt there is any
real standard, but one can find there:errdetail("The server process with PID %d is among those with the
oldest transactions.", minPid)
errdetail("The source process with PID %d is not running anymore.",One could say that all those DETAIL log messages should start with an
uppercase letter, yet it didn't look good to me when above
"terminating connection ..."
started itself with a lowercase letter "t", and then next-line DETAIL
we would start
with an uppercase
"Signal..".but, I'm open to any better proposal...
-J.
There is guidance in the documentation regarding error message style: https://www.postgresql.org/docs/current/error-style-guide.html
```
Detail and hint messages: Use complete sentences, and end each with a period. Capitalize the first word of sentences. Put two spaces after the period if another sentence follows (for English text; might be inappropriate in other languages).
```
I also noticed that some existing DETAIL and HINT messages do not fully follow this guideline. But I believe new code should adhere to the documented style as much as possible. In particular, DETAIL and HINT messages should begin with a capital letter and follow the complete-sentence convention.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Tue, Feb 24, 2026 at 9:40 AM Chao Li <li.evan.chao@gmail.com> wrote:
[..]
There is guidance in the documentation regarding error message style: https://www.postgresql.org/docs/current/error-style-guide.html
```
Detail and hint messages: Use complete sentences, and end each with a period. Capitalize the first word of sentences. Put two spaces after the period if another sentence follows (for English text; might be inappropriate in other languages).
```I also noticed that some existing DETAIL and HINT messages do not fully follow this guideline. But I believe new code should adhere to the documented style as much as possible. In particular, DETAIL and HINT messages should begin with a capital letter and follow the complete-sentence convention.
Hi, v2 attached, WIP, the only known remaining issue to me is that
windows might fail to compile as it probably doesn't have concept of
uid_t... I'm wondering what to do there.
1. Rebased due to recent change to remove bgwriter_die()
2. Added auto-detection of SA_SIGINFO to meson and autoconf
3. Changed "Signal" to be in upper case now (I don't like it, but it
follows guideline now)
4. Tested on FreeBSD 14.3 - it works there too now..
5. ...and fixed some clang warnings by changing %d -> %lld in
errdetail due to apparent pid_t differences on platforms.
-J.
Attachments:
v2-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchDownload+151-17
On 2026-02-24 Tu 5:05 AM, Jakub Wartak wrote:
On Tue, Feb 24, 2026 at 9:40 AM Chao Li <li.evan.chao@gmail.com> wrote:
[..]There is guidance in the documentation regarding error message style: https://www.postgresql.org/docs/current/error-style-guide.html
```
Detail and hint messages: Use complete sentences, and end each with a period. Capitalize the first word of sentences. Put two spaces after the period if another sentence follows (for English text; might be inappropriate in other languages).
```I also noticed that some existing DETAIL and HINT messages do not fully follow this guideline. But I believe new code should adhere to the documented style as much as possible. In particular, DETAIL and HINT messages should begin with a capital letter and follow the complete-sentence convention.
Hi, v2 attached, WIP, the only known remaining issue to me is that
windows might fail to compile as it probably doesn't have concept of
uid_t... I'm wondering what to do there.
I don't think it will arise, as Windows doesn't have the siginfo stuff,
AFAIK.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Tue, Feb 24, 2026 at 5:15 PM Andrew Dunstan <andrew@dunslane.net> wrote:
On 2026-02-24 Tu 5:05 AM, Jakub Wartak wrote:
On Tue, Feb 24, 2026 at 9:40 AM Chao Li <li.evan.chao@gmail.com> wrote:
[..]There is guidance in the documentation regarding error message style: https://www.postgresql.org/docs/current/error-style-guide.html
```
Detail and hint messages: Use complete sentences, and end each with a period. Capitalize the first word of sentences. Put two spaces after the period if another sentence follows (for English text; might be inappropriate in other languages).
```I also noticed that some existing DETAIL and HINT messages do not fully follow this guideline. But I believe new code should adhere to the documented style as much as possible. In particular, DETAIL and HINT messages should begin with a capital letter and follow the complete-sentence convention.
Hi, v2 attached, WIP, the only known remaining issue to me is that
windows might fail to compile as it probably doesn't have concept of
uid_t... I'm wondering what to do there.I don't think it will arise, as Windows doesn't have the siginfo stuff,
AFAIK.
Hi Andrew. Ok, so V3 is attached with just one change: uid_t/pid_t
changed to uint32 to make win32 happy.
Perhaps one question is whether this should be toggleable with GUC or not.
-J.
Attachments:
v3-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchDownload+151-17
On Feb 25, 2026, at 16:26, Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
On Tue, Feb 24, 2026 at 5:15 PM Andrew Dunstan <andrew@dunslane.net> wrote:
On 2026-02-24 Tu 5:05 AM, Jakub Wartak wrote:
On Tue, Feb 24, 2026 at 9:40 AM Chao Li <li.evan.chao@gmail.com> wrote:
[..]There is guidance in the documentation regarding error message style: https://www.postgresql.org/docs/current/error-style-guide.html
```
Detail and hint messages: Use complete sentences, and end each with a period. Capitalize the first word of sentences. Put two spaces after the period if another sentence follows (for English text; might be inappropriate in other languages).
```I also noticed that some existing DETAIL and HINT messages do not fully follow this guideline. But I believe new code should adhere to the documented style as much as possible. In particular, DETAIL and HINT messages should begin with a capital letter and follow the complete-sentence convention.
Hi, v2 attached, WIP, the only known remaining issue to me is that
windows might fail to compile as it probably doesn't have concept of
uid_t... I'm wondering what to do there.I don't think it will arise, as Windows doesn't have the siginfo stuff,
AFAIK.Hi Andrew. Ok, so V3 is attached with just one change: uid_t/pid_t
changed to uint32 to make win32 happy.Perhaps one question is whether this should be toggleable with GUC or not.
-J.
<v3-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patch>
A few comments for v3:
1 - syncrep.c
```
@@ -302,7 +302,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
ereport(WARNING,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
- errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
+ proc_die_sender_pid == 0 ? 0 :
+ errdetail("Signal sent by PID %lld, UID %lld.",
+ (long long)proc_die_sender_pid, (long long)proc_die_sender_uid)
+ ));
```
Here errdetail is used twice. I guess the second conditional one should be errhint.
2 - syncrep.c
```
- errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
+ proc_die_sender_pid == 0 ? 0 :
+ errdetail("Signal sent by PID %lld, UID %lld.",
+ (long long)proc_die_sender_pid, (long long)proc_die_sender_uid)
+ ));
```
Same as comment 1.
3
```
+volatile uint32 proc_die_sender_pid = 0;
+volatile uint32 proc_die_sender_uid = 0;
```
These two globals are only written in the signal handler, I think they should be sig_atomic_t to ensure atomic writes.
4
```
- errmsg("terminating walreceiver process due to administrator command")));
+ errmsg("terminating walreceiver process due to administrator command"),
+ proc_die_sender_pid == 0 ? 0 :
+ errdetail("Signal sent by PID %lld, UID %lld.",
+ (long long)proc_die_sender_pid, (long long)proc_die_sender_uid)
+ ));
```
Why do we need to format pid and uid in “long long” format? I searched over the source tree, the current postmaster.c just formats pid as int (%d):
```
/* in parent, successful fork */
ereport(DEBUG2,
(errmsg_internal("forked new %s, pid=%d socket=%d",
GetBackendTypeDesc(bn->bkend_type),
(int) pid, (int) client_sock->sock)));
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Wed, Feb 25, 2026 at 10:15 AM Chao Li <li.evan.chao@gmail.com> wrote:
Hi Chao, thanks for review.
A few comments for v3:
1 - syncrep.c
[..]
+ proc_die_sender_pid == 0 ? 0 : + errdetail("Signal sent by PID %lld, UID %lld.", + (long long)proc_die_sender_pid, (long long)proc_die_sender_uid) + )); ```Here errdetail is used twice. I guess the second conditional one should be errhint.
2 - syncrep.c
[..]
Same as comment 1.
You are right, apparently I copy/pasted code from src/backend/tcop/postgres.c
way too fast... fixed.
3 ``` +volatile uint32 proc_die_sender_pid = 0; +volatile uint32 proc_die_sender_uid = 0; ```These two globals are only written in the signal handler, I think they should be sig_atomic_t to ensure atomic writes.
Well the problem that sig_atomic_t is int and we need at least uint32 and
I couldn't find better way. I think that 4 bytes writes will be mostly
always atomic (for 64-bits it would depend on
PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY)
Yet I moved those little below, so it's more aligned to the other uses.
4 ``` - errmsg("terminating walreceiver process due to administrator command"))); + errmsg("terminating walreceiver process due to administrator command"), + proc_die_sender_pid == 0 ? 0 : + errdetail("Signal sent by PID %lld, UID %lld.", + (long long)proc_die_sender_pid, (long long)proc_die_sender_uid) + )); ```Why do we need to format pid and uid in “long long” format? I searched over the source tree, the current postmaster.c just formats pid as int (%d):
```
/* in parent, successful fork */
ereport(DEBUG2,
(errmsg_internal("forked new %s, pid=%d socket=%d",
GetBackendTypeDesc(bn->bkend_type),
(int) pid, (int) client_sock->sock)));
```
Yes, I think I was kind of lost when thinking about it (v1 had sig_atomic_t,
later had pid_t, I read somewhere about 64-bit pids, and so on) vs
platform-agnostic hell of putting that into printf). Possible I was
overthinking it
and I have reverted it to just using %d with that uint32. BTW I've also found:
elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
v4 attached. Thanks again for the review.
-J.
Attachments:
v4-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchDownload+151-17
On Feb 25, 2026, at 18:45, Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
On Wed, Feb 25, 2026 at 10:15 AM Chao Li <li.evan.chao@gmail.com> wrote:
Hi Chao, thanks for review.
A few comments for v3:
1 - syncrep.c
[..]
+ proc_die_sender_pid == 0 ? 0 : + errdetail("Signal sent by PID %lld, UID %lld.", + (long long)proc_die_sender_pid, (long long)proc_die_sender_uid) + )); ```Here errdetail is used twice. I guess the second conditional one should be errhint.
2 - syncrep.c
[..]
Same as comment 1.
You are right, apparently I copy/pasted code from src/backend/tcop/postgres.c
way too fast... fixed.3 ``` +volatile uint32 proc_die_sender_pid = 0; +volatile uint32 proc_die_sender_uid = 0; ```These two globals are only written in the signal handler, I think they should be sig_atomic_t to ensure atomic writes.
Well the problem that sig_atomic_t is int and we need at least uint32 and
I couldn't find better way. I think that 4 bytes writes will be mostly
always atomic (for 64-bits it would depend on
PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY)Yet I moved those little below, so it's more aligned to the other uses.
4 ``` - errmsg("terminating walreceiver process due to administrator command"))); + errmsg("terminating walreceiver process due to administrator command"), + proc_die_sender_pid == 0 ? 0 : + errdetail("Signal sent by PID %lld, UID %lld.", + (long long)proc_die_sender_pid, (long long)proc_die_sender_uid) + )); ```Why do we need to format pid and uid in “long long” format? I searched over the source tree, the current postmaster.c just formats pid as int (%d):
```
/* in parent, successful fork */
ereport(DEBUG2,
(errmsg_internal("forked new %s, pid=%d socket=%d",
GetBackendTypeDesc(bn->bkend_type),
(int) pid, (int) client_sock->sock)));
```Yes, I think I was kind of lost when thinking about it (v1 had sig_atomic_t,
later had pid_t, I read somewhere about 64-bit pids, and so on) vs
platform-agnostic hell of putting that into printf). Possible I was
overthinking it
and I have reverted it to just using %d with that uint32. BTW I've also found:
elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);v4 attached. Thanks again for the review.
-J.
<v4-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patch>
I just reviewed v4 again and got a few more comments:
1. This patch only set the global proc_die_sender_pid/uid to 0 at startup, then assign values to them upon receiving SIGTERM, and never reset them, which assumes a process must die upon SIGTERM. Is the assumption true? I guess not. If a process receives SIGTERM and not die immediately, then die for other reason, then it may report a misleading PID and UID. So, I think we may need to reset proc_die_sender_pid/uid somewhere. For example, in ProcessInterrupts(), copy them to local variables and reset them to 0, then use the local variables for ereport().
2.
```
@@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
ereport(WARNING,
(errmsg("canceling wait for synchronous replication due to user request"),
- errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
+ proc_die_sender_pid == 0 ? 0 :
+ errhint("Signal sent by PID %d, UID %d.",
+ proc_die_sender_pid, proc_die_sender_uid)
+ ));
```
syncrpe.c uses errhint to print PID and UID, and postgres.c uses errdetail. We should keep consistency, maybe all use errhint.
3.
```
@@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
QueryCancelPending = false;
ereport(WARNING,
(errmsg("canceling wait for synchronous replication due to user request"),
- errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
+ proc_die_sender_pid == 0 ? 0 :
+ errhint("Signal sent by PID %d, UID %d.",
+ proc_die_sender_pid, proc_die_sender_uid)
+ ));
SyncRepCancelWait();
break;
}
```
I don’t think the query cancel case relates to SIGTERM, so we don’t need to log PID and UID here.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, Feb 26, 2026 at 4:09 AM Chao Li <li.evan.chao@gmail.com> wrote:
Hi Chao,
I just reviewed v4 again and got a few more comments:
1. This patch only set the global proc_die_sender_pid/uid to 0 at startup, then assign values to them upon receiving SIGTERM, and never reset them, which assumes a process must die upon SIGTERM. Is the assumption true? I guess not. If a process receives SIGTERM and not die immediately, then die for other reason, then it may report a misleading PID and UID.
Hmm, I'm not sure I follow. If we receive SIGTERM and not die immediately
(for whatever reason), then two scenarios can happen as far as I'm concerned:
* another SIGTERM comes in from the same or different uid/pid and it wll be
reported properly
* different SIGKILL, but in this case we won't report UID/PID at all
am I missing something or do You have any particular scenario in mind?
The flow will be wrapper_handler()->die()->SetLatch()->..->directyl to
err reporting facilities.
2.
syncrpe.c uses errhint to print PID and UID, and postgres.c uses errdetail. We should keep consistency, maybe all use errhint.
Right, let's make it that way.
3. ``` @@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit) QueryCancelPending = false; ereport(WARNING, (errmsg("canceling wait for synchronous replication due to user request"), - errdetail("The transaction has already committed locally, but might not have been replicated to the standby."))); + errdetail("The transaction has already committed locally, but might not have been replicated to the standby."), + proc_die_sender_pid == 0 ? 0 : + errhint("Signal sent by PID %d, UID %d.", + proc_die_sender_pid, proc_die_sender_uid) + )); SyncRepCancelWait(); break; } ```I don’t think the query cancel case relates to SIGTERM, so we don’t need to log PID and UID here.
Right, it was superfluous.
v5 attached.
-J.
Attachments:
v5-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchDownload+146-16
On 2026-02-26 Th 4:25 AM, Jakub Wartak wrote:
On Thu, Feb 26, 2026 at 4:09 AM Chao Li <li.evan.chao@gmail.com> wrote:
Hi Chao,
I just reviewed v4 again and got a few more comments:
1. This patch only set the global proc_die_sender_pid/uid to 0 at startup, then assign values to them upon receiving SIGTERM, and never reset them, which assumes a process must die upon SIGTERM. Is the assumption true? I guess not. If a process receives SIGTERM and not die immediately, then die for other reason, then it may report a misleading PID and UID.
Hmm, I'm not sure I follow. If we receive SIGTERM and not die immediately
(for whatever reason), then two scenarios can happen as far as I'm concerned:
* another SIGTERM comes in from the same or different uid/pid and it wll be
reported properly
* different SIGKILL, but in this case we won't report UID/PID at allam I missing something or do You have any particular scenario in mind?
The flow will be wrapper_handler()->die()->SetLatch()->..->directyl to
err reporting facilities.2.
syncrpe.c uses errhint to print PID and UID, and postgres.c uses errdetail. We should keep consistency, maybe all use errhint.Right, let's make it that way.
3. ``` @@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit) QueryCancelPending = false; ereport(WARNING, (errmsg("canceling wait for synchronous replication due to user request"), - errdetail("The transaction has already committed locally, but might not have been replicated to the standby."))); + errdetail("The transaction has already committed locally, but might not have been replicated to the standby."), + proc_die_sender_pid == 0 ? 0 : + errhint("Signal sent by PID %d, UID %d.", + proc_die_sender_pid, proc_die_sender_uid) + )); SyncRepCancelWait(); break; } ```I don’t think the query cancel case relates to SIGTERM, so we don’t need to log PID and UID here.
Right, it was superfluous.
v5 attached.
I'd kinda like to sneak this in for pg19, because I think it's useful.
Here's a v6 that changes one or two things:
- changes the globals to sig_atomic_t
- in ProcessInterrupts, copies to local sender_pid/sender_uid, then
zeros the globals before any ereport
- uses errdetail() for all the messages
Plus a few more cosmetic changes like consistent casing.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Attachments:
v6-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchtext/x-patch; charset=UTF-8; name=v6-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patchDownload+138-16
On 6 Apr 2026, at 18:51, Andrew Dunstan <andrew@dunslane.net> wrote:
I'd kinda like to sneak this in for pg19, because I think it's useful. Here's a v6 that changes one or two things:
+1. I haven't done an in-depth review but I quite like the feature and have
wanted this very thing in the past.
--
Daniel Gustafsson
Andrew Dunstan <andrew@dunslane.net> 于2026年4月7日周二 00:51写道:
On 2026-02-26 Th 4:25 AM, Jakub Wartak wrote:
On Thu, Feb 26, 2026 at 4:09 AM Chao Li <li.evan.chao@gmail.com> wrote:
Hi Chao,
I just reviewed v4 again and got a few more comments:
1. This patch only set the global proc_die_sender_pid/uid to 0 at
startup, then assign values to them upon receiving SIGTERM, and never reset
them, which assumes a process must die upon SIGTERM. Is the assumption
true? I guess not. If a process receives SIGTERM and not die immediately,
then die for other reason, then it may report a misleading PID and UID.Hmm, I'm not sure I follow. If we receive SIGTERM and not die immediately
(for whatever reason), then two scenarios can happen as far as I'mconcerned:
* another SIGTERM comes in from the same or different uid/pid and it wll
be
reported properly
* different SIGKILL, but in this case we won't report UID/PID at allam I missing something or do You have any particular scenario in mind?
The flow will be wrapper_handler()->die()->SetLatch()->..->directyl to
err reporting facilities.2.
syncrpe.c uses errhint to print PID and UID, and postgres.c useserrdetail. We should keep consistency, maybe all use errhint.
Right, let's make it that way.
3.
```
@@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
QueryCancelPending = false;
ereport(WARNING,
(errmsg("canceling wait forsynchronous replication due to user request"),
- errdetail("The transaction has
already committed locally, but might not have been replicated to the
standby.")));+ errdetail("The transaction has
already committed locally, but might not have been replicated to the
standby."),+ proc_die_sender_pid ==
0 ? 0 :
+ errhint("Signal
sent by PID %d, UID %d.",
+
proc_die_sender_pid, proc_die_sender_uid)
+ ));
SyncRepCancelWait();
break;
}
```I don’t think the query cancel case relates to SIGTERM, so we don’t
need to log PID and UID here.
Right, it was superfluous.
v5 attached.
I'd kinda like to sneak this in for pg19, because I think it's useful.
Here's a v6 that changes one or two things:- changes the globals to sig_atomic_t
- in ProcessInterrupts, copies to local sender_pid/sender_uid, then
zeros the globals before any ereport- uses errdetail() for all the messages
Plus a few more cosmetic changes like consistent casing.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
I just reviewed v6 and got 1 comment.
sig_atomic_t is underlying int, but pid_t and uid_t are usually unsigned
int,
so that while saving pid and uid to ProcDieSenderPid and ProcDieSenderUid,
overflow may happen. But that’s fine, as the data is stored in the same way
and we only want to print them. So, for the print statement:
#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
((pid) == 0 ? 0 : \
errdetail("Signal sent by PID %d, UID %d.", (int) (pid), (int) (uid)))
Does it make sense to use %u and cast to pid_t and uid_t? Like:
#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
((pid) == 0 ? 0 : \
errdetail("Signal sent by PID %u, UID %u.", (pid_t) (pid), (uid_t)
(uid)))
Thanks!
--
wang jie
On Tue, Apr 7, 2026 at 5:55 AM jie wang <jugierwang@gmail.com> wrote:
Andrew Dunstan <andrew@dunslane.net> 于2026年4月7日周二 00:51写道:
[..]
I'd kinda like to sneak this in for pg19, because I think it's useful.
Here's a v6 that changes one or two things:- changes the globals to sig_atomic_t
- in ProcessInterrupts, copies to local sender_pid/sender_uid, then
zeros the globals before any ereport- uses errdetail() for all the messages
Plus a few more cosmetic changes like consistent casing.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.comI just reviewed v6 and got 1 comment.
sig_atomic_t is underlying int, but pid_t and uid_t are usually unsigned int,
so that while saving pid and uid to ProcDieSenderPid and ProcDieSenderUid,
overflow may happen. But that’s fine, as the data is stored in the same way
and we only want to print them. So, for the print statement:#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
((pid) == 0 ? 0 : \
errdetail("Signal sent by PID %d, UID %d.", (int) (pid), (int) (uid)))Does it make sense to use %u and cast to pid_t and uid_t? Like:
#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
((pid) == 0 ? 0 : \
errdetail("Signal sent by PID %u, UID %u.", (pid_t) (pid), (uid_t) (uid)))
I really don't have hard opinion on what we should use here (int vs uint32 vs
pid_t) and I was scratching my head at this earier, as:
1. Usually win32 doesn't have pid_t, but in win32_port.h we have
"typedef int pid_t".
2. According to `grep -ri ' pid' src/backend/ we seem to use "int" in most cases
for this (especially MyProcPid is also int). The only other places
where I found
%u or %l would be those:
src/backend/port/win32/signal.c:
snprintf(pipename, sizeof(pipename),
"\\\\.\\pipe\\pgsignal_%u", (int) pid);
src/backend/port/win32/signal.c:
(errmsg("could not create signal listener pipe for PID %d:
error code %lu",
src/backend/postmaster/datachecksum_state.c:
"Waiting for worker in database %s (pid %ld)", db->dbname, (long) pid);
src/backend/postmaster/postmaster.c:
elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
So apparently we are really not consistent, but int looks fine to me.
3. Linux kernel for most allows up to kernel.pid_max (4194304, 22 bits) and
internally it uses "int" for it's pid_max_max [1]https://github.com/torvalds/linux/blob/master/kernel/pid.c#L64C17-L64C40 and uses up to
30-bits since always.
So "int" follows old style and seems to be OK at least from my point of view.
-J.
[1]: https://github.com/torvalds/linux/blob/master/kernel/pid.c#L64C17-L64C40
[2]: https://github.com/torvalds/linux/blob/master/include/linux/threads.h#L34
On 2026-04-07 Tu 5:10 AM, Jakub Wartak wrote:
On Tue, Apr 7, 2026 at 5:55 AM jie wang<jugierwang@gmail.com> wrote:
Andrew Dunstan<andrew@dunslane.net> 于2026年4月7日周二 00:51写道:
[..]
I'd kinda like to sneak this in for pg19, because I think it's useful.
Here's a v6 that changes one or two things:- changes the globals to sig_atomic_t
- in ProcessInterrupts, copies to local sender_pid/sender_uid, then
zeros the globals before any ereport- uses errdetail() for all the messages
Plus a few more cosmetic changes like consistent casing.
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.comI just reviewed v6 and got 1 comment.
sig_atomic_t is underlying int, but pid_t and uid_t are usually unsigned int,
so that while saving pid and uid to ProcDieSenderPid and ProcDieSenderUid,
overflow may happen. But that’s fine, as the data is stored in the same way
and we only want to print them. So, for the print statement:#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
((pid) == 0 ? 0 : \
errdetail("Signal sent by PID %d, UID %d.", (int) (pid), (int) (uid)))Does it make sense to use %u and cast to pid_t and uid_t? Like:
#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
((pid) == 0 ? 0 : \
errdetail("Signal sent by PID %u, UID %u.", (pid_t) (pid), (uid_t) (uid)))I really don't have hard opinion on what we should use here (int vs uint32 vs
pid_t) and I was scratching my head at this earier, as:1. Usually win32 doesn't have pid_t, but in win32_port.h we have
"typedef int pid_t".2. According to `grep -ri ' pid' src/backend/ we seem to use "int" in most cases
for this (especially MyProcPid is also int). The only other places
where I found
%u or %l would be those:src/backend/port/win32/signal.c:
snprintf(pipename, sizeof(pipename),
"\\\\.\\pipe\\pgsignal_%u", (int) pid);src/backend/port/win32/signal.c:
(errmsg("could not create signal listener pipe for PID %d:
error code %lu",
src/backend/postmaster/datachecksum_state.c:
"Waiting for worker in database %s (pid %ld)", db->dbname, (long) pid);
src/backend/postmaster/postmaster.c:
elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);So apparently we are really not consistent, but int looks fine to me.
3. Linux kernel for most allows up to kernel.pid_max (4194304, 22 bits) and
internally it uses "int" for it's pid_max_max [1] and uses up to
30-bits since always.So "int" follows old style and seems to be OK at least from my point of view.
-J.
[1] -https://github.com/torvalds/linux/blob/master/kernel/pid.c#L64C17-L64C40
[2] -https://github.com/torvalds/linux/blob/master/include/linux/threads.h#L34
OK, went back to using int. Pushed.
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
Hi,
On 2026-04-06 12:51:40 -0400, Andrew Dunstan wrote:
From 450b68d29f02ee1f5bf71db708b380ab389a30c6 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <andrew@dunslane.net>
Date: Mon, 6 Apr 2026 12:39:14 -0400
Subject: [PATCH v6] Add errdetail() with PID and UID about source of
termination signal.When a backend is terminated via pg_terminate_backend() or an external
SIGTERM, the error message now includes the sender's PID and UID as
errdetail, making it easier to identify the source of unexpected
terminations in multi-user environments.On platforms that support SA_SIGINFO (Linux, FreeBSD, and most modern
Unix systems), the signal handler captures si_pid and si_uid from the
siginfo_t structure. On platforms without SA_SIGINFO, the detail is
simply omitted.Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Chao Li <1356863904@qq.com>
Discussion: /messages/by-id/CAKZiRmyrOWovZSdixpLd3PGMQXuQL_zw2Ght5XhHCkQ1uDsxjw@mail.gmail.com
+++ b/src/backend/replication/syncrep.c @@ -303,7 +303,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit) ereport(WARNING, (errcode(ERRCODE_ADMIN_SHUTDOWN), errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"), - errdetail("The transaction has already committed locally, but might not have been replicated to the standby."))); + errdetail("The transaction has already committed locally, but might not have been replicated to the standby.%s", + ProcDieSenderPid == 0 ? "" : + psprintf("\nSignal sent by PID %d, UID %d.", + (int) ProcDieSenderPid, + (int) ProcDieSenderUid)))); whereToSendOutput = DestNone; SyncRepCancelWait(); break;
Pretty sure this is broken from a translateability POV?
It's also somewhat ugly.
+++ b/src/port/pqsignal.c @@ -82,10 +82,19 @@ static volatile pqsigfunc pqsignal_handlers[PG_NSIG]; * * This wrapper also handles restoring the value of errno. */ +#if !defined(FRONTEND) && defined(HAVE_SA_SIGINFO) +static void +wrapper_handler(int signo, siginfo_t * info, void *context) +#else static void wrapper_handler(SIGNAL_ARGS) +#endif { int save_errno = errno; +#if !defined(FRONTEND) && defined(HAVE_SA_SIGINFO) + /* SA_SIGINFO signature uses signo, not SIGNAL_ARGS macro */ + int postgres_signal_arg = signo; +#endif
Seems that you then should change what SIGNAL_ARGS means, not randomly hack
around it in one place?
Assert(postgres_signal_arg > 0); Assert(postgres_signal_arg < PG_NSIG); @@ -105,6 +114,14 @@ wrapper_handler(SIGNAL_ARGS) raise(postgres_signal_arg); return; } + +#ifdef HAVE_SA_SIGINFO + if (signo == SIGTERM && info) + { + ProcDieSenderPid = info->si_pid; + ProcDieSenderUid = info->si_uid; + } +#endif #endif(*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg);
This seems completely wrong from a layering POV. The wrapper has no business
whatsoever to know that how SIGTERM is interpreted and thus no business
setting variables like ProcDieSenderPid.
Pretty sure have some sigterm handlers that shouldn't set ProcDieSenderPid.
A more correct answer here would be to forward information about the sender of
a signal to the signal handlers and let them interpret the information if
available.
I think this is nowhere near ready to have been committed.
Greetings,
Andres Freund
On 2026-04-07 Tu 10:55 AM, Andres Freund wrote:
This seems completely wrong from a layering POV. The wrapper has no business
whatsoever to know that how SIGTERM is interpreted and thus no business
setting variables like ProcDieSenderPid.Pretty sure have some sigterm handlers that shouldn't set ProcDieSenderPid.
A more correct answer here would be to forward information about the sender of
a signal to the signal handlers and let them interpret the information if
available.
OK, fair points. Does the attached meet your concerns?
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
Attachments:
errdetail-pid-fix.patchtext/x-patch; charset=UTF-8; name=errdetail-pid-fix.patchDownload+33-24
Hi,
On 2026-04-07 12:49:19 -0400, Andrew Dunstan wrote:
On 2026-04-07 Tu 10:55 AM, Andres Freund wrote:
This seems completely wrong from a layering POV. The wrapper has no business
whatsoever to know that how SIGTERM is interpreted and thus no business
setting variables like ProcDieSenderPid.Pretty sure have some sigterm handlers that shouldn't set ProcDieSenderPid.
A more correct answer here would be to forward information about the sender of
a signal to the signal handlers and let them interpret the information if
available.OK, fair points. Does the attached meet your concerns?
I think the extra data should be forwarded as arguments to the "real" (not
wrapper) handler, not as globals. You can have signal handlers interrupt each
others on some platforms, which means that if you're not careful, you could
end up reading the values from the wrong signal.
Greetings,
Andres Freund
On 2026-04-07 Tu 2:19 PM, Andres Freund wrote:
Hi,
On 2026-04-07 12:49:19 -0400, Andrew Dunstan wrote:
On 2026-04-07 Tu 10:55 AM, Andres Freund wrote:
This seems completely wrong from a layering POV. The wrapper has no business
whatsoever to know that how SIGTERM is interpreted and thus no business
setting variables like ProcDieSenderPid.Pretty sure have some sigterm handlers that shouldn't set ProcDieSenderPid.
A more correct answer here would be to forward information about the sender of
a signal to the signal handlers and let them interpret the information if
available.OK, fair points. Does the attached meet your concerns?
I think the extra data should be forwarded as arguments to the "real" (not
wrapper) handler, not as globals. You can have signal handlers interrupt each
others on some platforms, which means that if you're not careful, you could
end up reading the values from the wrong signal.
OK, maybe this, then? It saves the siginfo before calling the handler,
and restores it after the call, so you should always be looking at the
right one.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com