Using ProcSignal to get memory context stats from a running backend

Started by Craig Ringer about 8 years ago · 38 messages
#1 Craig Ringer
craig@2ndquadrant.com

Hi all

TL;DR: Let's add a ProcSignalReason that makes a backend
call MemoryContextStats when it sees it and a C func that users can use to
set it on a proc. Sane?

I fairly frequently want to get a memory context dump from a running
backend. It's trivial to do on a debug-enabled build (or one with debug
symbols packages installed) when on a system with gdb and when I'm the one
driving.

Frequently one or more of these things are not true. Users rarely install
debug symbols packages, and never build with --enable-debug if building
from source. They rarely have gdb to hand, and rarely know how to use it if
they do.

So getting memory context dumps from backends showing signs of unreasonable
memory bloat is harder than would be ideal for such incredibly handy debug
info. Especially since most users run with default overcommit on Linux, so
Linux likes to nuke the process rather than let us print context dumps when
we detect OOM.

So how about borrowing a ProcSignalReason entry for "dump a memory context
summary at your earliest convenience" ? We could name it a more generic
"dump debug data" in case we want to add things later.

Then a new pg_log_debug_backend(int) function or something like that could
signal the proc and let CHECK_FOR_INTERRUPTS handle calling
MemoryContextStats next time it's called.

We could clobber a prior ProcSignalReason set on the proc, but that's why
we retry.
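
Roughly, the shape would be something like this sketch; PROCSIG_DUMP_DEBUG and
DumpDebugPending are placeholder names for illustration, not existing symbols,
and the real work happens at CHECK_FOR_INTERRUPTS() time, never in the handler
itself:

/* sketch only */
static volatile sig_atomic_t DumpDebugPending = false;

/* in procsignal_sigusr1_handler(), where only flag-setting is safe: */
    if (CheckProcSignal(PROCSIG_DUMP_DEBUG))
    {
        DumpDebugPending = true;
        InterruptPending = true;
    }

/* later, in ProcessInterrupts() via CHECK_FOR_INTERRUPTS(): */
    if (DumpDebugPending)
    {
        DumpDebugPending = false;
        MemoryContextStats(TopMemoryContext);
    }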

Way, way easier than getting gdb and debuginfo in place, attaching, etc.
You still have to go fishing for stderr to find the output, but that's
usually the same place as the rest of the logs.

Barring objections I'll send a patch in a while.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#2 Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Craig Ringer (#1)
RE: Using ProcSignal to get memory context stats from a running backend

From: Craig Ringer [mailto:craig@2ndquadrant.com]

TL;DR: Let's add a ProcSignalReason that makes a backend call
MemoryContextStats when it sees it and a C func that users can use to set
it on a proc. Sane?

So how about borrowing a ProcSignalReason entry for "dump a memory context
summary at your earliest convenience" ? We could name it a more generic
"dump debug data" in case we want to add things later.

Then a new pg_log_debug_backend(int) function or something like that could
signal the proc and let CHECK_FOR_INTERRUPTS handle calling
MemoryContextStats next time it's called.

+1
That's one of the things I wanted to do. It will be more useful on Windows. Would it work for autovac processes and background workers, etc. that connect to shared memory?

I have also wanted to dump stack traces. Linux (glibc) has backtrace_symbols(), and Windows has StackWalk()/StackWalk64(). Is it sane to make the function a hook?
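
For reference, the glibc side is roughly the following standalone sketch (not
tied to any PostgreSQL hook; a Windows version would need its own
StackWalk64-based wrapper):

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

static void
print_backtrace(void)
{
    void   *frames[32];
    int     nframes = backtrace(frames, 32);
    char  **symbols = backtrace_symbols(frames, nframes);
    int     i;

    if (symbols == NULL)
        return;
    for (i = 0; i < nframes; i++)
        fprintf(stderr, "%s\n", symbols[i]);
    free(symbols);
}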

Regards
Takayuki Tsunakawa

#3 Andres Freund
andres@anarazel.de
In reply to: Craig Ringer (#1)
Re: Using ProcSignal to get memory context stats from a running backend

Hi,

On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:

TL;DR: Let's add a ProcSignalReason that makes a backend
call MemoryContextStats when it sees it and a C func that users can use to
set it on a proc. Sane?

It's not unproblematic. procsignal_sigusr1_handler() runs in a signal
handler, so you can't really rely on a lot of stuff being legal to
do.

It'd be easy to set a flag in the handler and then have
CHECK_FOR_INTERRUPTS() do the MemoryContextStats() call. But that'd have
the disadvantage that it possibly would take a while till the
MemoryContextStats() is executed. Not sure if that's still good enough
for you?

Another question is whether printing to stderr, bypassing proper
logging!, is good enough for something like this.

Greetings,

Andres Freund

#4 Craig Ringer
craig@2ndquadrant.com
In reply to: Tsunakawa, Takayuki (#2)
Re: Using ProcSignal to get memory context stats from a running backend

On 12 December 2017 at 12:25, Tsunakawa, Takayuki <
tsunakawa.takay@jp.fujitsu.com> wrote:

From: Craig Ringer [mailto:craig@2ndquadrant.com]

TL;DR: Let's add a ProcSignalReason that makes a backend call
MemoryContextStats when it sees it and a C func that users can use to set
it on a proc. Sane?

So how about borrowing a ProcSignalReason entry for "dump a memory
context summary at your earliest convenience" ? We could name it a more
generic "dump debug data" in case we want to add things later.

Then a new pg_log_debug_backend(int) function or something like that could
signal the proc and let CHECK_FOR_INTERRUPTS handle calling
MemoryContextStats next time it's called.

+1
That's one of the things I wanted to do. It will be more useful on Windows.
Would it work for autovac processes and background workers, etc. that
connect to shared memory?

Anything that uses CHECK_FOR_INTERRUPTS() and is attached to PGXACT. So
yeah, pretty much anything attached to shmem.

I have also wanted to dump stack traces. Linux (glibc) has
backtrace_symbols(), and Windows has StackWalk()/StackWalk64(). Is it sane
to make the function a hook?

In-proc stack traces are immensely useful, and IMO relatively safe in a
proc that's already in a reasonable state. If your stack is mangled, making
it worse with an in-proc stack trace is rarely your biggest concern. I'd
LOVE to be able to do this.

However, I'd want to address anything like that quite separately to the
change I proposed to expose an existing facility.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#3)
Re: Using ProcSignal to get memory context stats from a running backend

Andres Freund <andres@anarazel.de> writes:

On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:

TL;DR: Let's add a ProcSignalReason that makes a backend
call MemoryContextStats when it sees it and a C func that users can use to
set it on a proc. Sane?

It's not unproblematic. procsignal_sigusr1_handler() runs in a signal
handler, so you can't really rely on a lot of stuff being legal to do.

Indeed, calling MemoryContextStats in a signal handler would be a damfool
idea, but Craig didn't propose that AFAICS.

Another question is whether printing to stderr, bypassing proper
logging!, is good enough for something like this.

Yeah, this is an issue. MemoryContextStats is designed to print
to stderr in the (possibly vain) hope that it will work even when
we are up against an OOM failure. That's not a property I much
want to give up, but you're right that it's not very desirable
if a user is trying to capture state during normal running.

regards, tom lane

#6 Craig Ringer
craig@2ndquadrant.com
In reply to: Andres Freund (#3)
Re: Using ProcSignal to get memory context stats from a running backend

On 12 December 2017 at 12:43, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:

TL;DR: Let's add a ProcSignalReason that makes a backend
call MemoryContextStats when it sees it and a C func that users can use to
set it on a proc. Sane?

It's not unproblematic. procsignal_sigusr1_handler() runs in a signal
handler, so you can't really rely on a lot of stuff being legal to
do.

It'd be easy to set a flag in the handler and then have
CHECK_FOR_INTERRUPTS() do the MemoryContextStats() call.

Yes, definitely. That was my intention. Trying to write to stderr, mess
with memory contexts, etc from a signal handler context seems awfully hairy
and definitely not something I'd want to risk on a live system.

But that'd have
the disadvantage that it possibly would take a while till the
MemoryContextStats() is executed. Not sure if that's still good enough
for you?

Definitely. Sure, it won't be perfect, but it'd be a big improvement on
what we have.

Another question is whether printing to stderr, bypassing proper
logging!, is good enough for something like this.

I think the reason it prints to stderr now is that it's intended to run in
OOM situations.

Arguably that's not such a concern when being triggered by a procsignal. So
elog(...) in that context could make sense. I'd probably add a
print-wrapper callback arg to MemoryContextStatsDetail that you can use to
write to a stringinfo / elog / fprintf(stderr), as desired.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7 Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: Using ProcSignal to get memory context stats from a running backend

On 12 December 2017 at 14:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Another question is whether printing to stderr, bypassing proper
logging!, is good enough for something like this.

Yeah, this is an issue. MemoryContextStats is designed to print
to stderr in the (possibly vain) hope that it will work even when
we are up against an OOM failure. That's not a property I much
want to give up, but you're right that it's not very desirable
if a user is trying to capture state during normal running.

If we were willing to wrap variadic calls in a callback and rely on
vfprintf, we could:

typedef void (*printwrapper)(void *extra, const char *fmt,...)
    pg_attribute_printf(2, 3);

...

static void
printwrapper_stderr(void *extra, const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
}

void
MemoryContextStats(MemoryContext context)
{
    MemoryContextStatsDetail(context, 100, printwrapper_stderr, NULL);
}

void
MemoryContextStatsDetail(MemoryContext context, int max_children,
                         printwrapper outfunc, void *printwrapper_extra)
{
    ...
    outfunc(printwrapper_extra, "Grand total: ...", ...);
}

and let the ProcSignal based caller pass an elog wrapper instead, or form a
StringInfo with appendStringInfoVA then elog it in one go after the
MemoryContextStatsDetail call returns.
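
For the non-OOM path, the StringInfo flavour of that callback could look
roughly like this sketch, using the existing appendStringInfoVA /
enlargeStringInfo retry convention (the names here are illustrative):

static void
printwrapper_stringinfo(void *extra, const char *fmt, ...)
{
    StringInfo  out = (StringInfo) extra;

    for (;;)
    {
        va_list     args;
        int         needed;

        va_start(args, fmt);
        needed = appendStringInfoVA(out, fmt, args);
        va_end(args);

        if (needed == 0)
            break;              /* it fit */
        enlargeStringInfo(out, needed);
    }
}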

I was worried about relying on vfprintf, but we already use it widely in
the codebase so it shouldn't be an issue with elderly buildfarm critters.

Letting it go to stderr isn't too bad, but it'd certainly make it that bit
nicer if it didn't, so I'm not opposed to adding the needed indirection.
I'll give it a try in a bit.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8 Andres Freund
andres@anarazel.de
In reply to: Craig Ringer (#7)
Re: Using ProcSignal to get memory context stats from a running backend

On 2017-12-12 14:25:48 +0800, Craig Ringer wrote:

If we were willing to wrap variadic calls in a callback and rely on
vfprintf, we could:

I don't think there are problems with relying on variadic calls, we do
that in plenty of places.

and let the ProcSignal based caller pass an elog wrapper instead, or form a
StringInfo with appendStringInfoVA then elog it in one go after the
MemoryContextStatsDetail call returns.

I don't think we want a simple elog wrapper - outputting the context
stats as hundreds of log messages doesn't seem right. So at the least it
seems we should bunch it up in stringinfo - which seems to at least
require expanding the API to pass around a void *callback_data or such.

I do wonder if the right thing here wouldn't be to put the result into a
dsm segment, and then return that to the UDF on the requesting
side. Logging to the server log and then have the requestor dig that out
doesn't seem particularly user friendly.

Greetings,

Andres Freund

#9 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#8)
Re: Using ProcSignal to get memory context stats from a running backend

On Tue, Dec 12, 2017 at 1:50 PM, Andres Freund <andres@anarazel.de> wrote:

I do wonder if the right thing here wouldn't be to put the result into a
dsm segment, and then return that to the UDF on the requesting
side. Logging to the server log and then have the requestor dig that out
doesn't seem particularly user friendly.

I think that dumping it to the server log will be fine for most
people, and it's *significantly* safer. Creating a DSM segment could
fail, and the last thing we want is for interrogating a long-running
backend to make it crap out.

+1 for this whole concept, just BTW. As I've said before, I grow
weary of asking customers to run gdb.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#9)
Re: Using ProcSignal to get memory context stats from a running backend

On 2017-12-12 14:04:55 -0500, Robert Haas wrote:

On Tue, Dec 12, 2017 at 1:50 PM, Andres Freund <andres@anarazel.de> wrote:

I do wonder if the right thing here wouldn't be to put the result into a
dsm segment, and then return that to the UDF on the requesting
side. Logging to the server log and then have the requestor dig that out
doesn't seem particularly user friendly.

I think that dumping it to the server log will be fine for most
people, and it's *significantly* safer.

I agree that it's more reliable - I hope there's no meaningful safety
difference. I think you overestimate users a bit however - far from
most of them are going to be able to extract a very long log entry from
a busy log file. There's no generally available easy way to copy a few
pages of text from a logfile that's a few gigabytes large...

I'd suggest adding two functions.

Creating a DSM segment could fail, and the last thing we want is for
interrogating a long-running backend to make it crap out.

It'd need to handle memory alloc failures gracefully, I agree. That
doesn't seem impossible. Also don't think that's meaningfully different
in the non-dsm case, it'd be good to handle malloc failures gracefully
in that case as well.

The requesting side should create the dsm segment and preallocate a
reasonable amount of memory, if we go for it.
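
To make that concrete, the requester side might be roughly the following
sketch; DIAG_DUMP_SIZE and the handle-publishing step are made up for
illustration, and error handling is omitted:

#include "storage/dsm.h"

#define DIAG_DUMP_SIZE  (64 * 1024)     /* an arbitrary "reasonable amount" */

static dsm_segment *
create_diag_dump_segment(void)
{
    dsm_segment *seg = dsm_create(DIAG_DUMP_SIZE, 0);
    char        *buf = (char *) dsm_segment_address(seg);

    buf[0] = '\0';              /* the target backend appends its dump here */

    /*
     * The requester would then publish dsm_segment_handle(seg) somewhere the
     * target can find it, send the procsignal, and wait for the dump before
     * reading buf back and detaching.
     */
    return seg;
}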

+1 for this whole concept, just BTW. As I've said before, I grow
weary of asking customers to run gdb.

Indeed.

Greetings,

Andres Freund

#11 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#10)
Re: Using ProcSignal to get memory context stats from a running backend

On Tue, Dec 12, 2017 at 2:12 PM, Andres Freund <andres@anarazel.de> wrote:

I agree that it's more reliable - I hope there's no meaningful safety
difference. I think you overestimate users a bit however - far from
most of them are going to be able to extract a very long log entry from
a busy log file. There's no generally available easy way to copy a few
pages of text from a logfile that's a few gigabytes large...

Well, a lot of users will send us the whole logfile rather than just
the relevant bits, but that doesn't bother me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#11)
Re: Using ProcSignal to get memory context stats from a running backend

On 2017-12-12 14:16:59 -0500, Robert Haas wrote:

On Tue, Dec 12, 2017 at 2:12 PM, Andres Freund <andres@anarazel.de> wrote:

I agree that it's more reliable - I hope there's no meaningful safety
difference. I think you overestimate users a bit however - far from
most of them are going to be able to extract a very long log entry from
a busy log file. There's no generally available easy way to copy a few
pages of text from a logfile that's a few gigabytes large...

Well, a lot of users will send us the whole logfile rather than just
the relevant bits, but that doesn't bother me.

That doesn't really work on some busy servers, and also often in cases
where the log potentially contains sensitive (e.g. HIPAA) data.

I think a function returning the dump would be great, a function "just"
dumping to the server log still pretty good.

Greetings,

Andres Freund

#13 Michael Paquier
michael.paquier@gmail.com
In reply to: Craig Ringer (#6)
Re: Using ProcSignal to get memory context stats from a running backend

On Mon, Dec 11, 2017 at 10:07 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 12 December 2017 at 12:43, Andres Freund <andres@anarazel.de> wrote:

On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
But that'd have
the disadvantage that it possibly would take a while till the
MemoryContextStats() is executed. Not sure if that's still good enough
for you?

Definitely. Sure, it won't be perfect, but it'd be a big improvement on what
we have.

If this would be fine enough, why not give it a shot then? Having to
use gdb on production systems, as we do now, is sometimes hard to
justify to customers. There are also the Windows problems...

Another question is whether printing to stderr, bypassing proper
logging!, is good enough for something like this.

I think the reason it prints to stderr now is that it's intended to run in
OOM situations.

Yep. I am on board with Tom here that this property should not be thrown away.
--
Michael

#14 Craig Ringer
craig@2ndquadrant.com
In reply to: Michael Paquier (#13)
Re: Using ProcSignal to get memory context stats from a running backend

On 13 December 2017 at 12:10, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Mon, Dec 11, 2017 at 10:07 PM, Craig Ringer <craig@2ndquadrant.com>
wrote:

On 12 December 2017 at 12:43, Andres Freund <andres@anarazel.de> wrote:

On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
But that'd have
the disadvantage that it possibly would take a while till the
MemoryContextStats() is executed. Not sure if that's still good enough
for you?

Definitely. Sure, it won't be perfect, but it'd be a big improvement on
what we have.

If this would be fine enough, why not give it a shot then? Having to
use gdb on production systems, as we do now, is sometimes hard to
justify to customers. There are also the Windows problems...

Another question is whether printing to stderr, bypassing proper
logging!, is good enough for something like this.

I think the reason it prints to stderr now is that it's intended to run in
OOM situations.

Yep. I am on board with Tom here that this property should not be thrown
away.

OK, so I think I'll do pretty much what I outlined above, using stringinfo
for the !OOM case and fprintf for the OOM case, via callback, roughly as
outlined upthread. I won't bother with elog, it's easy enough if someone
cares.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#15 Greg Stark
stark@mit.edu
In reply to: Andres Freund (#12)
Re: Using ProcSignal to get memory context stats from a running backend

Another simpler option would be to open up a new file in the log
directory something like "debug_dump.<pid>.txt" and print whatever you
want there. Then print out a reasonable size log entry saying "Debug
dump output to file 'debug_dump.<pid>.txt'". You could provide a
function that reads such files out of the log directory or just
document how to access them using pg_read_file().

It's not perfect but it's got most of the advantages of communicating
with the requesting process without the complexities of a DSM segment
and it's a bit more flexible too. Users can have automated monitoring
tools watch for dumps for example. And regular tools can be used to
back up and clean out old files.
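
A minimal sketch of that file-based approach, reusing the existing
Log_directory GUC and MyProcPid (the file naming and error handling here are
illustrative only):

#include "postmaster/syslogger.h"   /* Log_directory */

static void
dump_debug_to_file(void)
{
    char        path[MAXPGPATH];
    FILE       *fh;

    snprintf(path, sizeof(path), "%s/debug_dump.%d.txt",
             Log_directory, MyProcPid);
    fh = fopen(path, "w");
    if (fh == NULL)
        return;
    /* ... write the MemoryContextStats-style output into fh ... */
    fclose(fh);

    ereport(LOG,
            (errmsg("debug dump output to file \"%s\"", path)));
}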

#16 Craig Ringer
craig@2ndquadrant.com
In reply to: Greg Stark (#15)
Re: Using ProcSignal to get memory context stats from a running backend

On 15 December 2017 at 09:24, Greg Stark <stark@mit.edu> wrote:

Another simpler option would be to open up a new file in the log
directory

... if we have one.

We might be logging to syslog, or whatever else.

I'd rather keep it simple(ish).

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#17 Robert Haas
robertmhaas@gmail.com
In reply to: Craig Ringer (#16)
Re: Using ProcSignal to get memory context stats from a running backend

On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 15 December 2017 at 09:24, Greg Stark <stark@mit.edu> wrote:

Another simpler option would be to open up a new file in the log
directory

... if we have one.

We might be logging to syslog, or whatever else.

I'd rather keep it simple(ish).

+1. I still think just printing it to the log is fine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18 Craig Ringer
craig@2ndquadrant.com
In reply to: Robert Haas (#17)
1 attachment(s)
Re: Using ProcSignal to get memory context stats from a running backend

On 18 December 2017 at 10:05, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer <craig@2ndquadrant.com>
wrote:

On 15 December 2017 at 09:24, Greg Stark <stark@mit.edu> wrote:

Another simpler option would be to open up a new file in the log
directory

... if we have one.

We might be logging to syslog, or whatever else.

I'd rather keep it simple(ish).

+1. I still think just printing it to the log is fine.

Here we go. Implemented pretty much as outlined above. A new
pg_diag_backend(pid) function sends a new
ProcSignalReason PROCSIG_DIAG_REQUEST. It's handled by
CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output to elog(...).

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook global.
It's a diagnostic routine so I don't think that's going to be a great
bother. By default it's set to write_stderr. That just writes via vfprintf
on unix so the outcome's the same as our direct use of fprintf now.

On Windows, though, using write_stderr will make Pg attempt to write memory
context dumps to the eventlog with ERROR level if running as a service with
no console. That means we vsnprintf the string into a static buffer first.
I'm not 100% convinced of the wisdom of that because it could flood the
event log, which is usually size limited by # of events and recycled. It'd
be better to write one event than write one per memory context line, but
it's not safe to do that when OOM. I lean toward this being a win: at least
Windows users should have some hope of finding out why Pg OOM'd, which
currently they don't when it runs as a service. If we don't, we should look
at using SetStdHandle to write stderr to a secondary logfile instead.

I'm happy to just add a trivial vfprintf wrapper so we preserve exactly the
same behaviour instead, but I figured I'd start with reusing write_stderr.

I'd really like to preserve the writing to elog(...) not fprintf, because
on some systems simply finding where stderr is written can be a challenge,
if it's not going to /dev/null via some detached session. Is it in
journald? In some separate log? Being captured by the logging collector
(and probably silently eaten)? Who knows!

Using elog(...) provides a log_line_prefix and other useful info to
associate the dump with the process of interest and what it's doing at the
time.

Also, an astonishing number of deployed systems I've worked with actually
don't put the pid or anything useful in log_line_prefix to make grouping up
multi-line output practical. Which is insane. But it's only PGC_SIGHUP so
fixing it is easy enough.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

v1-0001-Add-pg_diag_backend-to-print-memory-context-info.patch (text/x-patch)
From 663e31ffbe471b5dac18ff6322c0ea45a948350a Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig@2ndquadrant.com>
Date: Tue, 19 Dec 2017 20:47:36 +0800
Subject: [PATCH v1] Add pg_diag_backend to print memory context info

The new pg_diag_backend(pid) function signals a backend to dump
a summary of its memory context state when it next calls
CHECK_FOR_INTERRUPTS(). The memory context information is
output to the server log.

This works on any backend that uses the standard
procsignal_sigusr1_handler or explicitly handles PROCSIG_DIAG_REQUEST in
its own handler. Currently this includes normal user backends, the
walsender and autovacuum workers but not the other system processes.
Background workers must explicitly handle SIGUSR1 with
procsignal_sigusr1_handler.
---
 doc/src/sgml/bgworker.sgml              | 10 ++++-
 doc/src/sgml/func.sgml                  | 13 ++++++
 src/backend/storage/ipc/procsignal.c    |  3 ++
 src/backend/tcop/postgres.c             | 70 +++++++++++++++++++++++++++++++++
 src/backend/utils/adt/misc.c            | 16 ++++++++
 src/backend/utils/init/globals.c        |  1 +
 src/backend/utils/mmgr/aset.c           |  4 +-
 src/backend/utils/mmgr/mcxt.c           | 10 +++--
 src/include/catalog/pg_proc.h           |  2 +
 src/include/miscadmin.h                 |  4 ++
 src/include/storage/procsignal.h        |  5 ++-
 src/include/utils/memutils.h            |  4 ++
 src/test/regress/sql/misc_functions.sql |  9 +++++
 13 files changed, 143 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index 4bc2b69..50bb54f 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -185,7 +185,8 @@ typedef struct BackgroundWorker
    when the process is started or exits.  It should be 0 for workers registered
    at postmaster startup time, or when the backend registering the worker does
    not wish to wait for the worker to start up.  Otherwise, it should be
-   initialized to <literal>MyProcPid</literal>.
+   initialized to <literal>MyProcPid</literal>. Typically this sets the
+   receiving process's latch via its signal handler.
   </para>
 
   <para>Once running, the process can connect to a database by calling
@@ -208,7 +209,12 @@ typedef struct BackgroundWorker
    allow the process to customize its signal handlers, if necessary.
    Signals can be unblocked in the new process by calling
    <function>BackgroundWorkerUnblockSignals</function> and blocked by calling
-   <function>BackgroundWorkerBlockSignals</function>.
+   <function>BackgroundWorkerBlockSignals</function>. The default
+   <literal>SIGUSR1</literal> handler for background workers just
+   sets the process latch. Background workers may instead adopt
+   the <function>procsignal_sigusr1_handler</function> used by 
+   normal backends and call <function>CHECK_FOR_INTERRUPTS()</function>
+   in their main loops.
   </para>
 
   <para>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4dd9d02..be5ed54 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18348,6 +18348,9 @@ SELECT set_config('log_statement_stats', 'off', false);
    <indexterm>
     <primary>pg_terminate_backend</primary>
    </indexterm>
+   <indexterm>
+    <primary>pg_diag_backend</primary>
+   </indexterm>
 
    <indexterm>
     <primary>signal</primary>
@@ -18407,6 +18410,16 @@ SELECT set_config('log_statement_stats', 'off', false);
         however only superusers can terminate superuser backends.
        </entry>
       </row>
+      <row>
+       <entry>
+        <literal><function>pg_diag_backend(<parameter>pid</parameter> <type>int</type>)</function></literal>
+        </entry>
+       <entry><type>boolean</type></entry>
+       <entry>Ask a backend to report diagnostics including memory use details
+        to the server log. Restricted to superusers. The log entry begins
+        with <literal>diagnostic dump</literal> or the localized equivalent.
+       </entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index b9302ac..450e4f1 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -292,6 +292,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_DIAG_REQUEST))
+		HandleDiagRequestInterrupt(PROCSIG_DIAG_REQUEST);
+
 	SetLatch(MyLatch);
 
 	latch_sigusr1_handler();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1ae9ac2..d1ca55f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -169,6 +169,9 @@ static ProcSignalReason RecoveryConflictReason;
 static MemoryContext row_description_context = NULL;
 static StringInfoData row_description_buf;
 
+/* Reused for printwrapper_stringinfo */
+static StringInfoData diag_request_buf;
+
 /* ----------------------------------------------------------------
  *		decls for routines only used in this file
  * ----------------------------------------------------------------
@@ -193,6 +196,7 @@ static void drop_unnamed_stmt(void);
 static void log_disconnections(int code, Datum arg);
 static void enable_statement_timeout(void);
 static void disable_statement_timeout(void);
+static void HandleDiagRequest(void);
 
 
 /* ----------------------------------------------------------------
@@ -3047,8 +3051,74 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (DiagRequestPending)
+		HandleDiagRequest();
+}
+
+/*
+ * Accumulate writes into the buffer in diag_request_buf,
+ * for use with functions that expect a printf-like callback.
+ */
+pg_attribute_printf(1, 2) static void
+printwrapper_stringinfo(const char * fmt, ...)
+{
+	Assert(diag_request_buf.data != NULL);
+	for (;;)
+	{
+		va_list		args;
+		int			needed;
+		va_start(args, fmt);
+		needed = appendStringInfoVA(&diag_request_buf, fmt, args);
+		va_end(args);
+		if (needed == 0)
+			break;
+		enlargeStringInfo(&diag_request_buf, needed);
+	}
 }
 
+/*
+ * The interrupt-handler side of handling a diagnostic request
+ * interrupt. This just saves the request for handling during
+ * CHECK_FOR_INTERRUPTS(). Any prior request that isn't yet actioned is
+ * clobbered.
+ */
+void
+HandleDiagRequestInterrupt(ProcSignalReason request)
+{
+	DiagRequestPending = request;
+	InterruptPending = true;
+}
+
+/*
+ * Dump memory context information to the logs in response to a
+ * PROCSIG_DIAG_REQUEST sent via pg_diag_backend or similar.
+ *
+ * Usually invoked via procsignal_sigusr1_handler and
+ * CHECK_FOR_INTERRUPTS(), so this won't work by default
+ * for background workers and other nonstandard processes.
+ */
+static void
+HandleDiagRequest(void)
+{
+	ProcSignalReason DiagRequest = DiagRequestPending;
+	DiagRequestPending = 0;
+
+	/* So far we only have a generic diagnostics dump */
+	if (DiagRequest == PROCSIG_DIAG_REQUEST)
+	{
+		MemoryContextStats_printf_hook_type	old_printf_hook
+			= MemoryContextStats_printf_hook;
+		initStringInfo(&diag_request_buf);
+		MemoryContextStats_printf_hook = printwrapper_stringinfo;
+		MemoryContextStats(TopMemoryContext);
+		MemoryContextStats_printf_hook = old_printf_hook;
+		elog(LOG, "diagnostic dump requested; memory context info: %s",
+			diag_request_buf.data);
+		pfree(diag_request_buf.data);
+		diag_request_buf.data = NULL;
+	}
+}
 
 /*
  * IA64-specific code to fetch the AR.BSP register for stack depth checks.
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index f53d411..ac0c29f 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -321,6 +321,22 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
 }
 
+Datum
+pg_diag_backend(PG_FUNCTION_ARGS)
+{
+	int pid = PG_GETARG_INT32(0);
+	int r;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("pg_diag_backend requires superuser privileges")));
+
+	r = SendProcSignal(pid, PROCSIG_DIAG_REQUEST, InvalidBackendId);
+
+	PG_RETURN_BOOL(r == 0);
+}
+
 /*
  * Signal to reload the database configuration
  *
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 9680a4b..f3e4a4d 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -29,6 +29,7 @@ ProtocolVersion FrontendProtocol;
 volatile bool InterruptPending = false;
 volatile bool QueryCancelPending = false;
 volatile bool ProcDiePending = false;
+volatile int DiagRequestPending = false;
 volatile bool ClientConnectionLost = false;
 volatile bool IdleInTransactionSessionTimeoutPending = false;
 volatile sig_atomic_t ConfigReloadPending = false;
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 1519da0..76f1cf0 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1369,8 +1369,8 @@ AllocSetStats(MemoryContext context, int level, bool print,
 		int			i;
 
 		for (i = 0; i < level; i++)
-			fprintf(stderr, "  ");
-		fprintf(stderr,
+			MemoryContextStats_printf_hook("  ");
+		MemoryContextStats_printf_hook(
 				"%s: %zu total in %zd blocks; %zu free (%zd chunks); %zu used\n",
 				set->header.name, totalspace, nblocks, freespace, freechunks,
 				totalspace - freespace);
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 97382a6..afd120d 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -56,6 +56,10 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
 						   bool print, int max_children,
 						   MemoryContextCounters *totals);
 
+/* The printf-like destination for MemoryContextStats output */
+MemoryContextStats_printf_hook_type MemoryContextStats_printf_hook
+	= write_stderr;
+
 /*
  * You should not do memory allocations within a critical section, because
  * an out-of-memory error will be escalated to a PANIC. To enforce that
@@ -454,7 +458,7 @@ MemoryContextStatsDetail(MemoryContext context, int max_children)
 
 	MemoryContextStatsInternal(context, 0, true, max_children, &grand_totals);
 
-	fprintf(stderr,
+	MemoryContextStats_printf_hook(
 			"Grand total: %zu bytes in %zd blocks; %zu free (%zd chunks); %zu used\n",
 			grand_totals.totalspace, grand_totals.nblocks,
 			grand_totals.freespace, grand_totals.freechunks,
@@ -510,8 +514,8 @@ MemoryContextStatsInternal(MemoryContext context, int level,
 			int			i;
 
 			for (i = 0; i <= level; i++)
-				fprintf(stderr, "  ");
-			fprintf(stderr,
+				MemoryContextStats_printf_hook("  ");
+			MemoryContextStats_printf_hook(
 					"%d more child contexts containing %zu total in %zd blocks; %zu free (%zd chunks); %zu used\n",
 					ichild - max_children,
 					local_totals.totalspace,
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index c969375..2f8fb7b 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3256,6 +3256,8 @@ DATA(insert OID = 2171 ( pg_cancel_backend		PGNSP PGUID 12 1 0 0 0 f f f f t f v
 DESCR("cancel a server process' current query");
 DATA(insert OID = 2096 ( pg_terminate_backend		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 16 "23" _null_ _null_ _null_ _null_ _null_ pg_terminate_backend _null_ _null_ _null_ ));
 DESCR("terminate a server process");
+DATA(insert OID = 3556 ( pg_diag_backend		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 16 "23" _null_ _null_ _null_ _null_ _null_ pg_diag_backend _null_ _null_ _null_ ));
+DESCR("signal a server process to dump diagnostics to the log");
 DATA(insert OID = 2172 ( pg_start_backup		PGNSP PGUID 12 1 0 0 0 f f f f t f v r 3 0 3220 "25 16 16" _null_ _null_ _null_ _null_ _null_ pg_start_backup _null_ _null_ _null_ ));
 DESCR("prepare for taking an online backup");
 DATA(insert OID = 2173 ( pg_stop_backup			PGNSP PGUID 12 1 0 0 0 f f f f t f v r 0 0 3220 "" _null_ _null_ _null_ _null_ _null_ pg_stop_backup _null_ _null_ _null_ ));
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 59da7a6..9b17173 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -80,6 +80,7 @@
 extern PGDLLIMPORT volatile bool InterruptPending;
 extern PGDLLIMPORT volatile bool QueryCancelPending;
 extern PGDLLIMPORT volatile bool ProcDiePending;
+extern PGDLLIMPORT volatile int DiagRequestPending;
 extern PGDLLIMPORT volatile bool IdleInTransactionSessionTimeoutPending;
 extern PGDLLIMPORT volatile sig_atomic_t ConfigReloadPending;
 
@@ -275,6 +276,9 @@ extern bool stack_is_too_deep(void);
 
 extern void PostgresSigHupHandler(SIGNAL_ARGS);
 
+enum ProcSignalReason;
+extern void HandleDiagRequestInterrupt(enum ProcSignalReason request);
+
 /* in tcop/utility.c */
 extern void PreventCommandIfReadOnly(const char *cmdname);
 extern void PreventCommandIfParallelMode(const char *cmdname);
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 20bb05b..121fb86 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -27,7 +27,7 @@
  * Also, because of race conditions, it's important that all the signals be
  * defined so that no harm is done if a process mistakenly receives one.
  */
-typedef enum
+typedef enum ProcSignalReason
 {
 	PROCSIG_CATCHUP_INTERRUPT,	/* sinval catchup interrupt */
 	PROCSIG_NOTIFY_INTERRUPT,	/* listen/notify interrupt */
@@ -42,6 +42,9 @@ typedef enum
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
+	/* Signals used to ask the backend for diagnostic output */
+	PROCSIG_DIAG_REQUEST,		/* Dump memory context info etc */
+
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
 
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 9c30eb7..f5b6d2c 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -45,6 +45,10 @@
 
 #define AllocHugeSizeIsValid(size)	((Size) (size) <= MaxAllocHugeSize)
 
+typedef void (*MemoryContextStats_printf_hook_type)(const char *fmt, ...)
+	pg_attribute_printf(1, 2);
+
+extern MemoryContextStats_printf_hook_type MemoryContextStats_printf_hook;
 
 /*
  * Standard top-level memory contexts.
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 1a20c1f..fb64b87 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -1,4 +1,13 @@
 --
+-- Misc admin functions
+--
+
+-- pg_diag_backend makes the backend log, so it can't really
+-- be tested within pg_regress except to ensure it doesn't
+-- upset the backend.
+SELECT pg_diag_backend(pg_backend_pid());
+
+--
 -- num_nulls()
 --
 
-- 
2.9.5

#19 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Craig Ringer (#18)
Re: Using ProcSignal to get memory context stats from a running backend

Hi

2017-12-19 14:44 GMT+01:00 Craig Ringer <craig@2ndquadrant.com>:

On 18 December 2017 at 10:05, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer <craig@2ndquadrant.com>
wrote:

On 15 December 2017 at 09:24, Greg Stark <stark@mit.edu> wrote:

Another simpler option would be to open up a new file in the log
directory

... if we have one.

We might be logging to syslog, or whatever else.

I'd rather keep it simple(ish).

+1. I still think just printing it to the log is fine.

Here we go. Implemented pretty much as outlined above. A new
pg_diag_backend(pid) function sends a new ProcSignalReason PROCSIG_DIAG_REQUEST.
It's handled by CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output
to elog(...).

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook global.
It's a diagnostic routine so I don't think that's going to be a great
bother. By default it's set to write_stderr. That just writes via vfprintf
on unix so the outcome's the same as our direct use of fprintf now.

On Windows, though, using write_stderr will make Pg attempt to write
memory context dumps to the eventlog with ERROR level if running as a
service with no console. That means we vsnprintf the string into a static
buffer first. I'm not 100% convinced of the wisdom of that because it could
flood the event log, which is usually size limited by # of events and
recycled. It'd be better to write one event than write one per memory
context line, but it's not safe to do that when OOM. I lean toward this
being a win: at least Windows users should have some hope of finding out
why Pg OOM'd, which currently they don't when it runs as a service. If we
don't, we should look at using SetStdHandle to write stderr to a secondary
logfile instead.

I'm happy to just add a trivial vfprintf wrapper so we preserve exactly
the same behaviour instead, but I figured I'd start with reusing
write_stderr.

I'd really like to preserve the writing to elog(...) not fprintf, because
on some systems simply finding where stderr is written can be a challenge,
if it's not going to /dev/null via some detached session. Is it in
journald? In some separate log? Being captured by the logging collector
(and probably silently eaten)? Who knows!

Using elog(...) provides a log_line_prefix and other useful info to
associate the dump with the process of interest and what it's doing at the
time.

Also, an astonishing number of deployed systems I've worked with actually
don't put the pid or anything useful in log_line_prefix to make grouping up
multi-line output practical. Which is insane. But it's only PGC_SIGHUP so
fixing it is easy enough.

Sorry for the small off-topic question. Can this mechanism be used to log the
executed plan or the full query?

Regards

Pavel

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#20 Robert Haas
robertmhaas@gmail.com
In reply to: Craig Ringer (#18)
Re: Using ProcSignal to get memory context stats from a running backend

On Tue, Dec 19, 2017 at 8:44 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook global.

That looks pretty grotty to me. I think if you want to elog/ereport
this, you need to pass another argument to MemoryContextStats() or add
another memory context method. This is pretty much a textbook example
of the wrong way to use a global variable, IMHO.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#20)
Re: Using ProcSignal to get memory context stats from a running backend

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Dec 19, 2017 at 8:44 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook global.

That looks pretty grotty to me. I think if you want to elog/ereport
this, you need to pass another argument to MemoryContextStats() or add
another memory context method. This is pretty much a textbook example
of the wrong way to use a global variable, IMHO.

Yeah. But please don't mess with MemoryContextStats per se ---
I dunno about you guys but "call MemoryContextStats(TopMemoryContext)"
is kinda wired into my gdb reflexes. I think what'd make sense
is a new function "MemoryContextStatsTo(context, function_pointer)".
It's okay to redefine the APIs of the per-context-type functions
these would call, though, because nobody calls those functions directly.

regards, tom lane

#22 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#21)
Re: Using ProcSignal to get memory context stats from a running backend

On 2017-12-19 13:17:52 -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Dec 19, 2017 at 8:44 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook global.

That looks pretty grotty to me. I think if you want to elog/ereport
this, you need to pass another argument to MemoryContextStats() or add
another memory context method. This is pretty much a textbook example
of the wrong way to use a global variable, IMHO.

Agreed.

Yeah. But please don't mess with MemoryContextStats per se ---
I dunno about you guys but "call MemoryContextStats(TopMemoryContext)"
is kinda wired into my gdb reflexes. I think what'd make sense
is a new function "MemoryContextStatsTo(context, function_pointer)".
It's okay to redefine the APIs of the per-context-type functions
these would call, though, because nobody calls those functions directly.

We already have MemoryContextStatsDetail() - it seems to make sense to
expand that API and leave MemoryContextStats() alone. I don't think
there's a need for a third variant?

Greetings,

Andres Freund

#23 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#22)
Re: Using ProcSignal to get memory context stats from a running backend

Andres Freund <andres@anarazel.de> writes:

On 2017-12-19 13:17:52 -0500, Tom Lane wrote:

Yeah. But please don't mess with MemoryContextStats per se ---
I dunno about you guys but "call MemoryContextStats(TopMemoryContext)"
is kinda wired into my gdb reflexes. I think what'd make sense
is a new function "MemoryContextStatsTo(context, function_pointer)".
It's okay to redefine the APIs of the per-context-type functions
these would call, though, because nobody calls those functions directly.

We already have MemoryContextStatsDetail() - it seems to make sense to
expand that API and leave MemoryContextStats() alone. I don't think
there's a need for a third variant?

Sure, WFM.

regards, tom lane

#24 Craig Ringer
craig@2ndquadrant.com
In reply to: Andres Freund (#22)
Re: Using ProcSignal to get memory context stats from a running backend

On 20 December 2017 at 02:35, Andres Freund <andres@anarazel.de> wrote:

Yeah. But please don't mess with MemoryContextStats per se ---
I dunno about you guys but "call MemoryContextStats(TopMemoryContext)"
is kinda wired into my gdb reflexes. I think what'd make sense
is a new function "MemoryContextStatsTo(context, function_pointer)".
It's okay to redefine the APIs of the per-context-type functions
these would call, though, because nobody calls those functions directly.

We already have MemoryContextStatsDetail() - it seems to make sense to
expand that API and leave MemoryContextStats() alone. I don't think
there's a need for a third variant?

Cool, can do.

I'll have to expose a typedef for the printf-wrapper callback in memnodes.h
and add it to the stats method, which I thought would be more likely to get
complaints than the global hook. I'm actually happier to do it with a
passed callback.

Will revise when I get a chance in the next couple of days.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#25 Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#24)
1 attachment(s)
Re: Using ProcSignal to get memory context stats from a running backend

On 20 December 2017 at 08:46, Craig Ringer <craig@2ndquadrant.com> wrote:

On 20 December 2017 at 02:35, Andres Freund <andres@anarazel.de> wrote:

Yeah. But please don't mess with MemoryContextStats per se ---
I dunno about you guys but "call MemoryContextStats(TopMemoryContext)"
is kinda wired into my gdb reflexes. I think what'd make sense
is a new function "MemoryContextStatsTo(context, function_pointer)".
It's okay to redefine the APIs of the per-context-type functions
these would call, though, because nobody calls those functions directly.

We already have MemoryContextStatsDetail() - it seems to make sense to
expand that API and leave MemoryContextStats() alone. I don't think
there's a need for a third variant?

Cool, can do.

I'll have to expose a typedef for the printf-wrapper callback in
memnodes.h and add it to the stats method, which I thought would be more
likely to get complaints than the global hook. I'm actually happier to do
it with a passed callback.

Will revise when I get a chance in the next couple of days.

Done.

It uses vfprintf unconditionally, even on Windows where we'd usually use
write_stderr, so it doesn't change the current MemoryContextStats behaviour.

2017-12-21 14:39:20.045 AWST [10588] pg_regress/misc_functions LOG:  diagnostic dump requested; memory context info: TopMemoryContext: 67440 total in 5 blocks; 13376 free (6 chunks); 54064 used
  pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1376 free (0 chunks); 6816 used
  TopTransactionContext: 8192 total in 1 blocks; 7728 free (1 chunks); 464 used
...

To verify that stderr output still works properly I ran:

SELECT
repeat('spamspamspam', 20000000),
repeat('spamspamspam', 20000000),
repeat('spamspamspam', 20000000),
repeat('spamspamspam', 20000000),
... lots ...;

and got the expected

+ERROR:  out of memory
+DETAIL:  Failed on request of size 240000004.

and memory context dump to stderr. I didn't add a TAP test for that because
I'd rather avoid exercising OOM in tests where we don't know what the
host's swap config is like, how its ulimit behaves, whether it has
craziness like the OOM killer, etc. But it might make sense to add a TAP
test to set a low ulimit and exercise OOM recovery at some point.

I've added a separate vfprintf wrapper for memory context use that doesn't
try to use the Windows error log like write_stderr(...) does. If we want to
change OOM behaviour on Windows it can be done in a followup.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

v2-0001-Add-pg_diag_backend-to-print-memory-context-info.patch (text/x-patch)
From db5caa16cd3c78544f7a6e336c3d8b9118f7cdaf Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig@2ndquadrant.com>
Date: Tue, 19 Dec 2017 20:47:36 +0800
Subject: [PATCH v2] Add pg_diag_backend to print memory context info

The new pg_diag_backend(pid) function signals a backend to dump
a summary of its memory context state when it next calls
CHECK_FOR_INTERRUPTS(). The memory context information is
output to the server log.

This works on any backend that uses the standard
procsignal_sigusr1_handler or explicitly handles PROCSIG_DIAG_REQUEST in
its own handler. Currently this includes normal user backends, the
walsender and autovacuum workers but not the other system processes.
Background workers must explicitly handle SIGUSR1 with
procsignal_sigusr1_handler.
---
 doc/src/sgml/bgworker.sgml                   | 10 ++++-
 doc/src/sgml/func.sgml                       | 13 ++++++
 src/backend/storage/ipc/procsignal.c         |  3 ++
 src/backend/tcop/postgres.c                  | 66 ++++++++++++++++++++++++++++
 src/backend/utils/adt/misc.c                 | 16 +++++++
 src/backend/utils/init/globals.c             |  1 +
 src/backend/utils/mmgr/aset.c                | 10 +++--
 src/backend/utils/mmgr/mcxt.c                | 50 ++++++++++++++++-----
 src/include/catalog/pg_proc.h                |  2 +
 src/include/miscadmin.h                      |  4 ++
 src/include/nodes/memnodes.h                 | 10 ++++-
 src/include/storage/procsignal.h             |  5 ++-
 src/include/utils/memutils.h                 |  4 +-
 src/test/regress/expected/misc_functions.out | 12 +++++
 src/test/regress/sql/misc_functions.sql      |  9 ++++
 15 files changed, 194 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index 4bc2b69..50bb54f 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -185,7 +185,8 @@ typedef struct BackgroundWorker
    when the process is started or exits.  It should be 0 for workers registered
    at postmaster startup time, or when the backend registering the worker does
    not wish to wait for the worker to start up.  Otherwise, it should be
-   initialized to <literal>MyProcPid</literal>.
+   initialized to <literal>MyProcPid</literal>. Typically this sets the
+   receiving process's latch via its signal handler.
   </para>
 
   <para>Once running, the process can connect to a database by calling
@@ -208,7 +209,12 @@ typedef struct BackgroundWorker
    allow the process to customize its signal handlers, if necessary.
    Signals can be unblocked in the new process by calling
    <function>BackgroundWorkerUnblockSignals</function> and blocked by calling
-   <function>BackgroundWorkerBlockSignals</function>.
+   <function>BackgroundWorkerBlockSignals</function>. The default
+   <literal>SIGUSR1</literal> handler for background workers just
+   sets the process latch. Background workers may instead adopt
+   the <function>procsignal_sigusr1_handler</function> used by 
+   normal backends and call <function>CHECK_FOR_INTERRUPTS()</function>
+   in their main loops.
   </para>
 
   <para>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4dd9d02..be5ed54 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18348,6 +18348,9 @@ SELECT set_config('log_statement_stats', 'off', false);
    <indexterm>
     <primary>pg_terminate_backend</primary>
    </indexterm>
+   <indexterm>
+    <primary>pg_diag_backend</primary>
+   </indexterm>
 
    <indexterm>
     <primary>signal</primary>
@@ -18407,6 +18410,16 @@ SELECT set_config('log_statement_stats', 'off', false);
         however only superusers can terminate superuser backends.
        </entry>
       </row>
+      <row>
+       <entry>
+        <literal><function>pg_diag_backend(<parameter>pid</parameter> <type>int</type>)</function></literal>
+        </entry>
+       <entry><type>boolean</type></entry>
+       <entry>Ask a backend to report diagnostics including memory use details
+        to the server log. Restricted to superusers. The log entry begins
+        with <literal>diagnostic dump</literal> or the localized equivalent.
+       </entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index b9302ac..450e4f1 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -292,6 +292,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_DIAG_REQUEST))
+		HandleDiagRequestInterrupt(PROCSIG_DIAG_REQUEST);
+
 	SetLatch(MyLatch);
 
 	latch_sigusr1_handler();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1ae9ac2..b7a8d4d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -193,6 +193,9 @@ static void drop_unnamed_stmt(void);
 static void log_disconnections(int code, Datum arg);
 static void enable_statement_timeout(void);
 static void disable_statement_timeout(void);
+static void HandleDiagRequest(void);
+static void printwrapper_stringinfo(void *extra, const char *fmt, ...)
+	pg_attribute_printf(2, 3);
 
 
 /* ----------------------------------------------------------------
@@ -3047,8 +3050,71 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (DiagRequestPending)
+		HandleDiagRequest();
+}
+
+/*
+ * Accumulate writes into the buffer in diag_request_buf,
+ * for use with functions that expect a printf-like callback.
+ */
+static void
+printwrapper_stringinfo(void *extra, const char * fmt, ...)
+{
+	StringInfo out = extra;
+	for (;;)
+	{
+		va_list		args;
+		int			needed;
+		va_start(args, fmt);
+		needed = appendStringInfoVA(out, fmt, args);
+		va_end(args);
+		if (needed == 0)
+			break;
+		enlargeStringInfo(out, needed);
+	}
 }
 
+/*
+ * The interrupt-handler side of handling a diagnostic request
+ * interrupt. This just saves the request for handling during
+ * CHECK_FOR_INTERRUPTS(). Any prior request that isn't yet actioned is
+ * clobbered.
+ */
+void
+HandleDiagRequestInterrupt(ProcSignalReason request)
+{
+	DiagRequestPending = request;
+	InterruptPending = true;
+}
+
+/*
+ * Dump memory context information to the logs in response to a
+ * PROCSIG_DIAG_REQUEST sent via pg_diag_backend or similar.
+ *
+ * Usually invoked via procsignal_sigusr1_handler and
+ * CHECK_FOR_INTERRUPTS(), so this won't work by default
+ * for background workers and other nonstandard processes.
+ */
+static void
+HandleDiagRequest(void)
+{
+	ProcSignalReason DiagRequest = DiagRequestPending;
+	DiagRequestPending = 0;
+
+	/* So far we only have a generic diagnostics dump */
+	if (DiagRequest == PROCSIG_DIAG_REQUEST)
+	{
+		StringInfoData outbuf;
+		initStringInfo(&outbuf);
+		MemoryContextStatsDetail(TopMemoryContext, 100,
+			printwrapper_stringinfo, (void*)&outbuf);
+		elog(LOG, "diagnostic dump requested; memory context info: %s",
+			outbuf.data);
+		pfree(outbuf.data);
+	}
+}
 
 /*
  * IA64-specific code to fetch the AR.BSP register for stack depth checks.
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index f53d411..ac0c29f 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -321,6 +321,22 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
 }
 
+Datum
+pg_diag_backend(PG_FUNCTION_ARGS)
+{
+	int pid = PG_GETARG_INT32(0);
+	int r;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("pg_diag_backend requires superuser privileges")));
+
+	r = SendProcSignal(pid, PROCSIG_DIAG_REQUEST, InvalidBackendId);
+
+	PG_RETURN_BOOL(r == 0);
+}
+
 /*
  * Signal to reload the database configuration
  *
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 9680a4b..f3e4a4d 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -29,6 +29,7 @@ ProtocolVersion FrontendProtocol;
 volatile bool InterruptPending = false;
 volatile bool QueryCancelPending = false;
 volatile bool ProcDiePending = false;
+volatile int DiagRequestPending = false;
 volatile bool ClientConnectionLost = false;
 volatile bool IdleInTransactionSessionTimeoutPending = false;
 volatile sig_atomic_t ConfigReloadPending = false;
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 1519da0..547b695 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -277,7 +277,8 @@ static void AllocSetDelete(MemoryContext context);
 static Size AllocSetGetChunkSpace(MemoryContext context, void *pointer);
 static bool AllocSetIsEmpty(MemoryContext context);
 static void AllocSetStats(MemoryContext context, int level, bool print,
-			  MemoryContextCounters *totals);
+			  MemoryContextCounters *totals, printf_wrapper outfunc,
+			  void *outfunc_extra);
 
 #ifdef MEMORY_CONTEXT_CHECKING
 static void AllocSetCheck(MemoryContext context);
@@ -1333,7 +1334,8 @@ AllocSetIsEmpty(MemoryContext context)
  */
 static void
 AllocSetStats(MemoryContext context, int level, bool print,
-			  MemoryContextCounters *totals)
+			  MemoryContextCounters *totals, printf_wrapper outfunc,
+			  void *outfunc_extra)
 {
 	AllocSet	set = (AllocSet) context;
 	Size		nblocks = 0;
@@ -1369,8 +1371,8 @@ AllocSetStats(MemoryContext context, int level, bool print,
 		int			i;
 
 		for (i = 0; i < level; i++)
-			fprintf(stderr, "  ");
-		fprintf(stderr,
+			outfunc(outfunc_extra, "  ");
+		outfunc(outfunc_extra,
 				"%s: %zu total in %zd blocks; %zu free (%zd chunks); %zu used\n",
 				set->header.name, totalspace, nblocks, freespace, freechunks,
 				totalspace - freespace);
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 97382a6..c108d87 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -54,7 +54,11 @@ MemoryContext PortalContext = NULL;
 static void MemoryContextCallResetCallbacks(MemoryContext context);
 static void MemoryContextStatsInternal(MemoryContext context, int level,
 						   bool print, int max_children,
-						   MemoryContextCounters *totals);
+						   MemoryContextCounters *totals,
+						   printf_wrapper outfunc, void *outfunc_args);
+
+static void fprintf_stderr_wrapper(void *args, const char *fmt, ...)
+	pg_attribute_printf(2, 3);
 
 /*
  * You should not do memory allocations within a critical section, because
@@ -426,6 +430,24 @@ MemoryContextIsEmpty(MemoryContext context)
 }
 
 /*
+ * MemoryContextStats doesn't want to try to write to elog(...)
+ * as it could be called under severe memory pressure. So we
+ * write straight to stderr.
+ *
+ * This duplicates write_stderr(...) without the Windows
+ * event log support.
+ */
+static void
+fprintf_stderr_wrapper(void *args, const char *fmt, ...)
+{
+	va_list		ap;
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+	va_end(ap);
+	fflush(stderr);
+}
+
+/*
  * MemoryContextStats
  *		Print statistics about the named context and all its descendants.
  *
@@ -437,7 +459,7 @@ void
 MemoryContextStats(MemoryContext context)
 {
 	/* A hard-wired limit on the number of children is usually good enough */
-	MemoryContextStatsDetail(context, 100);
+	MemoryContextStatsDetail(context, 100, fprintf_stderr_wrapper, NULL);
 }
 
 /*
@@ -446,15 +468,17 @@ MemoryContextStats(MemoryContext context)
  * Entry point for use if you want to vary the number of child contexts shown.
  */
 void
-MemoryContextStatsDetail(MemoryContext context, int max_children)
+MemoryContextStatsDetail(MemoryContext context, int max_children,
+	printf_wrapper outfunc, void *outfunc_extra)
 {
 	MemoryContextCounters grand_totals;
 
 	memset(&grand_totals, 0, sizeof(grand_totals));
 
-	MemoryContextStatsInternal(context, 0, true, max_children, &grand_totals);
+	MemoryContextStatsInternal(context, 0, true, max_children, &grand_totals,
+		outfunc, outfunc_extra);
 
-	fprintf(stderr,
+	outfunc(outfunc_extra,
 			"Grand total: %zu bytes in %zd blocks; %zu free (%zd chunks); %zu used\n",
 			grand_totals.totalspace, grand_totals.nblocks,
 			grand_totals.freespace, grand_totals.freechunks,
@@ -471,7 +495,8 @@ MemoryContextStatsDetail(MemoryContext context, int max_children)
 static void
 MemoryContextStatsInternal(MemoryContext context, int level,
 						   bool print, int max_children,
-						   MemoryContextCounters *totals)
+						   MemoryContextCounters *totals,
+						   printf_wrapper outfunc, void *outfunc_extra)
 {
 	MemoryContextCounters local_totals;
 	MemoryContext child;
@@ -480,7 +505,8 @@ MemoryContextStatsInternal(MemoryContext context, int level,
 	AssertArg(MemoryContextIsValid(context));
 
 	/* Examine the context itself */
-	context->methods->stats(context, level, print, totals);
+	context->methods->stats(context, level, print, totals,
+		outfunc, outfunc_extra);
 
 	/*
 	 * Examine children.  If there are more than max_children of them, we do
@@ -495,11 +521,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
 		if (ichild < max_children)
 			MemoryContextStatsInternal(child, level + 1,
 									   print, max_children,
-									   totals);
+									   totals,
+									   outfunc, outfunc_extra);
 		else
 			MemoryContextStatsInternal(child, level + 1,
 									   false, max_children,
-									   &local_totals);
+									   &local_totals,
+									   outfunc, outfunc_extra);
 	}
 
 	/* Deal with excess children */
@@ -510,8 +538,8 @@ MemoryContextStatsInternal(MemoryContext context, int level,
 			int			i;
 
 			for (i = 0; i <= level; i++)
-				fprintf(stderr, "  ");
-			fprintf(stderr,
+				outfunc(outfunc_extra, "  ");
+			outfunc(outfunc_extra,
 					"%d more child contexts containing %zu total in %zd blocks; %zu free (%zd chunks); %zu used\n",
 					ichild - max_children,
 					local_totals.totalspace,
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index c969375..2f8fb7b 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3256,6 +3256,8 @@ DATA(insert OID = 2171 ( pg_cancel_backend		PGNSP PGUID 12 1 0 0 0 f f f f t f v
 DESCR("cancel a server process' current query");
 DATA(insert OID = 2096 ( pg_terminate_backend		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 16 "23" _null_ _null_ _null_ _null_ _null_ pg_terminate_backend _null_ _null_ _null_ ));
 DESCR("terminate a server process");
+DATA(insert OID = 3556 ( pg_diag_backend		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 16 "23" _null_ _null_ _null_ _null_ _null_ pg_diag_backend _null_ _null_ _null_ ));
+DESCR("signal a server process to dump diagnostics to the log");
 DATA(insert OID = 2172 ( pg_start_backup		PGNSP PGUID 12 1 0 0 0 f f f f t f v r 3 0 3220 "25 16 16" _null_ _null_ _null_ _null_ _null_ pg_start_backup _null_ _null_ _null_ ));
 DESCR("prepare for taking an online backup");
 DATA(insert OID = 2173 ( pg_stop_backup			PGNSP PGUID 12 1 0 0 0 f f f f t f v r 0 0 3220 "" _null_ _null_ _null_ _null_ _null_ pg_stop_backup _null_ _null_ _null_ ));
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 59da7a6..9b17173 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -80,6 +80,7 @@
 extern PGDLLIMPORT volatile bool InterruptPending;
 extern PGDLLIMPORT volatile bool QueryCancelPending;
 extern PGDLLIMPORT volatile bool ProcDiePending;
+extern PGDLLIMPORT volatile int DiagRequestPending;
 extern PGDLLIMPORT volatile bool IdleInTransactionSessionTimeoutPending;
 extern PGDLLIMPORT volatile sig_atomic_t ConfigReloadPending;
 
@@ -275,6 +276,9 @@ extern bool stack_is_too_deep(void);
 
 extern void PostgresSigHupHandler(SIGNAL_ARGS);
 
+enum ProcSignalReason;
+extern void HandleDiagRequestInterrupt(enum ProcSignalReason request);
+
 /* in tcop/utility.c */
 extern void PreventCommandIfReadOnly(const char *cmdname);
 extern void PreventCommandIfParallelMode(const char *cmdname);
diff --git a/src/include/nodes/memnodes.h b/src/include/nodes/memnodes.h
index c7eb1e7..86d022c 100644
--- a/src/include/nodes/memnodes.h
+++ b/src/include/nodes/memnodes.h
@@ -35,6 +35,13 @@ typedef struct MemoryContextCounters
 } MemoryContextCounters;
 
 /*
+ * MemoryContextStats can't use elog(...) during OOM so we need
+ * to be able to route its output to different destinations.
+ */
+typedef void (*printf_wrapper)(void *extra, const char *fmt, ...)
+	pg_attribute_printf(2, 3);
+
+/*
  * MemoryContext
  *		A logical context in which memory allocations occur.
  *
@@ -62,7 +69,8 @@ typedef struct MemoryContextMethods
 	Size		(*get_chunk_space) (MemoryContext context, void *pointer);
 	bool		(*is_empty) (MemoryContext context);
 	void		(*stats) (MemoryContext context, int level, bool print,
-						  MemoryContextCounters *totals);
+						  MemoryContextCounters *totals,
+						  printf_wrapper outfunc, void *outfunc_extra);
 #ifdef MEMORY_CONTEXT_CHECKING
 	void		(*check) (MemoryContext context);
 #endif
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 20bb05b..121fb86 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -27,7 +27,7 @@
  * Also, because of race conditions, it's important that all the signals be
  * defined so that no harm is done if a process mistakenly receives one.
  */
-typedef enum
+typedef enum ProcSignalReason
 {
 	PROCSIG_CATCHUP_INTERRUPT,	/* sinval catchup interrupt */
 	PROCSIG_NOTIFY_INTERRUPT,	/* listen/notify interrupt */
@@ -42,6 +42,9 @@ typedef enum
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
+	/* Signals used to ask the backend for diagnostic output */
+	PROCSIG_DIAG_REQUEST,		/* Dump memory context info etc */
+
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
 
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 9c30eb7..1b08f4e 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -45,7 +45,6 @@
 
 #define AllocHugeSizeIsValid(size)	((Size) (size) <= MaxAllocHugeSize)
 
-
 /*
  * Standard top-level memory contexts.
  *
@@ -82,7 +81,8 @@ extern Size GetMemoryChunkSpace(void *pointer);
 extern MemoryContext MemoryContextGetParent(MemoryContext context);
 extern bool MemoryContextIsEmpty(MemoryContext context);
 extern void MemoryContextStats(MemoryContext context);
-extern void MemoryContextStatsDetail(MemoryContext context, int max_children);
+extern void MemoryContextStatsDetail(MemoryContext context, int max_children,
+	printf_wrapper outfunc, void *outfunc_extra);
 extern void MemoryContextAllowInCriticalSection(MemoryContext context,
 									bool allow);
 
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 130a0e4..9ce99c6 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -1,4 +1,16 @@
 --
+-- Misc admin functions
+--
+-- pg_diag_backend makes the backend log, so it can't really
+-- be tested within pg_regress except to ensure it doesn't
+-- upset the backend.
+SELECT pg_diag_backend(pg_backend_pid());
+ pg_diag_backend 
+-----------------
+ t
+(1 row)
+
+--
 -- num_nulls()
 --
 SELECT num_nonnulls(NULL);
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 1a20c1f..fb64b87 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -1,4 +1,13 @@
 --
+-- Misc admin functions
+--
+
+-- pg_diag_backend makes the backend log, so it can't really
+-- be tested within pg_regress except to ensure it doesn't
+-- upset the backend.
+SELECT pg_diag_backend(pg_backend_pid());
+
+--
 -- num_nulls()
 --
 
-- 
2.9.5

#26Andres Freund
andres@anarazel.de
In reply to: Craig Ringer (#25)
Re: Using ProcSignal to get memory context stats from a running backend

Hi,

On 2017-12-21 14:49:28 +0800, Craig Ringer wrote:

+/*
+ * Accumulate writes into the buffer in diag_request_buf,
+ * for use with functions that expect a printf-like callback.
+ */
+static void
+printwrapper_stringinfo(void *extra, const char * fmt, ...)
+{
+	StringInfo out = extra;
+	for (;;)
+	{
+		va_list		args;
+		int			needed;
+		va_start(args, fmt);
+		needed = appendStringInfoVA(out, fmt, args);
+		va_end(args);
+		if (needed == 0)
+			break;
+		enlargeStringInfo(out, needed);
+	}
}

Hm, so I'm not entirely sure it's ok to use something that ERRORs on
OOM. There's plenty of scenarios with thousands of memory contexts,
making this output fairly large. If we want to make this usable in
production, I'm not sure it's ok to introduce additional ERRORs. I
wonder if we can change this to emit a static message if collecting the
output exhausted memory.
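One way to bound the collection, rather than trusting allocation to succeed,
could look roughly like this (a hypothetical sketch, not from any posted
patch; the cap and the capped_dump struct are invented for illustration):

#define DIAG_DUMP_LIMIT (64 * 1024)		/* arbitrary example cap */

typedef struct capped_dump
{
	StringInfoData buf;
	bool		truncated;
} capped_dump;

static void
printwrapper_stringinfo_capped(void *extra, const char *fmt,...)
{
	capped_dump *dump = extra;

	/* Once over the cap, stop collecting; the caller logs a static notice. */
	if (dump->truncated)
		return;
	if (dump->buf.len >= DIAG_DUMP_LIMIT)
	{
		dump->truncated = true;
		return;
	}

	for (;;)
	{
		va_list		args;
		int			needed;

		va_start(args, fmt);
		needed = appendStringInfoVA(&dump->buf, fmt, args);
		va_end(args);
		if (needed == 0)
			break;
		enlargeStringInfo(&dump->buf, needed);
	}
}

The caller would then append a static "[output truncated]" note to the log
message when dump.truncated is set, so a huge context tree degrades into a
bounded dump rather than an unbounded allocation.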

Greetings,

Andres Freund

#27Craig Ringer
craig@2ndquadrant.com
In reply to: Andres Freund (#26)
Re: Using ProcSignal to get memory context stats from a running backend

On 21 December 2017 at 14:58, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-12-21 14:49:28 +0800, Craig Ringer wrote:

+/*
+ * Accumulate writes into the buffer in diag_request_buf,
+ * for use with functions that expect a printf-like callback.
+ */
+static void
+printwrapper_stringinfo(void *extra, const char * fmt, ...)
+{
+     StringInfo out = extra;
+     for (;;)
+     {
+             va_list         args;
+             int                     needed;
+             va_start(args, fmt);
+             needed = appendStringInfoVA(out, fmt, args);
+             va_end(args);
+             if (needed == 0)
+                     break;
+             enlargeStringInfo(out, needed);
+     }
}

Hm, so I'm not entirely sure it's ok to use something that ERRORs on
OOM. There's plenty of scenarios with thousands of memory contexts,
making this output fairly large. If we want to make this usable in
production, I'm not sure it's ok to introduce additional ERRORs. I
wonder if we can change this to emit a static message if collecting the
output exhausted memory.

There tons of callers to enlargeStringInfo, so a 'noerror' parameter would
be viable.

But I'm not convinced it's worth it personally. If we OOM in response to a
ProcSignal request for memory context output, we're having pretty bad luck.
The output is 8k in my test. But even if it were a couple of hundred kb,
happening to hit OOM just then isn't great luck on modern systems with many
gigabytes of RAM.

If that *does* happen, repalloc(...) will call
MemoryContextStats(TopMemoryContext) before returning NULL. So we'll get
our memory context dump anyway, albeit to stderr.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#28Andres Freund
andres@anarazel.de
In reply to: Craig Ringer (#27)
Re: Using ProcSignal to get memory context stats from a running backend

Hi,

On 2017-12-21 15:13:13 +0800, Craig Ringer wrote:

There tons of callers to enlargeStringInfo, so a 'noerror' parameter would
be viable.

Not sure what you mean with that sentence?

But I'm not convinced it's worth it personally. If we OOM in response to a
ProcSignal request for memory context output, we're having pretty bad luck.
The output is 8k in my test. But even if it were a couple of hundred kb,
happening to hit OOM just then isn't great luck on modern systems with many
gigabytes of RAM.

I've seen plenty of memory dumps in the dozens to hundreds of
megabytes. And imo such cases are more likely to invite use of this
facility.

If that *does* happen, repalloc(...) will call
MemoryContextStats(TopMemoryContext) before returning NULL. So we'll get
our memory context dump anyway, albeit to stderr.

That would still abort the query that might otherwise continue to work,
so that seems no excuse.

Greetings,

Andres Freund

#29Craig Ringer
craig@2ndquadrant.com
In reply to: Andres Freund (#28)
Re: Using ProcSignal to get memory context stats from a running backend

On 21 December 2017 at 15:24, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-12-21 15:13:13 +0800, Craig Ringer wrote:

There tons of callers to enlargeStringInfo, so a 'noerror' parameter would
be viable.

Not sure what you mean with that sentence?

Mangled in editing and sent prematurely, disregard.

There are NOT tons of callers to enlargeStringInfo, so adding a parameter
that allowed it to return a failure result rather than ERROR on OOM seems
to be a reasonable option. But it relies on repalloc(), which will ERROR on
OOM. AFAICS we don't have "no error" variants of the memory allocation
routines and I'm not eager to add them. Especially since we can't trust
that we're not on misconfigured Linux where the OOM killer will go wild
instead of giving us a nice NULL result.
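For concreteness, the shape such a variant would need might be something like
this (a hypothetical sketch only; repalloc_no_oom() stands in for a no-throw
reallocation routine, which is exactly what core doesn't provide today):

/*
 * Hypothetical: grow the buffer if possible, but return false instead of
 * raising ERROR when the allocation cannot be satisfied.
 */
static bool
enlargeStringInfoNoError(StringInfo str, int needed)
{
	int			newlen;
	char	   *newdata;

	if (needed < 0 || ((Size) needed) >= (MaxAllocSize - (Size) str->len))
		return false;			/* request can never be satisfied */

	needed += str->len + 1;		/* total space now required */
	if (needed <= str->maxlen)
		return true;			/* already have enough room */

	newlen = 2 * str->maxlen;
	while (needed > newlen)
		newlen = 2 * newlen;
	if (newlen > MaxAllocSize)
		newlen = (int) MaxAllocSize;

	newdata = repalloc_no_oom(str->data, newlen);	/* hypothetical no-throw variant */
	if (newdata == NULL)
		return false;

	str->data = newdata;
	str->maxlen = newlen;
	return true;
}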

So I guess that means we should probably just do individual elog(...)s and
live with the ugliness of scraping the resulting mess out of the logs.
After all, with a log destination that could copy and modify the string
being logged a couple of times, it's not a good idea to try to drop the
whole thing into the logs in one blob. And we can't trust things like
syslog with large messages.

I'll follow up with a revised patch that uses individual elog()s soon.

BTW, I also pgindented it in my tree, so it'll have formatting fixed up.
pgindent's handling of

static void printwrapper_stringinfo(void *extra, const char *fmt,...)
pg_attribute_printf(2, 3);

upsets me a little, since if I break the line it mangles it into

static void
printwrapper_stringinfo(void *extra, const char *fmt,...)
pg_attribute_printf(2, 3);

so it'll break line-length rules a little, but otherwise conform.

But I'm not convinced it's worth it personally. If we OOM in response to a
ProcSignal request for memory context output, we're having pretty bad luck.
The output is 8k in my test. But even if it were a couple of hundred kb,
happening to hit OOM just then isn't great luck on modern systems with many
gigabytes of RAM.

I've seen plenty of memory dumps in the dozens to hundreds of
megabytes. And imo such cases are more likely to invite use of this
facility.

OK, that's more concerning then. Also impressively huge.

As written it's using MemoryContextStatsDetail(TopMemoryContext, 100) so it
shouldn't really be *getting* multimegabyte dumps, but I'm not thrilled by
hardcoding that. It doesn't merit a GUC though, so I'm not sure what to do
about it and figured I'd make it a "later" problem.

If that *does* happen, repalloc(...) will call
MemoryContextStats(TopMemoryContext) before returning NULL. So we'll get
our memory context dump anyway, albeit to stderr.

That would still abort the query that might otherwise continue to work,
so that seems no excuse.

Eh. It'd probably die soon enough anyway. But yeah, I agree it's not safe
to hope we can sling around up to a couple of hundred MB under presumed
memory pressure.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#30Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#29)
Re: Using ProcSignal to get memory context stats from a running backend

On 21 December 2017 at 15:55, Craig Ringer <craig@2ndquadrant.com> wrote:

On 21 December 2017 at 15:24, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-12-21 15:13:13 +0800, Craig Ringer wrote:

There tons of callers to enlargeStringInfo, so a 'noerror' parameter would
be viable.

Not sure what you mean with that sentence?

Mangled in editing and sent prematurely, disregard.

There are NOT tons of callers to enlargeStringInfo, so adding a parameter
that allowed it to return a failure result rather than ERROR on OOM seems
to be a reasonable option. But it relies on repalloc(), which will ERROR on
OOM. AFAICS we don't have "no error" variants of the memory allocation
routines and I'm not eager to add them. Especially since we can't trust
that we're not on misconfigured Linux where the OOM killer will go wild
instead of giving us a nice NULL result.

So I guess that means we should probably just do individual elog(...)s and
live with the ugliness of scraping the resulting mess out of the logs.
After all, with a log destination that could copy and modify the string
being logged a couple of times, it's not a good idea to try to drop the
whole thing into the logs in one blob. And we can't trust things like
syslog with large messages.

I'll follow up with a revised patch that uses individual elog()s soon.

I intend to add an elog_internal_raw for this, which takes a pre-formatted
string and bypasses EVALUATE_MESSAGE. In this case, one written to a static
buffer by vsnprintf.

That bypasses two rounds of allocations by elog - expand_fmt_string for %m
substitution, and the appendStringInfoVA for formatting. And it's a whole
lot cleaner than

char buffer[2048];
...
vsnprintf(buffer, sizeof(buffer), ...)
...
elog(LOG, "%s", buffer);

It still imposes a single-line length limit, but no worse than write_stderr
already does on win32. If that's not OK, preformatting each line into a
StringInfo before dumping straight to elog works too.
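As a sketch of that per-line alternative (hypothetical; the function name and
the caller-supplied StringInfo line buffer are invented for illustration, and
it assumes each stats line ends with a newline, as the aset.c output does):

/*
 * Hypothetical sketch: buffer callback output in a StringInfo supplied by
 * the caller and emit one LOG entry per completed line.  No fixed
 * single-line length cap, and no need to assemble the whole dump at once.
 */
static void
printwrapper_elog_per_line(void *extra, const char *fmt,...)
{
	StringInfo	line = extra;

	for (;;)
	{
		va_list		args;
		int			needed;

		va_start(args, fmt);
		needed = appendStringInfoVA(line, fmt, args);
		va_end(args);
		if (needed == 0)
			break;
		enlargeStringInfo(line, needed);
	}

	/* MemoryContextStats output is line-oriented; flush on each newline. */
	if (line->len > 0 && line->data[line->len - 1] == '\n')
	{
		line->data[--line->len] = '\0';
		elog(LOG, "memory context dump: %s", line->data);
		resetStringInfo(line);
	}
}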

Complaints or seem OK?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#31Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#30)
Re: Using ProcSignal to get memory context stats from a running backend

On 22 December 2017 at 13:05, Craig Ringer <craig@2ndquadrant.com> wrote:

On 21 December 2017 at 15:55, Craig Ringer <craig@2ndquadrant.com> wrote:

On 21 December 2017 at 15:24, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-12-21 15:13:13 +0800, Craig Ringer wrote:

There tons of callers to enlargeStringInfo, so a 'noerror' parameter would
be viable.

Not sure what you mean with that sentence?

Mangled in editing and sent prematurely, disregard.

There are NOT tons of callers to enlargeStringInfo, so adding a
parameter that allowed it to return a failure result rather than ERROR on
OOM seems to be a reasonable option. But it relies on repalloc(), which
will ERROR on OOM. AFAICS we don't have "no error" variants of the memory
allocation routines and I'm not eager to add them. Especially since we
can't trust that we're not on misconfigured Linux where the OOM killer will
go wild instead of giving us a nice NULL result.

So I guess that means we should probably just do individual elog(...)s
and live with the ugliness of scraping the resulting mess out of the logs.
After all, a log destination that could possibly copy and modify the string
being logged a couple of times it's not a good idea to try to drop the
whole thing into the logs in one blob. And we can't trust things like
syslog with large messages.

I'll follow up with a revised patch that uses individual elog()s soon.

I intend to add an elog_internal_raw

Er, I mean

errmsg_internal_raw(const char * errmsg)

i.e. like errmsg_internal, but with no varargs or format string, to be used
when we want to prepare a simple preformatted log entry.
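Roughly, the shape would be something like this (a sketch only; the real
elog.c bookkeeping around the error-data stack and recursion guards is
elided, so treat the details as illustrative):

int
errmsg_internal_raw(const char *msg)
{
	ErrorData  *edata = &errordata[errordata_stack_depth];
	MemoryContext oldcontext;

	oldcontext = MemoryContextSwitchTo(edata->assoc_context);

	/* Copy the message as-is: no expand_fmt_string(), no %m substitution */
	edata->message = pstrdup(msg);

	MemoryContextSwitchTo(oldcontext);

	return 0;					/* return value is ignored, as for errmsg_internal() */
}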

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#32Maksim Milyutin
milyutinma@gmail.com
In reply to: Pavel Stehule (#19)
Re: Using ProcSignal to get memory context stats from a running backend

On 19.12.2017 16:54, Pavel Stehule wrote:

Hi

2017-12-19 14:44 GMT+01:00 Craig Ringer <craig@2ndquadrant.com>:

On 18 December 2017 at 10:05, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 15 December 2017 at 09:24, Greg Stark <stark@mit.edu> wrote:

Another simpler option would be to open up a new file in the log
directory

... if we have one.

We might be logging to syslog, or whatever else.

I'd rather keep it simple(ish).

+1.  I still think just printing it to the log is fine.

Here we go. Implemented pretty much as outlined above. A new
pg_diag_backend(pid) function sends a new
ProcSignalReason PROCSIG_DIAG_REQUEST. It's handled by
CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output to
elog(...).

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook
global. It's a diagnostic routine so I don't think that's going to
be a great bother. By default it's set to write_stderr. That just
writes to vfprintf on unix so the outcome's the same as our direct
use of fprintf now.

On Windows, though, using write_stderr will make Pg attempt to
write memory context dumps to the eventlog with ERROR level if
running as a service with no console. That means we vsnprintf the
string into a static buffer first. I'm not 100% convinced of the
wisdom of that because it could flood the event log, which is
usually size limited by # of events and recycled. It'd be better
to write one event than write one per memory context line, but
it's not safe to do that when OOM. I lean toward this being a win:
at least Windows users should have some hope of finding out why Pg
OOM'd, which currently they don't when it runs as a service. If we
don't, we should look at using SetStdHandle to write stderr to a
secondary logfile instead.

I'm happy to just add a trivial vfprintf wrapper so we preserve
exactly the same behaviour instead, but I figured I'd start with
reusing write_stderr.

I'd really like to preserve the writing to elog(...) not fprintf,
because on some systems simply finding where stderr is written can
be a challenge, if it's not going to /dev/null via some detached
session. Is it in journald? In some separate log? Being captured
by the logging collector (and probably silently eaten)? Who knows!

Using elog(...) provides a log_line_prefix and other useful info
to associate the dump with the process of interest and what it's
doing at the time.

Also, an astonishing number of deployed systems I've worked with
actually don't put the pid or anything useful in log_line_prefix
to make grouping up multi-line output practical. Which is insane.
But it's only PGC_SIGHUP so fixing it is easy enough.

Sorry for the small off-topic question. Can this mechanism be used to log
the executed plan or the full query?

This idea (but without logging) is implemented in the work on the
pg_progress command proposed by Remi Colinet[1] and in the extension
pg_query_state[2].

1. /messages/by-id/CADdR5nxQUSh5kCm9MKmNga8+c1JLxLHDzLhAyXpfo9-Wmc6s5g@mail.gmail.com
2. https://github.com/postgrespro/pg_query_state

--
Regards,
Maksim Milyutin

#33Craig Ringer
craig@2ndquadrant.com
In reply to: Maksim Milyutin (#32)
Re: Using ProcSignal to get memory context stats from a running backend

On 22 December 2017 at 20:50, Maksim Milyutin <milyutinma@gmail.com> wrote:

On 19.12.2017 16:54, Pavel Stehule wrote:

Hi

2017-12-19 14:44 GMT+01:00 Craig Ringer <craig@2ndquadrant.com>:

On 18 December 2017 at 10:05, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer <craig@2ndquadrant.com>
wrote:

On 15 December 2017 at 09:24, Greg Stark <stark@mit.edu> wrote:

Another simpler option would be to open up a new file in the log
directory

... if we have one.

We might be logging to syslog, or whatever else.

I'd rather keep it simple(ish).

+1. I still think just printing it to the log is fine.

Here we go. Implemented pretty much as outlined above. A new
pg_diag_backend(pid) function sends a new ProcSignalReason PROCSIG_DIAG_REQUEST.
It's handled by CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output
to elog(...).

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook global.
It's a diagnostic routine so I don't think that's going to be a great
bother. By default it's set to write_stderr. That just writes to vfprintf
on unix so the outcome's the same as our direct use of fprintf now.

On Windows, though, using write_stderr will make Pg attempt to write
memory context dumps to the eventlog with ERROR level if running as a
service with no console. That means we vsnprintf the string into a static
buffer first. I'm not 100% convinced of the wisdom of that because it could
flood the event log, which is usually size limited by # of events and
recycled. It'd be better to write one event than write one per memory
context line, but it's not safe to do that when OOM. I lean toward this
being a win: at least Windows users should have some hope of finding out
why Pg OOM'd, which currently they don't when it runs as a service. If we
don't, we should look at using SetStdHandle to write stderr to a secondary
logfile instead.

I'm happy to just add a trivial vfprintf wrapper so we preserve exactly
the same behaviour instead, but I figured I'd start with reusing
write_stderr.

I'd really like to preserve the writing to elog(...) not fprintf, because
on some systems simply finding where stderr is written can be a challenge,
if it's not going to /dev/null via some detached session. Is it in
journald? In some separate log? Being captured by the logging collector
(and probably silently eaten)? Who knows!

Using elog(...) provides a log_line_prefix and other useful info to
associate the dump with the process of interest and what it's doing at the
time.

Also, an astonishing number of deployed systems I've worked with actually
don't put the pid or anything useful in log_line_prefix to make grouping up
multi-line output practical. Which is insane. But it's only PGC_SIGHUP so
fixing it is easy enough.

Sorry for the small off-topic question. Can this mechanism be used to log the
executed plan or the full query?

That's a really good idea. I'd love to be able to pg_explain_backend(...)

I left the mechanism as a generic diagnostic signal exactly so that we
could add other things we wanted to be able to ask backends. I think a
follow-on patch that adds support for dumping explain-format plans would be
great, if it's practical to do that while a query's already running.
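To make that follow-on concrete, a rough sketch of what such a handler might
look like (entirely hypothetical: neither this function nor a second request
kind is part of the posted patch, and whether it's safe to run mid-query is
exactly the open question above):

/*
 * Hypothetical: dump the plan of the currently active portal in response
 * to a second diagnostic request kind, reusing the same ProcSignal +
 * CHECK_FOR_INTERRUPTS() plumbing as the memory context dump.
 */
static void
HandleDiagPlanRequest(void)
{
	if (ActivePortal != NULL && ActivePortal->queryDesc != NULL)
	{
		ExplainState *es = NewExplainState();

		ExplainBeginOutput(es);
		ExplainPrintPlan(es, ActivePortal->queryDesc);
		ExplainEndOutput(es);

		elog(LOG, "diagnostic dump requested; current plan:\n%s",
			 es->str->data);
	}
	else
		elog(LOG, "diagnostic dump requested, but no query is currently executing");
}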

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#34Pavel Stehule
pavel.stehule@gmail.com
In reply to: Maksim Milyutin (#32)
Re: Using ProcSignal to get memory context stats from a running backend

2017-12-22 13:50 GMT+01:00 Maksim Milyutin <milyutinma@gmail.com>:

On 19.12.2017 16:54, Pavel Stehule wrote:

Hi

2017-12-19 14:44 GMT+01:00 Craig Ringer <craig@2ndquadrant.com>:

On 18 December 2017 at 10:05, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer <craig@2ndquadrant.com>
wrote:

On 15 December 2017 at 09:24, Greg Stark <stark@mit.edu> wrote:

Another simpler option would be to open up a new file in the log
directory

... if we have one.

We might be logging to syslog, or whatever else.

I'd rather keep it simple(ish).

+1. I still think just printing it to the log is fine.

Here we go. Implemented pretty much as outlined above. A new
pg_diag_backend(pid) function sends a new ProcSignalReason PROCSIG_DIAG_REQUEST.
It's handled by CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output
to elog(...).

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook global.
It's a diagnostic routine so I don't think that's going to be a great
bother. By default it's set to write_stderr. That just writes to vfprintf
on unix so the outcome's the same as our direct use of fprintf now.

On Windows, though, using write_stderr will make Pg attempt to write
memory context dumps to the eventlog with ERROR level if running as a
service with no console. That means we vsnprintf the string into a static
buffer first. I'm not 100% convinced of the wisdom of that because it could
flood the event log, which is usually size limited by # of events and
recycled. It'd be better to write one event than write one per memory
context line, but it's not safe to do that when OOM. I lean toward this
being a win: at least Windows users should have some hope of finding out
why Pg OOM'd, which currently they don't when it runs as a service. If we
don't, we should look at using SetStdHandle to write stderr to a secondary
logfile instead.

I'm happy to just add a trivial vfprintf wrapper so we preserve exactly
the same behaviour instead, but I figured I'd start with reusing
write_stderr.

I'd really like to preserve the writing to elog(...) not fprintf, because
on some systems simply finding where stderr is written can be a challenge,
if it's not going to /dev/null via some detached session. Is it in
journald? In some separate log? Being captured by the logging collector
(and probably silently eaten)? Who knows!

Using elog(...) provides a log_line_prefix and other useful info to
associate the dump with the process of interest and what it's doing at the
time.

Also, an astonishing number of deployed systems I've worked with actually
don't put the pid or anything useful in log_line_prefix to make grouping up
multi-line output practical. Which is insane. But it's only PGC_SIGHUP so
fixing it is easy enough.

Sorry for the small off-topic question. Can this mechanism be used to log the
executed plan or the full query?

This idea (but without logging) is implemented in the work on the pg_progress
command proposed by Remi Colinet[1] and in the extension pg_query_state[2].

1. /messages/by-id/CADdR5nxQUSh5kCm9MKmNga8+c1JLxLHDzLhAyXpfo9-Wmc6s5g@mail.gmail.com
2. https://github.com/postgrespro/pg_query_state

I remember the discussion, and I hope that one day we will have some
EXPLAIN PROCESS pid command.

Using a signal and the PostgreSQL log could be a very simple solution that is
available immediately.

Regards

Pavel


--
Regards,
Maksim Milyutin

#35Maksim Milyutin
milyutinma@gmail.com
In reply to: Craig Ringer (#33)
Re: Using ProcSignal to get memory context stats from a running backend

On 22.12.2017 16:56, Craig Ringer wrote:

On 22 December 2017 at 20:50, Maksim Milyutin <milyutinma@gmail.com> wrote:

On 19.12.2017 16:54, Pavel Stehule wrote:

Sorry for the small off-topic question. Can this mechanism be used to log
the executed plan or the full query?

That's a really good idea. I'd love to be able to pg_explain_backend(...)

I left the mechanism as a generic diagnostic signal exactly so that we
could add other things we wanted to be able to ask backends. I think a
follow-on patch that adds support for dumping explain-format plans
would be great, if it's practical to do that while a query's already
running.

Noticing the interest in calling some routines on a remote backend through
signals, in a parallel thread[1] I have proposed the ability to define
user-defined signal handlers in extensions. There is a patch taken from the
pg_query_state module.

1. /messages/by-id/3f905f10-cf7a-d4e0-64a1-7fd9b8351592@gmail.com

--
Regards,
Maksim Milyutin

#36Craig Ringer
craig@2ndquadrant.com
In reply to: Maksim Milyutin (#35)
Re: Using ProcSignal to get memory context stats from a running backend

On 22 December 2017 at 23:19, Maksim Milyutin <milyutinma@gmail.com> wrote:

On 22.12.2017 16:56, Craig Ringer wrote:

On 22 December 2017 at 20:50, Maksim Milyutin <milyutinma@gmail.com>
wrote:

On 19.12.2017 16:54, Pavel Stehule wrote:

Sorry for the small off-topic question. Can this mechanism be used to log the
executed plan or the full query?

That's a really good idea. I'd love to be able to pg_explain_backend(...)

I left the mechanism as a generic diagnostic signal exactly so that we
could add other things we wanted to be able to ask backends. I think a
follow-on patch that adds support for dumping explain-format plans would be
great, if it's practical to do that while a query's already running.

Noticing the interest in calling some routines on a remote backend through
signals, in a parallel thread[1] I have proposed the ability to define
user-defined signal handlers in extensions. There is a patch taken from the
pg_query_state module.

1. /messages/by-id/3f905f10-cf7a-d4e0-64a1-7fd9b8351592@gmail.com

Haven't read the patch, but the idea seems useful if a bit hairy in
practice. It'd be done on a "use at your own risk, and if it breaks you get
to keep the pieces" basis, where no guarantees are made by Pg about how and
when the function is called so it must be utterly defensive. The challenge
there being that you can't always learn enough about the state of the
system without doing things that might break it, so lots of things you
could do would be fragile.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#37Maksim Milyutin
milyutinma@gmail.com
In reply to: Craig Ringer (#36)
Re: Using ProcSignal to get memory context stats from a running backend

On 27.12.2017 10:44, Craig Ringer wrote:

On 22 December 2017 at 23:19, Maksim Milyutin <milyutinma@gmail.com> wrote:

Noticing the interest in calling some routines on a remote backend
through signals, in a parallel thread[1] I have proposed the ability to
define user-defined signal handlers in extensions. There is a patch
taken from the pg_query_state module.

1. /messages/by-id/3f905f10-cf7a-d4e0-64a1-7fd9b8351592@gmail.com

Haven't read the patch, but the idea seems useful if a bit hairy in
practice. It'd be done on a "use at your own risk, and if it breaks
you get to keep the pieces" basis, where no guarantees are made by Pg
about how and when the function is called so it must be utterly
defensive. The challenge there being that you can't always learn
enough about the state of the system without doing things that might
break it, so lots of things you could do would be fragile.

Yes, I agree with your point that the state of the system has to be
checked and rechecked before manipulating it in a signal handler.
And the responsibility is on the extension developer to provide the
necessary level of defensiveness. Perhaps it makes sense to add a try-catch
block around the signal handler to intercept potential errors from it.
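A rough sketch of that idea (hypothetical; the wrapper name is invented, and
note that swallowing the error this way skips the usual transaction-abort
cleanup, which is part of the risk discussed above):

static void
CallUserSignalHandlerSafely(void (*handler) (void))
{
	PG_TRY();
	{
		handler();
	}
	PG_CATCH();
	{
		/* Report the error to the log, then clear it and carry on. */
		EmitErrorReport();
		FlushErrorState();
	}
	PG_END_TRY();
}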

--
Regards,
Maksim Milyutin

#38Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Craig Ringer (#31)
Re: Using ProcSignal to get memory context stats from a running backend

Did this go anywhere?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services