Use a signal to trigger a memory context dump?

Started by Andres Freund over 11 years ago, 8 messages
#1 Andres Freund
andres@2ndquadrant.com

Hi,

I wonder if it'd make sense to allow a signal to trigger a memory
context dump? I and others more than once had the need to examine memory
usage on production systems and using gdb isn't always realistic.
I wonder if we could install a signal handler for some unused signal
(e.g. SIGPWR) to dump memory.
I'd also considered adding a SQL function that uses the SIGUSR1 signal
multiplexing for the purpose but that's not necessarily nice if you have
to investigate while SQL access isn't yet possible. There's also the
problem that not all possibly interesting processes use the sigusr1
signal multiplexing.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#1)
Re: Use a signal to trigger a memory context dump?

Andres,

* Andres Freund (andres@2ndquadrant.com) wrote:

> I wonder if it'd make sense to allow a signal to trigger a memory
> context dump? I and others more than once had the need to examine memory
> usage on production systems and using gdb isn't always realistic.

+100

I keep thinking we have this and then keep being disappointed when I go
try to find it.

> I wonder if we could install a signal handler for some unused signal
> (e.g. SIGPWR) to dump memory.

Interesting thought, but..

> I'd also considered adding a SQL function that uses the SIGUSR1 signal
> multiplexing for the purpose but that's not necessarily nice if you have
> to investigate while SQL access isn't yet possible. There's also the
> problem that not all possibly interesting processes use the sigusr1
> signal multiplexing.

I'd tend to think this would be sufficient. You're suggesting a case
where you need to debug prior to SQL access (not specifically sure what
you mean by that) or processes which are hopefully less likely to have
memory issues, but you don't have gdb..

Another thought along the lines of getting information about running
processes would be to see the call stack or execution plan.. I seem to
recall there being a patch for the latter at one point?

Thanks,

Stephen

#3 Andres Freund
andres@2ndquadrant.com
In reply to: Stephen Frost (#2)
Re: Use a signal to trigger a memory context dump?

On 2014-06-23 08:36:02 -0400, Stephen Frost wrote:

> Andres,
>
> * Andres Freund (andres@2ndquadrant.com) wrote:
>
>> I wonder if it'd make sense to allow a signal to trigger a memory
>> context dump? I and others more than once had the need to examine memory
>> usage on production systems and using gdb isn't always realistic.
>
> +100
>
> I keep thinking we have this and then keep being disappointed when I go
> try to find it.
>
>> I wonder if we could install a signal handler for some unused signal
>> (e.g. SIGPWR) to dump memory.
>
> Interesting thought, but..
>
>> I'd also considered adding a SQL function that uses the SIGUSR1 signal
>> multiplexing for the purpose but that's not necessarily nice if you have
>> to investigate while SQL access isn't yet possible. There's also the
>> problem that not all possibly interesting processes use the sigusr1
>> signal multiplexing.
>
> I'd tend to think this would be sufficient. You're suggesting a case
> where you need to debug prior to SQL access (not specifically sure what
> you mean by that) or processes which are hopefully less likely to have
> memory issues, but you don't have gdb..

prior to SQL access := Before crash recovery finished/hot standby
reached consistency.

And I don't agree that memory dumps from non-plain backends are that
uninteresting. E.g. background workers and logical decoding walsenders
both can be interesting.

> Another thought along the lines of getting information about running
> processes would be to see the call stack or execution plan.. I seem to
> recall there being a patch for the latter at one point?

I think these are *much* more complicated. I don't want to tackle them
at the same time, otherwise we'll never get anywhere.

Greetings,

Andres Freund

#4 Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#3)
Re: Use a signal to trigger a memory context dump?

Andres,

* Andres Freund (andres@2ndquadrant.com) wrote:

> On 2014-06-23 08:36:02 -0400, Stephen Frost wrote:
>
>> I'd tend to think this would be sufficient. You're suggesting a case
>> where you need to debug prior to SQL access (not specifically sure what
>> you mean by that) or processes which are hopefully less likely to have
>> memory issues, but you don't have gdb..
>
> prior to SQL access := Before crash recovery finished/hot standby
> reached consistency.
>
> And I don't agree that memory dumps from non-plain backends are that
> uninteresting. E.g. background workers and logical decoding walsenders
> both can be interesting.

I didn't mean they're uninteresting- I meant that if you're dealing with
those kinds of issues, having gdb isn't as huge a hurdle..

>> Another thought along the lines of getting information about running
>> processes would be to see the call stack or execution plan.. I seem to
>> recall there being a patch for the latter at one point?
>
> I think these are *much* more complicated. I don't want to tackle them
> at the same time, otherwise we'll never get anywhere.

Sure, just some things to keep in mind as you're thinking about changes
in this area. Just to toss another random thought out there, what about
an SQL function which does a LISTEN and then sends a signal to another
backend which throws a NOTIFY with payload including the requested info?
That'd be *very* useful as there are lots of cases where access to the
logs isn't trivial (particularly if they've been properly locked down
due to the sensitive info they can contain..).

Thanks,

Stephen

#5 MauMau
maumau307@gmail.com
In reply to: Andres Freund (#1)
Re: Use a signal to trigger a memory context dump?

From: "Andres Freund" <andres@2ndquadrant.com>

> I wonder if it'd make sense to allow a signal to trigger a memory
> context dump? I and others more than once had the need to examine memory
> usage on production systems and using gdb isn't always realistic.

+1

It would be nice if there's a generic infrastructure on which the DBA can
get information of running backends. I wish for a functionality to dump
info of all backends with a single operation as well as one backend at a
time, because it would be difficult to ask for users to choose a specific
backend or operate on all backends, especially on Windows. The candidate
info are:

* memory context

* stack trace: I'd like to implement this.

* GUC settings: to know that backends are running with intended settings.

* prepared statements (= pg_prepared_statements): to know if applications
are taking advantage of prepared statements for performance.

Regards
MauMau

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#1)
Re: Use a signal to trigger a memory context dump?

Andres Freund <andres@2ndquadrant.com> writes:

> I wonder if it'd make sense to allow a signal to trigger a memory
> context dump? I and others more than once had the need to examine memory
> usage on production systems and using gdb isn't always realistic.
> I wonder if we could install a signal handler for some unused signal
> (e.g. SIGPWR) to dump memory.
> I'd also considered adding a SQL function that uses the SIGUSR1 signal
> multiplexing for the purpose but that's not necessarily nice if you have
> to investigate while SQL access isn't yet possible. There's also the
> problem that not all possibly interesting processes use the sigusr1
> signal multiplexing.

Well, you can't just have the signal handler call MemoryContextStats
directly. (Even if the memory manager's state were 100% interrupt-safe,
which it ain't, fprintf itself might not be safe either.)

The closest approximation that I think would be reasonable is to
set a flag that would be noticed by the next CHECK_FOR_INTERRUPTS
macro. So you're already buying into the assumption that the process
executes CHECK_FOR_INTERRUPTS fairly often. Which probably means
that assuming it's using the standard sigusr1 handler isn't a big
extra limitation.

regards, tom lane

#7 Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#6)
Re: Use a signal to trigger a memory context dump?

On 2014-06-23 10:07:36 -0700, Tom Lane wrote:

> Andres Freund <andres@2ndquadrant.com> writes:
>
>> I wonder if it'd make sense to allow a signal to trigger a memory
>> context dump? I and others more than once had the need to examine memory
>> usage on production systems and using gdb isn't always realistic.
>> I wonder if we could install a signal handler for some unused signal
>> (e.g. SIGPWR) to dump memory.
>> I'd also considered adding a SQL function that uses the SIGUSR1 signal
>> multiplexing for the purpose but that's not necessarily nice if you have
>> to investigate while SQL access isn't yet possible. There's also the
>> problem that not all possibly interesting processes use the sigusr1
>> signal multiplexing.
>
> Well, you can't just have the signal handler call MemoryContextStats
> directly. (Even if the memory manager's state were 100% interrupt-safe,
> which it ain't, fprintf itself might not be safe either.)

Yea. And fprintf() definitely isn't.

> The closest approximation that I think would be reasonable is to
> set a flag that would be noticed by the next CHECK_FOR_INTERRUPTS
> macro. So you're already buying into the assumption that the process
> executes CHECK_FOR_INTERRUPTS fairly often. Which probably means
> that assuming it's using the standard sigusr1 handler isn't a big
> extra limitation.

There seem to be far more subsystems doing CHECK_FOR_INTERRUPTS than
using SIGUSR1 multiplexing. Several processes have their own SIGUSR1
handlers:
* bgworkers (Which certainly is a major candidate for this. And: Isn't this a bug?
Think recovery conflicts.)
* startup process (certainly interesting as well)
* checkpointer
* walreceiver
* walsender
* wal writer
* bgwriter
* archiver
* syslogger

At least bgworkers, startup process, walsenders are definitely
interesting from this POV.

It very well might be best to provide a common sigusr1 implementation
supporting a subset of multiplexing for some of those since they
essentially all do the same... Although that'd require a fair bit of
surgery in procsignal.c

Greetings,

Andres Freund

#8 Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#7)
Re: Use a signal to trigger a memory context dump?

+1 for having an API better than GDB to make a process emit a memory usage
dump. This is my top non-crash cause for use of GDB in production.

On Mon, Jun 23, 2014 at 07:21:22PM +0200, Andres Freund wrote:

> On 2014-06-23 10:07:36 -0700, Tom Lane wrote:
>
>> Andres Freund <andres@2ndquadrant.com> writes:
>
>>> I wonder if it'd make sense to allow a signal to trigger a memory
>>> context dump? I and others more than once had the need to examine memory
>>> usage on production systems and using gdb isn't always realistic.
>>> I wonder if we could install a signal handler for some unused signal
>>> (e.g. SIGPWR) to dump memory.

SIGPWR is not widely available. Apart from SIGUSR1 and SIGUSR2, using a
portable signal risks colliding with the standard use thereof.

>>> I'd also considered adding a SQL function that uses the SIGUSR1 signal
>>> multiplexing for the purpose but that's not necessarily nice if you have
>>> to investigate while SQL access isn't yet possible. There's also the
>>> problem that not all possibly interesting processes use the sigusr1
>>> signal multiplexing.

I don't know whether to be interested in cases where SQL access is
unavailable. If those cases are important, an idea for achieving it without
leaning on unportable or already-used signals is to define SIGUSR2 as a second
multiplexer that uses files instead of shared memory. You'd send the signal
with something like this:

    : >"$PGDATA/procsig/$targetpid.memdump"
    kill -USR2 "$targetpid"

(This would probably require first converting the existing autovacuum use of
SIGUSR2 to the shared memory procsig mechanism.)
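
On the receiving side, the file-based multiplexer idea could look roughly like the sketch below. It is only an illustration under stated assumptions: procsig_path and procsig_consume are hypothetical helpers, the "procsig" directory layout is taken from the shell example above, and the code is meant to run at a safe point after a SIGUSR2 handler has set a flag, never inside the handler itself.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: builds "<datadir>/procsig/<pid>.<reason>".
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
static int
procsig_path(char *buf, size_t buflen, const char *datadir,
             pid_t pid, const char *reason)
{
    int n = snprintf(buf, buflen, "%s/procsig/%ld.%s",
                     datadir, (long) pid, reason);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/*
 * Hypothetical helper: checks whether a request file for this pid and
 * reason exists and removes it. unlink() does both in one step, so a
 * request is consumed at most once. Returns 1 if a request was pending,
 * 0 otherwise.
 */
static int
procsig_consume(const char *datadir, pid_t pid, const char *reason)
{
    char path[1024];

    if (procsig_path(path, sizeof(path), datadir, pid, reason) != 0)
        return 0;
    return unlink(path) == 0;
}
```

A process would call something like procsig_consume(datadir, getpid(), "memdump") once it notices the SIGUSR2 flag, and emit the memory dump if the call returns 1.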

>> The closest approximation that I think would be reasonable is to
>> set a flag that would be noticed by the next CHECK_FOR_INTERRUPTS
>> macro. So you're already buying into the assumption that the process
>> executes CHECK_FOR_INTERRUPTS fairly often. Which probably means
>> that assuming it's using the standard sigusr1 handler isn't a big
>> extra limitation.

If it's acceptable to require SQL access and exclude would-be target processes
that detach from shared memory, I favor an approach using the shared memory
SIGUSR1 multiplexer. Bringing all the processes that do use shared memory
into agreement about the use of SIGUSR1 feels like a valuable step forward.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
