bgworker crashed or not?

Started by Antonin Houskaover 12 years ago36 messageshackers

ah@cybertec.at

over 12 years ago

In 9.3 I noticed that postmaster considers bgworker crashed (and
therefore tries to restart it) even if it has exited with zero status code.

I first thought about a patch like the one below, but then noticed that
postmaster.c:bgworker_quickdie() signal handler exits with 0 too (when
there's no success). Do we need my patch, my patch + <something for the
handler> or no patch at all?

// Antonin Houska (Tony)

diff --git a/src/backend/postmaster/postmaster.c
b/src/backend/postmaster/postmaster.c
index 0957e91..0313fd7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2791,11 +2814,7 @@ reaper(SIGNAL_ARGS)

/* Was it one of our background workers? */
if (CleanupBackgroundWorker(pid, exitstatus))
- {
- /* have it be restarted */
- HaveCrashedWorker = true;
continue;
- }

/*
* Else do standard backend child cleanup.
@@ -2851,7 +2870,10 @@ CleanupBackgroundWorker(int pid,

                /* Delay restarting any bgworker that exits with a
nonzero status. */
                if (!EXIT_STATUS_0(exitstatus))
+               {
                        rw->rw_crashed_at = GetCurrentTimestamp();
+                       HaveCrashedWorker = true;
+               }
                else
                        rw->rw_crashed_at = 0;

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 12 years ago

In reply to: Antonin Houska (#1)

Re: bgworker crashed or not?

On Fri, Jan 31, 2014 at 12:44 PM, Antonin Houska
<antonin.houska@gmail.com> wrote:

In 9.3 I noticed that postmaster considers bgworker crashed (and
therefore tries to restart it) even if it has exited with zero status code.

I first thought about a patch like the one below, but then noticed that
postmaster.c:bgworker_quickdie() signal handler exits with 0 too (when
there's no success). Do we need my patch, my patch + <something for the
handler> or no patch at all?

I think the word "crashed" here doesn't actually mean "crashed". If a
worker dies with an exit status of other than 0 or 1, we call
HandleChildCrash(), which effects a system-wide restart. But if it
exits with code 0, we don't do that. We just set HaveCrashedWorker so
that it gets restarted immediately, and that's it. In other words,
exit(0) in a bgworker causes an immediate relaunch, and exit(1) causes
the restart interval to be respected, and exit(other) causes a
system-wide crash-and-restart cycle.

This is admittedly a weird API, and we've had some discussion of
whether to change it, but I don't know that we've reached any final
conclusion. I'm tempted to propose exactly inverting the current
meaning of exit(0). That is, make it mean "don't restart me, ever,
even if I have a restart interval configured" rather than "restart me
right away, even if I have a restart interval configured". That way,
a background process that wants to run until it accomplishes some task
could be written to exit(1) on error and exit(0) on success, which
seems quite natural.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Robert Haas (#2)

Re: bgworker crashed or not?

Robert Haas <robertmhaas@gmail.com> writes:

This is admittedly a weird API, and we've had some discussion of
whether to change it, but I don't know that we've reached any final
conclusion. I'm tempted to propose exactly inverting the current
meaning of exit(0). That is, make it mean "don't restart me, ever,
even if I have a restart interval configured" rather than "restart me
right away, even if I have a restart interval configured". That way,
a background process that wants to run until it accomplishes some task
could be written to exit(1) on error and exit(0) on success, which
seems quite natural.

So
exit(0) - done, permanently
exit(1) - done until restart interval
exit(other) - crash
and there's no way to obtain the "restart immediately" behavior?

I think this is an improvement, but it probably depends on what
you think the use-cases are for bgworkers. I can definitely see
that there is a need for a bgworker to be just plain done, though.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 12 years ago

In reply to: Tom Lane (#3)

Re: bgworker crashed or not?

On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So
exit(0) - done, permanently
exit(1) - done until restart interval
exit(other) - crash
and there's no way to obtain the "restart immediately" behavior?

That's what I was thinking about, yes. Of course, when the restart
interval is 0, "done until restart interval" is the same as "restart
immediately", so for anyone who wants to *always* restart immediately
there is no problem. Where you will run into trouble is if you
sometimes want to wait for the restart interval and other times want
to restart immediately. But I'm not sure that's a real use case. If
it is, I suggest that we assign it some other, more obscure exit code
and reserve 0 and 1 for what I believe will probably be the common
cases.

It would be potentially more useful and more general to have a
function BackgroundWorkerSetMyRestartInterval() or similar. That way,
a background worker could choose to get restarted after whatever
amount of time it likes in a particular instance, rather than only 0
or the interval configured at registration time. But I'm not that
excited about implementing that, and certainly not for 9.4. It's not
clear that there's a real need, and there are thorny questions like
"should a postmaster crash and restart cycle cause an immediate
restart?" and "what if some other process wants to poke that
background worker and cause it to restart sooner?". There are lots of
interesting and potentially useful behaviors here, but I'm wary of
letting the engineering getting out ahead of the use cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Robert Haas (#4)

Re: bgworker crashed or not?

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So
exit(0) - done, permanently
exit(1) - done until restart interval
exit(other) - crash
and there's no way to obtain the "restart immediately" behavior?

That's what I was thinking about, yes. Of course, when the restart
interval is 0, "done until restart interval" is the same as "restart
immediately", so for anyone who wants to *always* restart immediately
there is no problem. Where you will run into trouble is if you
sometimes want to wait for the restart interval and other times want
to restart immediately. But I'm not sure that's a real use case. If
it is, I suggest that we assign it some other, more obscure exit code
and reserve 0 and 1 for what I believe will probably be the common
cases.

Agreed, but after further reflection it seems like if you've declared
a restart interval, then "done until restart interval" is probably the
common case. So how about

exit(0) - done until restart interval, or permanently if there is none
exit(1) - done permanently, even if a restart interval was declared
exit(other) - crash

I don't offhand see a point in an "exit and restart immediately" case.
Why exit at all, if you could just keep running? If you *can't* just
keep running, it probably means you know you've bollixed something,
so that the crash case is probably what to do anyway.

It would be potentially more useful and more general to have a
function BackgroundWorkerSetMyRestartInterval() or similar.

That might be a good idea too, but I think it's orthogonal to what
the exit codes mean.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

over 12 years ago

In reply to: Tom Lane (#5)

Re: bgworker crashed or not?

On Mon, Feb 3, 2014 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So
exit(0) - done, permanently
exit(1) - done until restart interval
exit(other) - crash
and there's no way to obtain the "restart immediately" behavior?

That's what I was thinking about, yes. Of course, when the restart
interval is 0, "done until restart interval" is the same as "restart
immediately", so for anyone who wants to *always* restart immediately
there is no problem. Where you will run into trouble is if you
sometimes want to wait for the restart interval and other times want
to restart immediately. But I'm not sure that's a real use case. If
it is, I suggest that we assign it some other, more obscure exit code
and reserve 0 and 1 for what I believe will probably be the common
cases.

Agreed, but after further reflection it seems like if you've declared
a restart interval, then "done until restart interval" is probably the
common case. So how about

exit(0) - done until restart interval, or permanently if there is none
exit(1) - done permanently, even if a restart interval was declared
exit(other) - crash

I think what I proposed is better for two reasons:

1. It doesn't change the meaning of exit(1) vs. 9.3. All background
worker code I've seen or heard about (which is admittedly not all
there is) does exit(1) because the current exit(0) behavior doesn't
seem to be what anyone wants. So I don't think it matters much
whether we redefine exit(0), but redefining exit(1) will require
version-dependent changes to the background workers in contrib and,
most likely, every other background worker anyone has written.

2. If you want a worker that starts up, does something, and doesn't
need to be further restarted when that succeeds, then under your
definition you return 0 when you fail and 1 when you succeed. Under
my definition you do the opposite, which I think is more natural.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Robert Haas (#6)

Re: bgworker crashed or not?

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Feb 3, 2014 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Agreed, but after further reflection it seems like if you've declared
a restart interval, then "done until restart interval" is probably the
common case. So how about ...

I think what I proposed is better for two reasons:

1. It doesn't change the meaning of exit(1) vs. 9.3. All background
worker code I've seen or heard about (which is admittedly not all
there is) does exit(1) because the current exit(0) behavior doesn't
seem to be what anyone wants.

Hm. If that's actually the case, then I agree that preserving the
current behavior of exit(1) is useful. I'd been assuming we were
breaking things anyway.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

about 12 years ago

In reply to: Tom Lane (#5)

Re: bgworker crashed or not?

On 2014-02-03 11:29:22 -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So
exit(0) - done, permanently
exit(1) - done until restart interval
exit(other) - crash
and there's no way to obtain the "restart immediately" behavior?

That's what I was thinking about, yes. Of course, when the restart
interval is 0, "done until restart interval" is the same as "restart
immediately", so for anyone who wants to *always* restart immediately
there is no problem. Where you will run into trouble is if you
sometimes want to wait for the restart interval and other times want
to restart immediately. But I'm not sure that's a real use case. If
it is, I suggest that we assign it some other, more obscure exit code
and reserve 0 and 1 for what I believe will probably be the common
cases.

Agreed, but after further reflection it seems like if you've declared
a restart interval, then "done until restart interval" is probably the
common case. So how about

exit(0) - done until restart interval, or permanently if there is none
exit(1) - done permanently, even if a restart interval was declared
exit(other) - crash

I don't offhand see a point in an "exit and restart immediately" case.
Why exit at all, if you could just keep running? If you *can't* just
keep running, it probably means you know you've bollixed something,
so that the crash case is probably what to do anyway.

There's the case where you want to quickly go over all the databases,
but only use one bgworker for it. I don't think there's another way to
do that.

I think we really should bite the bullet and change this before 9.4
comes out. The current static bgworker facility has only been out there
for one release, and dynamic bgworkers aren't released yet at all. If we
wait with this for 9.5, we'll annoy many more people.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

petr@2ndquadrant.com

about 12 years ago

In reply to: Andres Freund (#8)

Re: bgworker crashed or not?

On 16/04/14 15:10, Andres Freund wrote:

I think we really should bite the bullet and change this before 9.4
comes out. The current static bgworker facility has only been out there
for one release, and dynamic bgworkers aren't released yet at all. If we
wait with this for 9.5, we'll annoy many more people.

+1

On 2014-02-03 11:29:22 -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So
exit(0) - done, permanently
exit(1) - done until restart interval
exit(other) - crash
and there's no way to obtain the "restart immediately" behavior?

Also I think if we do it this way, the incompatibility impact is rather
small for most existing bgworkers, like Robert I haven't seen anybody
actually using the exit code 0 currently - I am sure somebody does, but
it seems to be very small minority.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

about 12 years ago

In reply to: Andres Freund (#8)

Re: bgworker crashed or not?

On Wed, Apr 16, 2014 at 9:10 AM, Andres Freund <andres@2ndquadrant.com> wrote:

Agreed, but after further reflection it seems like if you've declared
a restart interval, then "done until restart interval" is probably the
common case. So how about

exit(0) - done until restart interval, or permanently if there is none
exit(1) - done permanently, even if a restart interval was declared
exit(other) - crash

I don't offhand see a point in an "exit and restart immediately" case.
Why exit at all, if you could just keep running? If you *can't* just
keep running, it probably means you know you've bollixed something,
so that the crash case is probably what to do anyway.

There's the case where you want to quickly go over all the databases,
but only use one bgworker for it. I don't think there's another way to
do that.

I think we really should bite the bullet and change this before 9.4
comes out. The current static bgworker facility has only been out there
for one release, and dynamic bgworkers aren't released yet at all. If we
wait with this for 9.5, we'll annoy many more people.

So, exactly what do you want to change? If you want to keep the
restart-immediately behavior, then that argues for NOT changing
exit(0).

I actually think the right answer here might be to give background
workers a way to change their configured restart interval. We've
already got a shared memory area that the postmaster reads to know how
what to do when starting a dynamic background worker, so it probably
wouldn't be that hard. But I'm not volunteering to undertake that on
April 16th.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

about 12 years ago

In reply to: Robert Haas (#10)

Re: bgworker crashed or not?

On 2014-04-16 10:37:12 -0400, Robert Haas wrote:

On Wed, Apr 16, 2014 at 9:10 AM, Andres Freund <andres@2ndquadrant.com> wrote:

Agreed, but after further reflection it seems like if you've declared
a restart interval, then "done until restart interval" is probably the
common case. So how about

exit(0) - done until restart interval, or permanently if there is none
exit(1) - done permanently, even if a restart interval was declared
exit(other) - crash

I don't offhand see a point in an "exit and restart immediately" case.
Why exit at all, if you could just keep running? If you *can't* just
keep running, it probably means you know you've bollixed something,
so that the crash case is probably what to do anyway.

There's the case where you want to quickly go over all the databases,
but only use one bgworker for it. I don't think there's another way to
do that.

I think we really should bite the bullet and change this before 9.4
comes out. The current static bgworker facility has only been out there
for one release, and dynamic bgworkers aren't released yet at all. If we
wait with this for 9.5, we'll annoy many more people.

So, exactly what do you want to change? If you want to keep the
restart-immediately behavior, then that argues for NOT changing
exit(0).

I don't neccessarily want to keep that behaviour. It can be emulated
easily enough by setting the restart interval to 0. I just wanted to
explain that it's not completely unreasonable to want such a short
restart interval.

I actually think the right answer here might be to give background
workers a way to change their configured restart interval. We've
already got a shared memory area that the postmaster reads to know how
what to do when starting a dynamic background worker, so it probably
wouldn't be that hard. But I'm not volunteering to undertake that on
April 16th.

That seems like a separate feature - and I don't see a need to rush that
into 9.4. All I want that if we decide to change the API, we should do
it now.

What I dislike about the status quo is that the
1) The regular exit(0) behaves all but regular.
2) The only way to regularly exit is logged as a failure. Thus there's
no real way to flag a failure that's actually a failure.

So I think your proposal above already is an improvement upon the status
quo. Maybe we also should change the logging of bgworker exits.

I think we probably also need a way to exit that's treated as an error,
but doesn't lead to a PANIC restart.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

about 12 years ago

In reply to: Andres Freund (#11)

Re: bgworker crashed or not?

On Wed, Apr 16, 2014 at 11:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:

I actually think the right answer here might be to give background
workers a way to change their configured restart interval. We've
already got a shared memory area that the postmaster reads to know how
what to do when starting a dynamic background worker, so it probably
wouldn't be that hard. But I'm not volunteering to undertake that on
April 16th.

That seems like a separate feature - and I don't see a need to rush that
into 9.4. All I want that if we decide to change the API, we should do
it now.

Well, I'm pretty OK with that, since I proposed it before and still
think it's important, but I'd rather leave it alone for another
release than rush into something we'll just end up changing again.

What I dislike about the status quo is that the
1) The regular exit(0) behaves all but regular.
2) The only way to regularly exit is logged as a failure. Thus there's
no real way to flag a failure that's actually a failure.

So I think your proposal above already is an improvement upon the status
quo.

I don't have time to write a patch for that, but I'm OK with
committing it (or having someone else commit it) even at this late
date, unless someone objects.

Maybe we also should change the logging of bgworker exits.

It's pretty clear that the logging around bgworkers is way too verbose
for anything other than a long-running daemon, but I don't think we
should try to fix that problem right now. It needs more careful
thought than we have time to give it at this juncture. One idea might
be to let the registrant of the background worker specify a logging
level, or maybe just a flag bit to indicate verbose (LOG) or quiet
(say, DEBUG1 or DEBUG2) logging.

I think we probably also need a way to exit that's treated as an error,
but doesn't lead to a PANIC restart.

Why can't that be handled through ereport(ERROR/FATAL) rather than
through the choice of exit status? It seems to me that the only point
of the exit status is or should be to provide feedback to the
postmaster on how it should respond to the background worker's
untimely demise. If any other information needs to be conveyed, the
worker should log that itself rather than trying to tell the
postmaster what to log.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

about 12 years ago

In reply to: Robert Haas (#12)

Re: bgworker crashed or not?

On 2014-04-16 11:37:47 -0400, Robert Haas wrote:

I think we probably also need a way to exit that's treated as an error,
but doesn't lead to a PANIC restart.

Why can't that be handled through ereport(ERROR/FATAL) rather than
through the choice of exit status? It seems to me that the only point
of the exit status is or should be to provide feedback to the
postmaster on how it should respond to the background worker's
untimely demise. If any other information needs to be conveyed, the
worker should log that itself rather than trying to tell the
postmaster what to log.

I dislike that because it essentially requires the bgworker to have a
full error catching environment like PostgresMain() has. That seems
bad for many cases.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Andres Freund (#13)

Re: bgworker crashed or not?

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-04-16 11:37:47 -0400, Robert Haas wrote:

Why can't that be handled through ereport(ERROR/FATAL) rather than
through the choice of exit status?

I dislike that because it essentially requires the bgworker to have a
full error catching environment like PostgresMain() has. That seems
bad for many cases.

That sounds like utter nonsense. No sane bgwriter code is going to be
able to discount the possibility of something throwing an elog(ERROR).
Now, you might not need the ability to do a transaction abort, but
you'll have to at least be able to terminate the process cleanly.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

about 12 years ago

In reply to: Tom Lane (#14)

Re: bgworker crashed or not?

On 2014-04-16 11:54:25 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-04-16 11:37:47 -0400, Robert Haas wrote:

Why can't that be handled through ereport(ERROR/FATAL) rather than
through the choice of exit status?

I dislike that because it essentially requires the bgworker to have a
full error catching environment like PostgresMain() has. That seems
bad for many cases.

That sounds like utter nonsense. No sane bgwriter code is going to be
able to discount the possibility of something throwing an elog(ERROR).
Now, you might not need the ability to do a transaction abort, but
you'll have to at least be able to terminate the process cleanly.

Why? Throwing an error without an exception stack seems to work
sensibly? The error is promoted to FATAL and the normal FATAL handling
is performed? C.f. pg_re_throw() called via errfinish() in the ERRROR
case.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

about 12 years ago

In reply to: Andres Freund (#15)

Re: bgworker crashed or not?

On Wed, Apr 16, 2014 at 12:00 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-04-16 11:54:25 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-04-16 11:37:47 -0400, Robert Haas wrote:

Why can't that be handled through ereport(ERROR/FATAL) rather than
through the choice of exit status?

I dislike that because it essentially requires the bgworker to have a
full error catching environment like PostgresMain() has. That seems
bad for many cases.

That sounds like utter nonsense. No sane bgwriter code is going to be
able to discount the possibility of something throwing an elog(ERROR).
Now, you might not need the ability to do a transaction abort, but
you'll have to at least be able to terminate the process cleanly.

Why? Throwing an error without an exception stack seems to work
sensibly? The error is promoted to FATAL and the normal FATAL handling
is performed? C.f. pg_re_throw() called via errfinish() in the ERRROR
case.

And... so what's the problem? You seemed to be saying that the
background worker would need to a more developed error-handling
environment in order to do proper logging, but here you're saying
(rightly, I believe) that it doesn't. Even if it did, though, I think
the right solution is to install one, not make it the postmaster's job
to try to read the tea leaves in the worker's exit code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

about 12 years ago

In reply to: Robert Haas (#16)

Re: bgworker crashed or not?

On 2014-04-16 12:04:26 -0400, Robert Haas wrote:

And... so what's the problem? You seemed to be saying that the
background worker would need to a more developed error-handling
environment in order to do proper logging, but here you're saying
(rightly, I believe) that it doesn't. Even if it did, though, I think
the right solution is to install one, not make it the postmaster's job
to try to read the tea leaves in the worker's exit code.

Well, currently it will log the message that has been thrown, that might
lack context. LogChildExit() already has code to print activity of the
bgworker after it crashed.

Note that a FATAL error will always use a exit(1) - so we better not
make that a special case in the code :/.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

about 12 years ago

In reply to: Andres Freund (#17)

Re: bgworker crashed or not?

On Wed, Apr 16, 2014 at 12:11 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-04-16 12:04:26 -0400, Robert Haas wrote:

And... so what's the problem? You seemed to be saying that the
background worker would need to a more developed error-handling
environment in order to do proper logging, but here you're saying
(rightly, I believe) that it doesn't. Even if it did, though, I think
the right solution is to install one, not make it the postmaster's job
to try to read the tea leaves in the worker's exit code.

Well, currently it will log the message that has been thrown, that might
lack context. LogChildExit() already has code to print activity of the
bgworker after it crashed.

I'm still not seeing the problem. It's the background worker's job to
make sure that the right stuff gets logged, just as it would be for
any other backend. Trying to bolt some portion of the responsibility
for that onto the postmaster is 100% wrong.

Note that a FATAL error will always use a exit(1) - so we better not
make that a special case in the code :/.

Well, everything's a special case, in that every error code has (and
should have) its own meaning. But the handling we give to exit code 1
is not obviously incompatible with what you probably want to have
happen after a FATAL.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

about 12 years ago

In reply to: Robert Haas (#18)

Re: bgworker crashed or not?

On 2014-04-16 12:20:01 -0400, Robert Haas wrote:

On Wed, Apr 16, 2014 at 12:11 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-04-16 12:04:26 -0400, Robert Haas wrote:

And... so what's the problem? You seemed to be saying that the
background worker would need to a more developed error-handling
environment in order to do proper logging, but here you're saying
(rightly, I believe) that it doesn't. Even if it did, though, I think
the right solution is to install one, not make it the postmaster's job
to try to read the tea leaves in the worker's exit code.

Well, currently it will log the message that has been thrown, that might
lack context. LogChildExit() already has code to print activity of the
bgworker after it crashed.

I'm still not seeing the problem. It's the background worker's job to
make sure that the right stuff gets logged, just as it would be for
any other backend. Trying to bolt some portion of the responsibility
for that onto the postmaster is 100% wrong.

Well, it already has taken on that responsibility, it's not my idea to
add it. I merely want to control more precisely what happens.

Note that a FATAL error will always use a exit(1) - so we better not
make that a special case in the code :/.

Well, everything's a special case, in that every error code has (and
should have) its own meaning. But the handling we give to exit code 1
is not obviously incompatible with what you probably want to have
happen after a FATAL.

That was essentially directed at Tom's proposal in
/messages/by-id/32224.1391444962@sss.pgh.pa.us - I
don't think that'd be compatible with FATAL errors.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

robertmhaas@gmail.com

about 12 years ago

In reply to: Andres Freund (#19)

Re: bgworker crashed or not?

On Wed, Apr 16, 2014 at 12:28 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-04-16 12:20:01 -0400, Robert Haas wrote:

On Wed, Apr 16, 2014 at 12:11 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-04-16 12:04:26 -0400, Robert Haas wrote:

And... so what's the problem? You seemed to be saying that the
background worker would need to a more developed error-handling
environment in order to do proper logging, but here you're saying
(rightly, I believe) that it doesn't. Even if it did, though, I think
the right solution is to install one, not make it the postmaster's job
to try to read the tea leaves in the worker's exit code.

Well, currently it will log the message that has been thrown, that might
lack context. LogChildExit() already has code to print activity of the
bgworker after it crashed.

I'm still not seeing the problem. It's the background worker's job to
make sure that the right stuff gets logged, just as it would be for
any other backend. Trying to bolt some portion of the responsibility
for that onto the postmaster is 100% wrong.

Well, it already has taken on that responsibility, it's not my idea to
add it. I merely want to control more precisely what happens.s

I think that's doubling down on an already-questionable design principle.

Or if I may be permitted a more colloquial idiom:

Luke, it's a trap.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

petr@2ndquadrant.com

about 12 years ago

In reply to: Robert Haas (#20)

craig@2ndquadrant.com

about 12 years ago

In reply to: Petr Jelinek (#21)

craig@2ndquadrant.com

about 12 years ago

In reply to: Craig Ringer (#22)

petr@2ndquadrant.com

about 12 years ago

In reply to: Craig Ringer (#23)

robertmhaas@gmail.com

about 12 years ago

In reply to: Craig Ringer (#23)

petr@2ndquadrant.com

about 12 years ago

In reply to: Robert Haas (#25)

robertmhaas@gmail.com

about 12 years ago

In reply to: Petr Jelinek (#26)

petr@2ndquadrant.com

about 12 years ago

In reply to: Robert Haas (#27)

robertmhaas@gmail.com

about 12 years ago

In reply to: Petr Jelinek (#28)

petr@2ndquadrant.com

about 12 years ago

In reply to: Robert Haas (#29)

petr@2ndquadrant.com

about 12 years ago

In reply to: Petr Jelinek (#30)

robertmhaas@gmail.com

about 12 years ago

In reply to: Petr Jelinek (#30)

petr@2ndquadrant.com

about 12 years ago

In reply to: Robert Haas (#32)

robertmhaas@gmail.com

about 12 years ago

In reply to: Petr Jelinek (#33)

petr@2ndquadrant.com

about 12 years ago

In reply to: Robert Haas (#34)

craig@2ndquadrant.com

about 12 years ago

In reply to: Robert Haas (#34)