Idea: closing the loop for "pg_ctl reload"

Started by Tom Laneabout 11 years ago53 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

Bug #12788 reminded me of a problem I think we've discussed before:
if you use "pg_ctl reload" to trigger reload of the postmaster's
config files, and there's something wrong with those files, there's
no warning to you of that. The postmaster just bleats to its log and
keeps running. If you don't think to look in the log to confirm
successful reload, you're left with a time bomb: the next attempt
to restart the postmaster will fail altogether because of the bad
config file.

I commented in the bug thread that there wasn't much that pg_ctl
could do about this, but on reflection it seems like something we
could fix, because pg_ctl must be able to read the postmaster.pid
file in order to issue a reload signal. Consider a design like this:

1. Extend the definition of the postmaster.pid file to add another
line, which will contain the time of the last postmaster configuration
load attempt (might as well be a numeric Unix-style timestamp) and
a boolean indication of whether that attempt succeeded or not.

2. Change pg_ctl so that after sending a reload signal, it sleeps
for a second and checks for a change in the config load timestamp
(repeat as necessary till timeout). Once it sees the timestamp
change, it's in a position to report success or failure using the
boolean. I think this should become the default behavior, though
you could define -W to mean that there should be no wait or feedback.

It's tempting to think of storing a whole error message rather than
just a boolean, but I think that would complicate the pidfile definition
undesirably. A warning to look in the postmaster log ought to be
sufficient here.

For extra credit, the pg_reload_conf() function could be made to behave
similarly.

I don't have the time to pursue this idea myself, but perhaps someone
looking for a not-too-complicated project could take it on.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Jan de Visser
jan@de-visser.net
In reply to: Tom Lane (#1)
Re: Idea: closing the loop for "pg_ctl reload"

On February 19, 2015 08:26:45 PM Tom Lane wrote:

I don't have the time to pursue this idea myself, but perhaps someone
looking for a not-too-complicated project could take it on.

I can have a crack at this. What's the process? Do I add it to a CF once I
have a patch, or do I do that beforehand?

jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan de Visser (#2)
Re: Idea: closing the loop for "pg_ctl reload"

Jan de Visser <jan@de-visser.net> writes:

I can have a crack at this. What's the process? Do I add it to a CF once I
have a patch, or do I do that beforehand?

The CF process is for reviewing things, so until you have either a patch
or a design sketch you want feedback on, there's no need for a CF entry.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Jan de Visser
jan@de-visser.net
In reply to: Tom Lane (#1)
Re: Idea: closing the loop for "pg_ctl reload"

On February 19, 2015 08:26:45 PM Tom Lane wrote:

Bug #12788 reminded me of a problem I think we've discussed before:
if you use "pg_ctl reload" to trigger reload of the postmaster's
config files, and there's something wrong with those files, there's
no warning to you of that. The postmaster just bleats to its log and
keeps running. If you don't think to look in the log to confirm
successful reload, you're left with a time bomb: the next attempt
to restart the postmaster will fail altogether because of the bad
config file.

I commented in the bug thread that there wasn't much that pg_ctl
could do about this, but on reflection it seems like something we
could fix, because pg_ctl must be able to read the postmaster.pid
file in order to issue a reload signal. Consider a design like this:

1. Extend the definition of the postmaster.pid file to add another
line, which will contain the time of the last postmaster configuration
load attempt (might as well be a numeric Unix-style timestamp) and
a boolean indication of whether that attempt succeeded or not.

2. Change pg_ctl so that after sending a reload signal, it sleeps
for a second and checks for a change in the config load timestamp
(repeat as necessary till timeout). Once it sees the timestamp
change, it's in a position to report success or failure using the
boolean. I think this should become the default behavior, though
you could define -W to mean that there should be no wait or feedback.

It's tempting to think of storing a whole error message rather than
just a boolean, but I think that would complicate the pidfile definition
undesirably. A warning to look in the postmaster log ought to be
sufficient here.

For extra credit, the pg_reload_conf() function could be made to behave
similarly.

I don't have the time to pursue this idea myself, but perhaps someone
looking for a not-too-complicated project could take it on.

regards, tom lane

Here's my first crack at this. Notes:
1/ I haven't done the '-W' flag Tom mentions yet.
2/ Likewise haven't touched pg_reload_conf()
3/ Design details: I introduced a new struct in pg_ctl containing the parsed-
out data from postmaster.pid, with functions to retrieve the data and to
dispose memory allocated for it. This made the change in do_reload fairly
straightforward. I think this struct can be used in other places in pg_ctl as
well, and potentially in other files as well.
4/ Question: Can I assume pg_malloc allocated memory is set to zero? If not,
is it OK to do a memset(..., 0, ...)? I don't have much experience on any of
the esoteric platforms pgsql supports...

jan

Attachments:

Let_pg_ctl_check_the_result_of_a_postmaster_config_reload.patchtext/x-patch; charset=UTF-8; name=Let_pg_ctl_check_the_result_of_a_postmaster_config_reload.patchDownload+189-23
#5Jan de Visser
jan@de-visser.net
In reply to: Jan de Visser (#4)
Re: Idea: closing the loop for "pg_ctl reload"

On March 2, 2015 12:56:23 AM Jan de Visser wrote:
... stuff ...

I seem to have mis-clicked something in the CF app - I created two entries
somehow. I think one got created when I entered the msgid of Tom's original
message with the enclosing '<...>'. If that's the case, then that may be a
bug.

jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Michael Paquier
michael@paquier.xyz
In reply to: Jan de Visser (#5)
Re: Idea: closing the loop for "pg_ctl reload"

On Mon, Mar 2, 2015 at 3:01 PM, Jan de Visser <jan@de-visser.net> wrote:

On March 2, 2015 12:56:23 AM Jan de Visser wrote:
... stuff ...

I seem to have mis-clicked something in the CF app - I created two entries
somehow. I think one got created when I entered the msgid of Tom's original
message with the enclosing '<...>'. If that's the case, then that may be a
bug.

Don't worry for that. I just removed the duplicated entry.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan de Visser (#4)
Re: Idea: closing the loop for "pg_ctl reload"

Jan de Visser <jan@de-visser.net> writes:

4/ Question: Can I assume pg_malloc allocated memory is set to zero? If not,
is it OK to do a memset(..., 0, ...)? I don't have much experience on any of
the esoteric platforms pgsql supports...

No, you need the memset. You might accidentally get already-zeroed memory
from malloc, but it's not something you can rely on.

However, you could and should use pg_malloc0, which takes care of that
for you...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Jan de Visser
jan@de-visser.net
In reply to: Tom Lane (#7)
Re: Idea: closing the loop for "pg_ctl reload"

On March 2, 2015 09:50:49 AM Tom Lane wrote:

However, you could and should use pg_malloc0, which takes care of that
for you...

I am (using pg_malloc, that is). So, just to be sure: pg_malloc memsets the
block to 0, right?

My question was more along the lines if memsetting to 0 to ensure that pointer
fields are NULL and int/long fields are 0. I know they are on Linux, but don't
know if that applies to other platforms as well, or if I need to set fields
explicitly to those 'zero'/'uninitialized' values.

jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan de Visser (#8)
Re: Idea: closing the loop for "pg_ctl reload"

Jan de Visser <jan@de-visser.net> writes:

On March 2, 2015 09:50:49 AM Tom Lane wrote:

However, you could and should use pg_malloc0, which takes care of that
for you...

I am (using pg_malloc, that is). So, just to be sure: pg_malloc memsets the
block to 0, right?

No, it doesn't, but pg_malloc0 does. Consult the code if you're confused:
src/common/fe_memutils.c

My question was more along the lines if memsetting to 0 to ensure that pointer
fields are NULL and int/long fields are 0.

Yes, we do assume that widely, and so does a heck of a lot of other code.
In principle the C standard doesn't require that a NULL pointer be
all-zero-bits, only that casting "0" to a pointer yield a NULL pointer.
But certainly there are no modern implementations that don't represent
NULL as 0. Anybody who tried to do it differently would soon find that
hardly any real-world C code would run on their platform.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Jan de Visser
jan@de-visser.net
In reply to: Tom Lane (#9)
Re: Idea: closing the loop for "pg_ctl reload"

On March 2, 2015 12:44:40 PM Tom Lane wrote:

No, it doesn't, but pg_malloc0 does. Consult the code if you're confused:
src/common/fe_memutils.c

Doh! I read pg_malloc( ), not pg_malloc0.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jan de Visser (#8)
Re: Idea: closing the loop for "pg_ctl reload"

Jan de Visser <jan@de-visser.net> wrote:

On March 2, 2015 09:50:49 AM Tom Lane wrote:

However, you could and should use pg_malloc0, which takes care
of that for you...

I am (using pg_malloc, that is). So, just to be sure: pg_malloc
memsets the block to 0, right?

I think you may have misread a zero character as an empty pair of
parentheses. Tom pointed out that the pg_malloc() function gives
you uninitialized memory -- you cannot count on the contents. He
further pointed out that if you need it to be initialized to '0'
bytes you should call the pg_malloc0() function rather than calling
the pg_malloc() function and running memset separately.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Jan de Visser
jan@de-visser.net
In reply to: Jan de Visser (#4)
Re: Idea: closing the loop for "pg_ctl reload"

On March 2, 2015 12:56:23 AM Jan de Visser wrote:

Here's my first crack at this. Notes:
1/ I haven't done the '-W' flag Tom mentions yet.
2/ Likewise haven't touched pg_reload_conf()
3/ Design details: I introduced a new struct in pg_ctl containing the
parsed- out data from postmaster.pid, with functions to retrieve the data
and to dispose memory allocated for it. This made the change in do_reload
fairly straightforward. I think this struct can be used in other places in
pg_ctl as well, and potentially in other files as well.
4/ Question: Can I assume pg_malloc allocated memory is set to zero? If not,
is it OK to do a memset(..., 0, ...)? I don't have much experience on any
of the esoteric platforms pgsql supports...

Slight tweaks to better deal with pre-9.5 (6?) servers that don't write the
reload timestamp. I tested with a 9.4 server and it seemed to work...

Also implemented -W/-w. Haven't done pg_reload_conf() yet.

Attachments:

Let_pg_ctl_check_the_result_of_a_postmaster_config_reload.patchtext/x-patch; charset=UTF-8; name=Let_pg_ctl_check_the_result_of_a_postmaster_config_reload.patchDownload+216-26
#13Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#1)
Re: Idea: closing the loop for "pg_ctl reload"

On Fri, Feb 20, 2015 at 1:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

1. Extend the definition of the postmaster.pid file to add another
line, which will contain the time of the last postmaster configuration
load attempt (might as well be a numeric Unix-style timestamp) and
a boolean indication of whether that attempt succeeded or not.

Fwiw this concerns me slightly. I'm sure a lot of people are doing
things like "kill -HUP `cat .../postmaster.pid`" or the equivalent.

We do have the external_pid_file GUC so perhaps this just means we
should warn people in the release notes or somewhere and point them at
that GUC.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#13)
Re: Idea: closing the loop for "pg_ctl reload"

On 2015-03-03 15:21:24 +0000, Greg Stark wrote:

Fwiw this concerns me slightly. I'm sure a lot of people are doing
things like "kill -HUP `cat .../postmaster.pid`" or the equivalent.

postmaster.pid already contains considerably more than just the pid. e.g.
4071
/srv/dev/pgdev-master
1425396089
5440
/tmp
localhost
5440001 82345992

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#14)
Re: Idea: closing the loop for "pg_ctl reload"

Andres Freund <andres@2ndquadrant.com> writes:

On 2015-03-03 15:21:24 +0000, Greg Stark wrote:

Fwiw this concerns me slightly. I'm sure a lot of people are doing
things like "kill -HUP `cat .../postmaster.pid`" or the equivalent.

postmaster.pid already contains considerably more than just the pid. e.g.

Yeah, that ship sailed long ago. I'm sure anyone who's doing this is
using "head -1" not just "cat", and what I suggest wouldn't break that.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andres Freund (#14)
Re: Idea: closing the loop for "pg_ctl reload"

On 3/3/15 9:26 AM, Andres Freund wrote:

On 2015-03-03 15:21:24 +0000, Greg Stark wrote:

Fwiw this concerns me slightly. I'm sure a lot of people are doing
things like "kill -HUP `cat .../postmaster.pid`" or the equivalent.

postmaster.pid already contains considerably more than just the pid. e.g.
4071
/srv/dev/pgdev-master
1425396089
5440
/tmp
localhost
5440001 82345992

If we already have all this extra stuff, why not include an actual error
message then, or at least the first line of an error (or maybe just swap
any newlines with spaces)?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Jan de Visser
jan@de-visser.net
In reply to: Tom Lane (#15)
Re: Idea: closing the loop for "pg_ctl reload"

On March 3, 2015 10:29:43 AM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2015-03-03 15:21:24 +0000, Greg Stark wrote:

Fwiw this concerns me slightly. I'm sure a lot of people are doing
things like "kill -HUP `cat .../postmaster.pid`" or the equivalent.

postmaster.pid already contains considerably more than just the pid. e.g.

Yeah, that ship sailed long ago. I'm sure anyone who's doing this is
using "head -1" not just "cat", and what I suggest wouldn't break that.

regards, tom lane

And what I've implemented doesn't either. The extra line is added to the back
of the file.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Jan de Visser
jan@de-visser.net
In reply to: Jim Nasby (#16)
Re: Idea: closing the loop for "pg_ctl reload"

On March 3, 2015 11:09:29 AM Jim Nasby wrote:

On 3/3/15 9:26 AM, Andres Freund wrote:

On 2015-03-03 15:21:24 +0000, Greg Stark wrote:

Fwiw this concerns me slightly. I'm sure a lot of people are doing
things like "kill -HUP `cat .../postmaster.pid`" or the equivalent.

postmaster.pid already contains considerably more than just the pid. e.g.
4071
/srv/dev/pgdev-master
1425396089
5440
/tmp
localhost

5440001 82345992

If we already have all this extra stuff, why not include an actual error
message then, or at least the first line of an error (or maybe just swap
any newlines with spaces)?

Not impossible. I can play around with that and see if it's as straightforward
as I think it is.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jan de Visser (#18)
Re: Idea: closing the loop for "pg_ctl reload"

On 3/3/15 11:15 AM, Jan de Visser wrote:

On March 3, 2015 11:09:29 AM Jim Nasby wrote:

On 3/3/15 9:26 AM, Andres Freund wrote:

On 2015-03-03 15:21:24 +0000, Greg Stark wrote:

Fwiw this concerns me slightly. I'm sure a lot of people are doing
things like "kill -HUP `cat .../postmaster.pid`" or the equivalent.

postmaster.pid already contains considerably more than just the pid. e.g.
4071
/srv/dev/pgdev-master
1425396089
5440
/tmp
localhost

5440001 82345992

If we already have all this extra stuff, why not include an actual error
message then, or at least the first line of an error (or maybe just swap
any newlines with spaces)?

Not impossible. I can play around with that and see if it's as straightforward
as I think it is.

I'm sure the code side of this is trivial; it's a question of why Tom
was objecting. It would probably be better for us to come to a
conclusion before working on this.

On a related note... something else we could do here would be to keep a
last-known-good copy of the config files around. That way if you flubbed
something at least the server would still start. I do think that any
admin worth anything would notice an error from pg_ctl, but maybe others
have a different opinion.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Andres Freund
andres@anarazel.de
In reply to: Jim Nasby (#16)
Re: Idea: closing the loop for "pg_ctl reload"

On 2015-03-03 11:09:29 -0600, Jim Nasby wrote:

On 3/3/15 9:26 AM, Andres Freund wrote:

On 2015-03-03 15:21:24 +0000, Greg Stark wrote:

Fwiw this concerns me slightly. I'm sure a lot of people are doing
things like "kill -HUP `cat .../postmaster.pid`" or the equivalent.

postmaster.pid already contains considerably more than just the pid. e.g.
4071
/srv/dev/pgdev-master
1425396089
5440
/tmp
localhost
5440001 82345992

If we already have all this extra stuff, why not include an actual error
message then, or at least the first line of an error (or maybe just swap any
newlines with spaces)?

It's often several errors and such. I think knowing that it failed and
that you should look into the error log is already quite helpful
already.

Generally we obviously need some status to indicate that the config file
has been reloaded, but that could be easily combined with storing the
error message.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andres Freund (#20)
#22Andres Freund
andres@anarazel.de
In reply to: Jim Nasby (#21)
#23Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andres Freund (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#23)
#25Jan de Visser
jan@de-visser.net
In reply to: Jim Nasby (#23)
#26Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#24)
#27Jan de Visser
jan@de-visser.net
In reply to: Jim Nasby (#26)
#28Peter Eisentraut
peter_e@gmx.net
In reply to: Jim Nasby (#26)
#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#28)
#30Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#29)
#31Jan de Visser
jan@de-visser.net
In reply to: Andres Freund (#30)
#32Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jan de Visser (#31)
#33Payal Singh
payal@omniti.com
In reply to: Jim Nasby (#32)
#34Jan de Visser
jan@de-visser.net
In reply to: Payal Singh (#33)
#35Jan de Visser
jan@de-visser.net
In reply to: Jan de Visser (#34)
#36Jan de Visser
jan@de-visser.net
In reply to: Jan de Visser (#35)
#37Payal Singh
payal@omniti.com
In reply to: Jan de Visser (#36)
#38Payal Singh
payal@omniti.com
In reply to: Payal Singh (#37)
#39Jan de Visser
jan@de-visser.net
In reply to: Payal Singh (#38)
#40Jan de Visser
jan@de-visser.net
In reply to: Jan de Visser (#39)
#41Jan de Visser
jan@de-visser.net
In reply to: Jan de Visser (#40)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan de Visser (#40)
#43Jan de Visser
jan@de-visser.net
In reply to: Tom Lane (#42)
#44Jan de Visser
jan@de-visser.net
In reply to: Jan de Visser (#43)
#45Jan de Visser
jan@de-visser.net
In reply to: Tom Lane (#42)
#46Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#42)
#47Jan de Visser
jan@de-visser.net
In reply to: Stephen Frost (#46)
#48Stephen Frost
sfrost@snowman.net
In reply to: Jan de Visser (#47)
#49Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jan de Visser (#43)
#50Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#49)
#51Jan de Visser
jan@de-visser.net
In reply to: Michael Paquier (#50)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan de Visser (#51)
#53Jan de Visser
jan@de-visser.net
In reply to: Tom Lane (#52)