Bug with pg_ctl -w/wait and config-only directories
In researching pg_ctl -w/wait mode for pg_upgrade, I found that pg_ctl
-w's handling of configuration-only directories is often incorrect. For
example, 'pg_ctl -w stop' checks for the postmaster.pid file to
determine when the server is shut down, but there is no postmaster.pid
file in the config directory, so it fails, i.e. does nothing. What is
interesting is that specifying the real data directory does work.
Similarly, pg_ctl references these data directory files:
snprintf(postopts_file, MAXPGPATH, "%s/postmaster.opts", pg_data);
snprintf(backup_file, MAXPGPATH, "%s/backup_label", pg_data);
snprintf(recovery_file, MAXPGPATH, "%s/recovery.conf", pg_data);
snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
I assume things that use these files also don't work for config-only
directories.
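To see why these operations fail, a small check can list which of the files above are actually present under a given directory. This is only a sketch: the function name is made up for illustration, and the file names are the ones from the snprintf() calls plus postmaster.pid.

```shell
# List which of the files pg_ctl derives from its -D argument exist
# under a directory. The function name is hypothetical; the file names
# come from the snprintf() calls quoted above, plus postmaster.pid.
pg_ctl_files_present() {
    dir=$1
    for f in postmaster.pid postmaster.opts backup_label recovery.conf promote; do
        if [ -e "$dir/$f" ]; then
            echo "$f"
        fi
    done
}
```

Run against a config-only directory such as /etc/postgresql-9.0, this should print nothing if those files live only in the real data directory, which is the situation pg_ctl cannot cope with.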
You might think that you can always just specify the real data
directory, but that doesn't work if the server has to be started because
you need to point to postgresql.conf. pg_ctl -w restart is a classic
case of something that needs both the config directory and the real data
directory. Basically, this stuff all seems broken and needs to be fixed
or documented.
What is even worse is that pre-9.1, pg_ctl start would read ports from
the pg_ctl -o command line, but in 9.1 we changed this to force reading
the postmaster.pid file to find the port number and socket directory
location --- meaning, new in PG 9.1, 'pg_ctl -w start' doesn't work for
config-only directories either. And, we can't easily connect to the
server to get the 'data_directory' because we need to read
postmaster.pid to get the connection settings. :-(
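For reference, a minimal sketch of reading individual lines from postmaster.pid. It assumes the 9.1 layout puts the PID on line 1, the port on line 4, and the socket directory on line 5; those line numbers are an assumption to check against the server source, and the function name is hypothetical.

```shell
# Print line N of DIR/postmaster.pid, or fail if the file is missing.
# Usage: pidfile_line DIR N
# Assumed layout (verify for your version): 1 = PID, 4 = port, 5 = socket dir.
pidfile_line() {
    pidfile="$1/postmaster.pid"
    if [ ! -f "$pidfile" ]; then
        echo "no postmaster.pid in $1" >&2
        return 1
    fi
    sed -n "${2}p" "$pidfile"
}
```

Pointed at a config-only directory the function simply fails, which mirrors the chicken-and-egg problem described above: the connection settings are stored in the one directory pg_ctl has not been told about.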
I think this points to the need for a command-line tool to output the
data directory location; I am not sure what to do about the new 9.1
breakage.
pg_upgrade can work around these issues by starting using the config
directory and stopping using the real data directory, but it cannot work
around the 9.1 pg_ctl -w start problem for config-only directories.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On Sat, Oct 01, 2011 at 02:08:33PM -0400, Bruce Momjian wrote:
In researching pg_ctl -w/wait mode for pg_upgrade, I found that pg_ctl
-w's handling of configuration-only directories is often incorrect. For
example, 'pg_ctl -w stop' checks for the postmaster.pid file to
determine when the server is shut down, but there is no postmaster.pid
file in the config directory, so it fails, i.e. does nothing. What is
interesting is that specifying the real data directory does work.
Similarly, pg_ctl references these data directory files:
snprintf(postopts_file, MAXPGPATH, "%s/postmaster.opts", pg_data);
snprintf(backup_file, MAXPGPATH, "%s/backup_label", pg_data);
snprintf(recovery_file, MAXPGPATH, "%s/recovery.conf", pg_data);
snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
I assume things that use these files also don't work for config-only
directories.
You might think that you can always just specify the real data
directory, but that doesn't work if the server has to be started because
you need to point to postgresql.conf. pg_ctl -w restart is a classic
case of something that needs both the config directory and the real data
directory. Basically, this stuff all seems broken and needs to be fixed
or documented.
What is even worse is that pre-9.1, pg_ctl start would read ports from
the pg_ctl -o command line, but in 9.1 we changed this to force reading
the postmaster.pid file to find the port number and socket directory
location --- meaning, new in PG 9.1, 'pg_ctl -w start' doesn't work for
config-only directories either. And, we can't easily connect to the
server to get the 'data_directory' because we need to read
postmaster.pid to get the connection settings. :-(
I think this points to the need for a command-line tool to output the
data directory location; I am not sure what to do about the new 9.1
breakage.
pg_upgrade can work around these issues by starting using the config
directory and stopping using the real data directory, but it cannot work
around the 9.1 pg_ctl -w start problem for config-only directories.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
I went through several iterations trying to find a command that can work
the way we'd like it to. (Essentially, it works the way you're describing
it should.) So, in Gentoo, for the initscript, we have this really ugly
command to start the server:
su -l postgres \
-c "env PGPORT=\"${PGPORT}\" ${PG_EXTRA_ENV} \
/usr/lib/postgresql-9.0/bin/pg_ctl \
start ${WAIT_FOR_START} -t ${START_TIMEOUT} -s -D ${DATA_DIR} \
-o '-D ${PGDATA} --data-directory=${DATA_DIR} \
--silent-mode=true ${PGOPTS}'"
And to stop the server:
su -l postgres \
-c "env PGPORT=\"${PGPORT}\" ${PG_EXTRA_ENV} \
/usr/lib/postgresql-9.0/bin/pg_ctl \
stop ${WAIT_FOR_STOP} -t ${NICE_TIMEOUT} -s -D ${DATA_DIR} \
-m smart"
The default values for these are:
PGPORT='5432'
PG_EXTRA_ENV=''
WAIT_FOR_START='-w'
START_TIMEOUT='60'
WAIT_FOR_STOP='-w'
NICE_TIMEOUT='60'
DATA_DIR='/var/lib/postgresql/9.0/data'
PGDATA='/etc/postgresql-9.0'
PGOPTS=''
We don't use 'pg_ctl restart', instead we stop and then start the
server. So, I don't have an answer for that. I'd imagine passing '-D
${DATA_DIR}' would do the trick there as well.
Of course, simplifying this a bit would be welcome.
--
Mr. Aaron W. Swenson
Gentoo Linux Developer
Email : titanofold@gentoo.org
GnuPG FP : 2C00 7719 4F85 FB07 A49C 0E31 5713 AA03 D1BB FDA0
GnuPG ID : D1BBFDA0
Mr. Aaron W. Swenson wrote:
I went through several iterations trying to find a command that can work
the way we'd like it to. (Essentially, it works the way you're describing
it should.) So, in Gentoo, for the initscript, we have this really ugly
command to start the server:
su -l postgres \
-c "env PGPORT=\"${PGPORT}\" ${PG_EXTRA_ENV} \
/usr/lib/postgresql-9.0/bin/pg_ctl \
start ${WAIT_FOR_START} -t ${START_TIMEOUT} -s -D ${DATA_DIR} \
-o '-D ${PGDATA} --data-directory=${DATA_DIR} \
--silent-mode=true ${PGOPTS}'"
And to stop the server:
su -l postgres \
-c "env PGPORT=\"${PGPORT}\" ${PG_EXTRA_ENV} \
/usr/lib/postgresql-9.0/bin/pg_ctl \
stop ${WAIT_FOR_STOP} -t ${NICE_TIMEOUT} -s -D ${DATA_DIR} \
-m smart"
The default values for these are:
PGPORT='5432'
PG_EXTRA_ENV=''
WAIT_FOR_START='-w'
START_TIMEOUT='60'
WAIT_FOR_STOP='-w'
NICE_TIMEOUT='60'
DATA_DIR='/var/lib/postgresql/9.0/data'
PGDATA='/etc/postgresql-9.0'
PGOPTS=''
We don't use 'pg_ctl restart', instead we stop and then start the
server. So, I don't have an answer for that. I'd imagine passing '-D
${DATA_DIR}' would do the trick there as well.
Of course, simplifying this a bit would be welcome.
What exactly is your question? You are not using a config-only
directory but the real data directory, so it should work fine.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On Sun, Oct 2, 2011 at 7:54 AM, Bruce Momjian <bruce@momjian.us> wrote:
What exactly is your question? You are not using a config-only
directory but the real data directory, so it should work fine.
No. He is using PGDATA (= /etc/postgresql-9.0) as a config-only
directory, and DATA_DIR (= /var/lib/postgresql/9.0/data) as a
real data directory.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Fujii Masao wrote:
On Sun, Oct 2, 2011 at 7:54 AM, Bruce Momjian <bruce@momjian.us> wrote:
What exactly is your question? You are not using a config-only
directory but the real data directory, so it should work fine.
No. He is using PGDATA (= /etc/postgresql-9.0) as a config-only
directory, and DATA_DIR (= /var/lib/postgresql/9.0/data) as a
real data directory.
Wow, I see what you mean now! So the user already figured out it was
broken and used the workaround I recently discovered? Was this ever
reported to the community? If so, I never saw it.
So, in testing, I see it is even more broken than I thought. Not only
is pg_ctl -w broken for start/stop for config-only installs, but pg_ctl
stop (no -w) is also broken because it can't find the postmaster.pid
file to check or use to get the pid to send the signal. pg_ctl reload
and restart are similarly broken. :-(
And it gets worse. The example supplied by the Gentoo developer shows a
use case where the data directory is not even specified in the
configuration file but rather on the command line:
su -l postgres \
-c "env PGPORT=\"${PGPORT}\" ${PG_EXTRA_ENV} \
/usr/lib/postgresql-9.0/bin/pg_ctl \
start ${WAIT_FOR_START} -t ${START_TIMEOUT} -s -D ${DATA_DIR} \
-o '-D ${PGDATA} --data-directory=${DATA_DIR} \
--silent-mode=true ${PGOPTS}'"
In this case, dumping the postgresql.conf file settings is not going to
help --- there is nothing in the config directory that is going to point
us to the data directory --- it exists only in the process arguments.
Frankly, I am confused how this breakage has gone unreported for so long.
Our current TODO item is:
Allow pg_ctl to work properly with configuration files located outside
the PGDATA directory
pg_ctl can not read the pid file because it isn't located in the
config directory but in the PGDATA directory. The solution is to allow
pg_ctl to read and understand postgresql.conf to find the data_directory
value.
BUG #5103: "pg_ctl -w (re)start" fails with custom
unix_socket_directory
While this is accurate, it certainly is missing much of the breakage.
Finding a non-standard socket directory is the least of our problems
with config-only directories (even standard settings don't work), and
reading the config file is not enough of a solution because of the
possible passing of parameters on the command line.
To add even more complexity, imagine someone using the same config
directory for several data/cluster directories, and just passing a
unique --data-directory for each one on start --- in that case,
specifying the config directory is not sufficiently unique to specify
which data directory. It seems we would need some way to pass the data
directory to pg_ctl, perhaps via -o, but parsing that was something we
have tried to avoid (there may be no other choice), and it would have to
be supplied for start and stop.
The only conclusion I can come up with is that we need to be able to
dump postgresql.conf's data_directory, but also to read it from the
command line.
I am starting to question the value of config-only directories if pg_ctl
stop doesn't work, or you have to specify a different directory for
start and stop. Writing a second postmaster.pid file into the config
directory would help, but it would break with shared-config setups and I
don't think we can assume we have write permission on the config
directory.
What are config-only directories buying us that we can't get from
telling users to use symlinks and point to the data directory directly?
Did we not think of these things when we designed config-only
directories? I don't even see this problem mentioned in our
documentation.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes:
I am starting to question the value of config-only directories if pg_ctl
stop doesn't work, or you have to specify a different directory for
start and stop.
Yup.
Did we not think of these things when we designed config-only
directories? I don't even see this problem mentioned in our
documentation.
Yeah, we did. The people who were lobbying for the feature didn't care,
or possibly thought that somebody would fix it for them later.
regards, tom lane
Excerpts from Tom Lane's message of lun oct 03 12:34:22 -0300 2011:
Bruce Momjian <bruce@momjian.us> writes:
I am starting to question the value of config-only directories if pg_ctl
stop doesn't work, or you have to specify a different directory for
start and stop.
Yup.
Did we not think of these things when we designed config-only
directories? I don't even see this problem mentioned in our
documentation.
Yeah, we did. The people who were lobbying for the feature didn't care,
or possibly thought that somebody would fix it for them later.
I think the main proponents are the Debian guys, and they don't use
pg_ctl because they have their own pg_ctlcluster.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
Excerpts from Tom Lane's message of lun oct 03 12:34:22 -0300 2011:
Bruce Momjian <bruce@momjian.us> writes:
I am starting to question the value of config-only directories if pg_ctl
stop doesn't work, or you have to specify a different directory for
start and stop.
Yup.
Did we not think of these things when we designed config-only
directories? I don't even see this problem mentioned in our
documentation.
Yeah, we did. The people who were lobbying for the feature didn't care,
or possibly thought that somebody would fix it for them later.
I think the main proponents are the Debian guys, and they don't use
pg_ctl because they have their own pg_ctlcluster.
OK, so it is as messed up as I thought.
I am all fine for people lobbying for features, but not if they don't
work with our tools. pg_upgrade is certainly not going to use the
Debian start/stop tools unless Debian patches pg_upgrade.
So someone thought we would eventually fix the tools? I am unclear
exactly how to fix much of this. Even documenting some workarounds
seems impossible, e.g. pg_ctl restart.
I can't see any feature config-only directories add that can't be
accomplished by symlinks. Even the ability to use a single
configuration file for multiple clusters can be done.
In summary, here is what I have found that works or is impossible with
config-only directories:
pg_ctl start       specify config directory
pg_ctl -w start    impossible
pg_ctl restart     impossible
pg_ctl stop        specify real data dir
pg_ctl -w stop     specify real data dir
pg_ctl reload      specify real data dir
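A wrapper script could encode that summary directly. The following is a rough sketch only: the function name and the action labels (start-w, stop-w) are invented for illustration, and it does nothing more than restate the table.

```shell
# Map a pg_ctl action to the directory to pass as -D, following the
# summary table. Prints the directory, or returns 1 where the table
# says "impossible". Names and action labels are hypothetical.
# Usage: dir_for_action ACTION CONFIG_DIR DATA_DIR
dir_for_action() {
    action=$1
    config_dir=$2
    data_dir=$3
    case "$action" in
        start)               echo "$config_dir" ;;
        stop|stop-w|reload)  echo "$data_dir" ;;
        start-w|restart)     return 1 ;;
        *)                   echo "unknown action: $action" >&2
                             return 2 ;;
    esac
}
```

For example, a stop could then be written as pg_ctl stop -D "$(dir_for_action stop /etc/postgresql-9.0 /var/lib/postgresql/9.0/data)". The caller still has to know both directories, which is exactly the extra user requirement being complained about.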
Config-only directories seem to be only adding confusion. All possible
solutions seem to be adding more code and user requirements, which the
creation of symlinks avoids.
Is it time for me to ask on 'general' if removal of this feature is
warranted?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes:
Config-only directories seem to be only adding confusion. All possible
solutions seem to be adding more code and user requirements, which the
creation of symlinks avoids.
Is it time for me to ask on 'general' if removal of this feature is
warranted?
Well, the way we could fix it is to invent the parse-the-config-files
option that was alluded to recently. Then pg_ctl would continue to
take the -D switch or PGDATA environment variable with the same meaning
that the postmaster attaches to it, and would do something like
postgres --print-config-value=data_directory -D $PGDATA
to extract the actual location of the data directory.
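As a rough illustration of the idea (not the proposed implementation), a script can approximate that lookup by reading postgresql.conf itself. The function below is a naive, hypothetical stand-in for the --print-config-value option, which is only a proposal at this point in the thread: it handles only single-quoted values, ignores include files, and cannot see settings passed on the postmaster command line.

```shell
# Print the last single-quoted data_directory setting found in
# DIR/postgresql.conf. Returns 1 if the file or the setting is missing.
# Hypothetical helper; a real solution would ask the server's own parser.
data_dir_from_conf() {
    conf="$1/postgresql.conf"
    [ -f "$conf" ] || return 1
    # Commented-out lines start with '#', so they do not match the anchor.
    value=$(sed -n "s/^[[:space:]]*data_directory[[:space:]]*=\{0,1\}[[:space:]]*'\([^']*\)'.*/\1/p" "$conf" | tail -n 1)
    [ -n "$value" ] || return 1
    echo "$value"
}
```

A stop could then look like pg_ctl stop -D "$(data_dir_from_conf "$PGDATA")". The limits of this sketch are the same ones raised in the discussion: the file may have changed since the postmaster started, and the data directory may never have been in the file at all.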
Whether this is worth the trouble is highly debatable IMO. One obvious
risk factor for pg_ctl stop/restart is that the current contents of
postgresql.conf might not match what they were when the postmaster was
started.
I was never exactly thrilled with the separate-config-directory design
to start with, so I'm probably not the person to opine on whether we
could get away with removing it.
regards, tom lane
On mån, 2011-10-03 at 11:27 -0400, Bruce Momjian wrote:
Frankly, I am confused how this breakage has gone unreported for so
long.
Well, nobody is required to use pg_ctl, and for the longest time, it was
pg_ctl that was considered to be broken (for various other reasons) and
avoided in packaged init scripts.
Arguably, if push came to shove, pg_upgrade wouldn't really need to use
pg_ctl either.
On Mon, Oct 3, 2011 at 7:07 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On mån, 2011-10-03 at 11:27 -0400, Bruce Momjian wrote:
Frankly, I am confused how this breakage has gone unreported for so
long.
Well, nobody is required to use pg_ctl,
You are if you wish to run as a service on Windows.
--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 10/03/2011 12:54 PM, Tom Lane wrote:
I was never exactly thrilled with the separate-config-directory design
to start with, so I'm probably not the person to opine on whether we
could get away with removing it.
The horse has well and truly bolted. We'd have a major row if anyone
tried to remove it. Let's not rehash old battles. Our only option is to
make it work as best we can.
cheers
andrew
Andrew Dunstan wrote:
On 10/03/2011 12:54 PM, Tom Lane wrote:
I was never exactly thrilled with the separate-config-directory design
to start with, so I'm probably not the person to opine on whether we
could get away with removing it.
The horse has well and truly bolted. We'd have a major row if anyone
tried to remove it. Let's not rehash old battles. Our only option is to
make it work as best we can.
I disagree. If people were using it we would have had many more bug
reports about pg_ctl not working.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Config-only directories seem to be only adding confusion. All possible
solutions seem to be adding more code and user requirements, which the
creation of symlinks avoids.
Is it time for me to ask on 'general' if removal of this feature is
warranted?
Well, the way we could fix it is to invent the parse-the-config-files
option that was alluded to recently. Then pg_ctl would continue to
take the -D switch or PGDATA environment variable with the same meaning
that the postmaster attaches to it, and would do something like
postgres --print-config-value=data_directory -D $PGDATA
to extract the actual location of the data directory.
That works, assuming the server was not started with -o
'data_directory=/abc'. The only workaround there would be to have
pg_ctl supply the -o, even on pg_ctl stop, and parse that in pg_ctl.
Whether this is worth the trouble is highly debatable IMO. One obvious
risk factor for pg_ctl stop/restart is that the current contents of
postgresql.conf might not match what they were when the postmaster was
started.
I was never exactly thrilled with the separate-config-directory design
to start with, so I'm probably not the person to opine on whether we
could get away with removing it.
The entire thing seems logically broken, to the point where even if we
did get code working, few users would even understand it.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On 10/03/2011 02:15 PM, Bruce Momjian wrote:
Andrew Dunstan wrote:
On 10/03/2011 12:54 PM, Tom Lane wrote:
I was never exactly thrilled with the separate-config-directory design
to start with, so I'm probably not the person to opine on whether we
could get away with removing it.
The horse has well and truly bolted. We'd have a major row if anyone
tried to remove it. Let's not rehash old battles. Our only option is to
make it work as best we can.
I disagree. If people were using it we would have had many more bug
reports about pg_ctl not working.
No, that's an indication people aren't using pg_ctl, not that they
aren't using separate config dirs.
cheers
andrew
Peter Eisentraut wrote:
On mån, 2011-10-03 at 11:27 -0400, Bruce Momjian wrote:
Frankly, I am confused how this breakage has gone unreported for so
long.
Well, nobody is required to use pg_ctl, and for the longest time, it was
pg_ctl that was considered to be broken (for various other reasons) and
avoided in packaged init scripts.
Yes, but I am now seeing that pg_ctl is really unfixable. Is the
config-only directory really a valuable feature if pg_ctl does not work?
If we could document that pg_ctl (and pg_upgrade) doesn't work with
config-only directories, at least we would have a consistent API. The
question is whether the config-only directory is useful with this
restriction. Are people recording the postmaster pid somewhere when
they start it? I doubt they are parsing the connection information we
added to postmaster.pid in 9.1. Are they manually going into the
postmaster.pid file and grabbing the first line?
Arguably, if push came to shove, pg_upgrade wouldn't really need to use
pg_ctl either.
It would have to implement the 'wait' mode inside pg_upgrade, and in
other applications that need that behavior.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On Mon, Oct 3, 2011 at 7:15 PM, Bruce Momjian <bruce@momjian.us> wrote:
Andrew Dunstan wrote:
On 10/03/2011 12:54 PM, Tom Lane wrote:
I was never exactly thrilled with the separate-config-directory design
to start with, so I'm probably not the person to opine on whether we
could get away with removing it.
The horse has well and truly bolted. We'd have a major row if anyone
tried to remove it. Let's not rehash old battles. Our only option is to
make it work as best we can.
I disagree. If people were using it we would have had many more bug
reports about pg_ctl not working.
Debian/ubuntu packages and our own project infrastructure use it.
Though, there is a non-trivial script wrapping it, presumably to try
to make it work properly, and handle side-by-side installations of
different major versions.
--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Andrew Dunstan wrote:
On 10/03/2011 02:15 PM, Bruce Momjian wrote:
Andrew Dunstan wrote:
On 10/03/2011 12:54 PM, Tom Lane wrote:
I was never exactly thrilled with the separate-config-directory design
to start with, so I'm probably not the person to opine on whether we
could get away with removing it.
The horse has well and truly bolted. We'd have a major row if anyone
tried to remove it. Let's not rehash old battles. Our only option is to
make it work as best we can.
I disagree. If people were using it we would have had many more bug
reports about pg_ctl not working.
No, that's an indication people aren't using pg_ctl, not that they
aren't using separate config dirs.
So, you are saying that people who want config-only directories are just
not people who normally use pg_ctl, because if they were, they would
have reported the bug? That seems unlikely. I will admit the Gentoo
case is exactly that.
So we just document that config-only directories don't work for pg_ctl
and pg_upgrade?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On 10/03/2011 02:25 PM, Bruce Momjian wrote:
Andrew Dunstan wrote:
On 10/03/2011 02:15 PM, Bruce Momjian wrote:
Andrew Dunstan wrote:
On 10/03/2011 12:54 PM, Tom Lane wrote:
I was never exactly thrilled with the separate-config-directory design
to start with, so I'm probably not the person to opine on whether we
could get away with removing it.
The horse has well and truly bolted. We'd have a major row if anyone
tried to remove it. Let's not rehash old battles. Our only option is to
make it work as best we can.
I disagree. If people were using it we would have had many more bug
reports about pg_ctl not working.
No, that's an indication people aren't using pg_ctl, not that they
aren't using separate config dirs.
So, you are saying that people who want config-only directories are just
not people who normally use pg_ctl, because if they were, they would
have reported the bug? That seems unlikely. I will admit the Gentoo
case is exactly that.
As Dave has pointed out there are many more people that use it, probably
most notably Debian/Ubuntu users.
So we just document that config-only directories don't work for pg_ctl
and pg_upgrade?
I'd rather not if it can be avoided.
cheers
andrew
Andrew Dunstan wrote:
On 10/03/2011 02:25 PM, Bruce Momjian wrote:
Andrew Dunstan wrote:
On 10/03/2011 02:15 PM, Bruce Momjian wrote:
Andrew Dunstan wrote:
On 10/03/2011 12:54 PM, Tom Lane wrote:
I was never exactly thrilled with the separate-config-directory design
to start with, so I'm probably not the person to opine on whether we
could get away with removing it.
The horse has well and truly bolted. We'd have a major row if anyone
tried to remove it. Let's not rehash old battles. Our only option is to
make it work as best we can.
I disagree. If people were using it we would have had many more bug
reports about pg_ctl not working.
No, that's an indication people aren't using pg_ctl, not that they
aren't using separate config dirs.
So, you are saying that people who want config-only directories are just
not people who normally use pg_ctl, because if they were, they would
have reported the bug? That seems unlikely. I will admit the Gentoo
case is exactly that.
As Dave has pointed out there are many more people that use it, probably
most notably Debian/Ubuntu users.
So we just document that config-only directories don't work for pg_ctl
and pg_upgrade?
I'd rather not if it can be avoided.
OK, please propose an "avoid" plan. I can't come up with one that makes
any sense.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +