Problem with pg_upgrade?
I have gotten two reports via IRC that months after using 9.0
pg_upgrade, some of the clog files have been removed while there is
still data in the table needing those clog files. These reports came to
me through Rhodiumtoad who analyzed the systems.
Looking at pg_upgrade, I am concerned that somehow autovaccum is running
in frozen mode before I have restored the frozen xids for the table or
database. Here is the code I am using:
snprintf(cmd, sizeof(cmd),
SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
"-o \"-p %d -c autovacuum=off "
"-c autovacuum_freeze_max_age=2000000000\" "
"start >> \"%s\" 2>&1" SYSTEMQUOTE,
bindir,
Does anyone have any other suggestions on how to make sure autovacuum
does not run in freeze mode? I know 'autovacuum=off' turns off normal
autovacuum. Would increasing autovacuum_naptime help? It looks like
the autovacuum code sleeps before processing anything, but I am not
certain.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On Tue, 2011-03-29 at 15:52 -0400, Bruce Momjian wrote:
Does anyone have any other suggestions on how to make sure autovacuum
does not run in freeze mode?
Can you run in single user mode?
Regards,
Jeff Davis
Excerpts from Jeff Davis's message of mar mar 29 21:27:34 -0300 2011:
On Tue, 2011-03-29 at 15:52 -0400, Bruce Momjian wrote:
Does anyone have any other suggestions on how to make sure autovacuum
does not run in freeze mode?Can you run in single user mode?
I asked the same thing. Apparently the problem is that it would make
error handling a lot more difficult. I think it would be better to have
some sort of option to disable autovacuum completely which would be used
only during pg_upgrade.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, 2011-03-29 at 21:43 -0300, Alvaro Herrera wrote:
I think it would be better to have
some sort of option to disable autovacuum completely which would be used
only during pg_upgrade.
Sounds reasonable to me.
Regards,
Jeff Davis
Alvaro Herrera wrote:
Excerpts from Jeff Davis's message of mar mar 29 21:27:34 -0300 2011:
On Tue, 2011-03-29 at 15:52 -0400, Bruce Momjian wrote:
Does anyone have any other suggestions on how to make sure autovacuum
does not run in freeze mode?Can you run in single user mode?
I asked the same thing. Apparently the problem is that it would make
error handling a lot more difficult. I think it would be better to have
some sort of option to disable autovacuum completely which would be used
only during pg_upgrade.
Yes, also consider that pg_dumpall assumes psql with its use of
\connect, and I would have to start/stop the single-user backend for
every database change, plus I use psql with ON_ERROR_STOP=on.
I think we have three options:
o find if the use of autovacuum_freeze_max_age is safe, or make
it safe
o document that autovacuum_naptime always happens before
autovacuum does anything and set it high
o modify autovacuum to be an enum, with values on/off/disabled
I think the last one makes more sense, and is safer if we need to
backpatch this. Creating a new variable for this would be confusing
because it could conflict with the 'autovacuum' setting.
Also, I am unclear if this is really our bug. At least one of the
systems was on Ubuntu/Debian, and they might both have been, and I know
Debian changes our source code. Where can I find a copy of the diffs
they have made?
I am hearing only second-hand reports of this problem through
Rhodiumtoad on IRC. I don't have IRC access this week so if someone can
get details from him that would help. I think the fix he found was to
pull the clog files off of an old file system backup.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On ons, 2011-03-30 at 10:57 -0400, Bruce Momjian wrote:
Also, I am unclear if this is really our bug. At least one of the
systems was on Ubuntu/Debian, and they might both have been, and I know
Debian changes our source code. Where can I find a copy of the diffs
they have made?
http://bazaar.launchpad.net/~pitti/postgresql/debian-9.0/files/head:/debian/patches/
Peter Eisentraut wrote:
On ons, 2011-03-30 at 10:57 -0400, Bruce Momjian wrote:
Also, I am unclear if this is really our bug. At least one of the
systems was on Ubuntu/Debian, and they might both have been, and I know
Debian changes our source code. Where can I find a copy of the diffs
they have made?http://bazaar.launchpad.net/~pitti/postgresql/debian-9.0/files/head:/debian/patches/
These all seem reasonable, but I am confused by this. Someone reported
last week that the equals sign is not optional in postgreql.conf on
Debian but I don't see that patch in here. Also I thought they modified
pg_hba.conf in some odd way. Are these changes no longer made in 9.0?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On Wed, Mar 30, 2011 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
I think we have three options:
o find if the use of autovacuum_freeze_max_age is safe, or make
it safe
o document that autovacuum_naptime always happens before
autovacuum does anything and set it high
o modify autovacuum to be an enum, with values on/off/disabledI think the last one makes more sense, and is safer if we need to
backpatch this. Creating a new variable for this would be confusing
because it could conflict with the 'autovacuum' setting.
I have to admit the prospect of abuse is slightly frightening to me
here. I guess we can't be held responsible for users who do dumb
things, but it might not be too clear to someone what the difference
is between autovacuum=off and autovacuum=disabled. I don't really
understand why this is an issue in the first place, though. Surely we
must be setting the XID counter on the new cluster to match the one on
the old cluster, and migrating the relfrozenxid and datfrozenxid
settings, so why does it matter if someone runs vacuum freeze?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, 2011-03-30 at 16:46 -0400, Robert Haas wrote:
I don't really
understand why this is an issue in the first place, though. Surely we
must be setting the XID counter on the new cluster to match the one on
the old cluster, and migrating the relfrozenxid and datfrozenxid
settings, so why does it matter if someone runs vacuum freeze?
Because autovacuum may run before those things are properly set, as
Bruce said in the original email:
"I am concerned that somehow autovaccum is running
in frozen mode before I have restored the frozen xids for the table or
database."
I think some kind of hidden GUC might be the best option. I tend to
agree that a third option to the "autovacuum" setting would be
confusing.
Regards,
Jeff Davis
On ons, 2011-03-30 at 15:39 -0400, Bruce Momjian wrote:
Peter Eisentraut wrote:
On ons, 2011-03-30 at 10:57 -0400, Bruce Momjian wrote:
Also, I am unclear if this is really our bug. At least one of the
systems was on Ubuntu/Debian, and they might both have been, and I know
Debian changes our source code. Where can I find a copy of the diffs
they have made?http://bazaar.launchpad.net/~pitti/postgresql/debian-9.0/files/head:/debian/patches/
These all seem reasonable, but I am confused by this. Someone reported
last week that the equals sign is not optional in postgreql.conf on
Debian but I don't see that patch in here.
That's probably because some other tool processes to the configuration
file to find the data directory or something.
Also I thought they modified
pg_hba.conf in some odd way. Are these changes no longer made in 9.0?
I think that was about 10 years ago.
Robert Haas wrote:
On Wed, Mar 30, 2011 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
I think we have three options:
? ? ? ?o ?find if the use of autovacuum_freeze_max_age is safe, or make
? ? ? ? ? it safe
? ? ? ?o ?document that autovacuum_naptime always happens before
? ? ? ? ? autovacuum does anything and set it high
? ? ? ?o ?modify autovacuum to be an enum, with values on/off/disabledI think the last one makes more sense, and is safer if we need to
backpatch this. ?Creating a new variable for this would be confusing
because it could conflict with the 'autovacuum' setting.I have to admit the prospect of abuse is slightly frightening to me
here. I guess we can't be held responsible for users who do dumb
things, but it might not be too clear to someone what the difference
is between autovacuum=off and autovacuum=disabled. I don't really
understand why this is an issue in the first place, though. Surely we
must be setting the XID counter on the new cluster to match the one on
the old cluster, and migrating the relfrozenxid and datfrozenxid
settings, so why does it matter if someone runs vacuum freeze?
First, I am not sure it is a problem, but based on the IRC reports I
felt I should ask here for confirmation. Here is a sample pg_dump
output:
CREATE TABLE sample (
x integer
);
-- For binary upgrade, set relfrozenxid.
UPDATE pg_catalog.pg_class
SET relfrozenxid = '703'
WHERE oid = 'sample'::pg_catalog.regclass;
So, we set the cluster xid while we do this schema-only restore. I
belive it might be possible for autovacuum to run while the schema is
restored, see an empty table, and set the relfrozenxid to be the current
xid, when in fact we are about to put a heap file in place of the
current empty file. I thought the autovacuum_freeze_max_age=2000000000
would prevent this but now I am not sure. I assumed that since the gap
between the restored relfrozenxid and the current counter would
certainly be < 2000000000 that autovacuum would not touch it. It is
possible these users had drastically modified autovacuum_freeze_max_age
to cause 3-billion gaps --- again, I have no direct contact with the
reporters, but I figured being paranoid is a good thing where pg_upgrade
is involved.
I wonder if the fact that these people never reported the bug means
there were doing something odd with their servers.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Bruce Momjian wrote:
First, I am not sure it is a problem, but based on the IRC reports I
felt I should ask here for confirmation. Here is a sample pg_dump
output:CREATE TABLE sample (
x integer
);-- For binary upgrade, set relfrozenxid.
UPDATE pg_catalog.pg_class
SET relfrozenxid = '703'
WHERE oid = 'sample'::pg_catalog.regclass;So, we set the cluster xid while we do this schema-only restore. I
belive it might be possible for autovacuum to run while the schema is
restored, see an empty table, and set the relfrozenxid to be the current
xid, when in fact we are about to put a heap file in place of the
current empty file. I thought the autovacuum_freeze_max_age=2000000000
would prevent this but now I am not sure. I assumed that since the gap
between the restored relfrozenxid and the current counter would
certainly be < 2000000000 that autovacuum would not touch it. It is
possible these users had drastically modified autovacuum_freeze_max_age
to cause 3-billion gaps --- again, I have no direct contact with the
reporters, but I figured being paranoid is a good thing where pg_upgrade
is involved.I wonder if the fact that these people never reported the bug means
there were doing something odd with their servers.
I just updated the C comment about what we are doing:
* Using autovacuum=off disables cleanup vacuum and analyze, but
* freeze vacuums can still happen, so we set
* autovacuum_freeze_max_age very high. We assume all datfrozenxid and
* relfrozen values are less than a gap of 2000000000 from the current
* xid counter, so autovacuum will not touch them.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Bruce Momjian wrote:
I wonder if the fact that these people never reported the bug means
there were doing something odd with their servers.I just updated the C comment about what we are doing:
* Using autovacuum=off disables cleanup vacuum and analyze, but
* freeze vacuums can still happen, so we set
* autovacuum_freeze_max_age very high. We assume all datfrozenxid and
* relfrozen values are less than a gap of 2000000000 from the current
* xid counter, so autovacuum will not touch them.
FYI, 2000000000 is the maximum value for autovacuum_freeze_max_age, so a
user can't set it higher.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On Wed, Mar 30, 2011 at 5:27 PM, Bruce Momjian <bruce@momjian.us> wrote:
First, I am not sure it is a problem, but based on the IRC reports I
felt I should ask here for confirmation. Here is a sample pg_dump
output:CREATE TABLE sample (
x integer
);-- For binary upgrade, set relfrozenxid.
UPDATE pg_catalog.pg_class
SET relfrozenxid = '703'
WHERE oid = 'sample'::pg_catalog.regclass;So, we set the cluster xid while we do this schema-only restore. I
belive it might be possible for autovacuum to run while the schema is
restored, see an empty table, and set the relfrozenxid to be the current
xid, when in fact we are about to put a heap file in place of the
current empty file. I thought the autovacuum_freeze_max_age=2000000000
would prevent this but now I am not sure. I assumed that since the gap
between the restored relfrozenxid and the current counter would
certainly be < 2000000000 that autovacuum would not touch it. It is
possible these users had drastically modified autovacuum_freeze_max_age
to cause 3-billion gaps --- again, I have no direct contact with the
reporters, but I figured being paranoid is a good thing where pg_upgrade
is involved.
It does seem possible that that could happen, but I'm not sure exactly
what would be causing autovacuum to fire in the first place. It
wouldn't have to be triggered by the anti-wraparound machinery - if
the table appeared to be in need of vacuuming, then we'd vacuum it,
discover that is was empty, and update relfrozenxid. Hmm... could it
fire just because the table has no stats? But if that were the case
you'd think we'd be seeing this more often.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
So, we set the cluster xid while we do this schema-only restore. ?I
belive it might be possible for autovacuum to run while the schema is
restored, see an empty table, and set the relfrozenxid to be the current
xid, when in fact we are about to put a heap file in place of the
current empty file. ?I thought the autovacuum_freeze_max_age=2000000000
would prevent this but now I am not sure. ?I assumed that since the gap
between the restored relfrozenxid and the current counter would
certainly be < 2000000000 that autovacuum would not touch it. ?It is
possible these users had drastically modified autovacuum_freeze_max_age
to cause 3-billion gaps --- again, I have no direct contact with the
reporters, but I figured being paranoid is a good thing where pg_upgrade
is involved.It does seem possible that that could happen, but I'm not sure exactly
what would be causing autovacuum to fire in the first place. It
wouldn't have to be triggered by the anti-wraparound machinery - if
the table appeared to be in need of vacuuming, then we'd vacuum it,
discover that is was empty, and update relfrozenxid. Hmm... could it
fire just because the table has no stats? But if that were the case
you'd think we'd be seeing this more often.
Well, autovacuum=off, so it should only run in freeze mode, and I can't
see how that could happen. I am thinking I have to study autovacuum.c.
I wonder if datfrozenxid could be incremented because the database is
originally empty. It would just need to scan pg_class, not actually
vacuum anything. I wonder if we do that. The bottom line is I am
hanging too much on autovacuum_freeze_max_age causing autovacuum to do
nothing.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Bruce Momjian wrote:
It does seem possible that that could happen, but I'm not sure exactly
what would be causing autovacuum to fire in the first place. It
wouldn't have to be triggered by the anti-wraparound machinery - if
the table appeared to be in need of vacuuming, then we'd vacuum it,
discover that is was empty, and update relfrozenxid. Hmm... could it
fire just because the table has no stats? But if that were the case
you'd think we'd be seeing this more often.Well, autovacuum=off, so it should only run in freeze mode, and I can't
see how that could happen. I am thinking I have to study autovacuum.c.I wonder if datfrozenxid could be incremented because the database is
originally empty. It would just need to scan pg_class, not actually
vacuum anything. I wonder if we do that. The bottom line is I am
hanging too much on autovacuum_freeze_max_age causing autovacuum to do
nothing.
What if we allow autovacuum_max_workers to be set to zero; the current
minimum is one.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Bruce Momjian wrote:
Bruce Momjian wrote:
It does seem possible that that could happen, but I'm not sure exactly
what would be causing autovacuum to fire in the first place. It
wouldn't have to be triggered by the anti-wraparound machinery - if
the table appeared to be in need of vacuuming, then we'd vacuum it,
discover that is was empty, and update relfrozenxid. Hmm... could it
fire just because the table has no stats? But if that were the case
you'd think we'd be seeing this more often.Well, autovacuum=off, so it should only run in freeze mode, and I can't
see how that could happen. I am thinking I have to study autovacuum.c.I wonder if datfrozenxid could be incremented because the database is
originally empty. It would just need to scan pg_class, not actually
vacuum anything. I wonder if we do that. The bottom line is I am
hanging too much on autovacuum_freeze_max_age causing autovacuum to do
nothing.What if we allow autovacuum_max_workers to be set to zero; the current
minimum is one.
I can think of one case where autovacuum_freeze_max_age would be
insufficient. If you set autovacuum_freeze_max_age in the old cluster
to 2B, and you had a database that was near that limit, the tables
created by pg_upgrade's --schema-only restore might create enough new
transactions to cause autovacuum to run in freeze mode. While I think
it is unlikely that is the cause of the problem report, it is enough for
me to discount using autovacuum_freeze_max_age to disable autovacuum
freeze.
I will work on code to allow autovacuum_max_workers to be set to zero in
HEAD and 9.0, and have pg_upgrade us that. I think the maintenance
overhead of an invisible variable is too much.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On 31.03.2011 17:55, Bruce Momjian wrote:
I will work on code to allow autovacuum_max_workers to be set to zero in
HEAD and 9.0, and have pg_upgrade us that.
We've intentionally not allowed the user to disable anti-wraparound
autovacuum before. Do we really want to allow it now for the sake of
pg_upgrade?
I think the maintenance
overhead of an invisible variable is too much.
A simple GUC or command-line switch isn't much code.
Is the problem just that the clog files get removed too early, or is
there something else? If it's just the clog files, we could simply copy
them (again) after updating datfrozenxids.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
On 31.03.2011 17:55, Bruce Momjian wrote:
I will work on code to allow autovacuum_max_workers to be set to zero in
HEAD and 9.0, and have pg_upgrade us that.We've intentionally not allowed the user to disable anti-wraparound
autovacuum before. Do we really want to allow it now for the sake of
pg_upgrade?
Not sure.
I think the maintenance
overhead of an invisible variable is too much.A simple GUC or command-line switch isn't much code.
Well, is this going to show in SHOW ALL or pg_settings? Do we have the
ability to easily disable display of this?
Is the problem just that the clog files get removed too early, or is
there something else? If it's just the clog files, we could simply copy
them (again) after updating datfrozenxids.
The problem is that pg_upgrade through pg_dumpall is setting
pg_database/pg_class frozen xid values and I can't have autovacuum
modifying the system while this is happening.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
I think the maintenance
overhead of an invisible variable is too much.A simple GUC or command-line switch isn't much code.
I like the idea of a command-line switch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company