Should `pg_upgrade --check` check relation filenodes are present?

Started by Craig de Stigter, almost 9 years ago, 4 messages
#1Craig de Stigter
craig.destigter@koordinates.com

Hi list

We attempted to pg_upgrade a database on a replication slave, and got the
error:

error while creating link for relation "<schema>.<tablename>"

("/var/lib/postgresql-ext/PG_9.2_201204301/19171/141610397" to
"/var/lib/postgresql-ext/PG_9.5_201510051/16401/9911696"): No such file or
directory

The missing table turned out to be an unlogged table, and the data file for
it was not present on the slave machine. That's reasonable. In our case we
are able to start over from a snapshot and drop all the unlogged tables
before trying again.

However, this problem was not caught by the `--check` command. I'm looking
at the source code and it appears that pg_upgrade does not attempt to
verify relation filenodes actually exist before proceeding, whether using
--check or not.

Should it? I assume the reasoning is because it would take a long time and
perhaps the benefit of doing so would be minimal?
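
To make the question concrete, the following is a minimal sketch of what such a pre-flight check could look like as an external script. It is not pg_upgrade code. The function name `find_missing_relation_files` and its inputs are assumptions for illustration: the (database OID, relfilenode) pairs would have to be collected from the catalogs beforehand, and the directory layout is inferred from the paths in the error message above. It only looks at the first segment of the main fork.

```python
import os
from typing import Iterable, List, Tuple


def find_missing_relation_files(
    version_dir: str, relations: Iterable[Tuple[int, int]]
) -> List[str]:
    """Return the expected relation file paths that do not exist on disk.

    version_dir: directory holding one subdirectory per database OID,
                 e.g. ".../PG_9.2_201204301" as in the error above.
    relations:   (database OID, relfilenode) pairs; hypothetical input
                 gathered from the catalogs before running the check.
    """
    missing = []
    for db_oid, relfilenode in relations:
        # Layout assumed from the error message: <version_dir>/<db oid>/<filenode>
        path = os.path.join(version_dir, str(db_oid), str(relfilenode))
        if not os.path.isfile(path):
            missing.append(path)
    return missing
```

The cost is one file lookup per relation, so the run time would grow with the number of relations rather than with the amount of data.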

--
Regards,
Craig de Stigter

Developer
Koordinates

+64 21 256 9488 / koordinates.com / @koordinates
<https://twitter.com/koordinates>

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig de Stigter (#1)
Re: Should `pg_upgrade --check` check relation filenodes are present?

Craig de Stigter <craig.destigter@koordinates.com> writes:

> We attempted to pg_upgrade a database on a replication slave, and got the
> error:
>
> error while creating link for relation "<schema>.<tablename>"
> ("/var/lib/postgresql-ext/PG_9.2_201204301/19171/141610397" to
> "/var/lib/postgresql-ext/PG_9.5_201510051/16401/9911696"): No such file or
> directory
>
> The missing table turned out to be an unlogged table, and the data file for
> it was not present on the slave machine. That's reasonable. In our case we
> are able to start over from a snapshot and drop all the unlogged tables
> before trying again.
>
> However, this problem was not caught by the `--check` command. I'm looking
> at the source code and it appears that pg_upgrade does not attempt to
> verify relation filenodes actually exist before proceeding, whether using
> --check or not.
>
> Should it? I assume the reasoning is because it would take a long time and
> perhaps the benefit of doing so would be minimal?

This failure would occur before we'd done anything irretrievable to the
source DB, so I'm not all that concerned. You could have just re-initdb'd
the target directory and started over (after dropping the unlogged tables
of course).
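
The "drop the unlogged tables" step can be sketched roughly as follows. This is an illustration under stated assumptions, not a tested procedure: the helper names `quote_ident` and `drop_statements` are made up here, the query relies on `pg_class.relpersistence = 'u'` marking unlogged relations, and it would have to be run in each database separately because `pg_class` is per-database.

```python
from typing import Iterable, List, Tuple

# Catalog query listing unlogged ordinary tables in the current database.
LIST_UNLOGGED_SQL = """
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relpersistence = 'u' AND c.relkind = 'r'
"""


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def drop_statements(tables: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one DROP TABLE statement per (schema, table) row."""
    return [
        f"DROP TABLE {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in tables
    ]
```

Feeding the rows returned by `LIST_UNLOGGED_SQL` into `drop_statements` gives a script that can be reviewed before it is run against the old cluster.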

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Craig de Stigter (#1)
Re: Should `pg_upgrade --check` check relation filenodes are present?

On 1/31/17 4:57 PM, Craig de Stigter wrote:

> However, this problem was not caught by the `--check` command. I'm
> looking at the source code and it appears that pg_upgrade does not
> attempt to verify relation filenodes actually exist before proceeding,
> whether using --check or not.

The purpose of --check is to see if there is anything in your database
that pg_upgrade cannot upgrade. Its purpose is not to detect general
damage in a database.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#4Bruce Momjian
bruce@momjian.us
In reply to: Craig de Stigter (#1)
Re: Should `pg_upgrade --check` check relation filenodes are present?

On Wed, Feb 1, 2017 at 10:57:01AM +1300, Craig de Stigter wrote:

> Hi list
>
> We attempted to pg_upgrade a database on a replication slave, and got the
> error:
>
> error while creating link for relation "<schema>.<tablename>"
> ("/var/lib/postgresql-ext/PG_9.2_201204301/19171/141610397" to
> "/var/lib/postgresql-ext/PG_9.5_201510051/16401/9911696"): No such file or
> directory
>
> The missing table turned out to be an unlogged table, and the data file for it
> was not present on the slave machine. That's reasonable. In our case we are
> able to start over from a snapshot and drop all the unlogged tables before
> trying again.
>
> However, this problem was not caught by the `--check` command. I'm looking at
> the source code and it appears that pg_upgrade does not attempt to verify
> relation filenodes actually exist before proceeding, whether using --check or
> not.
>
> Should it? I assume the reasoning is because it would take a long time and
> perhaps the benefit of doing so would be minimal?

I think pg_upgrade needs to be improved in this area, but I am not sure
how yet. Clearly the check should detect this or the upgrade should
succeed.

First, you are not using the standby upgrade instructions in step 10
here, right?

https://www.postgresql.org/docs/9.6/static/pgupgrade.html

I assume you don't want this standby to rejoin the primary, you just
want to upgrade it.

Second, I thought unlogged tables had empty files on the standby, not
_missing_ files. Is that correct? Should pg_upgrade just allow missing
unlogged table files? I don't see any way to detect we are running on a
standby since the server is in write mode to run pg_upgrade.
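
The second option (tolerating missing files only for unlogged tables) can be pictured as a classification step. The sketch below is purely illustrative and is not a proposed patch: the `Relation` record and `classify_missing` function are hypothetical, and the persistence codes are the `pg_class.relpersistence` values.

```python
import os
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Relation:
    name: str
    path: str         # expected location of the relation's data file
    persistence: str  # pg_class.relpersistence: 'p', 'u' or 't'


def classify_missing(
    relations: Iterable[Relation],
) -> Tuple[List[Relation], List[Relation]]:
    """Split relations whose file is missing into (fatal, tolerated).

    A missing file is tolerated only for unlogged relations ('u');
    for anything else it is treated as fatal.
    """
    fatal: List[Relation] = []
    tolerated: List[Relation] = []
    for rel in relations:
        if os.path.exists(rel.path):
            continue
        if rel.persistence == "u":
            tolerated.append(rel)
        else:
            fatal.append(rel)
    return fatal, tolerated
```

With a split like this, a check could report the fatal list as an error and the tolerated list as a warning, which sidesteps the need to detect whether the cluster was a standby.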

I can develop a patch once I have answers to these questions.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
