pg_upgrade from 9.1.3 to 9.2 failed

Started by Rural Hunterover 13 years ago49 messageshackers

ruralhunter@gmail.com

over 13 years ago

I am trying to test the upgrade from my 9.1.3 db to 9.2 on ubuntu 10.10
server. I got error below when run the pg_upgrade command. What can I do
for this?

$ /opt/PostgreSQL/9.2/bin/pg_upgrade -b /opt/PostgreSQL/9.1/bin -B
/opt/PostgreSQL/9.2/bin -d /raid/pgsql -D /raid/pg92data
Performing Consistency Checks
-----------------------------
Checking current, bin, and data directories ok
Checking cluster versions ok
Checking database user is a superuser ok
Checking for prepared transactions ok
Checking for reg* system OID user data types ok
Checking for contrib/isn with bigint-passing mismatch ok
Creating catalog dump ok
Checking for presence of required libraries ok
Checking database user is a superuser ok
Checking for prepared transactions ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows on the new cluster ok
Deleting files from new pg_clog ok
Copying old pg_clog to new server ok
Setting next transaction ID for new cluster ok
Resetting WAL archives ok
Setting frozenxid counters in new cluster ok
Creating databases in the new cluster ok
Adding support functions to new cluster ok
Restoring database schema to new cluster ok
Removing support functions from new cluster ok
Copying user relation files
/raid/pgsql/base/6087920/6088238
old and new databases "testdb" have a different number of relations
Failure, exiting

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Rural Hunter (#1)

Re: pg_upgrade from 9.1.3 to 9.2 failed

On Fri, Sep 14, 2012 at 01:43:30PM +0800, Rural Hunter wrote:

I am trying to test the upgrade from my 9.1.3 db to 9.2 on ubuntu
10.10 server. I got error below when run the pg_upgrade command.
What can I do for this?

$ /opt/PostgreSQL/9.2/bin/pg_upgrade -b /opt/PostgreSQL/9.1/bin -B
/opt/PostgreSQL/9.2/bin -d /raid/pgsql -D /raid/pg92data
Performing Consistency Checks
-----------------------------
Checking current, bin, and data directories ok
Checking cluster versions ok
Checking database user is a superuser ok
Checking for prepared transactions ok
Checking for reg* system OID user data types ok
Checking for contrib/isn with bigint-passing mismatch ok
Creating catalog dump ok
Checking for presence of required libraries ok
Checking database user is a superuser ok
Checking for prepared transactions ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows on the new cluster ok
Deleting files from new pg_clog ok
Copying old pg_clog to new server ok
Setting next transaction ID for new cluster ok
Resetting WAL archives ok
Setting frozenxid counters in new cluster ok
Creating databases in the new cluster ok
Adding support functions to new cluster ok
Restoring database schema to new cluster ok
Removing support functions from new cluster ok
Copying user relation files
/raid/pgsql/base/6087920/6088238
old and new databases "testdb" have a different number of relations
Failure, exiting

That is an odd failure. That check was added in PG 9.1 and this is the
first time I am seeing this failure.

The check is to make sure that once we have created all the user schema
details in the new cluster, that there are the same number of objects in
the new and old databases.

Obviously there are a different number in your case here, but I don't
know why those would be different, and in fact, because we have never
hit this, there isn't even any debug output that shows the source of the
difference.

If I send you a patch can you compile it and send back the debug output
it produces?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Rural Hunter

ruralhunter@gmail.com

over 13 years ago

In reply to: Bruce Momjian (#2)

Re: pg_upgrade from 9.1.3 to 9.2 failed

于2012年9月14日 22:26:16,Bruce Momjian写到:

On Fri, Sep 14, 2012 at 01:43:30PM +0800, Rural Hunter wrote:

I am trying to test the upgrade from my 9.1.3 db to 9.2 on ubuntu
10.10 server. I got error below when run the pg_upgrade command.
What can I do for this?

$ /opt/PostgreSQL/9.2/bin/pg_upgrade -b /opt/PostgreSQL/9.1/bin -B
/opt/PostgreSQL/9.2/bin -d /raid/pgsql -D /raid/pg92data
Performing Consistency Checks
-----------------------------
Checking current, bin, and data directories ok
Checking cluster versions ok
Checking database user is a superuser ok
Checking for prepared transactions ok
Checking for reg* system OID user data types ok
Checking for contrib/isn with bigint-passing mismatch ok
Creating catalog dump ok
Checking for presence of required libraries ok
Checking database user is a superuser ok
Checking for prepared transactions ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows on the new cluster ok
Deleting files from new pg_clog ok
Copying old pg_clog to new server ok
Setting next transaction ID for new cluster ok
Resetting WAL archives ok
Setting frozenxid counters in new cluster ok
Creating databases in the new cluster ok
Adding support functions to new cluster ok
Restoring database schema to new cluster ok
Removing support functions from new cluster ok
Copying user relation files
/raid/pgsql/base/6087920/6088238
old and new databases "testdb" have a different number of relations
Failure, exiting

That is an odd failure. That check was added in PG 9.1 and this is the
first time I am seeing this failure.

The check is to make sure that once we have created all the user schema
details in the new cluster, that there are the same number of objects in
the new and old databases.

Obviously there are a different number in your case here, but I don't
know why those would be different, and in fact, because we have never
hit this, there isn't even any debug output that shows the source of the
difference.

If I send you a patch can you compile it and send back the debug output
it produces?

Yes sure, I will try to compile and retest with it.

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Rural Hunter (#3)

Re: pg_upgrade from 9.1.3 to 9.2 failed

On Fri, Sep 14, 2012 at 11:53:38PM +0800, Rural Hunter wrote:

Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows on the new cluster ok
Deleting files from new pg_clog ok
Copying old pg_clog to new server ok
Setting next transaction ID for new cluster ok
Resetting WAL archives ok
Setting frozenxid counters in new cluster ok
Creating databases in the new cluster ok
Adding support functions to new cluster ok
Restoring database schema to new cluster ok
Removing support functions from new cluster ok
Copying user relation files
/raid/pgsql/base/6087920/6088238
old and new databases "testdb" have a different number of relations
Failure, exiting

That is an odd failure. That check was added in PG 9.1 and this is the
first time I am seeing this failure.

The check is to make sure that once we have created all the user schema
details in the new cluster, that there are the same number of objects in
the new and old databases.

Obviously there are a different number in your case here, but I don't
know why those would be different, and in fact, because we have never
hit this, there isn't even any debug output that shows the source of the
difference.

If I send you a patch can you compile it and send back the debug output
it produces?

Yes sure, I will try to compile and retest with it.

Actually, I have a simpler idea. At the point where it fails, you can
run pg_dump --schema-only on the testdb database in the old and new
cluster and then diff those output files and email the result to us; it
should show the mismatch. I am not sure if the dumps will output the
objects in the same order, it might.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Rural Hunter

ruralhunter@gmail.com

over 13 years ago

In reply to: Bruce Momjian (#4)

Re: pg_upgrade from 9.1.3 to 9.2 failed

于 2012/9/15 2:39, Bruce Momjian 写道:

On Fri, Sep 14, 2012 at 11:53:38PM +0800, Rural Hunter wrote:

Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows on the new cluster ok
Deleting files from new pg_clog ok
Copying old pg_clog to new server ok
Setting next transaction ID for new cluster ok
Resetting WAL archives ok
Setting frozenxid counters in new cluster ok
Creating databases in the new cluster ok
Adding support functions to new cluster ok
Restoring database schema to new cluster ok
Removing support functions from new cluster ok
Copying user relation files
/raid/pgsql/base/6087920/6088238
old and new databases "testdb" have a different number of relations
Failure, exiting

That is an odd failure. That check was added in PG 9.1 and this is the
first time I am seeing this failure.

The check is to make sure that once we have created all the user schema
details in the new cluster, that there are the same number of objects in
the new and old databases.

Obviously there are a different number in your case here, but I don't
know why those would be different, and in fact, because we have never
hit this, there isn't even any debug output that shows the source of the
difference.

If I send you a patch can you compile it and send back the debug output
it produces?

Yes sure, I will try to compile and retest with it.

Actually, I have a simpler idea. At the point where it fails, you can
run pg_dump --schema-only on the testdb database in the old and new
cluster and then diff those output files and email the result to us; it
should show the mismatch. I am not sure if the dumps will output the
objects in the same order, it might.

diff attached.

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Rural Hunter (#5)

Re: pg_upgrade from 9.1.3 to 9.2 failed

On Sat, Sep 15, 2012 at 11:40:06AM +0800, Rural Hunter wrote:

The check is to make sure that once we have created all the user schema
details in the new cluster, that there are the same number of objects in
the new and old databases.

Obviously there are a different number in your case here, but I don't
know why those would be different, and in fact, because we have never
hit this, there isn't even any debug output that shows the source of the
difference.

If I send you a patch can you compile it and send back the debug output
it produces?

Yes sure, I will try to compile and retest with it.

Actually, I have a simpler idea. At the point where it fails, you can
run pg_dump --schema-only on the testdb database in the old and new
cluster and then diff those output files and email the result to us; it
should show the mismatch. I am not sure if the dumps will output the
objects in the same order, it might.
diff attached.

OK, I see many new ALTER TABLE commands, but nothing that would cause a
difference in relation count.

Attached is a patch that will return the OID of the old/new mismatched
entries. Please research the pg_class objects on the old/new clusters
that have the mismatch and let me know. It might be something that
isn't in the old cluster, or not in the new cluster.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Rural Hunter

ruralhunter@gmail.com

over 13 years ago

In reply to: Bruce Momjian (#6)

Re: pg_upgrade from 9.1.3 to 9.2 failed

于 2012/9/16 2:06, Bruce Momjian 写道:

On Sat, Sep 15, 2012 at 11:40:06AM +0800, Rural Hunter wrote:
The check is to make sure that once we have created all the user schema
details in the new cluster, that there are the same number of objects in
the new and old databases.

Obviously there are a different number in your case here, but I don't
know why those would be different, and in fact, because we have never
hit this, there isn't even any debug output that shows the source of the
difference.

If I send you a patch can you compile it and send back the debug output
it produces?

Yes sure, I will try to compile and retest with it.

Actually, I have a simpler idea. At the point where it fails, you can
run pg_dump --schema-only on the testdb database in the old and new
cluster and then diff those output files and email the result to us; it
should show the mismatch. I am not sure if the dumps will output the
objects in the same order, it might.
diff attached.
OK, I see many new ALTER TABLE commands, but nothing that would cause a
difference in relation count.

Attached is a patch that will return the OID of the old/new mismatched
entries. Please research the pg_class objects on the old/new clusters
that have the mismatch and let me know. It might be something that
isn't in the old cluster, or not in the new cluster.

I ran the pg_upgrade with the patch and found the problematic object is
a toast object.
Copying user relation files
/raid/pgsql/base/6087920/6088238
Mismatch of relation OID in database "forummon": old OID 16439148, new
OID 16439322

In old cluster:
# select * from pg_class WHERE oid=16439148;
relname | relnamespace | reltype | reloftype | relowner | relam |
relfilenode | reltablespace | relpages | reltuples | reltoastrelid |
reltoastidxid | relhasindex | relisshared | relpersistence | relkind |
relnatts | relchecks | relhasoids | relhaspkey | relhasrules |
relhastriggers | relhassubclass | relfrozenxid | relacl | reloptions
-------------------+--------------+----------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+----------------+--------------+--------+------------
pg_toast_16439145 | 99 | 16439149 | 0 | 10 | 0 | 16439148 | 0 | 0 | 0 |
0 | 16439150 | t | f | p | t | 3 | 0 | f | t | f | f | f | 630449585 | |
(1 row)

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Rural Hunter (#7)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

On Sun, Sep 16, 2012 at 12:38:37PM +0800, Rural Hunter wrote:

OK, I see many new ALTER TABLE commands, but nothing that would cause a
difference in relation count.

Attached is a patch that will return the OID of the old/new mismatched
entries. Please research the pg_class objects on the old/new clusters
that have the mismatch and let me know. It might be something that
isn't in the old cluster, or not in the new cluster.

I ran the pg_upgrade with the patch and found the problematic object
is a toast object.
Copying user relation files
/raid/pgsql/base/6087920/6088238
Mismatch of relation OID in database "forummon": old OID 16439148,
new OID 16439322

In old cluster:
# select * from pg_class WHERE oid=16439148;
relname | relnamespace | reltype | reloftype | relowner | relam |
relfilenode | reltablespace | relpages | reltuples | reltoastrelid |
reltoastidxid | relhasindex | relisshared | relpersistence | relkind
| relnatts | relchecks | relhasoids | relhaspkey | relhasrules |
relhastriggers | relhassubclass | relfrozenxid | relacl | reloptions
-------------------+--------------+----------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+----------------+--------------+--------+------------
pg_toast_16439145 | 99 | 16439149 | 0 | 10 | 0 | 16439148 | 0 | 0 |
0 | 0 | 16439150 | t | f | p | t | 3 | 0 | f | t | f | f | f |
630449585 | |
(1 row)

But it doesn't exist in new cluster:
select * from pg_class WHERE oid=16439148;
relname | relnamespace | reltype | reloftype | relowner | relam |
relfilenode | reltablespace | relpages | reltuples | relallvisible |
reltoastrelid | reltoastidxid | relhasindex | relisshared |
relpersistence | relkind | relnatts | relchecks | relhasoids |
relhaspkey | relhasrules | relhastriggers | relhassubclass |
relfrozenxid | relacl | reloptions
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+----------------+--------------+--------+------------
(0 rows)

[ Thread moved to hackers list.]

OK, this is exactly what I wanted to see, and it explains why pg_dump
didn't show it. Can you find out what table references this toast
table? Try this query on the old cluster:

select oid, * from pg_class WHERE reltoastrelid = 16439148;

I believe it will have an oid of 16439145, or it might not exist.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Bruce Momjian (#8)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

Bruce Momjian <bruce@momjian.us> writes:

On Sun, Sep 16, 2012 at 12:38:37PM +0800, Rural Hunter wrote:

I ran the pg_upgrade with the patch and found the problematic object
is a toast object.

OK, this is exactly what I wanted to see, and it explains why pg_dump
didn't show it. Can you find out what table references this toast
table? Try this query on the old cluster:

select oid, * from pg_class WHERE reltoastrelid = 16439148;

I believe it will have an oid of 16439145, or it might not exist.

Most likely what's happened is that the table has a toast table that
it doesn't need, as a result of having dropped the only wide column(s)
in it. So when the table is recreated in the new cluster, there's no
toast table for it.

So what you need to do is get rid of that check, or relax it so that it
doesn't insist on toast tables matching up exactly. It seems possible
that there could be discrepancies in the other direction too, ie,
new cluster created a toast table when old cluster didn't have one.

regards, tom lane

#10

Rural Hunter

ruralhunter@gmail.com

over 13 years ago

In reply to: Bruce Momjian (#8)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

于2012年9月17日 1:17:46,Bruce Momjian写到:

On Sun, Sep 16, 2012 at 12:38:37PM +0800, Rural Hunter wrote:

OK, I see many new ALTER TABLE commands, but nothing that would cause a
difference in relation count.

Attached is a patch that will return the OID of the old/new mismatched
entries. Please research the pg_class objects on the old/new clusters
that have the mismatch and let me know. It might be something that
isn't in the old cluster, or not in the new cluster.

I ran the pg_upgrade with the patch and found the problematic object
is a toast object.
Copying user relation files
/raid/pgsql/base/6087920/6088238
Mismatch of relation OID in database "forummon": old OID 16439148,
new OID 16439322

In old cluster:
# select * from pg_class WHERE oid=16439148;
relname | relnamespace | reltype | reloftype | relowner | relam |
relfilenode | reltablespace | relpages | reltuples | reltoastrelid |
reltoastidxid | relhasindex | relisshared | relpersistence | relkind
| relnatts | relchecks | relhasoids | relhaspkey | relhasrules |
relhastriggers | relhassubclass | relfrozenxid | relacl | reloptions
-------------------+--------------+----------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+----------------+--------------+--------+------------
pg_toast_16439145 | 99 | 16439149 | 0 | 10 | 0 | 16439148 | 0 | 0 |
0 | 0 | 16439150 | t | f | p | t | 3 | 0 | f | t | f | f | f |
630449585 | |
(1 row)

But it doesn't exist in new cluster:
select * from pg_class WHERE oid=16439148;
relname | relnamespace | reltype | reloftype | relowner | relam |
relfilenode | reltablespace | relpages | reltuples | relallvisible |
reltoastrelid | reltoastidxid | relhasindex | relisshared |
relpersistence | relkind | relnatts | relchecks | relhasoids |
relhaspkey | relhasrules | relhastriggers | relhassubclass |
relfrozenxid | relacl | reloptions
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+----------------+--------------+--------+------------
(0 rows)

[ Thread moved to hackers list.]

OK, this is exactly what I wanted to see, and it explains why pg_dump
didn't show it. Can you find out what table references this toast
table? Try this query on the old cluster:

select oid, * from pg_class WHERE reltoastrelid = 16439148;

I believe it will have an oid of 16439145, or it might not exist.

# select oid, * from pg_class WHERE reltoastrelid = 16439148;
oid | relname | relnamespace | reltype | reloftype |
relowner | relam | relfilenode | reltablespace | relpages | reltuples |
reltoastrelid | reltoastidxid | relhasindex | relisshared |
relpersistence | relkind | relnatts | relchecks | relhasoids |
relhaspkey | relhasrules | relhastriggers | relhassubclass |
relfrozenxid | relacl | reloptions
----------+--------------+--------------+----------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+----------------+--------------+-----------------------------------------+------------
16439145 | sql_features | 16438995 | 16439147 | 0 |
10 | 0 | 16439145 | 0 | 0 | 0 |
16439148 | 0 | f | f | p |
r | 7 | 0 | f | f | f
| f | f | 630449585 |
{postgres=arwdDxt/postgres,=r/postgres} |
(1 row)

It's not a table. I haven't seen this name before. not sure why it
exists. So what's the next thing I can do?

#11

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Rural Hunter (#10)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

Rural Hunter <ruralhunter@gmail.com> writes:

# select oid, * from pg_class WHERE reltoastrelid = 16439148;
oid | relname | relnamespace | reltype | reloftype |
relowner | relam | relfilenode | reltablespace | relpages | reltuples |
reltoastrelid | reltoastidxid | relhasindex | relisshared |
relpersistence | relkind | relnatts | relchecks | relhasoids |
relhaspkey | relhasrules | relhastriggers | relhassubclass |
relfrozenxid | relacl | reloptions
----------+--------------+--------------+----------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+----------------+--------------+-----------------------------------------+------------
16439145 | sql_features | 16438995 | 16439147 | 0 |
10 | 0 | 16439145 | 0 | 0 | 0 |
16439148 | 0 | f | f | p |
r | 7 | 0 | f | f | f
| f | f | 630449585 |
{postgres=arwdDxt/postgres,=r/postgres} |
(1 row)

Well, that's even stranger, because (1) information_schema.sql_features
ought to have a toast table in either version, and (2) neither pg_dump
nor pg_upgrade ought to be attempting to dump or transfer that table.

I wonder whether you dropped and recreated the information_schema in
the lifetime of this database? We have recommended doing that in the
past, IIRC. Could such a thing have confused pg_dump?

regards, tom lane

#12

Rural Hunter

ruralhunter@gmail.com

over 13 years ago

In reply to: Tom Lane (#11)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

于2012年9月17日 9:48:58,Tom Lane写到:

Rural Hunter <ruralhunter@gmail.com> writes:

# select oid, * from pg_class WHERE reltoastrelid = 16439148;
oid | relname | relnamespace | reltype | reloftype |
relowner | relam | relfilenode | reltablespace | relpages | reltuples |
reltoastrelid | reltoastidxid | relhasindex | relisshared |
relpersistence | relkind | relnatts | relchecks | relhasoids |
relhaspkey | relhasrules | relhastriggers | relhassubclass |
relfrozenxid | relacl | reloptions
----------+--------------+--------------+----------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+----------------+--------------+-----------------------------------------+------------
16439145 | sql_features | 16438995 | 16439147 | 0 |
10 | 0 | 16439145 | 0 | 0 | 0 |
16439148 | 0 | f | f | p |
r | 7 | 0 | f | f | f
| f | f | 630449585 |
{postgres=arwdDxt/postgres,=r/postgres} |
(1 row)

Well, that's even stranger, because (1) information_schema.sql_features
ought to have a toast table in either version, and (2) neither pg_dump
nor pg_upgrade ought to be attempting to dump or transfer that table.

I wonder whether you dropped and recreated the information_schema in
the lifetime of this database? We have recommended doing that in the
past, IIRC. Could such a thing have confused pg_dump?

regards, tom lane

No, I have never manually re-created the table. This is the first time
I see the name. But I'm not sure other things I installed before
recreated it or not, such as pg_buffercache etc. One more thing, is
this a hidden table? I can see it with '\d
information_schema.sql_features' but it's not in the list of '\d'.

#13

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Rural Hunter (#12)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

Rural Hunter <ruralhunter@gmail.com> writes:

于2012年9月17日 9:48:58,Tom Lane写到:

I wonder whether you dropped and recreated the information_schema in
the lifetime of this database? We have recommended doing that in the
past, IIRC. Could such a thing have confused pg_dump?

No, I have never manually re-created the table.

I think you must have, because the query output shows that sql_features,
its rowtype, and the information_schema all have OIDs much larger than
they would have had in a virgin installation. The large relfilenode
could have been explained by a VACUUM FULL, but the other OIDs wouldn't
have been changed by that.

This is the first time
I see the name. But I'm not sure other things I installed before
recreated it or not, such as pg_buffercache etc. One more thing, is
this a hidden table? I can see it with '\d
information_schema.sql_features' but it's not in the list of '\d'.

That just means that information_schema is not in your search_path.

regards, tom lane

#14

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Tom Lane (#9)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

On Sun, Sep 16, 2012 at 06:04:16PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Sun, Sep 16, 2012 at 12:38:37PM +0800, Rural Hunter wrote:

I ran the pg_upgrade with the patch and found the problematic object
is a toast object.

OK, this is exactly what I wanted to see, and it explains why pg_dump
didn't show it. Can you find out what table references this toast
table? Try this query on the old cluster:

select oid, * from pg_class WHERE reltoastrelid = 16439148;

I believe it will have an oid of 16439145, or it might not exist.

Most likely what's happened is that the table has a toast table that
it doesn't need, as a result of having dropped the only wide column(s)
in it. So when the table is recreated in the new cluster, there's no
toast table for it.

So what you need to do is get rid of that check, or relax it so that it
doesn't insist on toast tables matching up exactly. It seems possible
that there could be discrepancies in the other direction too, ie,
new cluster created a toast table when old cluster didn't have one.

pg_dump.c already has this code:

if (OidIsValid(pg_class_reltoastrelid))
{
/*
* One complexity is that the table definition might not require
* the creation of a TOAST table, and the TOAST table might have
* been created long after table creation, when the table was
* loaded with wide data. By setting the TOAST oid we force
* creation of the TOAST heap and TOAST index by the backend so we
* can cleanly copy the files during binary upgrade.
*/

appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);

/* every toast table has an index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
}

As you can see, we look at the existing TOAST usage and force the new
cluster to match. As I remember we replay the DROP COLUMN in binary
upgrade mode so the new cluster always matches the old cluster's TOAST
usage. I certainly have never seen this bug reported before.

I think the big question is why did this case fail? I can say that the
query that pulls details from each cluster skips information_schema or
oid < FirstNormalObjectId. I wonder if there is a mismatch between what
pg_dump filters out and pg_upgrade. Can you tell us the schema of the
'sql_features' table?

Also, does it appear in the pg_dump --schema-only output? I don't think
it does because it wasn't reported in the pg_dump --schema-only diff I
requested, and pg_dump wouldn't have dumped it from the new cluster.

What that means is that 'sql_features' got a TOAST table in the old
cluster but while 'sql_features' also has a TOAST table in the new
cluster, it isn't processed by pg_upgrade because it is in the
information schema and has an oid < FirstNormalObjectId.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#15

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Tom Lane (#11)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

On Sun, Sep 16, 2012 at 09:48:58PM -0400, Tom Lane wrote:

Rural Hunter <ruralhunter@gmail.com> writes:

# select oid, * from pg_class WHERE reltoastrelid = 16439148;
oid | relname | relnamespace | reltype | reloftype |
relowner | relam | relfilenode | reltablespace | relpages | reltuples |
reltoastrelid | reltoastidxid | relhasindex | relisshared |
relpersistence | relkind | relnatts | relchecks | relhasoids |
relhaspkey | relhasrules | relhastriggers | relhassubclass |
relfrozenxid | relacl | reloptions
----------+--------------+--------------+----------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+----------------+--------------+-----------------------------------------+------------
16439145 | sql_features | 16438995 | 16439147 | 0 |
10 | 0 | 16439145 | 0 | 0 | 0 |
16439148 | 0 | f | f | p |
r | 7 | 0 | f | f | f
| f | f | 630449585 |
{postgres=arwdDxt/postgres,=r/postgres} |
(1 row)

Well, that's even stranger, because (1) information_schema.sql_features
ought to have a toast table in either version, and (2) neither pg_dump
nor pg_upgrade ought to be attempting to dump or transfer that table.

I bet pg_upgrade is picking it up from the old cluster because it has an
oid >= FirstNormalObjectId and the table is not in the information
schema.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#16

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Bruce Momjian (#15)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

Bruce Momjian <bruce@momjian.us> writes:

On Sun, Sep 16, 2012 at 09:48:58PM -0400, Tom Lane wrote:

Well, that's even stranger, because (1) information_schema.sql_features
ought to have a toast table in either version, and (2) neither pg_dump
nor pg_upgrade ought to be attempting to dump or transfer that table.

I bet pg_upgrade is picking it up from the old cluster because it has an
oid >= FirstNormalObjectId and the table is not in the information
schema.

If it *isn't* in information_schema, but is just some random table that
happens to be named sql_features, then it's hard to explain why there's
anything going wrong at all. My money is on the OP having done a reload
of the information_schema (as per, eg, the release notes for 9.1.2), and
somehow that's confusing pg_dump and/or pg_upgrade.

regards, tom lane

#17

Rural Hunter

ruralhunter@gmail.com

over 13 years ago

In reply to: Bruce Momjian (#14)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

于2012年9月17日 12:32:36,Bruce Momjian写到:

On Sun, Sep 16, 2012 at 06:04:16PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Sun, Sep 16, 2012 at 12:38:37PM +0800, Rural Hunter wrote:

I ran the pg_upgrade with the patch and found the problematic object
is a toast object.

OK, this is exactly what I wanted to see, and it explains why pg_dump
didn't show it. Can you find out what table references this toast
table? Try this query on the old cluster:

select oid, * from pg_class WHERE reltoastrelid = 16439148;

I believe it will have an oid of 16439145, or it might not exist.

Most likely what's happened is that the table has a toast table that
it doesn't need, as a result of having dropped the only wide column(s)
in it. So when the table is recreated in the new cluster, there's no
toast table for it.

So what you need to do is get rid of that check, or relax it so that it
doesn't insist on toast tables matching up exactly. It seems possible
that there could be discrepancies in the other direction too, ie,
new cluster created a toast table when old cluster didn't have one.

pg_dump.c already has this code:

if (OidIsValid(pg_class_reltoastrelid))
{
/*
* One complexity is that the table definition might not require
* the creation of a TOAST table, and the TOAST table might have
* been created long after table creation, when the table was
* loaded with wide data. By setting the TOAST oid we force
* creation of the TOAST heap and TOAST index by the backend so we
* can cleanly copy the files during binary upgrade.
*/

appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);

/* every toast table has an index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
}

As you can see, we look at the existing TOAST usage and force the new
cluster to match. As I remember we replay the DROP COLUMN in binary
upgrade mode so the new cluster always matches the old cluster's TOAST
usage. I certainly have never seen this bug reported before.

I think the big question is why did this case fail? I can say that the
query that pulls details from each cluster skips information_schema or
oid < FirstNormalObjectId. I wonder if there is a mismatch between what
pg_dump filters out and pg_upgrade. Can you tell us the schema of the
'sql_features' table?

Also, does it appear in the pg_dump --schema-only output? I don't think
it does because it wasn't reported in the pg_dump --schema-only diff I
requested, and pg_dump wouldn't have dumped it from the new cluster.

right. I checked the dump from the old cluster and it's not there.

Show quoted text

What that means is that 'sql_features' got a TOAST table in the old
cluster but while 'sql_features' also has a TOAST table in the new
cluster, it isn't processed by pg_upgrade because it is in the
information schema and has an oid < FirstNormalObjectId.

#18

Rural Hunter

ruralhunter@gmail.com

over 13 years ago

In reply to: Tom Lane (#16)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

于2012年9月17日 12:47:11,Tom Lane写到:

Bruce Momjian <bruce@momjian.us> writes:

On Sun, Sep 16, 2012 at 09:48:58PM -0400, Tom Lane wrote:

Well, that's even stranger, because (1) information_schema.sql_features
ought to have a toast table in either version, and (2) neither pg_dump
nor pg_upgrade ought to be attempting to dump or transfer that table.

I bet pg_upgrade is picking it up from the old cluster because it has an
oid >= FirstNormalObjectId and the table is not in the information
schema.

If it *isn't* in information_schema, but is just some random table that
happens to be named sql_features, then it's hard to explain why there's
anything going wrong at all. My money is on the OP having done a reload
of the information_schema (as per, eg, the release notes for 9.1.2), and
somehow that's confusing pg_dump and/or pg_upgrade.

ah yes yes, now I can remember it! I have followed the release notes
and re-created the whole information_schema schema.

Show quoted text

regards, tom lane

#19

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Rural Hunter (#17)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

On Mon, Sep 17, 2012 at 01:03:37PM +0800, Rural Hunter wrote:

As you can see, we look at the existing TOAST usage and force the new
cluster to match. As I remember we replay the DROP COLUMN in binary
upgrade mode so the new cluster always matches the old cluster's TOAST
usage. I certainly have never seen this bug reported before.

I think the big question is why did this case fail? I can say that the
query that pulls details from each cluster skips information_schema or
oid < FirstNormalObjectId. I wonder if there is a mismatch between what
pg_dump filters out and pg_upgrade. Can you tell us the schema of the
'sql_features' table?

# select * from pg_tables where tablename='sql_features';
schemaname | tablename | tableowner | tablespace |
hasindexes | hasrules | hastriggers
--------------------+--------------+------------+------------+------------+----------+-------------
information_schema | sql_features | postgres | | f
| f | f
(1 row)

OK, good to know. This is the query pg_upgrade 9.2 uses to pull
information from 9.1 and 9.2:

SELECT c.oid, n.nspname, c.relname, c.relfilenode, c.reltablespace, t.spclocation
FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
LEFT OUTER JOIN pg_catalog.pg_tablespace t ON c.reltablespace = t.oid
WHERE relkind IN ('r','t', 'i', 'S') AND
((n.nspname !~ '^pg_temp_' AND
n.nspname !~ '^pg_toast_temp_' AND
n.nspname NOT IN ('pg_catalog', 'information_schema', 'binary_upgrade') AND
c.oid >= 16384
)
OR
(n.nspname = 'pg_catalog' AND
relname IN
('pg_largeobject', 'pg_largeobject_loid_pn_index', 'pg_largeobject_metadata', 'pg_largeobject_metadata_oid_index')
)
)
ORDER BY 1;

Based on the fact that sql_features exists in the information_schema
schema, I don't think 'sql_features' table is actually being processed
by pg_upgrade, but I think its TOAST table, because it has a high oid,
is being processed because it is in the pg_toast schema. This is
causing the mismatch between the old and new clusters.

I am thinking this query needs to be split apart into a UNION where the
second part handles TOAST tables and looks at the schema of the _owner_
of the TOAST table. Needs to be backpatched too.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#20

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Bruce Momjian (#19)

Re: [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

On Mon, Sep 17, 2012 at 05:07:23PM -0400, Bruce Momjian wrote:

# select * from pg_tables where tablename='sql_features';
schemaname | tablename | tableowner | tablespace |
hasindexes | hasrules | hastriggers
--------------------+--------------+------------+------------+------------+----------+-------------
information_schema | sql_features | postgres | | f
| f | f
(1 row)

OK, good to know. This is the query pg_upgrade 9.2 uses to pull
information from 9.1 and 9.2:

SELECT c.oid, n.nspname, c.relname, c.relfilenode, c.reltablespace, t.spclocation
FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
LEFT OUTER JOIN pg_catalog.pg_tablespace t ON c.reltablespace = t.oid
WHERE relkind IN ('r','t', 'i', 'S') AND
((n.nspname !~ '^pg_temp_' AND
n.nspname !~ '^pg_toast_temp_' AND
n.nspname NOT IN ('pg_catalog', 'information_schema', 'binary_upgrade') AND
c.oid >= 16384
)
OR
(n.nspname = 'pg_catalog' AND
relname IN
('pg_largeobject', 'pg_largeobject_loid_pn_index', 'pg_largeobject_metadata', 'pg_largeobject_metadata_oid_index')
)
)
ORDER BY 1;

Based on the fact that sql_features exists in the information_schema
schema, I don't think 'sql_features' table is actually being processed
by pg_upgrade, but I think its TOAST table, because it has a high oid,
is being processed because it is in the pg_toast schema. This is
causing the mismatch between the old and new clusters.

I am thinking this query needs to be split apart into a UNION where the
second part handles TOAST tables and looks at the schema of the _owner_
of the TOAST table. Needs to be backpatched too.

OK, I am at a conference now so will not be able to write-up a patch
until perhaps next week. You can drop the information schema in the old
database and pg_upgrade should run fine. I will test your failure once
I create a patch.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#21

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Bruce Momjian (#20)

#22

Rural Hunter

ruralhunter@gmail.com

over 13 years ago

In reply to: Bruce Momjian (#20)