Unable to dump database using pg_dump

Started by Adam Dearalmost 18 years ago25 messagesgeneral
Jump to latest
#1Adam Dear
adear@usnx.net

I am migrating to a new server. I am trying to get a dump of my
database to import into postgres 8.1.11 on my new server. When I run
the pg_dump commmand:

pg_dump --clean --file=madison.sql --schema=public -U madisoncounty
madisoncounty

I get this message:

pg_dump: query to obtain list of schemas failed: ERROR: more than one
row returned by a subquery used as an expression

I'm pretty new to postgres, and I'm not sure how to debug / fix this
issue. I did a VACUUM FULL on the server before I tried to do the dump.

#2Craig Ringer
craig@2ndquadrant.com
In reply to: Adam Dear (#1)
Re: Unable to dump database using pg_dump

Adam Dear wrote:

pg_dump: query to obtain list of schemas failed: ERROR: more than one
row returned by a subquery used as an expression

Is the pg_dump command from the old postgresql install or the new one?

--
Craig Ringer

#3Adam Dear
adear@usnx.net
In reply to: Craig Ringer (#2)
Re: Unable to dump database using pg_dump

The pg_dump command is from version 7.4.11. This is the version of
postgres I am running on my old server.

Craig Ringer wrote:

Show quoted text

Adam Dear wrote:

pg_dump: query to obtain list of schemas failed: ERROR: more than one
row returned by a subquery used as an expression

Is the pg_dump command from the old postgresql install or the new one?

--
Craig Ringer

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam Dear (#3)
Re: Unable to dump database using pg_dump

Adam Dear <adear@usnx.net> writes:

pg_dump: query to obtain list of schemas failed: ERROR: more than one
row returned by a subquery used as an expression

The pg_dump command is from version 7.4.11. This is the version of
postgres I am running on my old server.

Just FYI, when doing a version upgrade it's considered better practice
to dump with the newer version's pg_dump, if possible. Sometimes the
newer pg_dump knows things that will smooth the migration. However,
that's not related to your immediate problem.

A look at the pg_dump source code shows that the subquery it's
complaining about is trying to fetch the name of each view's owner.
Apparently you've got multiple rows in pg_shadow with the same usesysid
--- or perhaps just one row but the index on usesysid is corrupt and is
returning it more than once.  Try
	select ctid, usename, usesysid from pg_shadow;
and see if you see any duplicate usesysid's.  If so, try removing the
extra copies (use WHERE ctid = '...' if there isn't any other difference
between the rows).  If you see no dups in a query like this, then the
index is at fault --- try reindexing.

regards, tom lane

#5Adam Dear
adear@usnx.net
In reply to: Tom Lane (#4)
Re: Unable to dump database using pg_dump

I ran the select as instructed, and this is the output:

madisoncounty=# select ctid, usename, usesysid from pg_shadow;
ctid | usename | usesysid
-------+---------------+----------
(0,1) | postgres | 1
(0,2) | postgres | 1
(0,5) | madisoncounty | 100
(3 rows)

I then did:

delete from pg_shadow where ctid='(0,2)';
DELETE 1

I had already reindexed the tables. Now when I run the command, I get this:

pg_dump: [archiver (db)] connection to database "madisoncounty" failed:
FATAL: user "postgres" does not exist

Also, I have been connecting to the db as the postgres user. Now I am
unable to connect to the database with the postgres user, or with the
actual user for the site.

Tom Lane wrote:

Show quoted text

Adam Dear <adear@usnx.net> writes:

pg_dump: query to obtain list of schemas failed: ERROR: more than one
row returned by a subquery used as an expression

The pg_dump command is from version 7.4.11. This is the version of
postgres I am running on my old server.

Just FYI, when doing a version upgrade it's considered better practice
to dump with the newer version's pg_dump, if possible. Sometimes the
newer pg_dump knows things that will smooth the migration. However,
that's not related to your immediate problem.

A look at the pg_dump source code shows that the subquery it's
complaining about is trying to fetch the name of each view's owner.
Apparently you've got multiple rows in pg_shadow with the same usesysid
--- or perhaps just one row but the index on usesysid is corrupt and is
returning it more than once.  Try
select ctid, usename, usesysid from pg_shadow;
and see if you see any duplicate usesysid's.  If so, try removing the
extra copies (use WHERE ctid = '...' if there isn't any other difference
between the rows).  If you see no dups in a query like this, then the
index is at fault --- try reindexing.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam Dear (#5)
Re: Unable to dump database using pg_dump

Adam Dear <adear@usnx.net> writes:

I ran the select as instructed, and this is the output:
madisoncounty=# select ctid, usename, usesysid from pg_shadow;
ctid | usename | usesysid
-------+---------------+----------
(0,1) | postgres | 1
(0,2) | postgres | 1
(0,5) | madisoncounty | 100
(3 rows)

I then did:

delete from pg_shadow where ctid='(0,2)';
DELETE 1

I had already reindexed the tables. Now when I run the command, I get this:

pg_dump: [archiver (db)] connection to database "madisoncounty" failed:
FATAL: user "postgres" does not exist

How annoying :-(. And I suppose madisoncounty isn't a superuser,
so you're now stuck with no working superuser. What you'll need
to do is shut down the database and start up a standalone backend
(read the "postgres" reference page about how to work in this mode).
What I'd then do is delete the other postgres row (if you can still
see it) and do CREATE USER postgres WITH SYSID 1 CREATEUSER CREATEDB.
That should get you back to having a working postgres account.

It's hard to be sure how you got into this state, although one possible
theory is you got burnt by not vacuuming pg_shadow on a regular basis.
The newer version should keep you out of that error, if so.

regards, tom lane

#7Adam Dear
adear@usnx.net
In reply to: Tom Lane (#6)
Re: Unable to dump database using pg_dump

I really appreciate you help so far. here is what I am seeing . I did:

select * from pg_shadow;

1: usename (typeid = 19, len = 64, typmod = -1, byval = f)
2: usesysid (typeid = 23, len = 4, typmod = -1, byval = t)
3: usecreatedb (typeid = 16, len = 1, typmod = -1, byval = t)
4: usesuper (typeid = 16, len = 1, typmod = -1, byval = t)
5: usecatupd (typeid = 16, len = 1, typmod = -1, byval = t)
6: passwd (typeid = 25, len = -1, typmod = -1, byval = f)
7: valuntil (typeid = 702, len = 4, typmod = -1, byval = t)
8: useconfig (typeid = 1009, len = -1, typmod = -1, byval = f)
----
1: usename = "postgres" (typeid = 19, len = 64, typmod
= -1, byval = f)
2: usesysid = "1" (typeid = 23, len = 4, typmod = -1,
byval = t)
3: usecreatedb = "t" (typeid = 16, len = 1, typmod = -1,
byval = t)
4: usesuper = "t" (typeid = 16, len = 1, typmod = -1,
byval = t)
5: usecatupd = "t" (typeid = 16, len = 1, typmod = -1,
byval = t)
----

I'm not seeing the madisoncounty user in there. Also, I tried starting
the db using /etc/init.d/postgres start, and it fails. Is that the
proper way to get the service going, or should I be doing something else?

Tom Lane wrote:

Show quoted text

Adam Dear <adear@usnx.net> writes:

I ran the select as instructed, and this is the output:
madisoncounty=# select ctid, usename, usesysid from pg_shadow;
ctid | usename | usesysid
-------+---------------+----------
(0,1) | postgres | 1
(0,2) | postgres | 1
(0,5) | madisoncounty | 100
(3 rows)

I then did:

delete from pg_shadow where ctid='(0,2)';
DELETE 1

I had already reindexed the tables. Now when I run the command, I get this:

pg_dump: [archiver (db)] connection to database "madisoncounty" failed:
FATAL: user "postgres" does not exist

How annoying :-(. And I suppose madisoncounty isn't a superuser,
so you're now stuck with no working superuser. What you'll need
to do is shut down the database and start up a standalone backend
(read the "postgres" reference page about how to work in this mode).
What I'd then do is delete the other postgres row (if you can still
see it) and do CREATE USER postgres WITH SYSID 1 CREATEUSER CREATEDB.
That should get you back to having a working postgres account.

It's hard to be sure how you got into this state, although one possible
theory is you got burnt by not vacuuming pg_shadow on a regular basis.
The newer version should keep you out of that error, if so.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam Dear (#7)
Re: Unable to dump database using pg_dump

Adam Dear <adear@usnx.net> writes:

I'm not seeing the madisoncounty user in there.

Odder and odder. It might be worth trying "vacuum freeze pg_shadow".

Also, I tried starting
the db using /etc/init.d/postgres start, and it fails.

Fails how? In particular, what shows up in the postmaster log?

Is that the
proper way to get the service going, or should I be doing something else?

The usual way to manually start/stop daemons on Linux is

sudo /sbin/service postgresql start
sudo /sbin/service postgresql stop

(omitting sudo if you're already root). I'm not sure offhand if there's
any real difference between that and just calling the init.d script
directly, but I believe that's how you're Supposed To Do It.

regards, tom lane

#9Adam Dear
adear@usnx.net
In reply to: Tom Lane (#8)
Re: Unable to dump database using pg_dump

I managed to get the server started again. It was failing because the
permissions on the data folder were not right.

Now, though, I'm back to where I started. There are two postgres users
in the pg_shadow table. One of them has a password, the other doesn't,
but they both have the same sysid. I tried to delete the one without
the password, I couldn't get it deleted.

I tried:

Delete from pg_shadow where ctid = '(0,1)';
DELETE 0

and

Delete from pg_shadow where passwd = '';
DELETE 0

Tom Lane wrote:

Show quoted text

Adam Dear <adear@usnx.net> writes:

I'm not seeing the madisoncounty user in there.

Odder and odder. It might be worth trying "vacuum freeze pg_shadow".

Also, I tried starting
the db using /etc/init.d/postgres start, and it fails.

Fails how? In particular, what shows up in the postmaster log?

Is that the
proper way to get the service going, or should I be doing something else?

The usual way to manually start/stop daemons on Linux is

sudo /sbin/service postgresql start
sudo /sbin/service postgresql stop

(omitting sudo if you're already root). I'm not sure offhand if there's
any real difference between that and just calling the init.d script
directly, but I believe that's how you're Supposed To Do It.

regards, tom lane

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam Dear (#9)
Re: Unable to dump database using pg_dump

Adam Dear <adear@usnx.net> writes:

I tried:
Delete from pg_shadow where ctid = '(0,1)';
DELETE 0

This is looking more and more like a transaction ID wraparound problem.
Did you try the vacuum freeze suggestion?

regards, tom lane

#11Adam Dear
adear@usnx.net
In reply to: Tom Lane (#10)
Re: Unable to dump database using pg_dump

I tried:

Vacuum Freeze pg_shadow;

Then

Reindex database madisoncounty;

Then

Delete from pg_shadow where ctid='(0,1)';

The record is still there.

Tom Lane wrote:

Show quoted text

Adam Dear <adear@usnx.net> writes:

I tried:
Delete from pg_shadow where ctid = '(0,1)';
DELETE 0

This is looking more and more like a transaction ID wraparound problem.
Did you try the vacuum freeze suggestion?

regards, tom lane

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam Dear (#11)
Re: Unable to dump database using pg_dump

Adam Dear <adear@usnx.net> writes:

The record is still there.

Hmph. Could we see all the system columns from that table?
select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow;

regards, tom lane

#13Adam Dear
adear@usnx.net
In reply to: Tom Lane (#12)
Re: Unable to dump database using pg_dump

madisoncounty=# select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow;
ctid | xmin | xmax | cmin | cmax | usename
-------+------+------+------+------+---------------
(0,1) | 1 | 596 | 596 | 1 | postgres
(0,2) | 2 | 1 | 1 | 0 | postgres
(0,5) | 2 | 0 | 0 | 0 | madisoncounty
(3 rows)

Tom Lane wrote:

Show quoted text

Adam Dear <adear@usnx.net> writes:

The record is still there.

Hmph. Could we see all the system columns from that table?
select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow;

regards, tom lane

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam Dear (#13)
Re: Unable to dump database using pg_dump

Adam Dear <adear@usnx.net> writes:

madisoncounty=# select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow;
ctid | xmin | xmax | cmin | cmax | usename
-------+------+------+------+------+---------------
(0,1) | 1 | 596 | 596 | 1 | postgres
(0,2) | 2 | 1 | 1 | 0 | postgres
(0,5) | 2 | 0 | 0 | 0 | madisoncounty
(3 rows)

Hm, I thought you deleted the (0,2) tuple ... did you restore a physical
backup or something?

Anyway, as far as I can see the way that you got into this state must
have been

1. The (0,1) tuple must have been the one originally inserted by initdb;
there's no other way it could have xmin=1.

2. Shortly after initdb (at transaction 596 to be exact) this tuple was
updated --- probably by a password-assignment operation --- creating the
tuple at (0,2), which must originally have had xmin = 596, cmin = 1 (the
xmax overlays cmin in 7.4, so we can assume that column value is bogus).

3. Much time passes, and pg_shadow never gets vacuumed so the dead
tuple at (0,1) is never cleaned up. Eventually the XID counter passes 2
billion + 596, and suddenly transaction 596 appears to be in the future,
so the tuple at (0,1) starts to be seen by SELECTs again.

4. At this point you ran VACUUM FREEZE, which replaced the xmins of the
second and third tuples with 2 (FrozenTransactionId) ... but 7.4 does
not think it could ever need to freeze xmax, and at this point VACUUM
wouldn't touch the (0,1) tuple anyway because it considers the tuple as
RECENTLY_DEAD.

So VACUUM won't help you, at least not for another 2 billion
transactions. And the DELETE doesn't work either because it correctly
perceives (0,1) as an updated tuple that's been superseded by (0,2),
which doesn't meet the WHERE clause so DELETE doesn't touch it.
You could delete (0,2) but that leaves you with no working postgres user
(since the system's SnapshotNow rules consider (0,1) as dead), and if
you create another one you're back to having 2 entries in pg_shadow.
Nasty :-(

I can't think of any way out of this using plain 7.4 SQL operations.
You could maybe hack a special case into VACUUM to make it nuke the
dead tuple, but what's probably going to be easier is to manipulate the
data on disk. Are you comfortable enough with editing binary data
to find the "596" and replace it with "2"? It'd be somewhere near
the end of the first (and probably only) block of pg_shadow, and a
few bytes before one of the occurrences of the string "postgres".
BTW, pg_shadow is $PGDATA/global/1260.

(If you try this, do the editing while the postmaster is stopped,
else you might have problems with it buffering the old data.)

regards, tom lane

#15Adam Dear
adear@usnx.net
In reply to: Tom Lane (#14)
Re: Unable to dump database using pg_dump

I'm comfortable enough with it that I can edit it if I can find exactly
what to edit. Whats the best way to edit the file? I've opened it using:

vim -b 1260

Here is what is at the end of the block in the file. If you could point
me in the right direction, I'd appreciate it.

@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^B^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^E^@^H^@^C)^X?madisoncounty^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@d^@^@^@^@^@^@^@'^@^@^@md58fd917bc348d399f31056d33330ba74f^@<98>^B^@^@<81>^E^@^@^@^@^@^@^@^@^@^@
^E^@^H^@^C%^X?madisoncounty^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@d^@^@^@^@^@^@^@'^@^@
^@md58fd917bc348d399f31056d33330ba74f^@<94>^B^@^@<98>^B^@^@^@^@^@^@^@^@^@^@^D^@^H^@^A^E^X^_madisoncounty^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@d^@^@^@^@^@^@^@^B^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^B^@^H^@^C)^X?postgres^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^A^A^A^@'^@^@^@md506464ceceb2b5b44a27417bf6ac59c8a^@^A^@^@
^@T^B^@^@^A^@^@^@^@^@^@^@^B^@^H^@^A^E^X^_postgres^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^A^@^@^@^A^A^A^@

Tom Lane wrote:

Show quoted text

Adam Dear <adear@usnx.net> writes:

madisoncounty=# select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow;
ctid | xmin | xmax | cmin | cmax | usename
-------+------+------+------+------+---------------
(0,1) | 1 | 596 | 596 | 1 | postgres
(0,2) | 2 | 1 | 1 | 0 | postgres
(0,5) | 2 | 0 | 0 | 0 | madisoncounty
(3 rows)

Hm, I thought you deleted the (0,2) tuple ... did you restore a physical
backup or something?

Anyway, as far as I can see the way that you got into this state must
have been

1. The (0,1) tuple must have been the one originally inserted by initdb;
there's no other way it could have xmin=1.

2. Shortly after initdb (at transaction 596 to be exact) this tuple was
updated --- probably by a password-assignment operation --- creating the
tuple at (0,2), which must originally have had xmin = 596, cmin = 1 (the
xmax overlays cmin in 7.4, so we can assume that column value is bogus).

3. Much time passes, and pg_shadow never gets vacuumed so the dead
tuple at (0,1) is never cleaned up. Eventually the XID counter passes 2
billion + 596, and suddenly transaction 596 appears to be in the future,
so the tuple at (0,1) starts to be seen by SELECTs again.

4. At this point you ran VACUUM FREEZE, which replaced the xmins of the
second and third tuples with 2 (FrozenTransactionId) ... but 7.4 does
not think it could ever need to freeze xmax, and at this point VACUUM
wouldn't touch the (0,1) tuple anyway because it considers the tuple as
RECENTLY_DEAD.

So VACUUM won't help you, at least not for another 2 billion
transactions. And the DELETE doesn't work either because it correctly
perceives (0,1) as an updated tuple that's been superseded by (0,2),
which doesn't meet the WHERE clause so DELETE doesn't touch it.
You could delete (0,2) but that leaves you with no working postgres user
(since the system's SnapshotNow rules consider (0,1) as dead), and if
you create another one you're back to having 2 entries in pg_shadow.
Nasty :-(

I can't think of any way out of this using plain 7.4 SQL operations.
You could maybe hack a special case into VACUUM to make it nuke the
dead tuple, but what's probably going to be easier is to manipulate the
data on disk. Are you comfortable enough with editing binary data
to find the "596" and replace it with "2"? It'd be somewhere near
the end of the first (and probably only) block of pg_shadow, and a
few bytes before one of the occurrences of the string "postgres".
BTW, pg_shadow is $PGDATA/global/1260.

(If you try this, do the editing while the postmaster is stopped,
else you might have problems with it buffering the old data.)

regards, tom lane

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Adam Dear (#15)
Re: Unable to dump database using pg_dump

Adam Dear wrote:

I'm comfortable enough with it that I can edit it if I can find exactly
what to edit. Whats the best way to edit the file? I've opened it
using:

vim -b 1260

Here is what is at the end of the block in the file. If you could point
me in the right direction, I'd appreciate it.

Huh, you really need a proper hex editor. Try "hexedit" for example.
And you need to convert the values to hexadecimal.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam Dear (#15)
Re: Unable to dump database using pg_dump

Adam Dear <adear@usnx.net> writes:

I'm comfortable enough with it that I can edit it if I can find exactly
what to edit. Whats the best way to edit the file? I've opened it using:

vim -b 1260

Here is what is at the end of the block in the file. If you could point
me in the right direction, I'd appreciate it.

I think you need to change T^B to ^B^@ ... though I agree with Alvaro
that a hex editor would be a more comfortable tool.

regards, tom lane

#18Adam Dear
adear@usnx.net
In reply to: Tom Lane (#17)
Re: Unable to dump database using pg_dump

I downloaded a hex editor, but honestly, I can't make heads or tails of
what I'm looking at. I'm not sure what needs changing.

Also, FYI, you mentioned that you thought I had deleted the record with
ctid=(0,2). I did delete that, but thats what caused the database to
not work. I restored a backup copy of the data directory I had made
yesterday morning before I started messing with it.

If I restored the data directory again wouldn't that undo the Vacuum
Freeze command that I did that has locked the database into the state
that it is in now?

Tom Lane wrote:

Show quoted text

Adam Dear <adear@usnx.net> writes:

I'm comfortable enough with it that I can edit it if I can find exactly
what to edit. Whats the best way to edit the file? I've opened it using:

vim -b 1260

Here is what is at the end of the block in the file. If you could point
me in the right direction, I'd appreciate it.

I think you need to change T^B to ^B^@ ... though I agree with Alvaro
that a hex editor would be a more comfortable tool.

regards, tom lane

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam Dear (#18)
Re: Unable to dump database using pg_dump

Adam Dear <adear@usnx.net> writes:

If I restored the data directory again wouldn't that undo the Vacuum
Freeze command that I did that has locked the database into the state
that it is in now?

It'd undo the freeze, but that doesn't get you any closer to a solution.
What you need is to get rid of the wrapped-around xmax in the (0,1)
tuple.

regards, tom lane

#20Adam Dear
adear@usnx.net
In reply to: Tom Lane (#19)
Re: Unable to dump database using pg_dump

Let me ask this question. Is there any other option for moving data
from one server to another that doesn't involve pg_dump? If I can get
the data out of this server, and onto my new one I don't care if the old
server gets fixed. I just need the data. The new server already has
the table structure, I just need the most current data to put into the
server.

Tom Lane wrote:

Show quoted text

Adam Dear <adear@usnx.net> writes:

If I restored the data directory again wouldn't that undo the Vacuum
Freeze command that I did that has locked the database into the state
that it is in now?

It'd undo the freeze, but that doesn't get you any closer to a solution.
What you need is to get rid of the wrapped-around xmax in the (0,1)
tuple.

regards, tom lane

#21Adam Dear
adear@usnx.net
In reply to: Adam Dear (#1)
#22Joshua D. Drake
jd@commandprompt.com
In reply to: Adam Dear (#20)
#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Adam Dear (#20)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam Dear (#20)
#25Craig Ringer
craig@2ndquadrant.com
In reply to: Adam Dear (#20)