Unable to dump database using pg_dump
I am migrating to a new server. I am trying to get a dump of my
database to import into postgres 8.1.11 on my new server. When I run
the pg_dump commmand:
pg_dump --clean --file=madison.sql --schema=public -U madisoncounty
madisoncounty
I get this message:
pg_dump: query to obtain list of schemas failed: ERROR: more than one
row returned by a subquery used as an expression
I'm pretty new to postgres, and I'm not sure how to debug / fix this
issue. I did a VACUUM FULL on the server before I tried to do the dump.
Adam Dear wrote:
pg_dump: query to obtain list of schemas failed: ERROR: more than one
row returned by a subquery used as an expression
Is the pg_dump command from the old postgresql install or the new one?
--
Craig Ringer
The pg_dump command is from version 7.4.11. This is the version of
postgres I am running on my old server.
Craig Ringer wrote:
Show quoted text
Adam Dear wrote:
pg_dump: query to obtain list of schemas failed: ERROR: more than one
row returned by a subquery used as an expressionIs the pg_dump command from the old postgresql install or the new one?
--
Craig Ringer
Adam Dear <adear@usnx.net> writes:
pg_dump: query to obtain list of schemas failed: ERROR: more than one
row returned by a subquery used as an expression
The pg_dump command is from version 7.4.11. This is the version of
postgres I am running on my old server.
Just FYI, when doing a version upgrade it's considered better practice
to dump with the newer version's pg_dump, if possible. Sometimes the
newer pg_dump knows things that will smooth the migration. However,
that's not related to your immediate problem.
A look at the pg_dump source code shows that the subquery it's
complaining about is trying to fetch the name of each view's owner.
Apparently you've got multiple rows in pg_shadow with the same usesysid
--- or perhaps just one row but the index on usesysid is corrupt and is
returning it more than once. Try
select ctid, usename, usesysid from pg_shadow;
and see if you see any duplicate usesysid's. If so, try removing the
extra copies (use WHERE ctid = '...' if there isn't any other difference
between the rows). If you see no dups in a query like this, then the
index is at fault --- try reindexing.
regards, tom lane
I ran the select as instructed, and this is the output:
madisoncounty=# select ctid, usename, usesysid from pg_shadow;
ctid | usename | usesysid
-------+---------------+----------
(0,1) | postgres | 1
(0,2) | postgres | 1
(0,5) | madisoncounty | 100
(3 rows)
I then did:
delete from pg_shadow where ctid='(0,2)';
DELETE 1
I had already reindexed the tables. Now when I run the command, I get this:
pg_dump: [archiver (db)] connection to database "madisoncounty" failed:
FATAL: user "postgres" does not exist
Also, I have been connecting to the db as the postgres user. Now I am
unable to connect to the database with the postgres user, or with the
actual user for the site.
Tom Lane wrote:
Show quoted text
Adam Dear <adear@usnx.net> writes:
pg_dump: query to obtain list of schemas failed: ERROR: more than one
row returned by a subquery used as an expressionThe pg_dump command is from version 7.4.11. This is the version of
postgres I am running on my old server.Just FYI, when doing a version upgrade it's considered better practice
to dump with the newer version's pg_dump, if possible. Sometimes the
newer pg_dump knows things that will smooth the migration. However,
that's not related to your immediate problem.A look at the pg_dump source code shows that the subquery it's complaining about is trying to fetch the name of each view's owner. Apparently you've got multiple rows in pg_shadow with the same usesysid --- or perhaps just one row but the index on usesysid is corrupt and is returning it more than once. Try select ctid, usename, usesysid from pg_shadow; and see if you see any duplicate usesysid's. If so, try removing the extra copies (use WHERE ctid = '...' if there isn't any other difference between the rows). If you see no dups in a query like this, then the index is at fault --- try reindexing.regards, tom lane
Adam Dear <adear@usnx.net> writes:
I ran the select as instructed, and this is the output:
madisoncounty=# select ctid, usename, usesysid from pg_shadow;
ctid | usename | usesysid
-------+---------------+----------
(0,1) | postgres | 1
(0,2) | postgres | 1
(0,5) | madisoncounty | 100
(3 rows)
I then did:
delete from pg_shadow where ctid='(0,2)';
DELETE 1
I had already reindexed the tables. Now when I run the command, I get this:
pg_dump: [archiver (db)] connection to database "madisoncounty" failed:
FATAL: user "postgres" does not exist
How annoying :-(. And I suppose madisoncounty isn't a superuser,
so you're now stuck with no working superuser. What you'll need
to do is shut down the database and start up a standalone backend
(read the "postgres" reference page about how to work in this mode).
What I'd then do is delete the other postgres row (if you can still
see it) and do CREATE USER postgres WITH SYSID 1 CREATEUSER CREATEDB.
That should get you back to having a working postgres account.
It's hard to be sure how you got into this state, although one possible
theory is you got burnt by not vacuuming pg_shadow on a regular basis.
The newer version should keep you out of that error, if so.
regards, tom lane
I really appreciate you help so far. here is what I am seeing . I did:
select * from pg_shadow;
1: usename (typeid = 19, len = 64, typmod = -1, byval = f)
2: usesysid (typeid = 23, len = 4, typmod = -1, byval = t)
3: usecreatedb (typeid = 16, len = 1, typmod = -1, byval = t)
4: usesuper (typeid = 16, len = 1, typmod = -1, byval = t)
5: usecatupd (typeid = 16, len = 1, typmod = -1, byval = t)
6: passwd (typeid = 25, len = -1, typmod = -1, byval = f)
7: valuntil (typeid = 702, len = 4, typmod = -1, byval = t)
8: useconfig (typeid = 1009, len = -1, typmod = -1, byval = f)
----
1: usename = "postgres" (typeid = 19, len = 64, typmod
= -1, byval = f)
2: usesysid = "1" (typeid = 23, len = 4, typmod = -1,
byval = t)
3: usecreatedb = "t" (typeid = 16, len = 1, typmod = -1,
byval = t)
4: usesuper = "t" (typeid = 16, len = 1, typmod = -1,
byval = t)
5: usecatupd = "t" (typeid = 16, len = 1, typmod = -1,
byval = t)
----
I'm not seeing the madisoncounty user in there. Also, I tried starting
the db using /etc/init.d/postgres start, and it fails. Is that the
proper way to get the service going, or should I be doing something else?
Tom Lane wrote:
Show quoted text
Adam Dear <adear@usnx.net> writes:
I ran the select as instructed, and this is the output:
madisoncounty=# select ctid, usename, usesysid from pg_shadow;
ctid | usename | usesysid
-------+---------------+----------
(0,1) | postgres | 1
(0,2) | postgres | 1
(0,5) | madisoncounty | 100
(3 rows)I then did:
delete from pg_shadow where ctid='(0,2)';
DELETE 1I had already reindexed the tables. Now when I run the command, I get this:
pg_dump: [archiver (db)] connection to database "madisoncounty" failed:
FATAL: user "postgres" does not existHow annoying :-(. And I suppose madisoncounty isn't a superuser,
so you're now stuck with no working superuser. What you'll need
to do is shut down the database and start up a standalone backend
(read the "postgres" reference page about how to work in this mode).
What I'd then do is delete the other postgres row (if you can still
see it) and do CREATE USER postgres WITH SYSID 1 CREATEUSER CREATEDB.
That should get you back to having a working postgres account.It's hard to be sure how you got into this state, although one possible
theory is you got burnt by not vacuuming pg_shadow on a regular basis.
The newer version should keep you out of that error, if so.regards, tom lane
Adam Dear <adear@usnx.net> writes:
I'm not seeing the madisoncounty user in there.
Odder and odder. It might be worth trying "vacuum freeze pg_shadow".
Also, I tried starting
the db using /etc/init.d/postgres start, and it fails.
Fails how? In particular, what shows up in the postmaster log?
Is that the
proper way to get the service going, or should I be doing something else?
The usual way to manually start/stop daemons on Linux is
sudo /sbin/service postgresql start
sudo /sbin/service postgresql stop
(omitting sudo if you're already root). I'm not sure offhand if there's
any real difference between that and just calling the init.d script
directly, but I believe that's how you're Supposed To Do It.
regards, tom lane
I managed to get the server started again. It was failing because the
permissions on the data folder were not right.
Now, though, I'm back to where I started. There are two postgres users
in the pg_shadow table. One of them has a password, the other doesn't,
but they both have the same sysid. I tried to delete the one without
the password, I couldn't get it deleted.
I tried:
Delete from pg_shadow where ctid = '(0,1)';
DELETE 0
and
Delete from pg_shadow where passwd = '';
DELETE 0
Tom Lane wrote:
Show quoted text
Adam Dear <adear@usnx.net> writes:
I'm not seeing the madisoncounty user in there.
Odder and odder. It might be worth trying "vacuum freeze pg_shadow".
Also, I tried starting
the db using /etc/init.d/postgres start, and it fails.Fails how? In particular, what shows up in the postmaster log?
Is that the
proper way to get the service going, or should I be doing something else?The usual way to manually start/stop daemons on Linux is
sudo /sbin/service postgresql start
sudo /sbin/service postgresql stop(omitting sudo if you're already root). I'm not sure offhand if there's
any real difference between that and just calling the init.d script
directly, but I believe that's how you're Supposed To Do It.regards, tom lane
Adam Dear <adear@usnx.net> writes:
I tried:
Delete from pg_shadow where ctid = '(0,1)';
DELETE 0
This is looking more and more like a transaction ID wraparound problem.
Did you try the vacuum freeze suggestion?
regards, tom lane
I tried:
Vacuum Freeze pg_shadow;
Then
Reindex database madisoncounty;
Then
Delete from pg_shadow where ctid='(0,1)';
The record is still there.
Tom Lane wrote:
Show quoted text
Adam Dear <adear@usnx.net> writes:
I tried:
Delete from pg_shadow where ctid = '(0,1)';
DELETE 0This is looking more and more like a transaction ID wraparound problem.
Did you try the vacuum freeze suggestion?regards, tom lane
Adam Dear <adear@usnx.net> writes:
The record is still there.
Hmph. Could we see all the system columns from that table?
select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow;
regards, tom lane
madisoncounty=# select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow;
ctid | xmin | xmax | cmin | cmax | usename
-------+------+------+------+------+---------------
(0,1) | 1 | 596 | 596 | 1 | postgres
(0,2) | 2 | 1 | 1 | 0 | postgres
(0,5) | 2 | 0 | 0 | 0 | madisoncounty
(3 rows)
Tom Lane wrote:
Show quoted text
Adam Dear <adear@usnx.net> writes:
The record is still there.
Hmph. Could we see all the system columns from that table?
select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow;regards, tom lane
Adam Dear <adear@usnx.net> writes:
madisoncounty=# select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow;
ctid | xmin | xmax | cmin | cmax | usename
-------+------+------+------+------+---------------
(0,1) | 1 | 596 | 596 | 1 | postgres
(0,2) | 2 | 1 | 1 | 0 | postgres
(0,5) | 2 | 0 | 0 | 0 | madisoncounty
(3 rows)
Hm, I thought you deleted the (0,2) tuple ... did you restore a physical
backup or something?
Anyway, as far as I can see the way that you got into this state must
have been
1. The (0,1) tuple must have been the one originally inserted by initdb;
there's no other way it could have xmin=1.
2. Shortly after initdb (at transaction 596 to be exact) this tuple was
updated --- probably by a password-assignment operation --- creating the
tuple at (0,2), which must originally have had xmin = 596, cmin = 1 (the
xmax overlays cmin in 7.4, so we can assume that column value is bogus).
3. Much time passes, and pg_shadow never gets vacuumed so the dead
tuple at (0,1) is never cleaned up. Eventually the XID counter passes 2
billion + 596, and suddenly transaction 596 appears to be in the future,
so the tuple at (0,1) starts to be seen by SELECTs again.
4. At this point you ran VACUUM FREEZE, which replaced the xmins of the
second and third tuples with 2 (FrozenTransactionId) ... but 7.4 does
not think it could ever need to freeze xmax, and at this point VACUUM
wouldn't touch the (0,1) tuple anyway because it considers the tuple as
RECENTLY_DEAD.
So VACUUM won't help you, at least not for another 2 billion
transactions. And the DELETE doesn't work either because it correctly
perceives (0,1) as an updated tuple that's been superseded by (0,2),
which doesn't meet the WHERE clause so DELETE doesn't touch it.
You could delete (0,2) but that leaves you with no working postgres user
(since the system's SnapshotNow rules consider (0,1) as dead), and if
you create another one you're back to having 2 entries in pg_shadow.
Nasty :-(
I can't think of any way out of this using plain 7.4 SQL operations.
You could maybe hack a special case into VACUUM to make it nuke the
dead tuple, but what's probably going to be easier is to manipulate the
data on disk. Are you comfortable enough with editing binary data
to find the "596" and replace it with "2"? It'd be somewhere near
the end of the first (and probably only) block of pg_shadow, and a
few bytes before one of the occurrences of the string "postgres".
BTW, pg_shadow is $PGDATA/global/1260.
(If you try this, do the editing while the postmaster is stopped,
else you might have problems with it buffering the old data.)
regards, tom lane
I'm comfortable enough with it that I can edit it if I can find exactly
what to edit. Whats the best way to edit the file? I've opened it using:
vim -b 1260
Here is what is at the end of the block in the file. If you could point
me in the right direction, I'd appreciate it.
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^B^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^E^@^H^@^C)^X?madisoncounty^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@d^@^@^@^@^@^@^@'^@^@^@md58fd917bc348d399f31056d33330ba74f^@<98>^B^@^@<81>^E^@^@^@^@^@^@^@^@^@^@
^E^@^H^@^C%^X?madisoncounty^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@d^@^@^@^@^@^@^@'^@^@
^@md58fd917bc348d399f31056d33330ba74f^@<94>^B^@^@<98>^B^@^@^@^@^@^@^@^@^@^@^D^@^H^@^A^E^X^_madisoncounty^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@d^@^@^@^@^@^@^@^B^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^B^@^H^@^C)^X?postgres^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^A^A^A^@'^@^@^@md506464ceceb2b5b44a27417bf6ac59c8a^@^A^@^@
^@T^B^@^@^A^@^@^@^@^@^@^@^B^@^H^@^A^E^X^_postgres^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^A^@^@^@^A^A^A^@
Tom Lane wrote:
Show quoted text
Adam Dear <adear@usnx.net> writes:
madisoncounty=# select ctid,xmin,xmax,cmin,cmax,usename from pg_shadow;
ctid | xmin | xmax | cmin | cmax | usename
-------+------+------+------+------+---------------
(0,1) | 1 | 596 | 596 | 1 | postgres
(0,2) | 2 | 1 | 1 | 0 | postgres
(0,5) | 2 | 0 | 0 | 0 | madisoncounty
(3 rows)Hm, I thought you deleted the (0,2) tuple ... did you restore a physical
backup or something?Anyway, as far as I can see the way that you got into this state must
have been1. The (0,1) tuple must have been the one originally inserted by initdb;
there's no other way it could have xmin=1.2. Shortly after initdb (at transaction 596 to be exact) this tuple was
updated --- probably by a password-assignment operation --- creating the
tuple at (0,2), which must originally have had xmin = 596, cmin = 1 (the
xmax overlays cmin in 7.4, so we can assume that column value is bogus).3. Much time passes, and pg_shadow never gets vacuumed so the dead
tuple at (0,1) is never cleaned up. Eventually the XID counter passes 2
billion + 596, and suddenly transaction 596 appears to be in the future,
so the tuple at (0,1) starts to be seen by SELECTs again.4. At this point you ran VACUUM FREEZE, which replaced the xmins of the
second and third tuples with 2 (FrozenTransactionId) ... but 7.4 does
not think it could ever need to freeze xmax, and at this point VACUUM
wouldn't touch the (0,1) tuple anyway because it considers the tuple as
RECENTLY_DEAD.So VACUUM won't help you, at least not for another 2 billion
transactions. And the DELETE doesn't work either because it correctly
perceives (0,1) as an updated tuple that's been superseded by (0,2),
which doesn't meet the WHERE clause so DELETE doesn't touch it.
You could delete (0,2) but that leaves you with no working postgres user
(since the system's SnapshotNow rules consider (0,1) as dead), and if
you create another one you're back to having 2 entries in pg_shadow.
Nasty :-(I can't think of any way out of this using plain 7.4 SQL operations.
You could maybe hack a special case into VACUUM to make it nuke the
dead tuple, but what's probably going to be easier is to manipulate the
data on disk. Are you comfortable enough with editing binary data
to find the "596" and replace it with "2"? It'd be somewhere near
the end of the first (and probably only) block of pg_shadow, and a
few bytes before one of the occurrences of the string "postgres".
BTW, pg_shadow is $PGDATA/global/1260.(If you try this, do the editing while the postmaster is stopped,
else you might have problems with it buffering the old data.)regards, tom lane
Adam Dear wrote:
I'm comfortable enough with it that I can edit it if I can find exactly
what to edit. Whats the best way to edit the file? I've opened it
using:vim -b 1260
Here is what is at the end of the block in the file. If you could point
me in the right direction, I'd appreciate it.
Huh, you really need a proper hex editor. Try "hexedit" for example.
And you need to convert the values to hexadecimal.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Adam Dear <adear@usnx.net> writes:
I'm comfortable enough with it that I can edit it if I can find exactly
what to edit. Whats the best way to edit the file? I've opened it using:
vim -b 1260
Here is what is at the end of the block in the file. If you could point
me in the right direction, I'd appreciate it.
I think you need to change T^B to ^B^@ ... though I agree with Alvaro
that a hex editor would be a more comfortable tool.
regards, tom lane
I downloaded a hex editor, but honestly, I can't make heads or tails of
what I'm looking at. I'm not sure what needs changing.
Also, FYI, you mentioned that you thought I had deleted the record with
ctid=(0,2). I did delete that, but thats what caused the database to
not work. I restored a backup copy of the data directory I had made
yesterday morning before I started messing with it.
If I restored the data directory again wouldn't that undo the Vacuum
Freeze command that I did that has locked the database into the state
that it is in now?
Tom Lane wrote:
Show quoted text
Adam Dear <adear@usnx.net> writes:
I'm comfortable enough with it that I can edit it if I can find exactly
what to edit. Whats the best way to edit the file? I've opened it using:vim -b 1260
Here is what is at the end of the block in the file. If you could point
me in the right direction, I'd appreciate it.I think you need to change T^B to ^B^@ ... though I agree with Alvaro
that a hex editor would be a more comfortable tool.regards, tom lane
Adam Dear <adear@usnx.net> writes:
If I restored the data directory again wouldn't that undo the Vacuum
Freeze command that I did that has locked the database into the state
that it is in now?
It'd undo the freeze, but that doesn't get you any closer to a solution.
What you need is to get rid of the wrapped-around xmax in the (0,1)
tuple.
regards, tom lane
Let me ask this question. Is there any other option for moving data
from one server to another that doesn't involve pg_dump? If I can get
the data out of this server, and onto my new one I don't care if the old
server gets fixed. I just need the data. The new server already has
the table structure, I just need the most current data to put into the
server.
Tom Lane wrote:
Show quoted text
Adam Dear <adear@usnx.net> writes:
If I restored the data directory again wouldn't that undo the Vacuum
Freeze command that I did that has locked the database into the state
that it is in now?It'd undo the freeze, but that doesn't get you any closer to a solution.
What you need is to get rid of the wrapped-around xmax in the (0,1)
tuple.regards, tom lane