Help me recover data
Hi folks,
I ran into big trouble - it seems that my DB is lost.
"select * from pg_database" gives me 0 rows, but I can still connect to
databases with \c and even select from tables there, although they're also
not visible with \dt.
After asking the guys in the #postgresql@irc.freenode.net channel they told
me that the reason is the "Transaction ID wraparound", because I have never
run VACUUM on the whole database.
So they proposed to ask here for help. I have stopped the server, but what
could I do in order to save the data if it's possible at all?
You could also take a look at my pg_controldata output:
http://rafb.net/paste/results/ghcIb880.html
Regards,
Kouber Saparev
"Kouber Saparev" <postgresql@saparev.com> writes:
After asking the guys in the #postgresql@irc.freenode.net channel they told
me that the reason is the "Transaction ID wraparound", because I have never
run VACUUM on the whole database.
So they proposed to ask here for help. I have stopped the server, but what
could I do in order to save the data if it's possible at all?
I think you're pretty well screwed as far as getting it *all* back goes,
but you could use pg_resetxlog to back up the NextXID counter enough to
make your tables and databases reappear (and thereby lose the effects of
however many recent transactions you back up over).
Once you've found a NextXID setting you like, I'd suggest an immediate
pg_dumpall/initdb/reload to make sure you have a consistent set of data.
Don't VACUUM, or indeed modify the DB at all, until you have gotten a
satisfactory dump.
Then put in a cron job to do periodic vacuuming ;-)
regards, tom lane
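[Editor's note: for readers finding this thread later, Tom's recipe can be sketched as a shell script. Every value below is an assumption to adapt — the data directory, the candidate NextXID (found by trial and error), and the dump path — and the steps are only printed, never executed, because pg_resetxlog is destructive.]

```shell
#!/bin/sh
# Dry-run sketch of the recovery recipe above. PGDATA, NEXTXID, and the
# dump path are illustrative assumptions; the steps are echoed rather
# than run so nothing destructive happens by accident.
PGDATA=/var/lib/pgsql/data
NEXTXID=1000000               # candidate NextXID to back the counter up to

recovery_steps() {
  echo "pg_resetxlog -x $NEXTXID $PGDATA"    # server must be stopped first
  echo "pg_dumpall > /tmp/rescue.sql"        # dump immediately; no VACUUM yet
  echo "initdb -D $PGDATA.new"               # build a fresh cluster
  echo "psql -f /tmp/rescue.sql template1"   # reload into the new cluster
  echo "0 3 * * * vacuumdb -a -z"            # then: Tom's cron job, nightly
}
recovery_steps
```

Run once per candidate NextXID until the tables and databases reappear, then dump before touching anything else.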
Once you've found a NextXID setting you like, I'd suggest an immediate
pg_dumpall/initdb/reload to make sure you have a consistent set of data.
Don't VACUUM, or indeed modify the DB at all, until you have gotten a
satisfactory dump.
Then put in a cron job to do periodic vacuuming ;-)
Thank you, I just discovered in the mailing lists that I'm neither the first
nor the last guy who has forgotten to VACUUM the database. ;-)
Regards,
Kouber Saparev
I think you're pretty well screwed as far as getting it *all* back goes,
but you could use pg_resetxlog to back up the NextXID counter enough to
make your tables and databases reappear (and thereby lose the effects of
however many recent transactions you back up over).
Once you've found a NextXID setting you like, I'd suggest an immediate
pg_dumpall/initdb/reload to make sure you have a consistent set of data.
Don't VACUUM, or indeed modify the DB at all, until you have gotten a
satisfactory dump.
Then put in a cron job to do periodic vacuuming ;-)
This might seem like a stupid question, but since this is a massive data
loss potential in PostgreSQL, what's so hard about having the
checkpointer or something check the transaction counter when it runs and
either issue a db-wide vacuum if it's about to wrap, or simply
disallow any new transactions?
I think people'd rather their db just stopped accepting new transactions
rather than just losing data...
Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
This might seem like a stupid question, but since this is a massive
data loss potential in PostgreSQL, what's so hard about having the
checkpointer or something check the transaction counter when it runs
and either issue a db-wide vacuum if it's about to wrap, or simply
disallow any new transactions?
I think autovac-in-backend is the preferred solution to this, and it's
definitely on the TODO list...
-Doug
It must be possible to create a tool based on the PostgreSQL sources that
can read all the tuples in a database and dump them to a file stream. All
the data remains in the file until overwritten with data after a vacuum.
It *should* be doable.
If the data in the table is worth anything, then it would be worth
extracting.
It would, of course, be a tool of last resort.
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
I think you're pretty well screwed as far as getting it *all* back goes,
but you could use pg_resetxlog to back up the NextXID counter enough to
make your tables and databases reappear (and thereby lose the effects of
however many recent transactions you back up over).
Once you've found a NextXID setting you like, I'd suggest an immediate
pg_dumpall/initdb/reload to make sure you have a consistent set of data.
Don't VACUUM, or indeed modify the DB at all, until you have gotten a
satisfactory dump.
Then put in a cron job to do periodic vacuuming ;-)
This might seem like a stupid question, but since this is a massive data
loss potential in PostgreSQL, what's so hard about having the
checkpointer or something check the transaction counter when it runs and
either issue a db-wide vacuum if it's about to wrap, or simply
disallow any new transactions?
I think people'd rather their db just stopped accepting new transactions
rather than just losing data...
I would certainly prefer the system to issue an error and stop working
than complete data loss.
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
This might seem like a stupid question, but since this is a massive data
loss potential in PostgreSQL, what's so hard about having the
checkpointer or something check the transaction counter when it runs and
either issue a db-wide vacuum if it's about to wrap, or simply
disallow any new transactions?
The checkpointer is entirely incapable of either detecting the problem
(it doesn't have enough infrastructure to examine pg_database in a
reasonable way) or preventing backends from doing anything if it did
know there was a problem.
I think people'd rather their db just stopped accepting new transactions
rather than just losing data...
Not being able to issue new transactions *is* data loss --- how are you
going to get the system out of that state?
autovacuum is the correct long-term solution to this, not some kind of
automatic hara-kiri.
regards, tom lane
Not being able to issue new transactions *is* data loss --- how are you
going to get the system out of that state?
Yes, but I would also prefer the server to say something like "The database
is full, please vacuum." - the same as when the hard disk is full and you try
to record something on it - it's not exactly data loss, just an inability
to continue the job.
The thing is that a warning is issued only when you start the vacuum itself:
play=# VACUUM;
WARNING: some databases have not been vacuumed in 1613770184 transactions
HINT: Better vacuum them within 533713463 transactions, or you may have a
wraparound failure.
VACUUM
So, it's something like a chicken-and-egg problem: you have to vacuum
in order to receive a message that you had to do it earlier, but sometimes
it's just too late. As in my case, I have just discovered that almost
all of my data is missing - not even a notice or a warning message to let me
know that the end of the world is approaching. :)
Regards,
Kouber Saparev
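[Editor's note: a standalone check would avoid the chicken-and-egg above. The sketch below is an assumption-laden illustration: the 1.5-billion threshold is an arbitrary safety margin, and in practice the age figure would come from something like `psql -At -c "SELECT max(age(datfrozenxid)) FROM pg_database" template1` rather than a hard-coded argument.]

```shell
#!/bin/sh
# Hypothetical cron-able wraparound check. The threshold is an arbitrary
# safety margin; the age argument would normally be fetched live, e.g.:
#   psql -At -c "SELECT max(age(datfrozenxid)) FROM pg_database" template1
check_age() {
  limit=1500000000    # warn well before the ~2-billion-XID cliff
  if [ "$1" -gt "$limit" ]; then
    echo "WARNING: oldest database is $1 transactions old; VACUUM now"
  else
    echo "OK: $1 transactions used since the last database-wide VACUUM"
  fi
}

check_age 1613770184  # the age from the VACUUM warning quoted above
```

Unlike the server's built-in warning, this fires without having to run VACUUM first.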
The checkpointer is entirely incapable of either detecting the problem
(it doesn't have enough infrastructure to examine pg_database in a
reasonable way) or preventing backends from doing anything if it did
know there was a problem.
Well, I guess I meant 'some regularly running process'...
I think people'd rather their db just stopped accepting new transactions
rather than just losing data...
Not being able to issue new transactions *is* data loss --- how are you
going to get the system out of that state?
Not allowing any transactions except a vacuum...
autovacuum is the correct long-term solution to this, not some kind of
automatic hara-kiri.
Yeah, seems like it should really happen soon...
Chris
Maybe I'm missing something, but shouldn't the prospect of data loss (even
in the presence of admin ignorance) be something that should be
unacceptable? Certainly within the realm "normal PostgreSQL" operation.
pgsql@mohawksoft.com writes:
Maybe I'm missing something, but shouldn't the prospect of data loss (even
in the presence of admin ignorance) be something that should be
unacceptable? Certainly within the realm "normal PostgreSQL" operation.
[ shrug... ] The DBA will always be able to find a way to shoot himself
in the foot. We've seen several instances of people blowing away
pg_xlog and pg_clog, for example, because they "don't need log files".
Or how about failing to keep adequate backups? That's a sure way for an
ignorant admin to lose data too.
Once autovacuum gets to the point where it's used by default, this
particular failure mode should be a thing of the past, but in the
meantime I'm not going to panic about it.
regards, tom lane
pgsql@mohawksoft.com writes:
Maybe I'm missing something, but shouldn't the prospect of data loss (even
in the presence of admin ignorance) be something that should be
unacceptable? Certainly within the realm "normal PostgreSQL" operation.
[ shrug... ] The DBA will always be able to find a way to shoot himself
in the foot. We've seen several instances of people blowing away
pg_xlog and pg_clog, for example, because they "don't need log files".
Or how about failing to keep adequate backups? That's a sure way for an
ignorant admin to lose data too.
There is a difference between actively doing something stupid and failing
to realize a maintenance task is required.
PostgreSQL should stop working. When the admin tries to understand why,
they can read a troubleshooting FAQ and say "oops, I gotta run this vacuum
thingy." That is a whole lot better than falling off a cliff you didn't
even know was there.
Once autovacuum gets to the point where it's used by default, this
particular failure mode should be a thing of the past, but in the
meantime I'm not going to panic about it.
I don't know how to say this without sounding like a jerk, (I guess that's
my role sometimes) but would you go back and re-read this sentence?
To paraphrase: "I know this causes catastrophic data loss, and we have
plans to fix it in the future, but for now, I'm not going to panic about it."
What would you do if the FreeBSD group or Linux kernel group said this
about a file system? If you failed to run fsck after 100 mounts, you lose
your data?
I thought PostgreSQL was about "protecting your data." How many times have
we smugly said, "yea, you can use MySQL if you don't care about your
data." Any data loss caused by postgresql should be seen as unacceptable.
It's funny, while I've known about this for a while, and it has always
seemed a sort of distant edge condition that is easily avoided. However,
with todays faster machines and disks, it is easier to reach this
limitation than ever before. All PostgreSQL needs is one or two VERY UPSET
mainstream users who lose data to completely reverse the momentum that it
is gaining.
No amount of engineering discussion about it not being the fault of
PostgreSQL will help, and rightfully so, IMHO.
Sorry.
On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:
Once autovacuum gets to the point where it's used by default, this
particular failure mode should be a thing of the past, but in the
meantime I'm not going to panic about it.
I don't know how to say this without sounding like a jerk, (I guess that's
my role sometimes) but would you go back and re-read this sentence?
To paraphrase: "I know this causes catastrophic data loss, and we have
plans to fix it in the future, but for now, I'm not going to panic about it."
Do you have a useful suggestion about how to fix it? "Stop working" is
handwaving and merely basically saying, "one of you people should do
something about this" is not a solution to the problem, it's not even an
approach towards a solution to the problem.
in the foot. We've seen several instances of people blowing away
pg_xlog and pg_clog, for example, because they "don't need log files".
Or how about failing to keep adequate backups? That's a sure way for an
ignorant admin to lose data too.
There is a difference between actively doing something stupid and failing
to realize a maintenance task is required.
PostgreSQL should stop working. When the admin tries to understand why,
they can read a troubleshooting FAQ and say "oops, I gotta run this vacuum
thingy." That is a whole lot better than falling off a cliff you didn't
even know was there.
There is another way to look at this that lends itself to mohawksoft's
argument.
More often than not, DBAs and sysadmins are neither one. They are people
who get shoved into the job because they happen to mention around
the water cooler that they "once" installed linux/freebsd -- whatever.
Maybe it is an executive that has some of his brains left after
sitting behind a desk all day for the last 10 years. One day he/she
gets a thought in his head to create a new project named "foo".
He does not want to waste his internal resources so said executive
decides he will do it himself as a hobby. For some reason, the project
actually succeeds (I have seen this many times) and the company starts
using it.
Well guess what... it uses PostgreSQL. The guy isn't a DBA; heck, he isn't
even really a programmer. He had no idea about this "vacuum" thing. He
had never heard of other databases having to do it.
So they run for a year, and then all of a sudden **BOOM** the world ends.
Do you think they are going to care that we "documented" the issue? Uhmmm
no they won't. Chances are they will drop-kick PostgreSQL and bad-mouth it
to all their other executive friends.
In short, this whole argument has the mark of irresponsibility on both
parties, but it is the PostgreSQL project's responsibility to make a
reasonable effort to produce a piece of software that doesn't break.
We are not talking about a user who ran a query: delete from foo;
At this point we have a known critical bug. Usually the PostgreSQL community
is all over critical bugs. Why is this any different?
It sounds to me that people are just annoyed that users don't RTFM. Get
over it. Most won't. If users RTFM more often, it would put most support
companies out of business.
Sincerely,
Joshua D. Drake
--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL
At this point we have a known critical bug. Usually the PostgreSQL community
is all over critical bugs. Why is this any different?
It sounds to me that people are just annoyed that users don't RTFM. Get
over it. Most won't. If users RTFM more often, it would put most support
companies out of business.
I wonder if I should point out that we just had 3 people suffering XID
wraparound failure in 2 days in the IRC channel...
Chris
Do you have a useful suggestion about how to fix it? "Stop working" is
handwaving and merely basically saying, "one of you people should do
something about this" is not a solution to the problem, it's not even an
approach towards a solution to the problem.
I believe that having PostgreSQL stop accepting
queries, and log to a file or STDERR why it stopped working
and how to resolve it, is appropriate.
Also it is probably appropriate to warn ahead of time...
WARNING: Only 50,000 transactions left before lock out
or something like that.
J
Christopher Kings-Lynne wrote:
At this point we have a known critical bug. Usually the PostgreSQL
community is all over critical bugs. Why is this any different?
It sounds to me that people are just annoyed that users don't RTFM.
Get over it. Most won't. If users RTFM more often, it would put most
support companies out of business.
I wonder if I should point out that we just had 3 people suffering XID
wraparound failure in 2 days in the IRC channel...
I have had half a dozen new customers in the last six months who have
had the same problem. Nothing like the phone call:
Uhmmm I am a new customer, help I can't see my databases.
Sincerely,
Joshua D. Drake
On Wed, 16 Feb 2005, Joshua D. Drake wrote:
Do you have a useful suggestion about how to fix it? "Stop working" is
handwaving and merely basically saying, "one of you people should do
something about this" is not a solution to the problem, it's not even an
approach towards a solution to the problem.
I believe that the ability for PostgreSQL to stop accepting
queries and to log to the file or STDERR why it stopped working
and how to resolve it is appropriate.
Right, but since the resolution currently involves executing a query,
simply stopping dead won't allow you to resolve it. Also, if we
stop at the exact wraparound point, can we run into problems actually
trying to do the vacuum if that's still the resolution technique? If so,
how far in advance of wraparound must we stop to guarantee it will
succeed? It's not rocket science, but figuring such things out is part of
actually making a workable solution.
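[Editor's note: the margin question follows from the circular XID comparison itself. An XID counts as "in the past" only while it is less than 2^31 transactions behind the current counter, so roughly half the 4-billion-XID space is usable before old tuples flip into the apparent "future" and vanish. A sketch of that comparison, modeled on the server's TransactionIdPrecedes logic and reimplemented here purely as an illustration:]

```shell
#!/bin/sh
# Modulo-2^32 XID comparison, modeled on TransactionIdPrecedes: "a"
# precedes "b" when the 32-bit difference (a - b), read as a signed
# number, is negative -- i.e. when a is less than 2^31 XIDs behind b.
precedes() {
  diff=$(( ($1 - $2) & 4294967295 ))    # (a - b) mod 2^32
  if [ "$diff" -ge 2147483648 ]; then   # sign bit set => a is older
    echo yes
  else
    echo no
  fi
}

precedes 100 200          # yes: XID 100 is in XID 200's past
precedes 100 2147483900   # no: more than 2^31 behind, 100 now looks "future"
```

The second call is the wraparound failure in miniature: once the counter advances past the 2^31 horizon, tuples stamped with the old XID stop being visible.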
On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:
Once autovacuum gets to the point where it's used by default, this
particular failure mode should be a thing of the past, but in the
meantime I'm not going to panic about it.
I don't know how to say this without sounding like a jerk, (I guess that's
my role sometimes) but would you go back and re-read this sentence?
To paraphrase: "I know this causes catastrophic data loss, and we have
plans to fix it in the future, but for now, I'm not going to panic about it."
Do you have a useful suggestion about how to fix it? "Stop working" is
handwaving and merely basically saying, "one of you people should do
something about this" is not a solution to the problem, it's not even an
approach towards a solution to the problem.
Actually, it is not a solution to the problem of losing data. It is a
drop-dead, last-ditch failsafe that EVERY PRODUCT should have before
losing data.