Help me recovering data

Started by Kouber Saparev · about 21 years ago · 53 messages · hackers
#1 Kouber Saparev
postgresql@saparev.com

Hi folks,

I ran into big trouble - it seems that my DB is lost.

"select * from pg_database" gives me 0 rows, but I still can connect to
databases with \c and even select from tables there, although they're also
not visible with \dt.

After asking the guys in the #postgresql@irc.freenode.net channel they told
me that the reason is the "Transaction ID wraparound", because I have never
run VACUUM on the whole database.

So they proposed to ask here for help. I have stopped the server, but what
could I do in order to save the data if it's possible at all?

You could also take a look at my pg_controldata output:
http://rafb.net/paste/results/ghcIb880.html

Regards,
Kouber Saparev

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kouber Saparev (#1)
Re: Help me recovering data

"Kouber Saparev" <postgresql@saparev.com> writes:

After asking the guys in the #postgresql@irc.freenode.net channel they told
me that the reason is the "Transaction ID wraparound", because I have never
run VACUUM on the whole database.

So they proposed to ask here for help. I have stopped the server, but what
could I do in order to save the data if it's possible at all?

I think you're pretty well screwed as far as getting it *all* back goes,
but you could use pg_resetxlog to back up the NextXID counter enough to
make your tables and databases reappear (and thereby lose the effects of
however many recent transactions you back up over).

Once you've found a NextXID setting you like, I'd suggest an immediate
pg_dumpall/initdb/reload to make sure you have a consistent set of data.
Don't VACUUM, or indeed modify the DB at all, until you have gotten a
satisfactory dump.

Then put in a cron job to do periodic vacuuming ;-)

regards, tom lane
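Tom's procedure can be sketched as a command sequence. This is an illustrative reconstruction, not from the thread: the data directory paths, the `-x` value (a NextXID guess you refine by trial and error), and the cron line are all assumptions.

```shell
# Illustrative only -- paths and the -x value are assumptions.
# Refine the NextXID guess by trial: wind back, start, inspect, repeat.
pg_ctl -D /usr/local/pgsql/data stop
pg_resetxlog -x 2000000000 /usr/local/pgsql/data   # hypothetical NextXID
pg_ctl -D /usr/local/pgsql/data start

# Once tables and databases look right: dump, re-initdb, reload.
pg_dumpall > all.sql
pg_ctl -D /usr/local/pgsql/data stop
initdb -D /usr/local/pgsql/data.new
pg_ctl -D /usr/local/pgsql/data.new start
psql -f all.sql template1

# Then the cron job, e.g. nightly:
#   0 3 * * *   vacuumdb --all --analyze
```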

#3 Kouber Saparev
postgresql@saparev.com
In reply to: Kouber Saparev (#1)
Re: Help me recovering data

Once you've found a NextXID setting you like, I'd suggest an immediate
pg_dumpall/initdb/reload to make sure you have a consistent set of data.
Don't VACUUM, or indeed modify the DB at all, until you have gotten a
satisfactory dump.

Then put in a cron job to do periodic vacuuming ;-)

Thank you. I just discovered in the mailing lists that I'm neither the first
nor the last guy who has forgotten to VACUUM the database. ;-)

Regards,
Kouber Saparev

#4 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#2)
Re: Help me recovering data

I think you're pretty well screwed as far as getting it *all* back goes,
but you could use pg_resetxlog to back up the NextXID counter enough to
make your tables and databases reappear (and thereby lose the effects of
however many recent transactions you back up over).

Once you've found a NextXID setting you like, I'd suggest an immediate
pg_dumpall/initdb/reload to make sure you have a consistent set of data.
Don't VACUUM, or indeed modify the DB at all, until you have gotten a
satisfactory dump.

Then put in a cron job to do periodic vacuuming ;-)

This might seem like a stupid question, but since this is a massive data
loss potential in PostgreSQL, what's so hard about having the
checkpointer or something check the transaction counter when it runs and
either issue a db-wide vacuum if it's about to wrap, or simply
disallow any new transactions?

I think people'd rather their db just stopped accepting new transactions
rather than just losing data...

Chris

#5 Doug McNaught
doug@mcnaught.org
In reply to: Christopher Kings-Lynne (#4)
Re: Help me recovering data

Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:

This might seem like a stupid question, but since this is a massive
data loss potential in PostgreSQL, what's so hard about having the
checkpointer or something check the transaction counter when it runs
and either issue a db-wide vacuum if it's about to wrap, or simply
disallow any new transactions?

I think autovac-in-backend is the preferred solution to this, and it's
definitely on the TODO list...

-Doug

#6 Mark Woodward
pgsql@mohawksoft.com
In reply to: Tom Lane (#2)
Re: Help me recovering data

It must be possible to create a tool based on the PostgreSQL sources that
can read all the tuples in a database and dump them to a file stream. All
the data remains in the file until overwritten with data after a vacuum.
It *should* be doable.

If the data in the table is worth anything, then it would be worth
extracting.

It would, of course, be a tool of last resort.


"Kouber Saparev" <postgresql@saparev.com> writes:

After asking the guys in the #postgresql@irc.freenode.net channel they told
me that the reason is the "Transaction ID wraparound", because I have never
run VACUUM on the whole database.

So they proposed to ask here for help. I have stopped the server, but what
could I do in order to save the data if it's possible at all?

I think you're pretty well screwed as far as getting it *all* back goes,
but you could use pg_resetxlog to back up the NextXID counter enough to
make your tables and databases reappear (and thereby lose the effects of
however many recent transactions you back up over).

Once you've found a NextXID setting you like, I'd suggest an immediate
pg_dumpall/initdb/reload to make sure you have a consistent set of data.
Don't VACUUM, or indeed modify the DB at all, until you have gotten a
satisfactory dump.

Then put in a cron job to do periodic vacuuming ;-)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

#7 Mark Woodward
pgsql@mohawksoft.com
In reply to: Christopher Kings-Lynne (#4)
Re: Help me recovering data

I think you're pretty well screwed as far as getting it *all* back goes,
but you could use pg_resetxlog to back up the NextXID counter enough to
make your tables and databases reappear (and thereby lose the effects of
however many recent transactions you back up over).

Once you've found a NextXID setting you like, I'd suggest an immediate
pg_dumpall/initdb/reload to make sure you have a consistent set of data.
Don't VACUUM, or indeed modify the DB at all, until you have gotten a
satisfactory dump.

Then put in a cron job to do periodic vacuuming ;-)

This might seem like a stupid question, but since this is a massive data
loss potential in PostgreSQL, what's so hard about having the
checkpointer or something check the transaction counter when it runs and
either issue a db-wide vacuum if it's about to wrap, or simply
disallow any new transactions?

I think people'd rather their db just stopped accepting new transactions
rather than just losing data...

I would certainly prefer the system to issue an error and stop working
rather than suffer complete data loss.

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#4)
Re: Help me recovering data

Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:

This might seem like a stupid question, but since this is a massive data
loss potential in PostgreSQL, what's so hard about having the
checkpointer or something check the transaction counter when it runs and
either issue a db-wide vacuum if it's about to wrap, or simply
disallow any new transactions?

The checkpointer is entirely incapable of either detecting the problem
(it doesn't have enough infrastructure to examine pg_database in a
reasonable way) or preventing backends from doing anything if it did
know there was a problem.

I think people'd rather their db just stopped accepting new transactions
rather than just losing data...

Not being able to issue new transactions *is* data loss --- how are you
going to get the system out of that state?

autovacuum is the correct long-term solution to this, not some kind of
automatic hara-kiri.

regards, tom lane

#9 Kouber Saparev
postgresql@saparev.com
In reply to: Kouber Saparev (#1)
Re: Help me recovering data

Not being able to issue new transactions *is* data loss --- how are you
going to get the system out of that state?

Yes, but I would also prefer the server to say something like "The database
is full, please vacuum." - the same as when the hard disk is full and you
try to record something on it - it's not exactly data loss, just an
inability to continue the job.

The thing is that a warning is issued only when you start the vacuum itself:

play=# VACUUM;
WARNING: some databases have not been vacuumed in 1613770184 transactions
HINT: Better vacuum them within 533713463 transactions, or you may have a
wraparound failure.
VACUUM

So it's something of a chicken-and-egg problem: you have to vacuum in order
to receive a message telling you that you should have done it earlier, but
sometimes it's just too late. As it was in my case - I have just discovered
that almost all of my data is missing, without even a notice or a warning
message to let me know that the end of the world was approaching. :)

Regards,
Kouber Saparev
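For readers wondering why the data "disappears" rather than erroring out: XIDs are compared on a modulo-2^32 circle, so each XID sees roughly two billion XIDs behind it and two billion ahead of it. A minimal Python sketch of that comparison rule (modeled on `TransactionIdPrecedes` in the PostgreSQL sources; the helper name here is our own):

```python
# Circular XID comparison: "a precedes b" means the difference (a - b)
# is negative when interpreted as a signed 32-bit integer.

def xid_precedes(a: int, b: int) -> bool:
    """True if XID a logically precedes XID b, comparing modulo 2^32."""
    diff = (a - b) & 0xFFFFFFFF
    return diff >= 0x80000000  # i.e. (a - b) < 0 as a signed 32-bit int

# A row committed at XID 100 looks "in the past" to a transaction at 1000:
assert xid_precedes(100, 1000)

# But once NextXID advances ~2 billion transactions further, that same
# row's xmin compares as "in the future" -- so the row vanishes from view.
assert not xid_precedes(100, (100 + 2**31 + 1) % 2**32)
```

This is why a wrapped database looks empty rather than corrupt: the tuples are still on disk, they just fail visibility checks, which is also why winding NextXID back with pg_resetxlog can make them reappear.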

#10 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#8)
Re: Help me recovering data

The checkpointer is entirely incapable of either detecting the problem
(it doesn't have enough infrastructure to examine pg_database in a
reasonable way) or preventing backends from doing anything if it did
know there was a problem.

Well, I guess I meant 'some regularly running process'...

I think people'd rather their db just stopped accepting new transactions
rather than just losing data...

Not being able to issue new transactions *is* data loss --- how are you
going to get the system out of that state?

Not allowing any transactions except a vacuum...

autovacuum is the correct long-term solution to this, not some kind of
automatic hara-kiri.

Yeah, seems like it should really happen soon...

Chris

#11 Mark Woodward
pgsql@mohawksoft.com
In reply to: Christopher Kings-Lynne (#10)
Re: Help me recovering data

The checkpointer is entirely incapable of either detecting the problem
(it doesn't have enough infrastructure to examine pg_database in a
reasonable way) or preventing backends from doing anything if it did
know there was a problem.

Well, I guess I meant 'some regularly running process'...

I think people'd rather their db just stopped accepting new transactions
rather than just losing data...

Not being able to issue new transactions *is* data loss --- how are you
going to get the system out of that state?

Not allowing any transactions except a vacuum...

autovacuum is the correct long-term solution to this, not some kind of
automatic hara-kiri.

Yeah, seems like it should really happen soon...

Chris

Maybe I'm missing something, but shouldn't the prospect of data loss (even
in the presence of admin ignorance) be unacceptable? Certainly within the
realm of "normal PostgreSQL" operation.

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Woodward (#11)
Re: Help me recovering data

pgsql@mohawksoft.com writes:

Maybe I'm missing something, but shouldn't the prospect of data loss (even
in the presence of admin ignorance) be unacceptable? Certainly within the
realm of "normal PostgreSQL" operation.

[ shrug... ] The DBA will always be able to find a way to shoot himself
in the foot. We've seen several instances of people blowing away
pg_xlog and pg_clog, for example, because they "don't need log files".
Or how about failing to keep adequate backups? That's a sure way for an
ignorant admin to lose data too.

Once autovacuum gets to the point where it's used by default, this
particular failure mode should be a thing of the past, but in the
meantime I'm not going to panic about it.

regards, tom lane

#13 Mark Woodward
pgsql@mohawksoft.com
In reply to: Tom Lane (#12)
Re: Help me recovering data

pgsql@mohawksoft.com writes:

Maybe I'm missing something, but shouldn't the prospect of data loss (even
in the presence of admin ignorance) be unacceptable? Certainly within the
realm of "normal PostgreSQL" operation.

[ shrug... ] The DBA will always be able to find a way to shoot himself
in the foot. We've seen several instances of people blowing away
pg_xlog and pg_clog, for example, because they "don't need log files".
Or how about failing to keep adequate backups? That's a sure way for an
ignorant admin to lose data too.

There is a difference between actively doing something stupid and failing
to realize a maintenance task is required.

PostgreSQL should stop working. When the admin tries to understand why,
they can read a troubleshooting FAQ and say "oops, I gotta run this vacuum
thingy." That is a whole lot better than falling off a cliff you didn't
even know was there.

Once autovacuum gets to the point where it's used by default, this
particular failure mode should be a thing of the past, but in the
meantime I'm not going to panic about it.

I don't know how to say this without sounding like a jerk (I guess that's
my role sometimes), but would you go back and re-read this sentence?

To paraphrase: "I know this causes catastrophic data loss, and we have
plans to fix it in the future, but for now, I'm not going to panic about it."

What would you do if the FreeBSD group or Linux kernel group said this
about a file system? If you failed to run fsck after 100 mounts, you lose
your data?

I thought PostgreSQL was about "protecting your data." How many times have
we smugly said, "yeah, you can use MySQL if you don't care about your
data"? Any data loss caused by PostgreSQL should be seen as unacceptable.
It's funny: I've known about this for a while, and it has always seemed
a sort of distant edge condition that is easily avoided. However, with
today's faster machines and disks, it is easier to reach this limitation
than ever before. All PostgreSQL needs is one or two VERY UPSET mainstream
users who lose data to completely reverse the momentum that it is gaining.

No amount of engineering discussion about it not being the fault of
PostgreSQL will matter, and rightfully so, IMHO.

Sorry.

#14 Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Mark Woodward (#13)
Re: Help me recovering data

On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:

Once autovacuum gets to the point where it's used by default, this
particular failure mode should be a thing of the past, but in the
meantime I'm not going to panic about it.

I don't know how to say this without sounding like a jerk (I guess that's
my role sometimes), but would you go back and re-read this sentence?

To paraphrase: "I know this causes catastrophic data loss, and we have
plans to fix it in the future, but for now, I'm not going to panic about it."

Do you have a useful suggestion about how to fix it? "Stop working" is
handwaving, and merely saying "one of you people should do something about
this" is not a solution to the problem; it's not even an approach towards
a solution to the problem.

#15 Joshua D. Drake
jd@commandprompt.com
In reply to: Mark Woodward (#13)
Re: Help me recovering data

in the foot. We've seen several instances of people blowing away
pg_xlog and pg_clog, for example, because they "don't need log files".
Or how about failing to keep adequate backups? That's a sure way for an
ignorant admin to lose data too.

There is a difference between actively doing something stupid and failing
to realize a maintenance task is required.

PostgreSQL should stop working. When the admin tries to understand why,
they can read a troubleshooting FAQ and say "oops, I gotta run this vacuum
thingy." That is a whole lot better than falling off a cliff you didn't
even know was there.

There is another way to look at this that lends itself to mohawksoft's
argument.

More often than not, DBAs and sysadmins are neither one. They are people
who get shoved into the job because they happened to mention around
the water cooler that they "once" installed Linux/FreeBSD -- whatever.

Maybe it is an executive who has some of his brains left after
sitting behind a desk all day for the last 10 years. One day he
gets a thought in his head to create a new project named "foo".

He does not want to waste his internal resources, so said executive
decides he will do it himself as a hobby. For some reason the project
actually succeeds (I have seen this many times) and the company starts
using it.

Well, guess what... it uses PostgreSQL. The guy isn't a DBA; heck, he isn't
even really a programmer. He had no idea about this "vacuum" thing. He had
never heard of other databases having to do it.

So they run for a year, and then all of a sudden **BOOM** the world ends.

Do you think they are going to care that we "documented" the issue? Uhmmm,
no they won't. Chances are they will drop-kick PostgreSQL and bad-mouth it
to all their other executive friends.

In short, this whole argument has the mark of irresponsibility on both
parties, but it is the PostgreSQL project's responsibility to make a
reasonable effort to produce a piece of software that doesn't break.

We are not talking about a user who ran a query: delete from foo;

At this point we have a known critical bug. Usually the PostgreSQL community
is all over critical bugs. Why is this any different?

It sounds to me that people are just annoyed that users don't RTFM. Get
over it. Most won't. If users RTFM more often, it would put most support
companies out of business.

Sincerely,

Joshua D. Drake

-- 
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL
#16 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Joshua D. Drake (#15)
Re: Help me recovering data

At this point we have a known critical bug. Usually the PostgreSQL
community is all over critical bugs. Why is this any different?

It sounds to me that people are just annoyed that users don't RTFM. Get
over it. Most won't. If users RTFM more often, it would put most support
companies out of business.

I wonder if I should point out that we just had 3 people suffering XID
wraparound failure in 2 days in the IRC channel...

Chris

#17 Joshua D. Drake
jd@commandprompt.com
In reply to: Stephan Szabo (#14)
Re: Help me recovering data

Do you have a useful suggestion about how to fix it? "Stop working" is
handwaving, and merely saying "one of you people should do something about
this" is not a solution to the problem; it's not even an approach towards
a solution to the problem.

I believe it would be appropriate for PostgreSQL to stop accepting
queries and to log to a file or STDERR why it stopped working
and how to resolve it.

Also it is probably appropriate to warn ahead of time...

WARNING: Only 50,000 transactions left before lock out

or something like that.

J

#18 Joshua D. Drake
jd@commandprompt.com
In reply to: Christopher Kings-Lynne (#16)
Re: Help me recovering data

Christopher Kings-Lynne wrote:

At this point we have a known critical bug. Usually the PostgreSQL
community is all over critical bugs. Why is this any different?

It sounds to me that people are just annoyed that users don't RTFM.
Get over it. Most won't. If users RTFM more often, it would put most
support companies out of business.

I wonder if I should point out that we just had 3 people suffering XID
wraparound failure in 2 days in the IRC channel...

I have had half a dozen new customers in the last six months that have
had the same problem. Nothing like the phone call:

Uhmmm I am a new customer, help I can't see my databases.

Sincerely,

Joshua D. Drake

Chris

#19 Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Joshua D. Drake (#17)
Re: Help me recovering data

On Wed, 16 Feb 2005, Joshua D. Drake wrote:

Do you have a useful suggestion about how to fix it? "Stop working" is
handwaving, and merely saying "one of you people should do something about
this" is not a solution to the problem; it's not even an approach towards
a solution to the problem.

I believe that the ability for PostgreSQL to stop accepting
queries and to log to the file or STDERR why it stopped working
and how to resolve it is appropriate.

Right, but since resolving it currently involves executing a query, simply
stopping dead won't allow you to resolve it. Also, if we stop at the exact
wraparound point, can we run into problems actually trying to do the vacuum
if that's still the resolution technique? If so, how far in advance of
wraparound must we stop to guarantee it will succeed? It's not rocket
science, but figuring such things out is part of actually making a workable
solution.
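The margin question raised here can be made concrete with a little arithmetic. A hypothetical Python sketch of Joshua's warn-then-stop idea (all constants and names here are illustrative assumptions, not PostgreSQL's; later PostgreSQL releases did adopt warn/stop limits of roughly this shape):

```python
# Hypothetical warn/stop margins. Stephan's point: the stop margin must
# leave enough XIDs to actually run the database-wide VACUUM that
# repairs the situation, so it cannot sit at the exact wraparound point.

WRAP_LIMIT = 2**31           # XIDs any snapshot can regard as "the past"
STOP_MARGIN = 1_000_000      # assumed reserve so VACUUM can still run
WARN_MARGIN = 10_000_000     # assumed point to start logging warnings

def wraparound_status(oldest_xid_age: int) -> str:
    """Classify how close the oldest unfrozen XID is to wrapping."""
    remaining = WRAP_LIMIT - oldest_xid_age
    if remaining <= STOP_MARGIN:
        return "stop"   # refuse new transactions, but allow VACUUM
    if remaining <= WARN_MARGIN:
        return "warn"   # e.g. "only N transactions left before lock out"
    return "ok"

assert wraparound_status(1_000_000) == "ok"
assert wraparound_status(WRAP_LIMIT - 5_000_000) == "warn"
assert wraparound_status(WRAP_LIMIT - 500_000) == "stop"
```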

#20 Mark Woodward
pgsql@mohawksoft.com
In reply to: Stephan Szabo (#14)
Re: Help me recovering data

On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:

Once autovacuum gets to the point where it's used by default, this
particular failure mode should be a thing of the past, but in the
meantime I'm not going to panic about it.

I don't know how to say this without sounding like a jerk (I guess that's
my role sometimes), but would you go back and re-read this sentence?

To paraphrase: "I know this causes catastrophic data loss, and we have
plans to fix it in the future, but for now, I'm not going to panic about it."

Do you have a useful suggestion about how to fix it? "Stop working" is
handwaving and merely basically saying, "one of you people should do
something about this" is not a solution to the problem, it's not even an
approach towards a solution to the problem.

Actually, it is not a solution to the problem of losing data. It is a
drop-dead, last-ditch failsafe that EVERY PRODUCT should have before
losing data.

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#19)
#22 Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Mark Woodward (#20)
#23 Mark Woodward
pgsql@mohawksoft.com
In reply to: Stephan Szabo (#19)
#24 Mark Woodward
pgsql@mohawksoft.com
In reply to: Tom Lane (#21)
#25 Mark Woodward
pgsql@mohawksoft.com
In reply to: Stephan Szabo (#22)
#26 Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Lane (#21)
#27 Richard Huxton
dev@archonet.com
In reply to: Stephan Szabo (#26)
#28 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#21)
#29 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#26)
#30 Bruno Wolff III
bruno@wolff.to
In reply to: Stephan Szabo (#26)
#31 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruno Wolff III (#30)
#32 Bruce Momjian
bruce@momjian.us
In reply to: Joshua D. Drake (#18)
#33 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#32)
#34 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#33)
#35 Matthew T. O'Connor
matthew@zeut.net
In reply to: Tom Lane (#12)
#36 Gaetano Mendola
mendola@bigfoot.com
In reply to: Bruce Momjian (#32)
#37 Gaetano Mendola
mendola@bigfoot.com
In reply to: Stephan Szabo (#14)
#38 Gaetano Mendola
mendola@bigfoot.com
In reply to: Tom Lane (#31)
#39 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gaetano Mendola (#38)
#40 Bruce Momjian
bruce@momjian.us
In reply to: Gaetano Mendola (#36)
#41 Dennis Bjorklund
db@zigo.dhs.org
In reply to: Bruce Momjian (#40)
#42 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#33)
#43 Mark Woodward
pgsql@mohawksoft.com
In reply to: Bruce Momjian (#40)
#44 Gaetano Mendola
mendola@bigfoot.com
In reply to: Bruce Momjian (#40)
#45 Kevin Brown
kevin@sysexperts.com
In reply to: Tom Lane (#39)
#46 Jürgen Cappel
email@juergen-cappel.de
In reply to: Kevin Brown (#45)
#47 Russell Smith
mr-russ@pws.com.au
In reply to: Kevin Brown (#45)
#48 Russell Smith
mr-russ@pws.com.au
In reply to: Jürgen Cappel (#46)
#49 Robert Treat
xzilla@users.sourceforge.net
In reply to: Mark Woodward (#43)
#50 Matthew T. O'Connor
matthew@zeut.net
In reply to: Russell Smith (#47)
#51 Bruce Momjian
bruce@momjian.us
In reply to: Matthew T. O'Connor (#35)
#52 Thomas F. O'Connell
tfo@sitening.com
In reply to: Matthew T. O'Connor (#35)
#53 Matthew T. O'Connor
matthew@zeut.net
In reply to: Thomas F. O'Connell (#52)