Tables disappearing
Hi guys!
I've been using pgsql for some time already and am happy with it. Heh,
of course this post has its big BUT :-)
Starting a few months ago, one of our projects encountered the loss of
one DB table (at that time it was version 8.0 or so...). I did some
research and found out the vacuuming was set up wrong and a transaction
wraparound failure occurred... OK, I upgraded to 8.2.1 (the latest at
that time) and set up autovacuum... (until now I thought it was set
correctly and working)... but after less than one month the same thing
happened... but this time the age of all databases looked OK...
So I upgraded to 8.2.3 (the project is still running this version) but
the same thing happened today...
This time I didn't rely on autovacuum and wrote a cron-scheduled job to
do VACUUM FULL on all tables during nightly maintenance. I'm now sure
the wraparound didn't occur (am I?)... in fact, I'm lost...
One more thing:
The project runs a proprietary CMS and there are several instances of
it with the same database layout in different databases. Every time, the
"lost" table is the same one - the busiest one (mostly read)... and
every time the "lost" table is gone from several projects at once...
today it was 2 projects affected out of 7 on the same pgsql server... it
also looks like the problem happens every time shortly after midnight -
scary - ha ha :-)
I have a backup of the DB files as they were this morning, but it was
taken after the vacuum script ran...
My system is:
OS: Debian 3.0
Kernel: 2.4.26
PgSQL: 8.2.3
Any advice and/or hint to the right direction is greatly welcome!
Regards,
--
Kamil
Kamil Srot <kamil.srot@nlogy.com> writes:
One more thing:
The project runs a proprietary CMS and there are several instances of
it with the same database layout in different databases. Every time, the
"lost" table is the same one - the busiest one (mostly read)... and
every time the "lost" table is gone from several projects at once...
today it was 2 projects affected out of 7 on the same pgsql server... it
also looks like the problem happens every time shortly after midnight -
scary - ha ha :-)
Tables don't just "disappear" like that. Check your application code
to see if it's not dropping the table someplace.
regards, tom lane
Tom Lane wrote:
Kamil Srot <kamil.srot@nlogy.com> writes:
[...]
Tables don't just "disappear" like that. Check your application code
to see if it's not dropping the table someplace.
regards, tom lane
grep -ri drop your_application_dir
brian wrote:
Tom Lane wrote:
Kamil Srot <kamil.srot@nlogy.com> writes:
[...]
Tables don't just "disappear" like that. Check your application code
to see if it's not dropping the table someplace.
regards, tom lane
grep -ri drop your_application_dir
This was the first thing I checked... if anything, this is the one thing
I'm 100% sure is not the case.
The application doesn't manipulate the database schema at all... so no
CREATE/ALTER/DROP statements anywhere.
On top of that, there is the "disappearance" from several databases at
once (they are absolutely not interconnected in any way - each lives its
own life).
I'm not saying it's gone by itself; I'm asking for help debugging this
situation and hopefully finding a solution. The first time it happened,
it had the same symptoms - this specific table was missing and the
transaction counter had wrapped around. Now it seems to be the same, but
the transaction counters are far below the critical value...
It has happened four times already and the customer is starting to go
crazy... and so am I...
Thank you, Tom, Brian and all who read it, for your time!
Regards,
--
Kamil
On Mon, Aug 27, 2007 at 06:37:17PM +0200, Kamil Srot wrote:
I'm not saying it's gone by itself; I'm asking for help debugging this
situation and hopefully finding a solution. The first time it happened,
it had the same symptoms - this specific table was missing and the
transaction counter had wrapped around. Now it seems to be the same, but
the transaction counters are far below the critical value...
You don't say explicitly, but when you said "vacuum all tables" you
didn't specify any table names, right? You just typed "vacuum", on each
database...
As for debugging, maybe something like:
select xmin, age(xmin) from pg_class;
Just to check the wraparound issue...
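A quick way to run that check from a shell script, filtering out the
harmless frozen rows (the one-billion alarm threshold below is an
illustrative choice, not a hard limit):

```shell
#!/bin/sh
# Classify an age() value: rows whose xmin is the frozen XID report
# age 2147483647 (INT_MAX) and are harmless; only ordinary rows with a
# genuinely high age are worth worrying about.
is_worrying() {
  if [ "$1" -eq 2147483647 ]; then echo no      # frozen sentinel, ignore
  elif [ "$1" -gt 1000000000 ]; then echo yes   # nearly half the XID space
  else echo no
  fi
}
# Against a live server (assumes a superuser named "postgres"):
#   psql -U postgres -At -F ' ' \
#     -c "SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r';" \
#     | while read rel age; do
#         [ "$(is_worrying "$age")" = yes ] && echo "check $rel (age $age)"
#       done
```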
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
On Mon, Aug 27, 2007 at 06:37:17PM +0200, Kamil Srot wrote:
I'm not saying it's gone by itself; I'm asking for help debugging this
situation and hopefully finding a solution. The first time it happened,
it had the same symptoms - this specific table was missing and the
transaction counter had wrapped around. Now it seems to be the same, but
the transaction counters are far below the critical value...
You don't say explicitly, but when you said "vacuum all tables" you
didn't specify any table names, right? You just typed "vacuum", on each
database...
Correct... the script does echo "vacuum full;" | $PGDIR/bin/psql -U
postgres $db for each database...
I hope that's correct?
As for debugging, maybe something like:
select xmin, age(xmin) from pg_class;
Just to check the wraparound issue...
Maybe that's it... I did check the wraparound with:
SELECT datname, age(datfrozenxid) FROM pg_database;
In your SQL query I see several rows with very high numbers!
They are all "internal" objects like pg_toast_618854, views, and also
some application-level indices etc.
So it seems it's not enough to vacuum all DBs (templateX inclusive)?
Can you please advise how to correctly vacuum pgsql manually?
Thank you!
Regards,
--
Kamil
On Mon, Aug 27, 2007 at 06:57:54PM +0200, Kamil Srot wrote:
Correct...the script does echo "vacuum full;" | $PGDIR/bin/psql -U
postgres $db for each database...
Hope it's correct?
Well, I'd drop the "full" part, it tends to bloat indexes. Also, did
you check it was actually completing (no errors)?
Maybe it's it... I did check the wraparound with:
SELECT datname, age(datfrozenxid) FROM pg_database;
In your SQL I see several rows with too high numbers!
They are all "internal" like pg_toast_618854, views and also some
application level indices etc.
Depends what you mean by too high. Anything with XID 1 and 2 is not a
problem, and age returns a really big number for them. Can you give
some examples?
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
On Mon, Aug 27, 2007 at 06:57:54PM +0200, Kamil Srot wrote:
Correct...the script does echo "vacuum full;" | $PGDIR/bin/psql -U
postgres $db for each database...
Hope it's correct?
Well, I'd drop the "full" part, it tends to bloat indexes. Also, did
you check it was actually completing (no errors)?
Yes, it completes w/o errors... just VACUUM as output...
OK, I'll drop the full part and do it less often...
Maybe it's it... I did check the wraparound with:
SELECT datname, age(datfrozenxid) FROM pg_database;
In your SQL I see several rows with too high numbers!
They are all "internal" like pg_toast_618854, views and also some
application level indices etc.
Depends what you mean by too high. Anything with XID 1 and 2 is not a
problem, and age returns a really big number for them. Can you give
some examples?
By high numbers I mean near the 2-billion signed int limit...
Here you go:
# select xmin, age(xmin) from pg_class;
xmin | age
-----------+------------
2 | 2147483647
2 | 2147483647
2 | 2147483647
2 | 2147483647
2 | 2147483647
2 | 2147483647
236838019 | 539453
2 | 2147483647
2 | 2147483647
231899309 | 5478163
etc.
Basically everything has the same age 2147483647 with xmin = 1 or xmin =
2... except the two lines shown...
Actually the same problem happened several minutes ago :-(
Thank you for any hint!
Regards,
--
Kamil
Kamil Srot wrote:
Martijn van Oosterhout wrote:
On Mon, Aug 27, 2007 at 06:57:54PM +0200, Kamil Srot wrote:
Correct...the script does echo "vacuum full;" | $PGDIR/bin/psql -U
postgres $db for each database...
Hope it's correct?
Well, I'd drop the "full" part, it tends to bloat indexes. Also, did
you check it was actually completing (no errors)?
Yes, it completes w/o errors... just VACUUM as output...
OK, I'll drop the full part and do it less often...
What user is doing it? Is it a superuser? If it's not, then there's
your problem.
--
Alvaro Herrera Valdivia, Chile ICBM: S 39° 49' 18.1", W 73° 13' 56.4"
"If it wasn't for my companion, I believe I'd be having
the time of my life" (John Dunbar)
Alvaro Herrera wrote:
Kamil Srot wrote:
Martijn van Oosterhout wrote:
On Mon, Aug 27, 2007 at 06:57:54PM +0200, Kamil Srot wrote:
Correct...the script does echo "vacuum full;" | $PGDIR/bin/psql -U
postgres $db for each database...
Hope it's correct?
Well, I'd drop the "full" part, it tends to bloat indexes. Also, did
you check it was actually completing (no errors)?
Yes, it completes w/o errors... just VACUUM as output...
OK, I'll drop the full part and do it less often...
What user is doing it? Is it a superuser? If it's not, then there's
your problem.
It's user postgres... the vacuum completes correctly...
Thank you,
--
Kamil
On Mon, Aug 27, 2007 at 07:15:44PM +0200, Kamil Srot wrote:
OK, I'll drop the full part and do it less often...
This doesn't address your problem, but when you move from VACUUM FULL
to VACUUM, you want to do it _more_ often, not less.
But given what you've posted, I am not even a little bit convinced
your script is working as you think it is. You're clearly not
managing to vacuum the entire database sometimes.
A
--
Andrew Sullivan | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton
Andrew Sullivan wrote:
On Mon, Aug 27, 2007 at 07:15:44PM +0200, Kamil Srot wrote:
OK, I'll drop the full part and do it less often...
This doesn't address your problem, but when you move from VACUUM FULL
to VACUUM, you want to do it _more_ often, not less.
Sure, I meant it like I'll do the FULL vacuum less often than daily and
do the plain vacuum command daily.
But given what you've posted, I am not even a little bit convinced
your script is working as you think it is. You're clearly not
managing to vacuum the entire database sometimes.
Well, I list all databases with the command "psql -U postgres -l -t"
and with some shell code I vacuum all the listed databases...
I don't think there is a problem in it...
If you do see one, can you give me a hint?
Thank you,
--
Kamil
On Mon, Aug 27, 2007 at 10:03:04PM +0200, Kamil Srot wrote:
Sure, I meant it like I'll do the FULL vacuum less often than daily and
do the plain vacuum command daily.
If you have your servers set up correctly, you should never need to
perform VACUUM FULL.
Well, I do list all databases with the command "psql -U postgres -l -t"
and with some shell coding I do vacuum all shown databases...
I don't think there is some problem in it...
If you do see some, can you give me a hint?
I don't think I've seen the script itself. Without seeing it, I can
only guess. But you don't actually need to do that. You should be
able to run vacuumdb -a, and get the result you need.
A
--
Andrew Sullivan | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.
--Dennis Ritchie
Andrew Sullivan wrote:
On Mon, Aug 27, 2007 at 10:03:04PM +0200, Kamil Srot wrote:
Sure, I meant it like I'll do the FULL vacuum less often than daily and
do the plain vacuum command daily.
If you have your servers set up correctly, you should never need to
perform VACUUM FULL.
I see. Thank you for the hint...
Well, I do list all databases with the command "psql -U postgres -l -t"
and with some shell coding I do vacuum all shown databases...
I don't think there is some problem in it...
If you do see some, can you give me a hint?
I don't think I've seen the script itself. Without seeing it, I can
only guess. But you don't actually need to do that. You should be
able to run vacuumdb -a, and get the result you need.
The script is very simple one:
#!/bin/sh
PGDIR=/opt/pgsql
DATABASES=`$PGDIR/bin/psql -U postgres -l -t | cut -f 1 -d '|'`
for db in $DATABASES
do
echo "vacuum;" | $PGDIR/bin/psql -U postgres $db
done
I can easily rewrite it to use the vacuumdb command, but I doubt it'll
make any difference.
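For reference, a more defensive variant of the same idea - a sketch,
assuming the same paths as the script above. The two changes are skipping
template0 (it rejects connections, so vacuuming it always fails) and
reporting failures instead of silently moving on:

```shell
#!/bin/sh
# Sketch of a more defensive nightly vacuum (same layout as the script above).
PGDIR=/opt/pgsql

# list_dbs: turn `psql -l -t` output into bare database names, dropping
# the blank separator lines and template0, which rejects connections.
list_dbs() {
  cut -d '|' -f 1 | sed 's/^ *//; s/ *$//' | grep -v '^$' | grep -v '^template0$'
}

# Nightly run (requires a reachable server, hence commented out here):
#   "$PGDIR/bin/psql" -U postgres -l -t | list_dbs | while read db; do
#     "$PGDIR/bin/vacuumdb" -U postgres -z "$db" \
#       || echo "vacuum of $db FAILED" >&2
#   done
```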
Is there a way to find out whether the problem is here? I scanned the
pgsql log output for wraparound messages but didn't find any recent
relevant entries... I'm pretty sure I'm missing something stupid :-(
Thank you very much for your help,
--
Kamil
On Mon, Aug 27, 2007 at 10:31:11PM +0200, Kamil Srot wrote:
The script is very simple one:
Well, I don't see anything obvious, but. . .
I can easily rewrite it to use the vacuumdb command, but I doubt it'll
make any difference.
The point is that you don't have to rewrite it. Just run "vacuumdb
-a" and it vacuums _all_ databases.
Is there a way to find out whether the problem is here? I scanned the
pgsql log output for wraparound messages but didn't find any recent
relevant entries... I'm pretty sure I'm missing something stupid :-(
Are you quite sure you're getting all your logs where you think you
are?
A
--
Andrew Sullivan | ajs@crankycanuck.ca
The plural of anecdote is not data.
--Roger Brinner
Andrew Sullivan wrote:
I can easily rewrite it to use the vacuumdb command, but I doubt it'll
make any difference.
The point is that you don't have to rewrite it. Just run "vacuumdb
-a" and it vacuums _all_ databases.
Oh, I get it now! It takes some time, but in the end, I'll understand :-D
Sure, it's much better than coding scripts etc. I've already switched to
it in my script...
Is there a way to find out whether the problem is here? I scanned the
pgsql log output for wraparound messages but didn't find any recent
relevant entries... I'm pretty sure I'm missing something stupid :-(
Are you quite sure you're getting all your logs where you think you
are?
Yes, the log is correct... I can see the statement errors I made while
selecting from system tables a few hours ago...
I have the default debug level... should I increase it?
I really need to make sure that the next occurrence (if it's necessary
to let it happen) will be the last one... :-(
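Rather than raising the debug level across the board, 8.2 can log just
DDL statements, which would catch any DROP TABLE red-handed, together
with who issued it. A postgresql.conf sketch (these are standard 8.2
settings; the prefix format is just one choice):

```
# postgresql.conf - log every DDL statement and who issued it
log_statement = 'ddl'
log_line_prefix = '%t %u %d '   # timestamp, user, database
log_connections = on
```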
Thank you,
--
Kamil
Kamil Srot <kamil.srot@nlogy.com> writes:
# select xmin, age(xmin) from pg_class;
xmin | age
-----------+------------
2 | 2147483647
2 | 2147483647
2 | 2147483647
2 | 2147483647
2 | 2147483647
2 | 2147483647
236838019 | 539453
2 | 2147483647
2 | 2147483647
231899309 | 5478163
etc.
I see no indication here that there's anything wrong at all.
The rows with xmin = 2 (ie, FrozenTransactionID) are evidently pg_class
rows that have never been modified since the database was created.
There isn't anything wrong with them. age() is returning INT_MAX for
them to denote that they are "infinitely far in the past", but they are
in no danger of being invalidated by wraparound.
The newer rows indicate that your current transaction counter is around
237 million, which is barely a tenth of the way to wraparound. While
it's conceivable that it already wrapped around (ie, you've really
executed 237 million plus some multiple of 2^32 transactions), I think
it's more probable that no wraparound has happened and your problem is
entirely unrelated to that. I believe this in part because having just
one table "disappear" is not the likely result of an XID wraparound
problem, and in part because 8.2 has got very strong defenses against
allowing wraparound failure to occur: it will shut down first, and it
will do a whole lot of bleating and autovacuuming before that.
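Tom's "barely a tenth" figure is easy to verify: age() values are
measured against a wrap point of 2^31 transactions. A one-liner sketch:

```shell
#!/bin/sh
# Check the arithmetic: what fraction of the way to wraparound is an
# age of ~237 million? The wrap point for age() is 2^31.
LIMIT=2147483648      # 2^31
AGE=236838019         # the newest xmin seen in the pg_class output above
PCT=$((AGE * 100 / LIMIT))
echo "${PCT}% of the way to wraparound"
```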
You haven't yet shown us any details of what happens. What exactly do
you mean by the table "disappearing"? Can you select from it? Do you
see a row for it in pg_class?
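For the next occurrence, Tom's questions could be turned into a small
triage script - a sketch; the table and database names are placeholders,
and sql_quote is a hypothetical helper for building the literal safely:

```shell
#!/bin/sh
# Triage sketch for the next "disappearance". TABLE and DB are placeholders.
TABLE=your_table
DB=your_db

# sql_quote: escape single quotes so a shell value is safe in a SQL literal.
sql_quote() {
  printf "%s" "$1" | sed "s/'/''/g"
}

# Against a live server (assumes a superuser named "postgres"):
# 1. Is the pg_class row still there, and which file backs it on disk?
#   psql -U postgres -At -c \
#     "SELECT oid, relfilenode FROM pg_class WHERE relname = '$(sql_quote "$TABLE")';" \
#     "$DB"
# 2. Can the table still be selected from?
#   psql -U postgres -c "SELECT count(*) FROM $TABLE;" "$DB"
```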
regards, tom lane
Tom Lane wrote:
Kamil Srot <kamil.srot@nlogy.com> writes:
# select xmin, age(xmin) from pg_class;
[...]
I see no indication here that there's anything wrong at all.
The rows with xmin = 2 (ie, FrozenTransactionID) are evidently pg_class
rows that have never been modified since the database was created.
There isn't anything wrong with them. age() is returning INT_MAX for
them to denote that they are "infinitely far in the past", but they are
in no danger of being invalidated by wraparound.
I think the same, but I'm at the end of my ideas...
You haven't yet showed us any details of what happens. What exactly do
you mean by the table "disappearing"? Can you select from it? Do you
see a row for it in pg_class?
It's completely gone... I cannot do anything with this relation... I
fixed it by recreating this specific table from a several-hours-old
backup and it's fine (this table is nearly static)...
I didn't check whether the relation was in pg_class... but I have a
complete backup of the database files from after today's incident and
also from the last time it happened (though they were vacuumed a few
hours before the backup by the nightly script, so the data may have
changed in those snapshots)... but if it can help and you'll navigate
me, I can fetch any data you want from it...
Thank you,
--
Kamil
On Aug 27, 2007, at 4:08 PM, Tom Lane wrote:
[...]
Also, in your original post you mentioned a "proprietary CMS system".
Is this proprietary to your company or one that you've purchased?
The fact that the same table goes missing on multiple dbs all being run
by that CMS certainly makes it worthy of suspicion.
Erik Jones
Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
Also, in your original post you mentioned a "proprietary CMS system".
Is this proprietary to your company or one that you've purchased? The
fact that the same table goes missing on multiple dbs all being run by
that CMS certainly makes it worthy of suspicion.
This is software developed in our company... so I'm sure it's not doing
any schema manipulation. As it happens, I'm actually the senior
developer on this project :-)
The strange thing is, all the projects are completely independent...
each has its own DB, folder with scripts, different data... just the DB
user is the same... so it's highly improbable that it would make 2
similar errors in two distinct databases at nearly the same time...
When this problem appeared for the first time, I clearly had the
wraparound problem... I vacuumed it and partially restored the data...
but for some time afterwards, commands like \dt showed all relations
twice... (some system catalog problem)... then I did a full dump and
restore along with an upgrade to the newest pgsql server software...
this duplication was gone and never appeared again.
From the above-mentioned duplication of relation names, and from what
Tom wrote recently (it doesn't look like a wraparound problem), it looks
like the relation name is/gets corrupted in some way and this corruption
is internally carried over to another instance of a relation with the
same name in another database... but I know - it's too speculative.
Maybe the binary backups of all the DB files from the time of this
problem can give some answers?
Regards,
--
Kamil