Transaction, Rollback and Database Corruption question,

Started by Joe Kisloover 25 years ago4 messagesgeneral
Jump to latest
#1Joe Kislo
kislo@athenium.com

I've never used PGSQL in a production environment, so forgive me if I'm
missing some of the common nuances assosciated with PGSQL. I have
burrowed through the dox, but havn't found anything which goes into the
detail I need.

I was trying to import a database into PGSQL, I have a file which
contains a long series of create table and insert clauses. I was using
the command:
isql DB_NAME < FILENAME

At the top of the import file there is a BEGIN;

and at the bottom there is a commit. This should make this entire
import a single transaction.

During this process I accidentially hit control-C, canceling the whole
shabang. The transaction obviously was never commited, and since the
connection to the database was terminated, a rollback should have
occoured.

I tried to restart, but errors started flying every which way.
Investigating using psql yielded illogical results:

TT_fcgi=# create table answer (i int);
ERROR: cannot create answer
ERROR: cannot create answer
TT_fcgi=# \d
No relations found.
TT_fcgi=# drop table answer;
ERROR: Relation 'answer' does not exist
ERROR: Relation 'answer' does not exist
TT_fcgi=# create table answer (i int);
ERROR: cannot create answer
ERROR: cannot create answer

PG gives no usuable error message as to why the table cannot be created,
and I'm simply nolonger able to create a table named "answer".

I terminated the PGserver, and restarted it. Obviously that would
-force- a rollback of any inprogress transactions. Still, same results,
my database is hosed.

From my experiences with Interbase this should simply never happen.

Here's the obvious questions stemming from this problem:

Why was my transaction never rolled back?
How do you determine what the outstanding transactions are?
How do you remotely roll back pending transactions?
How do you determine who is currently connected to a database?

What happens in this case of two transactions:

T1: begin
T2: begin
T1: Create table x with schema y
T2: create table x with schema z
T1: insert a into x
T2: insert b into x
T2: rollback
T1: commit

Does the existance of a table in one transaction context cross over to
other transactions?
How do I query my PG server and determine the version? The PGSQL client
version reports 7.0.2; but in the future I would like to query the
server and not some client application assosciated with PG.

FYI, deleting the answer file in the db directory seemed to alleviate my
problem. But obviously that is highly highly highly unacceptable.

Thanks,

-Joe

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Kislo (#1)
Re: Transaction, Rollback and Database Corruption question,

Joe Kislo <kislo@athenium.com> writes:

At the top of the import file there is a BEGIN;
and at the bottom there is a commit. This should make this entire
import a single transaction.

During this process I accidentially hit control-C, canceling the whole
shabang. The transaction obviously was never commited, and since the
connection to the database was terminated, a rollback should have
occoured.

Hm. There are some longstanding problems in 7.0.* and before if you try
to roll back a DROP TABLE (finally fixed for 7.1, but that doesn't help
you now). I'm not sure if that explains your results, however. What
is the structure of DROP and CREATE commands in your script?

TT_fcgi=# create table answer (i int);
ERROR: cannot create answer
ERROR: cannot create answer
TT_fcgi=# \d
No relations found.
TT_fcgi=# drop table answer;
ERROR: Relation 'answer' does not exist
ERROR: Relation 'answer' does not exist

This looks a lot like there is a physical file
$PGDATA/base/TT_fcgi/answer, but no pg_class entry for "answer".
I'm not sure how you could've gotten to that state based on what
you said above, but you should be able to restore the database to
usability by reaching in and deleting the orphaned file.

PG gives no usuable error message as to why the table cannot be created,

7.0.3 fixes the lack of kernel error messages when reporting failure
to create/delete a physical file.

What happens in this case of two transactions:

T1: begin
T2: begin
T1: Create table x with schema y
T2: create table x with schema z

T2 will wait to see if T1 commits, and if so fail with a duplicate-key
error...

How do I query my PG server and determine the version?

select version();

regards, tom lane

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Kislo (#1)
Re: Transaction, Rollback and Database Corruption question,

Joe Kislo <kislo@athenium.com> writes:

Well, I never issue a drop table. I issue a CREATE TABLE, I issue some
inserts.. then I TERMINATE the database connection. Implicit rollback
should uncreate the table and leave everything consistent.

Oh, wait, you're running 7.0.2 aren't you? Update to 7.0.3, it'll get
better. 7.0.2 forgets to run the transaction-abort cleanup routine if
the client disconnects mid-transaction. The database itself is fine,
but the action of deleting the no-longer-referenced table files doesn't
get done.

If I terminate a large set of create
tables and inserts in the middle.. and I try again, I get this error
spewing from the server:

FATAL 1: my bits moved right off the end of the world!
Recreate index pg_attribute_relid_attnam_index.

We've heard sporadic reports of this, but I've never been able to
reproduce it myself. If you can submit a reproducible test script,
it'd be a big help.

Oh, one more thing. How do I, using SQL, determine if a table exists?
EG, in Mysql it's something like:
show tables like 'MyTable';

In Interbase you select it out of the RDBS$RELATIONS table...

In PG you look for its entry in pg_class. What, you wanted a standard ;-) ?

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Kislo (#1)
Re: Transaction, Rollback and Database Corruption question,

Joe Kislo <kislo@athenium.com> writes:

FATAL 1: my bits moved right off the end of the world!
Recreate index pg_attribute_relid_attnam_index.

We've heard sporadic reports of this, but I've never been able to
reproduce it myself. If you can submit a reproducible test script,
it'd be a big help.

Oh.. it's repeatable :) I will upgrade to 7.0.3 tomorrow at work, and
see if it still happens.. I had been assuming it had something to do
with the same problem described at the top (so it may be fixed by
7.0.3).

Probably it's not fixed. My suspicion is that there's some corner-case
bug lurking in the btree index routines. With a reproducible test case,
we could find and squash it... but I've been over that code with a
flinty eye, and I haven't found it yet.

regards, tom lane