2PC-induced lockup

Started by Peter Eisentrautalmost 19 years ago30 messageshackers
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

The following command sequence appears to lock up the database system:

BEGIN;
LOCK pg_authid;
PREPARE TRANSACTION 'foo';
\q

After that you can't connect anymore, even in single-user mode. The only way
I could find is to clear out the pg_twophase directory, but I'm not sure
whether it is safe to do that.

Should this be prevented somehow, and is there a better recovery path?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#2Hans-Jürgen Schönig
postgres@cybertec.at
In reply to: Peter Eisentraut (#1)
Re: 2PC-induced lockup

is it good to allow locks on system tables at all?
i am not so sure. have seen some disaster in the past with that. just
consider somebody placing ACCESS EXCLUSIVE LOCK on a system table. it
is basically denial of service.

best regards,

hans

On Jul 10, 2007, at 3:14 PM, Peter Eisentraut wrote:

The following command sequence appears to lock up the database system:

BEGIN;
LOCK pg_authid;
PREPARE TRANSACTION 'foo';
\q

After that you can't connect anymore, even in single-user mode.
The only way
I could find is to clear out the pg_twophase directory, but I'm not
sure
whether it is safe to do that.

Should this be prevented somehow, and is there a better recovery path?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of
broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

--
Cybertec Geschwinde & Schönig GmbH
Gröhrmühlgasse 26, 2700 Wiener Neustadt
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#1)
Re: 2PC-induced lockup

Peter Eisentraut <peter_e@gmx.net> writes:

The following command sequence appears to lock up the database system:
BEGIN;
LOCK pg_authid;
PREPARE TRANSACTION 'foo';
\q

After that you can't connect anymore, even in single-user mode. The
only way I could find is to clear out the pg_twophase directory, but
I'm not sure whether it is safe to do that.

Should this be prevented somehow, and is there a better recovery path?

AFAICS this is just one of many ways in which a superuser can shoot
himself in the foot; I'm not eager to try to prevent it.

Right offhand, clearing pg_twophase while the system is stopped should
be safe enough.

regards, tom lane

#4tomas@tuxteam.de
tomas@tuxteam.de
In reply to: Tom Lane (#3)
Re: 2PC-induced lockup

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Jul 10, 2007 at 10:41:31AM -0400, Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

The following command sequence appears to lock up the database system:
BEGIN;
LOCK pg_authid;
PREPARE TRANSACTION 'foo';
\q

After that you can't connect anymore, even in single-user mode. The
only way I could find is to clear out the pg_twophase directory, but
I'm not sure whether it is safe to do that.

Should this be prevented somehow, and is there a better recovery path?

AFAICS this is just one of many ways in which a superuser can shoot
himself in the foot; I'm not eager to try to prevent it.

Right offhand, clearing pg_twophase while the system is stopped should
be safe enough.

It might make sense then to clear the pg_twophase directory on DB
startup. Nobody would expect the locks to persist a database restart --
or am I way off?

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFGlFdxBcgs9XrR2kYRAp9/AJ4s8fBkhtaxqfu0QxBhSN2lCi++zgCfRsS9
Jpjv6513ubPtfldf2fItzj0=
=KAW0
-----END PGP SIGNATURE-----

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: tomas@tuxteam.de (#4)
Re: 2PC-induced lockup

tomas@tuxteam.de writes:

On Tue, Jul 10, 2007 at 10:41:31AM -0400, Tom Lane wrote:

Right offhand, clearing pg_twophase while the system is stopped should
be safe enough.

It might make sense then to clear the pg_twophase directory on DB
startup.

<blink> I fear you have 100% misunderstood the point. The *only*
reason for that feature is to survive DB crashes.

regards, tom lane

#6tomas@tuxteam.de
tomas@tuxteam.de
In reply to: Tom Lane (#5)
Re: 2PC-induced lockup

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, Jul 11, 2007 at 12:38:09AM -0400, Tom Lane wrote:

tomas@tuxteam.de writes:

[...]

It might make sense then to clear the pg_twophase directory on DB
startup.

<blink> I fear you have 100% misunderstood the point. The *only*
reason for that feature is to survive DB crashes.

Ah -- so it is intentional that it keeps the DB from starting again.
OK, then I misunderstood. Sorry for the noise.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFGlJG5Bcgs9XrR2kYRApC9AJsF+wm9z5zJXpZ98ThuV/gn9ozpVwCfbf3L
G4OA0pu3rh/o2rOL/OvZ9bU=
=+fd6
-----END PGP SIGNATURE-----

#7Chris Browne
cbbrowne@acm.org
In reply to: Peter Eisentraut (#1)
Re: 2PC-induced lockup

tomas@tuxteam.de writes:

On Wed, Jul 11, 2007 at 12:38:09AM -0400, Tom Lane wrote:

tomas@tuxteam.de writes:

[...]

It might make sense then to clear the pg_twophase directory on DB
startup.

<blink> I fear you have 100% misunderstood the point. The *only*
reason for that feature is to survive DB crashes.

Ah -- so it is intentional that it keeps the DB from starting again.
OK, then I misunderstood. Sorry for the noise.

I don't think that is so much "intentional" as it is an "emergent
property."

The usual point to 2PC is that once transactions are PREPAREd, they
*need* to be stored robustly enough to survive even a DB crash.

If one locks certain vital system resources, as part of that PREPAREd
transaction, that evidently causes some problems, alas...

The right resolution to this is not, a priori, evident yet.
--
(format nil "~S@~S" "cbbrowne" "linuxdatabases.info")
http://cbbrowne.com/info/linuxdistributions.html
Rules of the Evil Overlord #38. "If an enemy I have just killed has a
younger sibling or offspring anywhere, I will find them and have them
killed immediately, instead of waiting for them to grow up harboring
feelings of vengeance towards me in my old age."
<http://www.eviloverlord.com/&gt;

#8Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Chris Browne (#7)
Re: 2PC-induced lockup

On Wed, Jul 11, 2007 at 10:43:23AM -0400, Chris Browne wrote:

The right resolution to this is not, a priori, evident yet.

_A posteriori_, though, it seems to me the right resolution is "don't
do that" ;-)

A
--
Andrew Sullivan | ajs@crankycanuck.ca
Users never remark, "Wow, this software may be buggy and hard
to use, but at least there is a lot of code underneath."
--Damien Katz

#9Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Chris Browne (#7)
Re: 2PC-induced lockup

Chris Browne wrote:

If one locks certain vital system resources, as part of that PREPAREd
transaction, that evidently causes some problems, alas...

The right resolution to this is not, a priori, evident yet.

It's not? I agree with Tom here; this is just one of the numerous things
you can do to screw up your database as a superuser. Why would you LOCK
the pg_auth table, or any other system table for that matter, in the
first place? Let alone in a distributed transaction.

FWIW, deleting the files from pg_twophase is safe when the system is
shut down.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#10Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Heikki Linnakangas (#9)
Re: 2PC-induced lockup

On Wed, Jul 11, 2007 at 04:44:12PM +0100, Heikki Linnakangas wrote:

FWIW, deleting the files from pg_twophase is safe when the system is
shut down.

Is it safe for the PREPAREd TRANSACTIONs? I assume not. That is, in
Peter's presumably experimental case, it might be ok to delete the
files, but on a production system, you'd violate the semantics of 2PC
by doing this?

A

--
Andrew Sullivan | ajs@crankycanuck.ca
Users never remark, "Wow, this software may be buggy and hard
to use, but at least there is a lot of code underneath."
--Damien Katz

#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andrew Sullivan (#10)
Re: 2PC-induced lockup

Andrew Sullivan wrote:

On Wed, Jul 11, 2007 at 04:44:12PM +0100, Heikki Linnakangas wrote:

FWIW, deleting the files from pg_twophase is safe when the system is
shut down.

Is it safe for the PREPAREd TRANSACTIONs? I assume not. That is, in
Peter's presumably experimental case, it might be ok to delete the
files, but on a production system, you'd violate the semantics of 2PC
by doing this?

It's effectively the same as manually issuing a ROLLBACK PREPARED. It
will brake the atomicity of the global transaction, if some branches of
that global transaction in other resource managers have already been
committed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#12Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Heikki Linnakangas (#11)
Re: 2PC-induced lockup

On Wed, Jul 11, 2007 at 06:15:12PM +0100, Heikki Linnakangas wrote:

It's effectively the same as manually issuing a ROLLBACK PREPARED. It
will brake the atomicity of the global transaction, if some branches of
that global transaction in other resource managers have already been
committed.

But how do you know which file to delete? Is it keyed to the
transaction identifier or something?

A

--
Andrew Sullivan | ajs@crankycanuck.ca
However important originality may be in some fields, restraint and
adherence to procedure emerge as the more significant virtues in a
great many others. --Alain de Botton

#13Peter Eisentraut
peter_e@gmx.net
In reply to: Heikki Linnakangas (#9)
Re: 2PC-induced lockup

Heikki Linnakangas wrote:

It's not? I agree with Tom here; this is just one of the numerous
things you can do to screw up your database as a superuser. Why would
you LOCK the pg_auth table, or any other system table for that
matter, in the first place? Let alone in a distributed transaction.

Well, my test case arose from a real application scenario, not an
attempt to destroy my database system.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#14Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Eisentraut (#13)
Re: 2PC-induced lockup

Peter Eisentraut wrote:

Heikki Linnakangas wrote:

It's not? I agree with Tom here; this is just one of the numerous
things you can do to screw up your database as a superuser. Why would
you LOCK the pg_auth table, or any other system table for that
matter, in the first place? Let alone in a distributed transaction.

Well, my test case arose from a real application scenario, not an
attempt to destroy my database system.

Why does the application LOCK pg_auth?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#14)
Re: 2PC-induced lockup

Heikki Linnakangas <heikki@enterprisedb.com> writes:

Peter Eisentraut wrote:

Heikki Linnakangas wrote:

It's not? I agree with Tom here; this is just one of the numerous
things you can do to screw up your database as a superuser. Why would
you LOCK the pg_auth table, or any other system table for that
matter, in the first place? Let alone in a distributed transaction.

Well, my test case arose from a real application scenario, not an
attempt to destroy my database system.

Why does the application LOCK pg_auth?

Even if there is a reason for a lock, surely it's not necessary to use
AccessExclusiveLock. A lesser lock would synchronize whatever the heck
it's doing without locking out readers.

regards, tom lane

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Sullivan (#12)
Re: 2PC-induced lockup

Andrew Sullivan <ajs@crankycanuck.ca> writes:

On Wed, Jul 11, 2007 at 06:15:12PM +0100, Heikki Linnakangas wrote:

It's effectively the same as manually issuing a ROLLBACK PREPARED.

But how do you know which file to delete?

You don't. In extremis you could probably throw together some
inspection tool that could look though the 2PC records to find out which
file mentioned an exclusive lock on pg_authid ...

regards, tom lane

#17Peter Eisentraut
peter_e@gmx.net
In reply to: Heikki Linnakangas (#14)
Re: 2PC-induced lockup

Heikki Linnakangas wrote:

Why does the application LOCK pg_auth?

It does it with NOWAIT to determine if some other connection had already
locked it (because it was modifying some roles) in order not to lock up
the program. This (or something like it, because this doesn't work,
after all) is unfortunately necessary because schema changes don't obey
ordinary snapshot rules.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#18Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andrew Sullivan (#12)
Re: 2PC-induced lockup

Andrew Sullivan wrote:

On Wed, Jul 11, 2007 at 06:15:12PM +0100, Heikki Linnakangas wrote:

It's effectively the same as manually issuing a ROLLBACK PREPARED. It
will brake the atomicity of the global transaction, if some branches of
that global transaction in other resource managers have already been
committed.

But how do you know which file to delete? Is it keyed to the
transaction identifier or something?

The xid is encoded in the filename. If you can't start up the database
and look at pg_locks, you can't do much other than guess.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#19Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Heikki Linnakangas (#18)
Re: 2PC-induced lockup

On Wed, Jul 11, 2007 at 09:26:34PM +0100, Heikki Linnakangas wrote:

The xid is encoded in the filename. If you can't start up the database
and look at pg_locks, you can't do much other than guess.

So then in this sort of case, it isn't _really_ safe to delete those
files, because the commitment you made before crash when you accepted
a PREPARE TRANSACTION is going to be gone, which violates the 2PC
rules.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
When my information changes, I alter my conclusions. What do you do sir?
--attr. John Maynard Keynes

#20Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andrew Sullivan (#19)
Re: 2PC-induced lockup

Andrew Sullivan wrote:

On Wed, Jul 11, 2007 at 09:26:34PM +0100, Heikki Linnakangas wrote:

The xid is encoded in the filename. If you can't start up the database
and look at pg_locks, you can't do much other than guess.

So then in this sort of case, it isn't _really_ safe to delete those
files, because the commitment you made before crash when you accepted
a PREPARE TRANSACTION is going to be gone, which violates the 2PC
rules.

Yes, though if neither you nor the transaction manager can connect, you
don't have much choice.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#21Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#3)
#22Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#20)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#21)
#24Florian Pflug
fgp@phlo.org
In reply to: Tom Lane (#23)
#25Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#23)
#26Robert Treat
xzilla@users.sourceforge.net
In reply to: Simon Riggs (#25)
#27Michael Paesold
mpaesold@gmx.at
In reply to: Simon Riggs (#25)
#28Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Simon Riggs (#22)
#29Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Tom Lane (#23)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Sullivan (#29)