Small Bug in GetConflictingVirtualXIDs
Hi Simon, Hi all,
HS currently does the following in GetConflictingVirtualXIDs
TransactionId pxmin = proc->xmin;
/*
* We ignore an invalid pxmin because this means that backend
* has no snapshot and cannot get another one while we hold exclusive lock.
*/
if (TransactionIdIsValid(pxmin) && !TransactionIdFollows(pxmin, limitXmin))
{
VirtualTransactionId vxid;
GET_VXID_FROM_PGPROC(vxid, *proc);
if (VirtualTransactionIdIsValid(vxid))
vxids[count++] = vxid;
}
The logic behind this seems fine except in the case of dropping a database.
There you very well might have a open connection without an open snapshot.
This leads to nice errors when youre connected to a database on the slave
without having a in progress transaction while dropping the database on the
primary:
CET FATAL: terminating connection due to administrator command
CET LOG: shutting down
CET LOG: restartpoint starting: shutdown immediate
CET FATAL: could not open file "base/16604/1259": No such file or directory
CET CONTEXT: writing block 5 of relation base/16604/1259
CET WARNING: could not write block 5 of base/16604/1259
CET DETAIL: Multiple failures --- write error might be permanent.
CET CONTEXT: writing block 5 of relation base/16604/1259
Should easily be solvable through an extra parameter for
GetConflictingVirtualXIDs. Want me to prepare a patch?
Andres
On Mon, 2009-12-21 at 04:02 +0100, Andres Freund wrote:
The logic behind this seems fine except in the case of dropping a database.
There you very well might have a open connection without an open snapshot.
Yes, you're right, thanks for the report.
I re-arranged the logic there recently to reduce the number of backends
that would conflict, so it looks like I broke that case when I did
that.
Should easily be solvable through an extra parameter for
GetConflictingVirtualXIDs. Want me to prepare a patch?
Much appreciated, thanks.
--
Simon Riggs www.2ndQuadrant.com
Andres Freund wrote:
The logic behind this seems fine except in the case of dropping a database.
There you very well might have a open connection without an open snapshot.
Perhaps the simplest fix is to ensure that drop database gets a snapshot?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
Andres Freund wrote:
The logic behind this seems fine except in the case of dropping a database.
There you very well might have a open connection without an open snapshot.
Perhaps the simplest fix is to ensure that drop database gets a snapshot?
I confess to not having followed the thread closely, but why is DROP
DATABASE special in this regard? Wouldn't we soon find ourselves
needing every utility command to take a snapshot?
regards, tom lane
On Mon, 2009-12-21 at 10:38 -0500, Tom Lane wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Andres Freund wrote:
The logic behind this seems fine except in the case of dropping a database.
There you very well might have a open connection without an open snapshot.Perhaps the simplest fix is to ensure that drop database gets a snapshot?
I confess to not having followed the thread closely, but why is DROP
DATABASE special in this regard? Wouldn't we soon find ourselves
needing every utility command to take a snapshot?
Andres has worded this a little imprecisely, causing a confusion. In
cases regarding HS we need to be clear whether the interacting sessions
are on the master or on the standby to understand the reasons for poor
interactions.
What he means is that you can be connected to the standby without an
open snapshot (from the standby) at the point we replay a drop database
command that had been run on the master. That case currently causes the
bug, created by my recent change to GetConflictingVirtualXids().
Giving the drop database a snapshot is not the answer. I expect Andres
to be able to fix this with a simple patch that would not effect the
case of normal running.
--
Simon Riggs www.2ndQuadrant.com
On Monday 21 December 2009 16:38:07 Tom Lane wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Andres Freund wrote:
The logic behind this seems fine except in the case of dropping a
database. There you very well might have a open connection without an
open snapshot.Perhaps the simplest fix is to ensure that drop database gets a snapshot?
I confess to not having followed the thread closely, but why is DROP
DATABASE special in this regard? Wouldn't we soon find ourselves
needing every utility command to take a snapshot?
Because most other "entities" are locked when you access them. So on the
standby the AccessExlusive (generated on the master) will conflict with
whatever lock you currently have on that entity (on the slave).
There are no locks for an idle session though.
Andres
On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
Giving the drop database a snapshot is not the answer. I expect Andres
to be able to fix this with a simple patch that would not effect the
case of normal running.
Actually its less simply than I had thought at first - I don't think the code
ever handled that correctly.
I might be wrong there, my knowledge of the involved code is a bit sparse...
The whole conflict resolution builds on the concept of waiting for an VXid, but
an idle backend does not have a valid vxid. Thats correct, right?
Sure, the code should be modifyable to handle that code mostly transparently
(simply ignoring a invalid localTransactionId when the database id is valid),
but ...
I am inclined to just unconditionally kill the users of the database. Its not
like that would be an issue in production. I cant see a case where its
important to run a session to its end on a database which was dropped on the
master.
Opinions on that?
Andres
On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
Giving the drop database a snapshot is not the answer. I expect Andres
to be able to fix this with a simple patch that would not effect the
case of normal running.Actually its less simply than I had thought at first - I don't think the code
ever handled that correctly.
I might be wrong there, my knowledge of the involved code is a bit sparse...
The whole conflict resolution builds on the concept of waiting for an VXid, but
an idle backend does not have a valid vxid. Thats correct, right?
Yes, that's correct. I'll take this one back then.
Sure, the code should be modifyable to handle that code mostly transparently
(simply ignoring a invalid localTransactionId when the database id is valid),
but ...I am inclined to just unconditionally kill the users of the database. Its not
like that would be an issue in production. I cant see a case where its
important to run a session to its end on a database which was dropped on the
master.
Opinions on that?
I don't see any mileage in making Startup process wait for an idle
session, so no real reason to wait for others either.
--
Simon Riggs www.2ndQuadrant.com
On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
Giving the drop database a snapshot is not the answer. I expect Andres
to be able to fix this with a simple patch that would not effect the
case of normal running.Actually its less simply than I had thought at first - I don't think the
code ever handled that correctly.
I might be wrong there, my knowledge of the involved code is a bit
sparse... The whole conflict resolution builds on the concept of waiting
for an VXid, but an idle backend does not have a valid vxid. Thats
correct, right?Yes, that's correct. I'll take this one back then.
So youre writing a fix or shall I?
Andres
On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
Giving the drop database a snapshot is not the answer. I expect Andres
to be able to fix this with a simple patch that would not effect the
case of normal running.Actually its less simply than I had thought at first - I don't think the
code ever handled that correctly.
I might be wrong there, my knowledge of the involved code is a bit
sparse... The whole conflict resolution builds on the concept of waiting
for an VXid, but an idle backend does not have a valid vxid. Thats
correct, right?I don't see any mileage in making Startup process wait for an idle
session, so no real reason to wait for others either.
So here is a small patch implementing that behaviour.
Andres
Attachments:
0001-Fix-the-bug-in-HS-that-it-is-possible-to-stay-connec.patchtext/x-patch; charset=UTF-8; name=0001-Fix-the-bug-in-HS-that-it-is-possible-to-stay-connec.patchDownload
From 36f57e6f6b1d25d7433d9706682e53cc79b45a61 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 27 Dec 2009 19:14:21 +0100
Subject: [PATCH] Fix the bug in HS that it is possible to stay connected to a database
on a slave after it has been deleted.
Instead immediately kill sessions using such a database - it is
unlikely that those sessions will quit and its unlikely as well that
its a significant issue to get thrown of a database which is dropped
on the master.
---
src/backend/commands/dbcommands.c | 22 ++++++++--------------
src/backend/storage/ipc/procarray.c | 25 +++++++++++++++++++++++++
src/include/storage/procarray.h | 1 +
3 files changed, 34 insertions(+), 14 deletions(-)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 316222f..85d1d64 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1945,22 +1945,16 @@ dbase_redo(XLogRecPtr lsn, XLogRecord *record)
if (InHotStandby)
{
- VirtualTransactionId *database_users;
-
/*
- * Find all users connected to this database and ask them
- * politely to immediately kill their sessions before processing
- * the drop database record, after the usual grace period.
- * We don't wait for commit because drop database is
- * non-transactional.
+ * We dont do the usual GetConflictingVirtualXIDs/
+ * ResolveRecoveryConflictWithVirutalXIDs dance here because we are
+ * not only conflicting against concurrently running transactions
+ * but against concurrently running backends using this database.
+ *
+ * We do not wait for those sessions to exit by free will because
+ * quite likely that will never happen. So force them.
*/
- database_users = GetConflictingVirtualXIDs(InvalidTransactionId,
- xlrec->db_id,
- false);
-
- ResolveRecoveryConflictWithVirtualXIDs(database_users,
- "drop database",
- CONFLICT_MODE_FATAL);
+ CancelDBBackends(xlrec->db_id);
}
/* Drop pages for this database that are in the shared buffer cache */
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 8e2de35..07c7899 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1827,6 +1827,31 @@ CountDBBackends(Oid databaseid)
}
/*
+ * CancelDBBackends --- cancel backends that are using specified database
+ */
+void
+CancelDBBackends(Oid databaseid)
+{
+ ProcArrayStruct *arrayP = procArray;
+ int index;
+
+ /* tell all backends to die */
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ for (index = 0; index < arrayP->numProcs; index++)
+ {
+ volatile PGPROC *proc = arrayP->procs[index];
+
+ if (proc->databaseId == databaseid){
+ if (proc->recoveryConflictMode < CONFLICT_MODE_FATAL){
+ proc->recoveryConflictMode = CONFLICT_MODE_FATAL;
+ }
+ kill(proc->pid, SIGINT);
+ }
+ }
+ LWLockRelease(ProcArrayLock);
+}
+
+/*
* CountUserBackends --- count backends that are used by specified user
*/
int
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 1cee639..707396b 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -63,6 +63,7 @@ extern pid_t CancelVirtualTransaction(VirtualTransactionId vxid,
extern int CountActiveBackends(void);
extern int CountDBBackends(Oid databaseid);
+extern void CancelDBBackends(Oid databaseid);
extern int CountUserBackends(Oid roleid);
extern bool CountOtherDBBackends(Oid databaseId,
int *nbackends, int *nprepared);
--
1.6.5.12.gd65df24
On Sun, 2009-12-27 at 20:11 +0100, Andres Freund wrote:
On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
Giving the drop database a snapshot is not the answer. I expect Andres
to be able to fix this with a simple patch that would not effect the
case of normal running.Actually its less simply than I had thought at first - I don't think the
code ever handled that correctly.
I might be wrong there, my knowledge of the involved code is a bit
sparse... The whole conflict resolution builds on the concept of waiting
for an VXid, but an idle backend does not have a valid vxid. Thats
correct, right?I don't see any mileage in making Startup process wait for an idle
session, so no real reason to wait for others either.So here is a small patch implementing that behaviour.
OK, thanks. I have also written something, as mentioned. Will review.
--
Simon Riggs www.2ndQuadrant.com
On Sunday 27 December 2009 21:04:43 Simon Riggs wrote:
On Sun, 2009-12-27 at 20:11 +0100, Andres Freund wrote:
On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
Giving the drop database a snapshot is not the answer. I expect
Andres to be able to fix this with a simple patch that would not
effect the case of normal running.Actually its less simply than I had thought at first - I don't think
the code ever handled that correctly.
I might be wrong there, my knowledge of the involved code is a bit
sparse... The whole conflict resolution builds on the concept of
waiting for an VXid, but an idle backend does not have a valid vxid.
Thats correct, right?I don't see any mileage in making Startup process wait for an idle
session, so no real reason to wait for others either.So here is a small patch implementing that behaviour.
OK, thanks. I have also written something, as mentioned. Will review.
Thats why I had asked in another mail ;-) But I have learned a bit more about
pg while writing that patch so its fine for me at least ;-)
Andres
On Sun, 2009-12-27 at 20:11 +0100, Andres Freund wrote:
On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
Giving the drop database a snapshot is not the answer. I expect Andres
to be able to fix this with a simple patch that would not effect the
case of normal running.Actually its less simply than I had thought at first - I don't think the
code ever handled that correctly.
I might be wrong there, my knowledge of the involved code is a bit
sparse... The whole conflict resolution builds on the concept of waiting
for an VXid, but an idle backend does not have a valid vxid. Thats
correct, right?I don't see any mileage in making Startup process wait for an idle
session, so no real reason to wait for others either.So here is a small patch implementing that behaviour.
I've committed a slightly modified form of this patch.
It was an outstanding bug, so delaying fix at this stage not worth it. I
had in mind a slightly grander fix, but it's hardly a priority to polish
the chrome on this one.
Thanks for the bug report and fix.
--
Simon Riggs www.2ndQuadrant.com
On Sun, 2009-12-27 at 20:11 +0100, Andres Freund wrote:
On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
Giving the drop database a snapshot is not the answer. I expect Andres
to be able to fix this with a simple patch that would not effect the
case of normal running.Actually its less simply than I had thought at first - I don't think the
code ever handled that correctly.
I might be wrong there, my knowledge of the involved code is a bit
sparse... The whole conflict resolution builds on the concept of waiting
for an VXid, but an idle backend does not have a valid vxid. Thats
correct, right?I don't see any mileage in making Startup process wait for an idle
session, so no real reason to wait for others either.So here is a small patch implementing that behaviour.
On further testing, I received a re-connection from an automatic session
retry. That shouldn't have happened, but it indicates the need for
locking around the conflict handler. I had understood that to be placed
elsewhere but that seems wrong now.
This is a low priority item, so looking for a quick fix to allow time on
other areas.
Any objections?
--
Simon Riggs www.2ndQuadrant.com
Attachments:
db_conflict_locking.patchtext/x-patch; charset=UTF-8; name=db_conflict_locking.patchDownload
*** a/src/backend/commands/dbcommands.c
--- b/src/backend/commands/dbcommands.c
***************
*** 1944,1950 **** dbase_redo(XLogRecPtr lsn, XLogRecord *record)
--- 1944,1958 ----
dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
if (InHotStandby)
+ {
+ /*
+ * Lock database while we resolve conflicts to ensure that InitPostgres()
+ * cannot fully re-execute concurrently. This avoids backends re-connecting
+ * automatically to same database, which can happen in some cases.
+ */
+ LockSharedObjectForSession(DatabaseRelationId, xlrec->db_id, 0, AccessExclusiveLock);
ResolveRecoveryConflictWithDatabase(xlrec->db_id);
+ }
/* Drop pages for this database that are in the shared buffer cache */
DropDatabaseBuffers(xlrec->db_id);
***************
*** 1960,1965 **** dbase_redo(XLogRecPtr lsn, XLogRecord *record)
--- 1968,1984 ----
ereport(WARNING,
(errmsg("some useless files may be left behind in old database directory \"%s\"",
dst_path)));
+
+ if (InHotStandby)
+ {
+ /*
+ * Release locks prior to commit. XX There is a race condition here that may allow
+ * backends to reconnect, but the window for this is small because the gap between
+ * here and commit is mostly fairly small and it is unlikely that people will be
+ * dropping databases that we are trying to connect to anyway.
+ */
+ UnlockSharedObjectForSession(DatabaseRelationId, xlrec->db_id, 0, AccessExclusiveLock);
+ }
}
else
elog(PANIC, "dbase_redo: unknown op code %u", info);