pgsql: Extract catalog info for error reporting before an error actually

Started by Nonameabout 18 years ago29 messages
#1Noname
alvherre@postgresql.org

Log Message:
-----------
Extract catalog info for error reporting before an error actually happens.
Also, remove redundant reset of for-wraparound PGPROC flag.

Thanks to Tom Lane for noticing both bogosities.

Modified Files:
--------------
pgsql/src/backend/postmaster:
autovacuum.c (r1.63 -> r1.64)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/postmaster/autovacuum.c?r1=1.63&r2=1.64)

#2Simon Riggs
simon@2ndquadrant.com
In reply to: Noname (#1)
Re: pgsql: Extract catalog info for error reporting before an error actually

On Thu, 2007-10-25 at 14:45 +0000, Alvaro Herrera wrote:

Log Message:
-----------
Extract catalog info for error reporting before an error actually happens.
Also, remove redundant reset of for-wraparound PGPROC flag.

Just noticed you've made these changes. I was working on them, but
hadn't fully tested the patch because of all the different touch points.
Sorry for the delay.

Would you like me to refresh my earlier patch against the newly
committed state (or did you commit that aspect already as well?).

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#3Alvaro Herrera
alvherre@commandprompt.com
In reply to: Simon Riggs (#2)
Re: pgsql: Extract catalog info for error reporting before an error actually

Simon Riggs wrote:

On Thu, 2007-10-25 at 14:45 +0000, Alvaro Herrera wrote:

Log Message:
-----------
Extract catalog info for error reporting before an error actually happens.
Also, remove redundant reset of for-wraparound PGPROC flag.

Just noticed you've made these changes. I was working on them, but
hadn't fully tested the patch because of all the different touch points.
Sorry for the delay.

Would you like me to refresh my earlier patch against the newly
committed state (or did you commit that aspect already as well?).

I am doing just that. I'll post it soon.

FWIW I disagree with cancelling just any autovac work automatically; in
my patch I'm only cancelling if it's analyze, on the grounds that if
you have really bad luck you can potentially lose a lot of work that
vacuum did. We can relax this restriction when we have cancellable
vacuum.

--
Alvaro Herrera http://www.flickr.com/photos/alvherre/
"La principal caracter�stica humana es la tonter�a"
(Augusto Monterroso)

#4Simon Riggs
simon@2ndquadrant.com
In reply to: Alvaro Herrera (#3)
Re: pgsql: Extract catalog info for error reporting before an error actually

On Thu, 2007-10-25 at 13:41 -0300, Alvaro Herrera wrote:

Simon Riggs wrote:

On Thu, 2007-10-25 at 14:45 +0000, Alvaro Herrera wrote:

Log Message:
-----------
Extract catalog info for error reporting before an error actually happens.
Also, remove redundant reset of for-wraparound PGPROC flag.

Just noticed you've made these changes. I was working on them, but
hadn't fully tested the patch because of all the different touch points.
Sorry for the delay.

Would you like me to refresh my earlier patch against the newly
committed state (or did you commit that aspect already as well?).

I am doing just that. I'll post it soon.

OK thanks.

FWIW I disagree with cancelling just any autovac work automatically; in
my patch I'm only cancelling if it's analyze, on the grounds that if
you have really bad luck you can potentially lose a lot of work that
vacuum did. We can relax this restriction when we have cancellable
vacuum.

That was requested by others, not myself, but I did agree with the
conclusions. The other bad luck might be that you don't complete some
critical piece of work in the available time window because an automated
job kicked in.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#5Michael Paesold
mpaesold@gmx.at
In reply to: Simon Riggs (#4)
Re: [HACKERS] Re: pgsql: Extract catalog info for error reporting before an error actually

Simon Riggs wrote:

On Thu, 2007-10-25 at 13:41 -0300, Alvaro Herrera wrote:

...

FWIW I disagree with cancelling just any autovac work automatically; in
my patch I'm only cancelling if it's analyze, on the grounds that if
you have really bad luck you can potentially lose a lot of work that
vacuum did. We can relax this restriction when we have cancellable
vacuum.

That was requested by others, not myself, but I did agree with the
conclusions. The other bad luck might be that you don't complete some
critical piece of work in the available time window because an automated
job kicked in.

Yeah, I thought we had agreed that we must cancel all auto
vacuum/analyzes, on the ground that foreground operations are usually
more important than maintenance tasks. Remember the complaint we already
had on hackers just after beta1: auto *vacuum* blocked a schema change,
and of course the user complained.

Best Regards
Michael Paesold

#6Alvaro Herrera
alvherre@commandprompt.com
In reply to: Michael Paesold (#5)
Re: [HACKERS] Re: pgsql: Extract catalog info for error reporting before an error actually

Michael Paesold wrote:

Simon Riggs wrote:

On Thu, 2007-10-25 at 13:41 -0300, Alvaro Herrera wrote:

...

FWIW I disagree with cancelling just any autovac work automatically; in
my patch I'm only cancelling if it's analyze, on the grounds that if
you have really bad luck you can potentially lose a lot of work that
vacuum did. We can relax this restriction when we have cancellable
vacuum.

That was requested by others, not myself, but I did agree with the
conclusions. The other bad luck might be that you don't complete some
critical piece of work in the available time window because an automated
job kicked in.

Yeah, I thought we had agreed that we must cancel all auto vacuum/analyzes,
on the ground that foreground operations are usually more important than
maintenance tasks.

What this means is that autovacuum will be starved a lot of the time,
and in the end you will only vacuum the tables when you run out of time
for Xid wraparound.

Remember the complaint we already had on hackers just after beta1:
auto *vacuum* blocked a schema change, and of course the user
complained.

Actually I can't remember it, but I think we should decree that this is
a known shortcoming; that we will fix it when we have cancellable
vacuum; and that the user is free to cancel the vacuuming on his own if
he so decides.

--
Alvaro Herrera http://www.flickr.com/photos/alvherre/
"The ability to monopolize a planet is insignificant
next to the power of the source"

#7Michael Paesold
mpaesold@gmx.at
In reply to: Alvaro Herrera (#6)
Re: [HACKERS] Re: pgsql: Extract catalog info for error reporting before an error actually

Alvaro Herrera wrote:

Michael Paesold wrote:

Simon Riggs wrote:

On Thu, 2007-10-25 at 13:41 -0300, Alvaro Herrera wrote:

...

FWIW I disagree with cancelling just any autovac work automatically; in
my patch I'm only cancelling if it's analyze, on the grounds that if
you have really bad luck you can potentially lose a lot of work that
vacuum did. We can relax this restriction when we have cancellable
vacuum.

That was requested by others, not myself, but I did agree with the
conclusions. The other bad luck might be that you don't complete some
critical piece of work in the available time window because an automated
job kicked in.

Yeah, I thought we had agreed that we must cancel all auto vacuum/analyzes,
on the ground that foreground operations are usually more important than
maintenance tasks.

What this means is that autovacuum will be starved a lot of the time,
and in the end you will only vacuum the tables when you run out of time
for Xid wraparound.

Well, only if you do a lot of schema changes. In the previous
discussion, Simon and me agreed that schema changes should not happen on
a regular basis on production systems.

Shouldn't we rather support the regular usage pattern instead of the
uncommon one? Users doing a lot of schema changes are the ones who
should have to work around issues, not those using a DBMS sanely. No?

Best Regards
Michael Paesold

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: Re: [COMMITTERS] pgsql: Extract catalog info for error reporting before an error actually

Alvaro Herrera <alvherre@commandprompt.com> writes:

Michael Paesold wrote:

Yeah, I thought we had agreed that we must cancel all auto vacuum/analyzes,
on the ground that foreground operations are usually more important than
maintenance tasks.

What this means is that autovacuum will be starved a lot of the time,

Not really, because DDL changes aren't *that* common (at least not for
non-temp tables). I think the consensus was that we should cancel
autovac in these cases unless it is an anti-wraparound vacuum. Isn't
that why you were putting in the flag to show it is for wraparound?

regards, tom lane

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paesold (#7)
Re: [HACKERS] Re: pgsql: Extract catalog info for error reporting before an error actually

Michael Paesold wrote:

In the previous discussion, Simon and me agreed that schema changes
should not happen on a regular basis on production systems.

Shouldn't we rather support the regular usage pattern instead of the
uncommon one? Users doing a lot of schema changes are the ones who
should have to work around issues, not those using a DBMS sanely. No?

Unfortunately, doing lots of schema changes is a very common phenomenon.
It makes me uncomfortable too, but saying that those who do it have to
work around issues isn't acceptable IMNSHO - it's far too widely done.

cheers

andrew

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#9)
Re: Re: [COMMITTERS] pgsql: Extract catalog info for error reporting before an error actually

Andrew Dunstan <andrew@dunslane.net> writes:

Michael Paesold wrote:

Shouldn't we rather support the regular usage pattern instead of the
uncommon one? Users doing a lot of schema changes are the ones who
should have to work around issues, not those using a DBMS sanely. No?

Unfortunately, doing lots of schema changes is a very common phenomenon.
It makes me uncomfortable too, but saying that those who do it have to
work around issues isn't acceptable IMNSHO - it's far too widely done.

Well, there's going to be pain *somewhere* here, and we already know
that users will find the current 8.3 behavior unacceptable. I'd rather
have autovacuum not make progress than have users turn it off because it
gets in their way too much. Which I think is exactly what will happen
if we ship it with the current behavior.

regards, tom lane

#11Simon Riggs
simon@2ndquadrant.com
In reply to: Andrew Dunstan (#9)
Re: [HACKERS] Re: pgsql: Extract catalog info for error reporting before an error actually

On Thu, 2007-10-25 at 13:51 -0400, Andrew Dunstan wrote:

Michael Paesold wrote:

In the previous discussion, Simon and me agreed that schema changes
should not happen on a regular basis on production systems.

Shouldn't we rather support the regular usage pattern instead of the
uncommon one? Users doing a lot of schema changes are the ones who
should have to work around issues, not those using a DBMS sanely. No?

Unfortunately, doing lots of schema changes is a very common phenomenon.
It makes me uncomfortable too, but saying that those who do it have to
work around issues isn't acceptable IMNSHO - it's far too widely done.

We didn't agree that DDL was uncommon, we agreed that running DDL was
more important than running an auto VACUUM. DDL runs very quickly,
unless blocked, though holds up everybody else. So you must run it at
pre-planned windows. VACUUMs can run at any time, so a autoVACUUM
shouldn't be allowed to prevent DDL from running. The queuing DDL makes
other requests queue behind it, even ones that would normally have been
able to execute at same time as the VACUUM.

Anyway, we covered all this before. I started off saying we shouldn't do
this and Heikki and Michael came up with convincing arguments, for me,
so now I think we should allow autovacuums to be cancelled.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#12Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#10)
Re: Re: [COMMITTERS] pgsql: Extract catalog info for error reporting before an error actually

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Michael Paesold wrote:

Shouldn't we rather support the regular usage pattern instead of the
uncommon one? Users doing a lot of schema changes are the ones who
should have to work around issues, not those using a DBMS sanely. No?

Unfortunately, doing lots of schema changes is a very common phenomenon.
It makes me uncomfortable too, but saying that those who do it have to
work around issues isn't acceptable IMNSHO - it's far too widely done.

Well, there's going to be pain *somewhere* here, and we already know
that users will find the current 8.3 behavior unacceptable. I'd rather
have autovacuum not make progress than have users turn it off because it
gets in their way too much. Which I think is exactly what will happen
if we ship it with the current behavior.

Agreed. If auto-vacuum is blocking activity, it isn't very 'auto' to
me. If vacuum is blocking something, by definition it is a bad time for
it to be vacuuming and it should abort and try again later.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#13Andrew Dunstan
andrew@dunslane.net
In reply to: Simon Riggs (#11)
Re: [HACKERS] Re: pgsql: Extract catalog info for error reporting before an error actually

Simon Riggs wrote:

On Thu, 2007-10-25 at 13:51 -0400, Andrew Dunstan wrote:

Michael Paesold wrote:

In the previous discussion, Simon and me agreed that schema changes
should not happen on a regular basis on production systems.

Shouldn't we rather support the regular usage pattern instead of the
uncommon one? Users doing a lot of schema changes are the ones who
should have to work around issues, not those using a DBMS sanely. No?

Unfortunately, doing lots of schema changes is a very common phenomenon.
It makes me uncomfortable too, but saying that those who do it have to
work around issues isn't acceptable IMNSHO - it's far too widely done.

We didn't agree that DDL was uncommon, we agreed that running DDL was
more important than running an auto VACUUM. DDL runs very quickly,
unless blocked, though holds up everybody else. So you must run it at
pre-planned windows. VACUUMs can run at any time, so a autoVACUUM
shouldn't be allowed to prevent DDL from running. The queuing DDL makes
other requests queue behind it, even ones that would normally have been
able to execute at same time as the VACUUM.

Anyway, we covered all this before. I started off saying we shouldn't do
this and Heikki and Michael came up with convincing arguments, for me,
so now I think we should allow autovacuums to be cancelled.

Perhaps I misunderstood, or have been mistunderstood :-) - I am actually
agreeing that autovac should not block DDL.

cheers

andrew

#14Alvaro Herrera
alvherre@commandprompt.com
In reply to: Simon Riggs (#2)
1 attachment(s)
Autovacuum cancellation

Simon Riggs wrote:

Just noticed you've made these changes. I was working on them, but
hadn't fully tested the patch because of all the different touch points.
Sorry for the delay.

Would you like me to refresh my earlier patch against the newly
committed state (or did you commit that aspect already as well?).

Patch attached, please comment. It only avoids cancelling when the
vacuum is for wraparound.

What we're missing here is doc updates (mainly to lmgr/README, I think)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachments:

autovac-cancel.patchtext/x-diff; charset=us-asciiDownload
Index: src/backend/storage/lmgr/deadlock.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/storage/lmgr/deadlock.c,v
retrieving revision 1.48
diff -c -p -r1.48 deadlock.c
*** src/backend/storage/lmgr/deadlock.c	19 Jun 2007 20:13:21 -0000	1.48
--- src/backend/storage/lmgr/deadlock.c	25 Oct 2007 20:06:36 -0000
*************** static int	maxPossibleConstraints;
*** 109,114 ****
--- 109,117 ----
  static DEADLOCK_INFO *deadlockDetails;
  static int	nDeadlockDetails;
  
+ /* PGPROC pointer of any blocking av worker found */
+ static PGPROC *blocking_autovacuum_proc = NULL; 
+ 
  
  /*
   * InitDeadLockChecking -- initialize deadlock checker during backend startup
*************** DeadLockCheck(PGPROC *proc)
*** 206,211 ****
--- 209,217 ----
  	nPossibleConstraints = 0;
  	nWaitOrders = 0;
  
+ 	/* Initialize to not blocked by an autovacuum worker */
+ 	blocking_autovacuum_proc = NULL;
+ 
  	/* Search for deadlocks and possible fixes */
  	if (DeadLockCheckRecurse(proc))
  	{
*************** DeadLockCheck(PGPROC *proc)
*** 255,265 ****
--- 261,289 ----
  	/* Return code tells caller if we had to escape a deadlock or not */
  	if (nWaitOrders > 0)
  		return DS_SOFT_DEADLOCK;
+ 	else if (blocking_autovacuum_proc != NULL)
+ 		return DS_BLOCKED_BY_AUTOVACUUM;
  	else
  		return DS_NO_DEADLOCK;
  }
  
  /*
+  * Return the PGPROC of the autovacuum that's blocking a process.
+  *
+  * We reset the saved pointer as soon as we pass it back.
+  */
+ PGPROC *
+ GetBlockingAutoVacuumPgproc(void)
+ {
+ 	PGPROC	*ptr;
+ 
+ 	ptr = blocking_autovacuum_proc;
+ 	blocking_autovacuum_proc = NULL;
+ 
+ 	return ptr;
+ }
+ 
+ /*
   * DeadLockCheckRecurse -- recursively search for valid orderings
   *
   * curConstraints[] holds the current set of constraints being considered
*************** FindLockCycleRecurse(PGPROC *checkProc,
*** 497,502 ****
--- 521,534 ----
  				if ((proclock->holdMask & LOCKBIT_ON(lm)) &&
  					(conflictMask & LOCKBIT_ON(lm)))
  				{
+ 					/*
+ 					 * Look for a blocking autovacuum. There will only ever
+ 					 * be one, since the autovacuum workers are careful
+ 					 * not to operate concurrently on the same table. 
+ 					 */
+ 					if (proc->vacuumFlags & PROC_IS_AUTOVACUUM)
+ 						blocking_autovacuum_proc = proc;
+ 
  					/* This proc hard-blocks checkProc */
  					if (FindLockCycleRecurse(proc, depth + 1,
  											 softEdges, nSoftEdges))
Index: src/backend/storage/lmgr/proc.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/storage/lmgr/proc.c,v
retrieving revision 1.195
diff -c -p -r1.195 proc.c
*** src/backend/storage/lmgr/proc.c	24 Oct 2007 20:55:36 -0000	1.195
--- src/backend/storage/lmgr/proc.c	25 Oct 2007 20:03:58 -0000
*************** ProcSleep(LOCALLOCK *locallock, LockMeth
*** 734,739 ****
--- 734,740 ----
  	PROC_QUEUE *waitQueue = &(lock->waitProcs);
  	LOCKMASK	myHeldLocks = MyProc->heldLocks;
  	bool		early_deadlock = false;
+ 	bool 		allow_autovacuum_cancel = true;
  	int			myWaitStatus;
  	PGPROC	   *proc;
  	int			i;
*************** ProcSleep(LOCALLOCK *locallock, LockMeth
*** 894,899 ****
--- 895,949 ----
  		myWaitStatus = MyProc->waitStatus;
  
  		/*
+ 		 * If we are not deadlocked, but are waiting on an autovacuum-induced
+ 		 * task, send a signal to interrupt it.  
+ 		 */
+ 		if (deadlock_state == DS_BLOCKED_BY_AUTOVACUUM && allow_autovacuum_cancel)
+ 		{
+ 			PGPROC	*autovac = GetBlockingAutoVacuumPgproc();
+ 
+ 			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ 
+ 			/*
+ 			 * only do it if the worker is not working to protect against Xid
+ 			 * wraparound 
+ 			 */
+ 			if ((autovac != NULL) &&
+ 				!(autovac->vacuumFlags & PROC_VACUUM_FOR_WRAPAROUND))
+ 			{
+ 				int		pid = autovac->pid;
+ 
+ 				elog(DEBUG2, "sending cancel to blocking autovacuum pid = %d",
+ 					 pid);
+ 
+ 				/* don't hold the lock across the kill() syscall */
+ 				LWLockRelease(ProcArrayLock);
+ 
+ 				/*
+ 				 * Send the autovacuum worker Back to Old Kent Road
+ 				 *
+ 				 * If we have setsid(), signal the backend's whole process group 
+ 				 */
+ #ifdef HAVE_SETSID
+ 				if (kill(-pid, SIGINT))
+ #else
+ 					if (kill(pid, SIGINT))
+ #endif
+ 					{
+ 						/* Just a warning to allow multiple callers */
+ 						ereport(WARNING,
+ 								(errmsg("could not send signal to process %d: %m",
+ 										pid)));
+ 					}
+ 			}
+ 			else
+ 				LWLockRelease(ProcArrayLock);
+ 
+ 			/* prevent signal from being resent more than once */
+ 			allow_autovacuum_cancel = false;
+ 		}
+ 
+ 		/*
  		 * If awoken after the deadlock check interrupt has run, and
  		 * log_lock_waits is on, then report about the wait.
  		 */
*************** CheckDeadLock(void)
*** 1189,1201 ****
  		 * RemoveFromWaitQueue took care of waking up any such processes.
  		 */
  	}
! 	else if (log_lock_waits)
  	{
  		/*
  		 * Unlock my semaphore so that the interrupted ProcSleep() call can
  		 * print the log message (we daren't do it here because we are inside
  		 * a signal handler).  It will then sleep again until someone
  		 * releases the lock.
  		 */
  		PGSemaphoreUnlock(&MyProc->sem);
  	}
--- 1239,1254 ----
  		 * RemoveFromWaitQueue took care of waking up any such processes.
  		 */
  	}
! 	else if (log_lock_waits || deadlock_state == DS_BLOCKED_BY_AUTOVACUUM)
  	{
  		/*
  		 * Unlock my semaphore so that the interrupted ProcSleep() call can
  		 * print the log message (we daren't do it here because we are inside
  		 * a signal handler).  It will then sleep again until someone
  		 * releases the lock.
+ 		 *
+ 		 * If blocked by autovacuum, this wakeup will enable ProcSleep to send
+ 		 * the cancelling signal to the autovacuum worker.
  		 */
  		PGSemaphoreUnlock(&MyProc->sem);
  	}
Index: src/include/storage/lock.h
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/include/storage/lock.h,v
retrieving revision 1.107
diff -c -p -r1.107 lock.h
*** src/include/storage/lock.h	5 Sep 2007 18:10:48 -0000	1.107
--- src/include/storage/lock.h	25 Oct 2007 17:11:02 -0000
*************** typedef enum
*** 442,448 ****
  	DS_NOT_YET_CHECKED,			/* no deadlock check has run yet */
  	DS_NO_DEADLOCK,				/* no deadlock detected */
  	DS_SOFT_DEADLOCK,			/* deadlock avoided by queue rearrangement */
! 	DS_HARD_DEADLOCK			/* deadlock, no way out but ERROR */
  } DeadLockState;
  
  
--- 442,449 ----
  	DS_NOT_YET_CHECKED,			/* no deadlock check has run yet */
  	DS_NO_DEADLOCK,				/* no deadlock detected */
  	DS_SOFT_DEADLOCK,			/* deadlock avoided by queue rearrangement */
! 	DS_HARD_DEADLOCK,			/* deadlock, no way out but ERROR */
! 	DS_BLOCKED_BY_AUTOVACUUM	/* queue blocked by autovacuum worker */
  } DeadLockState;
  
  
*************** extern void lock_twophase_postabort(Tran
*** 495,500 ****
--- 496,502 ----
  						void *recdata, uint32 len);
  
  extern DeadLockState DeadLockCheck(PGPROC *proc);
+ extern PGPROC *GetBlockingAutoVacuumPgproc(void);
  extern void DeadLockReport(void);
  extern void RememberSimpleDeadLock(PGPROC *proc1,
  					   LOCKMODE lockmode,
#15Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#10)
Re: Re: [COMMITTERS] pgsql: Extract catalog info for error reporting before an error actually

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Michael Paesold wrote:

Shouldn't we rather support the regular usage pattern instead of the
uncommon one? Users doing a lot of schema changes are the ones who
should have to work around issues, not those using a DBMS sanely. No?

Unfortunately, doing lots of schema changes is a very common phenomenon.
It makes me uncomfortable too, but saying that those who do it have to
work around issues isn't acceptable IMNSHO - it's far too widely done.

Well, there's going to be pain *somewhere* here, and we already know
that users will find the current 8.3 behavior unacceptable. I'd rather
have autovacuum not make progress than have users turn it off because it
gets in their way too much. Which I think is exactly what will happen
if we ship it with the current behavior.

exactly - 8.3 will be the first release with autovac enabled by default
(and concurrent autovacuuming) and we really need to make sure that
people wont get surprised by it in the way I (and others) did when
playing with Beta1.
So my vote would be on cancelling always except in the case of a
wraparound vacuum.

Stefan

#16David Fetter
david@fetter.org
In reply to: Andrew Dunstan (#13)
Re: [HACKERS] Re: pgsql: Extract catalog info for error reporting before an error actually

On Thu, Oct 25, 2007 at 03:54:28PM -0400, Andrew Dunstan wrote:

Simon Riggs wrote:

On Thu, 2007-10-25 at 13:51 -0400, Andrew Dunstan wrote:

Michael Paesold wrote:

In the previous discussion, Simon and me agreed that schema
changes should not happen on a regular basis on production
systems.

Shouldn't we rather support the regular usage pattern instead of
the uncommon one? Users doing a lot of schema changes are the
ones who should have to work around issues, not those using a
DBMS sanely. No?

Unfortunately, doing lots of schema changes is a very common
phenomenon. It makes me uncomfortable too, but saying that those
who do it have to work around issues isn't acceptable IMNSHO -
it's far too widely done.

We didn't agree that DDL was uncommon, we agreed that running DDL
was more important than running an auto VACUUM. DDL runs very
quickly, unless blocked, though holds up everybody else. So you
must run it at pre-planned windows. VACUUMs can run at any time, so
a autoVACUUM shouldn't be allowed to prevent DDL from running. The
queuing DDL makes other requests queue behind it, even ones that
would normally have been able to execute at same time as the
VACUUM.

Anyway, we covered all this before. I started off saying we
shouldn't do this and Heikki and Michael came up with convincing
arguments, for me, so now I think we should allow autovacuums to be
cancelled.

Perhaps I misunderstood, or have been mistunderstood :-) - I am
actually agreeing that autovac should not block DDL.

+1 here for having autovacuum not block DDL :)

Cheers,
David (for what it's worth)
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#14)
Re: Autovacuum cancellation

Alvaro Herrera <alvherre@commandprompt.com> writes:

Patch attached, please comment. It only avoids cancelling when the
vacuum is for wraparound.

I'm not entirely convinced that there can be only one autovac proc in
the portion of the waits-for graph explored by DeadlockCheck. If there
is more than one, we'll cancel a random one of them, which seems OK ---
but the comment added to FindLockCycleRecurse is bogus.

I thought about suggesting that we test PROC_VACUUM_FOR_WRAPAROUND
before setting blocking_autovacuum_proc at all, but I guess the reason
you don't do that is you don't want to take ProcArrayLock there (and we
decided it was unsafe to check the bit without the lock). So the other
thing that comment block needs is a note that it seems OK to check
PROC_IS_AUTOVACUUM without the lock, because it never changes for an
existing PGPROC, but not PROC_VACUUM_FOR_WRAPAROUND.

Otherwise it looks OK --- a bit ugly but I don't have a better idea.

There's some things still to be desired here: if an autovac process is
involved in a hard deadlock, the patch doesn't favor zapping it over
anybody else, nor consider cancelling the autovac as an alternative to
rearranging queues for a soft deadlock. But dealing with that will open
cans of worms that I don't think we want to open for 8.3.

regards, tom lane

#18Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#17)
Re: Autovacuum cancellation

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

There's some things still to be desired here: if an autovac process is
involved in a hard deadlock, the patch doesn't favor zapping it over
anybody else, nor consider cancelling the autovac as an alternative to
rearranging queues for a soft deadlock. But dealing with that will open
cans of worms that I don't think we want to open for 8.3.

Can autovacuum actually get into a hard deadlock? Does it take more than one
lock that can block others at the same time?

I think there's a window where the process waiting directly on autovacuum
could have already fired its deadlock check before it was waiting directly on
autovacuum. But the only way I can see it happening is if another process is
cancelled before its deadlock check fires and the signals are processed out of
order. I'm not sure that's a case we really need to worry about.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#18)
Re: Autovacuum cancellation

Gregory Stark <stark@enterprisedb.com> writes:

Can autovacuum actually get into a hard deadlock?

It can certainly be part of a deadlock loop, though the practical cases
might be few. It will be holding more than one lock, eg a lock on its
target table and various transient locks on system catalogs, and other
processes taking or trying for exclusive locks on those things could
create issues. I think it's OK to leave the issue go for now, until
we see if anyone hits it in practice --- I just wanted to mention that
we might need to consider the problem in future.

I think there's a window where the process waiting directly on
autovacuum could have already fired its deadlock check before it was
waiting directly on autovacuum.

I think you don't understand what that code is doing. If there's an
autovac anywhere in the dependency graph, it'll find it.

regards, tom lane

#20Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#19)
Re: Autovacuum cancellation

Tom Lane wrote:

Gregory Stark <stark@enterprisedb.com> writes:

I think there's a window where the process waiting directly on
autovacuum could have already fired its deadlock check before it was
waiting directly on autovacuum.

I think you don't understand what that code is doing. If there's an
autovac anywhere in the dependency graph, it'll find it.

Thanks for making that explicit, I was trying to build a test case where
it would lock :-)

(If we had concurrent psql we could even have it in the regression tests
...)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#21Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Alvaro Herrera (#14)
Re: Autovacuum cancellation

Alvaro Herrera wrote:

/*
* Look for a blocking autovacuum. There will only ever
* be one, since the autovacuum workers are careful
* not to operate concurrently on the same table.
*/

I think that's a bit unaccurate. You could have multiple autovacuum
workers operating on different tables participating in a deadlock. The
reason that can't happen is that autovacuum never holds a lock while
waiting for another.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#22Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#19)
Re: Autovacuum cancellation

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

I think there's a window where the process waiting directly on
autovacuum could have already fired its deadlock check before it was
waiting directly on autovacuum.

I think you don't understand what that code is doing. If there's an
autovac anywhere in the dependency graph, it'll find it.

That'll teach me to try to read code from a patch directly without trying to
apply it or at least read the original source next to it. I thought I had seen
this code recently enough to apply the patch from memory -- clearly not.

I assume the right thing happens if multiple deadlock check signals fire for
the same autovacuum?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#21)
Re: Autovacuum cancellation

Heikki Linnakangas <heikki@enterprisedb.com> writes:

Alvaro Herrera wrote:

/*
* Look for a blocking autovacuum. There will only ever
* be one, since the autovacuum workers are careful
* not to operate concurrently on the same table.
*/

I think that's a bit unaccurate. You could have multiple autovacuum
workers operating on different tables participating in a deadlock. The
reason that can't happen is that autovacuum never holds a lock while
waiting for another.

And that's not true either. It may only want low-grade locks (eg
AccessShareLock on a system catalog) but deadlock is nonetheless
entirely possible in principle.

regards, tom lane

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#22)
Re: Autovacuum cancellation

Gregory Stark <stark@enterprisedb.com> writes:

I assume the right thing happens if multiple deadlock check signals fire for
the same autovacuum?

Multiple signals shouldn't be a problem, but late-arriving ones could be.
It might be worth having autovac explicitly clear QueryCancelPending
after it's finished a table, so that a SIGINT sent because of activity
on one table couldn't force cancellation of vacuum on the next one.

regards, tom lane

#25Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#24)
Re: Autovacuum cancellation

Tom Lane wrote:

Gregory Stark <stark@enterprisedb.com> writes:

I assume the right thing happens if multiple deadlock check signals fire for
the same autovacuum?

Multiple signals shouldn't be a problem, but late-arriving ones could be.
It might be worth having autovac explicitly clear QueryCancelPending
after it's finished a table, so that a SIGINT sent because of activity
on one table couldn't force cancellation of vacuum on the next one.

Ok, committed; I snuck that in as well, but I'm not sure how to test
that it works.

I adjusted the comments -- I think they're more correct now. I also
added a puny paragraph to lmgr/README.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#26Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#17)
Re: Autovacuum cancellation

On Thu, 2007-10-25 at 17:35 -0400, Tom Lane wrote:

There's some things still to be desired here: if an autovac process is
involved in a hard deadlock, the patch doesn't favor zapping it over
anybody else, nor consider cancelling the autovac as an alternative to
rearranging queues for a soft deadlock. But dealing with that will
open cans of worms that I don't think we want to open for 8.3.

I did look at doing that but decided it would not be appropriate to do
that in all cases. i.e. there are hard deadlock cases where the autovac
can be the head of the lock queue and yet a deadlock still exists
between two other processes. The deadlock detector doesn't get called
twice for the same deadlock, so it wasn't possible to speculatively do
that and then re-catch it second time around.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#27Simon Riggs
simon@2ndquadrant.com
In reply to: Heikki Linnakangas (#21)
Re: Autovacuum cancellation

On Fri, 2007-10-26 at 10:32 +0100, Heikki Linnakangas wrote:

Alvaro Herrera wrote:

/*
* Look for a blocking autovacuum. There will only ever
* be one, since the autovacuum workers are careful
* not to operate concurrently on the same table.
*/

I think that's a bit unaccurate. You could have multiple autovacuum
workers operating on different tables participating in a deadlock. The
reason that can't happen is that autovacuum never holds a lock while
waiting for another.

I wrote that code comment; as you say it is true only when there are at
least 4 processes in the lock graph where 2+ normal backends are
deadlocking and there are 2+ autovacuums holding existing locks. The
comment should have said "If blocking is caused by an autovacuum process
then ... (there will)".

The blocking_autovacuum_proc doesn't react unless there are no hard
deadlocks, so the code works.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#28Simon Riggs
simon@2ndquadrant.com
In reply to: Alvaro Herrera (#25)
Re: Autovacuum cancellation

On Fri, 2007-10-26 at 17:50 -0300, Alvaro Herrera wrote:

Ok, committed; I snuck that in as well, but I'm not sure how to test
that it works.

I've had time to review that now. I didn't reply to your original post
because you'd taken my name off the copy list for some reason and I've
been too busy to read non-addressed mail.

The committed patch is pretty much the same as my original AFAICS.

I'm sure you didn't mean to forget that, but can you please acknowledge
my contribution in CVS? Thanks.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#29Simon Riggs
simon@2ndquadrant.com
In reply to: Simon Riggs (#27)
Re: Autovacuum cancellation

On Sat, 2007-10-27 at 23:22 +0100, Simon Riggs wrote:

On Fri, 2007-10-26 at 10:32 +0100, Heikki Linnakangas wrote:

Alvaro Herrera wrote:

/*
* Look for a blocking autovacuum. There will only ever
* be one, since the autovacuum workers are careful
* not to operate concurrently on the same table.
*/

I think that's a bit unaccurate. You could have multiple autovacuum
workers operating on different tables participating in a deadlock. The
reason that can't happen is that autovacuum never holds a lock while
waiting for another.

I wrote that code comment; as you say it is true only when there are at
least 4 processes in the lock graph where 2+ normal backends are
deadlocking and there are 2+ autovacuums holding existing locks. The
comment should have said "If blocking is caused by an autovacuum process
then ... (there will)".

Sorry...this should read "as you say it is **not** true".

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com