7.4RC2 PANIC: insufficient room in FSM

Started by Arthur Ward in November 2003 · 4 messages · bugs
#1 Arthur Ward
award@dominionsciences.com

I was a bit stunned last night when I found this in the server logs for a
7.4RC2 installation:

Nov 24 20:37:18 x pg_autovacuum: [2003-11-24 08:37:18 PM] Performing:
VACUUM ANALYZE "clients"."x"
Nov 24 20:37:19 x postgres: [13904] PANIC: insufficient room in FSM
Nov 24 20:37:19 x postgres: STATEMENT: VACUUM ANALYZE "clients"."x"

Following this is of course the fallout of backends shutting down and PG
recycling itself with no other problems. Did I miss something along the
way about the FSM needing to be sufficiently large to hold all free pages
no matter what?

I plan to bump up the FSM size anyhow (perhaps tonight I can get some FSM
stats from manually vacuuming), but my gosh, that's some bad behavior for
a presumably minor situation. IMO, that's a significant bug.
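
(For reference, the FSM of this era is sized by the max_fsm_relations and
max_fsm_pages settings, and the documentation of the time prices a page
slot at roughly six bytes of shared memory, so bumping it is cheap. A
back-of-the-envelope sketch of that cost, using illustrative defaults
rather than this server's actual configuration:

#include <stdio.h>

int
main(void)
{
	/* Illustrative 7.4-era defaults; not taken from this report. */
	int		max_fsm_relations = 1000;
	int		max_fsm_pages = 20000;

	/*
	 * Roughly six bytes of shared memory per page slot, per the docs
	 * of that era; treat this as an approximation, not a spec.
	 */
	long	fsm_bytes = (long) max_fsm_pages * 6;

	printf("%d relations, %d page slots -> ~%ld kB of shared memory\n",
		   max_fsm_relations, max_fsm_pages, fsm_bytes / 1024);
	return 0;
}

The point of the sketch: an undersized FSM should merely leave some free
space untracked, which is what makes a PANIC here so surprising.)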

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Arthur Ward (#1)
Re: 7.4RC2 PANIC: insufficient room in FSM

"Arthur Ward" <award@dominionsciences.com> writes:

> I was a bit stunned last night when I found this in the server logs for a
> 7.4RC2 installation:
>
> Nov 24 20:37:18 x pg_autovacuum: [2003-11-24 08:37:18 PM] Performing:
> VACUUM ANALYZE "clients"."x"
> Nov 24 20:37:19 x postgres: [13904] PANIC: insufficient room in FSM

We have seen reports of similar things in situations where the real
problem was that the lock table had gotten too big --- is it possible
that you had something going on in parallel that would have acquired
lots of locks? If so, raising max_locks_per_transaction should avoid
the problem.

I'll look at whether we couldn't downgrade the failure to something
less than a PANIC, too ...

regards, tom lane
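
(A note on the arithmetic behind that suggestion: the shared lock table of
this vintage is documented to hold about max_locks_per_transaction *
max_connections lock slots, pooled across all sessions, so a single job
touching very many tables can exhaust it even though each setting looks
individually sane. A minimal sketch of the sizing, with illustrative
values rather than anything taken from this report:

#include <stdio.h>

int
main(void)
{
	int		max_locks_per_transaction = 64;	/* the longstanding default */
	int		max_connections = 100;			/* illustrative */

	/*
	 * The limit is a shared pool, not a per-backend cap: one backend may
	 * hold far more than max_locks_per_transaction locks if others hold
	 * fewer, which is why the failure shows up only under bursty load.
	 */
	printf("approximate lock table capacity: %d objects\n",
		   max_locks_per_transaction * max_connections);
	return 0;
})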

#3 Arthur Ward
award@dominionsciences.com
In reply to: Tom Lane (#2)
Re: 7.4RC2 PANIC: insufficient room in FSM

"Arthur Ward" <award@dominionsciences.com> writes:

I was a bit stunned last night when I found this in the server logs for
a
7.4RC2 installation:

Nov 24 20:37:18 x pg_autovacuum: [2003-11-24 08:37:18 PM] Performing:
VACUUM ANALYZE "clients"."x"
Nov 24 20:37:19 x postgres: [13904] PANIC: insufficient room in FSM

We have seen reports of similar things in situations where the real
problem was that the lock table had gotten too big --- is it possible
that you had something going on in parallel that would have acquired
lots of locks? If so, raising max_locks_per_transaction should avoid
the problem.

I've combed through the system logs, our data-acquisition daemon's log,
and the web logs, and nothing indicates any more activity than is normal
for a workday. The database routinely takes auto-vacuum hits during the
day, when there is a little more large-transaction activity, with no
problems. In the wee hours of the morning, I have a process doing bulk
loads that locks about a dozen tables explicitly to avoid unnecessary
rollbacks, but that was at least four hours in the future (or had finished
19-ish hours in the past). That load also runs without issue. So, no, I
can't say that anything out of the ordinary was happening to cause the
panic.

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Arthur Ward (#1)
Re: 7.4RC2 PANIC: insufficient room in FSM

"Arthur Ward" <award@dominionsciences.com> writes:

> [ 7.4RC2 produced this: ]
> Nov 24 20:37:19 x postgres: [13904] PANIC: insufficient room in FSM

After further study I've concluded that this means the fix I put in
place here:

2003-10-29 12:36 tgl

* src/backend/storage/freespace/freespace.c: compact_fsm_storage()
does need to handle the case where a relation's FSM data has to be
both moved down and compressed. Per report from Dror Matalon.

was incomplete, and that in fact there is no can't-happen case for this
routine. I've applied the attached patch for 7.4.1.

regards, tom lane

*** src/backend/storage/freespace/freespace.c.orig	Wed Oct 29 12:36:57 2003
--- src/backend/storage/freespace/freespace.c	Wed Nov 26 13:43:16 2003
***************
*** 1394,1399 ****
--- 1394,1400 ----
  compact_fsm_storage(void)
  {
  	int			nextChunkIndex = 0;
+ 	bool		did_push = false;
  	FSMRelation *fsmrel;

  	for (fsmrel = FreeSpaceMap->firstRel;
***************
*** 1419,1434 ****
  			newAllocPages = newAlloc * INDEXCHUNKPAGES;
  		else
  			newAllocPages = newAlloc * CHUNKPAGES;
- 		newChunkIndex = nextChunkIndex;
- 		nextChunkIndex += newAlloc;

  		/*
  		 * Determine current size, current and new locations
  		 */
  		curChunks = fsm_current_chunks(fsmrel);
  		oldChunkIndex = fsmrel->firstChunk;
- 		newLocation = FreeSpaceMap->arena + newChunkIndex * CHUNKBYTES;
  		oldLocation = FreeSpaceMap->arena + oldChunkIndex * CHUNKBYTES;

  		/*
  		 * It's possible that we have to move data down, not up, if the
--- 1420,1434 ----
  			newAllocPages = newAlloc * INDEXCHUNKPAGES;
  		else
  			newAllocPages = newAlloc * CHUNKPAGES;

  		/*
  		 * Determine current size, current and new locations
  		 */
  		curChunks = fsm_current_chunks(fsmrel);
  		oldChunkIndex = fsmrel->firstChunk;
  		oldLocation = FreeSpaceMap->arena + oldChunkIndex * CHUNKBYTES;
+ 		newChunkIndex = nextChunkIndex;
+ 		newLocation = FreeSpaceMap->arena + newChunkIndex * CHUNKBYTES;

  		/*
  		 * It's possible that we have to move data down, not up, if the
***************
*** 1440,1449 ****
  		 * more than once, so pack everything against the end of the arena
  		 * if so.
  		 *
! 		 * In corner cases where roundoff has affected our allocation, it's
! 		 * possible that we have to move down and compress our data too.
! 		 * Since this case is extremely infrequent, we do not try to be smart
! 		 * about it --- we just drop pages from the end of the rel's data.
  		 */
  		if (newChunkIndex > oldChunkIndex)
  		{
--- 1440,1455 ----
  		 * more than once, so pack everything against the end of the arena
  		 * if so.
  		 *
! 		 * In corner cases where we are on the short end of a roundoff choice
! 		 * that we were formerly on the long end of, it's possible that we
! 		 * have to move down and compress our data too.  In fact, even after
! 		 * pushing down the following rels, there might not be as much space
! 		 * as we computed for this rel above --- that would imply that some
! 		 * following rel(s) are also on the losing end of roundoff choices.
! 		 * We could handle this fairly by doing the per-rel compactions
! 		 * out-of-order, but that seems like way too much complexity to deal
! 		 * with a very infrequent corner case.  Instead, we simply drop pages
! 		 * from the end of the current rel's data until it fits.
  		 */
  		if (newChunkIndex > oldChunkIndex)
  		{
***************
*** 1455,1475 ****
  				fsmrel->storedPages = newAllocPages;
  				curChunks = fsm_current_chunks(fsmrel);
  			}
  			if (fsmrel->nextPhysical != NULL)
  				limitChunkIndex = fsmrel->nextPhysical->firstChunk;
  			else
  				limitChunkIndex = FreeSpaceMap->totalChunks;
  			if (newChunkIndex + curChunks > limitChunkIndex)
  			{
! 				/* need to push down additional rels */
! 				push_fsm_rels_after(fsmrel);
! 				/* recheck for safety */
  				if (fsmrel->nextPhysical != NULL)
  					limitChunkIndex = fsmrel->nextPhysical->firstChunk;
  				else
  					limitChunkIndex = FreeSpaceMap->totalChunks;
  				if (newChunkIndex + curChunks > limitChunkIndex)
! 					elog(PANIC, "insufficient room in FSM");
  			}
  			memmove(newLocation, oldLocation, curChunks * CHUNKBYTES);
  		}
--- 1461,1504 ----
  				fsmrel->storedPages = newAllocPages;
  				curChunks = fsm_current_chunks(fsmrel);
  			}
+ 			/* is there enough space? */
  			if (fsmrel->nextPhysical != NULL)
  				limitChunkIndex = fsmrel->nextPhysical->firstChunk;
  			else
  				limitChunkIndex = FreeSpaceMap->totalChunks;
  			if (newChunkIndex + curChunks > limitChunkIndex)
  			{
! 				/* not enough space, push down following rels */
! 				if (!did_push)
! 				{
! 					push_fsm_rels_after(fsmrel);
! 					did_push = true;
! 				}
! 				/* now is there enough space? */
  				if (fsmrel->nextPhysical != NULL)
  					limitChunkIndex = fsmrel->nextPhysical->firstChunk;
  				else
  					limitChunkIndex = FreeSpaceMap->totalChunks;
  				if (newChunkIndex + curChunks > limitChunkIndex)
! 				{
! 					/* uh-oh, forcibly cut the allocation to fit */
! 					newAlloc = limitChunkIndex - newChunkIndex;
! 					/*
! 					 * If newAlloc < 0 at this point, we are moving the rel's
! 					 * firstChunk into territory currently assigned to a later
! 					 * rel.  This is okay so long as we do not copy any data.
! 					 * The rels will be back in nondecreasing firstChunk order
! 					 * at completion of the compaction pass.
! 					 */
! 					if (newAlloc < 0)
! 						newAlloc = 0;
! 					if (fsmrel->isIndex)
! 						newAllocPages = newAlloc * INDEXCHUNKPAGES;
! 					else
! 						newAllocPages = newAlloc * CHUNKPAGES;
! 					fsmrel->storedPages = newAllocPages;
! 					curChunks = fsm_current_chunks(fsmrel);
! 				}
  			}
  			memmove(newLocation, oldLocation, curChunks * CHUNKBYTES);
  		}
***************
*** 1504,1509 ****
--- 1533,1539 ----
  			memmove(newLocation, oldLocation, curChunks * CHUNKBYTES);
  		}
  		fsmrel->firstChunk = newChunkIndex;
+ 		nextChunkIndex += newAlloc;
  	}
  	Assert(nextChunkIndex <= FreeSpaceMap->totalChunks);
  	FreeSpaceMap->usedChunks = nextChunkIndex;
***************
*** 1544,1551 ****
  		oldChunkIndex = fsmrel->firstChunk;
  		if (newChunkIndex < oldChunkIndex)
  		{
! 			/* trouble... */
! 			elog(PANIC, "insufficient room in FSM");
  		}
  		else if (newChunkIndex > oldChunkIndex)
  		{
--- 1574,1581 ----
  		oldChunkIndex = fsmrel->firstChunk;
  		if (newChunkIndex < oldChunkIndex)
  		{
! 			/* we're pushing down, how can it move up? */
! 			elog(PANIC, "inconsistent entry sizes in FSM");
  		}
  		else if (newChunkIndex > oldChunkIndex)
  		{
***************
*** 1758,1771 ****
  {
  	int			chunkCount;

  	/* Convert page count to chunk count */
  	if (fsmrel->isIndex)
  		chunkCount = (fsmrel->storedPages - 1) / INDEXCHUNKPAGES + 1;
  	else
  		chunkCount = (fsmrel->storedPages - 1) / CHUNKPAGES + 1;
- 	/* Make sure storedPages==0 produces right answer */
- 	if (chunkCount < 0)
- 		chunkCount = 0;
  	return chunkCount;
  }

--- 1788,1801 ----
  {
  	int			chunkCount;
+ 	/* Make sure storedPages==0 produces right answer */
+ 	if (fsmrel->storedPages <= 0)
+ 		return 0;
  	/* Convert page count to chunk count */
  	if (fsmrel->isIndex)
  		chunkCount = (fsmrel->storedPages - 1) / INDEXCHUNKPAGES + 1;
  	else
  		chunkCount = (fsmrel->storedPages - 1) / CHUNKPAGES + 1;
  	return chunkCount;
  }
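
The final hunk is easier to appreciate with concrete numbers. The old
guard tested chunkCount < 0, but C integer division truncates toward zero
on the platforms of interest, so with storedPages == 0 the expression
(0 - 1) / CHUNKPAGES + 1 evaluates to 1: the guard never fires, and an
empty relation is credited with a chunk it does not occupy --- exactly the
sort of accounting slip that can leave compact_fsm_storage() short of
room. A standalone demonstration (the CHUNKPAGES value here is an
arbitrary stand-in, chosen only to make the arithmetic visible):

#include <stdio.h>

#define CHUNKPAGES 16	/* arbitrary stand-in; any positive value shows the effect */

/* Pre-patch logic: the "< 0" guard is dead code. */
static int
old_chunks(int storedPages)
{
	int		chunkCount = (storedPages - 1) / CHUNKPAGES + 1;

	if (chunkCount < 0)		/* never true: (-1) / 16 is 0, so chunkCount is 1 */
		chunkCount = 0;
	return chunkCount;
}

/* Patched logic: test storedPages before dividing. */
static int
new_chunks(int storedPages)
{
	if (storedPages <= 0)
		return 0;
	return (storedPages - 1) / CHUNKPAGES + 1;
}

int
main(void)
{
	/* Prints "storedPages=0: old=1 new=0". */
	printf("storedPages=0: old=%d new=%d\n", old_chunks(0), new_chunks(0));
	return 0;
}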