9.3: more problems with "Could not open file "pg_multixact/members/xxxx"

Started by Jeff Janes · over 11 years ago · 5 messages
#1 Jeff Janes
jeff.janes@gmail.com
5 attachment(s)

On Fri, Jun 27, 2014 at 11:51 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Jeff Janes wrote:

This problem was initially fairly easy to reproduce, but since I
started adding instrumentation specifically to catch it, it has become
devilishly hard to reproduce.

I think my next step will be to also log each of the values which goes
into the complex if (...) expression that decides on the deletion.

Could you please try to reproduce it after updating to latest? I pushed
fixes that should close these issues. Maybe you want to remove the
instrumentation you added, to make failures more likely.

There are still some problems in 9.4, but I haven't been able to diagnose
them and wanted to do more research on it. The announcement of upcoming
back-branches for 9.3 spurred me to try it there, and I have problems with
9.3 (12c5bbdcbaa292b2a4b09d298786) as well. The move of truncation to the
checkpoint seems to have made the problem easier to reproduce. On an 8
core machine, this test fell over after about 20 minutes, which is much
faster than it usually reproduces.

This is the error I get:

2084 UPDATE 2014-07-15 15:26:20.608 PDT:ERROR: could not access status of
transaction 85837221
2084 UPDATE 2014-07-15 15:26:20.608 PDT:DETAIL: Could not open file
"pg_multixact/members/14031": No such file or directory.
2084 UPDATE 2014-07-15 15:26:20.608 PDT:CONTEXT: SQL statement "SELECT 1
FROM ONLY "public"."foo_parent" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR
KEY SHARE OF x"

The testing harness is attached as 3 patches that must be applied to the test
server, and 2 scripts. The script do.sh sets up the database (using fixed
paths, so be careful) and then invokes count.pl in a loop to do the actual
work.

Cheers,

Jeff

Attachments:

0002-pg_burn_multixact-utility.patch
From b264bfe61b315a5f65d72b9550592cc9f73bf0a8 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 31 Dec 2013 00:42:11 -0300
Subject: [PATCH 1/3] pg_burn_multixact utility

Andres Freund, minor tweaks by me

(And then updated by Jeff to apply to pageinspect 1.2)

diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
new file mode 100644
index dedc8fe..7903b35
*** a/contrib/pageinspect/heapfuncs.c
--- b/contrib/pageinspect/heapfuncs.c
***************
*** 29,34 ****
--- 29,36 ----
  #include "funcapi.h"
  #include "utils/builtins.h"
  #include "miscadmin.h"
+ #include "access/multixact.h"
+ #include "access/transam.h"
  
  
  /*
*************** heap_page_items(PG_FUNCTION_ARGS)
*** 222,224 ****
--- 224,266 ----
  	else
  		SRF_RETURN_DONE(fctx);
  }
+ 
+ extern Datum
+ pg_burn_multixact(PG_FUNCTION_ARGS);
+ PG_FUNCTION_INFO_V1(pg_burn_multixact);
+ 
+ Datum
+ pg_burn_multixact(PG_FUNCTION_ARGS)
+ {
+ 	int		rep = PG_GETARG_INT32(0);
+ 	int		size = PG_GETARG_INT32(1);
+ 	MultiXactMember *members;
+ 	MultiXactId ret;
+ 	TransactionId id = ReadNewTransactionId() - size;
+ 	int		i;
+ 
+ 	if (rep < 1)
+ 		elog(ERROR, "need to burn, burn, burn");
+ 
+ 	members = palloc(size * sizeof(MultiXactMember));
+ 	for (i = 0; i < size; i++)
+ 	{
+ 		members[i].xid = id++;
+ 		members[i].status = MultiXactStatusForShare;
+ 
+ 		if (!TransactionIdIsNormal(members[i].xid))
+ 		{
+ 			id = FirstNormalTransactionId;
+ 			members[i].xid = id++;
+ 		}
+ 	}
+ 
+ 	MultiXactIdSetOldestMember();
+ 
+ 	for (i = 0; i < rep; i++)
+ 	{
+ 		ret = MultiXactIdCreateFromMembers(size, members, true);
+ 	}
+ 
+ 	PG_RETURN_INT64((int64) ret);
+ }
diff --git a/contrib/pageinspect/pageinspect--1.2.sql b/contrib/pageinspect/pageinspect--1.2.sql
new file mode 100644
index 15e8e1e..f0aeecf
*** a/contrib/pageinspect/pageinspect--1.2.sql
--- b/contrib/pageinspect/pageinspect--1.2.sql
*************** CREATE FUNCTION fsm_page_contents(IN pag
*** 105,107 ****
--- 105,112 ----
  RETURNS text
  AS 'MODULE_PATHNAME', 'fsm_page_contents'
  LANGUAGE C STRICT;
+ 
+ CREATE FUNCTION pg_burn_multixact(num int4, size int4)
+ RETURNS int8
+ AS 'MODULE_PATHNAME', 'pg_burn_multixact'
+ LANGUAGE C STRICT;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
new file mode 100644
index b77c32c..cdd59fa
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*************** FreezeMultiXactId(MultiXactId multi, uin
*** 5714,5720 ****
  		 * Create a new multixact with the surviving members of the previous
  		 * one, to set as new Xmax in the tuple.
  		 */
! 		xid = MultiXactIdCreateFromMembers(nnewmembers, newmembers);
  		*flags |= FRM_RETURN_IS_MULTI;
  	}
  
--- 5714,5720 ----
  		 * Create a new multixact with the surviving members of the previous
  		 * one, to set as new Xmax in the tuple.
  		 */
! 		xid = MultiXactIdCreateFromMembers(nnewmembers, newmembers, false);
  		*flags |= FRM_RETURN_IS_MULTI;
  	}
  
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
new file mode 100644
index 9da22c8..86ed386
*** a/src/backend/access/transam/multixact.c
--- b/src/backend/access/transam/multixact.c
*************** MultiXactIdCreate(TransactionId xid1, Mu
*** 349,355 ****
  	members[1].xid = xid2;
  	members[1].status = status2;
  
! 	newMulti = MultiXactIdCreateFromMembers(2, members);
  
  	debug_elog3(DEBUG2, "Create: %s",
  				mxid_to_string(newMulti, 2, members));
--- 349,355 ----
  	members[1].xid = xid2;
  	members[1].status = status2;
  
! 	newMulti = MultiXactIdCreateFromMembers(2, members, false);
  
  	debug_elog3(DEBUG2, "Create: %s",
  				mxid_to_string(newMulti, 2, members));
*************** MultiXactIdExpand(MultiXactId multi, Tra
*** 415,421 ****
  		 */
  		member.xid = xid;
  		member.status = status;
! 		newMulti = MultiXactIdCreateFromMembers(1, &member);
  
  		debug_elog4(DEBUG2, "Expand: %u has no members, create singleton %u",
  					multi, newMulti);
--- 415,421 ----
  		 */
  		member.xid = xid;
  		member.status = status;
! 		newMulti = MultiXactIdCreateFromMembers(1, &member, false);
  
  		debug_elog4(DEBUG2, "Expand: %u has no members, create singleton %u",
  					multi, newMulti);
*************** MultiXactIdExpand(MultiXactId multi, Tra
*** 467,473 ****
  
  	newMembers[j].xid = xid;
  	newMembers[j++].status = status;
! 	newMulti = MultiXactIdCreateFromMembers(j, newMembers);
  
  	pfree(members);
  	pfree(newMembers);
--- 467,473 ----
  
  	newMembers[j].xid = xid;
  	newMembers[j++].status = status;
! 	newMulti = MultiXactIdCreateFromMembers(j, newMembers, false);
  
  	pfree(members);
  	pfree(newMembers);
*************** ReadNextMultiXactId(void)
*** 686,692 ****
   * NB: the passed members[] array will be sorted in-place.
   */
  MultiXactId
! MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members)
  {
  	MultiXactId multi;
  	MultiXactOffset offset;
--- 686,692 ----
   * NB: the passed members[] array will be sorted in-place.
   */
  MultiXactId
! MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members, bool nocache)
  {
  	MultiXactId multi;
  	MultiXactOffset offset;
*************** MultiXactIdCreateFromMembers(int nmember
*** 706,712 ****
  	 * corner cases where someone else added us to a MultiXact without our
  	 * knowledge, but it's not worth checking for.)
  	 */
! 	multi = mXactCacheGetBySet(nmembers, members);
  	if (MultiXactIdIsValid(multi))
  	{
  		debug_elog2(DEBUG2, "Create: in cache!");
--- 706,714 ----
  	 * corner cases where someone else added us to a MultiXact without our
  	 * knowledge, but it's not worth checking for.)
  	 */
! 	multi = nocache ? InvalidMultiXactId :
! 		mXactCacheGetBySet(nmembers, members);
! 
  	if (MultiXactIdIsValid(multi))
  	{
  		debug_elog2(DEBUG2, "Create: in cache!");
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
new file mode 100644
index 1f048e8..0fe5390
*** a/src/include/access/multixact.h
--- b/src/include/access/multixact.h
*************** typedef struct xl_multixact_create
*** 83,92 ****
  extern MultiXactId MultiXactIdCreate(TransactionId xid1,
  				  MultiXactStatus status1, TransactionId xid2,
  				  MultiXactStatus status2);
  extern MultiXactId MultiXactIdExpand(MultiXactId multi, TransactionId xid,
  				  MultiXactStatus status);
  extern MultiXactId MultiXactIdCreateFromMembers(int nmembers,
! 							 MultiXactMember *members);
  
  extern MultiXactId ReadNextMultiXactId(void);
  extern bool MultiXactIdIsRunning(MultiXactId multi);
--- 83,93 ----
  extern MultiXactId MultiXactIdCreate(TransactionId xid1,
  				  MultiXactStatus status1, TransactionId xid2,
  				  MultiXactStatus status2);
+ extern MultiXactId CreateMultiXactId(int nmembers, MultiXactMember *members, bool nocache);
  extern MultiXactId MultiXactIdExpand(MultiXactId multi, TransactionId xid,
  				  MultiXactStatus status);
  extern MultiXactId MultiXactIdCreateFromMembers(int nmembers,
! 							 MultiXactMember *members, bool nocache);
  
  extern MultiXactId ReadNextMultiXactId(void);
  extern bool MultiXactIdIsRunning(MultiXactId multi);
count.pl
crash_REL9_4_BETA1.patch
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
new file mode 100644
index 9da22c8..434ea9b
*** a/src/backend/access/transam/multixact.c
--- b/src/backend/access/transam/multixact.c
*************** SetMultiXactIdLimit(MultiXactId oldest_d
*** 2133,2140 ****
  	LWLockRelease(MultiXactGenLock);
  
  	/* Log the info */
! 	ereport(DEBUG1,
! 	 (errmsg("MultiXactId wrap limit is %u, limited by database with OID %u",
  			 multiWrapLimit, oldest_datoid)));
  
  	/*
--- 2133,2140 ----
  	LWLockRelease(MultiXactGenLock);
  
  	/* Log the info */
! 	ereport(LOG,
! 	 (errmsg("JJ MultiXactId wrap limit is %u, limited by database with OID %u",
  			 multiWrapLimit, oldest_datoid)));
  
  	/*
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
new file mode 100644
index 7013fb8..87811f2
*** a/src/backend/access/transam/varsup.c
--- b/src/backend/access/transam/varsup.c
***************
*** 31,36 ****
--- 31,38 ----
  /* pointer to "variable cache" in shared memory (set up by shmem.c) */
  VariableCache ShmemVariableCache = NULL;
  
+ int JJ_xid=0;
+ 
  
  /*
   * Allocate the next XID for a new transaction or subtransaction.
*************** GetNewTransactionId(bool isSubXact)
*** 159,174 ****
  	 *
  	 * Extend pg_subtrans too.
  	 */
! 	ExtendCLOG(xid);
! 	ExtendSUBTRANS(xid);
  
! 	/*
! 	 * Now advance the nextXid counter.  This must not happen until after we
! 	 * have successfully completed ExtendCLOG() --- if that routine fails, we
! 	 * want the next incoming transaction to try it again.  We cannot assign
! 	 * more XIDs until there is CLOG space for them.
! 	 */
! 	TransactionIdAdvance(ShmemVariableCache->nextXid);
  
  	/*
  	 * We must store the new XID into the shared ProcArray before releasing
--- 161,185 ----
  	 *
  	 * Extend pg_subtrans too.
  	 */
! 	{
! 		int		incr;
  
! 		for (incr=0; incr <=JJ_xid; incr++)
! 		{
! 			xid = ShmemVariableCache->nextXid;
! 
! 			ExtendCLOG(xid);
! 			ExtendSUBTRANS(xid);
! 
! 			/*
! 			 * Now advance the nextXid counter.  This must not happen until after we
! 			 * have successfully completed ExtendCLOG() --- if that routine fails, we
! 			 * want the next incoming transaction to try it again.  We cannot assign
! 			 * more XIDs until there is CLOG space for them.
! 			 */
! 			TransactionIdAdvance(ShmemVariableCache->nextXid);
! 		}
! 	}
  
  	/*
  	 * We must store the new XID into the shared ProcArray before releasing
*************** SetTransactionIdLimit(TransactionId olde
*** 332,339 ****
  	LWLockRelease(XidGenLock);
  
  	/* Log the info */
! 	ereport(DEBUG1,
! 			(errmsg("transaction ID wrap limit is %u, limited by database with OID %u",
  					xidWrapLimit, oldest_datoid)));
  
  	/*
--- 343,350 ----
  	LWLockRelease(XidGenLock);
  
  	/* Log the info */
! 	ereport(LOG,
! 			(errmsg("JJ transaction ID wrap limit is %u, limited by database with OID %u",
  					xidWrapLimit, oldest_datoid)));
  
  	/*
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
new file mode 100644
index d675560..abca72c
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** BootStrapXLOG(void)
*** 4943,4948 ****
--- 4943,4949 ----
  	ShmemVariableCache->nextOid = checkPoint.nextOid;
  	ShmemVariableCache->oidCount = 0;
  	MultiXactSetNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset);
+ 	//elog(LOG,"JJ SetTransactionIDLimit %d", checkPoint.oldestXid);
  	SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
  	SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
  
*************** StartupXLOG(void)
*** 6252,6257 ****
--- 6253,6259 ----
  	ShmemVariableCache->nextOid = checkPoint.nextOid;
  	ShmemVariableCache->oidCount = 0;
  	MultiXactSetNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset);
+ 	//elog(LOG,"JJ SetTransactionIDLimit %d", checkPoint.oldestXid);
  	SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
  	SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
  	XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
*************** xlog_redo(XLogRecPtr lsn, XLogRecord *re
*** 9107,9112 ****
--- 9109,9115 ----
  		LWLockRelease(OidGenLock);
  		MultiXactSetNextMXact(checkPoint.nextMulti,
  							  checkPoint.nextMultiOffset);
+ 		//elog(LOG,"JJ SetTransactionIDLimit %d", checkPoint.oldestXid);
  		SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
  		SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
  
*************** xlog_redo(XLogRecPtr lsn, XLogRecord *re
*** 9203,9208 ****
--- 9206,9212 ----
  		LWLockRelease(OidGenLock);
  		MultiXactAdvanceNextMXact(checkPoint.nextMulti,
  								  checkPoint.nextMultiOffset);
+ 		//elog(LOG,"JJ maybe SetTransactionIDLimit %d", checkPoint.oldestXid);
  		if (TransactionIdPrecedes(ShmemVariableCache->oldestXid,
  								  checkPoint.oldestXid))
  			SetTransactionIdLimit(checkPoint.oldestXid,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
new file mode 100644
index 3d2c739..2ed8ddd
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*************** int			vacuum_freeze_min_age;
*** 57,62 ****
--- 57,64 ----
  int			vacuum_freeze_table_age;
  int			vacuum_multixact_freeze_min_age;
  int			vacuum_multixact_freeze_table_age;
+ int			JJ_vac=0;
+ 
  
  
  /* A few variables that don't seem worth passing around as parameters */
*************** vacuum_set_xid_limits(Relation rel,
*** 466,471 ****
--- 468,474 ----
  	}
  
  	*freezeLimit = limit;
+ 	if (JJ_vac) elog(LOG,"JJ freezeLimit %d", *freezeLimit);
  
  	/*
  	 * Determine the minimum multixact freeze age to use: as specified by
*************** vacuum_set_xid_limits(Relation rel,
*** 513,518 ****
--- 516,523 ----
  		 * VACUUM schedule, the nightly VACUUM gets a chance to freeze tuples
  		 * before anti-wraparound autovacuum is launched.
  		 */
+ 		if (JJ_vac) elog(LOG,"JJ freeze_min_age %d vacuum_freeze_table_age %d freeze_table_age %d ReadNew %d", freeze_min_age, 
+                            vacuum_freeze_table_age, freeze_table_age,ReadNewTransactionId());
  		freezetable = freeze_table_age;
  		if (freezetable < 0)
  			freezetable = vacuum_freeze_table_age;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
new file mode 100644
index b4abeed..7d8080b
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***************
*** 62,67 ****
--- 62,68 ----
  #include "utils/tqual.h"
  
  
+ extern int JJ_vac;
  /*
   * Space/time tradeoff parameters: do these need to be user-tunable?
   *
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 223,228 ****
--- 224,231 ----
  	scan_all |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
  											mxactFullScanLimit);
  
+ 	if (JJ_vac) elog(LOG,"JJ scan_all %d, relfrozenid %d", scan_all, onerel->rd_rel->relfrozenxid);
+ 
  	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
  
  	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
new file mode 100644
index b53cfdb..cc31792
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
*************** int			autovacuum_vac_cost_delay;
*** 123,128 ****
--- 123,129 ----
  int			autovacuum_vac_cost_limit;
  
  int			Log_autovacuum_min_duration = -1;
+ extern int  JJ_vac;
  
  /* how long to keep pgstat data in the launcher, in milliseconds */
  #define STATS_READ_DELAY 1000
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1666,1673 ****
  		InitPostgres(NULL, dbid, NULL, dbname);
  		SetProcessingMode(NormalProcessing);
  		set_ps_display(dbname, false);
- 		ereport(DEBUG1,
- 				(errmsg("autovacuum: processing database \"%s\"", dbname)));
  
  		if (PostAuthDelay)
  			pg_usleep(PostAuthDelay * 1000000L);
--- 1667,1672 ----
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1675,1681 ****
--- 1674,1684 ----
  		/* And do an appropriate amount of work */
  		recentXid = ReadNewTransactionId();
  		recentMulti = ReadNextMultiXactId();
+ 		if (JJ_vac) ereport(LOG,
+ 				(errmsg("autovacuum: processing database \"%s\" at recent Xid of %u recent mxid of %u", dbname,recentXid,recentMulti)));
  		do_autovacuum();
+ 		if (JJ_vac) ereport(LOG,
+ 				(errmsg("autovacuum: done processing database \"%s\" at recent Xid of %u recent mxid of %u", dbname,ReadNewTransactionId(),ReadNextMultiXactId())));
  	}
  
  	/*
*************** relation_needs_vacanalyze(Oid relid,
*** 2732,2744 ****
  		 * reset, because if that happens, the last vacuum and analyze counts
  		 * will be reset too.
  		 */
- 		elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), anl: %.0f (threshold %.0f)",
- 			 NameStr(classForm->relname),
- 			 vactuples, vacthresh, anltuples, anlthresh);
  
  		/* Determine if this table needs vacuum or analyze. */
  		*dovacuum = force_vacuum || (vactuples > vacthresh);
  		*doanalyze = (anltuples > anlthresh);
  	}
  	else
  	{
--- 2735,2748 ----
  		 * reset, because if that happens, the last vacuum and analyze counts
  		 * will be reset too.
  		 */
  
  		/* Determine if this table needs vacuum or analyze. */
  		*dovacuum = force_vacuum || (vactuples > vacthresh);
  		*doanalyze = (anltuples > anlthresh);
+ 
+ 		if (JJ_vac) elog(LOG, "%s: vac: %.0f (threshold %.0f), anl: %.0f (threshold %.0f) wraparound %d dovaccum %d doanalyze %d",
+ 			 NameStr(classForm->relname),
+ 			 vactuples, vacthresh, anltuples, anlthresh, *wraparound, *dovacuum, *doanalyze);
  	}
  	else
  	{
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
new file mode 100644
index 3c1c81a..06b558e
*** a/src/backend/storage/smgr/md.c
--- b/src/backend/storage/smgr/md.c
***************
*** 67,72 ****
--- 67,74 ----
  #define FILE_POSSIBLY_DELETED(err)	((err) == ENOENT || (err) == EACCES)
  #endif
  
+ int JJ_torn_page=0;
+ 
  /*
   *	The magnetic disk storage manager keeps track of open file
   *	descriptors in its own descriptor pool.  This is done to make it
*************** mdwrite(SMgrRelation reln, ForkNumber fo
*** 727,732 ****
--- 729,735 ----
  	off_t		seekpos;
  	int			nbytes;
  	MdfdVec    *v;
+         static int counter=0;
  
  	/* This assert is too expensive to have on normally ... */
  #ifdef CHECK_WRITE_VS_EXTEND
*************** mdwrite(SMgrRelation reln, ForkNumber fo
*** 751,757 ****
  				 errmsg("could not seek to block %u in file \"%s\": %m",
  						blocknum, FilePathName(v->mdfd_vfd))));
  
! 	nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ);
  
  	TRACE_POSTGRESQL_SMGR_MD_WRITE_DONE(forknum, blocknum,
  										reln->smgr_rnode.node.spcNode,
--- 754,771 ----
  				 errmsg("could not seek to block %u in file \"%s\": %m",
  						blocknum, FilePathName(v->mdfd_vfd))));
  
!         if (JJ_torn_page > 0 && counter++ > JJ_torn_page && !RecoveryInProgress()) {
! 	  nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ/3);
! 		ereport(FATAL,
! 				(errcode(ERRCODE_DISK_FULL),
! 				 errmsg("could not write block %u of relation %s: wrote only %d of %d bytes",
! 						blocknum,
! 						relpath(reln->smgr_rnode, forknum),
! 						nbytes, BLCKSZ),
! 				 errhint("JJ is screwing with the database.")));
!         } else {
! 	  nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ);
! 	}
  
  	TRACE_POSTGRESQL_SMGR_MD_WRITE_DONE(forknum, blocknum,
  										reln->smgr_rnode.node.spcNode,
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index 1d094f0..a5f47d1
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 120,125 ****
--- 120,128 ----
  /* XXX these should appear in other modules' header files */
  extern bool Log_disconnections;
  extern int	CommitDelay;
+ int	JJ_torn_page;
+ extern int	JJ_xid;
+ extern int	JJ_vac;
  extern int	CommitSiblings;
  extern char *default_tablespace;
  extern char *temp_tablespaces;
*************** static struct config_int ConfigureNamesI
*** 2176,2181 ****
--- 2179,2211 ----
  	},
  
  	{
+ 		{"JJ_torn_page", PGC_USERSET, WAL_SETTINGS,
+ 			gettext_noop("Simulate a torn-page crash after this number of page writes (0 to turn off)"),
+ 			NULL
+ 		},
+ 		&JJ_torn_page,
+ 		0, 0, 100000, NULL, NULL
+ 	},
+ 
+ 	{
+ 		{"JJ_xid", PGC_USERSET, WAL_SETTINGS,
+ 			gettext_noop("Skip this many xid every time we acquire one"),
+ 			NULL
+ 		},
+ 		&JJ_xid,
+ 		0, 0, 1000000, NULL, NULL
+ 	},
+ 
+ 	{
+ 		{"JJ_vac", PGC_USERSET, WAL_SETTINGS,
+ 			gettext_noop("turn on verbose logging"),
+ 			NULL
+ 		},
+ 		&JJ_vac,
+ 		0, 0, 1000000, NULL, NULL
+ 	},
+ 
+ 	{
  		{"commit_siblings", PGC_USERSET, WAL_SETTINGS,
  			gettext_noop("Sets the minimum concurrent open transactions before performing "
  						 "commit_delay."),
do.sh
member_delete_log.patch
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
new file mode 100644
index 9da22c8..12cc03b
*** a/src/backend/access/transam/multixact.c
--- b/src/backend/access/transam/multixact.c
*************** SlruScanDirCbRemoveMembers(SlruCtl ctl, 
*** 2426,2433 ****
  	if ((range->rangeStart > range->rangeEnd &&
  		 segpage > range->rangeEnd && segpage < range->rangeStart) ||
  		(range->rangeStart < range->rangeEnd &&
! 		 (segpage < range->rangeStart || segpage > range->rangeEnd)))
  		SlruDeleteSegment(ctl, filename);
  
  	return false;				/* keep going */
  }
--- 2426,2435 ----
  	if ((range->rangeStart > range->rangeEnd &&
  		 segpage > range->rangeEnd && segpage < range->rangeStart) ||
  		(range->rangeStart < range->rangeEnd &&
! 		 (segpage < range->rangeStart || segpage > range->rangeEnd))) {
!         elog(LOG,"JJ deleting %p %s", ctl, filename);
  		SlruDeleteSegment(ctl, filename);
+     };
  
  	return false;				/* keep going */
  }
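The deletion predicate this hunk instruments is the crux of the report: a members segment survives only while its page number falls inside the still-needed window, and that window may wrap around. A standalone sketch of the test (with illustrative names, not the actual PostgreSQL declarations) behaves like this:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
	int		rangeStart;		/* first members page still needed */
	int		rangeEnd;		/* last members page still needed */
} MembersRange;

/*
 * Mirror of the patched condition: a segment page is deletable when it
 * falls outside the still-needed window.  When the window has wrapped
 * (rangeStart > rangeEnd), the needed pages are the two arms
 * [rangeStart..max] and [0..rangeEnd], so only the gap between them
 * may be deleted.
 */
static bool
segment_is_deletable(const MembersRange *range, int segpage)
{
	if (range->rangeStart > range->rangeEnd)
		return segpage > range->rangeEnd && segpage < range->rangeStart;
	if (range->rangeStart < range->rangeEnd)
		return segpage < range->rangeStart || segpage > range->rangeEnd;
	return false;				/* rangeStart == rangeEnd: keep everything */
}
```

If the range endpoints themselves are computed from a lapped offset counter, this predicate faithfully deletes segments that are in fact still live, which is consistent with the failures logged above.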
#2 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeff Janes (#1)
Re: 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"

I'm not saying there is no multixact bug here, but I wonder if this part
of your crasher patch might be the cause:

--- 754,771 ----
  				 errmsg("could not seek to block %u in file \"%s\": %m",
  						blocknum, FilePathName(v->mdfd_vfd))));
  
!         if (JJ_torn_page > 0 && counter++ > JJ_torn_page && !RecoveryInProgress()) {
! 	  nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ/3);
! 		ereport(FATAL,
! 				(errcode(ERRCODE_DISK_FULL),
! 				 errmsg("could not write block %u of relation %s: wrote only %d of %d bytes",
! 						blocknum,
! 						relpath(reln->smgr_rnode, forknum),
! 						nbytes, BLCKSZ),
! 				 errhint("JJ is screwing with the database.")));
!         } else {
! 	  nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ);
! 	}

Wouldn't this BLCKSZ/3 business update the page's LSN but not the full
contents, meaning that the block wouldn't be rewritten the next time the
xlog is replayed? That could leave the upper two-thirds of the block
containing multixacts in xmax that had been removed by a vacuuming round
prior to the crash.

(Maybe I'm just too tired and I'm failing to fully understand the torn
page protection. I thought I understood how it worked, but now I'm not
sure -- I mean I don't see how it can possibly have any value at all.
Surely if the disk writes the first 512-byte sector of the page and then
forgets the updates to the next 15 sectors, the page will appear as not
needing the full page image to be restored ...)

Is the page containing the borked multixact value the one that was
half-written by this code?

Is the problem reproducible if you cause this path to ereport(FATAL)
without writing 1/3rd of the page?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#2)
Re: Re: 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"

On Wed, Jul 16, 2014 at 12:46 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

(Maybe I'm just too tired and I'm failing to fully understand the torn
page protection. I thought I understood how it worked, but now I'm not
sure -- I mean I don't see how it can possibly have any value at all.
Surely if the disk writes the first 512-byte sector of the page and then
forgets the updates to the next 15 sectors, the page will appear as not
needing the full page image to be restored ...)

We always restore full page images, regardless of the page LSN.
Otherwise, we'd have exactly the problem you describe.
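Robert's point can be reduced to a tiny decision function: the full-page-image path deliberately ignores the page LSN, while ordinary records use the LSN to skip changes that are already on the page. This is a sketch of the idea only, with made-up names rather than the real xlog redo interfaces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-in for the real WAL pointer type */

/*
 * A full-page image is restored unconditionally; an ordinary record is
 * applied only if the page LSN shows the change is not there yet.  If
 * the FPI path consulted the LSN, a torn write that persisted only the
 * first sector (the one carrying the new LSN) would make a stale page
 * look up to date, which is exactly the hazard Alvaro describes.
 */
static bool
must_apply_record(bool record_is_fpi, XLogRecPtr page_lsn, XLogRecPtr record_lsn)
{
	if (record_is_fpi)
		return true;
	return page_lsn < record_lsn;
}
```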

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#4 Jeff Janes
jeff.janes@gmail.com
In reply to: Jeff Janes (#1)
Re: 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"

On Tue, Jul 15, 2014 at 3:58 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

[...]

The testing harness is attached as 3 patches that must be applied to the test
server, and 2 scripts. The script do.sh sets up the database (using fixed
paths, so be careful) and then invokes count.pl in a loop to do the actual
work.

Sorry, after a long stretch during which I couldn't do much testing on this,
I've now been able to get back to it.

It looks like what is happening is that checkPoint.nextMultiOffset wraps
around from 2^32 to 0, even if 0 is still being used. At that point it
starts deleting member files that are still needed.

Is there some interlock which is supposed to prevent
checkPoint.nextMultiOffset from lapping itself? I haven't been able to find
it. It seems like the interlock applies only to MultiXactId, not the Offset.
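The lapping hazard described here comes straight from 32-bit modular arithmetic: the member offset counter silently wraps at 2^32, and nothing compares the allocation point against the oldest offset still in use. A hedged sketch (illustrative names, not the actual multixact code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactOffset;

/*
 * Distance from the oldest still-needed member offset to the next free
 * one; unsigned subtraction handles a window that straddles the 2^32
 * wraparound point.
 */
static uint32_t
members_in_use(MultiXactOffset oldest, MultiXactOffset next)
{
	return next - oldest;
}

/*
 * True when handing out 'n' more member slots would lap the oldest
 * still-needed offset, i.e. the in-use count itself would overflow 2^32.
 */
static bool
would_lap(MultiXactOffset oldest, MultiXactOffset next, uint32_t n)
{
	uint32_t	in_use = members_in_use(oldest, next);

	return in_use + n < in_use;
}
```

Without a check along the lines of would_lap() somewhere ahead of member allocation, the counter simply passes the oldest offset and the truncation logic then concludes the old segments are no longer needed.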

Thanks,

Jeff

#5 Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#4)
Re: 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"

On August 19, 2014 10:24:20 PM CEST, Jeff Janes <jeff.janes@gmail.com> wrote:

[...]

It looks like what is happening is that checkPoint.nextMultiOffset wraps
around from 2^32 to 0, even if 0 is still being used. At that point it
starts deleting member files that are still needed.

Is there some interlock which is supposed to prevent
checkPoint.nextMultiOffset from lapping itself? I haven't been able to
find it. It seems like the interlock applies only to MultiXactId, not the
Offset.

There is none (and there never has been one either). I've complained about
it a couple of times, but nobody, me included, has had the time and energy
to fix it :(

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers