including backend ID in relpath of temp rels - updated patch

Started by Robert Haasover 15 years ago47 messages

robertmhaas@gmail.com

over 15 years ago

1 attachment(s)

For previous discussion of this topic, see:

http://archives.postgresql.org/pgsql-hackers/2010-04/msg01181.php
http://archives.postgresql.org/pgsql-hackers/2010-05/msg00352.php
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00302.php

As in the original version of the patch, I have not simply added
backend ID to RelFileNode, because there are too many places using
RelFileNode in contexts where the backend ID can be determined from
context, such as the shared and local buffer managers and the xlog
code. Instead, I have introduced BackendRelFileNode for contexts
where we need both the RelFileNode and the backend ID. The smgr layer
has to use BackendRelFileNode across the board, since it operates on
both permanent and temporary relation, including - potentially -
temporary relations of other backends. smgr invalidations must also
include the backend ID, as must communication between regular backends
and the bgwriter. The relcache now stores rd_backend instead of
rd_islocaltemp so that it remains straightforward to call smgropen()
based on a relcache entry. Some smgr functions no longer require an
isTemp argument, because they can infer the necessary information from
their BackendRelFileNode. smgrwrite() and smgrextend() now take a
skipFsync argument rather than an isTemp argument.

In this version of the patch, I've improved the temporary file cleanup
mechanism. In CVS HEAD, when a transaction commits or aborts, we
write an XLOG record with all relations that must be unlinked,
including temporary relations. With this patch, we no longer include
temporary relations in the XLOG records (which may be a tiny
performance benefit for some people); instead, on every startup of the
database cluster, we just nuke all temporary relation files (which is
now easy to do, since they are named differently than files for
non-temporary relations) at the same time that we nuke other temp
files. This produces slightly different behavior. In CVS HEAD,
temporary files get removed whenever an xlog redo happens, so either
at cluster start or after a backend crash, but only to the extent that
they appear in WAL. With this patch, we can be sure of removing ALL
stray files, which is maybe ever-so-slightly leaky in CVS HEAD. But
on the other hand, because it hooks into the existing temporary file
cleanup code, it only happens at cluster startup, NOT after a backend
crash. The existing coding leaves temporary sort files and similar
around after a backend crash for forensics purposes. That might or
might not be worth rethinking for non-debug builds, but I don't think
there's any very good reason to think that temporary relation files
need to be handled differently than temporary work files.

I believe that this patch will clear away one major obstacle to
implementing global temporary and unlogged tables: it enables us to be
sure of cleaning up properly after a crash without relying on catalog
entries or XLOG. Based on previous discussions, however, I believe
there is support for making this change independently of what becomes
of that project, just for the benefit of having a more robust cleanup
mechanism.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachments:

temprelnames-v2.patchapplication/octet-stream; name=temprelnames-v2.patchDownload

diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d1f7bcc..9645c95 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -373,8 +373,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	}
 
 	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks,
-				 rel->rd_istemp);
+	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
 
 	/*
 	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 89ed8a0..06e304e 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -295,9 +295,8 @@ _bt_blwritepage(BTWriteState *wstate, Page page, BlockNumber blkno)
 	}
 
 	/*
-	 * Now write the page.	We say isTemp = true even if it's not a temp
-	 * index, because there's no need for smgr to schedule an fsync for this
-	 * write; we'll do it ourselves before ending the build.
+	 * Now write the page.	There's no need for smgr to schedule an fsync for
+	 * this write; we'll do it ourselves before ending the build.
 	 */
 	if (blkno == wstate->btws_pages_written)
 	{
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d432c9d..c0114f9 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -865,8 +865,8 @@ StartPrepare(GlobalTransaction gxact)
 	hdr.prepared_at = gxact->prepared_at;
 	hdr.owner = gxact->owner;
 	hdr.nsubxacts = xactGetCommittedChildren(&children);
-	hdr.ncommitrels = smgrGetPendingDeletes(true, &commitrels, NULL);
-	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels, NULL);
+	hdr.ncommitrels = smgrGetPendingDeletes(true, &commitrels);
+	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels);
 	hdr.ninvalmsgs = xactGetCommittedInvalidationMessages(&invalmsgs,
 														  &hdr.initfileinval);
 	StrNCpy(hdr.gid, gxact->gid, GIDSIZE);
@@ -1320,13 +1320,13 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	}
 	for (i = 0; i < ndelrels; i++)
 	{
-		SMgrRelation srel = smgropen(delrels[i]);
+		SMgrRelation srel = smgropen(delrels[i], InvalidBackendId);
 		ForkNumber	fork;
 
 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
 		{
 			if (smgrexists(srel, fork))
-				smgrdounlink(srel, fork, false, false);
+				smgrdounlink(srel, fork, false);
 		}
 		smgrclose(srel);
 	}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index fa87d90..3e8eb40 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -890,7 +890,6 @@ RecordTransactionCommit(void)
 	TransactionId latestXid = InvalidTransactionId;
 	int			nrels;
 	RelFileNode *rels;
-	bool		haveNonTemp;
 	int			nchildren;
 	TransactionId *children;
 	int			nmsgs;
@@ -898,7 +897,7 @@ RecordTransactionCommit(void)
 	bool		RelcacheInitFileInval;
 
 	/* Get data needed for commit record */
-	nrels = smgrGetPendingDeletes(true, &rels, &haveNonTemp);
+	nrels = smgrGetPendingDeletes(true, &rels);
 	nchildren = xactGetCommittedChildren(&children);
 	nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 												 &RelcacheInitFileInval);
@@ -1025,7 +1024,7 @@ RecordTransactionCommit(void)
 	 * asynchronous commit if all to-be-deleted tables are temporary though,
 	 * since they are lost anyway if we crash.)
 	 */
-	if (XactSyncCommit || forceSyncCommit || haveNonTemp)
+	if (XactSyncCommit || forceSyncCommit || nrels > 0)
 	{
 		/*
 		 * Synchronous commit case.
@@ -1306,7 +1305,7 @@ RecordTransactionAbort(bool isSubXact)
 			 xid);
 
 	/* Fetch the data we need for the abort record */
-	nrels = smgrGetPendingDeletes(false, &rels, NULL);
+	nrels = smgrGetPendingDeletes(false, &rels);
 	nchildren = xactGetCommittedChildren(&children);
 
 	/* XXX do we really need a critical section here? */
@@ -4446,7 +4445,7 @@ xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid, XLogRecPtr lsn)
 	/* Make sure files supposed to be dropped are dropped */
 	for (i = 0; i < xlrec->nrels; i++)
 	{
-		SMgrRelation srel = smgropen(xlrec->xnodes[i]);
+		SMgrRelation srel = smgropen(xlrec->xnodes[i], InvalidBackendId);
 		ForkNumber	fork;
 
 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
@@ -4454,7 +4453,7 @@ xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid, XLogRecPtr lsn)
 			if (smgrexists(srel, fork))
 			{
 				XLogDropRelation(xlrec->xnodes[i], fork);
-				smgrdounlink(srel, fork, false, true);
+				smgrdounlink(srel, fork, true);
 			}
 		}
 		smgrclose(srel);
@@ -4551,7 +4550,7 @@ xact_redo_abort(xl_xact_abort *xlrec, TransactionId xid)
 	/* Make sure files supposed to be dropped are dropped */
 	for (i = 0; i < xlrec->nrels; i++)
 	{
-		SMgrRelation srel = smgropen(xlrec->xnodes[i]);
+		SMgrRelation srel = smgropen(xlrec->xnodes[i], InvalidBackendId);
 		ForkNumber	fork;
 
 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
@@ -4559,7 +4558,7 @@ xact_redo_abort(xl_xact_abort *xlrec, TransactionId xid)
 			if (smgrexists(srel, fork))
 			{
 				XLogDropRelation(xlrec->xnodes[i], fork);
-				smgrdounlink(srel, fork, false, true);
+				smgrdounlink(srel, fork, true);
 			}
 		}
 		smgrclose(srel);
@@ -4633,7 +4632,7 @@ xact_desc_commit(StringInfo buf, xl_xact_commit *xlrec)
 		appendStringInfo(buf, "; rels:");
 		for (i = 0; i < xlrec->nrels; i++)
 		{
-			char	   *path = relpath(xlrec->xnodes[i], MAIN_FORKNUM);
+			char	   *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);
 
 			appendStringInfo(buf, " %s", path);
 			pfree(path);
@@ -4688,7 +4687,7 @@ xact_desc_abort(StringInfo buf, xl_xact_abort *xlrec)
 		appendStringInfo(buf, "; rels:");
 		for (i = 0; i < xlrec->nrels; i++)
 		{
-			char	   *path = relpath(xlrec->xnodes[i], MAIN_FORKNUM);
+			char	   *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);
 
 			appendStringInfo(buf, " %s", path);
 			pfree(path);
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 9ee2036..3675fa8 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -68,7 +68,7 @@ log_invalid_page(RelFileNode node, ForkNumber forkno, BlockNumber blkno,
 	 */
 	if (log_min_messages <= DEBUG1 || client_min_messages <= DEBUG1)
 	{
-		char	   *path = relpath(node, forkno);
+		char	   *path = relpathperm(node, forkno);
 
 		if (present)
 			elog(DEBUG1, "page %u of relation %s is uninitialized",
@@ -133,7 +133,7 @@ forget_invalid_pages(RelFileNode node, ForkNumber forkno, BlockNumber minblkno)
 		{
 			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
 			{
-				char	   *path = relpath(hentry->key.node, forkno);
+				char	   *path = relpathperm(hentry->key.node, forkno);
 
 				elog(DEBUG2, "page %u of relation %s has been dropped",
 					 hentry->key.blkno, path);
@@ -166,7 +166,7 @@ forget_invalid_pages_db(Oid dbid)
 		{
 			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
 			{
-				char	   *path = relpath(hentry->key.node, hentry->key.forkno);
+				char	   *path = relpathperm(hentry->key.node, hentry->key.forkno);
 
 				elog(DEBUG2, "page %u of relation %s has been dropped",
 					 hentry->key.blkno, path);
@@ -200,7 +200,7 @@ XLogCheckInvalidPages(void)
 	 */
 	while ((hentry = (xl_invalid_page *) hash_seq_search(&status)) != NULL)
 	{
-		char	   *path = relpath(hentry->key.node, hentry->key.forkno);
+		char	   *path = relpathperm(hentry->key.node, hentry->key.forkno);
 
 		if (hentry->present)
 			elog(WARNING, "page %u of relation %s was uninitialized",
@@ -278,7 +278,7 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	Assert(blkno != P_NEW);
 
 	/* Open the relation at smgr level */
-	smgr = smgropen(rnode);
+	smgr = smgropen(rnode, InvalidBackendId);
 
 	/*
 	 * Create the target file if it doesn't already exist.  This lets us cope
@@ -295,7 +295,7 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	if (blkno < lastblock)
 	{
 		/* page exists in file */
-		buffer = ReadBufferWithoutRelcache(rnode, false, forknum, blkno,
+		buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno,
 										   mode, NULL);
 	}
 	else
@@ -314,7 +314,7 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 		{
 			if (buffer != InvalidBuffer)
 				ReleaseBuffer(buffer);
-			buffer = ReadBufferWithoutRelcache(rnode, false, forknum,
+			buffer = ReadBufferWithoutRelcache(rnode, forknum,
 											   P_NEW, mode, NULL);
 			lastblock++;
 		}
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 3edfc23..5ec5602 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -78,12 +78,37 @@ forkname_to_number(char *forkName)
 }
 
 /*
- * relpath			- construct path to a relation's file
+ * forkname_chars
+ * 		We use this to figure out whether a filename could be a relation
+ * 		fork (as opposed to an oddly named stray file that somehow ended
+ * 		up in the database directory).  If the passed string begins with
+ * 		a fork name (other than the main fork name), we return its length.
+ * 		If not, we return 0.
+ *
+ * Note that the present coding assumes that there are no fork names which
+ * are prefixes of other fork names.
+ */
+int
+forkname_chars(const char *str)
+{
+	ForkNumber	forkNum;
+
+	for (forkNum = 1; forkNum <= MAX_FORKNUM; forkNum++)
+	{
+		int len = strlen(forkNames[forkNum]);
+		if (strncmp(forkNames[forkNum], str, len) == 0)
+			return len;
+	}
+	return 0;
+}
+
+/*
+ * relpathbackend - construct path to a relation's file
  *
  * Result is a palloc'd string.
  */
 char *
-relpath(RelFileNode rnode, ForkNumber forknum)
+relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
 {
 	int			pathlen;
 	char	   *path;
@@ -92,6 +117,7 @@ relpath(RelFileNode rnode, ForkNumber forknum)
 	{
 		/* Shared system relations live in {datadir}/global */
 		Assert(rnode.dbNode == 0);
+		Assert(backend == InvalidBackendId);
 		pathlen = 7 + OIDCHARS + 1 + FORKNAMECHARS + 1;
 		path = (char *) palloc(pathlen);
 		if (forknum != MAIN_FORKNUM)
@@ -103,29 +129,69 @@ relpath(RelFileNode rnode, ForkNumber forknum)
 	else if (rnode.spcNode == DEFAULTTABLESPACE_OID)
 	{
 		/* The default tablespace is {datadir}/base */
-		pathlen = 5 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
-		path = (char *) palloc(pathlen);
-		if (forknum != MAIN_FORKNUM)
-			snprintf(path, pathlen, "base/%u/%u_%s",
-					 rnode.dbNode, rnode.relNode, forkNames[forknum]);
+		if (backend == InvalidBackendId)
+		{
+			pathlen = 5 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
+			path = (char *) palloc(pathlen);
+			if (forknum != MAIN_FORKNUM)
+				snprintf(path, pathlen, "base/%u/%u_%s",
+						 rnode.dbNode, rnode.relNode,
+						 forkNames[forknum]);
+			else
+				snprintf(path, pathlen, "base/%u/%u",
+						 rnode.dbNode, rnode.relNode);
+		}
 		else
-			snprintf(path, pathlen, "base/%u/%u",
-					 rnode.dbNode, rnode.relNode);
+		{
+			/* OIDCHARS will suffice for an integer, too */
+			pathlen = 5 + OIDCHARS + 2 + OIDCHARS + 1 + OIDCHARS + 1
+					+ FORKNAMECHARS + 1;
+			path = (char *) palloc(pathlen);
+			if (forknum != MAIN_FORKNUM)
+				snprintf(path, pathlen, "base/%u/t%d_%u_%s",
+						 rnode.dbNode, backend, rnode.relNode,
+						 forkNames[forknum]);
+			else
+				snprintf(path, pathlen, "base/%u/t%d_%u",
+						 rnode.dbNode, backend, rnode.relNode);
+		}
 	}
 	else
 	{
 		/* All other tablespaces are accessed via symlinks */
-		pathlen = 9 + 1 + OIDCHARS + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) +
-			1 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
-		path = (char *) palloc(pathlen);
-		if (forknum != MAIN_FORKNUM)
-			snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u_%s",
-					 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
-					 rnode.dbNode, rnode.relNode, forkNames[forknum]);
+		if (backend == InvalidBackendId)
+		{
+			pathlen = 9 + 1 + OIDCHARS + 1
+					+ strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 1
+					+ OIDCHARS + 1 + FORKNAMECHARS + 1;
+			path = (char *) palloc(pathlen);
+			if (forknum != MAIN_FORKNUM)
+				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u_%s",
+						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
+						 rnode.dbNode, rnode.relNode,
+						 forkNames[forknum]);
+			else
+				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u",
+						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
+						 rnode.dbNode, rnode.relNode);
+		}
 		else
-			snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u",
-					 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
-					 rnode.dbNode, rnode.relNode);
+		{
+			/* OIDCHARS will suffice for an integer, too */
+			pathlen = 9 + 1 + OIDCHARS + 1
+					+ strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 2
+					+ OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
+			path = (char *) palloc(pathlen);
+			if (forknum != MAIN_FORKNUM)
+				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/t%d_%u_%s",
+						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
+						 rnode.dbNode, backend, rnode.relNode,
+						 forkNames[forknum]);
+			else
+				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/t%d_%u",
+						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
+						 rnode.dbNode, backend, rnode.relNode);
+		}
 	}
 	return path;
 }
@@ -458,16 +524,17 @@ GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
  * created by bootstrap have preassigned OIDs, so there's no need.
  */
 Oid
-GetNewRelFileNode(Oid reltablespace, Relation pg_class)
+GetNewRelFileNode(Oid reltablespace, Relation pg_class, BackendId backend)
 {
-	RelFileNode rnode;
+	BackendRelFileNode rnode;
 	char	   *rpath;
 	int			fd;
 	bool		collides;
 
 	/* This logic should match RelationInitPhysicalAddr */
-	rnode.spcNode = reltablespace ? reltablespace : MyDatabaseTableSpace;
-	rnode.dbNode = (rnode.spcNode == GLOBALTABLESPACE_OID) ? InvalidOid : MyDatabaseId;
+	rnode.node.spcNode = reltablespace ? reltablespace : MyDatabaseTableSpace;
+	rnode.node.dbNode = (rnode.node.spcNode == GLOBALTABLESPACE_OID) ? InvalidOid : MyDatabaseId;
+	rnode.backend = backend;
 
 	do
 	{
@@ -475,9 +542,9 @@ GetNewRelFileNode(Oid reltablespace, Relation pg_class)
 
 		/* Generate the OID */
 		if (pg_class)
-			rnode.relNode = GetNewOid(pg_class);
+			rnode.node.relNode = GetNewOid(pg_class);
 		else
-			rnode.relNode = GetNewObjectId();
+			rnode.node.relNode = GetNewObjectId();
 
 		/* Check for existing file of same name */
 		rpath = relpath(rnode, MAIN_FORKNUM);
@@ -508,5 +575,5 @@ GetNewRelFileNode(Oid reltablespace, Relation pg_class)
 		pfree(rpath);
 	} while (collides);
 
-	return rnode.relNode;
+	return rnode.node.relNode;
 }
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index d848ef0..6e852b4 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -39,6 +39,7 @@
 #include "catalog/heap.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_attrdef.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
@@ -975,7 +976,9 @@ heap_create_with_catalog(const char *relname,
 			binary_upgrade_next_toast_relfilenode = InvalidOid;
 		}
 		else
-			relid = GetNewRelFileNode(reltablespace, pg_class_desc);
+			relid = GetNewRelFileNode(reltablespace, pg_class_desc,
+									  isTempOrToastNamespace(relnamespace) ?
+										  MyBackendId : InvalidBackendId);
 	}
 
 	/*
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 69946fe..3408a10 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -645,7 +645,12 @@ index_create(Oid heapRelationId,
 			binary_upgrade_next_index_relfilenode = InvalidOid;
 		}
 		else
-			indexRelationId = GetNewRelFileNode(tableSpaceId, pg_class);
+		{
+			indexRelationId =
+				GetNewRelFileNode(tableSpaceId, pg_class,
+								  heapRelation->rd_istemp ?
+									MyBackendId : InvalidBackendId);
+		}
 	}
 
 	/*
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index ad376a1..4515741 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -52,7 +52,7 @@
 typedef struct PendingRelDelete
 {
 	RelFileNode relnode;		/* relation that may need to be deleted */
-	bool		isTemp;			/* is it a temporary relation? */
+	BackendId	backend;		/* InvalidBackendId if not a temp rel */
 	bool		atCommit;		/* T=delete at commit; F=delete at abort */
 	int			nestLevel;		/* xact nesting level of request */
 	struct PendingRelDelete *next;		/* linked-list link */
@@ -102,8 +102,9 @@ RelationCreateStorage(RelFileNode rnode, bool istemp)
 	XLogRecData rdata;
 	xl_smgr_create xlrec;
 	SMgrRelation srel;
+	BackendId	backend = istemp ? MyBackendId : InvalidBackendId;
 
-	srel = smgropen(rnode);
+	srel = smgropen(rnode, backend);
 	smgrcreate(srel, MAIN_FORKNUM, false);
 
 	if (!istemp)
@@ -125,7 +126,7 @@ RelationCreateStorage(RelFileNode rnode, bool istemp)
 	pending = (PendingRelDelete *)
 		MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
 	pending->relnode = rnode;
-	pending->isTemp = istemp;
+	pending->backend = backend;
 	pending->atCommit = false;	/* delete if abort */
 	pending->nestLevel = GetCurrentTransactionNestLevel();
 	pending->next = pendingDeletes;
@@ -145,7 +146,7 @@ RelationDropStorage(Relation rel)
 	pending = (PendingRelDelete *)
 		MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
 	pending->relnode = rel->rd_node;
-	pending->isTemp = rel->rd_istemp;
+	pending->backend = rel->rd_backend;
 	pending->atCommit = true;	/* delete if commit */
 	pending->nestLevel = GetCurrentTransactionNestLevel();
 	pending->next = pendingDeletes;
@@ -283,7 +284,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	}
 
 	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks, rel->rd_istemp);
+	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
 }
 
 /*
@@ -322,14 +323,11 @@ smgrDoPendingDeletes(bool isCommit)
 				SMgrRelation srel;
 				int			i;
 
-				srel = smgropen(pending->relnode);
+				srel = smgropen(pending->relnode, pending->backend);
 				for (i = 0; i <= MAX_FORKNUM; i++)
 				{
 					if (smgrexists(srel, i))
-						smgrdounlink(srel,
-									 i,
-									 pending->isTemp,
-									 false);
+						smgrdounlink(srel, i, false);
 				}
 				smgrclose(srel);
 			}
@@ -341,20 +339,24 @@ smgrDoPendingDeletes(bool isCommit)
 }
 
 /*
- * smgrGetPendingDeletes() -- Get a list of relations to be deleted.
+ * smgrGetPendingDeletes() -- Get a list of non-temp relations to be deleted.
  *
  * The return value is the number of relations scheduled for termination.
  * *ptr is set to point to a freshly-palloc'd array of RelFileNodes.
  * If there are no relations to be deleted, *ptr is set to NULL.
  *
- * If haveNonTemp isn't NULL, the bool it points to gets set to true if
- * there is any non-temp table pending to be deleted; false if not.
+ * Only non-temporary relations are included in the returned list.  This is OK
+ * because the list is used only in contexts where temporary relations don't
+ * matter: we're either writing to the two-phase state file (and transactions
+ * that have touched temp tables can't be prepared) or we're writing to xlog
+ * (and all temporary files will be zapped if we restart anyway, so no need
+ * for redo to do it also).
  *
  * Note that the list does not include anything scheduled for termination
  * by upper-level transactions.
  */
 int
-smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr, bool *haveNonTemp)
+smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr)
 {
 	int			nestLevel = GetCurrentTransactionNestLevel();
 	int			nrels;
@@ -362,11 +364,10 @@ smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr, bool *haveNonTemp)
 	PendingRelDelete *pending;
 
 	nrels = 0;
-	if (haveNonTemp)
-		*haveNonTemp = false;
 	for (pending = pendingDeletes; pending != NULL; pending = pending->next)
 	{
-		if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit)
+		if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit
+			&& pending->backend == InvalidBackendId)
 			nrels++;
 	}
 	if (nrels == 0)
@@ -378,13 +379,12 @@ smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr, bool *haveNonTemp)
 	*ptr = rptr;
 	for (pending = pendingDeletes; pending != NULL; pending = pending->next)
 	{
-		if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit)
+		if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit
+			&& pending->backend == InvalidBackendId)
 		{
 			*rptr = pending->relnode;
 			rptr++;
 		}
-		if (haveNonTemp && !pending->isTemp)
-			*haveNonTemp = true;
 	}
 	return nrels;
 }
@@ -456,7 +456,7 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
 		xl_smgr_create *xlrec = (xl_smgr_create *) XLogRecGetData(record);
 		SMgrRelation reln;
 
-		reln = smgropen(xlrec->rnode);
+		reln = smgropen(xlrec->rnode, InvalidBackendId);
 		smgrcreate(reln, MAIN_FORKNUM, true);
 	}
 	else if (info == XLOG_SMGR_TRUNCATE)
@@ -465,7 +465,7 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
 		SMgrRelation reln;
 		Relation	rel;
 
-		reln = smgropen(xlrec->rnode);
+		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
 		/*
 		 * Forcibly create relation if it doesn't exist (which suggests that
@@ -475,7 +475,7 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
 		 */
 		smgrcreate(reln, MAIN_FORKNUM, true);
 
-		smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno, false);
+		smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
 
 		/* Also tell xlogutils.c about it */
 		XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
@@ -502,7 +502,7 @@ smgr_desc(StringInfo buf, uint8 xl_info, char *rec)
 	if (info == XLOG_SMGR_CREATE)
 	{
 		xl_smgr_create *xlrec = (xl_smgr_create *) rec;
-		char	   *path = relpath(xlrec->rnode, MAIN_FORKNUM);
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
 
 		appendStringInfo(buf, "file create: %s", path);
 		pfree(path);
@@ -510,7 +510,7 @@ smgr_desc(StringInfo buf, uint8 xl_info, char *rec)
 	else if (info == XLOG_SMGR_TRUNCATE)
 	{
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) rec;
-		char	   *path = relpath(xlrec->rnode, MAIN_FORKNUM);
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
 
 		appendStringInfo(buf, "file truncate: %s to %u blocks", path,
 						 xlrec->blkno);
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 435dfdd..54cdc4a 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -195,7 +195,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
 	 * Toast tables for regular relations go in pg_toast; those for temp
 	 * relations go into the per-backend temp-toast-table namespace.
 	 */
-	if (rel->rd_islocaltemp)
+	if (rel->rd_backend == MyBackendId)
 		namespaceid = GetTempToastNamespace();
 	else
 		namespaceid = PG_TOAST_NAMESPACE;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9d46e47..3d61116 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1022,7 +1022,7 @@ DoCopy(const CopyStmt *stmt, const char *queryString)
 		}
 
 		/* check read-only transaction */
-		if (XactReadOnly && is_from && !cstate->rel->rd_islocaltemp)
+		if (XactReadOnly && is_from && cstate->rel->rd_backend != MyBackendId)
 			PreventCommandIfReadOnly("COPY FROM");
 
 		/* Don't allow COPY w/ OIDs to or from a table without them */
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index f52e1d8..f6867f4 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -470,7 +470,7 @@ nextval_internal(Oid relid)
 						RelationGetRelationName(seqrel))));
 
 	/* read-only transactions may only modify temp sequences */
-	if (!seqrel->rd_islocaltemp)
+	if (seqrel->rd_backend != MyBackendId)
 		PreventCommandIfReadOnly("nextval()");
 
 	if (elm->last != elm->cached)		/* some numbers were cached */
@@ -747,7 +747,7 @@ do_setval(Oid relid, int64 next, bool iscalled)
 						RelationGetRelationName(seqrel))));
 
 	/* read-only transactions may only modify temp sequences */
-	if (!seqrel->rd_islocaltemp)
+	if (seqrel->rd_backend != MyBackendId)
 		PreventCommandIfReadOnly("setval()");
 
 	/* lock page' buffer and read tuple */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 9b5ce65..2515e6c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6968,13 +6968,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace)
 	 * Relfilenodes are not unique across tablespaces, so we need to allocate
 	 * a new one in the new tablespace.
 	 */
-	newrelfilenode = GetNewRelFileNode(newTableSpace, NULL);
+	newrelfilenode = GetNewRelFileNode(newTableSpace, NULL, rel->rd_backend);
 
 	/* Open old and new relation */
 	newrnode = rel->rd_node;
 	newrnode.relNode = newrelfilenode;
 	newrnode.spcNode = newTableSpace;
-	dstrel = smgropen(newrnode);
+	dstrel = smgropen(newrnode, rel->rd_backend);
 
 	RelationOpenSmgr(rel);
 
@@ -7053,7 +7053,7 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
 
 		/* XLOG stuff */
 		if (use_wal)
-			log_newpage(&dst->smgr_rnode, forkNum, blkno, page);
+			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);
 
 		/*
 		 * Now write the page.	We say isTemp = true even if it's not a temp
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 95e9d37..cf79de3 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -113,7 +113,7 @@
  */
 typedef struct
 {
-	RelFileNode rnode;
+	BackendRelFileNode rnode;
 	ForkNumber	forknum;
 	BlockNumber segno;			/* see md.c for special values */
 	/* might add a real request-type field later; not needed yet */
@@ -1071,7 +1071,8 @@ RequestCheckpoint(int flags)
  * than we have to here.
  */
 bool
-ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
+ForwardFsyncRequest(BackendRelFileNode rnode, ForkNumber forknum,
+					BlockNumber segno)
 {
 	BgWriterRequest *request;
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index caae936..4dd894b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -95,7 +95,8 @@ static void WaitIO(volatile BufferDesc *buf);
 static bool StartBufferIO(volatile BufferDesc *buf, bool forInput);
 static void TerminateBufferIO(volatile BufferDesc *buf, bool clear_dirty,
 				  int set_flag_bits);
-static void buffer_write_error_callback(void *arg);
+static void shared_buffer_write_error_callback(void *arg);
+static void local_buffer_write_error_callback(void *arg);
 static volatile BufferDesc *BufferAlloc(SMgrRelation smgr, ForkNumber forkNum,
 			BlockNumber blockNum,
 			BufferAccessStrategy strategy,
@@ -141,7 +142,8 @@ PrefetchBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum)
 		int			buf_id;
 
 		/* create a tag so we can lookup the buffer */
-		INIT_BUFFERTAG(newTag, reln->rd_smgr->smgr_rnode, forkNum, blockNum);
+		INIT_BUFFERTAG(newTag, reln->rd_smgr->smgr_rnode.node,
+					   forkNum, blockNum);
 
 		/* determine its hash code and partition lock ID */
 		newHash = BufTableHashCode(&newTag);
@@ -251,18 +253,21 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
  * ReadBufferWithoutRelcache -- like ReadBufferExtended, but doesn't require
  *		a relcache entry for the relation.
  *
- * NB: caller is assumed to know what it's doing if isTemp is true.
+ * NB: At present, this function may not be used on temporary relations, which
+ * is OK, because we only use it during XLOG replay.  If in the future we
+ * want to use it on temporary relations, we could pass the backend ID as an
+ * additional parameter.
  */
 Buffer
-ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp,
-						  ForkNumber forkNum, BlockNumber blockNum,
-						  ReadBufferMode mode, BufferAccessStrategy strategy)
+ReadBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
+						  BlockNumber blockNum, ReadBufferMode mode,
+						  BufferAccessStrategy strategy)
 {
 	bool		hit;
 
-	SMgrRelation smgr = smgropen(rnode);
+	SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
 
-	return ReadBuffer_common(smgr, isTemp, forkNum, blockNum, mode, strategy,
+	return ReadBuffer_common(smgr, false, forkNum, blockNum, mode, strategy,
 							 &hit);
 }
 
@@ -414,7 +419,7 @@ ReadBuffer_common(SMgrRelation smgr, bool isLocalBuf, ForkNumber forkNum,
 	{
 		/* new buffers are zero-filled */
 		MemSet((char *) bufBlock, 0, BLCKSZ);
-		smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, isLocalBuf);
+		smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
 	}
 	else
 	{
@@ -465,10 +470,10 @@ ReadBuffer_common(SMgrRelation smgr, bool isLocalBuf, ForkNumber forkNum,
 		VacuumCostBalance += VacuumCostPageMiss;
 
 	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-									  smgr->smgr_rnode.spcNode,
-									  smgr->smgr_rnode.dbNode,
-									  smgr->smgr_rnode.relNode,
-									  isLocalBuf,
+									  smgr->smgr_rnode.node.spcNode,
+									  smgr->smgr_rnode.node.dbNode,
+									  smgr->smgr_rnode.node.relNode,
+									  smgr->smgr_rnode.backend,
 									  isExtend,
 									  found);
 
@@ -512,7 +517,7 @@ BufferAlloc(SMgrRelation smgr, ForkNumber forkNum,
 	bool		valid;
 
 	/* create a tag so we can lookup the buffer */
-	INIT_BUFFERTAG(newTag, smgr->smgr_rnode, forkNum, blockNum);
+	INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
 
 	/* determine its hash code and partition lock ID */
 	newHash = BufTableHashCode(&newTag);
@@ -1693,21 +1698,24 @@ PrintBufferLeakWarning(Buffer buffer)
 	volatile BufferDesc *buf;
 	int32		loccount;
 	char	   *path;
+	BackendId	backend;
 
 	Assert(BufferIsValid(buffer));
 	if (BufferIsLocal(buffer))
 	{
 		buf = &LocalBufferDescriptors[-buffer - 1];
 		loccount = LocalRefCount[-buffer - 1];
+		backend = MyBackendId;
 	}
 	else
 	{
 		buf = &BufferDescriptors[buffer - 1];
 		loccount = PrivateRefCount[buffer - 1];
+		backend = InvalidBackendId;
 	}
 
 	/* theoretically we should lock the bufhdr here */
-	path = relpath(buf->tag.rnode, buf->tag.forkNum);
+	path = relpathbackend(buf->tag.rnode, backend, buf->tag.forkNum);
 	elog(WARNING,
 		 "buffer refcount leak: [%03d] "
 		 "(rel=%s, blockNum=%u, flags=0x%x, refcount=%u %d)",
@@ -1831,14 +1839,14 @@ FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
 		return;
 
 	/* Setup error traceback support for ereport() */
-	errcontext.callback = buffer_write_error_callback;
+	errcontext.callback = shared_buffer_write_error_callback;
 	errcontext.arg = (void *) buf;
 	errcontext.previous = error_context_stack;
 	error_context_stack = &errcontext;
 
 	/* Find smgr relation for buffer */
 	if (reln == NULL)
-		reln = smgropen(buf->tag.rnode);
+		reln = smgropen(buf->tag.rnode, InvalidBackendId);
 
 	TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,
 										buf->tag.blockNum,
@@ -1929,14 +1937,15 @@ RelationGetNumberOfBlocks(Relation relation)
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum, bool istemp,
+DropRelFileNodeBuffers(BackendRelFileNode rnode, ForkNumber forkNum,
 					   BlockNumber firstDelBlock)
 {
 	int			i;
 
-	if (istemp)
+	if (rnode.backend != InvalidBackendId)
 	{
-		DropRelFileNodeLocalBuffers(rnode, forkNum, firstDelBlock);
+		if (rnode.backend == MyBackendId)
+			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
 		return;
 	}
 
@@ -1945,7 +1954,7 @@ DropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum, bool istemp,
 		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
 
 		LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode) &&
+		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
 			bufHdr->tag.forkNum == forkNum &&
 			bufHdr->tag.blockNum >= firstDelBlock)
 			InvalidateBuffer(bufHdr);	/* releases spinlock */
@@ -2008,7 +2017,7 @@ PrintBufferDescs(void)
 			 "[%02d] (freeNext=%d, rel=%s, "
 			 "blockNum=%u, flags=0x%x, refcount=%u %d)",
 			 i, buf->freeNext,
-			 relpath(buf->tag.rnode, buf->tag.forkNum),
+			 relpathbackend(buf->tag.rnode, InvalidBackendId, buf->tag.forkNum),
 			 buf->tag.blockNum, buf->flags,
 			 buf->refcount, PrivateRefCount[i]);
 	}
@@ -2078,7 +2087,7 @@ FlushRelationBuffers(Relation rel)
 				ErrorContextCallback errcontext;
 
 				/* Setup error traceback support for ereport() */
-				errcontext.callback = buffer_write_error_callback;
+				errcontext.callback = local_buffer_write_error_callback;
 				errcontext.arg = (void *) bufHdr;
 				errcontext.previous = error_context_stack;
 				error_context_stack = &errcontext;
@@ -2087,7 +2096,7 @@ FlushRelationBuffers(Relation rel)
 						  bufHdr->tag.forkNum,
 						  bufHdr->tag.blockNum,
 						  (char *) LocalBufHdrGetBlock(bufHdr),
-						  true);
+						  false);
 
 				bufHdr->flags &= ~(BM_DIRTY | BM_JUST_DIRTIED);
 
@@ -2699,8 +2708,9 @@ AbortBufferIO(void)
 			if (sv_flags & BM_IO_ERROR)
 			{
 				/* Buffer is pinned, so we can read tag without spinlock */
-				char	   *path = relpath(buf->tag.rnode, buf->tag.forkNum);
+				char	   *path;
 
+				path = relpathperm(buf->tag.rnode, buf->tag.forkNum);
 				ereport(WARNING,
 						(errcode(ERRCODE_IO_ERROR),
 						 errmsg("could not write block %u of %s",
@@ -2714,17 +2724,36 @@ AbortBufferIO(void)
 }
 
 /*
- * Error context callback for errors occurring during buffer writes.
+ * Error context callback for errors occurring during shared buffer writes.
  */
 static void
-buffer_write_error_callback(void *arg)
+shared_buffer_write_error_callback(void *arg)
 {
 	volatile BufferDesc *bufHdr = (volatile BufferDesc *) arg;
 
 	/* Buffer is pinned, so we can read the tag without locking the spinlock */
 	if (bufHdr != NULL)
 	{
-		char	   *path = relpath(bufHdr->tag.rnode, bufHdr->tag.forkNum);
+		char	   *path = relpathperm(bufHdr->tag.rnode, bufHdr->tag.forkNum);
+
+		errcontext("writing block %u of relation %s",
+				   bufHdr->tag.blockNum, path);
+		pfree(path);
+	}
+}
+
+/*
+ * Error context callback for errors occurring during buffer writes.
+ */
+static void
+local_buffer_write_error_callback(void *arg)
+{
+	volatile BufferDesc *bufHdr = (volatile BufferDesc *) arg;
+
+	if (bufHdr != NULL)
+	{
+		char	   *path = relpathbackend(bufHdr->tag.rnode, MyBackendId,
+										 bufHdr->tag.forkNum);
 
 		errcontext("writing block %u of relation %s",
 				   bufHdr->tag.blockNum, path);
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index c5b6a2c..bbf0a01 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -68,7 +68,7 @@ LocalPrefetchBuffer(SMgrRelation smgr, ForkNumber forkNum,
 	BufferTag	newTag;			/* identity of requested block */
 	LocalBufferLookupEnt *hresult;
 
-	INIT_BUFFERTAG(newTag, smgr->smgr_rnode, forkNum, blockNum);
+	INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
 
 	/* Initialize local buffers if first request in this session */
 	if (LocalBufHash == NULL)
@@ -110,7 +110,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 	int			trycounter;
 	bool		found;
 
-	INIT_BUFFERTAG(newTag, smgr->smgr_rnode, forkNum, blockNum);
+	INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
 
 	/* Initialize local buffers if first request in this session */
 	if (LocalBufHash == NULL)
@@ -127,7 +127,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 		Assert(BUFFERTAGS_EQUAL(bufHdr->tag, newTag));
 #ifdef LBDEBUG
 		fprintf(stderr, "LB ALLOC (%u,%d,%d) %d\n",
-				smgr->smgr_rnode.relNode, forkNum, blockNum, -b - 1);
+				smgr->smgr_rnode.node.relNode, forkNum, blockNum, -b - 1);
 #endif
 		/* this part is equivalent to PinBuffer for a shared buffer */
 		if (LocalRefCount[b] == 0)
@@ -150,7 +150,8 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 
 #ifdef LBDEBUG
 	fprintf(stderr, "LB ALLOC (%u,%d,%d) %d\n",
-		 smgr->smgr_rnode.relNode, forkNum, blockNum, -nextFreeLocalBuf - 1);
+		 smgr->smgr_rnode.node.relNode, forkNum, blockNum,
+		 -nextFreeLocalBuf - 1);
 #endif
 
 	/*
@@ -198,14 +199,14 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 		SMgrRelation oreln;
 
 		/* Find smgr relation for buffer */
-		oreln = smgropen(bufHdr->tag.rnode);
+		oreln = smgropen(bufHdr->tag.rnode, MyBackendId);
 
 		/* And write... */
 		smgrwrite(oreln,
 				  bufHdr->tag.forkNum,
 				  bufHdr->tag.blockNum,
 				  (char *) LocalBufHdrGetBlock(bufHdr),
-				  true);
+				  false);
 
 		/* Mark not-dirty now in case we error out below */
 		bufHdr->flags &= ~BM_DIRTY;
@@ -309,7 +310,8 @@ DropRelFileNodeLocalBuffers(RelFileNode rnode, ForkNumber forkNum,
 			if (LocalRefCount[i] != 0)
 				elog(ERROR, "block %u of %s is still referenced (local %u)",
 					 bufHdr->tag.blockNum,
-					 relpath(bufHdr->tag.rnode, bufHdr->tag.forkNum),
+					 relpathbackend(bufHdr->tag.rnode, MyBackendId,
+								   bufHdr->tag.forkNum),
 					 LocalRefCount[i]);
 			/* Remove entry from hashtable */
 			hresult = (LocalBufferLookupEnt *)
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 244e35e..64928e9 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -249,6 +249,9 @@ static File OpenTemporaryFileInTablespace(Oid tblspcOid, bool rejectError);
 static void AtProcExit_Files(int code, Datum arg);
 static void CleanupTempFiles(bool isProcExit);
 static void RemovePgTempFilesInDir(const char *tmpdirname);
+static void RemovePgTempRelationFiles(const char *tsdirname);
+static void RemovePgTempRelationFilesInDbspace(const char *dbspacedirname);
+static bool looks_like_temp_rel_name(const char *name);
 
 
 /*
@@ -1824,10 +1827,12 @@ CleanupTempFiles(bool isProcExit)
 
 
 /*
- * Remove temporary files left over from a prior postmaster session
+ * Remove temporary and temporary relation files left over from a prior
+ * postmaster session
  *
  * This should be called during postmaster startup.  It will forcibly
- * remove any leftover files created by OpenTemporaryFile.
+ * remove any leftover files created by OpenTemporaryFile and any leftover
+ * temporary relation files created by mdcreate.
  *
  * NOTE: we could, but don't, call this during a post-backend-crash restart
  * cycle.  The argument for not doing it is that someone might want to examine
@@ -1847,6 +1852,7 @@ RemovePgTempFiles(void)
 	 */
 	snprintf(temp_path, sizeof(temp_path), "base/%s", PG_TEMP_FILES_DIR);
 	RemovePgTempFilesInDir(temp_path);
+	RemovePgTempRelationFiles("base");
 
 	/*
 	 * Cycle through temp directories for all non-default tablespaces.
@@ -1862,6 +1868,10 @@ RemovePgTempFiles(void)
 		snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s/%s",
 			spc_de->d_name, TABLESPACE_VERSION_DIRECTORY, PG_TEMP_FILES_DIR);
 		RemovePgTempFilesInDir(temp_path);
+
+		snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s",
+			spc_de->d_name, TABLESPACE_VERSION_DIRECTORY);
+		RemovePgTempRelationFiles(temp_path);
 	}
 
 	FreeDir(spc_dir);
@@ -1915,3 +1925,123 @@ RemovePgTempFilesInDir(const char *tmpdirname)
 
 	FreeDir(temp_dir);
 }
+
+/* Process one tablespace directory, look for per-DB subdirectories */
+static void
+RemovePgTempRelationFiles(const char *tsdirname)
+{
+	DIR		   *ts_dir;
+	struct dirent *de;
+	char		dbspace_path[MAXPGPATH];
+
+	ts_dir = AllocateDir(tsdirname);
+	if (ts_dir == NULL)
+	{
+		/* anything except ENOENT is fishy */
+		if (errno != ENOENT)
+			elog(LOG,
+				 "could not open tablespace directory \"%s\": %m",
+				 tsdirname);
+		return;
+	}
+
+	while ((de = ReadDir(ts_dir, tsdirname)) != NULL)
+	{
+		int		i = 0;
+
+		/*
+		 * We're only interested in the per-database directories, which have
+		 * numeric names.  Note that this code will also (properly) ignore "."
+		 * and "..".
+		 */
+		while (isdigit((unsigned char) de->d_name[i]))
+			++i;
+		if (de->d_name[i] != '\0' || i == 0)
+			continue;
+
+		snprintf(dbspace_path, sizeof(dbspace_path), "%s/%s",
+				 tsdirname, de->d_name);
+		RemovePgTempRelationFilesInDbspace(dbspace_path);
+	}
+
+	FreeDir(ts_dir);
+}
+
+/* Process one per-dbspace directory for RemovePgTempRelationFiles */
+static void
+RemovePgTempRelationFilesInDbspace(const char *dbspacedirname)
+{
+	DIR		   *dbspace_dir;
+	struct dirent *de;
+	char		rm_path[MAXPGPATH];
+
+	dbspace_dir = AllocateDir(dbspacedirname);
+	if (dbspace_dir == NULL)
+	{
+		/* we just saw this directory, so it really ought to be there */
+		elog(LOG,
+			 "could not open dbspace directory \"%s\": %m",
+			 dbspacedirname);
+		return;
+	}
+
+	while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+	{
+		if (!looks_like_temp_rel_name(de->d_name))
+			continue;
+
+		snprintf(rm_path, sizeof(rm_path), "%s/%s",
+				 dbspacedirname, de->d_name);
+
+		unlink(rm_path);	/* note we ignore any error */
+	}
+
+	FreeDir(dbspace_dir);
+}
+
+/* t<digits>_<digits>, or t<digits>_<digits>_<forkname> */
+static bool
+looks_like_temp_rel_name(const char *name)
+{
+	int			pos;
+	int			savepos;
+
+	/* Must start with "t". */
+	if (name[0] != 't')
+		return false;
+
+	/* Followed by a non-empty string of digits and then an underscore. */
+	for (pos = 1; isdigit((unsigned char) name[pos]); ++pos)
+		;
+	if (pos == 1 || name[pos] != '_')
+		return false;
+
+	/* Followed by another nonempty string of digits. */
+	for (savepos = ++pos; isdigit((unsigned char) name[pos]); ++pos)
+		;
+	if (savepos == pos)
+		return false;
+
+	/* We might have _forkname or .segment or both. */
+	if (name[pos] == '_')
+	{
+		int		forkchar = forkname_chars(&name[pos+1]);
+		if (forkchar <= 0)
+			return false;
+		pos += forkchar + 1;
+	}
+	if (name[pos] == '.')
+	{
+		int		segchar;
+		for (segchar = 1; isdigit((unsigned char) name[pos+segchar]); ++segchar)
+			;
+		if (segchar <= 1)
+			return false;
+		pos += segchar;
+	}
+
+	/* Now we should be at the end. */
+	if (name[pos] != '\0')
+		return false;
+	return true;
+}
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 579572f..b43a2ce 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -303,7 +303,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	}
 
 	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks, rel->rd_istemp);
+	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
 
 	/*
 	 * We might as well update the local smgr_fsm_nblocks setting.
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 4163ca0..79805b4 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -119,7 +119,7 @@ static MemoryContext MdCxt;		/* context for all md.c allocations */
  */
 typedef struct
 {
-	RelFileNode rnode;			/* the targeted relation */
+	BackendRelFileNode rnode;	/* the targeted relation */
 	ForkNumber	forknum;
 	BlockNumber segno;			/* which segment */
 } PendingOperationTag;
@@ -135,7 +135,7 @@ typedef struct
 
 typedef struct
 {
-	RelFileNode rnode;			/* the dead relation to delete */
+	BackendRelFileNode rnode;	/* the dead relation to delete */
 	CycleCtr	cycle_ctr;		/* mdckpt_cycle_ctr when request was made */
 } PendingUnlinkEntry;
 
@@ -158,14 +158,14 @@ static MdfdVec *mdopen(SMgrRelation reln, ForkNumber forknum,
 	   ExtensionBehavior behavior);
 static void register_dirty_segment(SMgrRelation reln, ForkNumber forknum,
 					   MdfdVec *seg);
-static void register_unlink(RelFileNode rnode);
+static void register_unlink(BackendRelFileNode rnode);
 static MdfdVec *_fdvec_alloc(void);
 static char *_mdfd_segpath(SMgrRelation reln, ForkNumber forknum,
 			  BlockNumber segno);
 static MdfdVec *_mdfd_openseg(SMgrRelation reln, ForkNumber forkno,
 			  BlockNumber segno, int oflags);
 static MdfdVec *_mdfd_getseg(SMgrRelation reln, ForkNumber forkno,
-			 BlockNumber blkno, bool isTemp, ExtensionBehavior behavior);
+			 BlockNumber blkno, bool skipFsync, ExtensionBehavior behavior);
 static BlockNumber _mdnblocks(SMgrRelation reln, ForkNumber forknum,
 		   MdfdVec *seg);
 
@@ -321,7 +321,7 @@ mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
  * we are usually not in a transaction anymore when this is called.
  */
 void
-mdunlink(RelFileNode rnode, ForkNumber forkNum, bool isRedo)
+mdunlink(BackendRelFileNode rnode, ForkNumber forkNum, bool isRedo)
 {
 	char	   *path;
 	int			ret;
@@ -417,7 +417,7 @@ mdunlink(RelFileNode rnode, ForkNumber forkNum, bool isRedo)
  */
 void
 mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		 char *buffer, bool isTemp)
+		 char *buffer, bool skipFsync)
 {
 	off_t		seekpos;
 	int			nbytes;
@@ -440,7 +440,7 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 						relpath(reln->smgr_rnode, forknum),
 						InvalidBlockNumber)));
 
-	v = _mdfd_getseg(reln, forknum, blocknum, isTemp, EXTENSION_CREATE);
+	v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE);
 
 	seekpos = (off_t) BLCKSZ *(blocknum % ((BlockNumber) RELSEG_SIZE));
 
@@ -478,7 +478,7 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 				 errhint("Check free disk space.")));
 	}
 
-	if (!isTemp)
+	if (!skipFsync && !SmgrIsTemp(reln))
 		register_dirty_segment(reln, forknum, v);
 
 	Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
@@ -605,9 +605,10 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	MdfdVec    *v;
 
 	TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
-										reln->smgr_rnode.spcNode,
-										reln->smgr_rnode.dbNode,
-										reln->smgr_rnode.relNode);
+										reln->smgr_rnode.node.spcNode,
+										reln->smgr_rnode.node.dbNode,
+										reln->smgr_rnode.node.relNode,
+										reln->smgr_rnode.backend);
 
 	v = _mdfd_getseg(reln, forknum, blocknum, false, EXTENSION_FAIL);
 
@@ -624,9 +625,10 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ);
 
 	TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
-									   reln->smgr_rnode.spcNode,
-									   reln->smgr_rnode.dbNode,
-									   reln->smgr_rnode.relNode,
+									   reln->smgr_rnode.node.spcNode,
+									   reln->smgr_rnode.node.dbNode,
+									   reln->smgr_rnode.node.relNode,
+									   reln->smgr_rnode.backend,
 									   nbytes,
 									   BLCKSZ);
 
@@ -666,7 +668,7 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
  */
 void
 mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		char *buffer, bool isTemp)
+		char *buffer, bool skipFsync)
 {
 	off_t		seekpos;
 	int			nbytes;
@@ -678,11 +680,12 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 #endif
 
 	TRACE_POSTGRESQL_SMGR_MD_WRITE_START(forknum, blocknum,
-										 reln->smgr_rnode.spcNode,
-										 reln->smgr_rnode.dbNode,
-										 reln->smgr_rnode.relNode);
+										 reln->smgr_rnode.node.spcNode,
+										 reln->smgr_rnode.node.dbNode,
+										 reln->smgr_rnode.node.relNode,
+										 reln->smgr_rnode.backend);
 
-	v = _mdfd_getseg(reln, forknum, blocknum, isTemp, EXTENSION_FAIL);
+	v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_FAIL);
 
 	seekpos = (off_t) BLCKSZ *(blocknum % ((BlockNumber) RELSEG_SIZE));
 
@@ -697,9 +700,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ);
 
 	TRACE_POSTGRESQL_SMGR_MD_WRITE_DONE(forknum, blocknum,
-										reln->smgr_rnode.spcNode,
-										reln->smgr_rnode.dbNode,
-										reln->smgr_rnode.relNode,
+										reln->smgr_rnode.node.spcNode,
+										reln->smgr_rnode.node.dbNode,
+										reln->smgr_rnode.node.relNode,
+										reln->smgr_rnode.backend,
 										nbytes,
 										BLCKSZ);
 
@@ -720,7 +724,7 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 				 errhint("Check free disk space.")));
 	}
 
-	if (!isTemp)
+	if (!skipFsync && !SmgrIsTemp(reln))
 		register_dirty_segment(reln, forknum, v);
 }
 
@@ -794,8 +798,7 @@ mdnblocks(SMgrRelation reln, ForkNumber forknum)
  *	mdtruncate() -- Truncate relation to specified number of blocks.
  */
 void
-mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
-		   bool isTemp)
+mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 {
 	MdfdVec    *v;
 	BlockNumber curnblk;
@@ -839,7 +842,7 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
 						 errmsg("could not truncate file \"%s\": %m",
 								FilePathName(v->mdfd_vfd))));
 
-			if (!isTemp)
+			if (!SmgrIsTemp(reln))
 				register_dirty_segment(reln, forknum, v);
 			v = v->mdfd_chain;
 			Assert(ov != reln->md_fd[forknum]); /* we never drop the 1st
@@ -864,7 +867,7 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
 					errmsg("could not truncate file \"%s\" to %u blocks: %m",
 						   FilePathName(v->mdfd_vfd),
 						   nblocks)));
-			if (!isTemp)
+			if (!SmgrIsTemp(reln))
 				register_dirty_segment(reln, forknum, v);
 			v = v->mdfd_chain;
 			ov->mdfd_chain = NULL;
@@ -1052,7 +1055,8 @@ mdsync(void)
 				 * the relation will have been dirtied through this same smgr
 				 * relation, and so we can save a file open/close cycle.
 				 */
-				reln = smgropen(entry->tag.rnode);
+				reln = smgropen(entry->tag.rnode.node,
+								entry->tag.rnode.backend);
 
 				/*
 				 * It is possible that the relation has been dropped or
@@ -1235,7 +1239,7 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
  * a remote pending-ops table.
  */
 static void
-register_unlink(RelFileNode rnode)
+register_unlink(BackendRelFileNode rnode)
 {
 	if (pendingOpsTable)
 	{
@@ -1278,7 +1282,8 @@ register_unlink(RelFileNode rnode)
  * structure for them.)
  */
 void
-RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
+RememberFsyncRequest(BackendRelFileNode rnode, ForkNumber forknum,
+					 BlockNumber segno)
 {
 	Assert(pendingOpsTable);
 
@@ -1291,7 +1296,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 		hash_seq_init(&hstat, pendingOpsTable);
 		while ((entry = (PendingOperationEntry *) hash_seq_search(&hstat)) != NULL)
 		{
-			if (RelFileNodeEquals(entry->tag.rnode, rnode) &&
+			if (BackendRelFileNodeEquals(entry->tag.rnode, rnode) &&
 				entry->tag.forknum == forknum)
 			{
 				/* Okay, cancel this entry */
@@ -1312,7 +1317,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 		hash_seq_init(&hstat, pendingOpsTable);
 		while ((entry = (PendingOperationEntry *) hash_seq_search(&hstat)) != NULL)
 		{
-			if (entry->tag.rnode.dbNode == rnode.dbNode)
+			if (entry->tag.rnode.node.dbNode == rnode.node.dbNode)
 			{
 				/* Okay, cancel this entry */
 				entry->canceled = true;
@@ -1326,7 +1331,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 			PendingUnlinkEntry *entry = (PendingUnlinkEntry *) lfirst(cell);
 
 			next = lnext(cell);
-			if (entry->rnode.dbNode == rnode.dbNode)
+			if (entry->rnode.node.dbNode == rnode.node.dbNode)
 			{
 				pendingUnlinks = list_delete_cell(pendingUnlinks, cell, prev);
 				pfree(entry);
@@ -1393,7 +1398,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
  * ForgetRelationFsyncRequests -- forget any fsyncs for a rel
  */
 void
-ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum)
+ForgetRelationFsyncRequests(BackendRelFileNode rnode, ForkNumber forknum)
 {
 	if (pendingOpsTable)
 	{
@@ -1428,11 +1433,12 @@ ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum)
 void
 ForgetDatabaseFsyncRequests(Oid dbid)
 {
-	RelFileNode rnode;
+	BackendRelFileNode rnode;
 
-	rnode.dbNode = dbid;
-	rnode.spcNode = 0;
-	rnode.relNode = 0;
+	rnode.node.dbNode = dbid;
+	rnode.node.spcNode = 0;
+	rnode.node.relNode = 0;
+	rnode.backend = InvalidBackendId;
 
 	if (pendingOpsTable)
 	{
@@ -1523,12 +1529,12 @@ _mdfd_openseg(SMgrRelation reln, ForkNumber forknum, BlockNumber segno,
  *		specified block.
  *
  * If the segment doesn't exist, we ereport, return NULL, or create the
- * segment, according to "behavior".  Note: isTemp need only be correct
- * in the EXTENSION_CREATE case.
+ * segment, according to "behavior".  Note: skipFsync is only used in the
+ * EXTENSION_CREATE case.
  */
 static MdfdVec *
 _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
-			 bool isTemp, ExtensionBehavior behavior)
+			 bool skipFsync, ExtensionBehavior behavior)
 {
 	MdfdVec    *v = mdopen(reln, forknum, behavior);
 	BlockNumber targetseg;
@@ -1566,7 +1572,7 @@ _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
 
 					mdextend(reln, forknum,
 							 nextsegno * ((BlockNumber) RELSEG_SIZE) - 1,
-							 zerobuf, isTemp);
+							 zerobuf, skipFsync);
 					pfree(zerobuf);
 				}
 				v->mdfd_chain = _mdfd_openseg(reln, forknum, +nextsegno, O_CREAT);
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 3b12cb3..ecf238d 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -45,19 +45,19 @@ typedef struct f_smgr
 	void		(*smgr_create) (SMgrRelation reln, ForkNumber forknum,
 											bool isRedo);
 	bool		(*smgr_exists) (SMgrRelation reln, ForkNumber forknum);
-	void		(*smgr_unlink) (RelFileNode rnode, ForkNumber forknum,
+	void		(*smgr_unlink) (BackendRelFileNode rnode, ForkNumber forknum,
 											bool isRedo);
 	void		(*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
-							BlockNumber blocknum, char *buffer, bool isTemp);
+							BlockNumber blocknum, char *buffer, bool skipFsync);
 	void		(*smgr_prefetch) (SMgrRelation reln, ForkNumber forknum,
 											  BlockNumber blocknum);
 	void		(*smgr_read) (SMgrRelation reln, ForkNumber forknum,
 										  BlockNumber blocknum, char *buffer);
 	void		(*smgr_write) (SMgrRelation reln, ForkNumber forknum,
-							BlockNumber blocknum, char *buffer, bool isTemp);
+							BlockNumber blocknum, char *buffer, bool skipFsync);
 	BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
 	void		(*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
-										   BlockNumber nblocks, bool isTemp);
+										   BlockNumber nblocks);
 	void		(*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
 	void		(*smgr_pre_ckpt) (void);		/* may be NULL */
 	void		(*smgr_sync) (void);	/* may be NULL */
@@ -83,8 +83,6 @@ static HTAB *SMgrRelationHash = NULL;
 
 /* local function prototypes */
 static void smgrshutdown(int code, Datum arg);
-static void smgr_internal_unlink(RelFileNode rnode, ForkNumber forknum,
-					 int which, bool isTemp, bool isRedo);
 
 
 /*
@@ -131,8 +129,9 @@ smgrshutdown(int code, Datum arg)
  *		This does not attempt to actually open the object.
  */
 SMgrRelation
-smgropen(RelFileNode rnode)
+smgropen(RelFileNode rnode, BackendId backend)
 {
+	BackendRelFileNode brnode;
 	SMgrRelation reln;
 	bool		found;
 
@@ -142,7 +141,7 @@ smgropen(RelFileNode rnode)
 		HASHCTL		ctl;
 
 		MemSet(&ctl, 0, sizeof(ctl));
-		ctl.keysize = sizeof(RelFileNode);
+		ctl.keysize = sizeof(BackendRelFileNode);
 		ctl.entrysize = sizeof(SMgrRelationData);
 		ctl.hash = tag_hash;
 		SMgrRelationHash = hash_create("smgr relation table", 400,
@@ -150,8 +149,10 @@ smgropen(RelFileNode rnode)
 	}
 
 	/* Look up or create an entry */
+	brnode.node = rnode;
+	brnode.backend = backend;
 	reln = (SMgrRelation) hash_search(SMgrRelationHash,
-									  (void *) &rnode,
+									  (void *) &brnode,
 									  HASH_ENTER, &found);
 
 	/* Initialize it if not present before */
@@ -261,7 +262,7 @@ smgrcloseall(void)
  * such entry exists already.
  */
 void
-smgrclosenode(RelFileNode rnode)
+smgrclosenode(BackendRelFileNode rnode)
 {
 	SMgrRelation reln;
 
@@ -305,8 +306,8 @@ smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	 * should be here and not in commands/tablespace.c?  But that would imply
 	 * importing a lot of stuff that smgr.c oughtn't know, either.
 	 */
-	TablespaceCreateDbspace(reln->smgr_rnode.spcNode,
-							reln->smgr_rnode.dbNode,
+	TablespaceCreateDbspace(reln->smgr_rnode.node.spcNode,
+							reln->smgr_rnode.node.dbNode,
 							isRedo);
 
 	(*(smgrsw[reln->smgr_which].smgr_create)) (reln, forknum, isRedo);
@@ -323,29 +324,19 @@ smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
  *		already.
  */
 void
-smgrdounlink(SMgrRelation reln, ForkNumber forknum, bool isTemp, bool isRedo)
+smgrdounlink(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 {
-	RelFileNode rnode = reln->smgr_rnode;
+	BackendRelFileNode rnode = reln->smgr_rnode;
 	int			which = reln->smgr_which;
 
 	/* Close the fork */
 	(*(smgrsw[which].smgr_close)) (reln, forknum);
 
-	smgr_internal_unlink(rnode, forknum, which, isTemp, isRedo);
-}
-
-/*
- * Shared subroutine that actually does the unlink ...
- */
-static void
-smgr_internal_unlink(RelFileNode rnode, ForkNumber forknum,
-					 int which, bool isTemp, bool isRedo)
-{
 	/*
 	 * Get rid of any remaining buffers for the relation.  bufmgr will just
 	 * drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(rnode, forknum, isTemp, 0);
+	DropRelFileNodeBuffers(rnode, forknum, 0);
 
 	/*
 	 * It'd be nice to tell the stats collector to forget it immediately, too.
@@ -385,10 +376,10 @@ smgr_internal_unlink(RelFileNode rnode, ForkNumber forknum,
  */
 void
 smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		   char *buffer, bool isTemp)
+		   char *buffer, bool skipFsync)
 {
 	(*(smgrsw[reln->smgr_which].smgr_extend)) (reln, forknum, blocknum,
-											   buffer, isTemp);
+											   buffer, skipFsync);
 }
 
 /*
@@ -426,16 +417,16 @@ smgrread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
  *		on disk at return, only dumped out to the kernel.  However,
  *		provisions will be made to fsync the write before the next checkpoint.
  *
- *		isTemp indicates that the relation is a temp table (ie, is managed
- *		by the local-buffer manager).  In this case no provisions need be
- *		made to fsync the write before checkpointing.
+ *		skipFsync indicates that the caller will make other provisions to
+ *		fsync the relation, so we needn't bother.  Temporary relations also
+ *		do not require fsync.
  */
 void
 smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		  char *buffer, bool isTemp)
+		  char *buffer, bool skipFsync)
 {
 	(*(smgrsw[reln->smgr_which].smgr_write)) (reln, forknum, blocknum,
-											  buffer, isTemp);
+											  buffer, skipFsync);
 }
 
 /*
@@ -455,14 +446,13 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  * The truncation is done immediately, so this can't be rolled back.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
-			 bool isTemp)
+smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 {
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, isTemp, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -479,8 +469,7 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
 	/*
 	 * Do the truncation.
 	 */
-	(*(smgrsw[reln->smgr_which].smgr_truncate)) (reln, forknum, nblocks,
-												 isTemp);
+	(*(smgrsw[reln->smgr_which].smgr_truncate)) (reln, forknum, nblocks);
 }
 
 /*
@@ -499,7 +488,7 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
  *		to use the WAL log for PITR or replication purposes: in that case
  *		we have to make WAL entries as well.)
  *
- *		The preceding writes should specify isTemp = true to avoid
+ *		The preceding writes should specify skipFsync = true to avoid
  *		duplicative fsyncs.
  *
  *		Note that you need to do FlushRelationBuffers() first if there is
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index a4e0252..dac2d72 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -256,14 +256,14 @@ pg_tablespace_size_name(PG_FUNCTION_ARGS)
  * calculate size of (one fork of) a relation
  */
 static int64
-calculate_relation_size(RelFileNode *rfn, ForkNumber forknum)
+calculate_relation_size(RelFileNode *rfn, BackendId backend, ForkNumber forknum)
 {
 	int64		totalsize = 0;
 	char	   *relationpath;
 	char		pathname[MAXPGPATH];
 	unsigned int segcount = 0;
 
-	relationpath = relpath(*rfn, forknum);
+	relationpath = relpathbackend(*rfn, backend, forknum);
 
 	for (segcount = 0;; segcount++)
 	{
@@ -303,7 +303,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
 
 	rel = relation_open(relOid, AccessShareLock);
 
-	size = calculate_relation_size(&(rel->rd_node),
+	size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
 							  forkname_to_number(text_to_cstring(forkName)));
 
 	relation_close(rel, AccessShareLock);
@@ -327,12 +327,14 @@ calculate_toast_table_size(Oid toastrelid)
 
 	/* toast heap size, including FSM and VM size */
 	for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
-		size += calculate_relation_size(&(toastRel->rd_node), forkNum);
+		size += calculate_relation_size(&(toastRel->rd_node),
+										toastRel->rd_backend, forkNum);
 
 	/* toast index size, including FSM and VM size */
 	toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
 	for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
-		size += calculate_relation_size(&(toastIdxRel->rd_node), forkNum);
+		size += calculate_relation_size(&(toastIdxRel->rd_node),
+										toastIdxRel->rd_backend, forkNum);
 
 	relation_close(toastIdxRel, AccessShareLock);
 	relation_close(toastRel, AccessShareLock);
@@ -361,7 +363,8 @@ calculate_table_size(Oid relOid)
 	 * heap size, including FSM and VM
 	 */
 	for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
-		size += calculate_relation_size(&(rel->rd_node), forkNum);
+		size += calculate_relation_size(&(rel->rd_node), rel->rd_backend,
+										forkNum);
 
 	/*
 	 * Size of toast relation
@@ -404,7 +407,9 @@ calculate_indexes_size(Oid relOid)
 			idxRel = relation_open(idxOid, AccessShareLock);
 
 			for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
-				size += calculate_relation_size(&(idxRel->rd_node), forkNum);
+				size += calculate_relation_size(&(idxRel->rd_node),
+												idxRel->rd_backend,
+												forkNum);
 
 			relation_close(idxRel, AccessShareLock);
 		}
@@ -575,6 +580,7 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
 	HeapTuple	tuple;
 	Form_pg_class relform;
 	RelFileNode rnode;
+	BackendId	backend;
 	char	   *path;
 
 	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
@@ -612,12 +618,27 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
 			break;
 	}
 
-	ReleaseSysCache(tuple);
-
 	if (!OidIsValid(rnode.relNode))
+	{
+		ReleaseSysCache(tuple);
 		PG_RETURN_NULL();
+	}
+
+	/* If temporary, determine owning backend. */
+	if (!relform->relistemp)
+		backend = InvalidBackendId;
+	else if (isTempOrToastNamespace(relform->relnamespace))
+		backend = MyBackendId;
+	else
+	{
+		/* Do it the hard way. */
+		backend = GetTempNamespaceBackendId(relform->relnamespace);
+		Assert(backend != InvalidOid);
+	}
+
+	ReleaseSysCache(tuple);
 
-	path = relpath(rnode, MAIN_FORKNUM);
+	path = relpathbackend(rnode, backend, MAIN_FORKNUM);
 
 	PG_RETURN_TEXT_P(cstring_to_text(path));
 }
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 3b15d85..91de76d 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -1165,7 +1165,7 @@ CacheInvalidateRelcacheByRelid(Oid relid)
  * replaying WAL as well as when creating it.
  */
 void
-CacheInvalidateSmgr(RelFileNode rnode)
+CacheInvalidateSmgr(BackendRelFileNode rnode)
 {
 	SharedInvalidationMessage msg;
 
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index d462510..91c653f 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -858,10 +858,20 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 	relation->rd_createSubid = InvalidSubTransactionId;
 	relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
 	relation->rd_istemp = relation->rd_rel->relistemp;
-	if (relation->rd_istemp)
-		relation->rd_islocaltemp = isTempOrToastNamespace(relation->rd_rel->relnamespace);
+	if (!relation->rd_istemp)
+		relation->rd_backend = InvalidBackendId;
+	else if (isTempOrToastNamespace(relation->rd_rel->relnamespace))
+		relation->rd_backend = MyBackendId;
 	else
-		relation->rd_islocaltemp = false;
+	{
+		/*
+		 * If it's a temporary table, but not one of ours, we have to use
+		 * the slow, grotty method to figure out the owning backend.
+		 */
+		relation->rd_backend =
+			GetTempNamespaceBackendId(relation->rd_rel->relnamespace);
+		Assert(relation->rd_backend != InvalidOid);
+	}
 
 	/*
 	 * initialize the tuple descriptor (relation->rd_att).
@@ -1424,7 +1434,7 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_createSubid = InvalidSubTransactionId;
 	relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
 	relation->rd_istemp = false;
-	relation->rd_islocaltemp = false;
+	relation->rd_backend = InvalidBackendId;
 
 	/*
 	 * initialize relation tuple form
@@ -2515,7 +2525,7 @@ RelationBuildLocalRelation(const char *relname,
 
 	/* it is temporary if and only if it is in my temp-table namespace */
 	rel->rd_istemp = isTempOrToastNamespace(relnamespace);
-	rel->rd_islocaltemp = rel->rd_istemp;
+	rel->rd_backend = rel->rd_istemp ? MyBackendId : InvalidBackendId;
 
 	/*
 	 * create a new tuple descriptor from the one passed in.  We do this
@@ -2629,7 +2639,7 @@ void
 RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
 {
 	Oid			newrelfilenode;
-	RelFileNode newrnode;
+	BackendRelFileNode newrnode;
 	Relation	pg_class;
 	HeapTuple	tuple;
 	Form_pg_class classform;
@@ -2640,7 +2650,8 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
 		   TransactionIdIsNormal(freezeXid));
 
 	/* Allocate a new relfilenode */
-	newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL);
+	newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
+									   relation->rd_backend);
 
 	/*
 	 * Get a writable copy of the pg_class tuple for the given relation.
@@ -2660,9 +2671,10 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
 	 * NOTE: any conflict in relfilenode value will be caught here, if
 	 * GetNewRelFileNode messes up for any reason.
 	 */
-	newrnode = relation->rd_node;
-	newrnode.relNode = newrelfilenode;
-	RelationCreateStorage(newrnode, relation->rd_istemp);
+	newrnode.node = relation->rd_node;
+	newrnode.node.relNode = newrelfilenode;
+	newrnode.backend = relation->rd_backend;
+	RelationCreateStorage(newrnode.node, relation->rd_istemp);
 	smgrclosenode(newrnode);
 
 	/*
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 8ccb948..b06f8ff 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -55,7 +55,7 @@ provider postgresql {
 	probe sort__done(bool, long);
 
 	probe buffer__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, bool, bool);
-	probe buffer__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, bool, bool, bool);
+	probe buffer__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, bool, bool);
 	probe buffer__flush__start(ForkNumber, BlockNumber, Oid, Oid, Oid);
 	probe buffer__flush__done(ForkNumber, BlockNumber, Oid, Oid, Oid);
 
@@ -81,10 +81,10 @@ provider postgresql {
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
 
-	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid);
-	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int);
-	probe smgr__md__write__start(ForkNumber, BlockNumber, Oid, Oid, Oid);
-	probe smgr__md__write__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int);
+	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
+	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
+	probe smgr__md__write__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
+	probe smgr__md__write__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
 
 	probe xlog__insert(unsigned char, unsigned char);
 	probe xlog__switch();
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index bd430cb..765f571 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -25,10 +25,17 @@
 
 extern const char *forkNames[];
 extern ForkNumber forkname_to_number(char *forkName);
+extern int forkname_chars(const char *str);
 
-extern char *relpath(RelFileNode rnode, ForkNumber forknum);
+extern char *relpathbackend(RelFileNode rnode, BackendId backend,
+			  ForkNumber forknum);
 extern char *GetDatabasePath(Oid dbNode, Oid spcNode);
 
+#define relpath(rnode, forknum) \
+		relpathbackend((rnode).node, (rnode).backend, (forknum))
+#define relpathperm(rnode, forknum) \
+		relpathbackend((rnode), InvalidBackendId, (forknum))
+
 extern bool IsSystemRelation(Relation relation);
 extern bool IsToastRelation(Relation relation);
 
@@ -45,6 +52,7 @@ extern bool IsSharedRelation(Oid relationId);
 extern Oid	GetNewOid(Relation relation);
 extern Oid GetNewOidWithIndex(Relation relation, Oid indexId,
 				   AttrNumber oidcolumn);
-extern Oid	GetNewRelFileNode(Oid reltablespace, Relation pg_class);
+extern Oid	GetNewRelFileNode(Oid reltablespace, Relation pg_class,
+				  BackendId backend);
 
 #endif   /* CATALOG_H */
diff --git a/src/include/catalog/storage.h b/src/include/catalog/storage.h
index 609f1c6..7712d1f 100644
--- a/src/include/catalog/storage.h
+++ b/src/include/catalog/storage.h
@@ -30,8 +30,7 @@ extern void RelationTruncate(Relation rel, BlockNumber nblocks);
  * naming
  */
 extern void smgrDoPendingDeletes(bool isCommit);
-extern int smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr,
-					  bool *haveNonTemp);
+extern int smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr);
 extern void AtSubCommit_smgr(void);
 extern void AtSubAbort_smgr(void);
 extern void PostPrepare_smgr(void);
diff --git a/src/include/postmaster/bgwriter.h b/src/include/postmaster/bgwriter.h
index 06a4e37..a22fbee 100644
--- a/src/include/postmaster/bgwriter.h
+++ b/src/include/postmaster/bgwriter.h
@@ -27,7 +27,7 @@ extern void BackgroundWriterMain(void);
 extern void RequestCheckpoint(int flags);
 extern void CheckpointWriteDelay(int flags, double progress);
 
-extern bool ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum,
+extern bool ForwardFsyncRequest(BackendRelFileNode rnode, ForkNumber forknum,
 					BlockNumber segno);
 extern void AbsorbFsyncRequests(void);
 
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index f38f545..d4bf341 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -160,7 +160,7 @@ extern Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
 extern Buffer ReadBufferExtended(Relation reln, ForkNumber forkNum,
 				   BlockNumber blockNum, ReadBufferMode mode,
 				   BufferAccessStrategy strategy);
-extern Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp,
+extern Buffer ReadBufferWithoutRelcache(RelFileNode rnode,
 						  ForkNumber forkNum, BlockNumber blockNum,
 						  ReadBufferMode mode, BufferAccessStrategy strategy);
 extern void ReleaseBuffer(Buffer buffer);
@@ -180,8 +180,8 @@ extern BlockNumber BufferGetBlockNumber(Buffer buffer);
 extern BlockNumber RelationGetNumberOfBlocks(Relation relation);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
-extern void DropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum,
-					   bool istemp, BlockNumber firstDelBlock);
+extern void DropRelFileNodeBuffers(BackendRelFileNode rnode,
+					   ForkNumber forkNum, BlockNumber firstDelBlock);
 extern void DropDatabaseBuffers(Oid dbid);
 
 #ifdef NOT_USED
diff --git a/src/include/storage/relfilenode.h b/src/include/storage/relfilenode.h
index 6f0c0ad..9bb0fa2 100644
--- a/src/include/storage/relfilenode.h
+++ b/src/include/storage/relfilenode.h
@@ -14,6 +14,8 @@
 #ifndef RELFILENODE_H
 #define RELFILENODE_H
 
+#include "storage/backendid.h"
+
 /*
  * The physical storage of a relation consists of one or more forks. The
  * main fork is always created, but in addition to that there can be
@@ -37,7 +39,8 @@ typedef enum ForkNumber
 
 /*
  * RelFileNode must provide all that we need to know to physically access
- * a relation. Note, however, that a "physical" relation is comprised of
+ * a relation, with the exception of the backend ID, which can be provided
+ * separately. Note, however, that a "physical" relation is comprised of
  * multiple files on the filesystem, as each fork is stored as a separate
  * file, and each fork can be divided into multiple segments. See md.c.
  *
@@ -74,14 +77,30 @@ typedef struct RelFileNode
 } RelFileNode;
 
 /*
- * Note: RelFileNodeEquals compares relNode first since that is most likely
- * to be different in two unequal RelFileNodes.  It is probably redundant
- * to compare spcNode if the other two fields are found equal, but do it
- * anyway to be sure.
+ * Augmenting a relfilenode with the backend ID provides all the information
+ * we need to locate the physical storage.
+ */
+typedef struct BackendRelFileNode
+{
+	RelFileNode	node;
+	BackendId	backend;
+} BackendRelFileNode;
+
+/*
+ * Note: RelFileNodeEquals and BackendRelFileNodeEquals compare relNode first
+ * since that is most likely to be different in two unequal RelFileNodes.  It
+ * is probably redundant to compare spcNode if the other fields are found equal,
+ * but do it anyway to be sure.
  */
 #define RelFileNodeEquals(node1, node2) \
 	((node1).relNode == (node2).relNode && \
 	 (node1).dbNode == (node2).dbNode && \
 	 (node1).spcNode == (node2).spcNode)
 
+#define BackendRelFileNodeEquals(node1, node2) \
+	((node1).node.relNode == (node2).node.relNode && \
+	 (node1).node.dbNode == (node2).node.dbNode && \
+	 (node1).backend == (node2).backend && \
+	 (node1).node.spcNode == (node2).node.spcNode)
+
 #endif   /* RELFILENODE_H */
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index bbdde81..0168d17 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -92,7 +92,7 @@ typedef struct
 typedef struct
 {
 	int16		id;				/* type field --- must be first */
-	RelFileNode rnode;			/* physical file ID */
+	BackendRelFileNode rnode;	/* physical file ID */
 } SharedInvalSmgrMsg;
 
 #define SHAREDINVALRELMAP_ID	(-4)
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index cf248b8..7ae78ec 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -16,6 +16,7 @@
 
 #include "access/xlog.h"
 #include "fmgr.h"
+#include "storage/backendid.h"
 #include "storage/block.h"
 #include "storage/relfilenode.h"
 
@@ -38,7 +39,7 @@
 typedef struct SMgrRelationData
 {
 	/* rnode is the hashtable lookup key, so it must be first! */
-	RelFileNode smgr_rnode;		/* relation physical identifier */
+	BackendRelFileNode smgr_rnode;		/* relation physical identifier */
 
 	/* pointer to owning pointer, or NULL if none */
 	struct SMgrRelationData **smgr_owner;
@@ -68,28 +69,30 @@ typedef struct SMgrRelationData
 
 typedef SMgrRelationData *SMgrRelation;
 
+#define SmgrIsTemp(smgr) \
+	((smgr)->smgr_rnode.backend != InvalidBackendId)
 
 extern void smgrinit(void);
-extern SMgrRelation smgropen(RelFileNode rnode);
+extern SMgrRelation smgropen(RelFileNode rnode, BackendId backend);
 extern bool smgrexists(SMgrRelation reln, ForkNumber forknum);
 extern void smgrsetowner(SMgrRelation *owner, SMgrRelation reln);
 extern void smgrclose(SMgrRelation reln);
 extern void smgrcloseall(void);
-extern void smgrclosenode(RelFileNode rnode);
+extern void smgrclosenode(BackendRelFileNode rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, ForkNumber forknum,
-			 bool isTemp, bool isRedo);
+			 bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
-		   BlockNumber blocknum, char *buffer, bool isTemp);
+		   BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
 			 BlockNumber blocknum);
 extern void smgrread(SMgrRelation reln, ForkNumber forknum,
 		 BlockNumber blocknum, char *buffer);
 extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
-		  BlockNumber blocknum, char *buffer, bool isTemp);
+		  BlockNumber blocknum, char *buffer, bool skipFsync);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
 extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-			 BlockNumber nblocks, bool isTemp);
+			 BlockNumber nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void smgrpreckpt(void);
 extern void smgrsync(void);
@@ -103,27 +106,28 @@ extern void mdinit(void);
 extern void mdclose(SMgrRelation reln, ForkNumber forknum);
 extern void mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern bool mdexists(SMgrRelation reln, ForkNumber forknum);
-extern void mdunlink(RelFileNode rnode, ForkNumber forknum, bool isRedo);
+extern void mdunlink(BackendRelFileNode rnode, ForkNumber forknum, bool isRedo);
 extern void mdextend(SMgrRelation reln, ForkNumber forknum,
-		 BlockNumber blocknum, char *buffer, bool isTemp);
+		 BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void mdprefetch(SMgrRelation reln, ForkNumber forknum,
 		   BlockNumber blocknum);
 extern void mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	   char *buffer);
 extern void mdwrite(SMgrRelation reln, ForkNumber forknum,
-		BlockNumber blocknum, char *buffer, bool isTemp);
+		BlockNumber blocknum, char *buffer, bool skipFsync);
 extern BlockNumber mdnblocks(SMgrRelation reln, ForkNumber forknum);
 extern void mdtruncate(SMgrRelation reln, ForkNumber forknum,
-		   BlockNumber nblocks, bool isTemp);
+		   BlockNumber nblocks);
 extern void mdimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void mdpreckpt(void);
 extern void mdsync(void);
 extern void mdpostckpt(void);
 
 extern void SetForwardFsyncRequests(void);
-extern void RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum,
+extern void RememberFsyncRequest(BackendRelFileNode rnode, ForkNumber forknum,
 					 BlockNumber segno);
-extern void ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum);
+extern void ForgetRelationFsyncRequests(BackendRelFileNode rnode,
+							ForkNumber forknum);
 extern void ForgetDatabaseFsyncRequests(Oid dbid);
 
 /* smgrtype.c */
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index a86a17c..7fab919 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -49,7 +49,7 @@ extern void CacheInvalidateRelcacheByTuple(HeapTuple classTuple);
 
 extern void CacheInvalidateRelcacheByRelid(Oid relid);
 
-extern void CacheInvalidateSmgr(RelFileNode rnode);
+extern void CacheInvalidateSmgr(BackendRelFileNode rnode);
 
 extern void CacheInvalidateRelmap(Oid databaseId);
 
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 444e892..296c651 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -127,7 +127,7 @@ typedef struct RelationData
 	struct SMgrRelationData *rd_smgr;	/* cached file handle, or NULL */
 	int			rd_refcnt;		/* reference count */
 	bool		rd_istemp;		/* rel is a temporary relation */
-	bool		rd_islocaltemp; /* rel is a temp rel of this session */
+	BackendId	rd_backend;		/* owning backend id, if temporary relation */
 	bool		rd_isnailed;	/* rel is nailed in cache */
 	bool		rd_isvalid;		/* relcache entry is valid */
 	char		rd_indexvalid;	/* state of rd_indexlist: 0 = not valid, 1 =
@@ -347,7 +347,7 @@ typedef struct StdRdOptions
 #define RelationOpenSmgr(relation) \
 	do { \
 		if ((relation)->rd_smgr == NULL) \
-			smgrsetowner(&((relation)->rd_smgr), smgropen((relation)->rd_node)); \
+			smgrsetowner(&((relation)->rd_smgr), smgropen((relation)->rd_node, (relation)->rd_backend)); \
 	} while (0)
 
 /*
@@ -393,7 +393,7 @@ typedef struct StdRdOptions
  * Beware of multiple eval of argument
  */
 #define RELATION_IS_LOCAL(relation) \
-	((relation)->rd_islocaltemp || \
+	((relation)->rd_backend == MyBackendId || \
 	 (relation)->rd_createSubid != InvalidSubTransactionId)
 
 /*
@@ -403,7 +403,7 @@ typedef struct StdRdOptions
  * Beware of multiple eval of argument
  */
 #define RELATION_IS_OTHER_TEMP(relation) \
-	((relation)->rd_istemp && !(relation)->rd_islocaltemp)
+	((relation)->rd_istemp && (relation)->rd_backend != MyBackendId)
 
 /* routines in utils/cache/relcache.c */
 extern void RelationIncrementReferenceCount(Relation rel);

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Robert Haas (#1)

1 attachment(s)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, Jun 10, 2010 at 4:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:

For previous discussion of this topic, see:

http://archives.postgresql.org/pgsql-hackers/2010-04/msg01181.php
http://archives.postgresql.org/pgsql-hackers/2010-05/msg00352.php
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00302.php

As in the original version of the patch, I have not simply added
backend ID to RelFileNode, because there are too many places using
RelFileNode in contexts where the backend ID can be determined from
context, such as the shared and local buffer managers and the xlog
code. Instead, I have introduced BackendRelFileNode for contexts
where we need both the RelFileNode and the backend ID. The smgr layer
has to use BackendRelFileNode across the board, since it operates on
both permanent and temporary relation, including - potentially -
temporary relations of other backends. smgr invalidations must also
include the backend ID, as must communication between regular backends
and the bgwriter. The relcache now stores rd_backend instead of
rd_islocaltemp so that it remains straightforward to call smgropen()
based on a relcache entry. Some smgr functions no longer require an
isTemp argument, because they can infer the necessary information from
their BackendRelFileNode. smgrwrite() and smgrextend() now take a
skipFsync argument rather than an isTemp argument.

In this version of the patch, I've improved the temporary file cleanup
mechanism. In CVS HEAD, when a transaction commits or aborts, we
write an XLOG record with all relations that must be unlinked,
including temporary relations. With this patch, we no longer include
temporary relations in the XLOG records (which may be a tiny
performance benefit for some people); instead, on every startup of the
database cluster, we just nuke all temporary relation files (which is
now easy to do, since they are named differently than files for
non-temporary relations) at the same time that we nuke other temp
files. This produces slightly different behavior. In CVS HEAD,
temporary files get removed whenever an xlog redo happens, so either
at cluster start or after a backend crash, but only to the extent that
they appear in WAL. With this patch, we can be sure of removing ALL
stray files, which is maybe ever-so-slightly leaky in CVS HEAD. But
on the other hand, because it hooks into the existing temporary file
cleanup code, it only happens at cluster startup, NOT after a backend
crash. The existing coding leaves temporary sort files and similar
around after a backend crash for forensics purposes. That might or
might not be worth rethinking for non-debug builds, but I don't think
there's any very good reason to think that temporary relation files
need to be handled differently than temporary work files.

I believe that this patch will clear away one major obstacle to
implementing global temporary and unlogged tables: it enables us to be
sure of cleaning up properly after a crash without relying on catalog
entries or XLOG. Based on previous discussions, however, I believe
there is support for making this change independently of what becomes
of that project, just for the benefit of having a more robust cleanup
mechanism.

Updated patch to remove minor bitrot.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachments:

temprelnames-v3.patchapplication/octet-stream; name=temprelnames-v3.patchDownload

diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d1f7bcc..9645c95 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -373,8 +373,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	}
 
 	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks,
-				 rel->rd_istemp);
+	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
 
 	/*
 	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 89ed8a0..06e304e 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -295,9 +295,8 @@ _bt_blwritepage(BTWriteState *wstate, Page page, BlockNumber blkno)
 	}
 
 	/*
-	 * Now write the page.	We say isTemp = true even if it's not a temp
-	 * index, because there's no need for smgr to schedule an fsync for this
-	 * write; we'll do it ourselves before ending the build.
+	 * Now write the page.	There's no need for smgr to schedule an fsync for
+	 * this write; we'll do it ourselves before ending the build.
 	 */
 	if (blkno == wstate->btws_pages_written)
 	{
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d0680f0..615a7fa 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -865,8 +865,8 @@ StartPrepare(GlobalTransaction gxact)
 	hdr.prepared_at = gxact->prepared_at;
 	hdr.owner = gxact->owner;
 	hdr.nsubxacts = xactGetCommittedChildren(&children);
-	hdr.ncommitrels = smgrGetPendingDeletes(true, &commitrels, NULL);
-	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels, NULL);
+	hdr.ncommitrels = smgrGetPendingDeletes(true, &commitrels);
+	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels);
 	hdr.ninvalmsgs = xactGetCommittedInvalidationMessages(&invalmsgs,
 														  &hdr.initfileinval);
 	StrNCpy(hdr.gid, gxact->gid, GIDSIZE);
@@ -1320,13 +1320,13 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	}
 	for (i = 0; i < ndelrels; i++)
 	{
-		SMgrRelation srel = smgropen(delrels[i]);
+		SMgrRelation srel = smgropen(delrels[i], InvalidBackendId);
 		ForkNumber	fork;
 
 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
 		{
 			if (smgrexists(srel, fork))
-				smgrdounlink(srel, fork, false, false);
+				smgrdounlink(srel, fork, false);
 		}
 		smgrclose(srel);
 	}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 6ba6534..d69e8db 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -890,7 +890,6 @@ RecordTransactionCommit(void)
 	TransactionId latestXid = InvalidTransactionId;
 	int			nrels;
 	RelFileNode *rels;
-	bool		haveNonTemp;
 	int			nchildren;
 	TransactionId *children;
 	int			nmsgs;
@@ -898,7 +897,7 @@ RecordTransactionCommit(void)
 	bool		RelcacheInitFileInval;
 
 	/* Get data needed for commit record */
-	nrels = smgrGetPendingDeletes(true, &rels, &haveNonTemp);
+	nrels = smgrGetPendingDeletes(true, &rels);
 	nchildren = xactGetCommittedChildren(&children);
 	nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 												 &RelcacheInitFileInval);
@@ -1025,7 +1024,7 @@ RecordTransactionCommit(void)
 	 * asynchronous commit if all to-be-deleted tables are temporary though,
 	 * since they are lost anyway if we crash.)
 	 */
-	if (XactSyncCommit || forceSyncCommit || haveNonTemp)
+	if (XactSyncCommit || forceSyncCommit || nrels > 0)
 	{
 		/*
 		 * Synchronous commit case:
@@ -1311,7 +1310,7 @@ RecordTransactionAbort(bool isSubXact)
 			 xid);
 
 	/* Fetch the data we need for the abort record */
-	nrels = smgrGetPendingDeletes(false, &rels, NULL);
+	nrels = smgrGetPendingDeletes(false, &rels);
 	nchildren = xactGetCommittedChildren(&children);
 
 	/* XXX do we really need a critical section here? */
@@ -4451,7 +4450,7 @@ xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid, XLogRecPtr lsn)
 	/* Make sure files supposed to be dropped are dropped */
 	for (i = 0; i < xlrec->nrels; i++)
 	{
-		SMgrRelation srel = smgropen(xlrec->xnodes[i]);
+		SMgrRelation srel = smgropen(xlrec->xnodes[i], InvalidBackendId);
 		ForkNumber	fork;
 
 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
@@ -4459,7 +4458,7 @@ xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid, XLogRecPtr lsn)
 			if (smgrexists(srel, fork))
 			{
 				XLogDropRelation(xlrec->xnodes[i], fork);
-				smgrdounlink(srel, fork, false, true);
+				smgrdounlink(srel, fork, true);
 			}
 		}
 		smgrclose(srel);
@@ -4556,7 +4555,7 @@ xact_redo_abort(xl_xact_abort *xlrec, TransactionId xid)
 	/* Make sure files supposed to be dropped are dropped */
 	for (i = 0; i < xlrec->nrels; i++)
 	{
-		SMgrRelation srel = smgropen(xlrec->xnodes[i]);
+		SMgrRelation srel = smgropen(xlrec->xnodes[i], InvalidBackendId);
 		ForkNumber	fork;
 
 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
@@ -4564,7 +4563,7 @@ xact_redo_abort(xl_xact_abort *xlrec, TransactionId xid)
 			if (smgrexists(srel, fork))
 			{
 				XLogDropRelation(xlrec->xnodes[i], fork);
-				smgrdounlink(srel, fork, false, true);
+				smgrdounlink(srel, fork, true);
 			}
 		}
 		smgrclose(srel);
@@ -4638,7 +4637,7 @@ xact_desc_commit(StringInfo buf, xl_xact_commit *xlrec)
 		appendStringInfo(buf, "; rels:");
 		for (i = 0; i < xlrec->nrels; i++)
 		{
-			char	   *path = relpath(xlrec->xnodes[i], MAIN_FORKNUM);
+			char	   *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);
 
 			appendStringInfo(buf, " %s", path);
 			pfree(path);
@@ -4693,7 +4692,7 @@ xact_desc_abort(StringInfo buf, xl_xact_abort *xlrec)
 		appendStringInfo(buf, "; rels:");
 		for (i = 0; i < xlrec->nrels; i++)
 		{
-			char	   *path = relpath(xlrec->xnodes[i], MAIN_FORKNUM);
+			char	   *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);
 
 			appendStringInfo(buf, " %s", path);
 			pfree(path);
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b5bbc5e..dba842c 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -68,7 +68,7 @@ log_invalid_page(RelFileNode node, ForkNumber forkno, BlockNumber blkno,
 	 */
 	if (log_min_messages <= DEBUG1 || client_min_messages <= DEBUG1)
 	{
-		char	   *path = relpath(node, forkno);
+		char	   *path = relpathperm(node, forkno);
 
 		if (present)
 			elog(DEBUG1, "page %u of relation %s is uninitialized",
@@ -133,7 +133,7 @@ forget_invalid_pages(RelFileNode node, ForkNumber forkno, BlockNumber minblkno)
 		{
 			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
 			{
-				char	   *path = relpath(hentry->key.node, forkno);
+				char	   *path = relpathperm(hentry->key.node, forkno);
 
 				elog(DEBUG2, "page %u of relation %s has been dropped",
 					 hentry->key.blkno, path);
@@ -166,7 +166,7 @@ forget_invalid_pages_db(Oid dbid)
 		{
 			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
 			{
-				char	   *path = relpath(hentry->key.node, hentry->key.forkno);
+				char	   *path = relpathperm(hentry->key.node, hentry->key.forkno);
 
 				elog(DEBUG2, "page %u of relation %s has been dropped",
 					 hentry->key.blkno, path);
@@ -200,7 +200,7 @@ XLogCheckInvalidPages(void)
 	 */
 	while ((hentry = (xl_invalid_page *) hash_seq_search(&status)) != NULL)
 	{
-		char	   *path = relpath(hentry->key.node, hentry->key.forkno);
+		char	   *path = relpathperm(hentry->key.node, hentry->key.forkno);
 
 		if (hentry->present)
 			elog(WARNING, "page %u of relation %s was uninitialized",
@@ -276,7 +276,7 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	Assert(blkno != P_NEW);
 
 	/* Open the relation at smgr level */
-	smgr = smgropen(rnode);
+	smgr = smgropen(rnode, InvalidBackendId);
 
 	/*
 	 * Create the target file if it doesn't already exist.  This lets us cope
@@ -293,7 +293,7 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	if (blkno < lastblock)
 	{
 		/* page exists in file */
-		buffer = ReadBufferWithoutRelcache(rnode, false, forknum, blkno,
+		buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno,
 										   mode, NULL);
 	}
 	else
@@ -312,7 +312,7 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 		{
 			if (buffer != InvalidBuffer)
 				ReleaseBuffer(buffer);
-			buffer = ReadBufferWithoutRelcache(rnode, false, forknum,
+			buffer = ReadBufferWithoutRelcache(rnode, forknum,
 											   P_NEW, mode, NULL);
 			lastblock++;
 		}
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 3edfc23..5ec5602 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -78,12 +78,37 @@ forkname_to_number(char *forkName)
 }
 
 /*
- * relpath			- construct path to a relation's file
+ * forkname_chars
+ * 		We use this to figure out whether a filename could be a relation
+ * 		fork (as opposed to an oddly named stray file that somehow ended
+ * 		up in the database directory).  If the passed string begins with
+ * 		a fork name (other than the main fork name), we return its length.
+ * 		If not, we return 0.
+ *
+ * Note that the present coding assumes that there are no fork names which
+ * are prefixes of other fork names.
+ */
+int
+forkname_chars(const char *str)
+{
+	ForkNumber	forkNum;
+
+	for (forkNum = 1; forkNum <= MAX_FORKNUM; forkNum++)
+	{
+		int len = strlen(forkNames[forkNum]);
+		if (strncmp(forkNames[forkNum], str, len) == 0)
+			return len;
+	}
+	return 0;
+}
+
+/*
+ * relpathbackend - construct path to a relation's file
  *
  * Result is a palloc'd string.
  */
 char *
-relpath(RelFileNode rnode, ForkNumber forknum)
+relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
 {
 	int			pathlen;
 	char	   *path;
@@ -92,6 +117,7 @@ relpath(RelFileNode rnode, ForkNumber forknum)
 	{
 		/* Shared system relations live in {datadir}/global */
 		Assert(rnode.dbNode == 0);
+		Assert(backend == InvalidBackendId);
 		pathlen = 7 + OIDCHARS + 1 + FORKNAMECHARS + 1;
 		path = (char *) palloc(pathlen);
 		if (forknum != MAIN_FORKNUM)
@@ -103,29 +129,69 @@ relpath(RelFileNode rnode, ForkNumber forknum)
 	else if (rnode.spcNode == DEFAULTTABLESPACE_OID)
 	{
 		/* The default tablespace is {datadir}/base */
-		pathlen = 5 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
-		path = (char *) palloc(pathlen);
-		if (forknum != MAIN_FORKNUM)
-			snprintf(path, pathlen, "base/%u/%u_%s",
-					 rnode.dbNode, rnode.relNode, forkNames[forknum]);
+		if (backend == InvalidBackendId)
+		{
+			pathlen = 5 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
+			path = (char *) palloc(pathlen);
+			if (forknum != MAIN_FORKNUM)
+				snprintf(path, pathlen, "base/%u/%u_%s",
+						 rnode.dbNode, rnode.relNode,
+						 forkNames[forknum]);
+			else
+				snprintf(path, pathlen, "base/%u/%u",
+						 rnode.dbNode, rnode.relNode);
+		}
 		else
-			snprintf(path, pathlen, "base/%u/%u",
-					 rnode.dbNode, rnode.relNode);
+		{
+			/* OIDCHARS will suffice for an integer, too */
+			pathlen = 5 + OIDCHARS + 2 + OIDCHARS + 1 + OIDCHARS + 1
+					+ FORKNAMECHARS + 1;
+			path = (char *) palloc(pathlen);
+			if (forknum != MAIN_FORKNUM)
+				snprintf(path, pathlen, "base/%u/t%d_%u_%s",
+						 rnode.dbNode, backend, rnode.relNode,
+						 forkNames[forknum]);
+			else
+				snprintf(path, pathlen, "base/%u/t%d_%u",
+						 rnode.dbNode, backend, rnode.relNode);
+		}
 	}
 	else
 	{
 		/* All other tablespaces are accessed via symlinks */
-		pathlen = 9 + 1 + OIDCHARS + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) +
-			1 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
-		path = (char *) palloc(pathlen);
-		if (forknum != MAIN_FORKNUM)
-			snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u_%s",
-					 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
-					 rnode.dbNode, rnode.relNode, forkNames[forknum]);
+		if (backend == InvalidBackendId)
+		{
+			pathlen = 9 + 1 + OIDCHARS + 1
+					+ strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 1
+					+ OIDCHARS + 1 + FORKNAMECHARS + 1;
+			path = (char *) palloc(pathlen);
+			if (forknum != MAIN_FORKNUM)
+				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u_%s",
+						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
+						 rnode.dbNode, rnode.relNode,
+						 forkNames[forknum]);
+			else
+				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u",
+						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
+						 rnode.dbNode, rnode.relNode);
+		}
 		else
-			snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u",
-					 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
-					 rnode.dbNode, rnode.relNode);
+		{
+			/* OIDCHARS will suffice for an integer, too */
+			pathlen = 9 + 1 + OIDCHARS + 1
+					+ strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 2
+					+ OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
+			path = (char *) palloc(pathlen);
+			if (forknum != MAIN_FORKNUM)
+				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/t%d_%u_%s",
+						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
+						 rnode.dbNode, backend, rnode.relNode,
+						 forkNames[forknum]);
+			else
+				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/t%d_%u",
+						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
+						 rnode.dbNode, backend, rnode.relNode);
+		}
 	}
 	return path;
 }
@@ -458,16 +524,17 @@ GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
  * created by bootstrap have preassigned OIDs, so there's no need.
  */
 Oid
-GetNewRelFileNode(Oid reltablespace, Relation pg_class)
+GetNewRelFileNode(Oid reltablespace, Relation pg_class, BackendId backend)
 {
-	RelFileNode rnode;
+	BackendRelFileNode rnode;
 	char	   *rpath;
 	int			fd;
 	bool		collides;
 
 	/* This logic should match RelationInitPhysicalAddr */
-	rnode.spcNode = reltablespace ? reltablespace : MyDatabaseTableSpace;
-	rnode.dbNode = (rnode.spcNode == GLOBALTABLESPACE_OID) ? InvalidOid : MyDatabaseId;
+	rnode.node.spcNode = reltablespace ? reltablespace : MyDatabaseTableSpace;
+	rnode.node.dbNode = (rnode.node.spcNode == GLOBALTABLESPACE_OID) ? InvalidOid : MyDatabaseId;
+	rnode.backend = backend;
 
 	do
 	{
@@ -475,9 +542,9 @@ GetNewRelFileNode(Oid reltablespace, Relation pg_class)
 
 		/* Generate the OID */
 		if (pg_class)
-			rnode.relNode = GetNewOid(pg_class);
+			rnode.node.relNode = GetNewOid(pg_class);
 		else
-			rnode.relNode = GetNewObjectId();
+			rnode.node.relNode = GetNewObjectId();
 
 		/* Check for existing file of same name */
 		rpath = relpath(rnode, MAIN_FORKNUM);
@@ -508,5 +575,5 @@ GetNewRelFileNode(Oid reltablespace, Relation pg_class)
 		pfree(rpath);
 	} while (collides);
 
-	return rnode.relNode;
+	return rnode.node.relNode;
 }
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index d848ef0..6e852b4 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -39,6 +39,7 @@
 #include "catalog/heap.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_attrdef.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
@@ -975,7 +976,9 @@ heap_create_with_catalog(const char *relname,
 			binary_upgrade_next_toast_relfilenode = InvalidOid;
 		}
 		else
-			relid = GetNewRelFileNode(reltablespace, pg_class_desc);
+			relid = GetNewRelFileNode(reltablespace, pg_class_desc,
+									  isTempOrToastNamespace(relnamespace) ?
+										  MyBackendId : InvalidBackendId);
 	}
 
 	/*
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 69946fe..3408a10 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -645,7 +645,12 @@ index_create(Oid heapRelationId,
 			binary_upgrade_next_index_relfilenode = InvalidOid;
 		}
 		else
-			indexRelationId = GetNewRelFileNode(tableSpaceId, pg_class);
+		{
+			indexRelationId =
+				GetNewRelFileNode(tableSpaceId, pg_class,
+								  heapRelation->rd_istemp ?
+									MyBackendId : InvalidBackendId);
+		}
 	}
 
 	/*
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index ad376a1..4515741 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -52,7 +52,7 @@
 typedef struct PendingRelDelete
 {
 	RelFileNode relnode;		/* relation that may need to be deleted */
-	bool		isTemp;			/* is it a temporary relation? */
+	BackendId	backend;		/* InvalidBackendId if not a temp rel */
 	bool		atCommit;		/* T=delete at commit; F=delete at abort */
 	int			nestLevel;		/* xact nesting level of request */
 	struct PendingRelDelete *next;		/* linked-list link */
@@ -102,8 +102,9 @@ RelationCreateStorage(RelFileNode rnode, bool istemp)
 	XLogRecData rdata;
 	xl_smgr_create xlrec;
 	SMgrRelation srel;
+	BackendId	backend = istemp ? MyBackendId : InvalidBackendId;
 
-	srel = smgropen(rnode);
+	srel = smgropen(rnode, backend);
 	smgrcreate(srel, MAIN_FORKNUM, false);
 
 	if (!istemp)
@@ -125,7 +126,7 @@ RelationCreateStorage(RelFileNode rnode, bool istemp)
 	pending = (PendingRelDelete *)
 		MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
 	pending->relnode = rnode;
-	pending->isTemp = istemp;
+	pending->backend = backend;
 	pending->atCommit = false;	/* delete if abort */
 	pending->nestLevel = GetCurrentTransactionNestLevel();
 	pending->next = pendingDeletes;
@@ -145,7 +146,7 @@ RelationDropStorage(Relation rel)
 	pending = (PendingRelDelete *)
 		MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
 	pending->relnode = rel->rd_node;
-	pending->isTemp = rel->rd_istemp;
+	pending->backend = rel->rd_backend;
 	pending->atCommit = true;	/* delete if commit */
 	pending->nestLevel = GetCurrentTransactionNestLevel();
 	pending->next = pendingDeletes;
@@ -283,7 +284,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	}
 
 	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks, rel->rd_istemp);
+	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
 }
 
 /*
@@ -322,14 +323,11 @@ smgrDoPendingDeletes(bool isCommit)
 				SMgrRelation srel;
 				int			i;
 
-				srel = smgropen(pending->relnode);
+				srel = smgropen(pending->relnode, pending->backend);
 				for (i = 0; i <= MAX_FORKNUM; i++)
 				{
 					if (smgrexists(srel, i))
-						smgrdounlink(srel,
-									 i,
-									 pending->isTemp,
-									 false);
+						smgrdounlink(srel, i, false);
 				}
 				smgrclose(srel);
 			}
@@ -341,20 +339,24 @@ smgrDoPendingDeletes(bool isCommit)
 }
 
 /*
- * smgrGetPendingDeletes() -- Get a list of relations to be deleted.
+ * smgrGetPendingDeletes() -- Get a list of non-temp relations to be deleted.
  *
  * The return value is the number of relations scheduled for termination.
  * *ptr is set to point to a freshly-palloc'd array of RelFileNodes.
  * If there are no relations to be deleted, *ptr is set to NULL.
  *
- * If haveNonTemp isn't NULL, the bool it points to gets set to true if
- * there is any non-temp table pending to be deleted; false if not.
+ * Only non-temporary relations are included in the returned list.  This is OK
+ * because the list is used only in contexts where temporary relations don't
+ * matter: we're either writing to the two-phase state file (and transactions
+ * that have touched temp tables can't be prepared) or we're writing to xlog
+ * (and all temporary files will be zapped if we restart anyway, so no need
+ * for redo to do it also).
  *
  * Note that the list does not include anything scheduled for termination
  * by upper-level transactions.
  */
 int
-smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr, bool *haveNonTemp)
+smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr)
 {
 	int			nestLevel = GetCurrentTransactionNestLevel();
 	int			nrels;
@@ -362,11 +364,10 @@ smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr, bool *haveNonTemp)
 	PendingRelDelete *pending;
 
 	nrels = 0;
-	if (haveNonTemp)
-		*haveNonTemp = false;
 	for (pending = pendingDeletes; pending != NULL; pending = pending->next)
 	{
-		if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit)
+		if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit
+			&& pending->backend == InvalidBackendId)
 			nrels++;
 	}
 	if (nrels == 0)
@@ -378,13 +379,12 @@ smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr, bool *haveNonTemp)
 	*ptr = rptr;
 	for (pending = pendingDeletes; pending != NULL; pending = pending->next)
 	{
-		if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit)
+		if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit
+			&& pending->backend == InvalidBackendId)
 		{
 			*rptr = pending->relnode;
 			rptr++;
 		}
-		if (haveNonTemp && !pending->isTemp)
-			*haveNonTemp = true;
 	}
 	return nrels;
 }
@@ -456,7 +456,7 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
 		xl_smgr_create *xlrec = (xl_smgr_create *) XLogRecGetData(record);
 		SMgrRelation reln;
 
-		reln = smgropen(xlrec->rnode);
+		reln = smgropen(xlrec->rnode, InvalidBackendId);
 		smgrcreate(reln, MAIN_FORKNUM, true);
 	}
 	else if (info == XLOG_SMGR_TRUNCATE)
@@ -465,7 +465,7 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
 		SMgrRelation reln;
 		Relation	rel;
 
-		reln = smgropen(xlrec->rnode);
+		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
 		/*
 		 * Forcibly create relation if it doesn't exist (which suggests that
@@ -475,7 +475,7 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
 		 */
 		smgrcreate(reln, MAIN_FORKNUM, true);
 
-		smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno, false);
+		smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
 
 		/* Also tell xlogutils.c about it */
 		XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
@@ -502,7 +502,7 @@ smgr_desc(StringInfo buf, uint8 xl_info, char *rec)
 	if (info == XLOG_SMGR_CREATE)
 	{
 		xl_smgr_create *xlrec = (xl_smgr_create *) rec;
-		char	   *path = relpath(xlrec->rnode, MAIN_FORKNUM);
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
 
 		appendStringInfo(buf, "file create: %s", path);
 		pfree(path);
@@ -510,7 +510,7 @@ smgr_desc(StringInfo buf, uint8 xl_info, char *rec)
 	else if (info == XLOG_SMGR_TRUNCATE)
 	{
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) rec;
-		char	   *path = relpath(xlrec->rnode, MAIN_FORKNUM);
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
 
 		appendStringInfo(buf, "file truncate: %s to %u blocks", path,
 						 xlrec->blkno);
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 435dfdd..54cdc4a 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -195,7 +195,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
 	 * Toast tables for regular relations go in pg_toast; those for temp
 	 * relations go into the per-backend temp-toast-table namespace.
 	 */
-	if (rel->rd_islocaltemp)
+	if (rel->rd_backend == MyBackendId)
 		namespaceid = GetTempToastNamespace();
 	else
 		namespaceid = PG_TOAST_NAMESPACE;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9d46e47..3d61116 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1022,7 +1022,7 @@ DoCopy(const CopyStmt *stmt, const char *queryString)
 		}
 
 		/* check read-only transaction */
-		if (XactReadOnly && is_from && !cstate->rel->rd_islocaltemp)
+		if (XactReadOnly && is_from && cstate->rel->rd_backend != MyBackendId)
 			PreventCommandIfReadOnly("COPY FROM");
 
 		/* Don't allow COPY w/ OIDs to or from a table without them */
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index f52e1d8..f6867f4 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -470,7 +470,7 @@ nextval_internal(Oid relid)
 						RelationGetRelationName(seqrel))));
 
 	/* read-only transactions may only modify temp sequences */
-	if (!seqrel->rd_islocaltemp)
+	if (seqrel->rd_backend != MyBackendId)
 		PreventCommandIfReadOnly("nextval()");
 
 	if (elm->last != elm->cached)		/* some numbers were cached */
@@ -747,7 +747,7 @@ do_setval(Oid relid, int64 next, bool iscalled)
 						RelationGetRelationName(seqrel))));
 
 	/* read-only transactions may only modify temp sequences */
-	if (!seqrel->rd_islocaltemp)
+	if (seqrel->rd_backend != MyBackendId)
 		PreventCommandIfReadOnly("setval()");
 
 	/* lock page' buffer and read tuple */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 5f6fe41..5f6818f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6968,13 +6968,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace)
 	 * Relfilenodes are not unique across tablespaces, so we need to allocate
 	 * a new one in the new tablespace.
 	 */
-	newrelfilenode = GetNewRelFileNode(newTableSpace, NULL);
+	newrelfilenode = GetNewRelFileNode(newTableSpace, NULL, rel->rd_backend);
 
 	/* Open old and new relation */
 	newrnode = rel->rd_node;
 	newrnode.relNode = newrelfilenode;
 	newrnode.spcNode = newTableSpace;
-	dstrel = smgropen(newrnode);
+	dstrel = smgropen(newrnode, rel->rd_backend);
 
 	RelationOpenSmgr(rel);
 
@@ -7056,7 +7056,7 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
 
 		/* XLOG stuff */
 		if (use_wal)
-			log_newpage(&dst->smgr_rnode, forkNum, blkno, page);
+			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);
 
 		/*
 		 * Now write the page.	We say isTemp = true even if it's not a temp
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 95e9d37..cf79de3 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -113,7 +113,7 @@
  */
 typedef struct
 {
-	RelFileNode rnode;
+	BackendRelFileNode rnode;
 	ForkNumber	forknum;
 	BlockNumber segno;			/* see md.c for special values */
 	/* might add a real request-type field later; not needed yet */
@@ -1071,7 +1071,8 @@ RequestCheckpoint(int flags)
  * than we have to here.
  */
 bool
-ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
+ForwardFsyncRequest(BackendRelFileNode rnode, ForkNumber forknum,
+					BlockNumber segno)
 {
 	BgWriterRequest *request;
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index caae936..4dd894b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -95,7 +95,8 @@ static void WaitIO(volatile BufferDesc *buf);
 static bool StartBufferIO(volatile BufferDesc *buf, bool forInput);
 static void TerminateBufferIO(volatile BufferDesc *buf, bool clear_dirty,
 				  int set_flag_bits);
-static void buffer_write_error_callback(void *arg);
+static void shared_buffer_write_error_callback(void *arg);
+static void local_buffer_write_error_callback(void *arg);
 static volatile BufferDesc *BufferAlloc(SMgrRelation smgr, ForkNumber forkNum,
 			BlockNumber blockNum,
 			BufferAccessStrategy strategy,
@@ -141,7 +142,8 @@ PrefetchBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum)
 		int			buf_id;
 
 		/* create a tag so we can lookup the buffer */
-		INIT_BUFFERTAG(newTag, reln->rd_smgr->smgr_rnode, forkNum, blockNum);
+		INIT_BUFFERTAG(newTag, reln->rd_smgr->smgr_rnode.node,
+					   forkNum, blockNum);
 
 		/* determine its hash code and partition lock ID */
 		newHash = BufTableHashCode(&newTag);
@@ -251,18 +253,21 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
  * ReadBufferWithoutRelcache -- like ReadBufferExtended, but doesn't require
  *		a relcache entry for the relation.
  *
- * NB: caller is assumed to know what it's doing if isTemp is true.
+ * NB: At present, this function may not be used on temporary relations, which
+ * is OK, because we only use it during XLOG replay.  If in the future we
+ * want to use it on temporary relations, we could pass the backend ID as an
+ * additional parameter.
  */
 Buffer
-ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp,
-						  ForkNumber forkNum, BlockNumber blockNum,
-						  ReadBufferMode mode, BufferAccessStrategy strategy)
+ReadBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
+						  BlockNumber blockNum, ReadBufferMode mode,
+						  BufferAccessStrategy strategy)
 {
 	bool		hit;
 
-	SMgrRelation smgr = smgropen(rnode);
+	SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
 
-	return ReadBuffer_common(smgr, isTemp, forkNum, blockNum, mode, strategy,
+	return ReadBuffer_common(smgr, false, forkNum, blockNum, mode, strategy,
 							 &hit);
 }
 
@@ -414,7 +419,7 @@ ReadBuffer_common(SMgrRelation smgr, bool isLocalBuf, ForkNumber forkNum,
 	{
 		/* new buffers are zero-filled */
 		MemSet((char *) bufBlock, 0, BLCKSZ);
-		smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, isLocalBuf);
+		smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
 	}
 	else
 	{
@@ -465,10 +470,10 @@ ReadBuffer_common(SMgrRelation smgr, bool isLocalBuf, ForkNumber forkNum,
 		VacuumCostBalance += VacuumCostPageMiss;
 
 	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-									  smgr->smgr_rnode.spcNode,
-									  smgr->smgr_rnode.dbNode,
-									  smgr->smgr_rnode.relNode,
-									  isLocalBuf,
+									  smgr->smgr_rnode.node.spcNode,
+									  smgr->smgr_rnode.node.dbNode,
+									  smgr->smgr_rnode.node.relNode,
+									  smgr->smgr_rnode.backend,
 									  isExtend,
 									  found);
 
@@ -512,7 +517,7 @@ BufferAlloc(SMgrRelation smgr, ForkNumber forkNum,
 	bool		valid;
 
 	/* create a tag so we can lookup the buffer */
-	INIT_BUFFERTAG(newTag, smgr->smgr_rnode, forkNum, blockNum);
+	INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
 
 	/* determine its hash code and partition lock ID */
 	newHash = BufTableHashCode(&newTag);
@@ -1693,21 +1698,24 @@ PrintBufferLeakWarning(Buffer buffer)
 	volatile BufferDesc *buf;
 	int32		loccount;
 	char	   *path;
+	BackendId	backend;
 
 	Assert(BufferIsValid(buffer));
 	if (BufferIsLocal(buffer))
 	{
 		buf = &LocalBufferDescriptors[-buffer - 1];
 		loccount = LocalRefCount[-buffer - 1];
+		backend = MyBackendId;
 	}
 	else
 	{
 		buf = &BufferDescriptors[buffer - 1];
 		loccount = PrivateRefCount[buffer - 1];
+		backend = InvalidBackendId;
 	}
 
 	/* theoretically we should lock the bufhdr here */
-	path = relpath(buf->tag.rnode, buf->tag.forkNum);
+	path = relpathbackend(buf->tag.rnode, backend, buf->tag.forkNum);
 	elog(WARNING,
 		 "buffer refcount leak: [%03d] "
 		 "(rel=%s, blockNum=%u, flags=0x%x, refcount=%u %d)",
@@ -1831,14 +1839,14 @@ FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
 		return;
 
 	/* Setup error traceback support for ereport() */
-	errcontext.callback = buffer_write_error_callback;
+	errcontext.callback = shared_buffer_write_error_callback;
 	errcontext.arg = (void *) buf;
 	errcontext.previous = error_context_stack;
 	error_context_stack = &errcontext;
 
 	/* Find smgr relation for buffer */
 	if (reln == NULL)
-		reln = smgropen(buf->tag.rnode);
+		reln = smgropen(buf->tag.rnode, InvalidBackendId);
 
 	TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,
 										buf->tag.blockNum,
@@ -1929,14 +1937,15 @@ RelationGetNumberOfBlocks(Relation relation)
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum, bool istemp,
+DropRelFileNodeBuffers(BackendRelFileNode rnode, ForkNumber forkNum,
 					   BlockNumber firstDelBlock)
 {
 	int			i;
 
-	if (istemp)
+	if (rnode.backend != InvalidBackendId)
 	{
-		DropRelFileNodeLocalBuffers(rnode, forkNum, firstDelBlock);
+		if (rnode.backend == MyBackendId)
+			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
 		return;
 	}
 
@@ -1945,7 +1954,7 @@ DropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum, bool istemp,
 		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
 
 		LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode) &&
+		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
 			bufHdr->tag.forkNum == forkNum &&
 			bufHdr->tag.blockNum >= firstDelBlock)
 			InvalidateBuffer(bufHdr);	/* releases spinlock */
@@ -2008,7 +2017,7 @@ PrintBufferDescs(void)
 			 "[%02d] (freeNext=%d, rel=%s, "
 			 "blockNum=%u, flags=0x%x, refcount=%u %d)",
 			 i, buf->freeNext,
-			 relpath(buf->tag.rnode, buf->tag.forkNum),
+			 relpathbackend(buf->tag.rnode, InvalidBackendId, buf->tag.forkNum),
 			 buf->tag.blockNum, buf->flags,
 			 buf->refcount, PrivateRefCount[i]);
 	}
@@ -2078,7 +2087,7 @@ FlushRelationBuffers(Relation rel)
 				ErrorContextCallback errcontext;
 
 				/* Setup error traceback support for ereport() */
-				errcontext.callback = buffer_write_error_callback;
+				errcontext.callback = local_buffer_write_error_callback;
 				errcontext.arg = (void *) bufHdr;
 				errcontext.previous = error_context_stack;
 				error_context_stack = &errcontext;
@@ -2087,7 +2096,7 @@ FlushRelationBuffers(Relation rel)
 						  bufHdr->tag.forkNum,
 						  bufHdr->tag.blockNum,
 						  (char *) LocalBufHdrGetBlock(bufHdr),
-						  true);
+						  false);
 
 				bufHdr->flags &= ~(BM_DIRTY | BM_JUST_DIRTIED);
 
@@ -2699,8 +2708,9 @@ AbortBufferIO(void)
 			if (sv_flags & BM_IO_ERROR)
 			{
 				/* Buffer is pinned, so we can read tag without spinlock */
-				char	   *path = relpath(buf->tag.rnode, buf->tag.forkNum);
+				char	   *path;
 
+				path = relpathperm(buf->tag.rnode, buf->tag.forkNum);
 				ereport(WARNING,
 						(errcode(ERRCODE_IO_ERROR),
 						 errmsg("could not write block %u of %s",
@@ -2714,17 +2724,36 @@ AbortBufferIO(void)
 }
 
 /*
- * Error context callback for errors occurring during buffer writes.
+ * Error context callback for errors occurring during shared buffer writes.
  */
 static void
-buffer_write_error_callback(void *arg)
+shared_buffer_write_error_callback(void *arg)
 {
 	volatile BufferDesc *bufHdr = (volatile BufferDesc *) arg;
 
 	/* Buffer is pinned, so we can read the tag without locking the spinlock */
 	if (bufHdr != NULL)
 	{
-		char	   *path = relpath(bufHdr->tag.rnode, bufHdr->tag.forkNum);
+		char	   *path = relpathperm(bufHdr->tag.rnode, bufHdr->tag.forkNum);
+
+		errcontext("writing block %u of relation %s",
+				   bufHdr->tag.blockNum, path);
+		pfree(path);
+	}
+}
+
+/*
+ * Error context callback for errors occurring during buffer writes.
+ */
+static void
+local_buffer_write_error_callback(void *arg)
+{
+	volatile BufferDesc *bufHdr = (volatile BufferDesc *) arg;
+
+	if (bufHdr != NULL)
+	{
+		char	   *path = relpathbackend(bufHdr->tag.rnode, MyBackendId,
+										 bufHdr->tag.forkNum);
 
 		errcontext("writing block %u of relation %s",
 				   bufHdr->tag.blockNum, path);
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index c5b6a2c..bbf0a01 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -68,7 +68,7 @@ LocalPrefetchBuffer(SMgrRelation smgr, ForkNumber forkNum,
 	BufferTag	newTag;			/* identity of requested block */
 	LocalBufferLookupEnt *hresult;
 
-	INIT_BUFFERTAG(newTag, smgr->smgr_rnode, forkNum, blockNum);
+	INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
 
 	/* Initialize local buffers if first request in this session */
 	if (LocalBufHash == NULL)
@@ -110,7 +110,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 	int			trycounter;
 	bool		found;
 
-	INIT_BUFFERTAG(newTag, smgr->smgr_rnode, forkNum, blockNum);
+	INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
 
 	/* Initialize local buffers if first request in this session */
 	if (LocalBufHash == NULL)
@@ -127,7 +127,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 		Assert(BUFFERTAGS_EQUAL(bufHdr->tag, newTag));
 #ifdef LBDEBUG
 		fprintf(stderr, "LB ALLOC (%u,%d,%d) %d\n",
-				smgr->smgr_rnode.relNode, forkNum, blockNum, -b - 1);
+				smgr->smgr_rnode.node.relNode, forkNum, blockNum, -b - 1);
 #endif
 		/* this part is equivalent to PinBuffer for a shared buffer */
 		if (LocalRefCount[b] == 0)
@@ -150,7 +150,8 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 
 #ifdef LBDEBUG
 	fprintf(stderr, "LB ALLOC (%u,%d,%d) %d\n",
-		 smgr->smgr_rnode.relNode, forkNum, blockNum, -nextFreeLocalBuf - 1);
+		 smgr->smgr_rnode.node.relNode, forkNum, blockNum,
+		 -nextFreeLocalBuf - 1);
 #endif
 
 	/*
@@ -198,14 +199,14 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 		SMgrRelation oreln;
 
 		/* Find smgr relation for buffer */
-		oreln = smgropen(bufHdr->tag.rnode);
+		oreln = smgropen(bufHdr->tag.rnode, MyBackendId);
 
 		/* And write... */
 		smgrwrite(oreln,
 				  bufHdr->tag.forkNum,
 				  bufHdr->tag.blockNum,
 				  (char *) LocalBufHdrGetBlock(bufHdr),
-				  true);
+				  false);
 
 		/* Mark not-dirty now in case we error out below */
 		bufHdr->flags &= ~BM_DIRTY;
@@ -309,7 +310,8 @@ DropRelFileNodeLocalBuffers(RelFileNode rnode, ForkNumber forkNum,
 			if (LocalRefCount[i] != 0)
 				elog(ERROR, "block %u of %s is still referenced (local %u)",
 					 bufHdr->tag.blockNum,
-					 relpath(bufHdr->tag.rnode, bufHdr->tag.forkNum),
+					 relpathbackend(bufHdr->tag.rnode, MyBackendId,
+								   bufHdr->tag.forkNum),
 					 LocalRefCount[i]);
 			/* Remove entry from hashtable */
 			hresult = (LocalBufferLookupEnt *)
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 1d2eeec..3fb5844 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -249,6 +249,9 @@ static File OpenTemporaryFileInTablespace(Oid tblspcOid, bool rejectError);
 static void AtProcExit_Files(int code, Datum arg);
 static void CleanupTempFiles(bool isProcExit);
 static void RemovePgTempFilesInDir(const char *tmpdirname);
+static void RemovePgTempRelationFiles(const char *tsdirname);
+static void RemovePgTempRelationFilesInDbspace(const char *dbspacedirname);
+static bool looks_like_temp_rel_name(const char *name);
 
 
 /*
@@ -1824,10 +1827,12 @@ CleanupTempFiles(bool isProcExit)
 
 
 /*
- * Remove temporary files left over from a prior postmaster session
+ * Remove temporary and temporary relation files left over from a prior
+ * postmaster session
  *
  * This should be called during postmaster startup.  It will forcibly
- * remove any leftover files created by OpenTemporaryFile.
+ * remove any leftover files created by OpenTemporaryFile and any leftover
+ * temporary relation files created by mdcreate.
  *
  * NOTE: we could, but don't, call this during a post-backend-crash restart
  * cycle.  The argument for not doing it is that someone might want to examine
@@ -1847,6 +1852,7 @@ RemovePgTempFiles(void)
 	 */
 	snprintf(temp_path, sizeof(temp_path), "base/%s", PG_TEMP_FILES_DIR);
 	RemovePgTempFilesInDir(temp_path);
+	RemovePgTempRelationFiles("base");
 
 	/*
 	 * Cycle through temp directories for all non-default tablespaces.
@@ -1862,6 +1868,10 @@ RemovePgTempFiles(void)
 		snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s/%s",
 			spc_de->d_name, TABLESPACE_VERSION_DIRECTORY, PG_TEMP_FILES_DIR);
 		RemovePgTempFilesInDir(temp_path);
+
+		snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s",
+			spc_de->d_name, TABLESPACE_VERSION_DIRECTORY);
+		RemovePgTempRelationFiles(temp_path);
 	}
 
 	FreeDir(spc_dir);
@@ -1915,3 +1925,123 @@ RemovePgTempFilesInDir(const char *tmpdirname)
 
 	FreeDir(temp_dir);
 }
+
+/* Process one tablespace directory, look for per-DB subdirectories */
+static void
+RemovePgTempRelationFiles(const char *tsdirname)
+{
+	DIR		   *ts_dir;
+	struct dirent *de;
+	char		dbspace_path[MAXPGPATH];
+
+	ts_dir = AllocateDir(tsdirname);
+	if (ts_dir == NULL)
+	{
+		/* anything except ENOENT is fishy */
+		if (errno != ENOENT)
+			elog(LOG,
+				 "could not open tablespace directory \"%s\": %m",
+				 tsdirname);
+		return;
+	}
+
+	while ((de = ReadDir(ts_dir, tsdirname)) != NULL)
+	{
+		int		i = 0;
+
+		/*
+		 * We're only interested in the per-database directories, which have
+		 * numeric names.  Note that this code will also (properly) ignore "."
+		 * and "..".
+		 */
+		while (isdigit((unsigned char) de->d_name[i]))
+			++i;
+		if (de->d_name[i] != '\0' || i == 0)
+			continue;
+
+		snprintf(dbspace_path, sizeof(dbspace_path), "%s/%s",
+				 tsdirname, de->d_name);
+		RemovePgTempRelationFilesInDbspace(dbspace_path);
+	}
+
+	FreeDir(ts_dir);
+}
+
+/* Process one per-dbspace directory for RemovePgTempRelationFiles */
+static void
+RemovePgTempRelationFilesInDbspace(const char *dbspacedirname)
+{
+	DIR		   *dbspace_dir;
+	struct dirent *de;
+	char		rm_path[MAXPGPATH];
+
+	dbspace_dir = AllocateDir(dbspacedirname);
+	if (dbspace_dir == NULL)
+	{
+		/* we just saw this directory, so it really ought to be there */
+		elog(LOG,
+			 "could not open dbspace directory \"%s\": %m",
+			 dbspacedirname);
+		return;
+	}
+
+	while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+	{
+		if (!looks_like_temp_rel_name(de->d_name))
+			continue;
+
+		snprintf(rm_path, sizeof(rm_path), "%s/%s",
+				 dbspacedirname, de->d_name);
+
+		unlink(rm_path);	/* note we ignore any error */
+	}
+
+	FreeDir(dbspace_dir);
+}
+
+/* t<digits>_<digits>, or t<digits>_<digits>_<forkname> */
+static bool
+looks_like_temp_rel_name(const char *name)
+{
+	int			pos;
+	int			savepos;
+
+	/* Must start with "t". */
+	if (name[0] != 't')
+		return false;
+
+	/* Followed by a non-empty string of digits and then an underscore. */
+	for (pos = 1; isdigit((unsigned char) name[pos]); ++pos)
+		;
+	if (pos == 1 || name[pos] != '_')
+		return false;
+
+	/* Followed by another nonempty string of digits. */
+	for (savepos = ++pos; isdigit((unsigned char) name[pos]); ++pos)
+		;
+	if (savepos == pos)
+		return false;
+
+	/* We might have _forkname or .segment or both. */
+	if (name[pos] == '_')
+	{
+		int		forkchar = forkname_chars(&name[pos+1]);
+		if (forkchar <= 0)
+			return false;
+		pos += forkchar + 1;
+	}
+	if (name[pos] == '.')
+	{
+		int		segchar;
+		for (segchar = 1; isdigit((unsigned char) name[pos+segchar]); ++segchar)
+			;
+		if (segchar <= 1)
+			return false;
+		pos += segchar;
+	}
+
+	/* Now we should be at the end. */
+	if (name[pos] != '\0')
+		return false;
+	return true;
+}
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 579572f..b43a2ce 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -303,7 +303,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	}
 
 	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks, rel->rd_istemp);
+	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
 
 	/*
 	 * We might as well update the local smgr_fsm_nblocks setting.
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 4163ca0..79805b4 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -119,7 +119,7 @@ static MemoryContext MdCxt;		/* context for all md.c allocations */
  */
 typedef struct
 {
-	RelFileNode rnode;			/* the targeted relation */
+	BackendRelFileNode rnode;	/* the targeted relation */
 	ForkNumber	forknum;
 	BlockNumber segno;			/* which segment */
 } PendingOperationTag;
@@ -135,7 +135,7 @@ typedef struct
 
 typedef struct
 {
-	RelFileNode rnode;			/* the dead relation to delete */
+	BackendRelFileNode rnode;	/* the dead relation to delete */
 	CycleCtr	cycle_ctr;		/* mdckpt_cycle_ctr when request was made */
 } PendingUnlinkEntry;
 
@@ -158,14 +158,14 @@ static MdfdVec *mdopen(SMgrRelation reln, ForkNumber forknum,
 	   ExtensionBehavior behavior);
 static void register_dirty_segment(SMgrRelation reln, ForkNumber forknum,
 					   MdfdVec *seg);
-static void register_unlink(RelFileNode rnode);
+static void register_unlink(BackendRelFileNode rnode);
 static MdfdVec *_fdvec_alloc(void);
 static char *_mdfd_segpath(SMgrRelation reln, ForkNumber forknum,
 			  BlockNumber segno);
 static MdfdVec *_mdfd_openseg(SMgrRelation reln, ForkNumber forkno,
 			  BlockNumber segno, int oflags);
 static MdfdVec *_mdfd_getseg(SMgrRelation reln, ForkNumber forkno,
-			 BlockNumber blkno, bool isTemp, ExtensionBehavior behavior);
+			 BlockNumber blkno, bool skipFsync, ExtensionBehavior behavior);
 static BlockNumber _mdnblocks(SMgrRelation reln, ForkNumber forknum,
 		   MdfdVec *seg);
 
@@ -321,7 +321,7 @@ mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
  * we are usually not in a transaction anymore when this is called.
  */
 void
-mdunlink(RelFileNode rnode, ForkNumber forkNum, bool isRedo)
+mdunlink(BackendRelFileNode rnode, ForkNumber forkNum, bool isRedo)
 {
 	char	   *path;
 	int			ret;
@@ -417,7 +417,7 @@ mdunlink(RelFileNode rnode, ForkNumber forkNum, bool isRedo)
  */
 void
 mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		 char *buffer, bool isTemp)
+		 char *buffer, bool skipFsync)
 {
 	off_t		seekpos;
 	int			nbytes;
@@ -440,7 +440,7 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 						relpath(reln->smgr_rnode, forknum),
 						InvalidBlockNumber)));
 
-	v = _mdfd_getseg(reln, forknum, blocknum, isTemp, EXTENSION_CREATE);
+	v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE);
 
 	seekpos = (off_t) BLCKSZ *(blocknum % ((BlockNumber) RELSEG_SIZE));
 
@@ -478,7 +478,7 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 				 errhint("Check free disk space.")));
 	}
 
-	if (!isTemp)
+	if (!skipFsync && !SmgrIsTemp(reln))
 		register_dirty_segment(reln, forknum, v);
 
 	Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
@@ -605,9 +605,10 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	MdfdVec    *v;
 
 	TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
-										reln->smgr_rnode.spcNode,
-										reln->smgr_rnode.dbNode,
-										reln->smgr_rnode.relNode);
+										reln->smgr_rnode.node.spcNode,
+										reln->smgr_rnode.node.dbNode,
+										reln->smgr_rnode.node.relNode,
+										reln->smgr_rnode.backend);
 
 	v = _mdfd_getseg(reln, forknum, blocknum, false, EXTENSION_FAIL);
 
@@ -624,9 +625,10 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ);
 
 	TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
-									   reln->smgr_rnode.spcNode,
-									   reln->smgr_rnode.dbNode,
-									   reln->smgr_rnode.relNode,
+									   reln->smgr_rnode.node.spcNode,
+									   reln->smgr_rnode.node.dbNode,
+									   reln->smgr_rnode.node.relNode,
+									   reln->smgr_rnode.backend,
 									   nbytes,
 									   BLCKSZ);
 
@@ -666,7 +668,7 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
  */
 void
 mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		char *buffer, bool isTemp)
+		char *buffer, bool skipFsync)
 {
 	off_t		seekpos;
 	int			nbytes;
@@ -678,11 +680,12 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 #endif
 
 	TRACE_POSTGRESQL_SMGR_MD_WRITE_START(forknum, blocknum,
-										 reln->smgr_rnode.spcNode,
-										 reln->smgr_rnode.dbNode,
-										 reln->smgr_rnode.relNode);
+										 reln->smgr_rnode.node.spcNode,
+										 reln->smgr_rnode.node.dbNode,
+										 reln->smgr_rnode.node.relNode,
+										 reln->smgr_rnode.backend);
 
-	v = _mdfd_getseg(reln, forknum, blocknum, isTemp, EXTENSION_FAIL);
+	v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_FAIL);
 
 	seekpos = (off_t) BLCKSZ *(blocknum % ((BlockNumber) RELSEG_SIZE));
 
@@ -697,9 +700,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ);
 
 	TRACE_POSTGRESQL_SMGR_MD_WRITE_DONE(forknum, blocknum,
-										reln->smgr_rnode.spcNode,
-										reln->smgr_rnode.dbNode,
-										reln->smgr_rnode.relNode,
+										reln->smgr_rnode.node.spcNode,
+										reln->smgr_rnode.node.dbNode,
+										reln->smgr_rnode.node.relNode,
+										reln->smgr_rnode.backend,
 										nbytes,
 										BLCKSZ);
 
@@ -720,7 +724,7 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 				 errhint("Check free disk space.")));
 	}
 
-	if (!isTemp)
+	if (!skipFsync && !SmgrIsTemp(reln))
 		register_dirty_segment(reln, forknum, v);
 }
 
@@ -794,8 +798,7 @@ mdnblocks(SMgrRelation reln, ForkNumber forknum)
  *	mdtruncate() -- Truncate relation to specified number of blocks.
  */
 void
-mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
-		   bool isTemp)
+mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 {
 	MdfdVec    *v;
 	BlockNumber curnblk;
@@ -839,7 +842,7 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
 						 errmsg("could not truncate file \"%s\": %m",
 								FilePathName(v->mdfd_vfd))));
 
-			if (!isTemp)
+			if (!SmgrIsTemp(reln))
 				register_dirty_segment(reln, forknum, v);
 			v = v->mdfd_chain;
 			Assert(ov != reln->md_fd[forknum]); /* we never drop the 1st
@@ -864,7 +867,7 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
 					errmsg("could not truncate file \"%s\" to %u blocks: %m",
 						   FilePathName(v->mdfd_vfd),
 						   nblocks)));
-			if (!isTemp)
+			if (!SmgrIsTemp(reln))
 				register_dirty_segment(reln, forknum, v);
 			v = v->mdfd_chain;
 			ov->mdfd_chain = NULL;
@@ -1052,7 +1055,8 @@ mdsync(void)
 				 * the relation will have been dirtied through this same smgr
 				 * relation, and so we can save a file open/close cycle.
 				 */
-				reln = smgropen(entry->tag.rnode);
+				reln = smgropen(entry->tag.rnode.node,
+								entry->tag.rnode.backend);
 
 				/*
 				 * It is possible that the relation has been dropped or
@@ -1235,7 +1239,7 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
  * a remote pending-ops table.
  */
 static void
-register_unlink(RelFileNode rnode)
+register_unlink(BackendRelFileNode rnode)
 {
 	if (pendingOpsTable)
 	{
@@ -1278,7 +1282,8 @@ register_unlink(RelFileNode rnode)
  * structure for them.)
  */
 void
-RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
+RememberFsyncRequest(BackendRelFileNode rnode, ForkNumber forknum,
+					 BlockNumber segno)
 {
 	Assert(pendingOpsTable);
 
@@ -1291,7 +1296,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 		hash_seq_init(&hstat, pendingOpsTable);
 		while ((entry = (PendingOperationEntry *) hash_seq_search(&hstat)) != NULL)
 		{
-			if (RelFileNodeEquals(entry->tag.rnode, rnode) &&
+			if (BackendRelFileNodeEquals(entry->tag.rnode, rnode) &&
 				entry->tag.forknum == forknum)
 			{
 				/* Okay, cancel this entry */
@@ -1312,7 +1317,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 		hash_seq_init(&hstat, pendingOpsTable);
 		while ((entry = (PendingOperationEntry *) hash_seq_search(&hstat)) != NULL)
 		{
-			if (entry->tag.rnode.dbNode == rnode.dbNode)
+			if (entry->tag.rnode.node.dbNode == rnode.node.dbNode)
 			{
 				/* Okay, cancel this entry */
 				entry->canceled = true;
@@ -1326,7 +1331,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 			PendingUnlinkEntry *entry = (PendingUnlinkEntry *) lfirst(cell);
 
 			next = lnext(cell);
-			if (entry->rnode.dbNode == rnode.dbNode)
+			if (entry->rnode.node.dbNode == rnode.node.dbNode)
 			{
 				pendingUnlinks = list_delete_cell(pendingUnlinks, cell, prev);
 				pfree(entry);
@@ -1393,7 +1398,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
  * ForgetRelationFsyncRequests -- forget any fsyncs for a rel
  */
 void
-ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum)
+ForgetRelationFsyncRequests(BackendRelFileNode rnode, ForkNumber forknum)
 {
 	if (pendingOpsTable)
 	{
@@ -1428,11 +1433,12 @@ ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum)
 void
 ForgetDatabaseFsyncRequests(Oid dbid)
 {
-	RelFileNode rnode;
+	BackendRelFileNode rnode;
 
-	rnode.dbNode = dbid;
-	rnode.spcNode = 0;
-	rnode.relNode = 0;
+	rnode.node.dbNode = dbid;
+	rnode.node.spcNode = 0;
+	rnode.node.relNode = 0;
+	rnode.backend = InvalidBackendId;
 
 	if (pendingOpsTable)
 	{
@@ -1523,12 +1529,12 @@ _mdfd_openseg(SMgrRelation reln, ForkNumber forknum, BlockNumber segno,
  *		specified block.
  *
  * If the segment doesn't exist, we ereport, return NULL, or create the
- * segment, according to "behavior".  Note: isTemp need only be correct
- * in the EXTENSION_CREATE case.
+ * segment, according to "behavior".  Note: skipFsync is only used in the
+ * EXTENSION_CREATE case.
  */
 static MdfdVec *
 _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
-			 bool isTemp, ExtensionBehavior behavior)
+			 bool skipFsync, ExtensionBehavior behavior)
 {
 	MdfdVec    *v = mdopen(reln, forknum, behavior);
 	BlockNumber targetseg;
@@ -1566,7 +1572,7 @@ _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
 
 					mdextend(reln, forknum,
 							 nextsegno * ((BlockNumber) RELSEG_SIZE) - 1,
-							 zerobuf, isTemp);
+							 zerobuf, skipFsync);
 					pfree(zerobuf);
 				}
 				v->mdfd_chain = _mdfd_openseg(reln, forknum, +nextsegno, O_CREAT);
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 3b12cb3..ecf238d 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -45,19 +45,19 @@ typedef struct f_smgr
 	void		(*smgr_create) (SMgrRelation reln, ForkNumber forknum,
 											bool isRedo);
 	bool		(*smgr_exists) (SMgrRelation reln, ForkNumber forknum);
-	void		(*smgr_unlink) (RelFileNode rnode, ForkNumber forknum,
+	void		(*smgr_unlink) (BackendRelFileNode rnode, ForkNumber forknum,
 											bool isRedo);
 	void		(*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
-							BlockNumber blocknum, char *buffer, bool isTemp);
+							BlockNumber blocknum, char *buffer, bool skipFsync);
 	void		(*smgr_prefetch) (SMgrRelation reln, ForkNumber forknum,
 											  BlockNumber blocknum);
 	void		(*smgr_read) (SMgrRelation reln, ForkNumber forknum,
 										  BlockNumber blocknum, char *buffer);
 	void		(*smgr_write) (SMgrRelation reln, ForkNumber forknum,
-							BlockNumber blocknum, char *buffer, bool isTemp);
+							BlockNumber blocknum, char *buffer, bool skipFsync);
 	BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
 	void		(*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
-										   BlockNumber nblocks, bool isTemp);
+										   BlockNumber nblocks);
 	void		(*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
 	void		(*smgr_pre_ckpt) (void);		/* may be NULL */
 	void		(*smgr_sync) (void);	/* may be NULL */
@@ -83,8 +83,6 @@ static HTAB *SMgrRelationHash = NULL;
 
 /* local function prototypes */
 static void smgrshutdown(int code, Datum arg);
-static void smgr_internal_unlink(RelFileNode rnode, ForkNumber forknum,
-					 int which, bool isTemp, bool isRedo);
 
 
 /*
@@ -131,8 +129,9 @@ smgrshutdown(int code, Datum arg)
  *		This does not attempt to actually open the object.
  */
 SMgrRelation
-smgropen(RelFileNode rnode)
+smgropen(RelFileNode rnode, BackendId backend)
 {
+	BackendRelFileNode brnode;
 	SMgrRelation reln;
 	bool		found;
 
@@ -142,7 +141,7 @@ smgropen(RelFileNode rnode)
 		HASHCTL		ctl;
 
 		MemSet(&ctl, 0, sizeof(ctl));
-		ctl.keysize = sizeof(RelFileNode);
+		ctl.keysize = sizeof(BackendRelFileNode);
 		ctl.entrysize = sizeof(SMgrRelationData);
 		ctl.hash = tag_hash;
 		SMgrRelationHash = hash_create("smgr relation table", 400,
@@ -150,8 +149,10 @@ smgropen(RelFileNode rnode)
 	}
 
 	/* Look up or create an entry */
+	brnode.node = rnode;
+	brnode.backend = backend;
 	reln = (SMgrRelation) hash_search(SMgrRelationHash,
-									  (void *) &rnode,
+									  (void *) &brnode,
 									  HASH_ENTER, &found);
 
 	/* Initialize it if not present before */
@@ -261,7 +262,7 @@ smgrcloseall(void)
  * such entry exists already.
  */
 void
-smgrclosenode(RelFileNode rnode)
+smgrclosenode(BackendRelFileNode rnode)
 {
 	SMgrRelation reln;
 
@@ -305,8 +306,8 @@ smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	 * should be here and not in commands/tablespace.c?  But that would imply
 	 * importing a lot of stuff that smgr.c oughtn't know, either.
 	 */
-	TablespaceCreateDbspace(reln->smgr_rnode.spcNode,
-							reln->smgr_rnode.dbNode,
+	TablespaceCreateDbspace(reln->smgr_rnode.node.spcNode,
+							reln->smgr_rnode.node.dbNode,
 							isRedo);
 
 	(*(smgrsw[reln->smgr_which].smgr_create)) (reln, forknum, isRedo);
@@ -323,29 +324,19 @@ smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
  *		already.
  */
 void
-smgrdounlink(SMgrRelation reln, ForkNumber forknum, bool isTemp, bool isRedo)
+smgrdounlink(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 {
-	RelFileNode rnode = reln->smgr_rnode;
+	BackendRelFileNode rnode = reln->smgr_rnode;
 	int			which = reln->smgr_which;
 
 	/* Close the fork */
 	(*(smgrsw[which].smgr_close)) (reln, forknum);
 
-	smgr_internal_unlink(rnode, forknum, which, isTemp, isRedo);
-}
-
-/*
- * Shared subroutine that actually does the unlink ...
- */
-static void
-smgr_internal_unlink(RelFileNode rnode, ForkNumber forknum,
-					 int which, bool isTemp, bool isRedo)
-{
 	/*
 	 * Get rid of any remaining buffers for the relation.  bufmgr will just
 	 * drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(rnode, forknum, isTemp, 0);
+	DropRelFileNodeBuffers(rnode, forknum, 0);
 
 	/*
 	 * It'd be nice to tell the stats collector to forget it immediately, too.
@@ -385,10 +376,10 @@ smgr_internal_unlink(RelFileNode rnode, ForkNumber forknum,
  */
 void
 smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		   char *buffer, bool isTemp)
+		   char *buffer, bool skipFsync)
 {
 	(*(smgrsw[reln->smgr_which].smgr_extend)) (reln, forknum, blocknum,
-											   buffer, isTemp);
+											   buffer, skipFsync);
 }
 
 /*
@@ -426,16 +417,16 @@ smgrread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
  *		on disk at return, only dumped out to the kernel.  However,
  *		provisions will be made to fsync the write before the next checkpoint.
  *
- *		isTemp indicates that the relation is a temp table (ie, is managed
- *		by the local-buffer manager).  In this case no provisions need be
- *		made to fsync the write before checkpointing.
+ *		skipFsync indicates that the caller will make other provisions to
+ *		fsync the relation, so we needn't bother.  Temporary relations also
+ *		do not require fsync.
  */
 void
 smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		  char *buffer, bool isTemp)
+		  char *buffer, bool skipFsync)
 {
 	(*(smgrsw[reln->smgr_which].smgr_write)) (reln, forknum, blocknum,
-											  buffer, isTemp);
+											  buffer, skipFsync);
 }
 
 /*
@@ -455,14 +446,13 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  * The truncation is done immediately, so this can't be rolled back.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
-			 bool isTemp)
+smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 {
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, isTemp, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -479,8 +469,7 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
 	/*
 	 * Do the truncation.
 	 */
-	(*(smgrsw[reln->smgr_which].smgr_truncate)) (reln, forknum, nblocks,
-												 isTemp);
+	(*(smgrsw[reln->smgr_which].smgr_truncate)) (reln, forknum, nblocks);
 }
 
 /*
@@ -499,7 +488,7 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
  *		to use the WAL log for PITR or replication purposes: in that case
  *		we have to make WAL entries as well.)
  *
- *		The preceding writes should specify isTemp = true to avoid
+ *		The preceding writes should specify skipFsync = true to avoid
  *		duplicative fsyncs.
  *
  *		Note that you need to do FlushRelationBuffers() first if there is
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index a4e0252..dac2d72 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -256,14 +256,14 @@ pg_tablespace_size_name(PG_FUNCTION_ARGS)
  * calculate size of (one fork of) a relation
  */
 static int64
-calculate_relation_size(RelFileNode *rfn, ForkNumber forknum)
+calculate_relation_size(RelFileNode *rfn, BackendId backend, ForkNumber forknum)
 {
 	int64		totalsize = 0;
 	char	   *relationpath;
 	char		pathname[MAXPGPATH];
 	unsigned int segcount = 0;
 
-	relationpath = relpath(*rfn, forknum);
+	relationpath = relpathbackend(*rfn, backend, forknum);
 
 	for (segcount = 0;; segcount++)
 	{
@@ -303,7 +303,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
 
 	rel = relation_open(relOid, AccessShareLock);
 
-	size = calculate_relation_size(&(rel->rd_node),
+	size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
 							  forkname_to_number(text_to_cstring(forkName)));
 
 	relation_close(rel, AccessShareLock);
@@ -327,12 +327,14 @@ calculate_toast_table_size(Oid toastrelid)
 
 	/* toast heap size, including FSM and VM size */
 	for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
-		size += calculate_relation_size(&(toastRel->rd_node), forkNum);
+		size += calculate_relation_size(&(toastRel->rd_node),
+										toastRel->rd_backend, forkNum);
 
 	/* toast index size, including FSM and VM size */
 	toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
 	for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
-		size += calculate_relation_size(&(toastIdxRel->rd_node), forkNum);
+		size += calculate_relation_size(&(toastIdxRel->rd_node),
+										toastIdxRel->rd_backend, forkNum);
 
 	relation_close(toastIdxRel, AccessShareLock);
 	relation_close(toastRel, AccessShareLock);
@@ -361,7 +363,8 @@ calculate_table_size(Oid relOid)
 	 * heap size, including FSM and VM
 	 */
 	for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
-		size += calculate_relation_size(&(rel->rd_node), forkNum);
+		size += calculate_relation_size(&(rel->rd_node), rel->rd_backend,
+										forkNum);
 
 	/*
 	 * Size of toast relation
@@ -404,7 +407,9 @@ calculate_indexes_size(Oid relOid)
 			idxRel = relation_open(idxOid, AccessShareLock);
 
 			for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
-				size += calculate_relation_size(&(idxRel->rd_node), forkNum);
+				size += calculate_relation_size(&(idxRel->rd_node),
+												idxRel->rd_backend,
+												forkNum);
 
 			relation_close(idxRel, AccessShareLock);
 		}
@@ -575,6 +580,7 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
 	HeapTuple	tuple;
 	Form_pg_class relform;
 	RelFileNode rnode;
+	BackendId	backend;
 	char	   *path;
 
 	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
@@ -612,12 +618,27 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
 			break;
 	}
 
-	ReleaseSysCache(tuple);
-
 	if (!OidIsValid(rnode.relNode))
+	{
+		ReleaseSysCache(tuple);
 		PG_RETURN_NULL();
+	}
+
+	/* If temporary, determine owning backend. */
+	if (!relform->relistemp)
+		backend = InvalidBackendId;
+	else if (isTempOrToastNamespace(relform->relnamespace))
+		backend = MyBackendId;
+	else
+	{
+		/* Do it the hard way. */
+		backend = GetTempNamespaceBackendId(relform->relnamespace);
+		Assert(backend != InvalidOid);
+	}
+
+	ReleaseSysCache(tuple);
 
-	path = relpath(rnode, MAIN_FORKNUM);
+	path = relpathbackend(rnode, backend, MAIN_FORKNUM);
 
 	PG_RETURN_TEXT_P(cstring_to_text(path));
 }
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 3b15d85..91de76d 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -1165,7 +1165,7 @@ CacheInvalidateRelcacheByRelid(Oid relid)
  * replaying WAL as well as when creating it.
  */
 void
-CacheInvalidateSmgr(RelFileNode rnode)
+CacheInvalidateSmgr(BackendRelFileNode rnode)
 {
 	SharedInvalidationMessage msg;
 
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index dda69e8..bc355ca 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -858,10 +858,20 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 	relation->rd_createSubid = InvalidSubTransactionId;
 	relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
 	relation->rd_istemp = relation->rd_rel->relistemp;
-	if (relation->rd_istemp)
-		relation->rd_islocaltemp = isTempOrToastNamespace(relation->rd_rel->relnamespace);
+	if (!relation->rd_istemp)
+		relation->rd_backend = InvalidBackendId;
+	else if (isTempOrToastNamespace(relation->rd_rel->relnamespace))
+		relation->rd_backend = MyBackendId;
 	else
-		relation->rd_islocaltemp = false;
+	{
+		/*
+		 * If it's a temporary table, but not one of ours, we have to use
+		 * the slow, grotty method to figure out the owning backend.
+		 */
+		relation->rd_backend =
+			GetTempNamespaceBackendId(relation->rd_rel->relnamespace);
+		Assert(relation->rd_backend != InvalidOid);
+	}
 
 	/*
 	 * initialize the tuple descriptor (relation->rd_att).
@@ -1424,7 +1434,7 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_createSubid = InvalidSubTransactionId;
 	relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
 	relation->rd_istemp = false;
-	relation->rd_islocaltemp = false;
+	relation->rd_backend = InvalidBackendId;
 
 	/*
 	 * initialize relation tuple form
@@ -2515,7 +2525,7 @@ RelationBuildLocalRelation(const char *relname,
 
 	/* it is temporary if and only if it is in my temp-table namespace */
 	rel->rd_istemp = isTempOrToastNamespace(relnamespace);
-	rel->rd_islocaltemp = rel->rd_istemp;
+	rel->rd_backend = rel->rd_istemp ? MyBackendId : InvalidBackendId;
 
 	/*
 	 * create a new tuple descriptor from the one passed in.  We do this
@@ -2629,7 +2639,7 @@ void
 RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
 {
 	Oid			newrelfilenode;
-	RelFileNode newrnode;
+	BackendRelFileNode newrnode;
 	Relation	pg_class;
 	HeapTuple	tuple;
 	Form_pg_class classform;
@@ -2640,7 +2650,8 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
 		   TransactionIdIsNormal(freezeXid));
 
 	/* Allocate a new relfilenode */
-	newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL);
+	newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
+									   relation->rd_backend);
 
 	/*
 	 * Get a writable copy of the pg_class tuple for the given relation.
@@ -2660,9 +2671,10 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
 	 * NOTE: any conflict in relfilenode value will be caught here, if
 	 * GetNewRelFileNode messes up for any reason.
 	 */
-	newrnode = relation->rd_node;
-	newrnode.relNode = newrelfilenode;
-	RelationCreateStorage(newrnode, relation->rd_istemp);
+	newrnode.node = relation->rd_node;
+	newrnode.node.relNode = newrelfilenode;
+	newrnode.backend = relation->rd_backend;
+	RelationCreateStorage(newrnode.node, relation->rd_istemp);
 	smgrclosenode(newrnode);
 
 	/*
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 8ccb948..b06f8ff 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -55,7 +55,7 @@ provider postgresql {
 	probe sort__done(bool, long);
 
 	probe buffer__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, bool, bool);
-	probe buffer__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, bool, bool, bool);
+	probe buffer__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, bool, bool);
 	probe buffer__flush__start(ForkNumber, BlockNumber, Oid, Oid, Oid);
 	probe buffer__flush__done(ForkNumber, BlockNumber, Oid, Oid, Oid);
 
@@ -81,10 +81,10 @@ provider postgresql {
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
 
-	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid);
-	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int);
-	probe smgr__md__write__start(ForkNumber, BlockNumber, Oid, Oid, Oid);
-	probe smgr__md__write__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int);
+	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
+	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
+	probe smgr__md__write__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
+	probe smgr__md__write__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
 
 	probe xlog__insert(unsigned char, unsigned char);
 	probe xlog__switch();
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index bd430cb..765f571 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -25,10 +25,17 @@
 
 extern const char *forkNames[];
 extern ForkNumber forkname_to_number(char *forkName);
+extern int forkname_chars(const char *str);
 
-extern char *relpath(RelFileNode rnode, ForkNumber forknum);
+extern char *relpathbackend(RelFileNode rnode, BackendId backend,
+			  ForkNumber forknum);
 extern char *GetDatabasePath(Oid dbNode, Oid spcNode);
 
+#define relpath(rnode, forknum) \
+		relpathbackend((rnode).node, (rnode).backend, (forknum))
+#define relpathperm(rnode, forknum) \
+		relpathbackend((rnode), InvalidBackendId, (forknum))
+
 extern bool IsSystemRelation(Relation relation);
 extern bool IsToastRelation(Relation relation);
 
@@ -45,6 +52,7 @@ extern bool IsSharedRelation(Oid relationId);
 extern Oid	GetNewOid(Relation relation);
 extern Oid GetNewOidWithIndex(Relation relation, Oid indexId,
 				   AttrNumber oidcolumn);
-extern Oid	GetNewRelFileNode(Oid reltablespace, Relation pg_class);
+extern Oid	GetNewRelFileNode(Oid reltablespace, Relation pg_class,
+				  BackendId backend);
 
 #endif   /* CATALOG_H */
diff --git a/src/include/catalog/storage.h b/src/include/catalog/storage.h
index 609f1c6..7712d1f 100644
--- a/src/include/catalog/storage.h
+++ b/src/include/catalog/storage.h
@@ -30,8 +30,7 @@ extern void RelationTruncate(Relation rel, BlockNumber nblocks);
  * naming
  */
 extern void smgrDoPendingDeletes(bool isCommit);
-extern int smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr,
-					  bool *haveNonTemp);
+extern int smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr);
 extern void AtSubCommit_smgr(void);
 extern void AtSubAbort_smgr(void);
 extern void PostPrepare_smgr(void);
diff --git a/src/include/postmaster/bgwriter.h b/src/include/postmaster/bgwriter.h
index 06a4e37..a22fbee 100644
--- a/src/include/postmaster/bgwriter.h
+++ b/src/include/postmaster/bgwriter.h
@@ -27,7 +27,7 @@ extern void BackgroundWriterMain(void);
 extern void RequestCheckpoint(int flags);
 extern void CheckpointWriteDelay(int flags, double progress);
 
-extern bool ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum,
+extern bool ForwardFsyncRequest(BackendRelFileNode rnode, ForkNumber forknum,
 					BlockNumber segno);
 extern void AbsorbFsyncRequests(void);
 
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index f38f545..d4bf341 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -160,7 +160,7 @@ extern Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
 extern Buffer ReadBufferExtended(Relation reln, ForkNumber forkNum,
 				   BlockNumber blockNum, ReadBufferMode mode,
 				   BufferAccessStrategy strategy);
-extern Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp,
+extern Buffer ReadBufferWithoutRelcache(RelFileNode rnode,
 						  ForkNumber forkNum, BlockNumber blockNum,
 						  ReadBufferMode mode, BufferAccessStrategy strategy);
 extern void ReleaseBuffer(Buffer buffer);
@@ -180,8 +180,8 @@ extern BlockNumber BufferGetBlockNumber(Buffer buffer);
 extern BlockNumber RelationGetNumberOfBlocks(Relation relation);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
-extern void DropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum,
-					   bool istemp, BlockNumber firstDelBlock);
+extern void DropRelFileNodeBuffers(BackendRelFileNode rnode,
+					   ForkNumber forkNum, BlockNumber firstDelBlock);
 extern void DropDatabaseBuffers(Oid dbid);
 
 #ifdef NOT_USED
diff --git a/src/include/storage/relfilenode.h b/src/include/storage/relfilenode.h
index 6f0c0ad..9bb0fa2 100644
--- a/src/include/storage/relfilenode.h
+++ b/src/include/storage/relfilenode.h
@@ -14,6 +14,8 @@
 #ifndef RELFILENODE_H
 #define RELFILENODE_H
 
+#include "storage/backendid.h"
+
 /*
  * The physical storage of a relation consists of one or more forks. The
  * main fork is always created, but in addition to that there can be
@@ -37,7 +39,8 @@ typedef enum ForkNumber
 
 /*
  * RelFileNode must provide all that we need to know to physically access
- * a relation. Note, however, that a "physical" relation is comprised of
+ * a relation, with the exception of the backend ID, which can be provided
+ * separately. Note, however, that a "physical" relation is comprised of
  * multiple files on the filesystem, as each fork is stored as a separate
  * file, and each fork can be divided into multiple segments. See md.c.
  *
@@ -74,14 +77,30 @@ typedef struct RelFileNode
 } RelFileNode;
 
 /*
- * Note: RelFileNodeEquals compares relNode first since that is most likely
- * to be different in two unequal RelFileNodes.  It is probably redundant
- * to compare spcNode if the other two fields are found equal, but do it
- * anyway to be sure.
+ * Augmenting a relfilenode with the backend ID provides all the information
+ * we need to locate the physical storage.
+ */
+typedef struct BackendRelFileNode
+{
+	RelFileNode	node;
+	BackendId	backend;
+} BackendRelFileNode;
+
+/*
+ * Note: RelFileNodeEquals and BackendRelFileNodeEquals compare relNode first
+ * since that is most likely to be different in two unequal RelFileNodes.  It
+ * is probably redundant to compare spcNode if the other fields are found equal,
+ * but do it anyway to be sure.
  */
 #define RelFileNodeEquals(node1, node2) \
 	((node1).relNode == (node2).relNode && \
 	 (node1).dbNode == (node2).dbNode && \
 	 (node1).spcNode == (node2).spcNode)
 
+#define BackendRelFileNodeEquals(node1, node2) \
+	((node1).node.relNode == (node2).node.relNode && \
+	 (node1).node.dbNode == (node2).node.dbNode && \
+	 (node1).backend == (node2).backend && \
+	 (node1).node.spcNode == (node2).node.spcNode)
+
 #endif   /* RELFILENODE_H */
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index bbdde81..0168d17 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -92,7 +92,7 @@ typedef struct
 typedef struct
 {
 	int16		id;				/* type field --- must be first */
-	RelFileNode rnode;			/* physical file ID */
+	BackendRelFileNode rnode;	/* physical file ID */
 } SharedInvalSmgrMsg;
 
 #define SHAREDINVALRELMAP_ID	(-4)
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index cf248b8..7ae78ec 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -16,6 +16,7 @@
 
 #include "access/xlog.h"
 #include "fmgr.h"
+#include "storage/backendid.h"
 #include "storage/block.h"
 #include "storage/relfilenode.h"
 
@@ -38,7 +39,7 @@
 typedef struct SMgrRelationData
 {
 	/* rnode is the hashtable lookup key, so it must be first! */
-	RelFileNode smgr_rnode;		/* relation physical identifier */
+	BackendRelFileNode smgr_rnode;		/* relation physical identifier */
 
 	/* pointer to owning pointer, or NULL if none */
 	struct SMgrRelationData **smgr_owner;
@@ -68,28 +69,30 @@ typedef struct SMgrRelationData
 
 typedef SMgrRelationData *SMgrRelation;
 
+#define SmgrIsTemp(smgr) \
+	((smgr)->smgr_rnode.backend != InvalidBackendId)
 
 extern void smgrinit(void);
-extern SMgrRelation smgropen(RelFileNode rnode);
+extern SMgrRelation smgropen(RelFileNode rnode, BackendId backend);
 extern bool smgrexists(SMgrRelation reln, ForkNumber forknum);
 extern void smgrsetowner(SMgrRelation *owner, SMgrRelation reln);
 extern void smgrclose(SMgrRelation reln);
 extern void smgrcloseall(void);
-extern void smgrclosenode(RelFileNode rnode);
+extern void smgrclosenode(BackendRelFileNode rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, ForkNumber forknum,
-			 bool isTemp, bool isRedo);
+			 bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
-		   BlockNumber blocknum, char *buffer, bool isTemp);
+		   BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
 			 BlockNumber blocknum);
 extern void smgrread(SMgrRelation reln, ForkNumber forknum,
 		 BlockNumber blocknum, char *buffer);
 extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
-		  BlockNumber blocknum, char *buffer, bool isTemp);
+		  BlockNumber blocknum, char *buffer, bool skipFsync);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
 extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-			 BlockNumber nblocks, bool isTemp);
+			 BlockNumber nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void smgrpreckpt(void);
 extern void smgrsync(void);
@@ -103,27 +106,28 @@ extern void mdinit(void);
 extern void mdclose(SMgrRelation reln, ForkNumber forknum);
 extern void mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern bool mdexists(SMgrRelation reln, ForkNumber forknum);
-extern void mdunlink(RelFileNode rnode, ForkNumber forknum, bool isRedo);
+extern void mdunlink(BackendRelFileNode rnode, ForkNumber forknum, bool isRedo);
 extern void mdextend(SMgrRelation reln, ForkNumber forknum,
-		 BlockNumber blocknum, char *buffer, bool isTemp);
+		 BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void mdprefetch(SMgrRelation reln, ForkNumber forknum,
 		   BlockNumber blocknum);
 extern void mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	   char *buffer);
 extern void mdwrite(SMgrRelation reln, ForkNumber forknum,
-		BlockNumber blocknum, char *buffer, bool isTemp);
+		BlockNumber blocknum, char *buffer, bool skipFsync);
 extern BlockNumber mdnblocks(SMgrRelation reln, ForkNumber forknum);
 extern void mdtruncate(SMgrRelation reln, ForkNumber forknum,
-		   BlockNumber nblocks, bool isTemp);
+		   BlockNumber nblocks);
 extern void mdimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void mdpreckpt(void);
 extern void mdsync(void);
 extern void mdpostckpt(void);
 
 extern void SetForwardFsyncRequests(void);
-extern void RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum,
+extern void RememberFsyncRequest(BackendRelFileNode rnode, ForkNumber forknum,
 					 BlockNumber segno);
-extern void ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum);
+extern void ForgetRelationFsyncRequests(BackendRelFileNode rnode,
+							ForkNumber forknum);
 extern void ForgetDatabaseFsyncRequests(Oid dbid);
 
 /* smgrtype.c */
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index a86a17c..7fab919 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -49,7 +49,7 @@ extern void CacheInvalidateRelcacheByTuple(HeapTuple classTuple);
 
 extern void CacheInvalidateRelcacheByRelid(Oid relid);
 
-extern void CacheInvalidateSmgr(RelFileNode rnode);
+extern void CacheInvalidateSmgr(BackendRelFileNode rnode);
 
 extern void CacheInvalidateRelmap(Oid databaseId);
 
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 444e892..296c651 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -127,7 +127,7 @@ typedef struct RelationData
 	struct SMgrRelationData *rd_smgr;	/* cached file handle, or NULL */
 	int			rd_refcnt;		/* reference count */
 	bool		rd_istemp;		/* rel is a temporary relation */
-	bool		rd_islocaltemp; /* rel is a temp rel of this session */
+	BackendId	rd_backend;		/* owning backend id, if temporary relation */
 	bool		rd_isnailed;	/* rel is nailed in cache */
 	bool		rd_isvalid;		/* relcache entry is valid */
 	char		rd_indexvalid;	/* state of rd_indexlist: 0 = not valid, 1 =
@@ -347,7 +347,7 @@ typedef struct StdRdOptions
 #define RelationOpenSmgr(relation) \
 	do { \
 		if ((relation)->rd_smgr == NULL) \
-			smgrsetowner(&((relation)->rd_smgr), smgropen((relation)->rd_node)); \
+			smgrsetowner(&((relation)->rd_smgr), smgropen((relation)->rd_node, (relation)->rd_backend)); \
 	} while (0)
 
 /*
@@ -393,7 +393,7 @@ typedef struct StdRdOptions
  * Beware of multiple eval of argument
  */
 #define RELATION_IS_LOCAL(relation) \
-	((relation)->rd_islocaltemp || \
+	((relation)->rd_backend == MyBackendId || \
 	 (relation)->rd_createSubid != InvalidSubTransactionId)
 
 /*
@@ -403,7 +403,7 @@ typedef struct StdRdOptions
  * Beware of multiple eval of argument
  */
 #define RELATION_IS_OTHER_TEMP(relation) \
-	((relation)->rd_istemp && !(relation)->rd_islocaltemp)
+	((relation)->rd_istemp && (relation)->rd_backend != MyBackendId)
 
 /* routines in utils/cache/relcache.c */
 extern void RelationIncrementReferenceCount(Relation rel);

Jaime Casanova

jaime@2ndquadrant.com

over 15 years ago

In reply to: Robert Haas (#1)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, Jun 10, 2010 at 3:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I believe that this patch will clear away one major obstacle to
implementing global temporary and unlogged tables: it enables us to be
sure of cleaning up properly after a crash without relying on catalog
entries or XLOG. Based on previous discussions, however, I believe
there is support for making this change independently of what becomes
of that project, just for the benefit of having a more robust cleanup
mechanism.

Hi,

i was looking at v3 of this patch...

it looks good, works as advertised, pass regression tests, and all
tests i could think of...
haven't looked the code at much detail but seems ok, to me at least...

--
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

Jaime Casanova

jaime@2ndquadrant.com

over 15 years ago

In reply to: Jaime Casanova (#3)

Re: including backend ID in relpath of temp rels - updated patch

On Fri, Jul 23, 2010 at 10:05 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

On Thu, Jun 10, 2010 at 3:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I believe that this patch will clear away one major obstacle to
implementing global temporary and unlogged tables: it enables us to be
sure of cleaning up properly after a crash without relying on catalog
entries or XLOG. Based on previous discussions, however, I believe
there is support for making this change independently of what becomes
of that project, just for the benefit of having a more robust cleanup
mechanism.

Hi,

i was looking at v3 of this patch...

Ok, i like what you did in smgrextend, smgrwrite, and others...
changing isTemp for skipFsync is more descriptive

but i have a few questions, maybe is right what you did i only want to
understand it:
- you added this in include/storage/smgr.h, so why is safe to assume
that if the backend != InvalidBackendId it must be a temp relation?

+#define SmgrIsTemp(smgr) \
+   ((smgr)->smgr_rnode.backend != InvalidBackendId)

- you added a question like this "if (rel->rd_backend == MyBackendId)"
in a few places... why are you assuming that? that couldn't be a new
created relation (in current session of course)? is that safe?

--
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Jaime Casanova (#4)

Re: including backend ID in relpath of temp rels - updated patch

On Sun, Jul 25, 2010 at 2:37 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

but i have a few questions, maybe is right what you did i only want to
understand it:
- you added this in include/storage/smgr.h, so why is safe to assume
that if the backend != InvalidBackendId it must be a temp relation?
+#define SmgrIsTemp(smgr) \
+   ((smgr)->smgr_rnode.backend != InvalidBackendId)

That's pretty much the whole point of the patch. Instead of
identifying relations as simply "temporary" or "not temporary", we
identify as "a temporary relation owned by backend X" or as "not
temporary".

- you added a question like this "if (rel->rd_backend == MyBackendId)"
in a few places... why are you assuming that? that couldn't be a new
created relation (in current session of course)? is that safe?

Again, rd_backend is not the creating backend ID unless the relation
is a temprel.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Jaime Casanova

jaime@2ndquadrant.com

over 15 years ago

In reply to: Robert Haas (#5)

Re: including backend ID in relpath of temp rels - updated patch

On Sun, Jul 25, 2010 at 4:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Jul 25, 2010 at 2:37 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
but i have a few questions, maybe is right what you did i only want to
understand it:
- you added this in include/storage/smgr.h, so why is safe to assume
that if the backend != InvalidBackendId it must be a temp relation?
+#define SmgrIsTemp(smgr) \
+   ((smgr)->smgr_rnode.backend != InvalidBackendId)
That's pretty much the whole point of the patch. Instead of
identifying relations as simply "temporary" or "not temporary", we
identify as "a temporary relation owned by backend X" or as "not
temporary".

Ok, this one seems good enough... i'm marking it as "ready for committer"

--
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#2)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

[ BackendRelFileNode patch ]

One thing that I find rather distressing about this is the 25% bloat
in sizeof(SharedInvalidationMessage). Couldn't we avoid that? Is it
really necessary to *ever* send an SI message for a backend-local rel?

I agree that one needs to send relcache inval sometimes for temp rels,
but I don't see why each backend couldn't interpret that as a flush
on its own local version.

regards, tom lane

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#7)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, Aug 5, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

[ BackendRelFileNode patch ]

One thing that I find rather distressing about this is the 25% bloat
in sizeof(SharedInvalidationMessage). Couldn't we avoid that? Is it
really necessary to *ever* send an SI message for a backend-local rel?

It can be dropped or unlinked by another backend, so, yes.

I agree that one needs to send relcache inval sometimes for temp rels,
but I don't see why each backend couldn't interpret that as a flush
on its own local version.

The problem is that receipt of the inval message results in a call to
smgrclosenode(), which has to look up the (Backend)RelFileNode in
SMgrRelationHash. If the backend ID isn't present, we can't search
the hash table efficiently. This is not only a problem for
smgrclosenode(), either; that hash is used in a bunch of places, so
changing the hash key isn't really an option. One possibility would
be to have two separate shared-invalidation message types for smgr.
One would indicate that a non-temporary relation was flushed, so the
backend ID field is known to be -1 and we can search the hash quickly.
The other would indicate that a temporary relation was flushed and
you'd have to scan the whole hash table to find and flush all matching
entries. That still doesn't sound great, though. Another thought I
had was that if we could count on the backend ID to fit into an int16
we could fit it in to what's currently padding space. That would
require rather dramatically lowering the maximum number of backends
(currently INT_MAX/4), but it's a little hard to imagine that we can
really support more than 32,767 simultaneous backends anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#8)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 5, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

One thing that I find rather distressing about this is the 25% bloat
in sizeof(SharedInvalidationMessage). �Couldn't we avoid that? �Is it
really necessary to *ever* send an SI message for a backend-local rel?

It can be dropped or unlinked by another backend, so, yes.

Really? Surely that should be illegal during normal operation. We
might be doing such during crash recovery, but we don't need to
broadcast sinval messages then.

It might be sufficient to consider that there are "local" and "global"
smgr inval messages, where the former never get out of the generating
backend, so a bool is enough in the message struct.

had was that if we could count on the backend ID to fit into an int16
we could fit it in to what's currently padding space. That would
require rather dramatically lowering the maximum number of backends
(currently INT_MAX/4), but it's a little hard to imagine that we can
really support more than 32,767 simultaneous backends anyway.

Yeah, that occurred to me too. A further thought is that the id field
could probably be reduced to 1 byte, leaving 3 for backendid, which
would certainly be plenty. However representing that in a portable
struct declaration would be a bit painful I fear.

regards, tom lane

#10

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#9)

Re: including backend ID in relpath of temp rels - updated patch

On Fri, Aug 6, 2010 at 1:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 5, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

One thing that I find rather distressing about this is the 25% bloat
in sizeof(SharedInvalidationMessage). Couldn't we avoid that? Is it
really necessary to *ever* send an SI message for a backend-local rel?

It can be dropped or unlinked by another backend, so, yes.

Really? Surely that should be illegal during normal operation. We
might be doing such during crash recovery, but we don't need to
broadcast sinval messages then.

autovacuum.c does it when we start to worry about XID wraparound, but
you can also do it from any normal backend. Just "DROP TABLE
pg_temp_2.foo" or whatever and away you go.

It might be sufficient to consider that there are "local" and "global"
smgr inval messages, where the former never get out of the generating
backend, so a bool is enough in the message struct.

It would be nice to be able to do it that way, but I don't believe
it's the case, per the above.

had was that if we could count on the backend ID to fit into an int16
we could fit it in to what's currently padding space. That would
require rather dramatically lowering the maximum number of backends
(currently INT_MAX/4), but it's a little hard to imagine that we can
really support more than 32,767 simultaneous backends anyway.

Yeah, that occurred to me too. A further thought is that the id field
could probably be reduced to 1 byte, leaving 3 for backendid, which
would certainly be plenty. However representing that in a portable
struct declaration would be a bit painful I fear.

Well, presumably we'd just represent it as a 1-byte field followed by
a 2-byte field, and do a bit of math. But I don't really see the
point. The whole architecture of a shared invalidation queue is
fundamentally non-scalable because it's a broadcast medium. If we
wanted to efficiently support even thousands of backends (let alone
tens or hundreds of thousands) I assume we would need to rearchitect
this completely with more fine-grained queues, and have backends
subscribe to the queues pertaining to the objects they want to access
before touching them. Or maybe something else entirely. But I don't
think broadcasting to 30,000 backends is going to work for the same
reason that plugging 30,000 machines into an Ethernet *hub* doesn't
work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#11

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#10)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Aug 6, 2010 at 1:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Really? �Surely that should be illegal during normal operation. We
might be doing such during crash recovery, but we don't need to
broadcast sinval messages then.

autovacuum.c does it when we start to worry about XID wraparound, but
you can also do it from any normal backend. Just "DROP TABLE
pg_temp_2.foo" or whatever and away you go.

Mph. I'm still not convinced that the sinval message needs to carry
backendid. But maybe the first-cut solution should just be to squeeze
the id into the padding area. You should be able to get up to 65535
allowed backends, not 32k --- or perhaps use different message type IDs
for local and global backendid, so that all 65536 bitpatterns are
allowed for a non-global backendid.

Well, presumably we'd just represent it as a 1-byte field followed by
a 2-byte field, and do a bit of math. But I don't really see the
point. The whole architecture of a shared invalidation queue is
fundamentally non-scalable because it's a broadcast medium.

Sure, it tops out somewhere, but 32K is way too close to configurations
we know work well enough in the field (I've seen multiple reports of
people using a couple thousand backends). In any case, sinval readers
don't block each other in the current implementation, so I'm really
dubious that there's any inherent scalability limitation there. I'll
hold still for 64K but I think it might be better to go for 2^24.

regards, tom lane

#12

Jaime Casanova

jaime@2ndquadrant.com

over 15 years ago

In reply to: Robert Haas (#10)

Re: including backend ID in relpath of temp rels - updated patch

On Fri, Aug 6, 2010 at 12:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Just "DROP TABLE
pg_temp_2.foo" or whatever and away you go.

wow! that's true... and certainly a bug...
we shouldn't allow any session to drop other session's temp tables, or
is there a reason for this misbehavior?

--
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

#13

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Jaime Casanova (#12)

Re: including backend ID in relpath of temp rels - updated patch

Jaime Casanova <jaime@2ndquadrant.com> writes:

On Fri, Aug 6, 2010 at 12:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Just "DROP TABLE pg_temp_2.foo" or whatever and away you go.

wow! that's true... and certainly a bug...

No, it's not a bug. You'll find only superusers can do it.

we shouldn't allow any session to drop other session's temp tables, or
is there a reason for this misbehavior?

It is intentional, though I'd be willing to give it up if we had more
bulletproof crash-cleanup of temp tables --- which is one of the things
this patch is supposed to result in.

regards, tom lane

#14

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#11)

Re: including backend ID in relpath of temp rels - updated patch

On Fri, Aug 6, 2010 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Aug 6, 2010 at 1:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Really? Surely that should be illegal during normal operation. We
might be doing such during crash recovery, but we don't need to
broadcast sinval messages then.

autovacuum.c does it when we start to worry about XID wraparound, but
you can also do it from any normal backend. Just "DROP TABLE
pg_temp_2.foo" or whatever and away you go.

Mph. I'm still not convinced that the sinval message needs to carry
backendid.

Hey, if you have an idea... I'd love to get rid of it, too, but I
don't see how to do it.

But maybe the first-cut solution should just be to squeeze
the id into the padding area. You should be able to get up to 65535
allowed backends, not 32k --- or perhaps use different message type IDs
for local and global backendid, so that all 65536 bitpatterns are
allowed for a non-global backendid.

Well, presumably we'd just represent it as a 1-byte field followed by
a 2-byte field, and do a bit of math. But I don't really see the
point. The whole architecture of a shared invalidation queue is
fundamentally non-scalable because it's a broadcast medium.

Sure, it tops out somewhere, but 32K is way too close to configurations
we know work well enough in the field (I've seen multiple reports of
people using a couple thousand backends). In any case, sinval readers
don't block each other in the current implementation, so I'm really
dubious that there's any inherent scalability limitation there. I'll
hold still for 64K but I think it might be better to go for 2^24.

Well, I wouldn't expect anyone to use an exclusive lock for readers
without a good reason, but you still have n backends that each have to
read, presumably, about O(n) messages, so eventually that's going to
start to pinch.

Do you think it's worth worrying about the reduction in the number of
possible SI message types?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#15

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#13)

Re: including backend ID in relpath of temp rels - updated patch

On Fri, Aug 6, 2010 at 2:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jaime Casanova <jaime@2ndquadrant.com> writes:

On Fri, Aug 6, 2010 at 12:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Just "DROP TABLE pg_temp_2.foo" or whatever and away you go.

wow! that's true... and certainly a bug...

No, it's not a bug. You'll find only superusers can do it.

we shouldn't allow any session to drop other session's temp tables, or
is there a reason for this misbehavior?

It is intentional, though I'd be willing to give it up if we had more
bulletproof crash-cleanup of temp tables --- which is one of the things
this patch is supposed to result in.

This patch only directly addresses the issue of cleaning up the
storage, so there are still the catalog entries to worry about. But
it doesn't seem impossible to think about building on this approach to
eventually handle that part of the problem better, too. I haven't
thought too much about what that would look like, but I think it could
be done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#16

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#14)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Aug 6, 2010 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Sure, it tops out somewhere, but 32K is way too close to configurations
we know work well enough in the field (I've seen multiple reports of
people using a couple thousand backends).

Well, I wouldn't expect anyone to use an exclusive lock for readers
without a good reason, but you still have n backends that each have to
read, presumably, about O(n) messages, so eventually that's going to
start to pinch.

Sure, but I don't see much to be gained from multiple queues either.
There are few (order of zero, in fact) cases where sinval messages
are transmitted that aren't of potential interest to all backends.
Maybe you could do something useful with a very large number of
dynamically-defined queues (like one per relation) ... but managing that
would probably swamp any savings.

Do you think it's worth worrying about the reduction in the number of
possible SI message types?

IIRC the number of message types is the number of catalog caches plus
half a dozen or so. We're a long way from exhausting even a 1-byte
ID field; and we could play more games if we had to, since there would
be a padding byte free in the message types that refer to a catalog
cache. IOW, 1-byte id doesn't bother me.

regards, tom lane

#17

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#15)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

This patch only directly addresses the issue of cleaning up the
storage, so there are still the catalog entries to worry about. But
it doesn't seem impossible to think about building on this approach to
eventually handle that part of the problem better, too. I haven't
thought too much about what that would look like, but I think it could
be done.

Perhaps run through pg_class after restart and flush anything marked
relistemp? Although the ideal solution, probably, would be for temp
tables to not have persistent catalog entries in the first place.

regards, tom lane

#18

David Fetter

david@fetter.org

over 15 years ago

In reply to: Tom Lane (#17)

Re: including backend ID in relpath of temp rels - updated patch

On Fri, Aug 06, 2010 at 03:00:35PM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

This patch only directly addresses the issue of cleaning up the
storage, so there are still the catalog entries to worry about.
But it doesn't seem impossible to think about building on this
approach to eventually handle that part of the problem better,
too. I haven't thought too much about what that would look like,
but I think it could be done.

Perhaps run through pg_class after restart and flush anything marked
relistemp? Although the ideal solution, probably, would be for temp
tables to not have persistent catalog entries in the first place.

For the upcoming global temp tables, which are visible in other
sessions, how would this work?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#19

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: David Fetter (#18)

Re: including backend ID in relpath of temp rels - updated patch

David Fetter <david@fetter.org> writes:

On Fri, Aug 06, 2010 at 03:00:35PM -0400, Tom Lane wrote:

Perhaps run through pg_class after restart and flush anything marked
relistemp? Although the ideal solution, probably, would be for temp
tables to not have persistent catalog entries in the first place.

For the upcoming global temp tables, which are visible in other
sessions, how would this work?

Well, that's a totally different matter. Those would certainly have
persistent entries, at least for the "master" version. I don't think
anyone's really figured out what the best implementation looks like;
but maybe there would be per-backend "child" versions that could act
just like the current kind of temp table (and again would ideally not
have any persistent catalog entries).

regards, tom lane

#20

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#17)

Re: including backend ID in relpath of temp rels - updated patch

On Fri, Aug 6, 2010 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

This patch only directly addresses the issue of cleaning up the
storage, so there are still the catalog entries to worry about. But
it doesn't seem impossible to think about building on this approach to
eventually handle that part of the problem better, too. I haven't
thought too much about what that would look like, but I think it could
be done.

Perhaps run through pg_class after restart and flush anything marked
relistemp?

The trouble is that you have to bind to a database before you can run
through pg_class, and the postmaster doesn't. Of course, if it could
attach to a database and then detach again, this might be feasible,
although perhaps still a bit overly complex for the postmaster, but in
any event it doesn't. We seem to already have a mechanism in place
where a backend that creates a temprel nukes any pre-existing temprel
schema for its own backend, but of course that's not fool-proof.

Although the ideal solution, probably, would be for temp
tables to not have persistent catalog entries in the first place.

I've been thinking about that, but it's a bit challenging to imagine
how it could work. It's not just the pg_class entries you have to
think about, but also pg_attrdef, pg_attribute, pg_constraint,
pg_description, pg_index, pg_rewrite, and pg_trigger. An even
stickier problem is that we have lots of places in the backend code
that refer to objects by OID. We'd either need to change all of that
code (which seems like a non-starter) or somehow guarantee that the
OIDs assigned to any given backend's private objects would be
different from those assigned to any public object (which I also don't
see how to do).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#21

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#16)

Re: including backend ID in relpath of temp rels - updated patch

On Fri, Aug 6, 2010 at 2:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Aug 6, 2010 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Sure, it tops out somewhere, but 32K is way too close to configurations
we know work well enough in the field (I've seen multiple reports of
people using a couple thousand backends).

Well, I wouldn't expect anyone to use an exclusive lock for readers
without a good reason, but you still have n backends that each have to
read, presumably, about O(n) messages, so eventually that's going to
start to pinch.

Sure, but I don't see much to be gained from multiple queues either.
There are few (order of zero, in fact) cases where sinval messages
are transmitted that aren't of potential interest to all backends.
Maybe you could do something useful with a very large number of
dynamically-defined queues (like one per relation) ... but managing that
would probably swamp any savings.

Well, what I was thinking is that if you could guarantee that a
certain backend COULDN'T have a particular relfilenode open at the
smgr level, for example, then it needn't read the invalidation
messages for that relfilenode. Precisely how to slice that up is
another matter. For the present case, for instance, you could
creating one queue per backend. In the normal course of events, each
backend would subscribe only to its own queue, but if one backend
wanted to drop a temporary relation belonging to some other backend,
it would temporarily subscribe to that backend's queue, do whatever it
needed to do, and then flush all the smgr references before
unsubscribing from the queue. That's a bit silly because we doubtless
wouldn't invent such a complicated mechanism just for this case, but I
think it illustrates the kind of thing that one could do. Or if you
wanted to optimize for the case of a large number of databases running
in a single cluster, perhaps you'd want one queue per database plus a
shared queue for the shared catalogs. I don't know. This is just pie
in the sky.

Do you think it's worth worrying about the reduction in the number of
possible SI message types?

IIRC the number of message types is the number of catalog caches plus
half a dozen or so. We're a long way from exhausting even a 1-byte
ID field; and we could play more games if we had to, since there would
be a padding byte free in the message types that refer to a catalog
cache. IOW, 1-byte id doesn't bother me.

OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#22

Alvaro Herrera

alvherre@commandprompt.com

over 15 years ago

In reply to: Robert Haas (#20)

Re: including backend ID in relpath of temp rels - updated patch

Excerpts from Robert Haas's message of vie ago 06 15:32:21 -0400 2010:

Perhaps run through pg_class after restart and flush anything marked
relistemp?

The trouble is that you have to bind to a database before you can run
through pg_class, and the postmaster doesn't. Of course, if it could
attach to a database and then detach again, this might be feasible,
although perhaps still a bit overly complex for the postmaster, but in
any event it doesn't.

A simpler idea seems to run a process specifically to connect to the
database to scan pg_class there, and then die. It sounds a tad
expensive though.

I've been thinking about that, but it's a bit challenging to imagine
how it could work. It's not just the pg_class entries you have to
think about, but also pg_attrdef, pg_attribute, pg_constraint,
pg_description, pg_index, pg_rewrite, and pg_trigger. An even
stickier problem is that we have lots of places in the backend code
that refer to objects by OID. We'd either need to change all of that
code (which seems like a non-starter) or somehow guarantee that the
OIDs assigned to any given backend's private objects would be
different from those assigned to any public object (which I also don't
see how to do).

Maybe we could reserve one of the 32 bits of OID to indicate private-ness.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#23

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Robert Haas (#21)

1 attachment(s)

Re: including backend ID in relpath of temp rels - updated patch

On Fri, Aug 6, 2010 at 3:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Do you think it's worth worrying about the reduction in the number of
possible SI message types?

IIRC the number of message types is the number of catalog caches plus
half a dozen or so. We're a long way from exhausting even a 1-byte
ID field; and we could play more games if we had to, since there would
be a padding byte free in the message types that refer to a catalog
cache. IOW, 1-byte id doesn't bother me.

OK.

Here's an updated patch, with the invalidation changes merged in and
hopefully-suitable adjustments elsewhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachments:

temprelnames-v4.patchapplication/octet-stream; name=temprelnames-v4.patchDownload

diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d1f7bcc..9645c95 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -373,8 +373,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	}
 
 	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks,
-				 rel->rd_istemp);
+	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
 
 	/*
 	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 89ed8a0..06e304e 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -295,9 +295,8 @@ _bt_blwritepage(BTWriteState *wstate, Page page, BlockNumber blkno)
 	}
 
 	/*
-	 * Now write the page.	We say isTemp = true even if it's not a temp
-	 * index, because there's no need for smgr to schedule an fsync for this
-	 * write; we'll do it ourselves before ending the build.
+	 * Now write the page.	There's no need for smgr to schedule an fsync for
+	 * this write; we'll do it ourselves before ending the build.
 	 */
 	if (blkno == wstate->btws_pages_written)
 	{
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d0680f0..615a7fa 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -865,8 +865,8 @@ StartPrepare(GlobalTransaction gxact)
 	hdr.prepared_at = gxact->prepared_at;
 	hdr.owner = gxact->owner;
 	hdr.nsubxacts = xactGetCommittedChildren(&children);
-	hdr.ncommitrels = smgrGetPendingDeletes(true, &commitrels, NULL);
-	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels, NULL);
+	hdr.ncommitrels = smgrGetPendingDeletes(true, &commitrels);
+	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels);
 	hdr.ninvalmsgs = xactGetCommittedInvalidationMessages(&invalmsgs,
 														  &hdr.initfileinval);
 	StrNCpy(hdr.gid, gxact->gid, GIDSIZE);
@@ -1320,13 +1320,13 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	}
 	for (i = 0; i < ndelrels; i++)
 	{
-		SMgrRelation srel = smgropen(delrels[i]);
+		SMgrRelation srel = smgropen(delrels[i], InvalidBackendId);
 		ForkNumber	fork;
 
 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
 		{
 			if (smgrexists(srel, fork))
-				smgrdounlink(srel, fork, false, false);
+				smgrdounlink(srel, fork, false);
 		}
 		smgrclose(srel);
 	}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 60e22d0..7836677 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -912,7 +912,6 @@ RecordTransactionCommit(void)
 	TransactionId latestXid = InvalidTransactionId;
 	int			nrels;
 	RelFileNode *rels;
-	bool		haveNonTemp;
 	int			nchildren;
 	TransactionId *children;
 	int			nmsgs;
@@ -920,7 +919,7 @@ RecordTransactionCommit(void)
 	bool		RelcacheInitFileInval;
 
 	/* Get data needed for commit record */
-	nrels = smgrGetPendingDeletes(true, &rels, &haveNonTemp);
+	nrels = smgrGetPendingDeletes(true, &rels);
 	nchildren = xactGetCommittedChildren(&children);
 	nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 												 &RelcacheInitFileInval);
@@ -1047,7 +1046,7 @@ RecordTransactionCommit(void)
 	 * asynchronous commit if all to-be-deleted tables are temporary though,
 	 * since they are lost anyway if we crash.)
 	 */
-	if (XactSyncCommit || forceSyncCommit || haveNonTemp)
+	if (XactSyncCommit || forceSyncCommit || nrels > 0)
 	{
 		/*
 		 * Synchronous commit case:
@@ -1333,7 +1332,7 @@ RecordTransactionAbort(bool isSubXact)
 			 xid);
 
 	/* Fetch the data we need for the abort record */
-	nrels = smgrGetPendingDeletes(false, &rels, NULL);
+	nrels = smgrGetPendingDeletes(false, &rels);
 	nchildren = xactGetCommittedChildren(&children);
 
 	/* XXX do we really need a critical section here? */
@@ -4473,7 +4472,7 @@ xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid, XLogRecPtr lsn)
 	/* Make sure files supposed to be dropped are dropped */
 	for (i = 0; i < xlrec->nrels; i++)
 	{
-		SMgrRelation srel = smgropen(xlrec->xnodes[i]);
+		SMgrRelation srel = smgropen(xlrec->xnodes[i], InvalidBackendId);
 		ForkNumber	fork;
 
 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
@@ -4481,7 +4480,7 @@ xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid, XLogRecPtr lsn)
 			if (smgrexists(srel, fork))
 			{
 				XLogDropRelation(xlrec->xnodes[i], fork);
-				smgrdounlink(srel, fork, false, true);
+				smgrdounlink(srel, fork, true);
 			}
 		}
 		smgrclose(srel);
@@ -4578,7 +4577,7 @@ xact_redo_abort(xl_xact_abort *xlrec, TransactionId xid)
 	/* Make sure files supposed to be dropped are dropped */
 	for (i = 0; i < xlrec->nrels; i++)
 	{
-		SMgrRelation srel = smgropen(xlrec->xnodes[i]);
+		SMgrRelation srel = smgropen(xlrec->xnodes[i], InvalidBackendId);
 		ForkNumber	fork;
 
 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
@@ -4586,7 +4585,7 @@ xact_redo_abort(xl_xact_abort *xlrec, TransactionId xid)
 			if (smgrexists(srel, fork))
 			{
 				XLogDropRelation(xlrec->xnodes[i], fork);
-				smgrdounlink(srel, fork, false, true);
+				smgrdounlink(srel, fork, true);
 			}
 		}
 		smgrclose(srel);
@@ -4660,7 +4659,7 @@ xact_desc_commit(StringInfo buf, xl_xact_commit *xlrec)
 		appendStringInfo(buf, "; rels:");
 		for (i = 0; i < xlrec->nrels; i++)
 		{
-			char	   *path = relpath(xlrec->xnodes[i], MAIN_FORKNUM);
+			char	   *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);
 
 			appendStringInfo(buf, " %s", path);
 			pfree(path);
@@ -4715,7 +4714,7 @@ xact_desc_abort(StringInfo buf, xl_xact_abort *xlrec)
 		appendStringInfo(buf, "; rels:");
 		for (i = 0; i < xlrec->nrels; i++)
 		{
-			char	   *path = relpath(xlrec->xnodes[i], MAIN_FORKNUM);
+			char	   *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);
 
 			appendStringInfo(buf, " %s", path);
 			pfree(path);
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b5bbc5e..dba842c 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -68,7 +68,7 @@ log_invalid_page(RelFileNode node, ForkNumber forkno, BlockNumber blkno,
 	 */
 	if (log_min_messages <= DEBUG1 || client_min_messages <= DEBUG1)
 	{
-		char	   *path = relpath(node, forkno);
+		char	   *path = relpathperm(node, forkno);
 
 		if (present)
 			elog(DEBUG1, "page %u of relation %s is uninitialized",
@@ -133,7 +133,7 @@ forget_invalid_pages(RelFileNode node, ForkNumber forkno, BlockNumber minblkno)
 		{
 			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
 			{
-				char	   *path = relpath(hentry->key.node, forkno);
+				char	   *path = relpathperm(hentry->key.node, forkno);
 
 				elog(DEBUG2, "page %u of relation %s has been dropped",
 					 hentry->key.blkno, path);
@@ -166,7 +166,7 @@ forget_invalid_pages_db(Oid dbid)
 		{
 			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
 			{
-				char	   *path = relpath(hentry->key.node, hentry->key.forkno);
+				char	   *path = relpathperm(hentry->key.node, hentry->key.forkno);
 
 				elog(DEBUG2, "page %u of relation %s has been dropped",
 					 hentry->key.blkno, path);
@@ -200,7 +200,7 @@ XLogCheckInvalidPages(void)
 	 */
 	while ((hentry = (xl_invalid_page *) hash_seq_search(&status)) != NULL)
 	{
-		char	   *path = relpath(hentry->key.node, hentry->key.forkno);
+		char	   *path = relpathperm(hentry->key.node, hentry->key.forkno);
 
 		if (hentry->present)
 			elog(WARNING, "page %u of relation %s was uninitialized",
@@ -276,7 +276,7 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	Assert(blkno != P_NEW);
 
 	/* Open the relation at smgr level */
-	smgr = smgropen(rnode);
+	smgr = smgropen(rnode, InvalidBackendId);
 
 	/*
 	 * Create the target file if it doesn't already exist.  This lets us cope
@@ -293,7 +293,7 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	if (blkno < lastblock)
 	{
 		/* page exists in file */
-		buffer = ReadBufferWithoutRelcache(rnode, false, forknum, blkno,
+		buffer = ReadBufferWithoutRelcache(rnode, forknum, blkno,
 										   mode, NULL);
 	}
 	else
@@ -312,7 +312,7 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 		{
 			if (buffer != InvalidBuffer)
 				ReleaseBuffer(buffer);
-			buffer = ReadBufferWithoutRelcache(rnode, false, forknum,
+			buffer = ReadBufferWithoutRelcache(rnode, forknum,
 											   P_NEW, mode, NULL);
 			lastblock++;
 		}
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 3edfc23..5ec5602 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -78,12 +78,37 @@ forkname_to_number(char *forkName)
 }
 
 /*
- * relpath			- construct path to a relation's file
+ * forkname_chars
+ * 		We use this to figure out whether a filename could be a relation
+ * 		fork (as opposed to an oddly named stray file that somehow ended
+ * 		up in the database directory).  If the passed string begins with
+ * 		a fork name (other than the main fork name), we return its length.
+ * 		If not, we return 0.
+ *
+ * Note that the present coding assumes that there are no fork names which
+ * are prefixes of other fork names.
+ */
+int
+forkname_chars(const char *str)
+{
+	ForkNumber	forkNum;
+
+	for (forkNum = 1; forkNum <= MAX_FORKNUM; forkNum++)
+	{
+		int len = strlen(forkNames[forkNum]);
+		if (strncmp(forkNames[forkNum], str, len) == 0)
+			return len;
+	}
+	return 0;
+}
+
+/*
+ * relpathbackend - construct path to a relation's file
  *
  * Result is a palloc'd string.
  */
 char *
-relpath(RelFileNode rnode, ForkNumber forknum)
+relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
 {
 	int			pathlen;
 	char	   *path;
@@ -92,6 +117,7 @@ relpath(RelFileNode rnode, ForkNumber forknum)
 	{
 		/* Shared system relations live in {datadir}/global */
 		Assert(rnode.dbNode == 0);
+		Assert(backend == InvalidBackendId);
 		pathlen = 7 + OIDCHARS + 1 + FORKNAMECHARS + 1;
 		path = (char *) palloc(pathlen);
 		if (forknum != MAIN_FORKNUM)
@@ -103,29 +129,69 @@ relpath(RelFileNode rnode, ForkNumber forknum)
 	else if (rnode.spcNode == DEFAULTTABLESPACE_OID)
 	{
 		/* The default tablespace is {datadir}/base */
-		pathlen = 5 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
-		path = (char *) palloc(pathlen);
-		if (forknum != MAIN_FORKNUM)
-			snprintf(path, pathlen, "base/%u/%u_%s",
-					 rnode.dbNode, rnode.relNode, forkNames[forknum]);
+		if (backend == InvalidBackendId)
+		{
+			pathlen = 5 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
+			path = (char *) palloc(pathlen);
+			if (forknum != MAIN_FORKNUM)
+				snprintf(path, pathlen, "base/%u/%u_%s",
+						 rnode.dbNode, rnode.relNode,
+						 forkNames[forknum]);
+			else
+				snprintf(path, pathlen, "base/%u/%u",
+						 rnode.dbNode, rnode.relNode);
+		}
 		else
-			snprintf(path, pathlen, "base/%u/%u",
-					 rnode.dbNode, rnode.relNode);
+		{
+			/* OIDCHARS will suffice for an integer, too */
+			pathlen = 5 + OIDCHARS + 2 + OIDCHARS + 1 + OIDCHARS + 1
+					+ FORKNAMECHARS + 1;
+			path = (char *) palloc(pathlen);
+			if (forknum != MAIN_FORKNUM)
+				snprintf(path, pathlen, "base/%u/t%d_%u_%s",
+						 rnode.dbNode, backend, rnode.relNode,
+						 forkNames[forknum]);
+			else
+				snprintf(path, pathlen, "base/%u/t%d_%u",
+						 rnode.dbNode, backend, rnode.relNode);
+		}
 	}
 	else
 	{
 		/* All other tablespaces are accessed via symlinks */
-		pathlen = 9 + 1 + OIDCHARS + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) +
-			1 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
-		path = (char *) palloc(pathlen);
-		if (forknum != MAIN_FORKNUM)
-			snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u_%s",
-					 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
-					 rnode.dbNode, rnode.relNode, forkNames[forknum]);
+		if (backend == InvalidBackendId)
+		{
+			pathlen = 9 + 1 + OIDCHARS + 1
+					+ strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 1
+					+ OIDCHARS + 1 + FORKNAMECHARS + 1;
+			path = (char *) palloc(pathlen);
+			if (forknum != MAIN_FORKNUM)
+				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u_%s",
+						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
+						 rnode.dbNode, rnode.relNode,
+						 forkNames[forknum]);
+			else
+				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u",
+						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
+						 rnode.dbNode, rnode.relNode);
+		}
 		else
-			snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u",
-					 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
-					 rnode.dbNode, rnode.relNode);
+		{
+			/* OIDCHARS will suffice for an integer, too */
+			pathlen = 9 + 1 + OIDCHARS + 1
+					+ strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 2
+					+ OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
+			path = (char *) palloc(pathlen);
+			if (forknum != MAIN_FORKNUM)
+				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/t%d_%u_%s",
+						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
+						 rnode.dbNode, backend, rnode.relNode,
+						 forkNames[forknum]);
+			else
+				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/t%d_%u",
+						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
+						 rnode.dbNode, backend, rnode.relNode);
+		}
 	}
 	return path;
 }
@@ -458,16 +524,17 @@ GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
  * created by bootstrap have preassigned OIDs, so there's no need.
  */
 Oid
-GetNewRelFileNode(Oid reltablespace, Relation pg_class)
+GetNewRelFileNode(Oid reltablespace, Relation pg_class, BackendId backend)
 {
-	RelFileNode rnode;
+	BackendRelFileNode rnode;
 	char	   *rpath;
 	int			fd;
 	bool		collides;
 
 	/* This logic should match RelationInitPhysicalAddr */
-	rnode.spcNode = reltablespace ? reltablespace : MyDatabaseTableSpace;
-	rnode.dbNode = (rnode.spcNode == GLOBALTABLESPACE_OID) ? InvalidOid : MyDatabaseId;
+	rnode.node.spcNode = reltablespace ? reltablespace : MyDatabaseTableSpace;
+	rnode.node.dbNode = (rnode.node.spcNode == GLOBALTABLESPACE_OID) ? InvalidOid : MyDatabaseId;
+	rnode.backend = backend;
 
 	do
 	{
@@ -475,9 +542,9 @@ GetNewRelFileNode(Oid reltablespace, Relation pg_class)
 
 		/* Generate the OID */
 		if (pg_class)
-			rnode.relNode = GetNewOid(pg_class);
+			rnode.node.relNode = GetNewOid(pg_class);
 		else
-			rnode.relNode = GetNewObjectId();
+			rnode.node.relNode = GetNewObjectId();
 
 		/* Check for existing file of same name */
 		rpath = relpath(rnode, MAIN_FORKNUM);
@@ -508,5 +575,5 @@ GetNewRelFileNode(Oid reltablespace, Relation pg_class)
 		pfree(rpath);
 	} while (collides);
 
-	return rnode.relNode;
+	return rnode.node.relNode;
 }
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 20d78a2..9403dc1 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -39,6 +39,7 @@
 #include "catalog/heap.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_attrdef.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
@@ -994,7 +995,9 @@ heap_create_with_catalog(const char *relname,
 			binary_upgrade_next_toast_relfilenode = InvalidOid;
 		}
 		else
-			relid = GetNewRelFileNode(reltablespace, pg_class_desc);
+			relid = GetNewRelFileNode(reltablespace, pg_class_desc,
+									  isTempOrToastNamespace(relnamespace) ?
+										  MyBackendId : InvalidBackendId);
 	}
 
 	/*
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 69946fe..3408a10 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -645,7 +645,12 @@ index_create(Oid heapRelationId,
 			binary_upgrade_next_index_relfilenode = InvalidOid;
 		}
 		else
-			indexRelationId = GetNewRelFileNode(tableSpaceId, pg_class);
+		{
+			indexRelationId =
+				GetNewRelFileNode(tableSpaceId, pg_class,
+								  heapRelation->rd_istemp ?
+									MyBackendId : InvalidBackendId);
+		}
 	}
 
 	/*
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index ad376a1..4515741 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -52,7 +52,7 @@
 typedef struct PendingRelDelete
 {
 	RelFileNode relnode;		/* relation that may need to be deleted */
-	bool		isTemp;			/* is it a temporary relation? */
+	BackendId	backend;		/* InvalidBackendId if not a temp rel */
 	bool		atCommit;		/* T=delete at commit; F=delete at abort */
 	int			nestLevel;		/* xact nesting level of request */
 	struct PendingRelDelete *next;		/* linked-list link */
@@ -102,8 +102,9 @@ RelationCreateStorage(RelFileNode rnode, bool istemp)
 	XLogRecData rdata;
 	xl_smgr_create xlrec;
 	SMgrRelation srel;
+	BackendId	backend = istemp ? MyBackendId : InvalidBackendId;
 
-	srel = smgropen(rnode);
+	srel = smgropen(rnode, backend);
 	smgrcreate(srel, MAIN_FORKNUM, false);
 
 	if (!istemp)
@@ -125,7 +126,7 @@ RelationCreateStorage(RelFileNode rnode, bool istemp)
 	pending = (PendingRelDelete *)
 		MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
 	pending->relnode = rnode;
-	pending->isTemp = istemp;
+	pending->backend = backend;
 	pending->atCommit = false;	/* delete if abort */
 	pending->nestLevel = GetCurrentTransactionNestLevel();
 	pending->next = pendingDeletes;
@@ -145,7 +146,7 @@ RelationDropStorage(Relation rel)
 	pending = (PendingRelDelete *)
 		MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
 	pending->relnode = rel->rd_node;
-	pending->isTemp = rel->rd_istemp;
+	pending->backend = rel->rd_backend;
 	pending->atCommit = true;	/* delete if commit */
 	pending->nestLevel = GetCurrentTransactionNestLevel();
 	pending->next = pendingDeletes;
@@ -283,7 +284,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	}
 
 	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks, rel->rd_istemp);
+	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
 }
 
 /*
@@ -322,14 +323,11 @@ smgrDoPendingDeletes(bool isCommit)
 				SMgrRelation srel;
 				int			i;
 
-				srel = smgropen(pending->relnode);
+				srel = smgropen(pending->relnode, pending->backend);
 				for (i = 0; i <= MAX_FORKNUM; i++)
 				{
 					if (smgrexists(srel, i))
-						smgrdounlink(srel,
-									 i,
-									 pending->isTemp,
-									 false);
+						smgrdounlink(srel, i, false);
 				}
 				smgrclose(srel);
 			}
@@ -341,20 +339,24 @@ smgrDoPendingDeletes(bool isCommit)
 }
 
 /*
- * smgrGetPendingDeletes() -- Get a list of relations to be deleted.
+ * smgrGetPendingDeletes() -- Get a list of non-temp relations to be deleted.
  *
  * The return value is the number of relations scheduled for termination.
  * *ptr is set to point to a freshly-palloc'd array of RelFileNodes.
  * If there are no relations to be deleted, *ptr is set to NULL.
  *
- * If haveNonTemp isn't NULL, the bool it points to gets set to true if
- * there is any non-temp table pending to be deleted; false if not.
+ * Only non-temporary relations are included in the returned list.  This is OK
+ * because the list is used only in contexts where temporary relations don't
+ * matter: we're either writing to the two-phase state file (and transactions
+ * that have touched temp tables can't be prepared) or we're writing to xlog
+ * (and all temporary files will be zapped if we restart anyway, so no need
+ * for redo to do it also).
  *
  * Note that the list does not include anything scheduled for termination
  * by upper-level transactions.
  */
 int
-smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr, bool *haveNonTemp)
+smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr)
 {
 	int			nestLevel = GetCurrentTransactionNestLevel();
 	int			nrels;
@@ -362,11 +364,10 @@ smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr, bool *haveNonTemp)
 	PendingRelDelete *pending;
 
 	nrels = 0;
-	if (haveNonTemp)
-		*haveNonTemp = false;
 	for (pending = pendingDeletes; pending != NULL; pending = pending->next)
 	{
-		if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit)
+		if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit
+			&& pending->backend == InvalidBackendId)
 			nrels++;
 	}
 	if (nrels == 0)
@@ -378,13 +379,12 @@ smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr, bool *haveNonTemp)
 	*ptr = rptr;
 	for (pending = pendingDeletes; pending != NULL; pending = pending->next)
 	{
-		if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit)
+		if (pending->nestLevel >= nestLevel && pending->atCommit == forCommit
+			&& pending->backend == InvalidBackendId)
 		{
 			*rptr = pending->relnode;
 			rptr++;
 		}
-		if (haveNonTemp && !pending->isTemp)
-			*haveNonTemp = true;
 	}
 	return nrels;
 }
@@ -456,7 +456,7 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
 		xl_smgr_create *xlrec = (xl_smgr_create *) XLogRecGetData(record);
 		SMgrRelation reln;
 
-		reln = smgropen(xlrec->rnode);
+		reln = smgropen(xlrec->rnode, InvalidBackendId);
 		smgrcreate(reln, MAIN_FORKNUM, true);
 	}
 	else if (info == XLOG_SMGR_TRUNCATE)
@@ -465,7 +465,7 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
 		SMgrRelation reln;
 		Relation	rel;
 
-		reln = smgropen(xlrec->rnode);
+		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
 		/*
 		 * Forcibly create relation if it doesn't exist (which suggests that
@@ -475,7 +475,7 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
 		 */
 		smgrcreate(reln, MAIN_FORKNUM, true);
 
-		smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno, false);
+		smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
 
 		/* Also tell xlogutils.c about it */
 		XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
@@ -502,7 +502,7 @@ smgr_desc(StringInfo buf, uint8 xl_info, char *rec)
 	if (info == XLOG_SMGR_CREATE)
 	{
 		xl_smgr_create *xlrec = (xl_smgr_create *) rec;
-		char	   *path = relpath(xlrec->rnode, MAIN_FORKNUM);
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
 
 		appendStringInfo(buf, "file create: %s", path);
 		pfree(path);
@@ -510,7 +510,7 @@ smgr_desc(StringInfo buf, uint8 xl_info, char *rec)
 	else if (info == XLOG_SMGR_TRUNCATE)
 	{
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) rec;
-		char	   *path = relpath(xlrec->rnode, MAIN_FORKNUM);
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
 
 		appendStringInfo(buf, "file truncate: %s to %u blocks", path,
 						 xlrec->blkno);
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 52eef7c..fbdc423 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -195,7 +195,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
 	 * Toast tables for regular relations go in pg_toast; those for temp
 	 * relations go into the per-backend temp-toast-table namespace.
 	 */
-	if (rel->rd_islocaltemp)
+	if (rel->rd_backend == MyBackendId)
 		namespaceid = GetTempToastNamespace();
 	else
 		namespaceid = PG_TOAST_NAMESPACE;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 906d547..31e19bc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1019,7 +1019,7 @@ DoCopy(const CopyStmt *stmt, const char *queryString)
 		ExecCheckRTPerms(list_make1(rte), true);
 
 		/* check read-only transaction */
-		if (XactReadOnly && is_from && !cstate->rel->rd_islocaltemp)
+		if (XactReadOnly && is_from && cstate->rel->rd_backend != MyBackendId)
 			PreventCommandIfReadOnly("COPY FROM");
 
 		/* Don't allow COPY w/ OIDs to or from a table without them */
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 66b67dd..3bf067e 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -472,7 +472,7 @@ nextval_internal(Oid relid)
 						RelationGetRelationName(seqrel))));
 
 	/* read-only transactions may only modify temp sequences */
-	if (!seqrel->rd_islocaltemp)
+	if (seqrel->rd_backend != MyBackendId)
 		PreventCommandIfReadOnly("nextval()");
 
 	if (elm->last != elm->cached)		/* some numbers were cached */
@@ -749,7 +749,7 @@ do_setval(Oid relid, int64 next, bool iscalled)
 						RelationGetRelationName(seqrel))));
 
 	/* read-only transactions may only modify temp sequences */
-	if (!seqrel->rd_islocaltemp)
+	if (seqrel->rd_backend != MyBackendId)
 		PreventCommandIfReadOnly("setval()");
 
 	/* lock page' buffer and read tuple */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2ec1476..067e207 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7165,13 +7165,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	 * Relfilenodes are not unique across tablespaces, so we need to allocate
 	 * a new one in the new tablespace.
 	 */
-	newrelfilenode = GetNewRelFileNode(newTableSpace, NULL);
+	newrelfilenode = GetNewRelFileNode(newTableSpace, NULL, rel->rd_backend);
 
 	/* Open old and new relation */
 	newrnode = rel->rd_node;
 	newrnode.relNode = newrelfilenode;
 	newrnode.spcNode = newTableSpace;
-	dstrel = smgropen(newrnode);
+	dstrel = smgropen(newrnode, rel->rd_backend);
 
 	RelationOpenSmgr(rel);
 
@@ -7262,7 +7262,7 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
 
 		/* XLOG stuff */
 		if (use_wal)
-			log_newpage(&dst->smgr_rnode, forkNum, blkno, page);
+			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);
 
 		/*
 		 * Now write the page.	We say isTemp = true even if it's not a temp
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 95e9d37..cf79de3 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -113,7 +113,7 @@
  */
 typedef struct
 {
-	RelFileNode rnode;
+	BackendRelFileNode rnode;
 	ForkNumber	forknum;
 	BlockNumber segno;			/* see md.c for special values */
 	/* might add a real request-type field later; not needed yet */
@@ -1071,7 +1071,8 @@ RequestCheckpoint(int flags)
  * than we have to here.
  */
 bool
-ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
+ForwardFsyncRequest(BackendRelFileNode rnode, ForkNumber forknum,
+					BlockNumber segno)
 {
 	BgWriterRequest *request;
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index caae936..4dd894b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -95,7 +95,8 @@ static void WaitIO(volatile BufferDesc *buf);
 static bool StartBufferIO(volatile BufferDesc *buf, bool forInput);
 static void TerminateBufferIO(volatile BufferDesc *buf, bool clear_dirty,
 				  int set_flag_bits);
-static void buffer_write_error_callback(void *arg);
+static void shared_buffer_write_error_callback(void *arg);
+static void local_buffer_write_error_callback(void *arg);
 static volatile BufferDesc *BufferAlloc(SMgrRelation smgr, ForkNumber forkNum,
 			BlockNumber blockNum,
 			BufferAccessStrategy strategy,
@@ -141,7 +142,8 @@ PrefetchBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum)
 		int			buf_id;
 
 		/* create a tag so we can lookup the buffer */
-		INIT_BUFFERTAG(newTag, reln->rd_smgr->smgr_rnode, forkNum, blockNum);
+		INIT_BUFFERTAG(newTag, reln->rd_smgr->smgr_rnode.node,
+					   forkNum, blockNum);
 
 		/* determine its hash code and partition lock ID */
 		newHash = BufTableHashCode(&newTag);
@@ -251,18 +253,21 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
  * ReadBufferWithoutRelcache -- like ReadBufferExtended, but doesn't require
  *		a relcache entry for the relation.
  *
- * NB: caller is assumed to know what it's doing if isTemp is true.
+ * NB: At present, this function may not be used on temporary relations, which
+ * is OK, because we only use it during XLOG replay.  If in the future we
+ * want to use it on temporary relations, we could pass the backend ID as an
+ * additional parameter.
  */
 Buffer
-ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp,
-						  ForkNumber forkNum, BlockNumber blockNum,
-						  ReadBufferMode mode, BufferAccessStrategy strategy)
+ReadBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
+						  BlockNumber blockNum, ReadBufferMode mode,
+						  BufferAccessStrategy strategy)
 {
 	bool		hit;
 
-	SMgrRelation smgr = smgropen(rnode);
+	SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
 
-	return ReadBuffer_common(smgr, isTemp, forkNum, blockNum, mode, strategy,
+	return ReadBuffer_common(smgr, false, forkNum, blockNum, mode, strategy,
 							 &hit);
 }
 
@@ -414,7 +419,7 @@ ReadBuffer_common(SMgrRelation smgr, bool isLocalBuf, ForkNumber forkNum,
 	{
 		/* new buffers are zero-filled */
 		MemSet((char *) bufBlock, 0, BLCKSZ);
-		smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, isLocalBuf);
+		smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
 	}
 	else
 	{
@@ -465,10 +470,10 @@ ReadBuffer_common(SMgrRelation smgr, bool isLocalBuf, ForkNumber forkNum,
 		VacuumCostBalance += VacuumCostPageMiss;
 
 	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-									  smgr->smgr_rnode.spcNode,
-									  smgr->smgr_rnode.dbNode,
-									  smgr->smgr_rnode.relNode,
-									  isLocalBuf,
+									  smgr->smgr_rnode.node.spcNode,
+									  smgr->smgr_rnode.node.dbNode,
+									  smgr->smgr_rnode.node.relNode,
+									  smgr->smgr_rnode.backend,
 									  isExtend,
 									  found);
 
@@ -512,7 +517,7 @@ BufferAlloc(SMgrRelation smgr, ForkNumber forkNum,
 	bool		valid;
 
 	/* create a tag so we can lookup the buffer */
-	INIT_BUFFERTAG(newTag, smgr->smgr_rnode, forkNum, blockNum);
+	INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
 
 	/* determine its hash code and partition lock ID */
 	newHash = BufTableHashCode(&newTag);
@@ -1693,21 +1698,24 @@ PrintBufferLeakWarning(Buffer buffer)
 	volatile BufferDesc *buf;
 	int32		loccount;
 	char	   *path;
+	BackendId	backend;
 
 	Assert(BufferIsValid(buffer));
 	if (BufferIsLocal(buffer))
 	{
 		buf = &LocalBufferDescriptors[-buffer - 1];
 		loccount = LocalRefCount[-buffer - 1];
+		backend = MyBackendId;
 	}
 	else
 	{
 		buf = &BufferDescriptors[buffer - 1];
 		loccount = PrivateRefCount[buffer - 1];
+		backend = InvalidBackendId;
 	}
 
 	/* theoretically we should lock the bufhdr here */
-	path = relpath(buf->tag.rnode, buf->tag.forkNum);
+	path = relpathbackend(buf->tag.rnode, backend, buf->tag.forkNum);
 	elog(WARNING,
 		 "buffer refcount leak: [%03d] "
 		 "(rel=%s, blockNum=%u, flags=0x%x, refcount=%u %d)",
@@ -1831,14 +1839,14 @@ FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
 		return;
 
 	/* Setup error traceback support for ereport() */
-	errcontext.callback = buffer_write_error_callback;
+	errcontext.callback = shared_buffer_write_error_callback;
 	errcontext.arg = (void *) buf;
 	errcontext.previous = error_context_stack;
 	error_context_stack = &errcontext;
 
 	/* Find smgr relation for buffer */
 	if (reln == NULL)
-		reln = smgropen(buf->tag.rnode);
+		reln = smgropen(buf->tag.rnode, InvalidBackendId);
 
 	TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,
 										buf->tag.blockNum,
@@ -1929,14 +1937,15 @@ RelationGetNumberOfBlocks(Relation relation)
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum, bool istemp,
+DropRelFileNodeBuffers(BackendRelFileNode rnode, ForkNumber forkNum,
 					   BlockNumber firstDelBlock)
 {
 	int			i;
 
-	if (istemp)
+	if (rnode.backend != InvalidBackendId)
 	{
-		DropRelFileNodeLocalBuffers(rnode, forkNum, firstDelBlock);
+		if (rnode.backend == MyBackendId)
+			DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
 		return;
 	}
 
@@ -1945,7 +1954,7 @@ DropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum, bool istemp,
 		volatile BufferDesc *bufHdr = &BufferDescriptors[i];
 
 		LockBufHdr(bufHdr);
-		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode) &&
+		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
 			bufHdr->tag.forkNum == forkNum &&
 			bufHdr->tag.blockNum >= firstDelBlock)
 			InvalidateBuffer(bufHdr);	/* releases spinlock */
@@ -2008,7 +2017,7 @@ PrintBufferDescs(void)
 			 "[%02d] (freeNext=%d, rel=%s, "
 			 "blockNum=%u, flags=0x%x, refcount=%u %d)",
 			 i, buf->freeNext,
-			 relpath(buf->tag.rnode, buf->tag.forkNum),
+			 relpathbackend(buf->tag.rnode, InvalidBackendId, buf->tag.forkNum),
 			 buf->tag.blockNum, buf->flags,
 			 buf->refcount, PrivateRefCount[i]);
 	}
@@ -2078,7 +2087,7 @@ FlushRelationBuffers(Relation rel)
 				ErrorContextCallback errcontext;
 
 				/* Setup error traceback support for ereport() */
-				errcontext.callback = buffer_write_error_callback;
+				errcontext.callback = local_buffer_write_error_callback;
 				errcontext.arg = (void *) bufHdr;
 				errcontext.previous = error_context_stack;
 				error_context_stack = &errcontext;
@@ -2087,7 +2096,7 @@ FlushRelationBuffers(Relation rel)
 						  bufHdr->tag.forkNum,
 						  bufHdr->tag.blockNum,
 						  (char *) LocalBufHdrGetBlock(bufHdr),
-						  true);
+						  false);
 
 				bufHdr->flags &= ~(BM_DIRTY | BM_JUST_DIRTIED);
 
@@ -2699,8 +2708,9 @@ AbortBufferIO(void)
 			if (sv_flags & BM_IO_ERROR)
 			{
 				/* Buffer is pinned, so we can read tag without spinlock */
-				char	   *path = relpath(buf->tag.rnode, buf->tag.forkNum);
+				char	   *path;
 
+				path = relpathperm(buf->tag.rnode, buf->tag.forkNum);
 				ereport(WARNING,
 						(errcode(ERRCODE_IO_ERROR),
 						 errmsg("could not write block %u of %s",
@@ -2714,17 +2724,36 @@ AbortBufferIO(void)
 }
 
 /*
- * Error context callback for errors occurring during buffer writes.
+ * Error context callback for errors occurring during shared buffer writes.
  */
 static void
-buffer_write_error_callback(void *arg)
+shared_buffer_write_error_callback(void *arg)
 {
 	volatile BufferDesc *bufHdr = (volatile BufferDesc *) arg;
 
 	/* Buffer is pinned, so we can read the tag without locking the spinlock */
 	if (bufHdr != NULL)
 	{
-		char	   *path = relpath(bufHdr->tag.rnode, bufHdr->tag.forkNum);
+		char	   *path = relpathperm(bufHdr->tag.rnode, bufHdr->tag.forkNum);
+
+		errcontext("writing block %u of relation %s",
+				   bufHdr->tag.blockNum, path);
+		pfree(path);
+	}
+}
+
+/*
+ * Error context callback for errors occurring during buffer writes.
+ */
+static void
+local_buffer_write_error_callback(void *arg)
+{
+	volatile BufferDesc *bufHdr = (volatile BufferDesc *) arg;
+
+	if (bufHdr != NULL)
+	{
+		char	   *path = relpathbackend(bufHdr->tag.rnode, MyBackendId,
+										 bufHdr->tag.forkNum);
 
 		errcontext("writing block %u of relation %s",
 				   bufHdr->tag.blockNum, path);
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index c5b6a2c..bbf0a01 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -68,7 +68,7 @@ LocalPrefetchBuffer(SMgrRelation smgr, ForkNumber forkNum,
 	BufferTag	newTag;			/* identity of requested block */
 	LocalBufferLookupEnt *hresult;
 
-	INIT_BUFFERTAG(newTag, smgr->smgr_rnode, forkNum, blockNum);
+	INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
 
 	/* Initialize local buffers if first request in this session */
 	if (LocalBufHash == NULL)
@@ -110,7 +110,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 	int			trycounter;
 	bool		found;
 
-	INIT_BUFFERTAG(newTag, smgr->smgr_rnode, forkNum, blockNum);
+	INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
 
 	/* Initialize local buffers if first request in this session */
 	if (LocalBufHash == NULL)
@@ -127,7 +127,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 		Assert(BUFFERTAGS_EQUAL(bufHdr->tag, newTag));
 #ifdef LBDEBUG
 		fprintf(stderr, "LB ALLOC (%u,%d,%d) %d\n",
-				smgr->smgr_rnode.relNode, forkNum, blockNum, -b - 1);
+				smgr->smgr_rnode.node.relNode, forkNum, blockNum, -b - 1);
 #endif
 		/* this part is equivalent to PinBuffer for a shared buffer */
 		if (LocalRefCount[b] == 0)
@@ -150,7 +150,8 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 
 #ifdef LBDEBUG
 	fprintf(stderr, "LB ALLOC (%u,%d,%d) %d\n",
-		 smgr->smgr_rnode.relNode, forkNum, blockNum, -nextFreeLocalBuf - 1);
+		 smgr->smgr_rnode.node.relNode, forkNum, blockNum,
+		 -nextFreeLocalBuf - 1);
 #endif
 
 	/*
@@ -198,14 +199,14 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 		SMgrRelation oreln;
 
 		/* Find smgr relation for buffer */
-		oreln = smgropen(bufHdr->tag.rnode);
+		oreln = smgropen(bufHdr->tag.rnode, MyBackendId);
 
 		/* And write... */
 		smgrwrite(oreln,
 				  bufHdr->tag.forkNum,
 				  bufHdr->tag.blockNum,
 				  (char *) LocalBufHdrGetBlock(bufHdr),
-				  true);
+				  false);
 
 		/* Mark not-dirty now in case we error out below */
 		bufHdr->flags &= ~BM_DIRTY;
@@ -309,7 +310,8 @@ DropRelFileNodeLocalBuffers(RelFileNode rnode, ForkNumber forkNum,
 			if (LocalRefCount[i] != 0)
 				elog(ERROR, "block %u of %s is still referenced (local %u)",
 					 bufHdr->tag.blockNum,
-					 relpath(bufHdr->tag.rnode, bufHdr->tag.forkNum),
+					 relpathbackend(bufHdr->tag.rnode, MyBackendId,
+								   bufHdr->tag.forkNum),
 					 LocalRefCount[i]);
 			/* Remove entry from hashtable */
 			hresult = (LocalBufferLookupEnt *)
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 1d2eeec..3fb5844 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -249,6 +249,9 @@ static File OpenTemporaryFileInTablespace(Oid tblspcOid, bool rejectError);
 static void AtProcExit_Files(int code, Datum arg);
 static void CleanupTempFiles(bool isProcExit);
 static void RemovePgTempFilesInDir(const char *tmpdirname);
+static void RemovePgTempRelationFiles(const char *tsdirname);
+static void RemovePgTempRelationFilesInDbspace(const char *dbspacedirname);
+static bool looks_like_temp_rel_name(const char *name);
 
 
 /*
@@ -1824,10 +1827,12 @@ CleanupTempFiles(bool isProcExit)
 
 
 /*
- * Remove temporary files left over from a prior postmaster session
+ * Remove temporary and temporary relation files left over from a prior
+ * postmaster session
  *
  * This should be called during postmaster startup.  It will forcibly
- * remove any leftover files created by OpenTemporaryFile.
+ * remove any leftover files created by OpenTemporaryFile and any leftover
+ * temporary relation files created by mdcreate.
  *
  * NOTE: we could, but don't, call this during a post-backend-crash restart
  * cycle.  The argument for not doing it is that someone might want to examine
@@ -1847,6 +1852,7 @@ RemovePgTempFiles(void)
 	 */
 	snprintf(temp_path, sizeof(temp_path), "base/%s", PG_TEMP_FILES_DIR);
 	RemovePgTempFilesInDir(temp_path);
+	RemovePgTempRelationFiles("base");
 
 	/*
 	 * Cycle through temp directories for all non-default tablespaces.
@@ -1862,6 +1868,10 @@ RemovePgTempFiles(void)
 		snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s/%s",
 			spc_de->d_name, TABLESPACE_VERSION_DIRECTORY, PG_TEMP_FILES_DIR);
 		RemovePgTempFilesInDir(temp_path);
+
+		snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s",
+			spc_de->d_name, TABLESPACE_VERSION_DIRECTORY);
+		RemovePgTempRelationFiles(temp_path);
 	}
 
 	FreeDir(spc_dir);
@@ -1915,3 +1925,123 @@ RemovePgTempFilesInDir(const char *tmpdirname)
 
 	FreeDir(temp_dir);
 }
+
+/* Process one tablespace directory, look for per-DB subdirectories */
+static void
+RemovePgTempRelationFiles(const char *tsdirname)
+{
+	DIR		   *ts_dir;
+	struct dirent *de;
+	char		dbspace_path[MAXPGPATH];
+
+	ts_dir = AllocateDir(tsdirname);
+	if (ts_dir == NULL)
+	{
+		/* anything except ENOENT is fishy */
+		if (errno != ENOENT)
+			elog(LOG,
+				 "could not open tablespace directory \"%s\": %m",
+				 tsdirname);
+		return;
+	}
+
+	while ((de = ReadDir(ts_dir, tsdirname)) != NULL)
+	{
+		int		i = 0;
+
+		/*
+		 * We're only interested in the per-database directories, which have
+		 * numeric names.  Note that this code will also (properly) ignore "."
+		 * and "..".
+		 */
+		while (isdigit((unsigned char) de->d_name[i]))
+			++i;
+		if (de->d_name[i] != '\0' || i == 0)
+			continue;
+
+		snprintf(dbspace_path, sizeof(dbspace_path), "%s/%s",
+				 tsdirname, de->d_name);
+		RemovePgTempRelationFilesInDbspace(dbspace_path);
+	}
+
+	FreeDir(ts_dir);
+}
+
+/* Process one per-dbspace directory for RemovePgTempRelationFiles */
+static void
+RemovePgTempRelationFilesInDbspace(const char *dbspacedirname)
+{
+	DIR		   *dbspace_dir;
+	struct dirent *de;
+	char		rm_path[MAXPGPATH];
+
+	dbspace_dir = AllocateDir(dbspacedirname);
+	if (dbspace_dir == NULL)
+	{
+		/* we just saw this directory, so it really ought to be there */
+		elog(LOG,
+			 "could not open dbspace directory \"%s\": %m",
+			 dbspacedirname);
+		return;
+	}
+
+	while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+	{
+		if (!looks_like_temp_rel_name(de->d_name))
+			continue;
+
+		snprintf(rm_path, sizeof(rm_path), "%s/%s",
+				 dbspacedirname, de->d_name);
+
+		unlink(rm_path);	/* note we ignore any error */
+	}
+
+	FreeDir(dbspace_dir);
+}
+
+/* t<digits>_<digits>, or t<digits>_<digits>_<forkname> */
+static bool
+looks_like_temp_rel_name(const char *name)
+{
+	int			pos;
+	int			savepos;
+
+	/* Must start with "t". */
+	if (name[0] != 't')
+		return false;
+
+	/* Followed by a non-empty string of digits and then an underscore. */
+	for (pos = 1; isdigit((unsigned char) name[pos]); ++pos)
+		;
+	if (pos == 1 || name[pos] != '_')
+		return false;
+
+	/* Followed by another nonempty string of digits. */
+	for (savepos = ++pos; isdigit((unsigned char) name[pos]); ++pos)
+		;
+	if (savepos == pos)
+		return false;
+
+	/* We might have _forkname or .segment or both. */
+	if (name[pos] == '_')
+	{
+		int		forkchar = forkname_chars(&name[pos+1]);
+		if (forkchar <= 0)
+			return false;
+		pos += forkchar + 1;
+	}
+	if (name[pos] == '.')
+	{
+		int		segchar;
+		for (segchar = 1; isdigit((unsigned char) name[pos+segchar]); ++segchar)
+			;
+		if (segchar <= 1)
+			return false;
+		pos += segchar;
+	}
+
+	/* Now we should be at the end. */
+	if (name[pos] != '\0')
+		return false;
+	return true;
+}
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 579572f..b43a2ce 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -303,7 +303,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	}
 
 	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks, rel->rd_istemp);
+	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
 
 	/*
 	 * We might as well update the local smgr_fsm_nblocks setting.
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 4163ca0..79805b4 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -119,7 +119,7 @@ static MemoryContext MdCxt;		/* context for all md.c allocations */
  */
 typedef struct
 {
-	RelFileNode rnode;			/* the targeted relation */
+	BackendRelFileNode rnode;	/* the targeted relation */
 	ForkNumber	forknum;
 	BlockNumber segno;			/* which segment */
 } PendingOperationTag;
@@ -135,7 +135,7 @@ typedef struct
 
 typedef struct
 {
-	RelFileNode rnode;			/* the dead relation to delete */
+	BackendRelFileNode rnode;	/* the dead relation to delete */
 	CycleCtr	cycle_ctr;		/* mdckpt_cycle_ctr when request was made */
 } PendingUnlinkEntry;
 
@@ -158,14 +158,14 @@ static MdfdVec *mdopen(SMgrRelation reln, ForkNumber forknum,
 	   ExtensionBehavior behavior);
 static void register_dirty_segment(SMgrRelation reln, ForkNumber forknum,
 					   MdfdVec *seg);
-static void register_unlink(RelFileNode rnode);
+static void register_unlink(BackendRelFileNode rnode);
 static MdfdVec *_fdvec_alloc(void);
 static char *_mdfd_segpath(SMgrRelation reln, ForkNumber forknum,
 			  BlockNumber segno);
 static MdfdVec *_mdfd_openseg(SMgrRelation reln, ForkNumber forkno,
 			  BlockNumber segno, int oflags);
 static MdfdVec *_mdfd_getseg(SMgrRelation reln, ForkNumber forkno,
-			 BlockNumber blkno, bool isTemp, ExtensionBehavior behavior);
+			 BlockNumber blkno, bool skipFsync, ExtensionBehavior behavior);
 static BlockNumber _mdnblocks(SMgrRelation reln, ForkNumber forknum,
 		   MdfdVec *seg);
 
@@ -321,7 +321,7 @@ mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
  * we are usually not in a transaction anymore when this is called.
  */
 void
-mdunlink(RelFileNode rnode, ForkNumber forkNum, bool isRedo)
+mdunlink(BackendRelFileNode rnode, ForkNumber forkNum, bool isRedo)
 {
 	char	   *path;
 	int			ret;
@@ -417,7 +417,7 @@ mdunlink(RelFileNode rnode, ForkNumber forkNum, bool isRedo)
  */
 void
 mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		 char *buffer, bool isTemp)
+		 char *buffer, bool skipFsync)
 {
 	off_t		seekpos;
 	int			nbytes;
@@ -440,7 +440,7 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 						relpath(reln->smgr_rnode, forknum),
 						InvalidBlockNumber)));
 
-	v = _mdfd_getseg(reln, forknum, blocknum, isTemp, EXTENSION_CREATE);
+	v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE);
 
 	seekpos = (off_t) BLCKSZ *(blocknum % ((BlockNumber) RELSEG_SIZE));
 
@@ -478,7 +478,7 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 				 errhint("Check free disk space.")));
 	}
 
-	if (!isTemp)
+	if (!skipFsync && !SmgrIsTemp(reln))
 		register_dirty_segment(reln, forknum, v);
 
 	Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
@@ -605,9 +605,10 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	MdfdVec    *v;
 
 	TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
-										reln->smgr_rnode.spcNode,
-										reln->smgr_rnode.dbNode,
-										reln->smgr_rnode.relNode);
+										reln->smgr_rnode.node.spcNode,
+										reln->smgr_rnode.node.dbNode,
+										reln->smgr_rnode.node.relNode,
+										reln->smgr_rnode.backend);
 
 	v = _mdfd_getseg(reln, forknum, blocknum, false, EXTENSION_FAIL);
 
@@ -624,9 +625,10 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ);
 
 	TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
-									   reln->smgr_rnode.spcNode,
-									   reln->smgr_rnode.dbNode,
-									   reln->smgr_rnode.relNode,
+									   reln->smgr_rnode.node.spcNode,
+									   reln->smgr_rnode.node.dbNode,
+									   reln->smgr_rnode.node.relNode,
+									   reln->smgr_rnode.backend,
 									   nbytes,
 									   BLCKSZ);
 
@@ -666,7 +668,7 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
  */
 void
 mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		char *buffer, bool isTemp)
+		char *buffer, bool skipFsync)
 {
 	off_t		seekpos;
 	int			nbytes;
@@ -678,11 +680,12 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 #endif
 
 	TRACE_POSTGRESQL_SMGR_MD_WRITE_START(forknum, blocknum,
-										 reln->smgr_rnode.spcNode,
-										 reln->smgr_rnode.dbNode,
-										 reln->smgr_rnode.relNode);
+										 reln->smgr_rnode.node.spcNode,
+										 reln->smgr_rnode.node.dbNode,
+										 reln->smgr_rnode.node.relNode,
+										 reln->smgr_rnode.backend);
 
-	v = _mdfd_getseg(reln, forknum, blocknum, isTemp, EXTENSION_FAIL);
+	v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_FAIL);
 
 	seekpos = (off_t) BLCKSZ *(blocknum % ((BlockNumber) RELSEG_SIZE));
 
@@ -697,9 +700,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ);
 
 	TRACE_POSTGRESQL_SMGR_MD_WRITE_DONE(forknum, blocknum,
-										reln->smgr_rnode.spcNode,
-										reln->smgr_rnode.dbNode,
-										reln->smgr_rnode.relNode,
+										reln->smgr_rnode.node.spcNode,
+										reln->smgr_rnode.node.dbNode,
+										reln->smgr_rnode.node.relNode,
+										reln->smgr_rnode.backend,
 										nbytes,
 										BLCKSZ);
 
@@ -720,7 +724,7 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 				 errhint("Check free disk space.")));
 	}
 
-	if (!isTemp)
+	if (!skipFsync && !SmgrIsTemp(reln))
 		register_dirty_segment(reln, forknum, v);
 }
 
@@ -794,8 +798,7 @@ mdnblocks(SMgrRelation reln, ForkNumber forknum)
  *	mdtruncate() -- Truncate relation to specified number of blocks.
  */
 void
-mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
-		   bool isTemp)
+mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 {
 	MdfdVec    *v;
 	BlockNumber curnblk;
@@ -839,7 +842,7 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
 						 errmsg("could not truncate file \"%s\": %m",
 								FilePathName(v->mdfd_vfd))));
 
-			if (!isTemp)
+			if (!SmgrIsTemp(reln))
 				register_dirty_segment(reln, forknum, v);
 			v = v->mdfd_chain;
 			Assert(ov != reln->md_fd[forknum]); /* we never drop the 1st
@@ -864,7 +867,7 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
 					errmsg("could not truncate file \"%s\" to %u blocks: %m",
 						   FilePathName(v->mdfd_vfd),
 						   nblocks)));
-			if (!isTemp)
+			if (!SmgrIsTemp(reln))
 				register_dirty_segment(reln, forknum, v);
 			v = v->mdfd_chain;
 			ov->mdfd_chain = NULL;
@@ -1052,7 +1055,8 @@ mdsync(void)
 				 * the relation will have been dirtied through this same smgr
 				 * relation, and so we can save a file open/close cycle.
 				 */
-				reln = smgropen(entry->tag.rnode);
+				reln = smgropen(entry->tag.rnode.node,
+								entry->tag.rnode.backend);
 
 				/*
 				 * It is possible that the relation has been dropped or
@@ -1235,7 +1239,7 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
  * a remote pending-ops table.
  */
 static void
-register_unlink(RelFileNode rnode)
+register_unlink(BackendRelFileNode rnode)
 {
 	if (pendingOpsTable)
 	{
@@ -1278,7 +1282,8 @@ register_unlink(RelFileNode rnode)
  * structure for them.)
  */
 void
-RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
+RememberFsyncRequest(BackendRelFileNode rnode, ForkNumber forknum,
+					 BlockNumber segno)
 {
 	Assert(pendingOpsTable);
 
@@ -1291,7 +1296,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 		hash_seq_init(&hstat, pendingOpsTable);
 		while ((entry = (PendingOperationEntry *) hash_seq_search(&hstat)) != NULL)
 		{
-			if (RelFileNodeEquals(entry->tag.rnode, rnode) &&
+			if (BackendRelFileNodeEquals(entry->tag.rnode, rnode) &&
 				entry->tag.forknum == forknum)
 			{
 				/* Okay, cancel this entry */
@@ -1312,7 +1317,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 		hash_seq_init(&hstat, pendingOpsTable);
 		while ((entry = (PendingOperationEntry *) hash_seq_search(&hstat)) != NULL)
 		{
-			if (entry->tag.rnode.dbNode == rnode.dbNode)
+			if (entry->tag.rnode.node.dbNode == rnode.node.dbNode)
 			{
 				/* Okay, cancel this entry */
 				entry->canceled = true;
@@ -1326,7 +1331,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 			PendingUnlinkEntry *entry = (PendingUnlinkEntry *) lfirst(cell);
 
 			next = lnext(cell);
-			if (entry->rnode.dbNode == rnode.dbNode)
+			if (entry->rnode.node.dbNode == rnode.node.dbNode)
 			{
 				pendingUnlinks = list_delete_cell(pendingUnlinks, cell, prev);
 				pfree(entry);
@@ -1393,7 +1398,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
  * ForgetRelationFsyncRequests -- forget any fsyncs for a rel
  */
 void
-ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum)
+ForgetRelationFsyncRequests(BackendRelFileNode rnode, ForkNumber forknum)
 {
 	if (pendingOpsTable)
 	{
@@ -1428,11 +1433,12 @@ ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum)
 void
 ForgetDatabaseFsyncRequests(Oid dbid)
 {
-	RelFileNode rnode;
+	BackendRelFileNode rnode;
 
-	rnode.dbNode = dbid;
-	rnode.spcNode = 0;
-	rnode.relNode = 0;
+	rnode.node.dbNode = dbid;
+	rnode.node.spcNode = 0;
+	rnode.node.relNode = 0;
+	rnode.backend = InvalidBackendId;
 
 	if (pendingOpsTable)
 	{
@@ -1523,12 +1529,12 @@ _mdfd_openseg(SMgrRelation reln, ForkNumber forknum, BlockNumber segno,
  *		specified block.
  *
  * If the segment doesn't exist, we ereport, return NULL, or create the
- * segment, according to "behavior".  Note: isTemp need only be correct
- * in the EXTENSION_CREATE case.
+ * segment, according to "behavior".  Note: skipFsync is only used in the
+ * EXTENSION_CREATE case.
  */
 static MdfdVec *
 _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
-			 bool isTemp, ExtensionBehavior behavior)
+			 bool skipFsync, ExtensionBehavior behavior)
 {
 	MdfdVec    *v = mdopen(reln, forknum, behavior);
 	BlockNumber targetseg;
@@ -1566,7 +1572,7 @@ _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
 
 					mdextend(reln, forknum,
 							 nextsegno * ((BlockNumber) RELSEG_SIZE) - 1,
-							 zerobuf, isTemp);
+							 zerobuf, skipFsync);
 					pfree(zerobuf);
 				}
 				v->mdfd_chain = _mdfd_openseg(reln, forknum, +nextsegno, O_CREAT);
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 3b12cb3..ecf238d 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -45,19 +45,19 @@ typedef struct f_smgr
 	void		(*smgr_create) (SMgrRelation reln, ForkNumber forknum,
 											bool isRedo);
 	bool		(*smgr_exists) (SMgrRelation reln, ForkNumber forknum);
-	void		(*smgr_unlink) (RelFileNode rnode, ForkNumber forknum,
+	void		(*smgr_unlink) (BackendRelFileNode rnode, ForkNumber forknum,
 											bool isRedo);
 	void		(*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
-							BlockNumber blocknum, char *buffer, bool isTemp);
+							BlockNumber blocknum, char *buffer, bool skipFsync);
 	void		(*smgr_prefetch) (SMgrRelation reln, ForkNumber forknum,
 											  BlockNumber blocknum);
 	void		(*smgr_read) (SMgrRelation reln, ForkNumber forknum,
 										  BlockNumber blocknum, char *buffer);
 	void		(*smgr_write) (SMgrRelation reln, ForkNumber forknum,
-							BlockNumber blocknum, char *buffer, bool isTemp);
+							BlockNumber blocknum, char *buffer, bool skipFsync);
 	BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
 	void		(*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
-										   BlockNumber nblocks, bool isTemp);
+										   BlockNumber nblocks);
 	void		(*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
 	void		(*smgr_pre_ckpt) (void);		/* may be NULL */
 	void		(*smgr_sync) (void);	/* may be NULL */
@@ -83,8 +83,6 @@ static HTAB *SMgrRelationHash = NULL;
 
 /* local function prototypes */
 static void smgrshutdown(int code, Datum arg);
-static void smgr_internal_unlink(RelFileNode rnode, ForkNumber forknum,
-					 int which, bool isTemp, bool isRedo);
 
 
 /*
@@ -131,8 +129,9 @@ smgrshutdown(int code, Datum arg)
  *		This does not attempt to actually open the object.
  */
 SMgrRelation
-smgropen(RelFileNode rnode)
+smgropen(RelFileNode rnode, BackendId backend)
 {
+	BackendRelFileNode brnode;
 	SMgrRelation reln;
 	bool		found;
 
@@ -142,7 +141,7 @@ smgropen(RelFileNode rnode)
 		HASHCTL		ctl;
 
 		MemSet(&ctl, 0, sizeof(ctl));
-		ctl.keysize = sizeof(RelFileNode);
+		ctl.keysize = sizeof(BackendRelFileNode);
 		ctl.entrysize = sizeof(SMgrRelationData);
 		ctl.hash = tag_hash;
 		SMgrRelationHash = hash_create("smgr relation table", 400,
@@ -150,8 +149,10 @@ smgropen(RelFileNode rnode)
 	}
 
 	/* Look up or create an entry */
+	brnode.node = rnode;
+	brnode.backend = backend;
 	reln = (SMgrRelation) hash_search(SMgrRelationHash,
-									  (void *) &rnode,
+									  (void *) &brnode,
 									  HASH_ENTER, &found);
 
 	/* Initialize it if not present before */
@@ -261,7 +262,7 @@ smgrcloseall(void)
  * such entry exists already.
  */
 void
-smgrclosenode(RelFileNode rnode)
+smgrclosenode(BackendRelFileNode rnode)
 {
 	SMgrRelation reln;
 
@@ -305,8 +306,8 @@ smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	 * should be here and not in commands/tablespace.c?  But that would imply
 	 * importing a lot of stuff that smgr.c oughtn't know, either.
 	 */
-	TablespaceCreateDbspace(reln->smgr_rnode.spcNode,
-							reln->smgr_rnode.dbNode,
+	TablespaceCreateDbspace(reln->smgr_rnode.node.spcNode,
+							reln->smgr_rnode.node.dbNode,
 							isRedo);
 
 	(*(smgrsw[reln->smgr_which].smgr_create)) (reln, forknum, isRedo);
@@ -323,29 +324,19 @@ smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
  *		already.
  */
 void
-smgrdounlink(SMgrRelation reln, ForkNumber forknum, bool isTemp, bool isRedo)
+smgrdounlink(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 {
-	RelFileNode rnode = reln->smgr_rnode;
+	BackendRelFileNode rnode = reln->smgr_rnode;
 	int			which = reln->smgr_which;
 
 	/* Close the fork */
 	(*(smgrsw[which].smgr_close)) (reln, forknum);
 
-	smgr_internal_unlink(rnode, forknum, which, isTemp, isRedo);
-}
-
-/*
- * Shared subroutine that actually does the unlink ...
- */
-static void
-smgr_internal_unlink(RelFileNode rnode, ForkNumber forknum,
-					 int which, bool isTemp, bool isRedo)
-{
 	/*
 	 * Get rid of any remaining buffers for the relation.  bufmgr will just
 	 * drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(rnode, forknum, isTemp, 0);
+	DropRelFileNodeBuffers(rnode, forknum, 0);
 
 	/*
 	 * It'd be nice to tell the stats collector to forget it immediately, too.
@@ -385,10 +376,10 @@ smgr_internal_unlink(RelFileNode rnode, ForkNumber forknum,
  */
 void
 smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		   char *buffer, bool isTemp)
+		   char *buffer, bool skipFsync)
 {
 	(*(smgrsw[reln->smgr_which].smgr_extend)) (reln, forknum, blocknum,
-											   buffer, isTemp);
+											   buffer, skipFsync);
 }
 
 /*
@@ -426,16 +417,16 @@ smgrread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
  *		on disk at return, only dumped out to the kernel.  However,
  *		provisions will be made to fsync the write before the next checkpoint.
  *
- *		isTemp indicates that the relation is a temp table (ie, is managed
- *		by the local-buffer manager).  In this case no provisions need be
- *		made to fsync the write before checkpointing.
+ *		skipFsync indicates that the caller will make other provisions to
+ *		fsync the relation, so we needn't bother.  Temporary relations also
+ *		do not require fsync.
  */
 void
 smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		  char *buffer, bool isTemp)
+		  char *buffer, bool skipFsync)
 {
 	(*(smgrsw[reln->smgr_which].smgr_write)) (reln, forknum, blocknum,
-											  buffer, isTemp);
+											  buffer, skipFsync);
 }
 
 /*
@@ -455,14 +446,13 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  * The truncation is done immediately, so this can't be rolled back.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
-			 bool isTemp)
+smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 {
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, isTemp, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, nblocks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -479,8 +469,7 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
 	/*
 	 * Do the truncation.
 	 */
-	(*(smgrsw[reln->smgr_which].smgr_truncate)) (reln, forknum, nblocks,
-												 isTemp);
+	(*(smgrsw[reln->smgr_which].smgr_truncate)) (reln, forknum, nblocks);
 }
 
 /*
@@ -499,7 +488,7 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
  *		to use the WAL log for PITR or replication purposes: in that case
  *		we have to make WAL entries as well.)
  *
- *		The preceding writes should specify isTemp = true to avoid
+ *		The preceding writes should specify skipFsync = true to avoid
  *		duplicative fsyncs.
  *
  *		Note that you need to do FlushRelationBuffers() first if there is
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 8ec6b71..4a5441f 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -244,14 +244,14 @@ pg_tablespace_size_name(PG_FUNCTION_ARGS)
  * calculate size of (one fork of) a relation
  */
 static int64
-calculate_relation_size(RelFileNode *rfn, ForkNumber forknum)
+calculate_relation_size(RelFileNode *rfn, BackendId backend, ForkNumber forknum)
 {
 	int64		totalsize = 0;
 	char	   *relationpath;
 	char		pathname[MAXPGPATH];
 	unsigned int segcount = 0;
 
-	relationpath = relpath(*rfn, forknum);
+	relationpath = relpathbackend(*rfn, backend, forknum);
 
 	for (segcount = 0;; segcount++)
 	{
@@ -291,7 +291,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
 
 	rel = relation_open(relOid, AccessShareLock);
 
-	size = calculate_relation_size(&(rel->rd_node),
+	size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
 							  forkname_to_number(text_to_cstring(forkName)));
 
 	relation_close(rel, AccessShareLock);
@@ -315,12 +315,14 @@ calculate_toast_table_size(Oid toastrelid)
 
 	/* toast heap size, including FSM and VM size */
 	for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
-		size += calculate_relation_size(&(toastRel->rd_node), forkNum);
+		size += calculate_relation_size(&(toastRel->rd_node),
+										toastRel->rd_backend, forkNum);
 
 	/* toast index size, including FSM and VM size */
 	toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
 	for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
-		size += calculate_relation_size(&(toastIdxRel->rd_node), forkNum);
+		size += calculate_relation_size(&(toastIdxRel->rd_node),
+										toastIdxRel->rd_backend, forkNum);
 
 	relation_close(toastIdxRel, AccessShareLock);
 	relation_close(toastRel, AccessShareLock);
@@ -349,7 +351,8 @@ calculate_table_size(Oid relOid)
 	 * heap size, including FSM and VM
 	 */
 	for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
-		size += calculate_relation_size(&(rel->rd_node), forkNum);
+		size += calculate_relation_size(&(rel->rd_node), rel->rd_backend,
+										forkNum);
 
 	/*
 	 * Size of toast relation
@@ -392,7 +395,9 @@ calculate_indexes_size(Oid relOid)
 			idxRel = relation_open(idxOid, AccessShareLock);
 
 			for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
-				size += calculate_relation_size(&(idxRel->rd_node), forkNum);
+				size += calculate_relation_size(&(idxRel->rd_node),
+												idxRel->rd_backend,
+												forkNum);
 
 			relation_close(idxRel, AccessShareLock);
 		}
@@ -563,6 +568,7 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
 	HeapTuple	tuple;
 	Form_pg_class relform;
 	RelFileNode rnode;
+	BackendId	backend;
 	char	   *path;
 
 	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
@@ -600,12 +606,27 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
 			break;
 	}
 
-	ReleaseSysCache(tuple);
-
 	if (!OidIsValid(rnode.relNode))
+	{
+		ReleaseSysCache(tuple);
 		PG_RETURN_NULL();
+	}
+
+	/* If temporary, determine owning backend. */
+	if (!relform->relistemp)
+		backend = InvalidBackendId;
+	else if (isTempOrToastNamespace(relform->relnamespace))
+		backend = MyBackendId;
+	else
+	{
+		/* Do it the hard way. */
+		backend = GetTempNamespaceBackendId(relform->relnamespace);
+		Assert(backend != InvalidOid);
+	}
+
+	ReleaseSysCache(tuple);
 
-	path = relpath(rnode, MAIN_FORKNUM);
+	path = relpathbackend(rnode, backend, MAIN_FORKNUM);
 
 	PG_RETURN_TEXT_P(cstring_to_text(path));
 }
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 3b15d85..238127b 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -319,7 +319,8 @@ AddCatcacheInvalidationMessage(InvalidationListHeader *hdr,
 {
 	SharedInvalidationMessage msg;
 
-	msg.cc.id = (int16) id;
+	Assert(id < CHAR_MAX);
+	msg.cc.id = (int8) id;
 	msg.cc.tuplePtr = *tuplePtr;
 	msg.cc.dbId = dbId;
 	msg.cc.hashValue = hashValue;
@@ -513,7 +514,10 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
 		 * We could have smgr entries for relations of other databases, so no
 		 * short-circuit test is possible here.
 		 */
-		smgrclosenode(msg->sm.rnode);
+		BackendRelFileNode	rnode;
+		rnode.node = msg->sm.rnode;
+		rnode.backend = (msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;
+		smgrclosenode(rnode);
 	}
 	else if (msg->id == SHAREDINVALRELMAP_ID)
 	{
@@ -1163,14 +1167,20 @@ CacheInvalidateRelcacheByRelid(Oid relid)
  * in commit/abort WAL entries.  Instead, calls to CacheInvalidateSmgr()
  * should happen in low-level smgr.c routines, which are executed while
  * replaying WAL as well as when creating it.
+ *
+ * Note: In order to avoid bloating SharedInvalidationMessage, we store only
+ * three bytes of the backend ID using what would otherwise be padding space.
+ * Thus, the maximum possible backend ID is 2^23-1.
  */
 void
-CacheInvalidateSmgr(RelFileNode rnode)
+CacheInvalidateSmgr(BackendRelFileNode rnode)
 {
 	SharedInvalidationMessage msg;
 
 	msg.sm.id = SHAREDINVALSMGR_ID;
-	msg.sm.rnode = rnode;
+	msg.sm.backend_hi = rnode.backend >> 16;
+	msg.sm.backend_lo = rnode.backend & 0xffff;
+	msg.sm.rnode = rnode.node;
 	SendSharedInvalidMessages(&msg, 1);
 }
 
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index dda69e8..bc355ca 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -858,10 +858,20 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 	relation->rd_createSubid = InvalidSubTransactionId;
 	relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
 	relation->rd_istemp = relation->rd_rel->relistemp;
-	if (relation->rd_istemp)
-		relation->rd_islocaltemp = isTempOrToastNamespace(relation->rd_rel->relnamespace);
+	if (!relation->rd_istemp)
+		relation->rd_backend = InvalidBackendId;
+	else if (isTempOrToastNamespace(relation->rd_rel->relnamespace))
+		relation->rd_backend = MyBackendId;
 	else
-		relation->rd_islocaltemp = false;
+	{
+		/*
+		 * If it's a temporary table, but not one of ours, we have to use
+		 * the slow, grotty method to figure out the owning backend.
+		 */
+		relation->rd_backend =
+			GetTempNamespaceBackendId(relation->rd_rel->relnamespace);
+		Assert(relation->rd_backend != InvalidOid);
+	}
 
 	/*
 	 * initialize the tuple descriptor (relation->rd_att).
@@ -1424,7 +1434,7 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_createSubid = InvalidSubTransactionId;
 	relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
 	relation->rd_istemp = false;
-	relation->rd_islocaltemp = false;
+	relation->rd_backend = InvalidBackendId;
 
 	/*
 	 * initialize relation tuple form
@@ -2515,7 +2525,7 @@ RelationBuildLocalRelation(const char *relname,
 
 	/* it is temporary if and only if it is in my temp-table namespace */
 	rel->rd_istemp = isTempOrToastNamespace(relnamespace);
-	rel->rd_islocaltemp = rel->rd_istemp;
+	rel->rd_backend = rel->rd_istemp ? MyBackendId : InvalidBackendId;
 
 	/*
 	 * create a new tuple descriptor from the one passed in.  We do this
@@ -2629,7 +2639,7 @@ void
 RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
 {
 	Oid			newrelfilenode;
-	RelFileNode newrnode;
+	BackendRelFileNode newrnode;
 	Relation	pg_class;
 	HeapTuple	tuple;
 	Form_pg_class classform;
@@ -2640,7 +2650,8 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
 		   TransactionIdIsNormal(freezeXid));
 
 	/* Allocate a new relfilenode */
-	newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL);
+	newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
+									   relation->rd_backend);
 
 	/*
 	 * Get a writable copy of the pg_class tuple for the given relation.
@@ -2660,9 +2671,10 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
 	 * NOTE: any conflict in relfilenode value will be caught here, if
 	 * GetNewRelFileNode messes up for any reason.
 	 */
-	newrnode = relation->rd_node;
-	newrnode.relNode = newrelfilenode;
-	RelationCreateStorage(newrnode, relation->rd_istemp);
+	newrnode.node = relation->rd_node;
+	newrnode.node.relNode = newrelfilenode;
+	newrnode.backend = relation->rd_backend;
+	RelationCreateStorage(newrnode.node, relation->rd_istemp);
 	smgrclosenode(newrnode);
 
 	/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index b209128..77f8cd4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -96,6 +96,9 @@
 #define MAX_KILOBYTES	(INT_MAX / 1024)
 #endif
 
+/* upper limit for backends - see CacheInvalidateSmgr */
+#define MAX_BACKENDS	0x7fffff
+
 #define KB_PER_MB (1024)
 #define KB_PER_GB (1024*1024)
 
@@ -1415,7 +1418,9 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	/*
-	 * Note: MaxBackends is limited to INT_MAX/4 because some places compute
+	 * Note: MaxBackends is limited to 2^23-1 because inval.c stores the
+     * backend ID as a 3-byte signed integer.  Even if that limitation were
+     * removed, we still could not exceed INT_MAX/4 because some places compute
 	 * 4*MaxBackends without any overflow check.  This check is made in
 	 * assign_maxconnections, since MaxBackends is computed as MaxConnections
 	 * plus autovacuum_max_workers plus one (for the autovacuum launcher).
@@ -1430,7 +1435,7 @@ static struct config_int ConfigureNamesInt[] =
 			NULL
 		},
 		&MaxConnections,
-		100, 1, INT_MAX / 4, assign_maxconnections, NULL
+		100, 1, MAX_BACKENDS, assign_maxconnections, NULL
 	},
 
 	{
@@ -1439,7 +1444,7 @@ static struct config_int ConfigureNamesInt[] =
 			NULL
 		},
 		&ReservedBackends,
-		3, 0, INT_MAX / 4, NULL, NULL
+		3, 0, MAX_BACKENDS, NULL, NULL
 	},
 
 	{
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 8ccb948..b06f8ff 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -55,7 +55,7 @@ provider postgresql {
 	probe sort__done(bool, long);
 
 	probe buffer__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, bool, bool);
-	probe buffer__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, bool, bool, bool);
+	probe buffer__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, bool, bool);
 	probe buffer__flush__start(ForkNumber, BlockNumber, Oid, Oid, Oid);
 	probe buffer__flush__done(ForkNumber, BlockNumber, Oid, Oid, Oid);
 
@@ -81,10 +81,10 @@ provider postgresql {
 	probe twophase__checkpoint__start();
 	probe twophase__checkpoint__done();
 
-	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid);
-	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int);
-	probe smgr__md__write__start(ForkNumber, BlockNumber, Oid, Oid, Oid);
-	probe smgr__md__write__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int);
+	probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
+	probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
+	probe smgr__md__write__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int);
+	probe smgr__md__write__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int);
 
 	probe xlog__insert(unsigned char, unsigned char);
 	probe xlog__switch();
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 49df96c..e11cce2 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -71,7 +71,7 @@ typedef struct XLogContRecord
 /*
  * Each page of XLOG file has a header like this:
  */
-#define XLOG_PAGE_MAGIC 0xD064	/* can be used as WAL version indicator */
+#define XLOG_PAGE_MAGIC 0xD065	/* can be used as WAL version indicator */
 
 typedef struct XLogPageHeaderData
 {
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index bd430cb..765f571 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -25,10 +25,17 @@
 
 extern const char *forkNames[];
 extern ForkNumber forkname_to_number(char *forkName);
+extern int forkname_chars(const char *str);
 
-extern char *relpath(RelFileNode rnode, ForkNumber forknum);
+extern char *relpathbackend(RelFileNode rnode, BackendId backend,
+			  ForkNumber forknum);
 extern char *GetDatabasePath(Oid dbNode, Oid spcNode);
 
+#define relpath(rnode, forknum) \
+		relpathbackend((rnode).node, (rnode).backend, (forknum))
+#define relpathperm(rnode, forknum) \
+		relpathbackend((rnode), InvalidBackendId, (forknum))
+
 extern bool IsSystemRelation(Relation relation);
 extern bool IsToastRelation(Relation relation);
 
@@ -45,6 +52,7 @@ extern bool IsSharedRelation(Oid relationId);
 extern Oid	GetNewOid(Relation relation);
 extern Oid GetNewOidWithIndex(Relation relation, Oid indexId,
 				   AttrNumber oidcolumn);
-extern Oid	GetNewRelFileNode(Oid reltablespace, Relation pg_class);
+extern Oid	GetNewRelFileNode(Oid reltablespace, Relation pg_class,
+				  BackendId backend);
 
 #endif   /* CATALOG_H */
diff --git a/src/include/catalog/storage.h b/src/include/catalog/storage.h
index 609f1c6..7712d1f 100644
--- a/src/include/catalog/storage.h
+++ b/src/include/catalog/storage.h
@@ -30,8 +30,7 @@ extern void RelationTruncate(Relation rel, BlockNumber nblocks);
  * naming
  */
 extern void smgrDoPendingDeletes(bool isCommit);
-extern int smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr,
-					  bool *haveNonTemp);
+extern int smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr);
 extern void AtSubCommit_smgr(void);
 extern void AtSubAbort_smgr(void);
 extern void PostPrepare_smgr(void);
diff --git a/src/include/postmaster/bgwriter.h b/src/include/postmaster/bgwriter.h
index 06a4e37..a22fbee 100644
--- a/src/include/postmaster/bgwriter.h
+++ b/src/include/postmaster/bgwriter.h
@@ -27,7 +27,7 @@ extern void BackgroundWriterMain(void);
 extern void RequestCheckpoint(int flags);
 extern void CheckpointWriteDelay(int flags, double progress);
 
-extern bool ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum,
+extern bool ForwardFsyncRequest(BackendRelFileNode rnode, ForkNumber forknum,
 					BlockNumber segno);
 extern void AbsorbFsyncRequests(void);
 
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index f38f545..d4bf341 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -160,7 +160,7 @@ extern Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
 extern Buffer ReadBufferExtended(Relation reln, ForkNumber forkNum,
 				   BlockNumber blockNum, ReadBufferMode mode,
 				   BufferAccessStrategy strategy);
-extern Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp,
+extern Buffer ReadBufferWithoutRelcache(RelFileNode rnode,
 						  ForkNumber forkNum, BlockNumber blockNum,
 						  ReadBufferMode mode, BufferAccessStrategy strategy);
 extern void ReleaseBuffer(Buffer buffer);
@@ -180,8 +180,8 @@ extern BlockNumber BufferGetBlockNumber(Buffer buffer);
 extern BlockNumber RelationGetNumberOfBlocks(Relation relation);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
-extern void DropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum,
-					   bool istemp, BlockNumber firstDelBlock);
+extern void DropRelFileNodeBuffers(BackendRelFileNode rnode,
+					   ForkNumber forkNum, BlockNumber firstDelBlock);
 extern void DropDatabaseBuffers(Oid dbid);
 
 #ifdef NOT_USED
diff --git a/src/include/storage/relfilenode.h b/src/include/storage/relfilenode.h
index 6f0c0ad..9bb0fa2 100644
--- a/src/include/storage/relfilenode.h
+++ b/src/include/storage/relfilenode.h
@@ -14,6 +14,8 @@
 #ifndef RELFILENODE_H
 #define RELFILENODE_H
 
+#include "storage/backendid.h"
+
 /*
  * The physical storage of a relation consists of one or more forks. The
  * main fork is always created, but in addition to that there can be
@@ -37,7 +39,8 @@ typedef enum ForkNumber
 
 /*
  * RelFileNode must provide all that we need to know to physically access
- * a relation. Note, however, that a "physical" relation is comprised of
+ * a relation, with the exception of the backend ID, which can be provided
+ * separately. Note, however, that a "physical" relation is comprised of
  * multiple files on the filesystem, as each fork is stored as a separate
  * file, and each fork can be divided into multiple segments. See md.c.
  *
@@ -74,14 +77,30 @@ typedef struct RelFileNode
 } RelFileNode;
 
 /*
- * Note: RelFileNodeEquals compares relNode first since that is most likely
- * to be different in two unequal RelFileNodes.  It is probably redundant
- * to compare spcNode if the other two fields are found equal, but do it
- * anyway to be sure.
+ * Augmenting a relfilenode with the backend ID provides all the information
+ * we need to locate the physical storage.
+ */
+typedef struct BackendRelFileNode
+{
+	RelFileNode	node;
+	BackendId	backend;
+} BackendRelFileNode;
+
+/*
+ * Note: RelFileNodeEquals and BackendRelFileNodeEquals compare relNode first
+ * since that is most likely to be different in two unequal RelFileNodes.  It
+ * is probably redundant to compare spcNode if the other fields are found equal,
+ * but do it anyway to be sure.
  */
 #define RelFileNodeEquals(node1, node2) \
 	((node1).relNode == (node2).relNode && \
 	 (node1).dbNode == (node2).dbNode && \
 	 (node1).spcNode == (node2).spcNode)
 
+#define BackendRelFileNodeEquals(node1, node2) \
+	((node1).node.relNode == (node2).node.relNode && \
+	 (node1).node.dbNode == (node2).node.dbNode && \
+	 (node1).backend == (node2).backend && \
+	 (node1).node.spcNode == (node2).node.spcNode)
+
 #endif   /* RELFILENODE_H */
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index bbdde81..d165f4d 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -26,7 +26,7 @@
  *	* invalidate an smgr cache entry for a specific physical relation
  *	* invalidate the mapped-relation mapping for a given database
  * More types could be added if needed.  The message type is identified by
- * the first "int16" field of the message struct.  Zero or positive means a
+ * the first "int8" field of the message struct.  Zero or positive means a
  * specific-catcache inval message (and also serves as the catcache ID field).
  * Negative values identify the other message types, as per codes below.
  *
@@ -63,7 +63,7 @@
 typedef struct
 {
 	/* note: field layout chosen with an eye to alignment concerns */
-	int16		id;				/* cache ID --- must be first */
+	int8		id;				/* cache ID --- must be first */
 	ItemPointerData tuplePtr;	/* tuple identifier in cached relation */
 	Oid			dbId;			/* database ID, or 0 if a shared relation */
 	uint32		hashValue;		/* hash value of key for this catcache */
@@ -73,7 +73,7 @@ typedef struct
 
 typedef struct
 {
-	int16		id;				/* type field --- must be first */
+	int8		id;				/* type field --- must be first */
 	Oid			dbId;			/* database ID, or 0 if a shared catalog */
 	Oid			catId;			/* ID of catalog whose contents are invalid */
 } SharedInvalCatalogMsg;
@@ -82,7 +82,7 @@ typedef struct
 
 typedef struct
 {
-	int16		id;				/* type field --- must be first */
+	int8		id;				/* type field --- must be first */
 	Oid			dbId;			/* database ID, or 0 if a shared relation */
 	Oid			relId;			/* relation ID */
 } SharedInvalRelcacheMsg;
@@ -91,21 +91,23 @@ typedef struct
 
 typedef struct
 {
-	int16		id;				/* type field --- must be first */
-	RelFileNode rnode;			/* physical file ID */
+	int8		id;				/* type field --- must be first */
+	int8		backend_hi;		/* high bits of backend ID, if temprel */
+	uint16		backend_lo;		/* low bits of backend ID, if temprel */
+	RelFileNode rnode;			/* spcNode, dbNode, relNode */
 } SharedInvalSmgrMsg;
 
 #define SHAREDINVALRELMAP_ID	(-4)
 
 typedef struct
 {
-	int16		id;				/* type field --- must be first */
+	int8		id;				/* type field --- must be first */
 	Oid			dbId;			/* database ID, or 0 for shared catalogs */
 } SharedInvalRelmapMsg;
 
 typedef union
 {
-	int16		id;				/* type field --- must be first */
+	int8		id;				/* type field --- must be first */
 	SharedInvalCatcacheMsg cc;
 	SharedInvalCatalogMsg cat;
 	SharedInvalRelcacheMsg rc;
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index cf248b8..7ae78ec 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -16,6 +16,7 @@
 
 #include "access/xlog.h"
 #include "fmgr.h"
+#include "storage/backendid.h"
 #include "storage/block.h"
 #include "storage/relfilenode.h"
 
@@ -38,7 +39,7 @@
 typedef struct SMgrRelationData
 {
 	/* rnode is the hashtable lookup key, so it must be first! */
-	RelFileNode smgr_rnode;		/* relation physical identifier */
+	BackendRelFileNode smgr_rnode;		/* relation physical identifier */
 
 	/* pointer to owning pointer, or NULL if none */
 	struct SMgrRelationData **smgr_owner;
@@ -68,28 +69,30 @@ typedef struct SMgrRelationData
 
 typedef SMgrRelationData *SMgrRelation;
 
+#define SmgrIsTemp(smgr) \
+	((smgr)->smgr_rnode.backend != InvalidBackendId)
 
 extern void smgrinit(void);
-extern SMgrRelation smgropen(RelFileNode rnode);
+extern SMgrRelation smgropen(RelFileNode rnode, BackendId backend);
 extern bool smgrexists(SMgrRelation reln, ForkNumber forknum);
 extern void smgrsetowner(SMgrRelation *owner, SMgrRelation reln);
 extern void smgrclose(SMgrRelation reln);
 extern void smgrcloseall(void);
-extern void smgrclosenode(RelFileNode rnode);
+extern void smgrclosenode(BackendRelFileNode rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, ForkNumber forknum,
-			 bool isTemp, bool isRedo);
+			 bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
-		   BlockNumber blocknum, char *buffer, bool isTemp);
+		   BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
 			 BlockNumber blocknum);
 extern void smgrread(SMgrRelation reln, ForkNumber forknum,
 		 BlockNumber blocknum, char *buffer);
 extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
-		  BlockNumber blocknum, char *buffer, bool isTemp);
+		  BlockNumber blocknum, char *buffer, bool skipFsync);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
 extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-			 BlockNumber nblocks, bool isTemp);
+			 BlockNumber nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void smgrpreckpt(void);
 extern void smgrsync(void);
@@ -103,27 +106,28 @@ extern void mdinit(void);
 extern void mdclose(SMgrRelation reln, ForkNumber forknum);
 extern void mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern bool mdexists(SMgrRelation reln, ForkNumber forknum);
-extern void mdunlink(RelFileNode rnode, ForkNumber forknum, bool isRedo);
+extern void mdunlink(BackendRelFileNode rnode, ForkNumber forknum, bool isRedo);
 extern void mdextend(SMgrRelation reln, ForkNumber forknum,
-		 BlockNumber blocknum, char *buffer, bool isTemp);
+		 BlockNumber blocknum, char *buffer, bool skipFsync);
 extern void mdprefetch(SMgrRelation reln, ForkNumber forknum,
 		   BlockNumber blocknum);
 extern void mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	   char *buffer);
 extern void mdwrite(SMgrRelation reln, ForkNumber forknum,
-		BlockNumber blocknum, char *buffer, bool isTemp);
+		BlockNumber blocknum, char *buffer, bool skipFsync);
 extern BlockNumber mdnblocks(SMgrRelation reln, ForkNumber forknum);
 extern void mdtruncate(SMgrRelation reln, ForkNumber forknum,
-		   BlockNumber nblocks, bool isTemp);
+		   BlockNumber nblocks);
 extern void mdimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void mdpreckpt(void);
 extern void mdsync(void);
 extern void mdpostckpt(void);
 
 extern void SetForwardFsyncRequests(void);
-extern void RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum,
+extern void RememberFsyncRequest(BackendRelFileNode rnode, ForkNumber forknum,
 					 BlockNumber segno);
-extern void ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum);
+extern void ForgetRelationFsyncRequests(BackendRelFileNode rnode,
+							ForkNumber forknum);
 extern void ForgetDatabaseFsyncRequests(Oid dbid);
 
 /* smgrtype.c */
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index a86a17c..7fab919 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -49,7 +49,7 @@ extern void CacheInvalidateRelcacheByTuple(HeapTuple classTuple);
 
 extern void CacheInvalidateRelcacheByRelid(Oid relid);
 
-extern void CacheInvalidateSmgr(RelFileNode rnode);
+extern void CacheInvalidateSmgr(BackendRelFileNode rnode);
 
 extern void CacheInvalidateRelmap(Oid databaseId);
 
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 444e892..296c651 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -127,7 +127,7 @@ typedef struct RelationData
 	struct SMgrRelationData *rd_smgr;	/* cached file handle, or NULL */
 	int			rd_refcnt;		/* reference count */
 	bool		rd_istemp;		/* rel is a temporary relation */
-	bool		rd_islocaltemp; /* rel is a temp rel of this session */
+	BackendId	rd_backend;		/* owning backend id, if temporary relation */
 	bool		rd_isnailed;	/* rel is nailed in cache */
 	bool		rd_isvalid;		/* relcache entry is valid */
 	char		rd_indexvalid;	/* state of rd_indexlist: 0 = not valid, 1 =
@@ -347,7 +347,7 @@ typedef struct StdRdOptions
 #define RelationOpenSmgr(relation) \
 	do { \
 		if ((relation)->rd_smgr == NULL) \
-			smgrsetowner(&((relation)->rd_smgr), smgropen((relation)->rd_node)); \
+			smgrsetowner(&((relation)->rd_smgr), smgropen((relation)->rd_node, (relation)->rd_backend)); \
 	} while (0)
 
 /*
@@ -393,7 +393,7 @@ typedef struct StdRdOptions
  * Beware of multiple eval of argument
  */
 #define RELATION_IS_LOCAL(relation) \
-	((relation)->rd_islocaltemp || \
+	((relation)->rd_backend == MyBackendId || \
 	 (relation)->rd_createSubid != InvalidSubTransactionId)
 
 /*
@@ -403,7 +403,7 @@ typedef struct StdRdOptions
  * Beware of multiple eval of argument
  */
 #define RELATION_IS_OTHER_TEMP(relation) \
-	((relation)->rd_istemp && !(relation)->rd_islocaltemp)
+	((relation)->rd_istemp && (relation)->rd_backend != MyBackendId)
 
 /* routines in utils/cache/relcache.c */
 extern void RelationIncrementReferenceCount(Relation rel);

#24

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#23)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

Here's an updated patch, with the invalidation changes merged in and
hopefully-suitable adjustments elsewhere.

I haven't tested this patch, but I read through it (and have I mentioned
how unbelievably illegible -u format patches are?). It seems to be
close to commitable, but I found a few issues. In no particular order:

storage.sgml needs to be updated to describe this file-naming scheme.

BackendRelFileNode isn't a particularly nice struct name; it seems like
it puts the most important aspect of the struct's purpose somewhere in
the middle of the name. Maybe RelFileNodeBackend would be better, or
RelFileNodeFull, or something like that.

In GetNewRelFileNode, you've changed some code that is commented
/* This logic should match RelationInitPhysicalAddr */, but there
is not any corresponding change in RelationInitPhysicalAddr. I find
this bothersome. I think the problem is that you've made it look
like the assignment of rnode.backend is part of the logic that ought
to match RelationInitPhysicalAddr. Perhaps set that off, at least
by a blank line, if not its own comment.

The relpath() and relpathperm() macros are a tad underdocumented for my
taste; at least put comments noting that one takes a RelFileNode and the
other a BackendRelFileNode. And I wonder if you couldn't find better
names for relpathperm and relpathbackend.

assign_maxconnections and assign_autovacuum_max_workers need to be fixed
for MAX_BACKENDS; in fact I think all the occurrences of INT_MAX / 4 in
guc.c need to be replaced. (And if I were you I'd grep to see if
anyplace outside guc.c knows that value...) Also, as a matter of style,
the comment currently attached to max_connections needs to be attached
to the definition of the constant, not just modified where it is.

As an old bit-shaver this sorta bothers me:

+++ b/src/include/utils/rel.h
@@ -127,7 +127,7 @@ typedef struct RelationData
 	struct SMgrRelationData *rd_smgr;	/* cached file handle, or NULL */
 	int			rd_refcnt;		/* reference count */
 	bool		rd_istemp;		/* rel is a temporary relation */
-	bool		rd_islocaltemp; /* rel is a temp rel of this session */
+	BackendId	rd_backend;		/* owning backend id, if temporary relation */
 	bool		rd_isnailed;	/* rel is nailed in cache */
 	bool		rd_isvalid;		/* relcache entry is valid */
 	char		rd_indexvalid;	/* state of rd_indexlist: 0 = not valid, 1 =

The struct is going to be less efficiently packed with that field order.
Maybe swap rd_istemp and rd_backend?

+ Assert(relation->rd_backend != InvalidOid);
ought to be InvalidBackendId, no? Both new calls of GetTempNamespaceBackendId
have this wrong. You might also want to change the comment of
GetTempNamespaceBackendId to note that its failure result is not just -1
but InvalidBackendId.

Lastly, it bothers me that you've put in code to delete files belonging
to temp rels during crash restart, without any code to clean up their
catalog entries. This will therefore lead to dangling pg_class
references, with uncertain but probably not very nice consequences.
I think that probably shouldn't go in until there's code to drop the
catalog entries too.

regards, tom lane

#25

David Fetter

david@fetter.org

over 15 years ago

In reply to: Tom Lane (#24)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, Aug 12, 2010 at 12:27:45PM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Here's an updated patch, with the invalidation changes merged in and
hopefully-suitable adjustments elsewhere.

I haven't tested this patch, but I read through it (and have I mentioned
how unbelievably illegible -u format patches are?).

I have every confidence that you, of all people, can arrange to use
'filterdiff --format=context' on attached patches automatically,
saving you some time and us some boredom :)

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#26

Joshua D. Drake

jd@commandprompt.com

over 15 years ago

In reply to: David Fetter (#25)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, 2010-08-12 at 09:44 -0700, David Fetter wrote:

On Thu, Aug 12, 2010 at 12:27:45PM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Here's an updated patch, with the invalidation changes merged in and
hopefully-suitable adjustments elsewhere.

I haven't tested this patch, but I read through it (and have I mentioned
how unbelievably illegible -u format patches are?).

I have every confidence that you, of all people, can arrange to use
'filterdiff --format=context' on attached patches automatically,
saving you some time and us some boredom :)

I was under the impression that the project guideline was that we only
accepted context diffs?

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

#27

David Fetter

david@fetter.org

over 15 years ago

In reply to: Joshua D. Drake (#26)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, Aug 12, 2010 at 09:53:54AM -0700, Joshua D. Drake wrote:

On Thu, 2010-08-12 at 09:44 -0700, David Fetter wrote:

On Thu, Aug 12, 2010 at 12:27:45PM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Here's an updated patch, with the invalidation changes merged
in and hopefully-suitable adjustments elsewhere.

I haven't tested this patch, but I read through it (and have I
mentioned how unbelievably illegible -u format patches are?).

I have every confidence that you, of all people, can arrange to
use 'filterdiff --format=context' on attached patches
automatically, saving you some time and us some boredom :)

I was under the impression that the project guideline was that we
only accepted context diffs?

Since they're trivially producible from unified diffs, this is a
pretty silly reason to bounce--or even comment on--patches. It's less
a guideline than a personal preference, namely Tom's.

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#28

Alvaro Herrera

alvherre@commandprompt.com

over 15 years ago

In reply to: David Fetter (#27)

Re: including backend ID in relpath of temp rels - updated patch

Excerpts from David Fetter's message of jue ago 12 12:59:33 -0400 2010:

On Thu, Aug 12, 2010 at 09:53:54AM -0700, Joshua D. Drake wrote:

I was under the impression that the project guideline was that we
only accepted context diffs?

Since they're trivially producible from unified diffs, this is a
pretty silly reason to bounce--or even comment on--patches. It's less
a guideline than a personal preference, namely Tom's.

Not that I review that many patches, but I do dislike unified diffs too.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#29

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#24)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, Aug 12, 2010 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Here's an updated patch, with the invalidation changes merged in and
hopefully-suitable adjustments elsewhere.

I haven't tested this patch, but I read through it (and have I mentioned
how unbelievably illegible -u format patches are?).

Sorry, I'll run it through filterdiff for you next time. As you know,
I have the opposite preference, but since I was producing this
primarily for you to read....

I can clean up the rest of this stuff, but not this:

Lastly, it bothers me that you've put in code to delete files belonging
to temp rels during crash restart, without any code to clean up their
catalog entries. This will therefore lead to dangling pg_class
references, with uncertain but probably not very nice consequences.
I think that probably shouldn't go in until there's code to drop the
catalog entries too.

I thought about this pretty carefully, and I don't believe that there
are any unpleasant consequences. The code that assigns relfilenode
numbers is pretty careful to check that the newly assigned value is
unused BOTH in pg_class and in the directory where the file will be
created, so there should be no danger of a number getting used over
again while the catalog entries remain. Also, the drop-object code
doesn't mind that the physical storage doesn't exist; it's perfectly
happy with that situation. It is true that you will get an ERROR if
you try to SELECT from a table whose physical storage has been
removed, but that seems OK.

We have two existing mechanisms for removing the catalog entries: when
a backend is first asked to access a temporary file, it does a DROP
SCHEMA ... CASCADE on any pre-existing temp schema. And a table is in
wraparound trouble and the owning backend is no longer running,
autovacuum will drop it. Improving on this seems difficult: if you
wanted to *guarantee* that the catalog entries were removed before we
started letting in connections, you'd need to fork a backend per
database and have each one iterate through all the temp schemas and
drop them. Considering that the existing code seems to have been
pretty careful about how this stuff gets handled, I don't think it's
worth making the whole startup sequence slower for it. What might be
worth considering is changing the autovacuum policy to eliminate the
wraparound check, and just have it drop temp table catalog entries for
any backend not currently running, period.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#30

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#29)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 12, 2010 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Lastly, it bothers me that you've put in code to delete files belonging
to temp rels during crash restart, without any code to clean up their
catalog entries. �This will therefore lead to dangling pg_class
references, with uncertain but probably not very nice consequences.

I thought about this pretty carefully, and I don't believe that there
are any unpleasant consequences. The code that assigns relfilenode
numbers is pretty careful to check that the newly assigned value is
unused BOTH in pg_class and in the directory where the file will be
created, so there should be no danger of a number getting used over
again while the catalog entries remain. Also, the drop-object code
doesn't mind that the physical storage doesn't exist; it's perfectly
happy with that situation.

Well, okay, but I'd suggest adding comments to the drop-table code
pointing out that it is now NECESSARY for it to not complain if the file
isn't there. This was never a design goal before, AFAIR --- the fact
that it works like that is kind of accidental. I am also pretty sure
that there used to be at least warning messages for that case, which we
would now not want.

regards, tom lane

#31

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#30)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, Aug 12, 2010 at 2:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 12, 2010 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Lastly, it bothers me that you've put in code to delete files belonging
to temp rels during crash restart, without any code to clean up their
catalog entries. This will therefore lead to dangling pg_class
references, with uncertain but probably not very nice consequences.

I thought about this pretty carefully, and I don't believe that there
are any unpleasant consequences. The code that assigns relfilenode
numbers is pretty careful to check that the newly assigned value is
unused BOTH in pg_class and in the directory where the file will be
created, so there should be no danger of a number getting used over
again while the catalog entries remain. Also, the drop-object code
doesn't mind that the physical storage doesn't exist; it's perfectly
happy with that situation.

Well, okay, but I'd suggest adding comments to the drop-table code
pointing out that it is now NECESSARY for it to not complain if the file
isn't there. This was never a design goal before, AFAIR --- the fact
that it works like that is kind of accidental. I am also pretty sure
that there used to be at least warning messages for that case, which we
would now not want.

That seems like a good idea. I'll post an updated patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#32

Alvaro Herrera

alvherre@commandprompt.com

over 15 years ago

In reply to: Robert Haas (#29)

Re: including backend ID in relpath of temp rels - updated patch

Excerpts from Robert Haas's message of jue ago 12 13:29:57 -0400 2010:

We have two existing mechanisms for removing the catalog entries: when
a backend is first asked to access a temporary file, it does a DROP
SCHEMA ... CASCADE on any pre-existing temp schema. And a table is in
wraparound trouble and the owning backend is no longer running,
autovacuum will drop it. Improving on this seems difficult: if you
wanted to *guarantee* that the catalog entries were removed before we
started letting in connections, you'd need to fork a backend per
database and have each one iterate through all the temp schemas and
drop them. Considering that the existing code seems to have been
pretty careful about how this stuff gets handled, I don't think it's
worth making the whole startup sequence slower for it. What might be
worth considering is changing the autovacuum policy to eliminate the
wraparound check, and just have it drop temp table catalog entries for
any backend not currently running, period.

What about having autovacuum silenty drop the catalog entry if it's a
temp entry for which the underlying file does not exist?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#33

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Alvaro Herrera (#32)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, Aug 12, 2010 at 5:29 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Robert Haas's message of jue ago 12 13:29:57 -0400 2010:

We have two existing mechanisms for removing the catalog entries: when
a backend is first asked to access a temporary file, it does a DROP
SCHEMA ... CASCADE on any pre-existing temp schema. And a table is in
wraparound trouble and the owning backend is no longer running,
autovacuum will drop it. Improving on this seems difficult: if you
wanted to *guarantee* that the catalog entries were removed before we
started letting in connections, you'd need to fork a backend per
database and have each one iterate through all the temp schemas and
drop them. Considering that the existing code seems to have been
pretty careful about how this stuff gets handled, I don't think it's
worth making the whole startup sequence slower for it. What might be
worth considering is changing the autovacuum policy to eliminate the
wraparound check, and just have it drop temp table catalog entries for
any backend not currently running, period.

What about having autovacuum silenty drop the catalog entry if it's a
temp entry for which the underlying file does not exist?

I think that would be subject to race conditions. The current
mechanism is actually pretty good, and I think we can build on it if
we want to do more, rather than inventing something new. We just need
to be specific about what problem we're trying to solve.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#34

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#33)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 12, 2010 at 5:29 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

What about having autovacuum silenty drop the catalog entry if it's a
temp entry for which the underlying file does not exist?

I think that would be subject to race conditions.

Well, autovacuum's willingness to drop sufficiently old temp tables
would already risk any such race conditions. However ...

The current
mechanism is actually pretty good, and I think we can build on it if
we want to do more, rather than inventing something new. We just need
to be specific about what problem we're trying to solve.

... I agree with this point. This idea wouldn't fix the concern I had
about the existence of pg_class entries with no underlying file: if that
causes any issues we'd have to fix them anyway. So I'm not sure what
the gain is.

regards, tom lane

#35

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#31)

Re: including backend ID in relpath of temp rels - updated patch

One other thought about all this: in the past, one objection that's been
raised to deleting files during crash restart is the possible loss of
forensic evidence. It might be a good idea to provide some fairly
simple way for developers to disable that deletion subroutine. I'm not
sure that it rises to the level of needing a GUC, though.

regards, tom lane

#36

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#34)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, Aug 12, 2010 at 6:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 12, 2010 at 5:29 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

What about having autovacuum silenty drop the catalog entry if it's a
temp entry for which the underlying file does not exist?

I think that would be subject to race conditions.

Well, autovacuum's willingness to drop sufficiently old temp tables
would already risk any such race conditions. However ...

I don't think we do. It only drops them if the backend isn't running.
And even if the backend starts running after we check and before we
drop it, that's OK. We're only dropping the OLD table, which by
definition isn't relevant to the new session. Perhaps we should be
taking a lock on the relation first, but I think that can only really
bite us if the relation were dropped and a new relation were created
with the same OID - in that case, we might drop an unrelated table,
though it's vanishingly unlikely in practice.

The current
mechanism is actually pretty good, and I think we can build on it if
we want to do more, rather than inventing something new. We just need
to be specific about what problem we're trying to solve.

... I agree with this point. This idea wouldn't fix the concern I had
about the existence of pg_class entries with no underlying file: if that
causes any issues we'd have to fix them anyway. So I'm not sure what
the gain is.

Right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#37

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#35)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, Aug 12, 2010 at 7:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

One other thought about all this: in the past, one objection that's been
raised to deleting files during crash restart is the possible loss of
forensic evidence. It might be a good idea to provide some fairly
simple way for developers to disable that deletion subroutine. I'm not
sure that it rises to the level of needing a GUC, though.

See http://archives.postgresql.org/pgsql-hackers/2010-06/msg00622.php :

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#38

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#37)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 12, 2010 at 7:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

One other thought about all this: in the past, one objection that's been
raised to deleting files during crash restart is the possible loss of
forensic evidence.

... With this patch, we can be sure of removing ALL
stray files, which is maybe ever-so-slightly leaky in CVS HEAD. But
on the other hand, because it hooks into the existing temporary file
cleanup code, it only happens at cluster startup, NOT after a backend
crash.

Doh. Thanks.

regards, tom lane

#39

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Robert Haas (#31)

1 attachment(s)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, Aug 12, 2010 at 2:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:

That seems like a good idea. I'll post an updated patch.

Here is an updated patch. It's in context-diff format this time, but
that made it go over 100K, so I gzip'd it because I can't remember
what the attachment size limit is these days. I'm not sure whether
that works out to a win or not.

I think this addresses all previous review comments with the exception
that I've been unable to devise a more clever name for relpathperm()
and relpathbackend().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#40

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#39)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

Here is an updated patch. It's in context-diff format this time,

Thanks, I appreciate that ;-)

This looks committable to me, with a couple of tiny suggestions:

In the text added to storage.sgml, s/temporary relation/temporary relations/.
Also, it'd be better if BBB and FFF were marked up as <replaceable>
rather than <literal>, see examples elsewhere in that file.

The comment for local_buffer_write_error_callback() probably meant to
say "during local buffer writes".

regards, tom lane

#41

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#40)

Re: including backend ID in relpath of temp rels - updated patch

On Fri, Aug 13, 2010 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Here is an updated patch. It's in context-diff format this time,

Thanks, I appreciate that ;-)

This looks committable to me, with a couple of tiny suggestions:

Woo hoo!

In the text added to storage.sgml, s/temporary relation/temporary relations/.
Also, it'd be better if BBB and FFF were marked up as <replaceable>
rather than <literal>, see examples elsewhere in that file.

I see. How should I mark tBBB_FFF?

I actually didn't like that way of explaining it very much, but I
couldn't think of anything clearer. Saying "the name will consist of
a lowercase t, followed by the backend ID, followed by an underscore,
followed by the filenode" did not seem better.

The comment for local_buffer_write_error_callback() probably meant to
say "during local buffer writes".

No doubt.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#42

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#41)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Aug 13, 2010 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, it'd be better if BBB and FFF were marked up as <replaceable>
rather than <literal>, see examples elsewhere in that file.

I see. How should I mark tBBB_FFF?

I'd do
<literal>t<replaceable>BBB</replaceable>_<replaceable>FFF</replaceable></literal>

regards, tom lane

#43

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Robert Haas (#29)

Re: including backend ID in relpath of temp rels - updated patch

On Thu, Aug 12, 2010 at 1:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Lastly, it bothers me that you've put in code to delete files belonging
to temp rels during crash restart, without any code to clean up their
catalog entries. This will therefore lead to dangling pg_class
references, with uncertain but probably not very nice consequences.
I think that probably shouldn't go in until there's code to drop the
catalog entries too.

I thought about this pretty carefully, and I don't believe that there
are any unpleasant consequences. The code that assigns relfilenode
numbers is pretty careful to check that the newly assigned value is
unused BOTH in pg_class and in the directory where the file will be
created, so there should be no danger of a number getting used over
again while the catalog entries remain. Also, the drop-object code
doesn't mind that the physical storage doesn't exist; it's perfectly
happy with that situation. It is true that you will get an ERROR if
you try to SELECT from a table whose physical storage has been
removed, but that seems OK.

After further reflection, I think that the above reasoning is wrong.
GetNewRelFileNode() guarantees that the OID doesn't collide with the
OID column of pg_class, not the relfilenode column of pg_class; and,
per the comments to that function, it only guarantees this when
creating a new relation (to which we're therefore assigning an OID)
and not when rewriting an existing relation. So it provides no
protection against the scenario where backend #1 drops a table that
lacks physical storage just as backend #2 assigns that OID to some
other relation and begins creating files, which backend #1 then blows
away.

However, I think we're safe for a different reason. The only time we
unlink the physical storage of a relation without nuking its catalog
entries is during the startup sequence, when we wipe out the physical
storage for any leftover temp tables. So if there's a race condition,
it can only apply to temp tables. But temp tables for a particular
backend ID can only be created by that backend, and before a backend
will create any temp tables it will drop any previously existing temp
schema. Thus, by the time a temp table can get created, there CAN'T
be any leftover catalog entries from previous sessions for it to
potentially collide with.

I think the reasoning above probably should be added to the comment at
the beginning of GetNewRelFileNode(), and I'll go do that unless
someone thinks it's incorrect. The first sentence of that comment is
also now technically incorrect: what it now does is generate a
relfilenode such that <tablespace, relfilenode, backendid> is unique.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#44

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#43)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

After further reflection, I think that the above reasoning is wrong.
GetNewRelFileNode() guarantees that the OID doesn't collide with the
OID column of pg_class, not the relfilenode column of pg_class; and,
per the comments to that function, it only guarantees this when
creating a new relation (to which we're therefore assigning an OID)
and not when rewriting an existing relation. So it provides no
protection against the scenario where backend #1 drops a table that
lacks physical storage just as backend #2 assigns that OID to some
other relation and begins creating files, which backend #1 then blows
away.

However, I think we're safe for a different reason [ omitted ]

The above scenario is only a risk if you suppose that dropping a
relation that lacks physical storage will nonetheless result in
attempted unlink()s. I think that that's probably not the case;
but it seems related to a question that was bothering me recently.
Namely: why do we assign relfilenode values to relations that have
no physical storage? If we did not do so, then relation drop would
see a zero in relfilenode and would certainly not attempt an incorrect
unlink().

So I'd like to look into setting relfilenode to zero for relation
relkinds that lack storage. I'm not exactly sure why the code doesn't
do that already, though.

This came to mind after reading a commit from Bruce in which he had to
hack up pg_upgrade to preserve relfilenode numbers for storage-less
relations. Presumably that hack could get undone.

regards, tom lane

#45

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#44)

Re: including backend ID in relpath of temp rels - updated patch

On Wed, Sep 15, 2010 at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

After further reflection, I think that the above reasoning is wrong.
GetNewRelFileNode() guarantees that the OID doesn't collide with the
OID column of pg_class, not the relfilenode column of pg_class; and,
per the comments to that function, it only guarantees this when
creating a new relation (to which we're therefore assigning an OID)
and not when rewriting an existing relation. So it provides no
protection against the scenario where backend #1 drops a table that
lacks physical storage just as backend #2 assigns that OID to some
other relation and begins creating files, which backend #1 then blows
away.

However, I think we're safe for a different reason [ omitted ]

The above scenario is only a risk if you suppose that dropping a
relation that lacks physical storage will nonetheless result in
attempted unlink()s. I think that that's probably not the case;

Why? How would we know that it didn't have physical storage prior to
attempting the unlinks?

but it seems related to a question that was bothering me recently.
Namely: why do we assign relfilenode values to relations that have
no physical storage? If we did not do so, then relation drop would
see a zero in relfilenode and would certainly not attempt an incorrect
unlink().

I agree that setting relfilenode to 0 for relkinds that lack physical
storage is a good idea, but that's obviously only going to handle that
specific case.

This came to mind after reading a commit from Bruce in which he had to
hack up pg_upgrade to preserve relfilenode numbers for storage-less
relations. Presumably that hack could get undone.

Seems like a good thing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#46

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#45)

Re: including backend ID in relpath of temp rels - updated patch

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Sep 15, 2010 at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The above scenario is only a risk if you suppose that dropping a
relation that lacks physical storage will nonetheless result in
attempted unlink()s. �I think that that's probably not the case;

Why? How would we know that it didn't have physical storage prior to
attempting the unlinks?

From the relkind.

regards, tom lane

#47

Robert Haas

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#46)

Re: including backend ID in relpath of temp rels - updated patch

On Wed, Sep 15, 2010 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Sep 15, 2010 at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The above scenario is only a risk if you suppose that dropping a
relation that lacks physical storage will nonetheless result in
attempted unlink()s. I think that that's probably not the case;

Why? How would we know that it didn't have physical storage prior to
attempting the unlinks?

From the relkind.

Oh, sure, I agree with you in that specific case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company