unlogged tables

Started by Andy Colson · about 15 years ago · 69 messages
#1 Andy Colson
andy@squeakycode.net

I am attempting to test this

https://commitfest.postgresql.org/action/patch_view?id=424

but I'm not sure which version of PG this should be applied to.  (It would
be really neat if, on here:
https://commitfest.postgresql.org/action/commitfest_view?id=8
there was a note that said: test this stuff against git tag X or
branch Y or whatever.)

I got the git:

git clone git://git.postgresql.org/git/postgresql.git

downloaded the patches, and applied them ok.  Then did ./configure and make

after much spewage I got:

bufmgr.c: In function 'PrefetchBuffer':
bufmgr.c:126:10: error: 'struct RelationData' has no member named 'rd_istemp'
make[4]: *** [bufmgr.o] Error 1

Just to make sure everything was ok with the original, I reset:

git reset --hard HEAD^
./configure
make
and all was well.

so I tried again:
make clean
make maintainer-clean

patch -p1 < relpersistence-v1.patch
.. ok ..

but then...

$ patch -p1 < unlogged-tables-v1.patch
patching file doc/src/sgml/indexam.sgml
patching file doc/src/sgml/ref/create_table.sgml
patching file doc/src/sgml/ref/create_table_as.sgml
patching file src/backend/access/gin/gininsert.c
patching file src/backend/access/gist/gist.c
patching file src/backend/access/hash/hash.c
patching file src/backend/access/nbtree/nbtree.c
patching file src/backend/access/transam/xlog.c
patching file src/backend/catalog/catalog.c
patching file src/backend/catalog/heap.c
patching file src/backend/catalog/index.c
patching file src/backend/catalog/storage.c
patching file src/backend/parser/gram.y
patching file src/backend/storage/file/Makefile
patching file src/backend/storage/file/copydir.c
patching file src/backend/storage/file/fd.c
The next patch would create the file src/backend/storage/file/reinit.c,
which already exists! Assume -R? [n]

That didn't happen the first time... I'm almost positive.

Not sure what I should do now.

-Andy

#2 Robert Haas
robertmhaas@gmail.com
In reply to: Andy Colson (#1)
3 attachment(s)
Re: unlogged tables

On Mon, Nov 15, 2010 at 8:56 PM, Andy Colson <andy@squeakycode.net> wrote:

> I am attempting to test this
>
> https://commitfest.postgresql.org/action/patch_view?id=424
>
> but I'm not sure which version of PG this should be applied to.  (It would
> be really neat if, on here:
> https://commitfest.postgresql.org/action/commitfest_view?id=8
> there was a note that said: test this stuff against git tag X or
> branch Y or whatever.)

They're pretty much all against the master branch.

> I got the git:
>
> git clone git://git.postgresql.org/git/postgresql.git
>
> downloaded the patches, and applied them ok.  Then did ./configure and make
>
> after much spewage I got:
>
> bufmgr.c: In function 'PrefetchBuffer':
> bufmgr.c:126:10: error: 'struct RelationData' has no member named
> 'rd_istemp'
> make[4]: *** [bufmgr.o] Error 1

Woops. Good catch. I guess USE_PREFETCH isn't defined on my system.
That line needs to be changed to say RelationUsesLocalBuffers(reln)
rather than reln->rd_istemp. Updated patches attached.
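
In the reported hunk that presumably boils down to something like this (a
sketch only; the surrounding PrefetchBuffer context is reconstructed from
memory, not copied from the attached patches):

	/* bufmgr.c, PrefetchBuffer, around the line 126 reported above */
-	if (reln->rd_istemp)
+	if (RelationUsesLocalBuffers(reln))
 	{
 		/* pass it off to localbuf.c */
 		LocalPrefetchBuffer(reln->rd_smgr, forkNum, blockNum);
 	}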

> That didn't happen the first time... I'm almost positive.

When you applied the patches the first time, it created that file; but
git reset --hard doesn't remove untracked files.

> Not sure what I should do now.

git clean -dfx
git reset --hard
git pull

Apply attached patches.
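
(Presumably with the same procedure as before, i.e. something like

$ patch -p1 < relpersistence-v2.patch
$ patch -p1 < unlogged-tables-v2.patch

applying relpersistence first, since the unlogged-tables patch builds on
top of it.)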

configure
make
make install

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

unlogged-tables-v2.patch (application/octet-stream):
commit 02ed85317460134a95236906233974f2a621b615
Author: Robert Haas <rhaas@postgresql.org>
Date:   Sat Nov 13 08:30:55 2010 -0500

    Support unlogged tables.
    
    The contents of an unlogged table are not WAL-logged; thus, they are not
    crash-safe and do not appear on standby servers.  On restart, they are
    truncated.
    
    Currently, only btree indexes are supported on unlogged tables.
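
    As a usage sketch (table and index names here are made up for
    illustration, not taken from the patch):

        CREATE UNLOGGED TABLE scratch (id int, payload text);
        CREATE INDEX scratch_idx ON scratch (id);  -- btree, so allowed
        -- A GIN, GiST, or hash index on "scratch" would instead fail,
        -- e.g.: ERROR:  unlogged hash indexes are not supported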

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 925aac4..c599b95 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -167,6 +167,17 @@ ambuild (Relation heapRelation,
 
   <para>
 <programlisting>
+void
+ambuildempty (Relation indexRelation);
+</programlisting>
+   Build an empty index, and write it to the initialization fork (INIT_FORKNUM)
+   of the given relation.  This method is called only for unlogged tables; the
+   empty index written to the initialization fork will be copied over the main
+   relation fork on each server restart.
+  </para>
+
+  <para>
+<programlisting>
 bool
 aminsert (Relation indexRelation,
           Datum *values,
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 8635e80..7b0e14d 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
   { <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ DEFAULT <replaceable>default_expr</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
     | <replaceable>table_constraint</replaceable>
     | LIKE <replaceable>parent_table</replaceable> [ <replaceable>like_option</replaceable> ... ] }
@@ -32,7 +32,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <repl
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="PARAMETER">tablespace</replaceable> ]
 
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable>
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable>
     OF <replaceable class="PARAMETER">type_name</replaceable> [ (
   { <replaceable class="PARAMETER">column_name</replaceable> WITH OPTIONS [ DEFAULT <replaceable>default_expr</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
     | <replaceable>table_constraint</replaceable> }
@@ -164,6 +164,22 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <repl
    </varlistentry>
 
    <varlistentry>
+    <term><literal>UNLOGGED</></term>
+    <listitem>
+     <para>
+      If specified, the table is created as an unlogged table.  Data written
+      to unlogged tables is not written to the write-ahead log (see <xref
+      linkend="wal">), which makes them considerably faster than ordinary
+      tables.  However, it also means that the data stored in the tables is not
+      copied to standby servers and does not survive if
+      <productname>PostgreSQL</productname> is restarted.  Unlogged tables are
+      automatically truncated on restart.  Any indexes created on an unlogged
+      table are automatically unlogged as well.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
     <term><literal>IF NOT EXISTS</></term>
     <listitem>
      <para>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 86da68b..0ea7ec2 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE <replaceable>table_name</replaceable>
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
     [ WITH ( <replaceable class="PARAMETER">storage_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
     [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
@@ -82,6 +82,16 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE <replaceable>table_name
    </varlistentry>
 
    <varlistentry>
+    <term><literal>UNLOGGED</></term>
+    <listitem>
+     <para>
+      If specified, the table is created as an unlogged table.
+      Refer to <xref linkend="sql-createtable"> for details.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
     <term><replaceable>table_name</replaceable></term>
     <listitem>
      <para>
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 8681ede..7ec12b0 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -412,6 +412,19 @@ ginbuild(PG_FUNCTION_ARGS)
 }
 
 /*
+ *	ginbuildempty() -- build an empty gin index in the initialization fork
+ */
+Datum
+ginbuildempty(PG_FUNCTION_ARGS)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("unlogged GIN indexes are not supported")));
+
+	PG_RETURN_VOID();
+}
+
+/*
  * Inserts value during normal insertion
  */
 static uint32
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index a7dc2a5..fdfb5d4 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -210,6 +210,19 @@ gistbuildCallback(Relation index,
 }
 
 /*
+ *	gistbuildempty() -- build an empty gist index in the initialization fork
+ */
+Datum
+gistbuildempty(PG_FUNCTION_ARGS)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("unlogged GIST indexes are not supported")));
+
+	PG_RETURN_VOID();
+}
+
+/*
  *	gistinsert -- wrapper for GiST tuple insertion.
  *
  *	  This is the public interface routine for tuple insertion in GiSTs.
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index bb46446..cbe8682 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -114,6 +114,19 @@ hashbuild(PG_FUNCTION_ARGS)
 }
 
 /*
+ *	hashbuildempty() -- build an empty hash index in the initialization fork
+ */
+Datum
+hashbuildempty(PG_FUNCTION_ARGS)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("unlogged hash indexes are not supported")));
+
+	PG_RETURN_VOID();
+}
+
+/*
  * Per-tuple callback from IndexBuildHeapScan
  */
 static void
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 46aeb9e..6ccc16d 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -29,6 +29,7 @@
 #include "storage/indexfsm.h"
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
+#include "storage/smgr.h"
 #include "utils/memutils.h"
 
 
@@ -205,6 +206,36 @@ btbuildCallback(Relation index,
 }
 
 /*
+ *	btbuildempty() -- build an empty btree index in the initialization fork
+ */
+Datum
+btbuildempty(PG_FUNCTION_ARGS)
+{
+	Relation	index = (Relation) PG_GETARG_POINTER(0);
+	Page		metapage;
+
+	/* Construct metapage. */
+	metapage = (Page) palloc(BLCKSZ);
+	_bt_initmetapage(metapage, P_NONE, 0);
+
+	/* Write the page.  If archiving/streaming, XLOG it. */
+	smgrwrite(index->rd_smgr, INIT_FORKNUM, BTREE_METAPAGE,
+			  (char *) metapage, true);
+	if (XLogIsNeeded())
+		log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+					BTREE_METAPAGE, metapage);
+
+	/*
+	 * An immediate sync is required even if we xlog'd the page, because the
+	 * write did not go through shared_buffers and therefore a concurrent
+	 * checkpoint may have moved the redo pointer past our xlog record.
+	 */
+	smgrimmedsync(index->rd_smgr, INIT_FORKNUM);
+
+	PG_RETURN_VOID();
+}
+
+/*
  *	btinsert() -- insert an index tuple into a btree.
  *
  *		Descend the tree recursively, find the appropriate location for our
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 70f4cc5..9a7b45f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -49,6 +49,7 @@
 #include "storage/latch.h"
 #include "storage/pmsignal.h"
 #include "storage/procarray.h"
+#include "storage/reinit.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
 #include "utils/builtins.h"
@@ -5996,6 +5997,16 @@ StartupXLOG(void)
 		InRecovery = true;
 	}
 
+	/*
+	 * Blow away any leftover data in unlogged relations.  This should be
+	 * done BEFORE starting up Hot Standby, so that read-only backends don't
+	 * see residual data from a previous startup.  If redo isn't required or
+	 * Hot Standby isn't enabled, we could do both the
+	 * UNLOGGED_RELATION_CLEANUP and UNLOGGED_RELATION_INIT phases in one
+	 * pass later on ... but for now, we don't bother to detect that case.
+	 */
+	ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
+
 	/* REDO */
 	if (InRecovery)
 	{
@@ -6524,6 +6535,13 @@ StartupXLOG(void)
 	PreallocXlogFiles(EndOfLog);
 
 	/*
+	 * Reset initial contents of unlogged relations.  This has to be done
+	 * AFTER recovery is complete so that any unlogged relations created
+	 * during recovery also get picked up.
+	 */
+	ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
+
+	/*
 	 * Okay, we're officially UP.
 	 */
 	InRecovery = false;
@@ -7024,6 +7042,14 @@ ShutdownXLOG(int code, Datum arg)
 	ShutdownSUBTRANS();
 	ShutdownMultiXact();
 
+	/*
+	 * Remove any unlogged relation contents.  This will happen anyway at
+	 * the next startup; the point of doing it here is to avoid consuming
+	 * a potentially large amount of disk space while we're shut down, for
+	 * data that will be discarded anyway.
+	 */
+	ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
+
 	ereport(LOG,
 			(errmsg("database system is shut down")));
 }
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 88b5c2a..fc5a8fc 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -55,7 +55,8 @@
 const char *forkNames[] = {
 	"main",						/* MAIN_FORKNUM */
 	"fsm",						/* FSM_FORKNUM */
-	"vm"						/* VISIBILITYMAP_FORKNUM */
+	"vm",						/* VISIBILITYMAP_FORKNUM */
+	"init"						/* INIT_FORKNUM */
 };
 
 /*
@@ -82,14 +83,14 @@ forkname_to_number(char *forkName)
  * 		We use this to figure out whether a filename could be a relation
  * 		fork (as opposed to an oddly named stray file that somehow ended
  * 		up in the database directory).  If the passed string begins with
- * 		a fork name (other than the main fork name), we return its length.
- * 		If not, we return 0.
+ * 		a fork name (other than the main fork name), we return its length,
+ *	    and set *fork (if not NULL) to the fork number.  If not, we return 0.
  *
  * Note that the present coding assumes that there are no fork names which
  * are prefixes of other fork names.
  */
 int
-forkname_chars(const char *str)
+forkname_chars(const char *str, ForkNumber *fork)
 {
 	ForkNumber	forkNum;
 
@@ -97,7 +98,11 @@ forkname_chars(const char *str)
 	{
 		int len = strlen(forkNames[forkNum]);
 		if (strncmp(forkNames[forkNum], str, len) == 0)
+		{
+			if (fork)
+				*fork = forkNum;
 			return len;
+		}
 	}
 	return 0;
 }
@@ -537,6 +542,7 @@ GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
 		case RELPERSISTENCE_TEMP:
 			backend = MyBackendId;
 			break;
+		case RELPERSISTENCE_UNLOGGED:
 		case RELPERSISTENCE_PERMANENT:
 			backend = InvalidBackendId;
 			break;
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index cda9000..cd287b1 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -317,8 +317,8 @@ heap_create(const char *relname,
 	/*
 	 * Have the storage manager create the relation's disk file, if needed.
 	 *
-	 * We only create the main fork here, other forks will be created on
-	 * demand.
+	 * We only create the main fork here, other forks will be created as
+	 * needed.
 	 */
 	if (create_storage)
 	{
@@ -1207,6 +1207,41 @@ heap_create_with_catalog(const char *relname,
 		register_on_commit_action(relid, oncommit);
 
 	/*
+	 * If this is an unlogged relation, it needs an init fork so that it
+	 * can be correctly reinitialized on restart.
+	 */
+	if (relpersistence == RELPERSISTENCE_UNLOGGED)
+	{
+		Page		dummypage;
+
+		Assert(relkind == RELKIND_RELATION || relkind == RELKIND_TOASTVALUE);
+
+		/*
+		 * Technically, we just write an empty file here, but then there's
+		 * nothing to XLOG.  We could introduce a dedicated XLOG record to
+		 * create an empty relation fork, but it's easier to just
+		 * XLOG a blank page, which (during redo) will create the fork
+		 * automatically.
+		 */
+		dummypage = (Page) palloc0(BLCKSZ);
+
+		/* Create fork, write page.  If archiving/streaming, XLOG it. */
+		smgrcreate(new_rel_desc->rd_smgr, INIT_FORKNUM, false);
+		smgrwrite(new_rel_desc->rd_smgr, INIT_FORKNUM, 0,
+				  (char *) dummypage, true);
+		if (XLogIsNeeded())
+			log_newpage(&new_rel_desc->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+						0, dummypage);
+
+		/*
+		 * An immediate sync is required even if we xlog'd the page, because the
+		 * write did not go through shared_buffers and therefore a concurrent
+		 * checkpoint may have moved the redo pointer past our xlog record.
+		 */
+		smgrimmedsync(new_rel_desc->rd_smgr, INIT_FORKNUM);
+	}
+
+	/*
 	 * ok, the relation has been cataloged, so close our relations and return
 	 * the OID of the newly created relation.
 	 */
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8fbe8eb..22f0959 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -967,6 +967,17 @@ index_create(Oid heapRelationId,
 	}
 
 	/*
+	 * If this is an unlogged index, we need to write out an init fork for it.
+	 */
+	if (relpersistence == RELPERSISTENCE_UNLOGGED)
+	{
+		RegProcedure	ambuildempty = indexRelation->rd_am->ambuildempty;
+		RelationOpenSmgr(indexRelation);
+		smgrcreate(indexRelation->rd_smgr, INIT_FORKNUM, false);
+		OidFunctionCall1(ambuildempty, PointerGetDatum(indexRelation));
+	}
+
+	/*
 	 * Close the heap and index; but we keep the locks that we acquired above
 	 * until end of transaction.
 	 */
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 671aaff..34ec77d 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -111,6 +111,10 @@ RelationCreateStorage(RelFileNode rnode, char relpersistence)
 			backend = MyBackendId;
 			needs_wal = false;
 			break;
+		case RELPERSISTENCE_UNLOGGED:
+			backend = InvalidBackendId;
+			needs_wal = false;
+			break;
 		case RELPERSISTENCE_PERMANENT:
 			backend = InvalidBackendId;
 			needs_wal = true;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 06707da..790c585 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -536,8 +536,8 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
 	TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
 	TRUNCATE TRUSTED TYPE_P
 
-	UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNTIL
-	UPDATE USER USING
+	UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
+	UNTIL UPDATE USER USING
 
 	VACUUM VALID VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
 	VERBOSE VERSION_P VIEW VOLATILE
@@ -2353,6 +2353,7 @@ OptTemp:	TEMPORARY					{ $$ = RELPERSISTENCE_TEMP; }
 			| LOCAL TEMP				{ $$ = RELPERSISTENCE_TEMP; }
 			| GLOBAL TEMPORARY			{ $$ = RELPERSISTENCE_TEMP; }
 			| GLOBAL TEMP				{ $$ = RELPERSISTENCE_TEMP; }
+			| UNLOGGED					{ $$ = RELPERSISTENCE_UNLOGGED; }
 			| /*EMPTY*/					{ $$ = RELPERSISTENCE_PERMANENT; }
 		;
 
@@ -7839,6 +7840,11 @@ OptTempTableName:
 					$$ = $4;
 					$$->relpersistence = RELPERSISTENCE_TEMP;
 				}
+			| UNLOGGED opt_table qualified_name
+				{
+					$$ = $3;
+					$$->relpersistence = RELPERSISTENCE_UNLOGGED;
+				}
 			| TABLE qualified_name
 				{
 					$$ = $2;
@@ -11305,6 +11311,7 @@ unreserved_keyword:
 			| UNENCRYPTED
 			| UNKNOWN
 			| UNLISTEN
+			| UNLOGGED
 			| UNTIL
 			| UPDATE
 			| VACUUM
diff --git a/src/backend/storage/file/Makefile b/src/backend/storage/file/Makefile
index 3b93aa1..d2198f2 100644
--- a/src/backend/storage/file/Makefile
+++ b/src/backend/storage/file/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/storage/file
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = fd.o buffile.o copydir.o
+OBJS = fd.o buffile.o copydir.o reinit.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 4a10563..5af64d7 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -38,7 +38,6 @@
 #endif
 
 
-static void copy_file(char *fromfile, char *tofile);
 static void fsync_fname(char *fname, bool isdir);
 
 
@@ -142,7 +141,7 @@ copydir(char *fromdir, char *todir, bool recurse)
 /*
  * copy one file
  */
-static void
+void
 copy_file(char *fromfile, char *tofile)
 {
 	char	   *buffer;
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index fd5ec78..b218f70 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2054,7 +2054,7 @@ looks_like_temp_rel_name(const char *name)
 	/* We might have _forkname or .segment or both. */
 	if (name[pos] == '_')
 	{
-		int		forkchar = forkname_chars(&name[pos+1]);
+		int		forkchar = forkname_chars(&name[pos+1], NULL);
 		if (forkchar <= 0)
 			return false;
 		pos += forkchar + 1;
diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
new file mode 100644
index 0000000..b75178b
--- /dev/null
+++ b/src/backend/storage/file/reinit.c
@@ -0,0 +1,396 @@
+/*-------------------------------------------------------------------------
+ *
+ * reinit.c
+ *	  Reinitialization of unlogged relations
+ *
+ * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/file/reinit.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "catalog/catalog.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "storage/reinit.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+
+static void ResetUnloggedRelationsInTablespaceDir(const char *tsdirname,
+									 int op);
+static void ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname,
+								   int op);
+static bool parse_filename_for_nontemp_relation(const char *name,
+									int *oidchars, ForkNumber *fork);
+
+typedef struct {
+	char oid[OIDCHARS+1];
+} unlogged_relation_entry;
+
+/*
+ * Reset unlogged relations from before the last restart.
+ *
+ * If op includes UNLOGGED_RELATION_CLEANUP, we remove all forks of any
+ * relation with an "init" fork, except for the "init" fork itself.
+ *
+ * If op includes UNLOGGED_RELATION_INIT, we copy the "init" fork to the main
+ * fork.
+ */
+void
+ResetUnloggedRelations(int op)
+{
+	char		temp_path[MAXPGPATH];
+	DIR		   *spc_dir;
+	struct dirent *spc_de;
+	MemoryContext tmpctx, oldctx;
+
+	/* Log it. */
+	ereport(DEBUG1,
+			(errmsg("resetting unlogged relations: cleanup %d init %d",
+			 (op & UNLOGGED_RELATION_CLEANUP) != 0,
+			 (op & UNLOGGED_RELATION_INIT) != 0)));
+
+	/*
+	 * Just to be sure we don't leak any memory, let's create a temporary
+	 * memory context for this operation.
+	 */
+	tmpctx = AllocSetContextCreate(CurrentMemoryContext,
+								   "ResetUnloggedRelations",
+								   ALLOCSET_DEFAULT_MINSIZE,
+								   ALLOCSET_DEFAULT_INITSIZE,
+								   ALLOCSET_DEFAULT_MAXSIZE);
+	oldctx = MemoryContextSwitchTo(tmpctx);
+
+	/*
+	 * First process unlogged files in pg_default ($PGDATA/base)
+	 */
+	ResetUnloggedRelationsInTablespaceDir("base", op);
+
+	/*
+	 * Cycle through directories for all non-default tablespaces.
+	 */
+	spc_dir = AllocateDir("pg_tblspc");
+
+	while ((spc_de = ReadDir(spc_dir, "pg_tblspc")) != NULL)
+	{
+		if (strcmp(spc_de->d_name, ".") == 0 ||
+			strcmp(spc_de->d_name, "..") == 0)
+			continue;
+
+		snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s",
+			spc_de->d_name, TABLESPACE_VERSION_DIRECTORY);
+		ResetUnloggedRelationsInTablespaceDir(temp_path, op);
+	}
+
+	FreeDir(spc_dir);
+
+	/*
+	 * Restore memory context.
+	 */
+	MemoryContextSwitchTo(oldctx);
+	MemoryContextDelete(tmpctx);
+}
+
+/* Process one tablespace directory for ResetUnloggedRelations */
+static void
+ResetUnloggedRelationsInTablespaceDir(const char *tsdirname, int op)
+{
+	DIR		   *ts_dir;
+	struct dirent *de;
+	char		dbspace_path[MAXPGPATH];
+
+	ts_dir = AllocateDir(tsdirname);
+	if (ts_dir == NULL)
+	{
+		/* anything except ENOENT is fishy */
+		if (errno != ENOENT)
+			elog(LOG,
+				 "could not open tablespace directory \"%s\": %m",
+				 tsdirname);
+		return;
+	}
+
+	while ((de = ReadDir(ts_dir, tsdirname)) != NULL)
+	{
+		int		i = 0;
+
+		/*
+		 * We're only interested in the per-database directories, which have
+		 * numeric names.  Note that this code will also (properly) ignore "."
+		 * and "..".
+		 */
+		while (isdigit((unsigned char) de->d_name[i]))
+			++i;
+		if (de->d_name[i] != '\0' || i == 0)
+			continue;
+
+		snprintf(dbspace_path, sizeof(dbspace_path), "%s/%s",
+				 tsdirname, de->d_name);
+		ResetUnloggedRelationsInDbspaceDir(dbspace_path, op);
+	}
+
+	FreeDir(ts_dir);
+}
+
+/* Process one per-dbspace directory for ResetUnloggedRelations */
+static void
+ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
+{
+	DIR		   *dbspace_dir;
+	struct dirent *de;
+	char		rm_path[MAXPGPATH];
+
+	/* Caller must specify at least one operation. */
+	Assert((op & (UNLOGGED_RELATION_CLEANUP | UNLOGGED_RELATION_INIT)) != 0);
+
+	/*
+	 * Cleanup is a two-pass operation.  First, we go through and identify all
+	 * the files with init forks.  Then, we go through again and nuke
+	 * everything with the same OID except the init fork.
+	 */
+	if ((op & UNLOGGED_RELATION_CLEANUP) != 0)
+	{
+		HTAB	   *hash = NULL;
+		HASHCTL		ctl;
+
+		/* Open the directory. */
+		dbspace_dir = AllocateDir(dbspacedirname);
+		if (dbspace_dir == NULL)
+		{
+			elog(LOG,
+				 "could not open dbspace directory \"%s\": %m",
+				 dbspacedirname);
+			return;
+		}
+
+		/*
+		 * It's possible that someone could create a ton of unlogged relations
+		 * in the same database & tablespace, so we'd better use a hash table
+		 * rather than an array or linked list to keep track of which files
+		 * need to be reset.  Otherwise, this cleanup operation would be
+		 * O(n^2).
+		 */
+		ctl.keysize = sizeof(unlogged_relation_entry);
+		ctl.entrysize = sizeof(unlogged_relation_entry);
+		hash = hash_create("unlogged hash", 32, &ctl, HASH_ELEM);
+
+		/* Scan the directory. */
+		while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+		{
+			ForkNumber forkNum;
+			int		oidchars;
+			unlogged_relation_entry ent;
+
+			/* Skip anything that doesn't look like a relation data file. */
+			if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+													 &forkNum))
+				continue;
+
+			/* Also skip it unless this is the init fork. */
+			if (forkNum != INIT_FORKNUM)
+				continue;
+
+			/*
+			 * Put the OID portion of the name into the hash table, if it isn't
+			 * already.
+			 */
+			memset(ent.oid, 0, sizeof(ent.oid));
+			memcpy(ent.oid, de->d_name, oidchars);
+			hash_search(hash, &ent, HASH_ENTER, NULL);
+		}
+
+		/* Done with the first pass. */
+		FreeDir(dbspace_dir);
+
+		/*
+		 * If we didn't find any init forks, there's no point in continuing;
+		 * we can bail out now.
+		 */
+		if (hash_get_num_entries(hash) == 0)
+		{
+			hash_destroy(hash);
+			return;
+		}
+
+		/*
+		 * Now, make a second pass and remove anything that matches. First,
+		 * reopen the directory.
+		 */
+		dbspace_dir = AllocateDir(dbspacedirname);
+		if (dbspace_dir == NULL)
+		{
+			elog(LOG,
+				 "could not open dbspace directory \"%s\": %m",
+				 dbspacedirname);
+			hash_destroy(hash);
+			return;
+		}
+
+		/* Scan the directory. */
+		while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+		{
+			ForkNumber forkNum;
+			int		oidchars;
+			bool	found;
+			unlogged_relation_entry ent;
+
+			/* Skip anything that doesn't look like a relation data file. */
+			if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+													 &forkNum))
+				continue;
+
+			/* We never remove the init fork. */
+			if (forkNum == INIT_FORKNUM)
+				continue;
+
+			/*
+			 * See whether the OID portion of the name shows up in the hash
+			 * table.
+			 */
+			memset(ent.oid, 0, sizeof(ent.oid));
+			memcpy(ent.oid, de->d_name, oidchars);
+			hash_search(hash, &ent, HASH_FIND, &found);
+
+			/* If so, nuke it! */
+			if (found)
+			{
+				snprintf(rm_path, sizeof(rm_path), "%s/%s",
+					dbspacedirname, de->d_name);
+				/*
+				 * It's tempting to actually throw an error here, but since
+				 * this code gets run during database startup, that could
+				 * result in the database failing to start.  (XXX Should we do
+				 * it anyway?)
+				 */
+				if (unlink(rm_path))
+					elog(LOG, "could not unlink file \"%s\": %m", rm_path);
+				else
+					elog(DEBUG2, "unlinked file \"%s\"", rm_path);
+			}
+		}
+
+		/* Cleanup is complete. */
+		FreeDir(dbspace_dir);
+		hash_destroy(hash);
+	}
+
+	/*
+	 * Initialization happens after cleanup is complete: we copy each init
+	 * fork file to the corresponding main fork file.  Note that if we are
+	 * asked to do both cleanup and init, we may never get here: if the cleanup
+	 * code determines that there are no init forks in this dbspace, it will
+	 * return before we get to this point.
+	 */
+	if ((op & UNLOGGED_RELATION_INIT) != 0)
+	{
+		/* Open the directory. */
+		dbspace_dir = AllocateDir(dbspacedirname);
+		if (dbspace_dir == NULL)
+		{
+			/* we just saw this directory, so it really ought to be there */
+			elog(LOG,
+				 "could not open dbspace directory \"%s\": %m",
+				 dbspacedirname);
+			return;
+		}
+
+		/* Scan the directory. */
+		while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+		{
+			ForkNumber forkNum;
+			int		oidchars;
+			char	oidbuf[OIDCHARS+1];
+			char	srcpath[MAXPGPATH];
+			char	dstpath[MAXPGPATH];
+
+			/* Skip anything that doesn't look like a relation data file. */
+			if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+													 &forkNum))
+				continue;
+
+			/* Also skip it unless this is the init fork. */
+			if (forkNum != INIT_FORKNUM)
+				continue;
+
+			/* Construct source pathname. */
+			snprintf(srcpath, sizeof(srcpath), "%s/%s",
+					 dbspacedirname, de->d_name);
+
+			/* Construct destination pathname. */
+			memcpy(oidbuf, de->d_name, oidchars);
+			oidbuf[oidchars] = '\0';
+			snprintf(dstpath, sizeof(dstpath), "%s/%s%s",
+					 dbspacedirname, oidbuf, de->d_name + oidchars + 1 +
+					 strlen(forkNames[INIT_FORKNUM]));
+
+			/* OK, we're ready to perform the actual copy. */
+			elog(DEBUG2, "copying %s to %s", srcpath, dstpath);
+			copy_file(srcpath, dstpath);
+		}
+
+		/* Done with the init-fork scan. */
+		FreeDir(dbspace_dir);
+	}
+}
+
+/*
+ * Basic parsing of putative relation filenames.
+ *
+ * This function returns true if the file appears to be in the correct format
+ * for a non-temporary relation and false otherwise.
+ *
+ * NB: If this function returns true, the caller is entitled to assume that
+ * *oidchars has been set to a value no more than OIDCHARS, and thus
+ * that a buffer of OIDCHARS+1 characters is sufficient to hold the OID
+ * portion of the filename.  This is critical to protect against a possible
+ * buffer overrun.
+ */
+static bool
+parse_filename_for_nontemp_relation(const char *name, int *oidchars,
+									ForkNumber *fork)
+{
+	int			pos;
+
+	/* Look for a non-empty string of digits (that isn't too long). */
+	for (pos = 0; isdigit((unsigned char) name[pos]); ++pos)
+		;
+	if (pos == 0 || pos > OIDCHARS)
+		return false;
+	*oidchars = pos;
+
+	/* Check for a fork name. */
+	if (name[pos] != '_')
+		*fork = MAIN_FORKNUM;
+	else
+	{
+		int		forkchar;
+
+		forkchar = forkname_chars(&name[pos+1], fork);
+		if (forkchar <= 0)
+			return false;
+		pos += forkchar + 1;
+	}
+
+	/* Check for a segment number. */
+	if (name[pos] == '.')
+	{
+		int		segchar;
+		for (segchar = 1; isdigit((unsigned char) name[pos+segchar]); ++segchar)
+			;
+		if (segchar <= 1)
+			return false;
+		pos += segchar;
+	}
+
+	/* Now we should be at the end. */
+	if (name[pos] != '\0')
+		return false;
+	return true;
+}
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index e352cda..f33c29e 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -615,6 +615,7 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
 	/* Determine owning backend. */
 	switch (relform->relpersistence)
 	{
+		case RELPERSISTENCE_UNLOGGED:
 		case RELPERSISTENCE_PERMANENT:
 			backend = InvalidBackendId;
 			break;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 12b0f07..f3ebdde 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -858,6 +858,7 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 	relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
 	switch (relation->rd_rel->relpersistence)
 	{
+		case RELPERSISTENCE_UNLOGGED:
 		case RELPERSISTENCE_PERMANENT:
 			relation->rd_backend = InvalidBackendId;
 			break;
@@ -2564,6 +2565,7 @@ RelationBuildLocalRelation(const char *relname,
 	rel->rd_rel->relpersistence = relpersistence;
 	switch (relpersistence)
 	{
+		case RELPERSISTENCE_UNLOGGED:
 		case RELPERSISTENCE_PERMANENT:
 			rel->rd_backend = InvalidBackendId;
 			break;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 55ea684..30ca0b2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3447,6 +3447,7 @@ getTables(int *numTables)
 	int			i_relhasrules;
 	int			i_relhasoids;
 	int			i_relfrozenxid;
+	int			i_relpersistence;
 	int			i_owning_tab;
 	int			i_owning_col;
 	int			i_reltablespace;
@@ -3477,7 +3478,40 @@ getTables(int *numTables)
 	 * we cannot correctly identify inherited columns, owned sequences, etc.
 	 */
 
-	if (g_fout->remoteVersion >= 90000)
+	if (g_fout->remoteVersion >= 90100)
+	{
+		/*
+		 * Left join to pick up dependency info linking sequences to their
+		 * owning column, if any (note this dependency is AUTO as of 8.2)
+		 */
+		appendPQExpBuffer(query,
+						  "SELECT c.tableoid, c.oid, c.relname, "
+						  "c.relacl, c.relkind, c.relnamespace, "
+						  "(%s c.relowner) AS rolname, "
+						  "c.relchecks, c.relhastriggers, "
+						  "c.relhasindex, c.relhasrules, c.relhasoids, "
+						  "c.relfrozenxid, c.relpersistence, "
+						  "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
+						  "d.refobjid AS owning_tab, "
+						  "d.refobjsubid AS owning_col, "
+						  "(SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, "
+						"array_to_string(c.reloptions, ', ') AS reloptions, "
+						  "array_to_string(array(SELECT 'toast.' || x FROM unnest(tc.reloptions) x), ', ') AS toast_reloptions "
+						  "FROM pg_class c "
+						  "LEFT JOIN pg_depend d ON "
+						  "(c.relkind = '%c' AND "
+						  "d.classid = c.tableoid AND d.objid = c.oid AND "
+						  "d.objsubid = 0 AND "
+						  "d.refclassid = c.tableoid AND d.deptype = 'a') "
+					   "LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+						  "WHERE c.relkind in ('%c', '%c', '%c', '%c') "
+						  "ORDER BY c.oid",
+						  username_subquery,
+						  RELKIND_SEQUENCE,
+						  RELKIND_RELATION, RELKIND_SEQUENCE,
+						  RELKIND_VIEW, RELKIND_COMPOSITE_TYPE);
+	}
+	else if (g_fout->remoteVersion >= 90000)
 	{
 		/*
 		 * Left join to pick up dependency info linking sequences to their
@@ -3489,7 +3523,7 @@ getTables(int *numTables)
 						  "(%s c.relowner) AS rolname, "
 						  "c.relchecks, c.relhastriggers, "
 						  "c.relhasindex, c.relhasrules, c.relhasoids, "
-						  "c.relfrozenxid, "
+						  "c.relfrozenxid, 'p' AS relpersistence, "
 						  "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
 						  "d.refobjid AS owning_tab, "
 						  "d.refobjsubid AS owning_col, "
@@ -3522,7 +3556,7 @@ getTables(int *numTables)
 						  "(%s c.relowner) AS rolname, "
 						  "c.relchecks, c.relhastriggers, "
 						  "c.relhasindex, c.relhasrules, c.relhasoids, "
-						  "c.relfrozenxid, "
+						  "c.relfrozenxid, 'p' AS relpersistence, "
 						  "NULL AS reloftype, "
 						  "d.refobjid AS owning_tab, "
 						  "d.refobjsubid AS owning_col, "
@@ -3555,7 +3589,7 @@ getTables(int *numTables)
 						  "(%s relowner) AS rolname, "
 						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
 						  "relhasindex, relhasrules, relhasoids, "
-						  "relfrozenxid, "
+						  "relfrozenxid, 'p' AS relpersistence, "
 						  "NULL AS reloftype, "
 						  "d.refobjid AS owning_tab, "
 						  "d.refobjsubid AS owning_col, "
@@ -3587,7 +3621,7 @@ getTables(int *numTables)
 						  "(%s relowner) AS rolname, "
 						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
 						  "relhasindex, relhasrules, relhasoids, "
-						  "0 AS relfrozenxid, "
+						  "0 AS relfrozenxid, 'p' AS relpersistence, "
 						  "NULL AS reloftype, "
 						  "d.refobjid AS owning_tab, "
 						  "d.refobjsubid AS owning_col, "
@@ -3619,7 +3653,7 @@ getTables(int *numTables)
 						  "(%s relowner) AS rolname, "
 						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
 						  "relhasindex, relhasrules, relhasoids, "
-						  "0 AS relfrozenxid, "
+						  "0 AS relfrozenxid, 'p' AS relpersistence, "
 						  "NULL AS reloftype, "
 						  "d.refobjid AS owning_tab, "
 						  "d.refobjsubid AS owning_col, "
@@ -3647,7 +3681,7 @@ getTables(int *numTables)
 						  "(%s relowner) AS rolname, "
 						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
 						  "relhasindex, relhasrules, relhasoids, "
-						  "0 AS relfrozenxid, "
+						  "0 AS relfrozenxid, 'p' AS relpersistence, "
 						  "NULL AS reloftype, "
 						  "NULL::oid AS owning_tab, "
 						  "NULL::int4 AS owning_col, "
@@ -3670,7 +3704,7 @@ getTables(int *numTables)
 						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
 						  "relhasindex, relhasrules, "
 						  "'t'::bool AS relhasoids, "
-						  "0 AS relfrozenxid, "
+						  "0 AS relfrozenxid, 'p' AS relpersistence, "
 						  "NULL AS reloftype, "
 						  "NULL::oid AS owning_tab, "
 						  "NULL::int4 AS owning_col, "
@@ -3703,7 +3737,7 @@ getTables(int *numTables)
 						  "relchecks, (reltriggers <> 0) AS relhastriggers, "
 						  "relhasindex, relhasrules, "
 						  "'t'::bool AS relhasoids, "
-						  "0 as relfrozenxid, "
+						  "0 as relfrozenxid, 'p' AS relpersistence, "
 						  "NULL AS reloftype, "
 						  "NULL::oid AS owning_tab, "
 						  "NULL::int4 AS owning_col, "
@@ -3749,6 +3783,7 @@ getTables(int *numTables)
 	i_relhasrules = PQfnumber(res, "relhasrules");
 	i_relhasoids = PQfnumber(res, "relhasoids");
 	i_relfrozenxid = PQfnumber(res, "relfrozenxid");
+	i_relpersistence = PQfnumber(res, "relpersistence");
 	i_owning_tab = PQfnumber(res, "owning_tab");
 	i_owning_col = PQfnumber(res, "owning_col");
 	i_reltablespace = PQfnumber(res, "reltablespace");
@@ -3783,6 +3818,7 @@ getTables(int *numTables)
 		tblinfo[i].rolname = strdup(PQgetvalue(res, i, i_rolname));
 		tblinfo[i].relacl = strdup(PQgetvalue(res, i, i_relacl));
 		tblinfo[i].relkind = *(PQgetvalue(res, i, i_relkind));
+		tblinfo[i].relpersistence = *(PQgetvalue(res, i, i_relpersistence));
 		tblinfo[i].hasindex = (strcmp(PQgetvalue(res, i, i_relhasindex), "t") == 0);
 		tblinfo[i].hasrules = (strcmp(PQgetvalue(res, i, i_relhasrules), "t") == 0);
 		tblinfo[i].hastriggers = (strcmp(PQgetvalue(res, i, i_relhastriggers), "t") == 0);
@@ -10968,8 +11004,12 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 		if (binary_upgrade)
 			binary_upgrade_set_relfilenodes(q, tbinfo->dobj.catId.oid, false);
 
-		appendPQExpBuffer(q, "CREATE TABLE %s",
-						  fmtId(tbinfo->dobj.name));
+		if (tbinfo->relpersistence == RELPERSISTENCE_UNLOGGED)
+			appendPQExpBuffer(q, "CREATE UNLOGGED TABLE %s",
+							  fmtId(tbinfo->dobj.name));
+		else
+			appendPQExpBuffer(q, "CREATE TABLE %s",
+							  fmtId(tbinfo->dobj.name));
 		if (tbinfo->reloftype)
 			appendPQExpBuffer(q, " OF %s", tbinfo->reloftype);
 		actual_atts = 0;
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 7885535..4313fd8 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -220,6 +220,7 @@ typedef struct _tableInfo
 	char	   *rolname;		/* name of owner, or empty string */
 	char	   *relacl;
 	char		relkind;
+	char		relpersistence;	/* relation persistence */
 	char	   *reltablespace;	/* relation tablespace */
 	char	   *reloptions;		/* options specified by WITH (...) */
 	char	   *toast_reloptions;		/* ditto, for the TOAST table */
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index c4370a1..207d028 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1118,6 +1118,7 @@ describeOneTableDetails(const char *schemaname,
 		Oid			tablespace;
 		char	   *reloptions;
 		char	   *reloftype;
+		char		relpersistence;
 	}			tableinfo;
 	bool		show_modifiers = false;
 	bool		retval;
@@ -1138,6 +1139,23 @@ describeOneTableDetails(const char *schemaname,
 			  "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
 						  "c.relhastriggers, c.relhasoids, "
 						  "%s, c.reltablespace, "
+						  "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
+						  "c.relpersistence\n"
+						  "FROM pg_catalog.pg_class c\n "
+		   "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
+						  "WHERE c.oid = '%s'\n",
+						  (verbose ?
+						   "pg_catalog.array_to_string(c.reloptions || "
+						   "array(select 'toast.' || x from pg_catalog.unnest(tc.reloptions) x), ', ')\n"
+						   : "''"),
+						  oid);
+	}
+	else if (pset.sversion >= 90000)
+	{
+		printfPQExpBuffer(&buf,
+			  "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
+						  "c.relhastriggers, c.relhasoids, "
+						  "%s, c.reltablespace, "
 						  "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END\n"
 						  "FROM pg_catalog.pg_class c\n "
 		   "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
@@ -1218,6 +1236,8 @@ describeOneTableDetails(const char *schemaname,
 		atooid(PQgetvalue(res, 0, 7)) : 0;
 	tableinfo.reloftype = (pset.sversion >= 90000 && strcmp(PQgetvalue(res, 0, 8), "") != 0) ?
 		strdup(PQgetvalue(res, 0, 8)) : 0;
+	tableinfo.relpersistence = (pset.sversion >= 90100 && strcmp(PQgetvalue(res, 0, 9), "") != 0) ?
+		PQgetvalue(res, 0, 9)[0] : 0;
 	PQclear(res);
 	res = NULL;
 
@@ -1269,8 +1289,12 @@ describeOneTableDetails(const char *schemaname,
 	switch (tableinfo.relkind)
 	{
 		case 'r':
-			printfPQExpBuffer(&title, _("Table \"%s.%s\""),
-							  schemaname, relationname);
+			if (tableinfo.relpersistence == 'u')
+				printfPQExpBuffer(&title, _("Unlogged Table \"%s.%s\""),
+								  schemaname, relationname);
+			else
+				printfPQExpBuffer(&title, _("Table \"%s.%s\""),
+								  schemaname, relationname);
 			break;
 		case 'v':
 			printfPQExpBuffer(&title, _("View \"%s.%s\""),
@@ -1281,8 +1305,12 @@ describeOneTableDetails(const char *schemaname,
 							  schemaname, relationname);
 			break;
 		case 'i':
-			printfPQExpBuffer(&title, _("Index \"%s.%s\""),
-							  schemaname, relationname);
+			if (tableinfo.relpersistence == 'u')
+				printfPQExpBuffer(&title, _("Unlogged Index \"%s.%s\""),
+								  schemaname, relationname);
+			else
+				printfPQExpBuffer(&title, _("Index \"%s.%s\""),
+								  schemaname, relationname);
 			break;
 		case 's':
 			/* not used as of 8.2, but keep it for backwards compatibility */
diff --git a/src/include/access/gin.h b/src/include/access/gin.h
index e2d7b45..b1eef92 100644
--- a/src/include/access/gin.h
+++ b/src/include/access/gin.h
@@ -389,6 +389,7 @@ extern void ginUpdateStats(Relation index, const GinStatsData *stats);
 
 /* gininsert.c */
 extern Datum ginbuild(PG_FUNCTION_ARGS);
+extern Datum ginbuildempty(PG_FUNCTION_ARGS);
 extern Datum gininsert(PG_FUNCTION_ARGS);
 extern void ginEntryInsert(Relation index, GinState *ginstate,
 			   OffsetNumber attnum, Datum value,
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 34cc5d5..1853696 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -235,6 +235,7 @@ typedef struct
 
 /* gist.c */
 extern Datum gistbuild(PG_FUNCTION_ARGS);
+extern Datum gistbuildempty(PG_FUNCTION_ARGS);
 extern Datum gistinsert(PG_FUNCTION_ARGS);
 extern MemoryContext createTempGistContext(void);
 extern void initGISTstate(GISTSTATE *giststate, Relation index);
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index d5899f4..52d1c93 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -242,6 +242,7 @@ typedef HashMetaPageData *HashMetaPage;
 /* public routines */
 
 extern Datum hashbuild(PG_FUNCTION_ARGS);
+extern Datum hashbuildempty(PG_FUNCTION_ARGS);
 extern Datum hashinsert(PG_FUNCTION_ARGS);
 extern Datum hashbeginscan(PG_FUNCTION_ARGS);
 extern Datum hashgettuple(PG_FUNCTION_ARGS);
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 3bbc4d1..283612e 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -555,6 +555,7 @@ typedef BTScanOpaqueData *BTScanOpaque;
  * prototypes for functions in nbtree.c (external entry points for btree)
  */
 extern Datum btbuild(PG_FUNCTION_ARGS);
+extern Datum btbuildempty(PG_FUNCTION_ARGS);
 extern Datum btinsert(PG_FUNCTION_ARGS);
 extern Datum btbeginscan(PG_FUNCTION_ARGS);
 extern Datum btgettuple(PG_FUNCTION_ARGS);
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 56dcdd5..40cb9ff 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -25,7 +25,7 @@
 
 extern const char *forkNames[];
 extern ForkNumber forkname_to_number(char *forkName);
-extern int forkname_chars(const char *str);
+extern int forkname_chars(const char *str, ForkNumber *);
 
 extern char *relpathbackend(RelFileNode rnode, BackendId backend,
 			  ForkNumber forknum);
diff --git a/src/include/catalog/pg_am.h b/src/include/catalog/pg_am.h
index c9b8e2d..e4d2c39 100644
--- a/src/include/catalog/pg_am.h
+++ b/src/include/catalog/pg_am.h
@@ -59,6 +59,7 @@ CATALOG(pg_am,2601)
 	regproc		ammarkpos;		/* "mark current scan position" function */
 	regproc		amrestrpos;		/* "restore marked scan position" function */
 	regproc		ambuild;		/* "build new index" function */
+	regproc		ambuildempty;	/* "build empty index" function */
 	regproc		ambulkdelete;	/* bulk-delete function */
 	regproc		amvacuumcleanup;	/* post-VACUUM cleanup function */
 	regproc		amcostestimate; /* estimate cost of an indexscan */
@@ -76,7 +77,7 @@ typedef FormData_pg_am *Form_pg_am;
  *		compiler constants for pg_am
  * ----------------
  */
-#define Natts_pg_am						26
+#define Natts_pg_am						27
 #define Anum_pg_am_amname				1
 #define Anum_pg_am_amstrategies			2
 #define Anum_pg_am_amsupport			3
@@ -99,26 +100,27 @@ typedef FormData_pg_am *Form_pg_am;
 #define Anum_pg_am_ammarkpos			20
 #define Anum_pg_am_amrestrpos			21
 #define Anum_pg_am_ambuild				22
-#define Anum_pg_am_ambulkdelete			23
-#define Anum_pg_am_amvacuumcleanup		24
-#define Anum_pg_am_amcostestimate		25
-#define Anum_pg_am_amoptions			26
+#define Anum_pg_am_ambuildempty			23
+#define Anum_pg_am_ambulkdelete			24
+#define Anum_pg_am_amvacuumcleanup		25
+#define Anum_pg_am_amcostestimate		26
+#define Anum_pg_am_amoptions			27
 
 /* ----------------
  *		initial contents of pg_am
  * ----------------
  */
 
-DATA(insert OID = 403 (  btree	5 1 t t t t t t t f t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions ));
+DATA(insert OID = 403 (  btree	5 1 t t t t t t t f t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbuildempty btbulkdelete btvacuumcleanup btcostestimate btoptions ));
 DESCR("b-tree index access method");
 #define BTREE_AM_OID 403
-DATA(insert OID = 405 (  hash	1 1 f t f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
+DATA(insert OID = 405 (  hash	1 1 f t f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbuildempty hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
 DESCR("hash index access method");
 #define HASH_AM_OID 405
-DATA(insert OID = 783 (  gist	0 7 f f f t t t t t t 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
+DATA(insert OID = 783 (  gist	0 7 f f f t t t t t t 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbuildempty gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
 DESCR("GiST index access method");
 #define GIST_AM_OID 783
-DATA(insert OID = 2742 (  gin	0 5 f f f t t f f t f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
+DATA(insert OID = 2742 (  gin	0 5 f f f t t f f t f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbuildempty ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
 DESCR("GIN index access method");
 #define GIN_AM_OID 2742
 
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 1edbfe3..39f9743 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -150,6 +150,7 @@ DESCR("");
 #define		  RELKIND_COMPOSITE_TYPE  'c'		/* composite type */
 
 #define		  RELPERSISTENCE_PERMANENT	'p'
+#define		  RELPERSISTENCE_UNLOGGED	'u'
 #define		  RELPERSISTENCE_TEMP		't'
 
 #endif   /* PG_CLASS_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 8e5f502..e41d0b7 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -689,6 +689,8 @@ DATA(insert OID = 337 (  btrestrpos		   PGNSP PGUID 12 1 0 0 f f f t f v 1 0 227
 DESCR("btree(internal)");
 DATA(insert OID = 338 (  btbuild		   PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ btbuild _null_ _null_ _null_ ));
 DESCR("btree(internal)");
+DATA(insert OID = 328 (  btbuildempty	   PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ btbuildempty _null_ _null_ _null_ ));
+DESCR("btree(internal)");
 DATA(insert OID = 332 (  btbulkdelete	   PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ btbulkdelete _null_ _null_ _null_ ));
 DESCR("btree(internal)");
 DATA(insert OID = 972 (  btvacuumcleanup   PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ btvacuumcleanup _null_ _null_ _null_ ));
@@ -808,6 +810,8 @@ DATA(insert OID = 447 (  hashrestrpos	   PGNSP PGUID 12 1 0 0 f f f t f v 1 0 22
 DESCR("hash(internal)");
 DATA(insert OID = 448 (  hashbuild		   PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ hashbuild _null_ _null_ _null_ ));
 DESCR("hash(internal)");
+DATA(insert OID = 327 (  hashbuildempty	   PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ hashbuildempty _null_ _null_ _null_ ));
+DESCR("hash(internal)");
 DATA(insert OID = 442 (  hashbulkdelete    PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ hashbulkdelete _null_ _null_ _null_ ));
 DESCR("hash(internal)");
 DATA(insert OID = 425 (  hashvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ hashvacuumcleanup _null_ _null_ _null_ ));
@@ -1104,6 +1108,8 @@ DATA(insert OID = 781 (  gistrestrpos	   PGNSP PGUID 12 1 0 0 f f f t f v 1 0 22
 DESCR("gist(internal)");
 DATA(insert OID = 782 (  gistbuild		   PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ gistbuild _null_ _null_ _null_ ));
 DESCR("gist(internal)");
+DATA(insert OID = 326 (  gistbuildempty	   PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ gistbuildempty _null_ _null_ _null_ ));
+DESCR("gist(internal)");
 DATA(insert OID = 776 (  gistbulkdelete    PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ gistbulkdelete _null_ _null_ _null_ ));
 DESCR("gist(internal)");
 DATA(insert OID = 2561 (  gistvacuumcleanup   PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ gistvacuumcleanup _null_ _null_ _null_ ));
@@ -4339,6 +4345,8 @@ DATA(insert OID = 2737 (  ginrestrpos	   PGNSP PGUID 12 1 0 0 f f f t f v 1 0 22
 DESCR("gin(internal)");
 DATA(insert OID = 2738 (  ginbuild		   PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ ginbuild _null_ _null_ _null_ ));
 DESCR("gin(internal)");
+DATA(insert OID = 325 (  ginbuildempty	   PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ ginbuildempty _null_ _null_ _null_ ));
+DESCR("gin(internal)");
 DATA(insert OID = 2739 (  ginbulkdelete    PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ ginbulkdelete _null_ _null_ _null_ ));
 DESCR("gin(internal)");
 DATA(insert OID = 2740 (  ginvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ ginvacuumcleanup _null_ _null_ _null_ ));
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 2c44cf7..3b038a0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -388,6 +388,7 @@ PG_KEYWORD("union", UNION, RESERVED_KEYWORD)
 PG_KEYWORD("unique", UNIQUE, RESERVED_KEYWORD)
 PG_KEYWORD("unknown", UNKNOWN, UNRESERVED_KEYWORD)
 PG_KEYWORD("unlisten", UNLISTEN, UNRESERVED_KEYWORD)
+PG_KEYWORD("unlogged", UNLOGGED, UNRESERVED_KEYWORD)
 PG_KEYWORD("until", UNTIL, UNRESERVED_KEYWORD)
 PG_KEYWORD("update", UPDATE, UNRESERVED_KEYWORD)
 PG_KEYWORD("user", USER, RESERVED_KEYWORD)
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 62d15cc..ebf6855 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -203,7 +203,7 @@
  * Enable debugging print statements for WAL-related operations; see
  * also the wal_debug GUC var.
  */
-/* #define WAL_DEBUG */
+#define WAL_DEBUG
 
 /*
  * Enable tracing of resource consumption during sort operations;
diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index b24a98c..7c57724 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -14,5 +14,6 @@
 #define COPYDIR_H
 
 extern void copydir(char *fromdir, char *todir, bool recurse);
+extern void copy_file(char *fromfile, char *tofile);
 
 #endif   /* COPYDIR_H */
diff --git a/src/include/storage/reinit.h b/src/include/storage/reinit.h
new file mode 100644
index 0000000..9999dff
--- /dev/null
+++ b/src/include/storage/reinit.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * reinit.h
+ *	  Reinitialization of unlogged relations
+ *
+ *
+ * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/reinit.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef REINIT_H
+#define REINIT_H
+
+extern void ResetUnloggedRelations(int op);
+
+#define UNLOGGED_RELATION_CLEANUP		0x0001
+#define UNLOGGED_RELATION_INIT			0x0002
+
+#endif   /* REINIT_H */
diff --git a/src/include/storage/relfilenode.h b/src/include/storage/relfilenode.h
index 24a72e6..f71b233 100644
--- a/src/include/storage/relfilenode.h
+++ b/src/include/storage/relfilenode.h
@@ -27,7 +27,8 @@ typedef enum ForkNumber
 	InvalidForkNumber = -1,
 	MAIN_FORKNUM = 0,
 	FSM_FORKNUM,
-	VISIBILITYMAP_FORKNUM
+	VISIBILITYMAP_FORKNUM,
+	INIT_FORKNUM	
 
 	/*
 	 * NOTE: if you add a new fork, change MAX_FORKNUM below and update the
@@ -35,7 +36,7 @@ typedef enum ForkNumber
 	 */
 } ForkNumber;
 
-#define MAX_FORKNUM		VISIBILITYMAP_FORKNUM
+#define MAX_FORKNUM		INIT_FORKNUM
 
 /*
  * RelFileNode must provide all that we need to know to physically access
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 8474d8f..d952d6b 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -114,6 +114,7 @@ typedef struct RelationAmInfo
 	FmgrInfo	ammarkpos;
 	FmgrInfo	amrestrpos;
 	FmgrInfo	ambuild;
+	FmgrInfo	ambuildempty;
 	FmgrInfo	ambulkdelete;
 	FmgrInfo	amvacuumcleanup;
 	FmgrInfo	amcostestimate;
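
A note on the piece above before the second patch: the new reinit.h header
declares the startup-time hook for unlogged relations, and INIT_FORKNUM plus
the ambuildempty entry point exist to serve it.  Below is a minimal sketch of
the intended crash-recovery flow, assuming the real call sites are in the
xlog.c changes not quoted here; reset_unlogged_relations_at_startup is a
hypothetical wrapper used purely for illustration:

#include "postgres.h"

#include "storage/reinit.h"

static void
reset_unlogged_relations_at_startup(bool crash_recovery_needed)
{
	if (!crash_recovery_needed)
		return;

	/*
	 * Before WAL replay: discard leftover main-fork contents of unlogged
	 * relations; they were never WAL-protected and may be garbage after a
	 * crash.
	 */
	ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);

	/* ... WAL replay runs at this point ... */

	/*
	 * After replay: recreate each main fork from its _init fork, written at
	 * CREATE time (ambuildempty builds the empty init fork for indexes), so
	 * every unlogged relation comes back empty but structurally valid.
	 */
	ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
}
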
relpersistence-v2.patch (application/octet-stream)
commit edb91e48410f6a70ce294a5fff4a4033576d8aee
Author: Robert Haas <rhaas@postgresql.org>
Date:   Mon Aug 16 21:02:11 2010 -0400

    Generalize concept of temporary relations to "relation persistence".
    
    This commit replaces pg_class.relistemp with pg_class.relpersistence,
    and also modifies the RangeVar node type to carry relpersistence rather
    than istemp.  It also removes rd_istemp from RelationData and instead
    performs the correct computation based on relpersistence.
    
    For clarity, we add three new macros: RelationNeedsWAL(),
    RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), which make
    explicit the purpose of each check that previously depended on
    rd_istemp.
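
A note for readers of the hunks below: the macros themselves are defined in
src/include/utils/rel.h further down in this patch.  Judging only from the
call sites -- this is a sketch, not text quoted from the patch -- they
plausibly expand to something like:

/* Sketch inferred from the call sites; not copied from the patch. */
#define RelationNeedsWAL(relation) \
	((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)

#define RelationUsesLocalBuffers(relation) \
	((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP)

#define RelationUsesTempNamespace(relation) \
	((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP)

With that reading, each check that formerly tested rel->rd_istemp now names
the property it actually depends on: WAL-logging, buffer placement, or
temp-namespace membership.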

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 070cd92..9d857a0 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -304,7 +304,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
 
 			MarkBufferDirty(stack->buffer);
 
-			if (!btree->index->rd_istemp)
+			if (RelationNeedsWAL(btree->index))
 			{
 				XLogRecPtr	recptr;
 
@@ -373,7 +373,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
 				MarkBufferDirty(lbuffer);
 				MarkBufferDirty(stack->buffer);
 
-				if (!btree->index->rd_istemp)
+				if (RelationNeedsWAL(btree->index))
 				{
 					XLogRecPtr	recptr;
 
@@ -422,7 +422,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
 				MarkBufferDirty(rbuffer);
 				MarkBufferDirty(stack->buffer);
 
-				if (!btree->index->rd_istemp)
+				if (RelationNeedsWAL(btree->index))
 				{
 					XLogRecPtr	recptr;
 
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 525f79c..74339c9 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -103,7 +103,7 @@ writeListPage(Relation index, Buffer buffer,
 
 	MarkBufferDirty(buffer);
 
-	if (!index->rd_istemp)
+	if (RelationNeedsWAL(index))
 	{
 		XLogRecData rdata[2];
 		ginxlogInsertListPage data;
@@ -384,7 +384,7 @@ ginHeapTupleFastInsert(Relation index, GinState *ginstate,
 	 */
 	MarkBufferDirty(metabuffer);
 
-	if (!index->rd_istemp)
+	if (RelationNeedsWAL(index))
 	{
 		XLogRecPtr	recptr;
 
@@ -564,7 +564,7 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
 			MarkBufferDirty(buffers[i]);
 		}
 
-		if (!index->rd_istemp)
+		if (RelationNeedsWAL(index))
 		{
 			XLogRecPtr	recptr;
 
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index fa70e4f..8681ede 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -55,7 +55,7 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems)
 
 	MarkBufferDirty(buffer);
 
-	if (!index->rd_istemp)
+	if (RelationNeedsWAL(index))
 	{
 		XLogRecPtr	recptr;
 		XLogRecData rdata[2];
@@ -325,7 +325,7 @@ ginbuild(PG_FUNCTION_ARGS)
 	GinInitBuffer(RootBuffer, GIN_LEAF);
 	MarkBufferDirty(RootBuffer);
 
-	if (!index->rd_istemp)
+	if (RelationNeedsWAL(index))
 	{
 		XLogRecPtr	recptr;
 		XLogRecData rdata;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 27326ac..5f20ac9 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -410,7 +410,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
 
 	MarkBufferDirty(metabuffer);
 
-	if (!index->rd_istemp)
+	if (RelationNeedsWAL(index))
 	{
 		XLogRecPtr			recptr;
 		ginxlogUpdateMeta	data;
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index 7dfecff..4b35acb 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -93,7 +93,7 @@ xlogVacuumPage(Relation index, Buffer buffer)
 
 	Assert(GinPageIsLeaf(page));
 
-	if (index->rd_istemp)
+	if (!RelationNeedsWAL(index))
 		return;
 
 	data.node = index->rd_node;
@@ -308,7 +308,7 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
 		MarkBufferDirty(lBuffer);
 	MarkBufferDirty(dBuffer);
 
-	if (!gvs->index->rd_istemp)
+	if (RelationNeedsWAL(gvs->index))
 	{
 		XLogRecPtr	recptr;
 		XLogRecData rdata[4];
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 3054f98..a7dc2a5 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -117,7 +117,7 @@ gistbuild(PG_FUNCTION_ARGS)
 
 	MarkBufferDirty(buffer);
 
-	if (!index->rd_istemp)
+	if (RelationNeedsWAL(index))
 	{
 		XLogRecPtr	recptr;
 		XLogRecData rdata;
@@ -403,7 +403,7 @@ gistplacetopage(GISTInsertState *state, GISTSTATE *giststate)
 			dist->page = BufferGetPage(dist->buffer);
 		}
 
-		if (!state->r->rd_istemp)
+		if (RelationNeedsWAL(state->r))
 		{
 			XLogRecPtr	recptr;
 			XLogRecData *rdata;
@@ -467,7 +467,7 @@ gistplacetopage(GISTInsertState *state, GISTSTATE *giststate)
 
 		MarkBufferDirty(state->stack->buffer);
 
-		if (!state->r->rd_istemp)
+		if (RelationNeedsWAL(state->r))
 		{
 			OffsetNumber noffs = 0,
 						offs[1];
@@ -552,7 +552,7 @@ gistfindleaf(GISTInsertState *state, GISTSTATE *giststate)
 		opaque = GistPageGetOpaque(state->stack->page);
 
 		state->stack->lsn = PageGetLSN(state->stack->page);
-		Assert(state->r->rd_istemp || !XLogRecPtrIsInvalid(state->stack->lsn));
+		Assert(!RelationNeedsWAL(state->r) || !XLogRecPtrIsInvalid(state->stack->lsn));
 
 		if (state->stack->blkno != GIST_ROOT_BLKNO &&
 			XLByteLT(state->stack->parent->lsn, opaque->nsn))
@@ -913,7 +913,7 @@ gistmakedeal(GISTInsertState *state, GISTSTATE *giststate)
 	}
 
 	/* say to xlog that insert is completed */
-	if (state->needInsertComplete && !state->r->rd_istemp)
+	if (state->needInsertComplete && RelationNeedsWAL(state->r))
 		gistxlogInsertCompletion(state->r->rd_node, &(state->key), 1);
 }
 
@@ -1013,7 +1013,7 @@ gistnewroot(Relation r, Buffer buffer, IndexTuple *itup, int len, ItemPointer ke
 
 	MarkBufferDirty(buffer);
 
-	if (!r->rd_istemp)
+	if (RelationNeedsWAL(r))
 	{
 		XLogRecPtr	recptr;
 		XLogRecData *rdata;
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index 0ff5ba8..26bdb20 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -248,7 +248,7 @@ gistbulkdelete(PG_FUNCTION_ARGS)
 					PageIndexTupleDelete(page, todelete[i]);
 				GistMarkTuplesDeleted(page);
 
-				if (!rel->rd_istemp)
+				if (RelationNeedsWAL(rel))
 				{
 					XLogRecData *rdata;
 					XLogRecPtr	recptr;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8b064bc..8f368a2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -124,7 +124,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool is_rescan)
 	 *
 	 * During a rescan, don't make a new strategy object if we don't have to.
 	 */
-	if (!scan->rs_rd->rd_istemp &&
+	if (!RelationUsesLocalBuffers(scan->rs_rd) &&
 		scan->rs_nblocks > NBuffers / 4)
 	{
 		allow_strat = scan->rs_allow_strat;
@@ -905,7 +905,7 @@ relation_open(Oid relationId, LOCKMODE lockmode)
 		elog(ERROR, "could not open relation with OID %u", relationId);
 
 	/* Make note that we've accessed a temporary relation */
-	if (r->rd_istemp)
+	if (RelationUsesLocalBuffers(r))
 		MyXactAccessedTempRel = true;
 
 	pgstat_initstats(r);
@@ -951,7 +951,7 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)
 		elog(ERROR, "could not open relation with OID %u", relationId);
 
 	/* Make note that we've accessed a temporary relation */
-	if (r->rd_istemp)
+	if (RelationUsesLocalBuffers(r))
 		MyXactAccessedTempRel = true;
 
 	pgstat_initstats(r);
@@ -1917,7 +1917,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
-	if (!(options & HEAP_INSERT_SKIP_WAL) && !relation->rd_istemp)
+	if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))
 	{
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
@@ -2227,7 +2227,7 @@ l1:
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
-	if (!relation->rd_istemp)
+	if (RelationNeedsWAL(relation))
 	{
 		xl_heap_delete xlrec;
 		XLogRecPtr	recptr;
@@ -2780,7 +2780,7 @@ l2:
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
-	if (!relation->rd_istemp)
+	if (RelationNeedsWAL(relation))
 	{
 		XLogRecPtr	recptr = log_heap_update(relation, buffer, oldtup.t_self,
 											 newbuf, heaptup,
@@ -3403,7 +3403,7 @@ l3:
 	 * (Also, in a PITR log-shipping or 2PC environment, we have to have XLOG
 	 * entries for everything anyway.)
 	 */
-	if (!relation->rd_istemp)
+	if (RelationNeedsWAL(relation))
 	{
 		xl_heap_lock xlrec;
 		XLogRecPtr	recptr;
@@ -3505,7 +3505,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
-	if (!relation->rd_istemp)
+	if (RelationNeedsWAL(relation))
 	{
 		xl_heap_inplace xlrec;
 		XLogRecPtr	recptr;
@@ -3852,8 +3852,8 @@ log_heap_clean(Relation reln, Buffer buffer,
 	XLogRecPtr	recptr;
 	XLogRecData rdata[4];
 
-	/* Caller should not call me on a temp relation */
-	Assert(!reln->rd_istemp);
+	/* Caller should not call me on a non-WAL-logged relation */
+	Assert(RelationNeedsWAL(reln));
 
 	xlrec.node = reln->rd_node;
 	xlrec.block = BufferGetBlockNumber(buffer);
@@ -3935,8 +3935,8 @@ log_heap_freeze(Relation reln, Buffer buffer,
 	XLogRecPtr	recptr;
 	XLogRecData rdata[2];
 
-	/* Caller should not call me on a temp relation */
-	Assert(!reln->rd_istemp);
+	/* Caller should not call me on a non-WAL-logged relation */
+	Assert(RelationNeedsWAL(reln));
 	/* nor when there are no tuples to freeze */
 	Assert(offcnt > 0);
 
@@ -3981,8 +3981,8 @@ log_heap_update(Relation reln, Buffer oldbuf, ItemPointerData from,
 	XLogRecData rdata[4];
 	Page		page = BufferGetPage(newbuf);
 
-	/* Caller should not call me on a temp relation */
-	Assert(!reln->rd_istemp);
+	/* Caller should not call me on a non-WAL-logged relation */
+	Assert(RelationNeedsWAL(reln));
 
 	if (HeapTupleIsHeapOnly(newtup))
 		info = XLOG_HEAP_HOT_UPDATE;
@@ -4982,7 +4982,7 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
  *	heap_sync		- sync a heap, for use when no WAL has been written
  *
  * This forces the heap contents (including TOAST heap if any) down to disk.
- * If we skipped using WAL, and it's not a temp relation, we must force the
+ * If we skipped using WAL, and WAL is otherwise needed, we must force the
  * relation down to disk before it's safe to commit the transaction.  This
  * requires writing out any dirty buffers and then doing a forced fsync.
  *
@@ -4995,8 +4995,8 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
 void
 heap_sync(Relation rel)
 {
-	/* temp tables never need fsync */
-	if (rel->rd_istemp)
+	/* non-WAL-logged tables never need fsync */
+	if (!RelationNeedsWAL(rel))
 		return;
 
 	/* main heap */
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b8c4027..40eadb8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -233,7 +233,7 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
 		/*
 		 * Emit a WAL HEAP_CLEAN record showing what we did
 		 */
-		if (!relation->rd_istemp)
+		if (RelationNeedsWAL(relation))
 		{
 			XLogRecPtr	recptr;
 
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 19ca302..eb2dbff 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -277,8 +277,8 @@ end_heap_rewrite(RewriteState state)
 	}
 
 	/*
-	 * If the rel isn't temp, must fsync before commit.  We use heap_sync to
-	 * ensure that the toast table gets fsync'd too.
+	 * If the rel is WAL-logged, must fsync before commit.  We use heap_sync
+	 * to ensure that the toast table gets fsync'd too.
 	 *
 	 * It's obvious that we must do this when not WAL-logging. It's less
 	 * obvious that we have to do it even if we did WAL-log the pages. The
@@ -287,7 +287,7 @@ end_heap_rewrite(RewriteState state)
 	 * occurring during the rewriteheap operation won't have fsync'd data we
 	 * wrote before the checkpoint.
 	 */
-	if (!state->rs_new_rel->rd_istemp)
+	if (RelationNeedsWAL(state->rs_new_rel))
 		heap_sync(state->rs_new_rel);
 
 	/* Deleting the context frees everything */
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index eaad812..ee0f04c 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -766,7 +766,7 @@ _bt_insertonpg(Relation rel,
 		}
 
 		/* XLOG stuff */
-		if (!rel->rd_istemp)
+		if (RelationNeedsWAL(rel))
 		{
 			xl_btree_insert xlrec;
 			BlockNumber xldownlink;
@@ -1165,7 +1165,7 @@ _bt_split(Relation rel, Buffer buf, OffsetNumber firstright,
 	}
 
 	/* XLOG stuff */
-	if (!rel->rd_istemp)
+	if (RelationNeedsWAL(rel))
 	{
 		xl_btree_split xlrec;
 		uint8		xlinfo;
@@ -1914,7 +1914,7 @@ _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf)
 	MarkBufferDirty(metabuf);
 
 	/* XLOG stuff */
-	if (!rel->rd_istemp)
+	if (RelationNeedsWAL(rel))
 	{
 		xl_btree_newroot xlrec;
 		XLogRecPtr	recptr;
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index e0c0f21..2b44780 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -224,7 +224,7 @@ _bt_getroot(Relation rel, int access)
 		MarkBufferDirty(metabuf);
 
 		/* XLOG stuff */
-		if (!rel->rd_istemp)
+		if (RelationNeedsWAL(rel))
 		{
 			xl_btree_newroot xlrec;
 			XLogRecPtr	recptr;
@@ -452,7 +452,7 @@ _bt_checkpage(Relation rel, Buffer buf)
 static void
 _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedXid)
 {
-	if (rel->rd_istemp)
+	if (!RelationNeedsWAL(rel))
 		return;
 
 	/* No ereport(ERROR) until changes are logged */
@@ -751,7 +751,7 @@ _bt_delitems_vacuum(Relation rel, Buffer buf,
 	MarkBufferDirty(buf);
 
 	/* XLOG stuff */
-	if (!rel->rd_istemp)
+	if (RelationNeedsWAL(rel))
 	{
 		XLogRecPtr	recptr;
 		XLogRecData rdata[2];
@@ -829,7 +829,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
 	MarkBufferDirty(buf);
 
 	/* XLOG stuff */
-	if (!rel->rd_istemp)
+	if (RelationNeedsWAL(rel))
 	{
 		XLogRecPtr	recptr;
 		XLogRecData rdata[3];
@@ -1365,7 +1365,7 @@ _bt_pagedel(Relation rel, Buffer buf, BTStack stack)
 		MarkBufferDirty(lbuf);
 
 	/* XLOG stuff */
-	if (!rel->rd_istemp)
+	if (RelationNeedsWAL(rel))
 	{
 		xl_btree_delete_page xlrec;
 		xl_btree_metadata xlmeta;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index a1d3aef..3fb43a2 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -211,9 +211,9 @@ _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2)
 
 	/*
 	 * We need to log index creation in WAL iff WAL archiving/streaming is
-	 * enabled AND it's not a temp index.
+	 * enabled AND the index itself is WAL-logged.
 	 */
-	wstate.btws_use_wal = XLogIsNeeded() && !wstate.index->rd_istemp;
+	wstate.btws_use_wal = XLogIsNeeded() && RelationNeedsWAL(wstate.index);
 
 	/* reserve the metapage */
 	wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
@@ -797,9 +797,9 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
 	_bt_uppershutdown(wstate, state);
 
 	/*
-	 * If the index isn't temp, we must fsync it down to disk before it's safe
-	 * to commit the transaction.  (For a temp index we don't care since the
-	 * index will be uninteresting after a crash anyway.)
+	 * If the index is WAL-logged, we must fsync it down to disk before it's
+	 * safe to commit the transaction.  (For a non-WAL-logged index we don't
+	 * care since the index will be uninteresting after a crash anyway.)
 	 *
 	 * It's obvious that we must do this when not WAL-logging the build. It's
 	 * less obvious that we have to do it even if we did WAL-log the index
@@ -811,7 +811,7 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
 	 * fsync those pages here, they might still not be on disk when the crash
 	 * occurs.
 	 */
-	if (!wstate->index->rd_istemp)
+	if (RelationNeedsWAL(wstate->index))
 	{
 		RelationOpenSmgr(wstate->index);
 		smgrimmedsync(wstate->index->rd_smgr, MAIN_FORKNUM);
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index e475403..73ef114 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -219,6 +219,7 @@ Boot_CreateStmt:
 												   $3,
 												   tupdesc,
 												   RELKIND_RELATION,
+												   RELPERSISTENCE_PERMANENT,
 												   shared_relation,
 												   mapped_relation,
 												   true);
@@ -238,6 +239,7 @@ Boot_CreateStmt:
 													  tupdesc,
 													  NIL,
 													  RELKIND_RELATION,
+													  RELPERSISTENCE_PERMANENT,
 													  shared_relation,
 													  mapped_relation,
 													  true,
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6322512..88b5c2a 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -524,12 +524,26 @@ GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
  * created by bootstrap have preassigned OIDs, so there's no need.
  */
 Oid
-GetNewRelFileNode(Oid reltablespace, Relation pg_class, BackendId backend)
+GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
 {
 	RelFileNodeBackend rnode;
 	char	   *rpath;
 	int			fd;
 	bool		collides;
+	BackendId	backend;
+
+	switch (relpersistence)
+	{
+		case RELPERSISTENCE_TEMP:
+			backend = MyBackendId;
+			break;
+		case RELPERSISTENCE_PERMANENT:
+			backend = InvalidBackendId;
+			break;
+		default:
+			elog(ERROR, "invalid relpersistence: %c", relpersistence);
+			return InvalidOid;	/* placate compiler */
+	}
 
 	/* This logic should match RelationInitPhysicalAddr */
 	rnode.node.spcNode = reltablespace ? reltablespace : MyDatabaseTableSpace;
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index dcc53e1..cda9000 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -237,6 +237,7 @@ heap_create(const char *relname,
 			Oid relid,
 			TupleDesc tupDesc,
 			char relkind,
+			char relpersistence,
 			bool shared_relation,
 			bool mapped_relation,
 			bool allow_system_table_mods)
@@ -310,7 +311,8 @@ heap_create(const char *relname,
 									 relid,
 									 reltablespace,
 									 shared_relation,
-									 mapped_relation);
+									 mapped_relation,
+									 relpersistence);
 
 	/*
 	 * Have the storage manager create the relation's disk file, if needed.
@@ -321,7 +323,7 @@ heap_create(const char *relname,
 	if (create_storage)
 	{
 		RelationOpenSmgr(rel);
-		RelationCreateStorage(rel->rd_node, rel->rd_istemp);
+		RelationCreateStorage(rel->rd_node, relpersistence);
 	}
 
 	return rel;
@@ -692,7 +694,7 @@ InsertPgClassTuple(Relation pg_class_desc,
 	values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
 	values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
 	values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
-	values[Anum_pg_class_relistemp - 1] = BoolGetDatum(rd_rel->relistemp);
+	values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
 	values[Anum_pg_class_relkind - 1] = CharGetDatum(rd_rel->relkind);
 	values[Anum_pg_class_relnatts - 1] = Int16GetDatum(rd_rel->relnatts);
 	values[Anum_pg_class_relchecks - 1] = Int16GetDatum(rd_rel->relchecks);
@@ -897,6 +899,7 @@ heap_create_with_catalog(const char *relname,
 						 TupleDesc tupdesc,
 						 List *cooked_constraints,
 						 char relkind,
+						 char relpersistence,
 						 bool shared_relation,
 						 bool mapped_relation,
 						 bool oidislocal,
@@ -996,8 +999,7 @@ heap_create_with_catalog(const char *relname,
 		}
 		else
 			relid = GetNewRelFileNode(reltablespace, pg_class_desc,
-									  isTempOrToastNamespace(relnamespace) ?
-										  MyBackendId : InvalidBackendId);
+									  relpersistence);
 	}
 
 	/*
@@ -1035,6 +1037,7 @@ heap_create_with_catalog(const char *relname,
 							   relid,
 							   tupdesc,
 							   relkind,
+							   relpersistence,
 							   shared_relation,
 							   mapped_relation,
 							   allow_system_table_mods);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b437c99..8fbe8eb 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -545,6 +545,7 @@ index_create(Oid heapRelationId,
 	bool		is_exclusion;
 	Oid			namespaceId;
 	int			i;
+	char		relpersistence;
 
 	is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
 
@@ -561,11 +562,13 @@ index_create(Oid heapRelationId,
 	/*
 	 * The index will be in the same namespace as its parent table, and is
 	 * shared across databases if and only if the parent is.  Likewise, it
-	 * will use the relfilenode map if and only if the parent does.
+	 * will use the relfilenode map if and only if the parent does; and it
+	 * inherits the parent's relpersistence.
 	 */
 	namespaceId = RelationGetNamespace(heapRelation);
 	shared_relation = heapRelation->rd_rel->relisshared;
 	mapped_relation = RelationIsMapped(heapRelation);
+	relpersistence = heapRelation->rd_rel->relpersistence;
 
 	/*
 	 * check parameters
@@ -646,9 +649,7 @@ index_create(Oid heapRelationId,
 		else
 		{
 			indexRelationId =
-				GetNewRelFileNode(tableSpaceId, pg_class,
-								  heapRelation->rd_istemp ?
-									MyBackendId : InvalidBackendId);
+				GetNewRelFileNode(tableSpaceId, pg_class, relpersistence);
 		}
 	}
 
@@ -663,6 +664,7 @@ index_create(Oid heapRelationId,
 								indexRelationId,
 								indexTupDesc,
 								RELKIND_INDEX,
+								relpersistence,
 								shared_relation,
 								mapped_relation,
 								allow_system_table_mods);
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 3727146..aa37097 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -235,14 +235,14 @@ RangeVarGetRelid(const RangeVar *relation, bool failOK)
 	}
 
 	/*
-	 * If istemp is set, this is a reference to a temp relation.  The parser
-	 * never generates such a RangeVar in simple DML, but it can happen in
-	 * contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY KEY)".  Such a
-	 * command will generate an added CREATE INDEX operation, which must be
+	 * Some non-default relpersistence value may have been specified.  The
+	 * parser never generates such a RangeVar in simple DML, but it can happen
+	 * in contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY KEY)".  Such
+	 * a command will generate an added CREATE INDEX operation, which must be
 	 * careful to find the temp table, even when pg_temp is not first in the
 	 * search path.
 	 */
-	if (relation->istemp)
+	if (relation->relpersistence == RELPERSISTENCE_TEMP)
 	{
 		if (relation->schemaname)
 			ereport(ERROR,
@@ -308,7 +308,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 							newRelation->relname)));
 	}
 
-	if (newRelation->istemp)
+	if (newRelation->relpersistence == RELPERSISTENCE_TEMP)
 	{
 		/* TEMP tables are created in our backend-local temp namespace */
 		if (newRelation->schemaname)
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 0ce2051..671aaff 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -95,19 +95,35 @@ typedef struct xl_smgr_truncate
  * transaction aborts later on, the storage will be destroyed.
  */
 void
-RelationCreateStorage(RelFileNode rnode, bool istemp)
+RelationCreateStorage(RelFileNode rnode, char relpersistence)
 {
 	PendingRelDelete *pending;
 	XLogRecPtr	lsn;
 	XLogRecData rdata;
 	xl_smgr_create xlrec;
 	SMgrRelation srel;
-	BackendId	backend = istemp ? MyBackendId : InvalidBackendId;
+	BackendId	backend;
+	bool		needs_wal;
+
+	switch (relpersistence)
+	{
+		case RELPERSISTENCE_TEMP:
+			backend = MyBackendId;
+			needs_wal = false;
+			break;
+		case RELPERSISTENCE_PERMANENT:
+			backend = InvalidBackendId;
+			needs_wal = true;
+			break;
+		default:
+			elog(ERROR, "invalid relpersistence: %c", relpersistence);
+			return;			/* placate compiler */
+	}
 
 	srel = smgropen(rnode, backend);
 	smgrcreate(srel, MAIN_FORKNUM, false);
 
-	if (!istemp)
+	if (needs_wal)
 	{
 		/*
 		 * Make an XLOG entry reporting the file creation.
@@ -253,7 +269,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	 * failure to truncate, that might spell trouble at WAL replay, into a
 	 * certain PANIC.
 	 */
-	if (!rel->rd_istemp)
+	if (RelationNeedsWAL(rel))
 	{
 		/*
 		 * Make an XLOG entry reporting the file truncation.
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7bf64e2..d1f6c9f 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -195,7 +195,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
 	 * Toast tables for regular relations go in pg_toast; those for temp
 	 * relations go into the per-backend temp-toast-table namespace.
 	 */
-	if (rel->rd_backend == MyBackendId)
+	if (RelationUsesTempNamespace(rel))
 		namespaceid = GetTempToastNamespace();
 	else
 		namespaceid = PG_TOAST_NAMESPACE;
@@ -216,6 +216,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
 										   tupdesc,
 										   NIL,
 										   RELKIND_TOASTVALUE,
+										   rel->rd_rel->relpersistence,
 										   shared_relation,
 										   mapped_relation,
 										   true,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index bb7cd74..9fdc471 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -675,6 +675,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace)
 										  tupdesc,
 										  NIL,
 										  OldHeap->rd_rel->relkind,
+										  OldHeap->rd_rel->relpersistence,
 										  false,
 										  RelationIsMapped(OldHeap),
 										  true,
@@ -789,9 +790,9 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 
 	/*
 	 * We need to log the copied data in WAL iff WAL archiving/streaming is
-	 * enabled AND it's not a temp rel.
+	 * enabled AND it's a WAL-logged rel.
 	 */
-	use_wal = XLogIsNeeded() && !NewHeap->rd_istemp;
+	use_wal = XLogIsNeeded() && RelationNeedsWAL(NewHeap);
 
 	/* use_wal off requires smgr_targblock be initially invalid */
 	Assert(RelationGetTargetBlock(NewHeap) == InvalidBlockNumber);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 9407d0f..0940893 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -222,7 +222,7 @@ DefineIndex(RangeVar *heapRelation,
 	}
 	else
 	{
-		tablespaceId = GetDefaultTablespace(rel->rd_istemp);
+		tablespaceId = GetDefaultTablespace(rel->rd_rel->relpersistence);
 		/* note InvalidOid is OK in this case */
 	}
 
@@ -1706,7 +1706,7 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
 			continue;
 
 		/* Skip temp tables of other backends; we can't reindex them at all */
-		if (classtuple->relistemp &&
+		if (classtuple->relpersistence == RELPERSISTENCE_TEMP &&
 			!isTempNamespace(classtuple->relnamespace))
 			continue;
 
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 62d1fbf..ef52a35 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -274,7 +274,7 @@ DefineSequence(CreateSeqStmt *seq)
 	MarkBufferDirty(buf);
 
 	/* XLOG stuff */
-	if (!rel->rd_istemp)
+	if (RelationNeedsWAL(rel))
 	{
 		xl_seq_rec	xlrec;
 		XLogRecPtr	recptr;
@@ -379,7 +379,7 @@ AlterSequenceInternal(Oid relid, List *options)
 	MarkBufferDirty(buf);
 
 	/* XLOG stuff */
-	if (!seqrel->rd_istemp)
+	if (RelationNeedsWAL(seqrel))
 	{
 		xl_seq_rec	xlrec;
 		XLogRecPtr	recptr;
@@ -609,7 +609,7 @@ nextval_internal(Oid relid)
 	MarkBufferDirty(buf);
 
 	/* XLOG stuff */
-	if (logit && !seqrel->rd_istemp)
+	if (logit && RelationNeedsWAL(seqrel))
 	{
 		xl_seq_rec	xlrec;
 		XLogRecPtr	recptr;
@@ -786,7 +786,7 @@ do_setval(Oid relid, int64 next, bool iscalled)
 	MarkBufferDirty(buf);
 
 	/* XLOG stuff */
-	if (!seqrel->rd_istemp)
+	if (RelationNeedsWAL(seqrel))
 	{
 		xl_seq_rec	xlrec;
 		XLogRecPtr	recptr;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6ec8a85..6252622 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -223,7 +223,7 @@ static const struct dropmsgstrings dropmsgstringarray[] = {
 
 
 static void truncate_check_rel(Relation rel);
-static List *MergeAttributes(List *schema, List *supers, bool istemp,
+static List *MergeAttributes(List *schema, List *supers, char relpersistence,
 				List **supOids, List **supconstr, int *supOidCount);
 static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
 static bool change_varattnos_walker(Node *node, const AttrNumber *newattno);
@@ -334,7 +334,7 @@ static void ATPrepAddInherit(Relation child_rel);
 static void ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode);
 static void ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode);
 static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
-				   ForkNumber forkNum, bool istemp);
+				   ForkNumber forkNum, char relpersistence);
 static const char *storage_name(char c);
 
 
@@ -386,7 +386,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
 	/*
 	 * Check consistency of arguments
 	 */
-	if (stmt->oncommit != ONCOMMIT_NOOP && !stmt->relation->istemp)
+	if (stmt->oncommit != ONCOMMIT_NOOP
+		&& stmt->relation->relpersistence != RELPERSISTENCE_TEMP)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
 				 errmsg("ON COMMIT can only be used on temporary tables")));
@@ -396,7 +397,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
 	 * code.  This is needed because calling code might not expect untrusted
 	 * tables to appear in pg_temp at the front of its search path.
 	 */
-	if (stmt->relation->istemp && InSecurityRestrictedOperation())
+	if (stmt->relation->relpersistence == RELPERSISTENCE_TEMP
+		&& InSecurityRestrictedOperation())
 		ereport(ERROR,
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 				 errmsg("cannot create temporary table within security-restricted operation")));
@@ -429,7 +431,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
 	}
 	else
 	{
-		tablespaceId = GetDefaultTablespace(stmt->relation->istemp);
+		tablespaceId = GetDefaultTablespace(stmt->relation->relpersistence);
 		/* note InvalidOid is OK in this case */
 	}
 
@@ -473,7 +475,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
 	 * inherited attributes.
 	 */
 	schema = MergeAttributes(schema, stmt->inhRelations,
-							 stmt->relation->istemp,
+							 stmt->relation->relpersistence,
 							 &inheritOids, &old_constraints, &parentOidCount);
 
 	/*
@@ -552,6 +554,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
 										  list_concat(cookedDefaults,
 													  old_constraints),
 										  relkind,
+										  stmt->relation->relpersistence,
 										  false,
 										  false,
 										  localHasOids,
@@ -1213,7 +1216,7 @@ storage_name(char c)
  *----------
  */
 static List *
-MergeAttributes(List *schema, List *supers, bool istemp,
+MergeAttributes(List *schema, List *supers, char relpersistence,
 				List **supOids, List **supconstr, int *supOidCount)
 {
 	ListCell   *entry;
@@ -1321,7 +1324,8 @@ MergeAttributes(List *schema, List *supers, bool istemp,
 					 errmsg("inherited relation \"%s\" is not a table",
 							parent->relname)));
 		/* Permanent rels cannot inherit from temporary ones */
-		if (!istemp && relation->rd_istemp)
+		if (relpersistence != RELPERSISTENCE_TEMP
+			&& RelationUsesTempNamespace(relation))
 			ereport(ERROR,
 					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 					 errmsg("cannot inherit from temporary relation \"%s\"",
@@ -5062,23 +5066,23 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
 						RelationGetRelationName(pkrel))));
 
 	/*
-	 * Disallow reference from permanent table to temp table or vice versa.
-	 * (The ban on perm->temp is for fairly obvious reasons.  The ban on
-	 * temp->perm is because other backends might need to run the RI triggers
-	 * on the perm table, but they can't reliably see tuples the owning
-	 * backend has created in the temp table, because non-shared buffers are
-	 * used for temp tables.)
+	 * References from permanent tables to temp tables are disallowed because
+	 * the contents of the temp table disappear at the end of each session.
+	 * References from temp tables to permanent tables are also disallowed,
+	 * because other backends might need to run the RI triggers on the perm
+	 * table, but they can't reliably see tuples in the local buffers of other
+	 * backends.
 	 */
-	if (pkrel->rd_istemp)
+	if (RelationUsesLocalBuffers(pkrel))
 	{
-		if (!rel->rd_istemp)
+		if (!RelationUsesLocalBuffers(rel))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
 					 errmsg("cannot reference temporary table from permanent table constraint")));
 	}
 	else
 	{
-		if (rel->rd_istemp)
+		if (RelationUsesLocalBuffers(rel))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
 					 errmsg("cannot reference permanent table from temporary table constraint")));
@@ -7285,7 +7289,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	 * Relfilenodes are not unique across tablespaces, so we need to allocate
 	 * a new one in the new tablespace.
 	 */
-	newrelfilenode = GetNewRelFileNode(newTableSpace, NULL, rel->rd_backend);
+	newrelfilenode = GetNewRelFileNode(newTableSpace, NULL,
+									   rel->rd_rel->relpersistence);
 
 	/* Open old and new relation */
 	newrnode = rel->rd_node;
@@ -7302,10 +7307,11 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	 * NOTE: any conflict in relfilenode value will be caught in
 	 * RelationCreateStorage().
 	 */
-	RelationCreateStorage(newrnode, rel->rd_istemp);
+	RelationCreateStorage(newrnode, rel->rd_rel->relpersistence);
 
 	/* copy main fork */
-	copy_relation_data(rel->rd_smgr, dstrel, MAIN_FORKNUM, rel->rd_istemp);
+	copy_relation_data(rel->rd_smgr, dstrel, MAIN_FORKNUM,
+					   rel->rd_rel->relpersistence);
 
 	/* copy those extra forks that exist */
 	for (forkNum = MAIN_FORKNUM + 1; forkNum <= MAX_FORKNUM; forkNum++)
@@ -7313,7 +7319,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 		if (smgrexists(rel->rd_smgr, forkNum))
 		{
 			smgrcreate(dstrel, forkNum, false);
-			copy_relation_data(rel->rd_smgr, dstrel, forkNum, rel->rd_istemp);
+			copy_relation_data(rel->rd_smgr, dstrel, forkNum,
+							   rel->rd_rel->relpersistence);
 		}
 	}
 
@@ -7348,7 +7355,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
  */
 static void
 copy_relation_data(SMgrRelation src, SMgrRelation dst,
-				   ForkNumber forkNum, bool istemp)
+				   ForkNumber forkNum, char relpersistence)
 {
 	char	   *buf;
 	Page		page;
@@ -7367,9 +7374,9 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
 
 	/*
 	 * We need to log the copied data in WAL iff WAL archiving/streaming is
-	 * enabled AND it's not a temp rel.
+	 * enabled AND it's a permanent relation.
 	 */
-	use_wal = XLogIsNeeded() && !istemp;
+	use_wal = XLogIsNeeded() && relpersistence == RELPERSISTENCE_PERMANENT;
 
 	nblocks = smgrnblocks(src, forkNum);
 
@@ -7408,7 +7415,7 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
 	 * wouldn't replay our earlier WAL entries. If we do not fsync those pages
 	 * here, they might still not be on disk when the crash occurs.
 	 */
-	if (!istemp)
+	if (relpersistence == RELPERSISTENCE_PERMANENT)
 		smgrimmedsync(dst, forkNum);
 }
 
@@ -7476,7 +7483,8 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
 	ATSimplePermissions(parent_rel, false, false);
 
 	/* Permanent rels cannot inherit from temporary ones */
-	if (parent_rel->rd_istemp && !child_rel->rd_istemp)
+	if (RelationUsesTempNamespace(parent_rel)
+		&& !RelationUsesTempNamespace(child_rel))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 				 errmsg("cannot inherit from temporary relation \"%s\"",
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 590eee5..c8192a3 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -1045,8 +1045,8 @@ assign_default_tablespace(const char *newval, bool doit, GucSource source)
 /*
  * GetDefaultTablespace -- get the OID of the current default tablespace
  *
- * Regular objects and temporary objects have different default tablespaces,
- * hence the forTemp parameter must be specified.
+ * Regular objects and temporary objects have different default tablespaces,
+ * hence the relpersistence parameter must be specified.
  *
  * May return InvalidOid to indicate "use the database's default tablespace".
  *
@@ -1057,12 +1057,12 @@ assign_default_tablespace(const char *newval, bool doit, GucSource source)
  * default_tablespace GUC variable.
  */
 Oid
-GetDefaultTablespace(bool forTemp)
+GetDefaultTablespace(char relpersistence)
 {
 	Oid			result;
 
 	/* The temp-table case is handled elsewhere */
-	if (forTemp)
+	if (relpersistence == RELPERSISTENCE_TEMP)
 	{
 		PrepareTempTablespaces();
 		return GetNextTempTableSpace();
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 0ac993f..cbdf97d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -268,10 +268,10 @@ static void
 vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
 {
 	/*
-	 * No need to log changes for temp tables, they do not contain data
-	 * visible on the standby server.
+	 * Skip this for relations that are not WAL-logged, or if we're not
+	 * trying to support archive recovery.
 	 */
-	if (rel->rd_istemp || !XLogIsNeeded())
+	if (!RelationNeedsWAL(rel) || !XLogIsNeeded())
 		return;
 
 	/*
@@ -664,8 +664,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		if (nfrozen > 0)
 		{
 			MarkBufferDirty(buf);
-			/* no XLOG for temp tables, though */
-			if (!onerel->rd_istemp)
+			if (RelationNeedsWAL(onerel))
 			{
 				XLogRecPtr	recptr;
 
@@ -895,7 +894,7 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
-	if (!onerel->rd_istemp)
+	if (RelationNeedsWAL(onerel))
 	{
 		XLogRecPtr	recptr;
 
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index 09ab24b..2b2b908 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -68,10 +68,10 @@ isViewOnTempTable_walker(Node *node, void *context)
 			if (rte->rtekind == RTE_RELATION)
 			{
 				Relation	rel = heap_open(rte->relid, AccessShareLock);
-				bool		istemp = rel->rd_istemp;
+				char		relpersistence = rel->rd_rel->relpersistence;
 
 				heap_close(rel, AccessShareLock);
-				if (istemp)
+				if (relpersistence == RELPERSISTENCE_TEMP)
 					return true;
 			}
 		}
@@ -173,9 +173,9 @@ DefineVirtualRelation(const RangeVar *relation, List *tlist, bool replace)
 		/*
 		 * Due to the namespace visibility rules for temporary objects, we
 		 * should only end up replacing a temporary view with another
-		 * temporary view, and vice versa.
+		 * temporary view, and similarly for permanent views.
 		 */
-		Assert(relation->istemp == rel->rd_istemp);
+		Assert(relation->relpersistence == rel->rd_rel->relpersistence);
 
 		/*
 		 * Create a tuple descriptor to compare against the existing view, and
@@ -454,10 +454,11 @@ DefineView(ViewStmt *stmt, const char *queryString)
 	 * schema name.
 	 */
 	view = stmt->view;
-	if (!view->istemp && isViewOnTempTable(viewParse))
+	if (view->relpersistence == RELPERSISTENCE_PERMANENT
+		&& isViewOnTempTable(viewParse))
 	{
 		view = copyObject(view);	/* don't corrupt original command */
-		view->istemp = true;
+		view->relpersistence = RELPERSISTENCE_TEMP;
 		ereport(NOTICE,
 				(errmsg("view \"%s\" will be a temporary view",
 						view->relname)));
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 69f3a28..c4719f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2131,7 +2131,8 @@ OpenIntoRel(QueryDesc *queryDesc)
 	/*
 	 * Check consistency of arguments
 	 */
-	if (into->onCommit != ONCOMMIT_NOOP && !into->rel->istemp)
+	if (into->onCommit != ONCOMMIT_NOOP
+		&& into->rel->relpersistence != RELPERSISTENCE_TEMP)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
 				 errmsg("ON COMMIT can only be used on temporary tables")));
@@ -2141,7 +2142,8 @@ OpenIntoRel(QueryDesc *queryDesc)
 	 * code.  This is needed because calling code might not expect untrusted
 	 * tables to appear in pg_temp at the front of its search path.
 	 */
-	if (into->rel->istemp && InSecurityRestrictedOperation())
+	if (into->rel->relpersistence == RELPERSISTENCE_TEMP
+		&& InSecurityRestrictedOperation())
 		ereport(ERROR,
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 				 errmsg("cannot create temporary table within security-restricted operation")));
@@ -2168,7 +2170,7 @@ OpenIntoRel(QueryDesc *queryDesc)
 	}
 	else
 	{
-		tablespaceId = GetDefaultTablespace(into->rel->istemp);
+		tablespaceId = GetDefaultTablespace(into->rel->relpersistence);
 		/* note InvalidOid is OK in this case */
 	}
 
@@ -2208,6 +2210,7 @@ OpenIntoRel(QueryDesc *queryDesc)
 											  tupdesc,
 											  NIL,
 											  RELKIND_RELATION,
+											  into->rel->relpersistence,
 											  false,
 											  false,
 											  true,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e91044b..32aafc8 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -955,7 +955,7 @@ _copyRangeVar(RangeVar *from)
 	COPY_STRING_FIELD(schemaname);
 	COPY_STRING_FIELD(relname);
 	COPY_SCALAR_FIELD(inhOpt);
-	COPY_SCALAR_FIELD(istemp);
+	COPY_SCALAR_FIELD(relpersistence);
 	COPY_NODE_FIELD(alias);
 	COPY_LOCATION_FIELD(location);
 
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 73b28f9..1f7b5f3 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -104,7 +104,7 @@ _equalRangeVar(RangeVar *a, RangeVar *b)
 	COMPARE_STRING_FIELD(schemaname);
 	COMPARE_STRING_FIELD(relname);
 	COMPARE_SCALAR_FIELD(inhOpt);
-	COMPARE_SCALAR_FIELD(istemp);
+	COMPARE_SCALAR_FIELD(relpersistence);
 	COMPARE_NODE_FIELD(alias);
 	COMPARE_LOCATION_FIELD(location);
 
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 4b268f3..f06f73b 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -15,6 +15,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -378,7 +379,7 @@ makeRangeVar(char *schemaname, char *relname, int location)
 	r->schemaname = schemaname;
 	r->relname = relname;
 	r->inhOpt = INH_DEFAULT;
-	r->istemp = false;
+	r->relpersistence = RELPERSISTENCE_PERMANENT;
 	r->alias = NULL;
 	r->location = location;
 
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 61aea61..66a5f33 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -839,7 +839,7 @@ _outRangeVar(StringInfo str, RangeVar *node)
 	WRITE_STRING_FIELD(schemaname);
 	WRITE_STRING_FIELD(relname);
 	WRITE_ENUM_FIELD(inhOpt, InhOption);
-	WRITE_BOOL_FIELD(istemp);
+	WRITE_CHAR_FIELD(relpersistence);
 	WRITE_NODE_FIELD(alias);
 	WRITE_LOCATION_FIELD(location);
 }
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 2166a5d..933d58a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -373,7 +373,7 @@ _readRangeVar(void)
 	READ_STRING_FIELD(schemaname);
 	READ_STRING_FIELD(relname);
 	READ_ENUM_FIELD(inhOpt, InhOption);
-	READ_BOOL_FIELD(istemp);
+	READ_CHAR_FIELD(relpersistence);
 	READ_NODE_FIELD(alias);
 	READ_LOCATION_FIELD(location);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 1394b21..06707da 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -311,7 +311,8 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
 %type <fun_param_mode> arg_class
 %type <typnam>	func_return func_type
 
-%type <boolean>  OptTemp opt_trusted opt_restart_seqs
+%type <boolean>  opt_trusted opt_restart_seqs
+%type <ival>	 OptTemp
 %type <oncommit> OnCommitOption
 
 %type <node>	for_locking_item
@@ -2278,7 +2279,7 @@ CreateStmt:	CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
 			OptInherit OptWith OnCommitOption OptTableSpace
 				{
 					CreateStmt *n = makeNode(CreateStmt);
-					$4->istemp = $2;
+					$4->relpersistence = $2;
 					n->relation = $4;
 					n->tableElts = $6;
 					n->inhRelations = $8;
@@ -2294,7 +2295,7 @@ CreateStmt:	CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
 			OptTableSpace
 				{
 					CreateStmt *n = makeNode(CreateStmt);
-					$7->istemp = $2;
+					$7->relpersistence = $2;
 					n->relation = $7;
 					n->tableElts = $9;
 					n->inhRelations = $11;
@@ -2309,7 +2310,7 @@ CreateStmt:	CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
 			OptTypedTableElementList OptWith OnCommitOption OptTableSpace
 				{
 					CreateStmt *n = makeNode(CreateStmt);
-					$4->istemp = $2;
+					$4->relpersistence = $2;
 					n->relation = $4;
 					n->tableElts = $7;
 					n->ofTypename = makeTypeNameFromNameList($6);
@@ -2325,7 +2326,7 @@ CreateStmt:	CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
 			OptTypedTableElementList OptWith OnCommitOption OptTableSpace
 				{
 					CreateStmt *n = makeNode(CreateStmt);
-					$7->istemp = $2;
+					$7->relpersistence = $2;
 					n->relation = $7;
 					n->tableElts = $10;
 					n->ofTypename = makeTypeNameFromNameList($9);
@@ -2346,13 +2347,13 @@ CreateStmt:	CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
  * NOTE: we accept both GLOBAL and LOCAL options; since we have no modules
  * the LOCAL keyword is really meaningless.
  */
-OptTemp:	TEMPORARY						{ $$ = TRUE; }
-			| TEMP							{ $$ = TRUE; }
-			| LOCAL TEMPORARY				{ $$ = TRUE; }
-			| LOCAL TEMP					{ $$ = TRUE; }
-			| GLOBAL TEMPORARY				{ $$ = TRUE; }
-			| GLOBAL TEMP					{ $$ = TRUE; }
-			| /*EMPTY*/						{ $$ = FALSE; }
+OptTemp:	TEMPORARY					{ $$ = RELPERSISTENCE_TEMP; }
+			| TEMP						{ $$ = RELPERSISTENCE_TEMP; }
+			| LOCAL TEMPORARY			{ $$ = RELPERSISTENCE_TEMP; }
+			| LOCAL TEMP				{ $$ = RELPERSISTENCE_TEMP; }
+			| GLOBAL TEMPORARY			{ $$ = RELPERSISTENCE_TEMP; }
+			| GLOBAL TEMP				{ $$ = RELPERSISTENCE_TEMP; }
+			| /*EMPTY*/					{ $$ = RELPERSISTENCE_PERMANENT; }
 		;
 
 OptTableElementList:
@@ -2832,7 +2833,7 @@ CreateAsStmt:
 								(errcode(ERRCODE_SYNTAX_ERROR),
 								 errmsg("CREATE TABLE AS cannot specify INTO"),
 								 parser_errposition(exprLocation((Node *) n->intoClause))));
-					$4->rel->istemp = $2;
+					$4->rel->relpersistence = $2;
 					n->intoClause = $4;
 					/* Implement WITH NO DATA by forcing top-level LIMIT 0 */
 					if (!$7)
@@ -2898,7 +2899,7 @@ CreateSeqStmt:
 			CREATE OptTemp SEQUENCE qualified_name OptSeqOptList
 				{
 					CreateSeqStmt *n = makeNode(CreateSeqStmt);
-					$4->istemp = $2;
+					$4->relpersistence = $2;
 					n->sequence = $4;
 					n->options = $5;
 					n->ownerId = InvalidOid;
@@ -6543,7 +6544,7 @@ ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list
 				{
 					ViewStmt *n = makeNode(ViewStmt);
 					n->view = $4;
-					n->view->istemp = $2;
+					n->view->relpersistence = $2;
 					n->aliases = $5;
 					n->query = $7;
 					n->replace = false;
@@ -6554,7 +6555,7 @@ ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list
 				{
 					ViewStmt *n = makeNode(ViewStmt);
 					n->view = $6;
-					n->view->istemp = $4;
+					n->view->relpersistence = $4;
 					n->aliases = $7;
 					n->query = $9;
 					n->replace = true;
@@ -7250,7 +7251,7 @@ ExecuteStmt: EXECUTE name execute_param_clause
 					ExecuteStmt *n = makeNode(ExecuteStmt);
 					n->name = $7;
 					n->params = $8;
-					$4->rel->istemp = $2;
+					$4->rel->relpersistence = $2;
 					n->into = $4;
 					if ($4->colNames)
 						ereport(ERROR,
@@ -7811,42 +7812,42 @@ OptTempTableName:
 			TEMPORARY opt_table qualified_name
 				{
 					$$ = $3;
-					$$->istemp = true;
+					$$->relpersistence = RELPERSISTENCE_TEMP;
 				}
 			| TEMP opt_table qualified_name
 				{
 					$$ = $3;
-					$$->istemp = true;
+					$$->relpersistence = RELPERSISTENCE_TEMP;
 				}
 			| LOCAL TEMPORARY opt_table qualified_name
 				{
 					$$ = $4;
-					$$->istemp = true;
+					$$->relpersistence = RELPERSISTENCE_TEMP;
 				}
 			| LOCAL TEMP opt_table qualified_name
 				{
 					$$ = $4;
-					$$->istemp = true;
+					$$->relpersistence = RELPERSISTENCE_TEMP;
 				}
 			| GLOBAL TEMPORARY opt_table qualified_name
 				{
 					$$ = $4;
-					$$->istemp = true;
+					$$->relpersistence = RELPERSISTENCE_TEMP;
 				}
 			| GLOBAL TEMP opt_table qualified_name
 				{
 					$$ = $4;
-					$$->istemp = true;
+					$$->relpersistence = RELPERSISTENCE_TEMP;
 				}
 			| TABLE qualified_name
 				{
 					$$ = $2;
-					$$->istemp = false;
+					$$->relpersistence = RELPERSISTENCE_PERMANENT;
 				}
 			| qualified_name
 				{
 					$$ = $1;
-					$$->istemp = false;
+					$$->relpersistence = RELPERSISTENCE_PERMANENT;
 				}
 		;
 
@@ -10838,16 +10839,12 @@ qualified_name_list:
 qualified_name:
 			ColId
 				{
-					$$ = makeNode(RangeVar);
-					$$->catalogname = NULL;
-					$$->schemaname = NULL;
-					$$->relname = $1;
-					$$->location = @1;
+					$$ = makeRangeVar(NULL, $1, @1);
 				}
 			| ColId indirection
 				{
 					check_qualified_name($2, yyscanner);
-					$$ = makeNode(RangeVar);
+					$$ = makeRangeVar(NULL, NULL, @1);
 					switch (list_length($2))
 					{
 						case 1:
@@ -10868,7 +10865,6 @@ qualified_name:
 									 parser_errposition(@1)));
 							break;
 					}
-					$$->location = @1;
 				}
 		;
 
@@ -12085,6 +12081,7 @@ makeRangeVarFromAnyName(List *names, int position, core_yyscan_t yyscanner)
 			break;
 	}
 
+	r->relpersistence = RELPERSISTENCE_PERMANENT;
 	r->location = position;
 
 	return r;
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index a8aee20..aa7c144 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -158,10 +158,11 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	 * If the target relation name isn't schema-qualified, make it so.  This
 	 * prevents some corner cases in which added-on rewritten commands might
 	 * think they should apply to other relations that have the same name and
-	 * are earlier in the search path.	"istemp" is equivalent to a
-	 * specification of pg_temp, so no need for anything extra in that case.
+	 * are earlier in the search path.	But a local temp table is effectively
+	 * specified to be in pg_temp, so no need for anything extra in that case.
 	 */
-	if (stmt->relation->schemaname == NULL && !stmt->relation->istemp)
+	if (stmt->relation->schemaname == NULL
+		&& stmt->relation->relpersistence != RELPERSISTENCE_TEMP)
 	{
 		Oid			namespaceid = RangeVarGetCreationNamespace(stmt->relation);
 
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index a617b88..be7a69a 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1967,7 +1967,7 @@ do_autovacuum(void)
 		 * Check if it is a temp table (presumably, of some other backend's).
 		 * We cannot safely process other backends' temp tables.
 		 */
-		if (classForm->relistemp)
+		if (classForm->relpersistence == RELPERSISTENCE_TEMP)
 		{
 			int			backendID;
 
@@ -2064,7 +2064,7 @@ do_autovacuum(void)
 		/*
 		 * We cannot safely process other backends' temp tables, so skip 'em.
 		 */
-		if (classForm->relistemp)
+		if (classForm->relpersistence == RELPERSISTENCE_TEMP)
 			continue;
 
 		relid = HeapTupleGetOid(tuple);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 54c7109..51d5ec1 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -123,7 +123,7 @@ PrefetchBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum)
 	/* Open it at the smgr level if not already done */
 	RelationOpenSmgr(reln);
 
-	if (reln->rd_istemp)
+	if (RelationUsesLocalBuffers(reln))
 	{
 		/* see comments in ReadBufferExtended */
 		if (RELATION_IS_OTHER_TEMP(reln))
@@ -2076,7 +2076,7 @@ FlushRelationBuffers(Relation rel)
 	/* Open rel at the smgr level if not already done */
 	RelationOpenSmgr(rel);
 
-	if (rel->rd_istemp)
+	if (RelationUsesLocalBuffers(rel))
 	{
 		for (i = 0; i < NLocBuffer; i++)
 		{
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index f5250a2..e352cda 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -612,16 +612,26 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
 		PG_RETURN_NULL();
 	}
 
-	/* If temporary, determine owning backend. */
-	if (!relform->relistemp)
-		backend = InvalidBackendId;
-	else if (isTempOrToastNamespace(relform->relnamespace))
-		backend = MyBackendId;
-	else
+	/* Determine owning backend. */
+	switch (relform->relpersistence)
 	{
-		/* Do it the hard way. */
-		backend = GetTempNamespaceBackendId(relform->relnamespace);
-		Assert(backend != InvalidBackendId);
+		case RELPERSISTENCE_PERMANENT:
+			backend = InvalidBackendId;
+			break;
+		case RELPERSISTENCE_TEMP:
+			if (isTempOrToastNamespace(relform->relnamespace))
+				backend = MyBackendId;
+			else
+			{
+				/* Do it the hard way. */
+				backend = GetTempNamespaceBackendId(relform->relnamespace);
+				Assert(backend != InvalidBackendId);
+			}
+			break;
+		default:
+			elog(ERROR, "invalid relpersistence: %c", relform->relpersistence);
+			backend = InvalidBackendId; 	/* placate compiler */
+			break;
 	}
 
 	ReleaseSysCache(tuple);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 62b745b..12b0f07 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -856,20 +856,30 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 	relation->rd_isnailed = false;
 	relation->rd_createSubid = InvalidSubTransactionId;
 	relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
-	relation->rd_istemp = relation->rd_rel->relistemp;
-	if (!relation->rd_istemp)
-		relation->rd_backend = InvalidBackendId;
-	else if (isTempOrToastNamespace(relation->rd_rel->relnamespace))
-		relation->rd_backend = MyBackendId;
-	else
+	switch (relation->rd_rel->relpersistence)
 	{
-		/*
-		 * If it's a temporary table, but not one of ours, we have to use
-		 * the slow, grotty method to figure out the owning backend.
-		 */
-		relation->rd_backend =
-			GetTempNamespaceBackendId(relation->rd_rel->relnamespace);
-		Assert(relation->rd_backend != InvalidBackendId);
+		case RELPERSISTENCE_PERMANENT:
+			relation->rd_backend = InvalidBackendId;
+			break;
+		case RELPERSISTENCE_TEMP:
+			if (isTempOrToastNamespace(relation->rd_rel->relnamespace))
+				relation->rd_backend = MyBackendId;
+			else
+			{
+				/*
+				 * If it's a local temp table, but not one of ours, we have to
+				 * use the slow, grotty method to figure out the owning
+				 * backend.
+				 */
+				relation->rd_backend =
+					GetTempNamespaceBackendId(relation->rd_rel->relnamespace);
+				Assert(relation->rd_backend != InvalidBackendId);
+			}
+			break;
+		default:
+			elog(ERROR, "invalid relpersistence: %c",
+				 relation->rd_rel->relpersistence);
+			break;
 	}
 
 	/*
@@ -1432,7 +1442,6 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_isnailed = true;
 	relation->rd_createSubid = InvalidSubTransactionId;
 	relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
-	relation->rd_istemp = false;
 	relation->rd_backend = InvalidBackendId;
 
 	/*
@@ -1458,11 +1467,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	if (isshared)
 		relation->rd_rel->reltablespace = GLOBALTABLESPACE_OID;
 
-	/*
-	 * Likewise, we must know if a relation is temp ... but formrdesc is not
-	 * used for any temp relations.
-	 */
-	relation->rd_rel->relistemp = false;
+	/* formrdesc is used only for permanent relations */
+	relation->rd_rel->relpersistence = RELPERSISTENCE_PERMANENT;
 
 	relation->rd_rel->relpages = 1;
 	relation->rd_rel->reltuples = 1;
@@ -2440,7 +2446,8 @@ RelationBuildLocalRelation(const char *relname,
 						   Oid relid,
 						   Oid reltablespace,
 						   bool shared_relation,
-						   bool mapped_relation)
+						   bool mapped_relation,
+						   char relpersistence)
 {
 	Relation	rel;
 	MemoryContext oldcxt;
@@ -2514,10 +2521,6 @@ RelationBuildLocalRelation(const char *relname,
 	/* must flag that we have rels created in this transaction */
 	need_eoxact_work = true;
 
-	/* it is temporary if and only if it is in my temp-table namespace */
-	rel->rd_istemp = isTempOrToastNamespace(relnamespace);
-	rel->rd_backend = rel->rd_istemp ? MyBackendId : InvalidBackendId;
-
 	/*
 	 * create a new tuple descriptor from the one passed in.  We do this
 	 * partly to copy it into the cache context, and partly because the new
@@ -2557,6 +2560,21 @@ RelationBuildLocalRelation(const char *relname,
 	/* needed when bootstrapping: */
 	rel->rd_rel->relowner = BOOTSTRAP_SUPERUSERID;
 
+	/* set up persistence; rd_backend is a function of persistence type */
+	rel->rd_rel->relpersistence = relpersistence;
+	switch (relpersistence)
+	{
+		case RELPERSISTENCE_PERMANENT:
+			rel->rd_backend = InvalidBackendId;
+			break;
+		case RELPERSISTENCE_TEMP:
+			rel->rd_backend = MyBackendId;
+			break;
+		default:
+			elog(ERROR, "invalid relpersistence: %c", relpersistence);
+			break;
+	}
+
 	/*
 	 * Insert relation physical and logical identifiers (OIDs) into the right
 	 * places.	Note that the physical ID (relfilenode) is initially the same
@@ -2565,7 +2583,6 @@ RelationBuildLocalRelation(const char *relname,
 	 * map.
 	 */
 	rel->rd_rel->relisshared = shared_relation;
-	rel->rd_rel->relistemp = rel->rd_istemp;
 
 	RelationGetRelid(rel) = relid;
 
@@ -2642,7 +2659,7 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
 
 	/* Allocate a new relfilenode */
 	newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
-									   relation->rd_backend);
+									   relation->rd_rel->relpersistence);
 
 	/*
 	 * Get a writable copy of the pg_class tuple for the given relation.
@@ -2665,7 +2682,7 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
 	newrnode.node = relation->rd_node;
 	newrnode.node.relNode = newrelfilenode;
 	newrnode.backend = relation->rd_backend;
-	RelationCreateStorage(newrnode.node, relation->rd_istemp);
+	RelationCreateStorage(newrnode.node, relation->rd_rel->relpersistence);
 	smgrclosenode(newrnode);
 
 	/*
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 97c808b..56dcdd5 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -56,6 +56,6 @@ extern Oid	GetNewOid(Relation relation);
 extern Oid GetNewOidWithIndex(Relation relation, Oid indexId,
 				   AttrNumber oidcolumn);
 extern Oid	GetNewRelFileNode(Oid reltablespace, Relation pg_class,
-				  BackendId backend);
+				  char relpersistence);
 
 #endif   /* CATALOG_H */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 7795bda..646ab9c 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -40,6 +40,7 @@ extern Relation heap_create(const char *relname,
 			Oid relid,
 			TupleDesc tupDesc,
 			char relkind,
+			char relpersistence,
 			bool shared_relation,
 			bool mapped_relation,
 			bool allow_system_table_mods);
@@ -54,6 +55,7 @@ extern Oid heap_create_with_catalog(const char *relname,
 						 TupleDesc tupdesc,
 						 List *cooked_constraints,
 						 char relkind,
+						 char relpersistence,
 						 bool shared_relation,
 						 bool mapped_relation,
 						 bool oidislocal,
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index f50cf9d..1edbfe3 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -49,7 +49,7 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
 	Oid			reltoastidxid;	/* if toast table, OID of chunk_id index */
 	bool		relhasindex;	/* T if has (or has had) any indexes */
 	bool		relisshared;	/* T if shared across databases */
-	bool		relistemp;		/* T if temporary relation */
+	char		relpersistence;	/* see RELPERSISTENCE_xxx constants */
 	char		relkind;		/* see RELKIND_xxx constants below */
 	int2		relnatts;		/* number of user attributes */
 
@@ -108,7 +108,7 @@ typedef FormData_pg_class *Form_pg_class;
 #define Anum_pg_class_reltoastidxid		12
 #define Anum_pg_class_relhasindex		13
 #define Anum_pg_class_relisshared		14
-#define Anum_pg_class_relistemp			15
+#define Anum_pg_class_relpersistence	15
 #define Anum_pg_class_relkind			16
 #define Anum_pg_class_relnatts			17
 #define Anum_pg_class_relchecks			18
@@ -132,13 +132,13 @@ typedef FormData_pg_class *Form_pg_class;
  */
 
 /* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId */
-DATA(insert OID = 1247 (  pg_type		PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f f r 28 0 t f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1247 (  pg_type		PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f f 3 _null_ _null_ ));
 DESCR("");
-DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f f r 19 0 f f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 19 0 f f f f f f 3 _null_ _null_ ));
 DESCR("");
-DATA(insert OID = 1255 (  pg_proc		PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f f r 25 0 t f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1255 (  pg_proc		PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 25 0 t f f f f f 3 _null_ _null_ ));
 DESCR("");
-DATA(insert OID = 1259 (  pg_class		PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f f r 27 0 t f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1259 (  pg_class		PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f f 3 _null_ _null_ ));
 DESCR("");
 
 #define		  RELKIND_INDEX			  'i'		/* secondary index */
@@ -149,4 +149,7 @@ DESCR("");
 #define		  RELKIND_VIEW			  'v'		/* view */
 #define		  RELKIND_COMPOSITE_TYPE  'c'		/* composite type */
 
+#define		  RELPERSISTENCE_PERMANENT	'p'
+#define		  RELPERSISTENCE_TEMP		't'
+
 #endif   /* PG_CLASS_H */
diff --git a/src/include/catalog/storage.h b/src/include/catalog/storage.h
index d7b8731..f086b1c 100644
--- a/src/include/catalog/storage.h
+++ b/src/include/catalog/storage.h
@@ -20,7 +20,7 @@
 #include "storage/relfilenode.h"
 #include "utils/relcache.h"
 
-extern void RelationCreateStorage(RelFileNode rnode, bool istemp);
+extern void RelationCreateStorage(RelFileNode rnode, char relpersistence);
 extern void RelationDropStorage(Relation rel);
 extern void RelationPreserveStorage(RelFileNode rnode);
 extern void RelationTruncate(Relation rel, BlockNumber nblocks);
diff --git a/src/include/commands/tablespace.h b/src/include/commands/tablespace.h
index 327fbc6..1e3f6ca 100644
--- a/src/include/commands/tablespace.h
+++ b/src/include/commands/tablespace.h
@@ -47,7 +47,7 @@ extern void AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt);
 
 extern void TablespaceCreateDbspace(Oid spcNode, Oid dbNode, bool isRedo);
 
-extern Oid	GetDefaultTablespace(bool forTemp);
+extern Oid	GetDefaultTablespace(char relpersistence);
 
 extern void PrepareTempTablespaces(void);
 
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index b17adf2..ba5ae37 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -74,7 +74,7 @@ typedef struct RangeVar
 	char	   *relname;		/* the relation/sequence name */
 	InhOption	inhOpt;			/* expand rel by inheritance? recursively act
 								 * on children? */
-	bool		istemp;			/* is this a temp relation/sequence? */
+	char		relpersistence;	/* see RELPERSISTENCE_* in pg_class.h */
 	Alias	   *alias;			/* table alias & optional column aliases */
 	int			location;		/* token location, or -1 if unknown */
 } RangeVar;
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 9ad92c2..8474d8f 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -132,7 +132,6 @@ typedef struct RelationData
 	struct SMgrRelationData *rd_smgr;	/* cached file handle, or NULL */
 	int			rd_refcnt;		/* reference count */
 	BackendId	rd_backend;		/* owning backend id, if temporary relation */
-	bool		rd_istemp;		/* rel is a temporary relation */
 	bool		rd_isnailed;	/* rel is nailed in cache */
 	bool		rd_isvalid;		/* relcache entry is valid */
 	char		rd_indexvalid;	/* state of rd_indexlist: 0 = not valid, 1 =
@@ -391,6 +390,27 @@ typedef struct StdRdOptions
 	} while (0)
 
 /*
+ * RelationNeedsWAL
+ *		True if relation needs WAL.
+ */
+#define RelationNeedsWAL(relation) \
+	((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
+
+/*
+ * RelationUsesLocalBuffers
+ *		True if relation's pages are stored in local buffers.
+ */
+#define RelationUsesLocalBuffers(relation) \
+	((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
+
+/*
+ * RelationUsesTempNamespace
+ *		True if relation's catalog entries live in a private namespace.
+ */
+#define RelationUsesTempNamespace(relation) \
+	((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
+
+/*
  * RELATION_IS_LOCAL
  *		If a rel is either temp or newly created in the current transaction,
  *		it can be assumed to be visible only to the current backend.
@@ -408,7 +428,8 @@ typedef struct StdRdOptions
  * Beware of multiple eval of argument
  */
 #define RELATION_IS_OTHER_TEMP(relation) \
-	((relation)->rd_istemp && (relation)->rd_backend != MyBackendId)
+	((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP \
+	&& (relation)->rd_backend != MyBackendId)
 
 /* routines in utils/cache/relcache.c */
 extern void RelationIncrementReferenceCount(Relation rel);
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 10d82d4..3500050 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -69,7 +69,8 @@ extern Relation RelationBuildLocalRelation(const char *relname,
 						   Oid relid,
 						   Oid reltablespace,
 						   bool shared_relation,
-						   bool mapped_relation);
+						   bool mapped_relation,
+						   char relpersistence);
 
 /*
  * Routine to manage assignment of new relfilenode to a relation
relax-sync-commit-v1.patchapplication/octet-stream; name=relax-sync-commit-v1.patchDownload
commit bdd697e5f0a16db2a672e5e14d11744958364101
Author: Robert Haas <rhaas@postgresql.org>
Date:   Sat Nov 13 09:52:11 2010 -0500

    Assume synchronous_commit=off for transactions that don't write WAL.
    
    This is advantageous for transactions that write only to temporary or
    unlogged tables, where loss of the transaction commit record is not
    critical.

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index d2e2e11..088daa0 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -907,6 +907,7 @@ RecordTransactionCommit(void)
 	int			nmsgs = 0;
 	SharedInvalidationMessage *invalMessages = NULL;
 	bool		RelcacheInitFileInval = false;
+	bool		wrote_xlog;
 
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
@@ -914,6 +915,7 @@ RecordTransactionCommit(void)
 	if (XLogStandbyInfoActive())
 		nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
 													 &RelcacheInitFileInval);
+	wrote_xlog = (XactLastRecEnd.xrecoff != 0);
 
 	/*
 	 * If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -940,7 +942,7 @@ RecordTransactionCommit(void)
 		 * assigned is a sequence advance record due to nextval() --- we want
 		 * to flush that to disk before reporting commit.)
 		 */
-		if (XactLastRecEnd.xrecoff == 0)
+		if (!wrote_xlog)
 			goto cleanup;
 	}
 	else
@@ -1028,16 +1030,21 @@ RecordTransactionCommit(void)
 	}
 
 	/*
-	 * Check if we want to commit asynchronously.  If the user has set
-	 * synchronous_commit = off, and we're not doing cleanup of any non-temp
-	 * rels nor committing any command that wanted to force sync commit, then
-	 * we can defer flushing XLOG.	(We must not allow asynchronous commit if
-	 * there are any non-temp tables to be deleted, because we might delete
-	 * the files before the COMMIT record is flushed to disk.  We do allow
-	 * asynchronous commit if all to-be-deleted tables are temporary though,
-	 * since they are lost anyway if we crash.)
+	 * Check if we want to commit asynchronously.  If we're doing cleanup of
+	 * any non-temp rels or committing any command that wanted to force sync
+	 * commit, then we must flush XLOG immediately.  (We must not allow
+	 * asynchronous commit if there are any non-temp tables to be deleted,
+	 * because we might delete the files before the COMMIT record is flushed to
+	 * disk.  We do allow asynchronous commit if all to-be-deleted tables are
+	 * temporary though, since they are lost anyway if we crash.) Otherwise,
+	 * we can defer the flush if either (1) the user has set synchronous_commit
+	 * = off, or (2) the current transaction has not performed any WAL-logged
+	 * operation.  This latter case can arise if the only writes performed by
+	 * the current transaction target temporary or unlogged relations.  Loss
+	 * of such a transaction won't matter anyway, because temp tables will be
+	 * lost after a crash anyway, and unlogged ones will be truncated.
 	 */
-	if (XactSyncCommit || forceSyncCommit || nrels > 0)
+	if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0)
 	{
 		/*
 		 * Synchronous commit case:
#3Andy Colson
andy@squeakycode.net
In reply to: Robert Haas (#2)
Re: unlogged tables

I was able to apply and compile and run OK; creating unlogged tables
seems to work as well.

I patched up pgbench to optionally create unlogged tables, and ran it
both ways. I get ~80tps normally, and ~1,500tps with unlogged. (That's
from memory; I was playing with it last night at home.)
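
Roughly, the change is just adding UNLOGGED to pgbench's CREATE TABLE
statements. A sketch of the kind of DDL the patched pgbench ends up
issuing, using the stock accounts table layout:

CREATE UNLOGGED TABLE pgbench_accounts (
    aid      integer not null,
    bid      integer,
    abalance integer,
    filler   char(84)
);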

I also have a "real world" test I can try (import apache logs and run a
few stats).

What other things would be good to test:
indexes?
analyze/stats/plans?
dump/restore?

Is "create temp unlogged table stuff(...)" an option?

-Andy

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andy Colson (#3)
Re: unlogged tables

Andy Colson <andy@squeakycode.net> writes:

Is "create temp unlogged table stuff(...)" an option?

temp tables are unlogged already.

regards, tom lane

#5Robert Haas
robertmhaas@gmail.com
In reply to: Andy Colson (#3)
Re: unlogged tables

On Tue, Nov 16, 2010 at 1:09 PM, Andy Colson <andy@squeakycode.net> wrote:

I was able to apply and compile and run OK; creating unlogged tables seems
to work as well.

I patched up pgbench to optionally create unlogged tables, and ran it both
ways.  I get ~80tps normally, and ~1,500tps with unlogged.  (That's from
memory; I was playing with it last night at home.)

What do you get with normal tables but with fsync, full_page_writes,
and synchronous_commit turned off?

What do you get with normal tables but with synchronous_commit (only) off?

Can you detect any performance regression on normal tables with the
patch vs. without the patch?

I also have a "real world" test I can try (import apache logs and run a few
stats).

That would be great.

What other things would be good to test:
indexes?
analyze/stats/plans?
dump/restore?

All of those. I guess there's a question of what pg_dump should emit
for an unlogged table. Clearly, we need to dump a CREATE UNLOGGED
TABLE statement (which we do), and right now we also dump the table
contents - which seems reasonable, but arguably someone could say that
we ought not to dump the contents of anything less than a
full-fledged, permanent table.
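
For concreteness, the dump of an unlogged table currently looks just like
that of a regular table apart from the keyword; a sketch, with a made-up
table name:

CREATE UNLOGGED TABLE hits (
    url    text,
    hit_at timestamptz
);

COPY hits (url, hit_at) FROM stdin;
...
\.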

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#5)
Re: unlogged tables

Excerpts from Robert Haas's message of Tue Nov 16 15:34:55 -0300 2010:

On Tue, Nov 16, 2010 at 1:09 PM, Andy Colson <andy@squeakycode.net> wrote:

dump/restore?

All of those. I guess there's a question of what pg_dump should emit
for an unlogged table. Clearly, we need to dump a CREATE UNLOGGED
TABLE statement (which we do), and right now we also dump the table
contents - which seems reasonable, but arguably someone could say that
we ought not to dump the contents of anything less than a
full-fledged, permanent table.

I think if you do a regular backup of the complete database, unlogged
tables should come out empty, but if you specifically request a dump of
it, it shouldn't.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#6)
Re: unlogged tables

On Tue, Nov 16, 2010 at 1:58 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Robert Haas's message of Tue Nov 16 15:34:55 -0300 2010:

On Tue, Nov 16, 2010 at 1:09 PM, Andy Colson <andy@squeakycode.net> wrote:

dump/restore?

All of those.  I guess there's a question of what pg_dump should emit
for an unlogged table.  Clearly, we need to dump a CREATE UNLOGGED
TABLE statement (which we do), and right now we also dump the table
contents - which seems reasonable, but arguably someone could say that
we ought not to dump the contents of anything less than a
full-fledged, permanent table.

I think if you do a regular backup of the complete database, unlogged
tables should come out empty, but if you specifically request a dump of
it, it shouldn't.

Oh, wow. That seems confusing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: unlogged tables

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Nov 16, 2010 at 1:58 PM, Alvaro Herrera

I think if you do a regular backup of the complete database, unlogged
tables should come out empty, but if you specifically request a dump of
it, it shouldn't.

Oh, wow. That seems confusing.

I don't like it either.

I think allowing pg_dump to dump the data in an unlogged table is not
only reasonable, but essential. Imagine that someone determines that
his reliability needs will be adequately served by unlogged tables plus
hourly backups. Now you're going to tell him that that doesn't work
because pg_dump arbitrarily excludes the data in unlogged tables?

regards, tom lane

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#7)
Re: unlogged tables

On 11/16/2010 02:06 PM, Robert Haas wrote:

On Tue, Nov 16, 2010 at 1:58 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

I think if you do a regular backup of the complete database, unlogged
tables should come out empty, but if you specifically request a dump of
it, it shouldn't.

Oh, wow. That seems confusing.

Yeah. And unnecessary. If you want it excluded we already have a switch
for that.

cheers

andrew

#10Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: unlogged tables

On Tue, Nov 16, 2010 at 3:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Nov 16, 2010 at 1:58 PM, Alvaro Herrera

I think if you do a regular backup of the complete database, unlogged
tables should come out empty, but if you specifically request a dump of
it, it shouldn't.

Oh, wow.  That seems confusing.

I don't like it either.

I think allowing pg_dump to dump the data in an unlogged table is not
only reasonable, but essential.  Imagine that someone determines that
his reliability needs will be adequately served by unlogged tables plus
hourly backups.  Now you're going to tell him that that doesn't work
because pg_dump arbitrarily excludes the data in unlogged tables?

Yeah, you'd have to allow a flag to control the behavior. And in that
case I'd rather the flag have a single default rather than different
defaults depending on whether or not individual tables were selected.
Something like --omit-unlogged-data.

Incidentally, unlogged tables plus hourly backups is not dissimilar to
what some NoSQL products are offering for reliability. Except with
PG, you can (or soon will be able to, hopefully) selectively apply
that lowered degree of reliability to a subset of your data for which
you determine it's appropriate, while maintaining full reliability
guarantees for other data. I am not aware of any other product which
offers that level of fine-grained control over durability.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#10)
Re: unlogged tables

Yeah, you'd have to allow a flag to control the behavior. And in that
case I'd rather the flag have a single default rather than different
defaults depending on whether or not individual tables were selected.
Something like --omit-unlogged-data.

Are you sure we don't want to default the other way? It seems to me
that most people using unlogged tables won't want to back them up ...
especially since the share lock for pgdump will add overhead for the
kinds of high-volume updates people want to do with unlogged tables.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#12David Fetter
david@fetter.org
In reply to: Josh Berkus (#11)
Re: unlogged tables

On Tue, Nov 16, 2010 at 02:00:33PM -0800, Josh Berkus wrote:

Yeah, you'd have to allow a flag to control the behavior. And in
that case I'd rather the flag have a single default rather than
different defaults depending on whether or not individual tables
were selected. Something like --omit-unlogged-data.

Are you sure we don't want to default the other way?

+1 for defaulting the other way.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#13Peter Eisentraut
peter_e@gmx.net
In reply to: Josh Berkus (#11)
Re: unlogged tables

On Tue, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:

It seems to me
that most people using unlogged tables won't want to back them up ...
especially since the share lock for pgdump will add overhead for the
kinds of high-volume updates people want to do with unlogged tables.

Or perhaps most people will want them backed up, because them being
unlogged the backup is the only way to get them back in case of a crash?

#14Josh Berkus
josh@agliodbs.com
In reply to: Peter Eisentraut (#13)
Re: unlogged tables

On 11/16/10 2:08 PM, Peter Eisentraut wrote:

On Tue, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:

It seems to me
that most people using unlogged tables won't want to back them up ...
especially since the share lock for pgdump will add overhead for the
kinds of high-volume updates people want to do with unlogged tables.

Or perhaps most people will want them backed up, because them being
unlogged the backup is the only way to get them back in case of a crash?

Yeah, hard to tell, really. Which default is less likely to become a
foot-gun?

Maybe it's time for a survey on -general.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#15Joshua D. Drake
jd@commandprompt.com
In reply to: Josh Berkus (#11)
Re: unlogged tables

On Tue, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:

Yeah, you'd have to allow a flag to control the behavior. And in that
case I'd rather the flag have a single default rather than different
defaults depending on whether or not individual tables were selected.
Something like --omit-unlogged-data.

Are you sure we don't want to default the other way? It seems to me
that most people using unlogged tables won't want to back them up ...

+1

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#14)
Re: unlogged tables

On 11/16/2010 05:12 PM, Josh Berkus wrote:

On 11/16/10 2:08 PM, Peter Eisentraut wrote:

On Tue, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:

It seems to me
that most people using unlogged tables won't want to back them up ...
especially since the share lock for pgdump will add overhead for the
kinds of high-volume updates people want to do with unlogged tables.

Or perhaps most people will want them backed up, because them being
unlogged the backup is the only way to get them back in case of a crash?

Yeah, hard to tell, really. Which default is less likely to become a
foot-gun?

Maybe it's time for a survey on -general.

I would argue pretty strongly that backing something up is much less
likely to be a foot-gun than not backing it up, and treating unlogged
tables the same as logged tables for this purpose is also much less
likely to be a foot-gun. As I pointed out upthread, we already have a
mechanism for not backing up selected objects. I'd much rather have a
rule that says "everything gets backed up by default" than one that says
"everything gets backed up by default except unlogged tables".

cheers

andrew

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#10)
Re: unlogged tables

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Nov 16, 2010 at 3:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think allowing pg_dump to dump the data in an unlogged table is not
only reasonable, but essential.

Yeah, you'd have to allow a flag to control the behavior. And in that
case I'd rather the flag have a single default rather than different
defaults depending on whether or not individual tables were selected.
Something like --omit-unlogged-data.

As long as the default is to include the data, I wouldn't object to
having such a flag. A default that drops data seems way too
foot-gun-like.

regards, tom lane

#18Joshua D. Drake
jd@commandprompt.com
In reply to: Peter Eisentraut (#13)
Re: unlogged tables

On Wed, 2010-11-17 at 00:08 +0200, Peter Eisentraut wrote:

On Tue, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:

It seems to me
that most people using unlogged tables won't want to back them up ...
especially since the share lock for pgdump will add overhead for the
kinds of high-volume updates people want to do with unlogged tables.

Or perhaps most people will want them backed up, because them being
unlogged the backup is the only way to get them back in case of a crash?

To me, the use of unlogged tables is going to be for dynamic, volatile
data that can be rebuilt from an integrity set on a crash. Session
tables, metadata tables, dynamic updates that are batched to logged
tables every 10 minutes, that type of thing.

I think Berkus's idea of asking -general is a good one.

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

#19Andres Freund
andres@anarazel.de
In reply to: Josh Berkus (#14)
Re: unlogged tables

On Tuesday 16 November 2010 23:12:10 Josh Berkus wrote:

On 11/16/10 2:08 PM, Peter Eisentraut wrote:

On Tue, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:

It seems to me
that most people using unlogged tables won't want to back them up ...
especially since the share lock for pgdump will add overhead for the
kinds of high-volume updates people want to do with unlogged tables.

Or perhaps most people will want them backed up, because them being
unlogged the backup is the only way to get them back in case of a crash?

Yeah, hard to tell, really. Which default is less likely to become a
foot-gun?

Well. Maybe both possibilities are about equally probable (which I think is
unlikely), but the difference in impact is pretty clear.

One way, your backup runs too long and too much data changes; the other way
round, you lose data which you assumed was safely backed up.

Isn't that a *really* easy decision?

Andres

#20Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Andres Freund (#19)
Re: unlogged tables

Andres Freund <andres@anarazel.de> wrote:

One way your backup runs too long and too much data changes, the
other way round you loose the data which you assumed safely
backuped.

Isn't that a *really* easy decision?

Yeah. Count me in the camp which wants the default behavior to be
that pg_dump backs up all permanent tables, even those which aren't
WAL-logged (and therefore aren't kept up in PITR backups, hot/warm
standbys, or streaming replication).

-Kevin

#21Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#19)
Re: unlogged tables

On Tuesday 16 November 2010 23:30:29 Andres Freund wrote:

On Tuesday 16 November 2010 23:12:10 Josh Berkus wrote:

On 11/16/10 2:08 PM, Peter Eisentraut wrote:

On Tue, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:

It seems to me
that most people using unlogged tables won't want to back them up ...
especially since the share lock for pgdump will add overhead for the
kinds of high-volume updates people want to do with unlogged tables.

Or perhaps most people will want them backed up, because them being
unlogged the backup is the only way to get them back in case of a
crash?

Yeah, hard to tell, really. Which default is less likely to become a
foot-gun?

Well. Maybe both possibilities are about equally probable (which I think is
unlikely), but the difference in impact is pretty clear.

One way, your backup runs too long and too much data changes; the other way
round, you lose data which you assumed was safely backed up.

Isn't that a *really* easy decision?

Oh, and another argument:
Which are you more likely to discover: a backup that consistently runs for a
short time, or a backup that's getting slower and larger...

Andres

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#11)
Re: unlogged tables

Josh Berkus <josh@agliodbs.com> writes:

Yeah, you'd have to allow a flag to control the behavior. And in that
case I'd rather the flag have a single default rather than different
defaults depending on whether or not individual tables were selected.
Something like --omit-unlogged-data.

Are you sure we don't want to default the other way? It seems to me
that most people using unlogged tables won't want to back them up ...

That's a very debatable assumption. You got any evidence for it?
Personally, I don't think pg_dump should ever default to omitting
data.

especially since the share lock for pgdump will add overhead for the
kinds of high-volume updates people want to do with unlogged tables.

Say what? pg_dump just takes AccessShareLock. That doesn't add any
overhead.

regards, tom lane

#23Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#22)
Re: unlogged tables

That's a very debatable assumption. You got any evidence for it?
Personally, I don't think pg_dump should ever default to omitting
data.

Survey launched, although it may become a moot point, given how this
discussion is going.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#24Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#19)
Re: unlogged tables

On Tue, Nov 16, 2010 at 5:30 PM, Andres Freund <andres@anarazel.de> wrote:

On Tuesday 16 November 2010 23:12:10 Josh Berkus wrote:

On 11/16/10 2:08 PM, Peter Eisentraut wrote:

On Tue, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:

It seems to me
that most people using unlogged tables won't want to back them up ...
especially since the share lock for pgdump will add overhead for the
kinds of high-volume updates people want to do with unlogged tables.

Or perhaps most people will want them backed up, because them being
unlogged the backup is the only way to get them back in case of a crash?

Yeah, hard to tell, really.   Which default is less likely to become a
foot-gun?

Well. Maybe both possibilities are about equally probable (which I think is
unlikely), but the difference in impact is pretty clear.

One way, your backup runs too long and too much data changes; the other way
round, you lose data which you assumed was safely backed up.

Isn't that a *really* easy decision?

Yeah, it seems pretty clear to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#25marcin mank
marcin.mank@gmail.com
In reply to: Robert Haas (#2)
Re: unlogged tables

Can (should ?) unlogged tables' contents survive graceful (non-crash) shutdown?

Greetings
Marcin Mańk

#26Robert Haas
robertmhaas@gmail.com
In reply to: marcin mank (#25)
Re: unlogged tables

On Tue, Nov 16, 2010 at 5:57 PM, marcin mank <marcin.mank@gmail.com> wrote:

Can (should ?) unlogged tables' contents survive graceful (non-crash) shutdown?

I don't think so. To make that work, you'd need to keep track of
every backing file that might contain pages not fsync()'d to disk, and
at shutdown time you'd need to fsync() them all before shutting down.
Doing that would require an awful lot of bookkeeping for a pretty
marginal gain. Maybe it would be useful to have:

ALTER TABLE .. READ [ONLY|WRITE];

...and preserve unlogged tables that are also read-only. Or perhaps
something specific to unlogged tables:

ALTER TABLE .. QUIESCE;

...which would take an AccessExclusiveLock, make the table read-only,
fsync() it, and tag it for restart-survival.
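
A hypothetical session, with a made-up table name:

ALTER TABLE session_cache QUIESCE;   -- lock, make read-only, fsync, mark restart-safe
-- ... clean shutdown and restart ...
SELECT count(*) FROM session_cache;  -- contents would still be there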

But I'm happy to leave all of this until we gain some field experience
with this feature, and have a better idea what features people would
most like to see.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#26)
Re: unlogged tables

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Nov 16, 2010 at 5:57 PM, marcin mank <marcin.mank@gmail.com> wrote:

Can (should ?) unlogged tables' contents survive graceful (non-crash) shutdown?

I don't think so. To make that work, you'd need to keep track of
every backing file that might contain pages not fsync()'d to disk, and
at shutdown time you'd need to fsync() them all before shutting down.

This is presuming that we want to guarantee the same level of safety for
unlogged tables as for regular. Which, it seems to me, is exactly what
people *aren't* asking for. Why not just write the data and shut down?
If you're unlucky enough to have a system crash immediately after that,
well, you might have corrupt data in the unlogged tables ... but that
doesn't seem real probable.

regards, tom lane

#28Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#26)
Re: unlogged tables

On 11/16/10 4:40 PM, Robert Haas wrote:

But I'm happy to leave all of this until we gain some field experience
with this feature, and have a better idea what features people would
most like to see.

+1. Let's not complicate this.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#29Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#27)
Re: unlogged tables

On Tue, Nov 16, 2010 at 7:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Nov 16, 2010 at 5:57 PM, marcin mank <marcin.mank@gmail.com> wrote:

Can (should ?) unlogged tables' contents survive graceful (non-crash) shutdown?

I don't think so.  To make that work, you'd need to keep track of
every backing file that might contain pages not fsync()'d to disk, and
at shutdown time you'd need to fsync() them all before shutting down.

This is presuming that we want to guarantee the same level of safety for
unlogged tables as for regular.  Which, it seems to me, is exactly what
people *aren't* asking for.  Why not just write the data and shut down?
If you're unlucky enough to have a system crash immediately after that,
well, you might have corrupt data in the unlogged tables ... but that
doesn't seem real probable.

I have a hard time getting excited about a system that is designed to
ensure that we probably don't have data corruption. The whole point
of this feature is to relax the usual data integrity guarantees in a
controlled way. A small but uncertain risk of corruption is not an
improvement over a simple, predictable behavior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#30David Fetter
david@fetter.org
In reply to: David Fetter (#12)
Re: unlogged tables

On Tue, Nov 16, 2010 at 02:07:35PM -0800, David Fetter wrote:

On Tue, Nov 16, 2010 at 02:00:33PM -0800, Josh Berkus wrote:

Yeah, you'd have to allow a flag to control the behavior. And in
that case I'd rather the flag have a single default rather than
different defaults depending on whether or not individual tables
were selected. Something like --omit-unlogged-data.

Are you sure we don't want to default the other way?

+1 for defaulting the other way.

Upon further reflection, I'm switching to the "default to backing up
unlogged tables" side.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#31Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#29)
Re: unlogged tables

On 17.11.2010 03:56, Robert Haas wrote:

On Tue, Nov 16, 2010 at 7:46 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas<robertmhaas@gmail.com> writes:

On Tue, Nov 16, 2010 at 5:57 PM, marcin mank<marcin.mank@gmail.com> wrote:

Can (should ?) unlogged tables' contents survive graceful (non-crash) shutdown?

I don't think so. To make that work, you'd need to keep track of
every backing file that might contain pages not fsync()'d to disk, and
at shutdown time you'd need to fsync() them all before shutting down.

This is presuming that we want to guarantee the same level of safety for
unlogged tables as for regular. Which, it seems to me, is exactly what
people *aren't* asking for. Why not just write the data and shut down?
If you're unlucky enough to have a system crash immediately after that,
well, you might have corrupt data in the unlogged tables ... but that
doesn't seem real probable.

I have a hard time getting excited about a system that is designed to
ensure that we probably don't have data corruption. The whole point
of this feature is to relax the usual data integrity guarantees in a
controlled way. A small but uncertain risk of corruption is not an
improvement over a simple, predictable behavior.

I agree with Robert: the point of unlogged tables is that the system
knows to zap them away if there's any risk of having corruption in them.
A corrupt page can lead to all kinds of errors. We try to handle
corruption gracefully, but if you're unlucky I wouldn't be surprised
if a torn page even caused a segfault.

fsync()ing the file at shutdown doesn't seem too bad to me from a
performance point of view; we tolerate that for all other tables. And
you can always truncate the table yourself before shutdown.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#31)
Re: unlogged tables

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

fsync()ing the file at shutdown doesn't seem too bad to me from a
performance point of view; we tolerate that for all other tables.  And
you can always truncate the table yourself before shutdown.

The objection to that was not about performance. It was about how
to find out what needs to be fsync'd.

regards, tom lane

#33Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#32)
Re: unlogged tables

On Wed, Nov 17, 2010 at 3:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

fsync()ing the file at shutdown doesn't seem too bad to me from a
performance point of view; we tolerate that for all other tables. And
you can always truncate the table yourself before shutdown.

The objection to that was not about performance.  It was about how
to find out what needs to be fsync'd.

Just a crazy brainstorming thought, but....

If this is a clean shutdown then all the non-unlogged tables have been
checkpointed so they should have no dirty pages in them anyways. So we
could just fsync everything.

--
greg

#34Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Greg Stark (#33)
Re: unlogged tables

Greg Stark <gsstark@mit.edu> wrote:

If this is a clean shutdown then all the non-unlogged tables have
been checkpointed so they should have no dirty pages in them
anyways. So we could just fsync everything.

Or just all the unlogged tables.

-Kevin

#35Robert Haas
robertmhaas@gmail.com
In reply to: Greg Stark (#33)
Re: unlogged tables

On Wed, Nov 17, 2010 at 11:00 AM, Greg Stark <gsstark@mit.edu> wrote:

On Wed, Nov 17, 2010 at 3:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

fsync()ing the file at shutdown doesn't seem too bad to me from a
performance point of view; we tolerate that for all other tables. And
you can always truncate the table yourself before shutdown.

The objection to that was not about performance.  It was about how
to find out what needs to be fsync'd.

Just a crazy brainstorming thought, but....

If this is a clean shutdown then all the non-unlogged tables have been
checkpointed so they should have no dirty pages in them anyways. So we
could just fsync everything.

Hmm, that reminds me: checkpoints should really skip writing buffers
belonging to unlogged relations altogether; and any fsync against an
unlogged relation should be skipped. I need to go take a look at
what's required to make that happen, either as part of this patch or
as a follow-on commit.

It might be interesting to have a kind of semi-unlogged table where we
write a special xlog record for the first access after each checkpoint
but otherwise don't xlog. On redo, we truncate the tables mentioned,
but not any others, since they're presumably OK. But that's not what
I'm trying to design here.  I'm trying to optimize it for the case where
you DON'T care about durability and you just want it to be as fast as
possible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#36Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#35)
Re: unlogged tables

Robert, All:

I hope you're following the thread on -general about this feature.
We're getting a lot of feedback.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#37Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#32)
Re: unlogged tables

On 17.11.2010 17:11, Tom Lane wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:

fsync()ing the file at shutdown doesn't seem too bad to me from a
performance point of view; we tolerate that for all other tables. And
you can always truncate the table yourself before shutdown.

The objection to that was not about performance. It was about how
to find out what needs to be fsync'd.

I must be missing something: we handle that just fine with normal
tables, why is it a problem for unlogged tables?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#37)
Re: unlogged tables

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

On 17.11.2010 17:11, Tom Lane wrote:

The objection to that was not about performance. It was about how
to find out what needs to be fsync'd.

I must be missing something: we handle that just fine with normal
tables, why is it a problem for unlogged tables?

Hmm ... that's a good point. If we simply treat unlogged tables the
same as regular for checkpointing purposes, don't we end up having
flushed them all correctly during a shutdown checkpoint? I was thinking
that WAL-logging had some influence on that logic, but it doesn't.

Robert is probably going to object that he wanted to prevent any
fsyncing for unlogged tables, but the discussion over in pgsql-general
is crystal clear that people do NOT want to lose unlogged data over
a clean shutdown and restart. If all it takes to do that is to refrain
from lobotomizing the checkpoint logic for unlogged tables, I say we
should refrain.

regards, tom lane

#39Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#36)
Re: unlogged tables

On Wed, Nov 17, 2010 at 1:11 PM, Josh Berkus <josh@agliodbs.com> wrote:

Robert, All:

I hope you're following the thread on -general about this feature.
We're getting a lot of feedback.

I haven't been; I'm not subscribed to general; it'd be useful to CC me
next time.

Reading through the thread in the archives, it seems like people are
mostly confused. Some are confused about the current behavior of the
patch (no, it really does always truncate your tables, I swear);
others are confused about how WAL logging works (of course a backend
crash doesn't truncate an ordinary table - that's because it's WAL
LOGGED); and still others are maybe not exactly confused but hoping
that unlogged table = MyISAM (try not to corrupt your data, but don't
get too bent out of shape about the possibility that it may get
corrupted anyway).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#40Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#38)
Re: unlogged tables

On Wed, Nov 17, 2010 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

On 17.11.2010 17:11, Tom Lane wrote:

The objection to that was not about performance.  It was about how
to find out what needs to be fsync'd.

I must be missing something: we handle that just fine with normal
tables, why is it a problem for unlogged tables?

Hmm ... that's a good point.  If we simply treat unlogged tables the
same as regular for checkpointing purposes, don't we end up having
flushed them all correctly during a shutdown checkpoint?  I was thinking
that WAL-logging had some influence on that logic, but it doesn't.

Robert is probably going to object that he wanted to prevent any
fsyncing for unlogged tables, but the discussion over in pgsql-general
is crystal clear that people do NOT want to lose unlogged data over
a clean shutdown and restart.  If all it takes to do that is to refrain
from lobotomizing the checkpoint logic for unlogged tables, I say we
should refrain.

I think that's absolutely a bad idea. I seriously do not want to have
a conversation with someone about why their unlogged tables are
exacerbating their checkpoint I/O spikes. I'd be happy to have two
modes, though, in which case we should probably revisit the syntax.  One,
it seems that CREATE UNLOGGED TABLE is not as clear as I thought it
was. Two, when (not if) we add more durability levels, we don't want
to create keywords for all of them.
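
For instance, rather than stacking up keywords, one could imagine a
storage-parameter-style spelling (purely hypothetical, not in the patch):

CREATE TABLE hits (url text) WITH (durability = 'unlogged');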

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#40)
Re: unlogged tables

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Nov 17, 2010 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert is probably going to object that he wanted to prevent any
fsyncing for unlogged tables, but the discussion over in pgsql-general
is crystal clear that people do NOT want to lose unlogged data over
a clean shutdown and restart.  If all it takes to do that is to refrain
from lobotomizing the checkpoint logic for unlogged tables, I say we
should refrain.

I think that's absolutely a bad idea.

The customer is always right, and I think we are hearing loud and clear
what the customers want. Please let's not go out of our way to create
a feature that isn't what they want.

regards, tom lane

#42Kenneth Marshall
ktm@rice.edu
In reply to: Tom Lane (#41)
Re: unlogged tables

On Wed, Nov 17, 2010 at 02:16:06PM -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Nov 17, 2010 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert is probably going to object that he wanted to prevent any
fsyncing for unlogged tables, but the discussion over in pgsql-general
is crystal clear that people do NOT want to lose unlogged data over
a clean shutdown and restart.  If all it takes to do that is to refrain
from lobotomizing the checkpoint logic for unlogged tables, I say we
should refrain.

I think that's absolutely a bad idea.

The customer is always right, and I think we are hearing loud and clear
what the customers want. Please let's not go out of our way to create
a feature that isn't what they want.

regards, tom lane

I would be fine with only having a safe shutdown with unlogged tables
and skipping the checkpoint I/O at all other times.

Cheers,
Ken

#43Andrew Dunstan
andrew@dunslane.net
In reply to: Kenneth Marshall (#42)
Re: unlogged tables

On 11/17/2010 02:22 PM, Kenneth Marshall wrote:

On Wed, Nov 17, 2010 at 02:16:06PM -0500, Tom Lane wrote:

Robert Haas<robertmhaas@gmail.com> writes:

On Wed, Nov 17, 2010 at 1:46 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert is probably going to object that he wanted to prevent any
fsyncing for unlogged tables, but the discussion over in pgsql-general
is crystal clear that people do NOT want to lose unlogged data over
a clean shutdown and restart.  If all it takes to do that is to refrain
from lobotomizing the checkpoint logic for unlogged tables, I say we
should refrain.

I think that's absolutely a bad idea.

The customer is always right, and I think we are hearing loud and clear
what the customers want. Please let's not go out of our way to create
a feature that isn't what they want.

I would be fine with only having a safe shutdown with unlogged tables
and skipping the checkpoint I/O at all other times.

Yeah, I was just thinking something like that would be good, and should
overcome Robert's objection to the whole idea.

I also agree with Tom's sentiment above.

To answer another point I see Tom made on the -general list: while
individual backends may crash from time to time, crashes of the whole
Postgres server are very rare in my experience in production
environments. It's really pretty robust, unless you're doing crazy
stuff. So that makes it all the more important that we can restart a
server cleanly (say, to change a config setting) without losing the
unlogged tables. If we don't allow that we'll make a laughing stock of
ourselves. Honestly.

cheers

andrew

#44Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#39)
Re: unlogged tables

Excerpts from Robert Haas's message of Wed Nov 17 15:48:56 -0300 2010:

On Wed, Nov 17, 2010 at 1:11 PM, Josh Berkus <josh@agliodbs.com> wrote:

Robert, All:

I hope you're following the thread on -general about this feature.
We're getting a lot of feedback.

I haven't been; I'm not subscribed to general; it'd be useful to CC me
next time.

FWIW I've found that being subscribed to the lists is good even if I
have my mail client configured to hide those emails by default. It makes
it a lot easier to search for stuff that someone else references.

(I made the mistake of having it hide all pg-general email even when I
was CC'ed, which is the trivial way to implement such a filter. I don't
recommend repeating that mistake.)

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#45Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#43)
Re: unlogged tables

Andrew Dunstan <andrew@dunslane.net> writes:

On 11/17/2010 02:22 PM, Kenneth Marshall wrote:

I would be fine with only having a safe shutdown with unlogged tables
and skipping the checkpoint I/O at all other times.

Yeah, I was just thinking something like that would be good, and should
overcome Robert's objection to the whole idea.

I don't think you can fsync only in the shutdown checkpoint and assume
your data is safe, if you didn't fsync a write a few moments earlier.

Now, a few minutes ago Robert was muttering about supporting more than
one kind of degraded-reliability table. I could see inventing
"unlogged" tables, which means exactly that (no xlog support, but we
still checkpoint/fsync as usual), and "unsynced" tables which
also/instead suppress fsync activity. The former type could be assumed
to survive a clean shutdown/restart, while the latter wouldn't. This
would let people pick their poison.
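
Purely illustrative syntax for the two levels:

CREATE UNLOGGED TABLE t1 (x int);  -- no WAL, still checkpointed: survives a clean restart
CREATE UNSYNCED TABLE t2 (x int);  -- no WAL, no fsync: zapped at any restart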

regards, tom lane

#46Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#45)
Re: unlogged tables

Now, a few minutes ago Robert was muttering about supporting more than
one kind of degraded-reliability table. I could see inventing
"unlogged" tables, which means exactly that (no xlog support, but we
still checkpoint/fsync as usual), and "unsynced" tables which
also/instead suppress fsync activity. The former type could be assumed
to survive a clean shutdown/restart, while the latter wouldn't. This
would let people pick their poison.

We're assuming here that the checkpoint activity for the unlogged table
causes significant load on a production system. Maybe we should do some
testing before we try to make this overly complex? I wouldn't be
surprised to find that on most filesystems the extra checkpointing of
the unlogged tables adds only minor overhead.

Shouldn't be hard to build out pgbench into something which will test
this ... if only I had a suitable test machine available.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#43)
Re: unlogged tables

[ forgot to comment on this part ]

Andrew Dunstan <andrew@dunslane.net> writes:

To answer another point I see Tom made on the -general list: while
individual backends may crash from time to time, crashes of the whole
Postgres server are very rare in my experience in production
environments.

Well, if you mean the postmaster darn near never goes down, that's true,
because we go out of our way to ensure it does as little as possible.
But that has got zip to do with this discussion, because a backend crash
has to be assumed to have corrupted unlogged tables. There are some
folk over in -general who are wishfully thinking that only a postmaster
crash would lose their unlogged data, but that's simply wrong. Backend
crashes *will* truncate those tables; there is no way around that. The
comment I made was that my experience as to how often backends crash
might not square with production experience --- but you do have to draw
the distinction between a backend crash and a postmaster crash.

regards, tom lane

#48Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#47)
Re: unlogged tables

On 11/17/2010 02:44 PM, Tom Lane wrote:

[ forgot to comment on this part ]

Andrew Dunstan<andrew@dunslane.net> writes:

To answer another point I see Tom made on the -general list: while
individual backends may crash from time to time, crashes of the whole
Postgres server are very rare in my experience in production
environments.

Well, if you mean the postmaster darn near never goes down, that's true,
because we go out of our way to ensure it does as little as possible.
But that has got zip to do with this discussion, because a backend crash
has to be assumed to have corrupted unlogged tables. There are some
folk over in -general who are wishfully thinking that only a postmaster
crash would lose their unlogged data, but that's simply wrong. Backend
crashes *will* truncate those tables; there is no way around that. The
comment I made was that my experience as to how often backends crash
might not square with production experience --- but you do have to draw
the distinction between a backend crash and a postmaster crash.

OK. I'd missed that. Thanks for clarifying.

cheers

andrew

#49Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#43)
Re: unlogged tables

On Wed, Nov 17, 2010 at 2:31 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

The customer is always right, and I think we are hearing loud and clear
what the customers want.  Please let's not go out of our way to create
a feature that isn't what they want.

I would be fine with only having a safe shutdown with unlogged tables
and skip the checkpoint I/O all other times.

Yeah, I was just thinking something like that would be good, and should
overcome Robert's objection to the whole idea.

Could we slow down a bit and talk through the ideas here in a
logical fashion?

The customer is always right, but the informed customer makes better
decisions than the uninformed customer. This idea, as proposed, does
not work. If you only include dirty buffers at the final checkpoint
before shutting down, you have no guarantee that any buffers that you
either didn't write or didn't fsync previously are actually on disk.
Therefore, you have no guarantee that the table data is not corrupted.
So you really have to decide between including the unlogged-table
buffers in EVERY checkpoint and not ever including them at all. Which
one is right depends on your use case.

For example, consider the poster who said that, when this feature is
available, they plan to try ripping out their memcached instance and
replacing it with PostgreSQL running unlogged tables. Suppose this
poster (or someone else in a similar situation) has a 64 GB machine and is
currently running a 60 GB memcached instance on it, which is not an
unrealistic scenario for memcached. Suppose further that he dirties
25% of that data each hour. memcached is currently doing no writes to
disk. When he switches to PostgreSQL and sets checkpoint_segments to
a gazillion and checkpoint_timeout to the maximum, he's going to start
writing 15 GB of data to disk every hour - data which he clearly
doesn't care about losing, or preserving across restarts, because he's
currently storing it in memcached. In fact, with memcached, he'll not
only lose data at shutdown - he'll lose data on a regular basis when
everything is running normally. We can try to convince ourselves that
someone in this situation will not care about needing to get 15GB of
disposable data per hour from memory to disk in order to have a
feature that he doesn't need, but I think it's going to be pretty hard
to make that credible.

Now, second use case. Consider someone who is currently running
PostgreSQL in a non-durable configuration, with fsync=off,
full_page_writes=off, and synchronous_commit=off. This person - who
is based on someone I spoke with at PG West - is doing a large amount
of data processing using PostGIS. Their typical workflow is to load a
bunch of data, run a simulation, and then throw away the entire
database. They don't want to pay the cost of durability because if
they crash in mid-simulation they will simply rerun it. Being fast is
more important. Whether or not this person will be happy with the
proposed behavior is a bit harder to say. If it kills performance,
they will definitely hate it. But if the performance penalty is only
modest, they may enjoy the convenience of being able to shut down the
database and start it up again later without losing data.

Third use case. Someone on pgsql-general mentioned that they want to
write logs to PG, and can abide losing them if a crash happens, but
not on a clean shutdown and restart. This person clearly shuts down
their production database a lot more often than I do, but that is OK.
By explicit stipulation, they want the survive-a-clean-shutdown
behavior. I have no problem supporting that use case, providing they
are willing to take the associated performance penalty at checkpoint
time, which we don't know because we haven't asked, but I'm fine with
assuming it's useful even though I probably wouldn't use it much
myself.

I also agree with Tom's sentiment above.

To answer another point I see Tom made on the -general list: while
individual backends may crash from time to time, crashes of the whole
Postgres server are very rare in my experience in production environments.
It's really pretty robust, unless you're doing crazy stuff. So that makes it
all the more important that we can restart a server cleanly (say, to change
a config setting) without losing the unlogged tables. If we don't allow that
we'll make a laughing stock of ourselves. Honestly.

Let's please not assume that there is only one reasonable option here,
or that I have not thought about some of these issues.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#50Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#46)
Re: unlogged tables

On Wed, Nov 17, 2010 at 2:42 PM, Josh Berkus <josh@agliodbs.com> wrote:

Now, a few minutes ago Robert was muttering about supporting more than
one kind of degraded-reliability table.  I could see inventing
"unlogged" tables, which means exactly that (no xlog support, but we
still checkpoint/fsync as usual), and "unsynced" tables which
also/instead suppress fsync activity.  The former type could be assumed
to survive a clean shutdown/restart, while the latter wouldn't.  This
would let people pick their poison.

We're assuming here that the checkpoint activity for unlogged tables
causes significant load on a production system.  Maybe we should do some
testing before we try to make this overly complex?  I wouldn't be
surprised to find that on most filesystems the extra checkpointing of
the unlogged tables adds only minor overhead.

Shouldn't be hard to build out pgbench into something which will test
this ... if only I had a suitable test machine available.

I guess the point I'd make here is that checkpoint I/O will be a
problem for unlogged tables in exactly the same situations in which it
is a problem for regular tables. There is some amount of I/O that
your system can handle before the additional I/O caused by checkpoints
starts to become a problem. If unlogged tables (or one particular
variant of unlogged tables) don't need to participate in checkpoints,
then you will be able to use unlogged tables, in situations where they
are appropriate to the workload, to control your I/O load and
hopefully keep it below the level where it causes a problem. Of
course, there will also be workloads where your system has plenty of
spare capacity (in which case it won't matter) or where your system is
going to be overwhelmed anyway (in which case it doesn't really matter
either). But if you are somewhere between those two extremes, this
has to matter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#51Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#49)
Re: unlogged tables

On Wednesday 17 November 2010 20:54:14 Robert Haas wrote:

On Wed, Nov 17, 2010 at 2:31 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

The customer is always right, and I think we are hearing loud and clear
what the customers want. Please let's not go out of our way to create
a feature that isn't what they want.

I would be fine with only having a safe shutdown with unlogged tables
and skip the checkpoint I/O all other times.

Yeah, I was just thinking something like that would be good, and should
overcome Robert's objection to the whole idea.

Could we slow down here a bit and talk through the ideas here in a
logical fashion?

The customer is always right, but the informed customer makes better
decisions than the uninformed customer. This idea, as proposed, does
not work. If you only include dirty buffers at the final checkpoint
before shutting down, you have no guarantee that any buffers that you
either didn't write or didn't fsync previously are actually on disk.
Therefore, you have no guarantee that the table data is not corrupted.
So you really have to decide between including the unlogged-table
buffers in EVERY checkpoint and not ever including them at all. Which
one is right depends on your use case.

How can you get a buffer which was not written out *at all*? Do you want to
force all such pages to stay in shared_buffers? That sounds quite a bit more
complicated than what you proposed...

For example, consider the poster who said that, when this feature is
available, they plan to try ripping out their memcached instance and
replacing it with PostgreSQL running unlogged tables. Suppose this
poster (or someone else in a similar situation) has a 64 GB machine and is
currently running a 60 GB memcached instance on it, which is not an
unrealistic scenario for memcached. Suppose further that he dirties
25% of that data each hour. memcached is currently doing no writes to
disk. When he switches to PostgreSQL and sets checkpoint_segments to
a gazillion and checkpoint_timeout to the maximum, he's going to start
writing 15 GB of data to disk every hour - data which he clearly
doesn't care about losing, or preserving across restarts, because he's
currently storing it in memcached. In fact, with memcached, he'll not
only lose data at shutdown - he'll lose data on a regular basis when
everything is running normally. We can try to convince ourselves that
someone in this situation will not care about needing to get 15GB of
disposable data per hour from memory to disk in order to have a
feature that he doesn't need, but I think it's going to be pretty hard
to make that credible.

To really support that use case we would first need to make shared_buffers
properly scale to 64GB - which, unfortunately, in my experience is not yet the
case.
Also, see the issues in the previous paragraph - I have severe doubts you can
support such a memcached scenario with pg. Either you spill to disk if your
buffers overflow (fine with me) or you need to throw away data the way
memcached does. I doubt there is a sensible implementation in pg for the latter.

So you will have to write to disk at some point...

Third use case. Someone on pgsql-general mentioned that they want to
write logs to PG, and can abide losing them if a crash happens, but
not on a clean shutdown and restart. This person clearly shuts down
their production database a lot more often than I do, but that is OK.
By explicit stipulation, they want the survive-a-clean-shutdown
behavior. I have no problem supporting that use case, providing they
are willing to take the associated performance penalty at checkpoint
time, which we don't know because we haven't asked, but I'm fine with
assuming it's useful even though I probably wouldn't use it much
myself.

Maybe I am missing something - but why does this imply we have to write data
at checkpoints?
Just fsyncing every file belonging to a persistently-unlogged (or whatever
sensible name anyone can come up with) table is not prohibitively expensive -
in fact, on a local $PGDATA with approx 300GB and loads of tables, doing so
takes less than 15s on a system with a hot inode/dentry cache and no dirty
files (just `find $PGDATA -print0|xargs -0 fsync_many_files`, with
fsync_many_files being a tiny C program that does
posix_fadvise(POSIX_FADV_DONTNEED) on all files and then fsyncs every one).
The assumption of a hot inode cache is realistic, I think.
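
A minimal sketch of such a program, assuming only POSIX (an illustration,
not the exact code used for the measurement above; the -type f is added
here to skip directories):

/*
 * fsync_many_files.c - for each file named on the command line, drop its
 * cached pages with posix_fadvise(POSIX_FADV_DONTNEED), then fsync() it.
 *
 * Build: cc -o fsync_many_files fsync_many_files.c
 * Use:   find $PGDATA -type f -print0 | xargs -0 ./fsync_many_files
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    int     i;

    for (i = 1; i < argc; i++)
    {
        int     fd = open(argv[i], O_RDONLY);

        if (fd < 0)
        {
            perror(argv[i]);
            continue;
        }
        /* drop clean cached pages first, so the fsync cost is not hidden */
        (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        if (fsync(fd) != 0)
            perror(argv[i]);
        close(fd);
    }
    return 0;
}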

Andres

#52Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#45)
Re: unlogged tables

On Wed, Nov 17, 2010 at 2:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 11/17/2010 02:22 PM, Kenneth Marshall wrote:

I would be fine with only having a safe shutdown with unlogged tables
and skip the checkpoint I/O all other times.

Yeah, I was just thinking something like that would be good, and should
overcome Robert's objection to the whole idea.

I don't think you can fsync only in the shutdown checkpoint and assume
your data is safe, if you didn't fsync a write a few moments earlier.

Now, a few minutes ago Robert was muttering about supporting more than
one kind of degraded-reliability table.  I could see inventing
"unlogged" tables, which means exactly that (no xlog support, but we
still checkpoint/fsync as usual), and "unsynced" tables which
also/instead suppress fsync activity.  The former type could be assumed
to survive a clean shutdown/restart, while the latter wouldn't.  This
would let people pick their poison.

OK, so we're proposing a hierarchy like this.

1. PERMANENT (already exists). Permanent tables are WAL-logged,
participate in checkpoints, and are fsync'd. They survive crashes and
clean restarts, and are replicated.

2. UNLOGGED (what this patch currently implements). Unlogged tables
are not WAL-logged, but they do participate in checkpoints and they
are fsync'd on request. They survive clean restarts, but on a crash
they are truncated. They are not replicated.

3. UNSYNCED (future work). Unsynced tables are not WAL-logged, do not
participate in checkpoints, and are never fsync'd. After any sort of
crash or shutdown, clean or otherwise, they are truncated. They are
not replicated.

4. GLOBAL TEMPORARY (future work). Global temporary tables are not
WAL-logged, do not participate in checkpoints, and are never fsync'd.
The contents of each global temporary table are private to that
session, so that they can use the local buffer manager rather than
shared buffers. Multiple sessions can use a global temporary table at
the same time, and each sees separate contents. At session exit, any
contents inserted by the owning backend are lost; since all sessions
exit on crash or shutdown, all contents are also lost at that time.

5. LOCAL TEMPORARY (our current temp tables). Local temporary tables
are not WAL-logged, do not participate in checkpoints, and are never
fsync'd. The table definition and all of its contents are private to
the session, so that they are dropped at session exit (or at
transaction end if ON COMMIT DROP is used). Since all sessions exit
on crash or shutdown, all table definitions and all table contents are
lost at that time.
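
To make that concrete at the catalog level, here is one way the hierarchy
might be encoded - a sketch loosely modeled on the one-character
persistence code the companion relpersistence patch introduces; the codes
for the levels that don't exist yet are invented here:

/* Sketch only: 'p', 'u' and 't' reflect the patches under discussion;
 * 'x' and 'g' are hypothetical. */
#define RELPERSISTENCE_PERMANENT    'p' /* WAL-logged, checkpointed, fsync'd */
#define RELPERSISTENCE_UNLOGGED     'u' /* no WAL; checkpointed; truncated on crash */
#define RELPERSISTENCE_UNSYNCED     'x' /* no WAL, no checkpoint/fsync; truncated on any restart */
#define RELPERSISTENCE_GLOBAL_TEMP  'g' /* shared definition, per-session contents */
#define RELPERSISTENCE_TEMP         't' /* local temp: private definition and contents */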

It's possible to imagine a few more stops on this hierarchy. For
example, you could have an ASYNCHRONOUS table between (1) and (2) that
always acts as if synchronous_commit=off, but is otherwise replicated
and durable over crashes; or a MINIMALLY LOGGED table that is XLOG'd
as if wal_level=minimal even when the actual value of wal_level is
otherwise, and is therefore crash-safe but not replication-safe; or a
level that is similar to unlogged but we XLOG the first event that
dirties a page after each checkpoint, and therefore even on a crash we
need only remove the tables for which such an XLOG record has been
written. All of those are a bit speculative perhaps but we could jam
them in there if there's demand, I suppose.

I don't particularly care for the name UNSYNCED, and I'm starting not
to like UNLOGGED much either, although at least that one is an actual
word. PERMANENT and the flavors of TEMPORARY are reasonably
comprehensible as descriptions of user-visible behavior, but UNLOGGED
and UNSYNCED sound a lot like they're discussing internal details
that the user might not actually understand or care about. I don't
have a better idea right off the top of my head, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#53Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#52)
Re: unlogged tables

On 11/17/2010 03:37 PM, Robert Haas wrote:

I don't particularly care for the name UNSYNCED, and I'm starting not
to like UNLOGGED much either, although at least that one is an actual
word. PERMANENT and the flavors of TEMPORARY are reasonably
comprehensible as descriptions of user-visible behavior, but UNLOGGED
and UNSYNCED sound a lot like they're discussing internal details
that the user might not actually understand or care about. I don't
have a better idea right off the top of my head, though.

Maybe VOLATILE for UNSYNCED? Not sure about UNLOGGED.

cheers

andrew

#54Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#51)
Re: unlogged tables

On Wed, Nov 17, 2010 at 3:35 PM, Andres Freund <andres@anarazel.de> wrote:

The customer is always right, but the informed customer makes better
decisions than the uninformed customer.  This idea, as proposed, does
not work.  If you only include dirty buffers at the final checkpoint
before shutting down, you have no guarantee that any buffers that you
either didn't write or didn't fsync previously are actually on disk.
Therefore, you have no guarantee that the table data is not corrupted.
 So you really have to decide between including the unlogged-table
buffers in EVERY checkpoint and not ever including them at all.  Which
one is right depends on your use case.

How can you get a buffer which was not written out *at all*? Do you want to
force all such pages to stay in shared_buffers? That sounds quite a bit more
complicated than what you proposed...

Oh, you're right. We always have to write buffers before kicking them
out of shared_buffers, but if we don't fsync them we have no guarantee
they're actually on disk.
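
To spell out the underlying POSIX point with a generic sketch (not
PostgreSQL code): write() merely hands the page to the kernel, and nothing
guarantees it has reached stable storage until a later fsync() succeeds.

#include <sys/types.h>
#include <unistd.h>

/*
 * Generic illustration: after pwrite() the data sits in the OS page cache
 * and can still be lost on a power failure; only a successful fsync()
 * makes it durable.  Skipping the fsync() is exactly the gamble an
 * "unsynced" table would take.
 */
void
evict_page(int fd, const char *page, size_t len, off_t offset)
{
    (void) pwrite(fd, page, len, offset);   /* in the kernel's hands... */
    (void) fsync(fd);                       /* ...durable only after this */
}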

For example, consider the poster who said that, when this feature is
available, they plan to try ripping out their memcached instance and
replacing it with PostgreSQL running unlogged tables.  Suppose this
poster (or someone else in a similar situation) has a 64 GB machine and is
currently running a 60 GB memcached instance on it, which is not an
unrealistic scenario for memcached.  Suppose further that he dirties
25% of that data each hour.  memcached is currently doing no writes to
disk.  When he switches to PostgreSQL and sets checkpoint_segments to
a gazillion and checkpoint_timeout to the maximum, he's going to start
writing 15 GB of data to disk every hour - data which he clearly
doesn't care about losing, or preserving across restarts, because he's
currently storing it in memcached.  In fact, with memcached, he'll not
only lose data at shutdown - he'll lose data on a regular basis when
everything is running normally.  We can try to convince ourselves that
someone in this situation will not care about needing to get 15GB of
disposable data per hour from memory to disk in order to have a
feature that he doesn't need, but I think it's going to be pretty hard
to make that credible.

To really support that use case we would first need to make shared_buffers
properly scale to 64GB - which, unfortunately, in my experience is not yet the
case.

Well, that's something to aspire to. :-)

Also, see the issues in the previous paragraph - I have severe doubts you can
support such a memcached scenario with pg. Either you spill to disk if your
buffers overflow (fine with me) or you need to throw away data the way
memcached does. I doubt there is a sensible implementation in pg for the latter.

So you will have to write to disk at some point...

I agree that there are difficulties, but again, doing checkpoint I/O
for data that the user was willing to throw away is going in the wrong
direction.

Third use case.  Someone on pgsql-general mentioned that they want to
write logs to PG, and can abide losing them if a crash happens, but
not on a clean shutdown and restart.  This person clearly shuts down
their production database a lot more often than I do, but that is OK.
By explicit stipulation, they want the survive-a-clean-shutdown
behavior.  I have no problem supporting that use case, providing they
are willing to take the associated performance penalty at checkpoint
time, which we don't know because we haven't asked, but I'm fine with
assuming it's useful even though I probably wouldn't use it much
myself.

Maybe I am missing something - but why does this imply we have to write data
at checkpoints?
Just fsyncing every file belonging to a persistently-unlogged (or whatever
sensible name anyone can come up with) table is not prohibitively expensive -
in fact, on a local $PGDATA with approx 300GB and loads of tables, doing so
takes less than 15s on a system with a hot inode/dentry cache and no dirty
files (just `find $PGDATA -print0|xargs -0 fsync_many_files`, with
fsync_many_files being a tiny C program that does
posix_fadvise(POSIX_FADV_DONTNEED) on all files and then fsyncs every one).
The assumption of a hot inode cache is realistic, I think.

Hmm. I don't really want to try to do it in this patch because it's
complicated enough already, but if people don't mind the shutdown
sequence potentially being slowed down a bit, that might allow us to
have the best of both worlds without needing to invent multiple
durability levels. I was sort of assuming that people wouldn't want
to slow down the shutdown sequence to avoid losing data they've
already declared isn't that valuable, but evidently I underestimated
the demand for kinda-durable tables. If the overhead of doing this
isn't too severe, it might be the way to go.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#55David Fetter
david@fetter.org
In reply to: Andrew Dunstan (#53)
Re: unlogged tables

On Wed, Nov 17, 2010 at 03:48:52PM -0500, Andrew Dunstan wrote:

On 11/17/2010 03:37 PM, Robert Haas wrote:

I don't particularly care for the name UNSYNCED, and I'm starting
not to like UNLOGGED much either, although at least that one is an
actual word. PERMANENT and the flavors of TEMPORARY are
reasonably comprehensible as descriptions of user-visible
behavior, but UNLOGGED and UNSYNCED sound a lot like they're
discussing internal details that the user might not actually
understand or care about. I don't have a better idea right off the
top of my head, though.

Maybe VOLATILE for UNSYNCED? Not sure about UNLOGGED.

+1 for describing the end-user-visible behavior.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#56Steve Crawford
scrawford@pinpointresearch.com
In reply to: Andrew Dunstan (#53)
Re: unlogged tables

On 11/17/2010 12:48 PM, Andrew Dunstan wrote:

Maybe VOLATILE for UNSYNCED? Not sure about UNLOGGED.

UNSAFE and EXTREMELY_UNSAFE?? :)

Cheers,
Steve

#57Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#52)
Re: unlogged tables

Robert Haas <robertmhaas@gmail.com> wrote:

OK, so we're proposing a hierarchy like this.

1. PERMANENT (already exists).

2. UNLOGGED (what this patch currently implements).

3. UNSYNCED (future work).

4. GLOBAL TEMPORARY (future work).

5. LOCAL TEMPORARY (our current temp tables).

All of the above would have real uses in our shop.

It's possible to imagine a few more stops on this hierarchy.

Some of these might be slightly preferred over the above in certain
circumstances, but that's getting down to fine tuning. I think the
five listed above are more important than the "speculative" ones
mentioned.

I don't particularly care for the name UNSYNCED

EVANESCENT?

I'm starting not to like UNLOGGED much either

EPHEMERAL?

Actually, the UNSYNCED and UNLOGGED seem fairly clear....

-Kevin

#58A.M.
agentm@themactionfaction.com
In reply to: Kevin Grittner (#57)
Re: unlogged tables

On Nov 17, 2010, at 4:00 PM, Kevin Grittner wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

OK, so we're proposing a hierarchy like this.

1. PERMANENT (already exists).

2. UNLOGGED (what this patch currently implements).

3. UNSYNCED (future work).

4. GLOBAL TEMPORARY (future work).

5. LOCAL TEMPORARY (our current temp tables).

All of the above would have real uses in our shop.

It's possible to imagine a few more stops on this hierarchy.

Some of these might be slightly preferred over the above in certain
circumstances, but that's getting down to fine tuning. I think the
five listed above are more important than the "speculative" ones
mentioned.

I don't particularly care for the name UNSYNCED

EVANESCENT?

I'm starting not to like UNLOGGED much either

EPHEMERAL?

Actually, the UNSYNCED and UNLOGGED seem fairly clear....

Unless one thinks that the types could be combined - perhaps a table declaration could use both UNLOGGED and UNSYNCED?

Cheers,
M

#59Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#57)
Re: unlogged tables

On Wed, Nov 17, 2010 at 4:00 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

OK, so we're proposing a hierarchy like this.

1. PERMANENT (already exists).

2. UNLOGGED (what this patch currently implements).

3. UNSYNCED (future work).

4. GLOBAL TEMPORARY (future work).

5. LOCAL TEMPORARY (our current temp tables).

All of the above would have real uses in our shop.

It's possible to imagine a few more stops on this hierarchy.

Some of these might be slightly preferred over the above in certain
circumstances, but that's getting down to fine tuning.  I think the
five listed above are more important than the "speculative" ones
mentioned.

I don't particularly care for the name UNSYNCED

EVANESCENT?

I'm starting not to like UNLOGGED much either

EPHEMERAL?

Actually, the UNSYNCED and UNLOGGED seem fairly clear....

I think Andrew's suggestion of VOLATILE is pretty good. It's hard to
come up with multiple words that express gradations of "we might
decide to chuck your data if things go South", though. Then again if
we go with Andres's suggestion maybe we can get by with one level.

Or if we still end up with multiple levels, maybe it's best to use
VOLATILE for everything >1 and <4, and then have a subordinate clause
to specify gradations.

CREATE VOLATILE TABLE blow_me_away (k text, v text) SOME OTHER WORDS
THAT EXPLAIN THE DETAILS GO HERE;

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#60Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#54)
Re: unlogged tables

Excerpts from Robert Haas's message of mié nov 17 17:51:37 -0300 2010:

On Wed, Nov 17, 2010 at 3:35 PM, Andres Freund <andres@anarazel.de> wrote:

How can you get a buffer which was not written out *at all*? Do you want to
force all such pages to stay in shared_buffers? That sounds quite a bit more
complicated than what you proposed...

Oh, you're right. We always have to write buffers before kicking them
out of shared_buffers, but if we don't fsync them we have no guarantee
they're actually on disk.

You could just open all the segments and fsync them.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#61Andrew Dunstan
andrew@dunslane.net
In reply to: Kevin Grittner (#57)
Re: unlogged tables

On 11/17/2010 04:00 PM, Kevin Grittner wrote:

Actually, the UNSYNCED and UNLOGGED seem fairly clear....

I think Robert's right. These names won't convey much to someone not
steeped in our technology.

cheers

andrew

#62David Fetter
david@fetter.org
In reply to: Robert Haas (#59)
Re: unlogged tables

On Wed, Nov 17, 2010 at 04:05:56PM -0500, Robert Haas wrote:

On Wed, Nov 17, 2010 at 4:00 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

OK, so we're proposing a hierarchy like this.

1. PERMANENT (already exists).

2. UNLOGGED (what this patch currently implements).

3. UNSYNCED (future work).

4. GLOBAL TEMPORARY (future work).

5. LOCAL TEMPORARY (our current temp tables).

All of the above would have real uses in our shop.

It's possible to imagine a few more stops on this hierarchy.

Some of these might be slightly preferred over the above in certain
circumstances, but that's getting down to fine tuning. I think the
five listed above are more important than the "speculative" ones
mentioned.

I don't particularly care for the name UNSYNCED

EVANESCENT?

I'm starting not to like UNLOGGED much either

EPHEMERAL?

Actually, the UNSYNCED and UNLOGGED seem fairly clear....

I think Andrew's suggestion of VOLATILE is pretty good. It's hard to
come up with multiple words that express gradations of "we might
decide to chuck your data if things go South", though. Then again if
we go with Andres's suggestion maybe we can get by with one level.

Or if we still end up with multiple levels, maybe it's best to use
VOLATILE for everything >1 and <4, and then have a subordinate clause
to specify gradations.

CREATE VOLATILE TABLE blow_me_away (k text, v text) SOME OTHER WORDS
THAT EXPLAIN THE DETAILS GO HERE;

How about something like:

OPTIONS (SYNC=no, LOG=no, ... )

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#63Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#59)
Re: unlogged tables

Excerpts from Robert Haas's message of mié nov 17 18:05:56 -0300 2010:

CREATE VOLATILE TABLE blow_me_away (k text, v text) SOME OTHER WORDS
THAT EXPLAIN THE DETAILS GO HERE;

What about some reloptions?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#64Steve Crawford
scrawford@pinpointresearch.com
In reply to: Tom Lane (#47)
Re: unlogged tables

On 11/17/2010 11:44 AM, Tom Lane wrote:

...because a backend crash has to be assumed to have corrupted
unlogged tables...

So in a typical use-case, say storing session data on a web-site, one
crashed backend could wreck sessions for some or all of the site? Is
there a mechanism in the proposal that would allow a client to determine
the state of a table (good, truncated, wrecked, etc.)?

Cheers,
Steve

#65Joshua D. Drake
jd@commandprompt.com
In reply to: David Fetter (#62)
Re: unlogged tables

I don't particularly care for the name UNSYNCED

EVANESCENT?

I'm starting not to like UNLOGGED much either

EPHEMERAL?

Actually, the UNSYNCED and UNLOGGED seem fairly clear....

Uhhh yeah. Let's not break out the thesaurus for this.

JD
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

#66Dimitri Fontaine
dfontaine@hi-media.com
In reply to: Robert Haas (#59)
Re: unlogged tables

CREATE VOLATILE TABLE blow_me_away (k text, v text) SOME OTHER WORDS
THAT EXPLAIN THE DETAILS GO HERE;

[ TRUNCATE ON RESTART ]

Your patch implements this option, right?

Regards,

--
dim

#67Robert Haas
robertmhaas@gmail.com
In reply to: Dimitri Fontaine (#66)
Re: unlogged tables

On Thu, Nov 18, 2010 at 3:07 AM, Dimitri Fontaine
<dfontaine@hi-media.com> wrote:

CREATE VOLATILE TABLE blow_me_away (k text, v text) SOME OTHER WORDS
THAT EXPLAIN THE DETAILS GO HERE;

[ TRUNCATE ON RESTART ]

Your patch implements this option, right?

Yeah.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#68Andy Colson
andy@squeakycode.net
In reply to: Robert Haas (#5)
Re: unlogged tables

I have done a bunch of benchmarking. It was not easy to find consistent numbers, so I picked a job and ran the same thing over and over.

I'm running Slackware 13.1 on a desktop computer.

Linux storm 2.6.35.7-smp #1 SMP Sun Oct 10 21:43:07 CDT 2010 i686 AMD Athlon(tm) 7850 Dual-Core Processor AuthenticAMD GNU/Linux

Database on:
/dev/sda2 on /pub type ext4 (rw,noatime)

I started with stock, unpatched, pg 9.1, and ran pg_bench. I used several scales and always set the # of connections at half the scale (so scale 20 used 10 connections). I ran all tests for 180 seconds. autovacuum was always off, and I ran "vacuum -z" between each pg_bench run.

Each block of numbers has these columns: scale, test 1, test 2, test 3, avg.
So the first line below: 6, 96, 105, 102, 101
means:
pg_bench -i -s 6
pg_bench -c 3 -T 180
vacuum -z
pg_bench -c 3 -T 180
vacuum -z
pg_bench -c 3 -T 180

Results (tps) for the three runs: 96, 105, and 102, with an average of 101.

The LOGS test imports 61+ million rows of Apache logs. It's a Perl script that uses COPY over many, many files. Each file is committed separately.

checkpoint_segments = 7
shared_buffers = 512MB
effective_cache_size = 1024MB
autovacuum off

fsync on
synchronous_commit on
full_page_writes on
bgwriter_lru_maxpages 100
180 second tests

scale, test 1, test 2, test 3, avg
6, 96, 105, 102, 101
20, 120, 82, 76, 93
40, 73, 42, 43, 53
80, 50, 29, 35, 38

synchronous_commit off
6, 239, 676, 614, 510
20, 78, 47, 56, 60
40, 59, 35, 41, 45
80, 53, 30, 35, 39

LOGS: ~ 3,900 ins/sec (I didn't record this well, it's sort of a guess)

synchronous_commit off
full_page_writes off
6, 1273, 1344, 1287, 1301
20, 1323, 1307, 1313, 1314
40, 1051, 872, 702, 875
80, 551, 206, 245, 334

LOGS (got impatient and killed it)
Total rows: 20,719,095
Total Seconds: 5,279.74
Total ins/sec: 3,924.25

fsync off
synchronous_commit off
full_page_writes off
bgwriter_lru_maxpages 0
6, 3622, 2940, 2879, 3147
20, 2860, 2952, 2939, 2917
40, 2204, 2143, 2349, 2232
80, 1394, 1043, 1085, 1174

LOG (this is a full import)
Total rows: 61,467,489
Total Seconds: 1,240.93
Total ins/sec: 49,533.37

------- Apply unlogged patches and recompile, re-initdb ---
I patched pg_bench to run with either normal or unlogged tables.

fsync on
synchronous_commit on
full_page_writes on
bgwriter_lru_maxpages 100
180 second tests

normal tables
6, 101, 102, 108, 103
20, 110, 71, 90, 90
40, 83, 45, 49, 59
80, 50, 34, 30, 38

LOGS (partial import)
Total rows: 24,754,871
Total Seconds: 6,058.03
Total ins/sec: 4,086.28

unlogged tables
6, 2966, 3047, 3007, 3006
20, 2767, 2515, 2708, 2663
40, 1933, 1311, 1464, 1569
80, 837, 552, 579, 656

LOGS (full import)
Total rows: 61,467,489
Total Seconds: 1,126.75
Total ins/sec: 54,552.60

After all this... there are too many numbers for me. I have no idea what this means.

-Andy

#69Robert Haas
robertmhaas@gmail.com
In reply to: Andy Colson (#68)
Re: unlogged tables

On Sun, Nov 21, 2010 at 11:07 PM, Andy Colson <andy@squeakycode.net> wrote:

After all this... there are too many numbers for me.  I have no idea what
this means.

I think what it means is that, for you, unlogged tables were
almost as fast as shutting off all of synchronous_commit,
full_page_writes, and fsync, and further setting
bgwriter_lru_maxpages=0. Now, that seems a little strange, because
you'd think if anything it would be faster. I'm not sure what
accounts for the difference, although I wonder if checkpoints are part
of it. With the current code, which doesn't exclude unlogged table
pages from checkpoints, a checkpoint will still be faster with
fsync=off than with unlogged tables. It seems like we're agreed that
this is a problem to be fixed in phase two, though, either by fsyncing
every unlogged table we can find at shutdown time, or else by
providing two durability options, one that works as the current code
does (but survives clean shutdowns) and another that excludes dirty
pages from checkpoints (and does not survive clean shutdowns).
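
For the second of those options, the core change would be a filter in the
checkpoint's buffer sweep. As a toy, self-contained model (every name here
is invented; the real bufmgr.c logic is considerably more involved):

typedef struct ToyBuffer
{
    int     is_dirty;           /* does the page need writing? */
    int     rel_is_unsynced;    /* relation opted out of checkpoints */
} ToyBuffer;

/*
 * Toy checkpoint sweep: write out every dirty buffer except those
 * belonging to unsynced relations, which simply stay dirty (their
 * relations get truncated at the next startup, clean or not).
 */
static void
toy_checkpoint(ToyBuffer *buffers, int nbuffers)
{
    int     i;

    for (i = 0; i < nbuffers; i++)
    {
        if (!buffers[i].is_dirty)
            continue;
        if (buffers[i].rel_is_unsynced)
            continue;           /* no checkpoint I/O for this table */
        /* write_and_fsync(&buffers[i]) would go here */
        buffers[i].is_dirty = 0;
    }
}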

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company