unlogged tables
I have played around a little more, and I think I found a problem.
If given enough time, an unlogged table makes it to disk, and a restart won't clear the data. If I insert a bunch of stuff, commit, and quickly restart PG, the table is cleared. If I let it sit for a while, it stays.
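For reference, a minimal sketch of the kind of sequence I mean (table name and row count here are just placeholders, not my exact session):

  CREATE UNLOGGED TABLE scratch (id int, val text);
  BEGIN;
  INSERT INTO scratch SELECT g, 'x' FROM generate_series(1, 1000000) g;
  COMMIT;
  -- restart the server (pg_ctl restart), either right away or after letting it sit
  SELECT count(*) FROM scratch;  -- should be 0 after a restart, but sometimes the rows survive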
Based on that, I have a pgbench_accounts table (unlogged) that after a restart has data in it.
andy=# select aid, bid, abalance from pgbench_accounts where abalance = 3305;
aid | bid | abalance
---------+-----+----------
3790226 | 38 | 3305
274130 | 3 | 3305
2169892 | 22 | 3305
705321 | 8 | 3305
4463145 | 45 | 3305
I dropped the index, added a new one, then restarted PG. Now it seems the index is empty/unusable.
andy=# select aid, bid, abalance from pgbench_accounts where aid = 3790226;
aid | bid | abalance
-----+-----+----------
(0 rows)
andy=# select pg_indexes_size('pgbench_accounts');
pg_indexes_size
-----------------
16384
Let's recreate it:
andy=# drop index bob;
DROP INDEX
Time: 13.829 ms
andy=# create index bob on pgbench_accounts(aid, bid);
CREATE INDEX
Time: 17215.859 ms
andy=# select aid, bid, abalance from pgbench_accounts where aid = 3790226;
aid | bid | abalance
---------+-----+----------
3790226 | 38 | 3305
(1 row)
Time: 0.712 ms
andy=# select pg_indexes_size('pgbench_accounts');
pg_indexes_size
-----------------
179716096
I also did kill -9 on all the postgres* processes while they were busy inserting records, to try to corrupt the database, but I could not seem to. Setting fsync off also did not give me errors, but I assume that because I was using unlogged tables, which were all getting cleared anyway, I never saw any.
With fsync off and normal tables, I got bad-looking things in my logs and from vacuum:
LOG: unexpected pageaddr 1/AB1D6000 in log file 1, segment 187, offset 1925120
WARNING: relation "access" page 28184 is uninitialized --- fixing
etc...
And last, I tried to update my git repo and see if the patches still work. They do not.
There was much discussion on the syntax:
create unlogged table vs create temp xxx table vs something else.
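For reference, the syntax the current patch implements is the first form; something like this works once it is applied (the table name is just an example):

  CREATE UNLOGGED TABLE accounts_scratch (aid int, bid int, abalance int);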
There was much discussion on how persistent the tables should be. And some on backups.
At this point, though, I find myself at an end, not sure what else to do until the dust settles.
Oh, also, I wanted to add:
There is \h help: +1.
But I can find no way of determining the "tempness"/"unloggedness" of a table via \d*.
The only way I found was "pg_dump -s".
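(I suppose querying pg_class directly would also work; with the patch applied, relpersistence is 'u' for unlogged, 'p' for permanent, and 't' for temp, so something like this shows it:

  SELECT relname, relpersistence FROM pg_class WHERE relname = 'pgbench_accounts';

but that is not exactly convenient either.)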
I will attempt to link this to the website, and mark it as returned to author.
-Andy
On Tue, Nov 30, 2010 at 10:36 PM, Andy Colson <andy@squeakycode.net> wrote:
Based on that, I have a pgbench_accounts table (unlogged) that after a
restart has data in it.
andy=# select aid, bid, abalance from pgbench_accounts where abalance = 3305;
aid | bid | abalance
---------+-----+----------
3790226 | 38 | 3305
274130 | 3 | 3305
2169892 | 22 | 3305
705321 | 8 | 3305
4463145 | 45 | 3305
I dropped the index, added a new one, then restarted PG. Now it seems the
index is empty/unusable.
andy=# select aid, bid, abalance from pgbench_accounts where aid = 3790226;
aid | bid | abalance
-----+-----+----------
(0 rows)
andy=# select pg_indexes_size('pgbench_accounts');
pg_indexes_size
-----------------
16384
Let's recreate it:
andy=# drop index bob;
DROP INDEX
Time: 13.829 ms
andy=# create index bob on pgbench_accounts(aid, bid);
CREATE INDEX
Time: 17215.859 ms
andy=# select aid, bid, abalance from pgbench_accounts where aid = 3790226;
aid | bid | abalance
---------+-----+----------
3790226 | 38 | 3305
(1 row)
Time: 0.712 ms
andy=# select pg_indexes_size('pgbench_accounts');
pg_indexes_size
-----------------
179716096
This appears as though you've somehow gotten a normal table connected
to an unlogged index. That certainly sounds like a bug, but there
aren't enough details here to figure out what series of steps I should
perform to recreate the problem.
And last, I tried to update my git repo and see if the patches still work.
They do not.
Updated patches attached.
Oh, also, I wanted to add:
There is \h help: +1
but I can find no way of determining the "tempness"/"unloggedness" of a
table via \d*
It's clearly displayed in the \d output.
Unlogged Table "public.test"
Column | Type | Modifiers
--------+---------+-----------
a | integer | not null
Indexes:
"test_pkey" PRIMARY KEY, btree (a)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
unlogged-tables-v3.patch (application/octet-stream)
commit 6d103445572b6596d39dde48c9d5290245f76e8e
Author: Robert Haas <rhaas@postgresql.org>
Date: Sat Nov 13 08:30:55 2010 -0500
Support unlogged tables.
The contents of an unlogged table are not WAL-logged; thus, they are not
crash-safe and do not appear on standby servers. On restart, they are
truncated.
Currently, only btree indexes are supported on unlogged tables.
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 925aac4..c599b95 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -167,6 +167,17 @@ ambuild (Relation heapRelation,
<para>
<programlisting>
+void
+ambuildempty (Relation indexRelation);
+</programlisting>
+ Build an empty index, and write it to the initialization fork (INIT_FORKNUM)
+ of the given relation. This method is called only for unlogged tables; the
+ empty index written to the initialization fork will be copied over the main
+ relation fork on each server restart.
+ </para>
+
+ <para>
+<programlisting>
bool
aminsert (Relation indexRelation,
Datum *values,
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 8635e80..7b0e14d 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
{ <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ DEFAULT <replaceable>default_expr</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
| <replaceable>table_constraint</replaceable>
| LIKE <replaceable>parent_table</replaceable> [ <replaceable>like_option</replaceable> ... ] }
@@ -32,7 +32,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <repl
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <replaceable class="PARAMETER">tablespace</replaceable> ]
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable>
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable>
OF <replaceable class="PARAMETER">type_name</replaceable> [ (
{ <replaceable class="PARAMETER">column_name</replaceable> WITH OPTIONS [ DEFAULT <replaceable>default_expr</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
| <replaceable>table_constraint</replaceable> }
@@ -164,6 +164,22 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <repl
</varlistentry>
<varlistentry>
+ <term><literal>UNLOGGED</></term>
+ <listitem>
+ <para>
+ If specified, the table is created as an unlogged table. Data written
+ to unlogged tables is not written to the write-ahead log (see <xref
+ linkend="wal">), which makes them considerably faster than ordinary
+ tables. However, it also means that the data stored in the tables is not
+ copied to standby servers and does not survive if
+ <productname>PostgreSQL</productname> is restarted. Unlogged tables are
+ automatically truncated on restart. Any indexes created on an unlogged
+ table are automatically unlogged as well.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>IF NOT EXISTS</></term>
<listitem>
<para>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 3a256d1..ff71078 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE <replaceable>table_name</replaceable>
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE <replaceable>table_name</replaceable>
[ (<replaceable>column_name</replaceable> [, ...] ) ]
[ WITH ( <replaceable class="PARAMETER">storage_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
@@ -82,6 +82,16 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE <replaceable>table_name
</varlistentry>
<varlistentry>
+ <term><literal>UNLOGGED</></term>
+ <listitem>
+ <para>
+ If specified, the table is created as an unlogged table.
+ Refer to <xref linkend="sql-createtable"> for details.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable>table_name</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 8681ede..7ec12b0 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -412,6 +412,19 @@ ginbuild(PG_FUNCTION_ARGS)
}
/*
+ * ginbuildempty() -- build an empty gin index in the initialization fork
+ */
+Datum
+ginbuildempty(PG_FUNCTION_ARGS)
+{
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unlogged GIN indexes are not supported")));
+
+ PG_RETURN_VOID();
+}
+
+/*
* Inserts value during normal insertion
*/
static uint32
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 6693730..b31ec0b 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -208,6 +208,19 @@ gistbuildCallback(Relation index,
}
/*
+ * gistbuildempty() -- build an empty gist index in the initialization fork
+ */
+Datum
+gistbuildempty(PG_FUNCTION_ARGS)
+{
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unlogged GIST indexes are not supported")));
+
+ PG_RETURN_VOID();
+}
+
+/*
* gistinsert -- wrapper for GiST tuple insertion.
*
* This is the public interface routine for tuple insertion in GiSTs.
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index bb46446..cbe8682 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -114,6 +114,19 @@ hashbuild(PG_FUNCTION_ARGS)
}
/*
+ * hashbuildempty() -- build an empty hash index in the initialization fork
+ */
+Datum
+hashbuildempty(PG_FUNCTION_ARGS)
+{
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unlogged hash indexes are not supported")));
+
+ PG_RETURN_VOID();
+}
+
+/*
* Per-tuple callback from IndexBuildHeapScan
*/
static void
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 46aeb9e..6ccc16d 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -29,6 +29,7 @@
#include "storage/indexfsm.h"
#include "storage/ipc.h"
#include "storage/lmgr.h"
+#include "storage/smgr.h"
#include "utils/memutils.h"
@@ -205,6 +206,36 @@ btbuildCallback(Relation index,
}
/*
+ * btbuildempty() -- build an empty btree index in the initialization fork
+ */
+Datum
+btbuildempty(PG_FUNCTION_ARGS)
+{
+ Relation index = (Relation) PG_GETARG_POINTER(0);
+ Page metapage;
+
+ /* Construct metapage. */
+ metapage = (Page) palloc(BLCKSZ);
+ _bt_initmetapage(metapage, P_NONE, 0);
+
+ /* Write the page. If archiving/streaming, XLOG it. */
+ smgrwrite(index->rd_smgr, INIT_FORKNUM, BTREE_METAPAGE,
+ (char *) metapage, true);
+ if (XLogIsNeeded())
+ log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+ BTREE_METAPAGE, metapage);
+
+ /*
+ * An immediate sync is required even if we xlog'd the page, because the
+ * write did not go through shared_buffers and therefore a concurrent
+ * checkpoint may have moved the redo pointer past our xlog record.
+ */
+ smgrimmedsync(index->rd_smgr, INIT_FORKNUM);
+
+ PG_RETURN_VOID();
+}
+
+/*
* btinsert() -- insert an index tuple into a btree.
*
* Descend the tree recursively, find the appropriate location for our
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ede6ceb..0936f92 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -49,6 +49,7 @@
#include "storage/latch.h"
#include "storage/pmsignal.h"
#include "storage/procarray.h"
+#include "storage/reinit.h"
#include "storage/smgr.h"
#include "storage/spin.h"
#include "utils/builtins.h"
@@ -5996,6 +5997,16 @@ StartupXLOG(void)
InRecovery = true;
}
+ /*
+ * Blow away any leftover data in unlogged relations. This should be
+ * done BEFORE starting up Hot Standby, so that read-only backends don't
+ * see residual data from a previous startup. If redo isn't required or
+ * Hot Standby isn't enabled, we could do both the
+ * UNLOGGED_RELATION_CLEANUP and UNLOGGED_RELATION_INIT phases in one
+ * pass later on ... but for now, we don't bother to detect that case.
+ */
+ ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
+
/* REDO */
if (InRecovery)
{
@@ -6524,6 +6535,13 @@ StartupXLOG(void)
PreallocXlogFiles(EndOfLog);
/*
+ * Reset initial contents of unlogged relations. This has to be done
+ * AFTER recovery is complete so that any unlogged relations created
+ * during recovery also get picked up.
+ */
+ ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
+
+ /*
* Okay, we're officially UP.
*/
InRecovery = false;
@@ -7024,6 +7042,14 @@ ShutdownXLOG(int code, Datum arg)
ShutdownSUBTRANS();
ShutdownMultiXact();
+ /*
+ * Remove any unlogged relation contents. This will happen anyway at
+ * the next startup; the point of doing it here is to avoid consuming
+ * a potentially large amount of disk space while we're shut down, for
+ * data that will be discarded anyway.
+ */
+ ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
+
ereport(LOG,
(errmsg("database system is shut down")));
}
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 88b5c2a..fc5a8fc 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -55,7 +55,8 @@
const char *forkNames[] = {
"main", /* MAIN_FORKNUM */
"fsm", /* FSM_FORKNUM */
- "vm" /* VISIBILITYMAP_FORKNUM */
+ "vm", /* VISIBILITYMAP_FORKNUM */
+ "init" /* INIT_FORKNUM */
};
/*
@@ -82,14 +83,14 @@ forkname_to_number(char *forkName)
* We use this to figure out whether a filename could be a relation
* fork (as opposed to an oddly named stray file that somehow ended
* up in the database directory). If the passed string begins with
- * a fork name (other than the main fork name), we return its length.
- * If not, we return 0.
+ * a fork name (other than the main fork name), we return its length,
+ * and set *fork (if not NULL) to the fork number. If not, we return 0.
*
* Note that the present coding assumes that there are no fork names which
* are prefixes of other fork names.
*/
int
-forkname_chars(const char *str)
+forkname_chars(const char *str, ForkNumber *fork)
{
ForkNumber forkNum;
@@ -97,7 +98,11 @@ forkname_chars(const char *str)
{
int len = strlen(forkNames[forkNum]);
if (strncmp(forkNames[forkNum], str, len) == 0)
+ {
+ if (fork)
+ *fork = forkNum;
return len;
+ }
}
return 0;
}
@@ -537,6 +542,7 @@ GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
case RELPERSISTENCE_TEMP:
backend = MyBackendId;
break;
+ case RELPERSISTENCE_UNLOGGED:
case RELPERSISTENCE_PERMANENT:
backend = InvalidBackendId;
break;
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index bcf6caa..65abac5 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -318,8 +318,8 @@ heap_create(const char *relname,
/*
* Have the storage manager create the relation's disk file, if needed.
*
- * We only create the main fork here, other forks will be created on
- * demand.
+ * We only create the main fork here, other forks will be created as
+ * needed.
*/
if (create_storage)
{
@@ -1211,6 +1211,41 @@ heap_create_with_catalog(const char *relname,
register_on_commit_action(relid, oncommit);
/*
+ * If this is an unlogged relation, it needs an init fork so that it
+ * can be correctly reinitialized on restart.
+ */
+ if (relpersistence == RELPERSISTENCE_UNLOGGED)
+ {
+ Page dummypage;
+
+ Assert(relkind == RELKIND_RELATION || relkind == RELKIND_TOASTVALUE);
+
+ /*
+ * Technically, we just write an empty file here, but then there's
+ * nothing to XLOG. We could introduce a dedicated XLOG record to
+ * create an empty relation fork, but it's easier to just
+ * XLOG a blank page, which (during redo) will create the fork
+ * automatically.
+ */
+ dummypage = (Page) palloc0(BLCKSZ);
+
+ /* Create form, write page. If archiving/streaming, XLOG it. */
+ smgrcreate(new_rel_desc->rd_smgr, INIT_FORKNUM, false);
+ smgrwrite(new_rel_desc->rd_smgr, INIT_FORKNUM, 0,
+ (char *) dummypage, true);
+ if (XLogIsNeeded())
+ log_newpage(&new_rel_desc->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+ 0, dummypage);
+
+ /*
+ * An immediate sync is required even if we xlog'd the page, because the
+ * write did not go through shared_buffers and therefore a concurrent
+ * checkpoint may have moved the redo pointer past our xlog record.
+ */
+ smgrimmedsync(new_rel_desc->rd_smgr, INIT_FORKNUM);
+ }
+
+ /*
* ok, the relation has been cataloged, so close our relations and return
* the OID of the newly created relation.
*/
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8fbe8eb..22f0959 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -967,6 +967,17 @@ index_create(Oid heapRelationId,
}
/*
+ * If this is an unlogged index, we need to write out an init fork for it.
+ */
+ if (relpersistence == RELPERSISTENCE_UNLOGGED)
+ {
+ RegProcedure ambuildempty = indexRelation->rd_am->ambuildempty;
+ RelationOpenSmgr(indexRelation);
+ smgrcreate(indexRelation->rd_smgr, INIT_FORKNUM, false);
+ OidFunctionCall1(ambuildempty, PointerGetDatum(indexRelation));
+ }
+
+ /*
* Close the heap and index; but we keep the locks that we acquired above
* until end of transaction.
*/
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 671aaff..34ec77d 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -111,6 +111,10 @@ RelationCreateStorage(RelFileNode rnode, char relpersistence)
backend = MyBackendId;
needs_wal = false;
break;
+ case RELPERSISTENCE_UNLOGGED:
+ backend = InvalidBackendId;
+ needs_wal = false;
+ break;
case RELPERSISTENCE_PERMANENT:
backend = InvalidBackendId;
needs_wal = true;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 8fc79b6..c1dce3c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -536,8 +536,8 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P
- UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNTIL
- UPDATE USER USING
+ UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
+ UNTIL UPDATE USER USING
VACUUM VALID VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
VERBOSE VERSION_P VIEW VOLATILE
@@ -2355,6 +2355,7 @@ OptTemp: TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
| LOCAL TEMP { $$ = RELPERSISTENCE_TEMP; }
| GLOBAL TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
| GLOBAL TEMP { $$ = RELPERSISTENCE_TEMP; }
+ | UNLOGGED { $$ = RELPERSISTENCE_UNLOGGED; }
| /*EMPTY*/ { $$ = RELPERSISTENCE_PERMANENT; }
;
@@ -7917,6 +7918,11 @@ OptTempTableName:
$$ = $4;
$$->relpersistence = RELPERSISTENCE_TEMP;
}
+ | UNLOGGED opt_table qualified_name
+ {
+ $$ = $3;
+ $$->relpersistence = RELPERSISTENCE_UNLOGGED;
+ }
| TABLE qualified_name
{
$$ = $2;
@@ -11383,6 +11389,7 @@ unreserved_keyword:
| UNENCRYPTED
| UNKNOWN
| UNLISTEN
+ | UNLOGGED
| UNTIL
| UPDATE
| VACUUM
diff --git a/src/backend/storage/file/Makefile b/src/backend/storage/file/Makefile
index 3b93aa1..d2198f2 100644
--- a/src/backend/storage/file/Makefile
+++ b/src/backend/storage/file/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/storage/file
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = fd.o buffile.o copydir.o
+OBJS = fd.o buffile.o copydir.o reinit.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 4a10563..5af64d7 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -38,7 +38,6 @@
#endif
-static void copy_file(char *fromfile, char *tofile);
static void fsync_fname(char *fname, bool isdir);
@@ -142,7 +141,7 @@ copydir(char *fromdir, char *todir, bool recurse)
/*
* copy one file
*/
-static void
+void
copy_file(char *fromfile, char *tofile)
{
char *buffer;
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index fd5ec78..b218f70 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2054,7 +2054,7 @@ looks_like_temp_rel_name(const char *name)
/* We might have _forkname or .segment or both. */
if (name[pos] == '_')
{
- int forkchar = forkname_chars(&name[pos+1]);
+ int forkchar = forkname_chars(&name[pos+1], NULL);
if (forkchar <= 0)
return false;
pos += forkchar + 1;
diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
new file mode 100644
index 0000000..b75178b
--- /dev/null
+++ b/src/backend/storage/file/reinit.c
@@ -0,0 +1,396 @@
+/*-------------------------------------------------------------------------
+ *
+ * reinit.c
+ * Reinitialization of unlogged relations
+ *
+ * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/storage/file/reinit.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "catalog/catalog.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "storage/reinit.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+
+static void ResetUnloggedRelationsInTablespaceDir(const char *tsdirname,
+ int op);
+static void ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname,
+ int op);
+static bool parse_filename_for_nontemp_relation(const char *name,
+ int *oidchars, ForkNumber *fork);
+
+typedef struct {
+ char oid[OIDCHARS+1];
+} unlogged_relation_entry;
+
+/*
+ * Reset unlogged relations from before the last restart.
+ *
+ * If op includes UNLOGGED_RELATION_CLEANUP, we remove all forks of any
+ * relation with an "init" fork, except for the "init" fork itself.
+ *
+ * If op includes UNLOGGED_RELATION_INIT, we copy the "init" fork to the main
+ * fork.
+ */
+void
+ResetUnloggedRelations(int op)
+{
+ char temp_path[MAXPGPATH];
+ DIR *spc_dir;
+ struct dirent *spc_de;
+ MemoryContext tmpctx, oldctx;
+
+ /* Log it. */
+ ereport(DEBUG1,
+ (errmsg("resetting unlogged relations: cleanup %d init %d",
+ (op & UNLOGGED_RELATION_CLEANUP) != 0,
+ (op & UNLOGGED_RELATION_INIT) != 0)));
+
+ /*
+ * Just to be sure we don't leak any memory, let's create a temporary
+ * memory context for this operation.
+ */
+ tmpctx = AllocSetContextCreate(CurrentMemoryContext,
+ "ResetUnloggedRelations",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ oldctx = MemoryContextSwitchTo(tmpctx);
+
+ /*
+ * First process unlogged files in pg_default ($PGDATA/base)
+ */
+ ResetUnloggedRelationsInTablespaceDir("base", op);
+
+ /*
+ * Cycle through directories for all non-default tablespaces.
+ */
+ spc_dir = AllocateDir("pg_tblspc");
+
+ while ((spc_de = ReadDir(spc_dir, "pg_tblspc")) != NULL)
+ {
+ if (strcmp(spc_de->d_name, ".") == 0 ||
+ strcmp(spc_de->d_name, "..") == 0)
+ continue;
+
+ snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s",
+ spc_de->d_name, TABLESPACE_VERSION_DIRECTORY);
+ ResetUnloggedRelationsInTablespaceDir(temp_path, op);
+ }
+
+ FreeDir(spc_dir);
+
+ /*
+ * Restore memory context.
+ */
+ MemoryContextSwitchTo(oldctx);
+ MemoryContextDelete(tmpctx);
+}
+
+/* Process one tablespace directory for ResetUnloggedRelations */
+static void
+ResetUnloggedRelationsInTablespaceDir(const char *tsdirname, int op)
+{
+ DIR *ts_dir;
+ struct dirent *de;
+ char dbspace_path[MAXPGPATH];
+
+ ts_dir = AllocateDir(tsdirname);
+ if (ts_dir == NULL)
+ {
+ /* anything except ENOENT is fishy */
+ if (errno != ENOENT)
+ elog(LOG,
+ "could not open tablespace directory \"%s\": %m",
+ tsdirname);
+ return;
+ }
+
+ while ((de = ReadDir(ts_dir, tsdirname)) != NULL)
+ {
+ int i = 0;
+
+ /*
+ * We're only interested in the per-database directories, which have
+ * numeric names. Note that this code will also (properly) ignore "."
+ * and "..".
+ */
+ while (isdigit((unsigned char) de->d_name[i]))
+ ++i;
+ if (de->d_name[i] != '\0' || i == 0)
+ continue;
+
+ snprintf(dbspace_path, sizeof(dbspace_path), "%s/%s",
+ tsdirname, de->d_name);
+ ResetUnloggedRelationsInDbspaceDir(dbspace_path, op);
+ }
+
+ FreeDir(ts_dir);
+}
+
+/* Process one per-dbspace directory for ResetUnloggedRelations */
+static void
+ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
+{
+ DIR *dbspace_dir;
+ struct dirent *de;
+ char rm_path[MAXPGPATH];
+
+ /* Caller must specify at least one operation. */
+ Assert((op & (UNLOGGED_RELATION_CLEANUP | UNLOGGED_RELATION_INIT)) != 0);
+
+ /*
+ * Cleanup is a two-pass operation. First, we go through and identify all
+ * the files with init forks. Then, we go through again and nuke
+ * everything with the same OID except the init fork.
+ */
+ if ((op & UNLOGGED_RELATION_CLEANUP) != 0)
+ {
+ HTAB *hash = NULL;
+ HASHCTL ctl;
+
+ /* Open the directory. */
+ dbspace_dir = AllocateDir(dbspacedirname);
+ if (dbspace_dir == NULL)
+ {
+ elog(LOG,
+ "could not open dbspace directory \"%s\": %m",
+ dbspacedirname);
+ return;
+ }
+
+ /*
+ * It's possible that someone could create a ton of unlogged relations
+ * in the same database & tablespace, so we'd better use a hash table
+ * rather than an array or linked list to keep track of which files
+ * need to be reset. Otherwise, this cleanup operation would be
+ * O(n^2).
+ */
+ ctl.keysize = sizeof(unlogged_relation_entry);
+ ctl.entrysize = sizeof(unlogged_relation_entry);
+ hash = hash_create("unlogged hash", 32, &ctl, HASH_ELEM);
+
+ /* Scan the directory. */
+ while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+ {
+ ForkNumber forkNum;
+ int oidchars;
+ unlogged_relation_entry ent;
+
+ /* Skip anything that doesn't look like a relation data file. */
+ if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+ &forkNum))
+ continue;
+
+ /* Also skip it unless this is the init fork. */
+ if (forkNum != INIT_FORKNUM)
+ continue;
+
+ /*
+ * Put the OID portion of the name into the hash table, if it isn't
+ * already.
+ */
+ memset(ent.oid, 0, sizeof(ent.oid));
+ memcpy(ent.oid, de->d_name, oidchars);
+ hash_search(hash, &ent, HASH_ENTER, NULL);
+ }
+
+ /* Done with the first pass. */
+ FreeDir(dbspace_dir);
+
+ /*
+ * If we didn't find any init forks, there's no point in continuing;
+ * we can bail out now.
+ */
+ if (hash_get_num_entries(hash) == 0)
+ {
+ hash_destroy(hash);
+ return;
+ }
+
+ /*
+ * Now, make a second pass and remove anything that matches. First,
+ * reopen the directory.
+ */
+ dbspace_dir = AllocateDir(dbspacedirname);
+ if (dbspace_dir == NULL)
+ {
+ elog(LOG,
+ "could not open dbspace directory \"%s\": %m",
+ dbspacedirname);
+ hash_destroy(hash);
+ return;
+ }
+
+ /* Scan the directory. */
+ while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+ {
+ ForkNumber forkNum;
+ int oidchars;
+ bool found;
+ unlogged_relation_entry ent;
+
+ /* Skip anything that doesn't look like a relation data file. */
+ if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+ &forkNum))
+ continue;
+
+ /* We never remove the init fork. */
+ if (forkNum == INIT_FORKNUM)
+ continue;
+
+ /*
+ * See whether the OID portion of the name shows up in the hash
+ * table.
+ */
+ memset(ent.oid, 0, sizeof(ent.oid));
+ memcpy(ent.oid, de->d_name, oidchars);
+ hash_search(hash, &ent, HASH_FIND, &found);
+
+ /* If so, nuke it! */
+ if (found)
+ {
+ snprintf(rm_path, sizeof(rm_path), "%s/%s",
+ dbspacedirname, de->d_name);
+ /*
+ * It's tempting to actually throw an error here, but since
+ * this code gets run during database startup, that could
+ * result in the database failing to start. (XXX Should we do
+ * it anyway?)
+ */
+ if (unlink(rm_path))
+ elog(LOG, "could not unlink file \"%s\": %m", rm_path);
+ else
+ elog(DEBUG2, "unlinked file \"%s\"", rm_path);
+ }
+ }
+
+ /* Cleanup is complete. */
+ FreeDir(dbspace_dir);
+ hash_destroy(hash);
+ }
+
+ /*
+ * Initialization happens after cleanup is complete: we copy each init
+ * fork file to the corresponding main fork file. Note that if we are
+ * asked to do both cleanup and init, we may never get here: if the cleanup
+ * code determines that there are no init forks in this dbspace, it will
+ * return before we get to this point.
+ */
+ if ((op & UNLOGGED_RELATION_INIT) != 0)
+ {
+ /* Open the directory. */
+ dbspace_dir = AllocateDir(dbspacedirname);
+ if (dbspace_dir == NULL)
+ {
+ /* we just saw this directory, so it really ought to be there */
+ elog(LOG,
+ "could not open dbspace directory \"%s\": %m",
+ dbspacedirname);
+ return;
+ }
+
+ /* Scan the directory. */
+ while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+ {
+ ForkNumber forkNum;
+ int oidchars;
+ char oidbuf[OIDCHARS+1];
+ char srcpath[MAXPGPATH];
+ char dstpath[MAXPGPATH];
+
+ /* Skip anything that doesn't look like a relation data file. */
+ if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+ &forkNum))
+ continue;
+
+ /* Also skip it unless this is the init fork. */
+ if (forkNum != INIT_FORKNUM)
+ continue;
+
+ /* Construct source pathname. */
+ snprintf(srcpath, sizeof(srcpath), "%s/%s",
+ dbspacedirname, de->d_name);
+
+ /* Construct destination pathname. */
+ memcpy(oidbuf, de->d_name, oidchars);
+ oidbuf[oidchars] = '\0';
+ snprintf(dstpath, sizeof(dstpath), "%s/%s%s",
+ dbspacedirname, oidbuf, de->d_name + oidchars + 1 +
+ strlen(forkNames[INIT_FORKNUM]));
+
+ /* OK, we're ready to perform the actual copy. */
+ elog(DEBUG2, "copying %s to %s", srcpath, dstpath);
+ copy_file(srcpath, dstpath);
+ }
+
+ /* Done with the first pass. */
+ FreeDir(dbspace_dir);
+ }
+}
+
+/*
+ * Basic parsing of putative relation filenames.
+ *
+ * This function returns true if the file appears to be in the correct format
+ * for a non-temporary relation and false otherwise.
+ *
+ * NB: If this function returns true, the caller is entitled to assume that
+ * *oidchars has been set to a value no more than OIDCHARS, and thus
+ * that a buffer of OIDCHARS+1 characters is sufficient to hold the OID
+ * portion of the filename. This is critical to protect against a possible
+ * buffer overrun.
+ */
+static bool
+parse_filename_for_nontemp_relation(const char *name, int *oidchars,
+ ForkNumber *fork)
+{
+ int pos;
+
+ /* Look for a non-empty string of digits (that isn't too long). */
+ for (pos = 0; isdigit((unsigned char) name[pos]); ++pos)
+ ;
+ if (pos == 0 || pos > OIDCHARS)
+ return false;
+ *oidchars = pos;
+
+ /* Check for a fork name. */
+ if (name[pos] != '_')
+ *fork = MAIN_FORKNUM;
+ else
+ {
+ int forkchar;
+
+ forkchar = forkname_chars(&name[pos+1], fork);
+ if (forkchar <= 0)
+ return false;
+ pos += forkchar + 1;
+ }
+
+ /* Check for a segment number. */
+ if (name[pos] == '.')
+ {
+ int segchar;
+ for (segchar = 1; isdigit((unsigned char) name[pos+segchar]); ++segchar)
+ ;
+ if (segchar <= 1)
+ return false;
+ pos += segchar;
+ }
+
+ /* Now we should be at the end. */
+ if (name[pos] != '\0')
+ return false;
+ return true;
+}
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index e352cda..f33c29e 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -615,6 +615,7 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
/* Determine owning backend. */
switch (relform->relpersistence)
{
+ case RELPERSISTENCE_UNLOGGED:
case RELPERSISTENCE_PERMANENT:
backend = InvalidBackendId;
break;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 1509686..fa9e9ca 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -851,6 +851,7 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
switch (relation->rd_rel->relpersistence)
{
+ case RELPERSISTENCE_UNLOGGED:
case RELPERSISTENCE_PERMANENT:
relation->rd_backend = InvalidBackendId;
break;
@@ -2490,6 +2491,7 @@ RelationBuildLocalRelation(const char *relname,
rel->rd_rel->relpersistence = relpersistence;
switch (relpersistence)
{
+ case RELPERSISTENCE_UNLOGGED:
case RELPERSISTENCE_PERMANENT:
rel->rd_backend = InvalidBackendId;
break;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 66274b4..065d3a4 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3447,6 +3447,7 @@ getTables(int *numTables)
int i_relhasrules;
int i_relhasoids;
int i_relfrozenxid;
+ int i_relpersistence;
int i_owning_tab;
int i_owning_col;
int i_reltablespace;
@@ -3477,7 +3478,7 @@ getTables(int *numTables)
* we cannot correctly identify inherited columns, owned sequences, etc.
*/
- if (g_fout->remoteVersion >= 90000)
+ if (g_fout->remoteVersion >= 90100)
{
/*
* Left join to pick up dependency info linking sequences to their
@@ -3489,7 +3490,40 @@ getTables(int *numTables)
"(%s c.relowner) AS rolname, "
"c.relchecks, c.relhastriggers, "
"c.relhasindex, c.relhasrules, c.relhasoids, "
- "c.relfrozenxid, "
+ "c.relfrozenxid, c.relpersistence, "
+ "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
+ "d.refobjid AS owning_tab, "
+ "d.refobjsubid AS owning_col, "
+ "(SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, "
+ "array_to_string(c.reloptions, ', ') AS reloptions, "
+ "array_to_string(array(SELECT 'toast.' || x FROM unnest(tc.reloptions) x), ', ') AS toast_reloptions "
+ "FROM pg_class c "
+ "LEFT JOIN pg_depend d ON "
+ "(c.relkind = '%c' AND "
+ "d.classid = c.tableoid AND d.objid = c.oid AND "
+ "d.objsubid = 0 AND "
+ "d.refclassid = c.tableoid AND d.deptype = 'a') "
+ "LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+ "WHERE c.relkind in ('%c', '%c', '%c', '%c') "
+ "ORDER BY c.oid",
+ username_subquery,
+ RELKIND_SEQUENCE,
+ RELKIND_RELATION, RELKIND_SEQUENCE,
+ RELKIND_VIEW, RELKIND_COMPOSITE_TYPE);
+ }
+ else if (g_fout->remoteVersion >= 90000)
+ {
+ /*
+ * Left join to pick up dependency info linking sequences to their
+ * owning column, if any (note this dependency is AUTO as of 8.2)
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.tableoid, c.oid, c.relname, "
+ "c.relacl, c.relkind, c.relnamespace, "
+ "(%s c.relowner) AS rolname, "
+ "c.relchecks, c.relhastriggers, "
+ "c.relhasindex, c.relhasrules, c.relhasoids, "
+ "c.relfrozenxid, 'p' AS relpersistence, "
"CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3522,7 +3556,7 @@ getTables(int *numTables)
"(%s c.relowner) AS rolname, "
"c.relchecks, c.relhastriggers, "
"c.relhasindex, c.relhasrules, c.relhasoids, "
- "c.relfrozenxid, "
+ "c.relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3555,7 +3589,7 @@ getTables(int *numTables)
"(%s relowner) AS rolname, "
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, relhasoids, "
- "relfrozenxid, "
+ "relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3587,7 +3621,7 @@ getTables(int *numTables)
"(%s relowner) AS rolname, "
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, relhasoids, "
- "0 AS relfrozenxid, "
+ "0 AS relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3619,7 +3653,7 @@ getTables(int *numTables)
"(%s relowner) AS rolname, "
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, relhasoids, "
- "0 AS relfrozenxid, "
+ "0 AS relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3647,7 +3681,7 @@ getTables(int *numTables)
"(%s relowner) AS rolname, "
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, relhasoids, "
- "0 AS relfrozenxid, "
+ "0 AS relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"NULL::oid AS owning_tab, "
"NULL::int4 AS owning_col, "
@@ -3670,7 +3704,7 @@ getTables(int *numTables)
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, "
"'t'::bool AS relhasoids, "
- "0 AS relfrozenxid, "
+ "0 AS relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"NULL::oid AS owning_tab, "
"NULL::int4 AS owning_col, "
@@ -3703,7 +3737,7 @@ getTables(int *numTables)
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, "
"'t'::bool AS relhasoids, "
- "0 as relfrozenxid, "
+ "0 as relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"NULL::oid AS owning_tab, "
"NULL::int4 AS owning_col, "
@@ -3749,6 +3783,7 @@ getTables(int *numTables)
i_relhasrules = PQfnumber(res, "relhasrules");
i_relhasoids = PQfnumber(res, "relhasoids");
i_relfrozenxid = PQfnumber(res, "relfrozenxid");
+ i_relpersistence = PQfnumber(res, "relpersistence");
i_owning_tab = PQfnumber(res, "owning_tab");
i_owning_col = PQfnumber(res, "owning_col");
i_reltablespace = PQfnumber(res, "reltablespace");
@@ -3783,6 +3818,7 @@ getTables(int *numTables)
tblinfo[i].rolname = strdup(PQgetvalue(res, i, i_rolname));
tblinfo[i].relacl = strdup(PQgetvalue(res, i, i_relacl));
tblinfo[i].relkind = *(PQgetvalue(res, i, i_relkind));
+ tblinfo[i].relpersistence = *(PQgetvalue(res, i, i_relpersistence));
tblinfo[i].hasindex = (strcmp(PQgetvalue(res, i, i_relhasindex), "t") == 0);
tblinfo[i].hasrules = (strcmp(PQgetvalue(res, i, i_relhasrules), "t") == 0);
tblinfo[i].hastriggers = (strcmp(PQgetvalue(res, i, i_relhastriggers), "t") == 0);
@@ -11051,8 +11087,12 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
if (binary_upgrade)
binary_upgrade_set_relfilenodes(q, tbinfo->dobj.catId.oid, false);
- appendPQExpBuffer(q, "CREATE TABLE %s",
- fmtId(tbinfo->dobj.name));
+ if (tbinfo->relpersistence == RELPERSISTENCE_UNLOGGED)
+ appendPQExpBuffer(q, "CREATE UNLOGGED TABLE %s",
+ fmtId(tbinfo->dobj.name));
+ else
+ appendPQExpBuffer(q, "CREATE TABLE %s",
+ fmtId(tbinfo->dobj.name));
if (tbinfo->reloftype)
appendPQExpBuffer(q, " OF %s", tbinfo->reloftype);
actual_atts = 0;
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 7885535..4313fd8 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -220,6 +220,7 @@ typedef struct _tableInfo
char *rolname; /* name of owner, or empty string */
char *relacl;
char relkind;
+ char relpersistence; /* relation persistence */
char *reltablespace; /* relation tablespace */
char *reloptions; /* options specified by WITH (...) */
char *toast_reloptions; /* ditto, for the TOAST table */
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index c4370a1..207d028 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1118,6 +1118,7 @@ describeOneTableDetails(const char *schemaname,
Oid tablespace;
char *reloptions;
char *reloftype;
+ char relpersistence;
} tableinfo;
bool show_modifiers = false;
bool retval;
@@ -1138,6 +1139,23 @@ describeOneTableDetails(const char *schemaname,
"SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
"c.relhastriggers, c.relhasoids, "
"%s, c.reltablespace, "
+ "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
+ "c.relpersistence\n"
+ "FROM pg_catalog.pg_class c\n "
+ "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
+ "WHERE c.oid = '%s'\n",
+ (verbose ?
+ "pg_catalog.array_to_string(c.reloptions || "
+ "array(select 'toast.' || x from pg_catalog.unnest(tc.reloptions) x), ', ')\n"
+ : "''"),
+ oid);
+ }
+ else if (pset.sversion >= 90000)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
+ "c.relhastriggers, c.relhasoids, "
+ "%s, c.reltablespace, "
"CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END\n"
"FROM pg_catalog.pg_class c\n "
"LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
@@ -1218,6 +1236,8 @@ describeOneTableDetails(const char *schemaname,
atooid(PQgetvalue(res, 0, 7)) : 0;
tableinfo.reloftype = (pset.sversion >= 90000 && strcmp(PQgetvalue(res, 0, 8), "") != 0) ?
strdup(PQgetvalue(res, 0, 8)) : 0;
+ tableinfo.relpersistence = (pset.sversion >= 90100 && strcmp(PQgetvalue(res, 0, 9), "") != 0) ?
+ PQgetvalue(res, 0, 9)[0] : 0;
PQclear(res);
res = NULL;
@@ -1269,8 +1289,12 @@ describeOneTableDetails(const char *schemaname,
switch (tableinfo.relkind)
{
case 'r':
- printfPQExpBuffer(&title, _("Table \"%s.%s\""),
- schemaname, relationname);
+ if (tableinfo.relpersistence == 'u')
+ printfPQExpBuffer(&title, _("Unlogged Table \"%s.%s\""),
+ schemaname, relationname);
+ else
+ printfPQExpBuffer(&title, _("Table \"%s.%s\""),
+ schemaname, relationname);
break;
case 'v':
printfPQExpBuffer(&title, _("View \"%s.%s\""),
@@ -1281,8 +1305,12 @@ describeOneTableDetails(const char *schemaname,
schemaname, relationname);
break;
case 'i':
- printfPQExpBuffer(&title, _("Index \"%s.%s\""),
- schemaname, relationname);
+ if (tableinfo.relpersistence == 'u')
+ printfPQExpBuffer(&title, _("Unlogged Index \"%s.%s\""),
+ schemaname, relationname);
+ else
+ printfPQExpBuffer(&title, _("Index \"%s.%s\""),
+ schemaname, relationname);
break;
case 's':
/* not used as of 8.2, but keep it for backwards compatibility */
diff --git a/src/include/access/gin.h b/src/include/access/gin.h
index e2d7b45..b1eef92 100644
--- a/src/include/access/gin.h
+++ b/src/include/access/gin.h
@@ -389,6 +389,7 @@ extern void ginUpdateStats(Relation index, const GinStatsData *stats);
/* gininsert.c */
extern Datum ginbuild(PG_FUNCTION_ARGS);
+extern Datum ginbuildempty(PG_FUNCTION_ARGS);
extern Datum gininsert(PG_FUNCTION_ARGS);
extern void ginEntryInsert(Relation index, GinState *ginstate,
OffsetNumber attnum, Datum value,
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index f2dcbfb..742fad6 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -234,6 +234,7 @@ typedef struct
/* gist.c */
extern Datum gistbuild(PG_FUNCTION_ARGS);
+extern Datum gistbuildempty(PG_FUNCTION_ARGS);
extern Datum gistinsert(PG_FUNCTION_ARGS);
extern MemoryContext createTempGistContext(void);
extern void initGISTstate(GISTSTATE *giststate, Relation index);
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index d5899f4..52d1c93 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -242,6 +242,7 @@ typedef HashMetaPageData *HashMetaPage;
/* public routines */
extern Datum hashbuild(PG_FUNCTION_ARGS);
+extern Datum hashbuildempty(PG_FUNCTION_ARGS);
extern Datum hashinsert(PG_FUNCTION_ARGS);
extern Datum hashbeginscan(PG_FUNCTION_ARGS);
extern Datum hashgettuple(PG_FUNCTION_ARGS);
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 3bbc4d1..283612e 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -555,6 +555,7 @@ typedef BTScanOpaqueData *BTScanOpaque;
* prototypes for functions in nbtree.c (external entry points for btree)
*/
extern Datum btbuild(PG_FUNCTION_ARGS);
+extern Datum btbuildempty(PG_FUNCTION_ARGS);
extern Datum btinsert(PG_FUNCTION_ARGS);
extern Datum btbeginscan(PG_FUNCTION_ARGS);
extern Datum btgettuple(PG_FUNCTION_ARGS);
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 56dcdd5..40cb9ff 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -25,7 +25,7 @@
extern const char *forkNames[];
extern ForkNumber forkname_to_number(char *forkName);
-extern int forkname_chars(const char *str);
+extern int forkname_chars(const char *str, ForkNumber *);
extern char *relpathbackend(RelFileNode rnode, BackendId backend,
ForkNumber forknum);
diff --git a/src/include/catalog/pg_am.h b/src/include/catalog/pg_am.h
index 5a18dee..f078fdd 100644
--- a/src/include/catalog/pg_am.h
+++ b/src/include/catalog/pg_am.h
@@ -60,6 +60,7 @@ CATALOG(pg_am,2601)
regproc ammarkpos; /* "mark current scan position" function */
regproc amrestrpos; /* "restore marked scan position" function */
regproc ambuild; /* "build new index" function */
+ regproc ambuildempty; /* "build empty index" function */
regproc ambulkdelete; /* bulk-delete function */
regproc amvacuumcleanup; /* post-VACUUM cleanup function */
regproc amcostestimate; /* estimate cost of an indexscan */
@@ -101,26 +102,27 @@ typedef FormData_pg_am *Form_pg_am;
#define Anum_pg_am_ammarkpos 21
#define Anum_pg_am_amrestrpos 22
#define Anum_pg_am_ambuild 23
-#define Anum_pg_am_ambulkdelete 24
-#define Anum_pg_am_amvacuumcleanup 25
-#define Anum_pg_am_amcostestimate 26
-#define Anum_pg_am_amoptions 27
+#define Anum_pg_am_ambuildempty 24
+#define Anum_pg_am_ambulkdelete 25
+#define Anum_pg_am_amvacuumcleanup 26
+#define Anum_pg_am_amcostestimate 27
+#define Anum_pg_am_amoptions 28
/* ----------------
* initial contents of pg_am
* ----------------
*/
-DATA(insert OID = 403 ( btree 5 1 t f t t t t t t f t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions ));
+DATA(insert OID = 403 ( btree 5 1 t f t t t t t t f t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbuildempty btbulkdelete btvacuumcleanup btcostestimate btoptions ));
DESCR("b-tree index access method");
#define BTREE_AM_OID 403
-DATA(insert OID = 405 ( hash 1 1 f f t f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
+DATA(insert OID = 405 ( hash 1 1 f f t f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbuildempty hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
DESCR("hash index access method");
#define HASH_AM_OID 405
-DATA(insert OID = 783 ( gist 0 7 f f f f t t t t t t 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
+DATA(insert OID = 783 ( gist 0 7 f f f f t t t t t t 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbuildempty gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
DESCR("GiST index access method");
#define GIST_AM_OID 783
-DATA(insert OID = 2742 ( gin 0 5 f f f f t t f f t f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
+DATA(insert OID = 2742 ( gin 0 5 f f f f t t f f t f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbuildempty ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
DESCR("GIN index access method");
#define GIN_AM_OID 2742
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 1edbfe3..39f9743 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -150,6 +150,7 @@ DESCR("");
#define RELKIND_COMPOSITE_TYPE 'c' /* composite type */
#define RELPERSISTENCE_PERMANENT 'p'
+#define RELPERSISTENCE_UNLOGGED 'u'
#define RELPERSISTENCE_TEMP 't'
#endif /* PG_CLASS_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 25a3912..87ec355 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -689,6 +689,8 @@ DATA(insert OID = 337 ( btrestrpos PGNSP PGUID 12 1 0 0 f f f t f v 1 0 227
DESCR("btree(internal)");
DATA(insert OID = 338 ( btbuild PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ btbuild _null_ _null_ _null_ ));
DESCR("btree(internal)");
+DATA(insert OID = 328 ( btbuildempty PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ btbuildempty _null_ _null_ _null_ ));
+DESCR("btree(internal)");
DATA(insert OID = 332 ( btbulkdelete PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ btbulkdelete _null_ _null_ _null_ ));
DESCR("btree(internal)");
DATA(insert OID = 972 ( btvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ btvacuumcleanup _null_ _null_ _null_ ));
@@ -808,6 +810,8 @@ DATA(insert OID = 447 ( hashrestrpos PGNSP PGUID 12 1 0 0 f f f t f v 1 0 22
DESCR("hash(internal)");
DATA(insert OID = 448 ( hashbuild PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ hashbuild _null_ _null_ _null_ ));
DESCR("hash(internal)");
+DATA(insert OID = 327 ( hashbuildempty PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ hashbuildempty _null_ _null_ _null_ ));
+DESCR("hash(internal)");
DATA(insert OID = 442 ( hashbulkdelete PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ hashbulkdelete _null_ _null_ _null_ ));
DESCR("hash(internal)");
DATA(insert OID = 425 ( hashvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ hashvacuumcleanup _null_ _null_ _null_ ));
@@ -1104,6 +1108,8 @@ DATA(insert OID = 781 ( gistrestrpos PGNSP PGUID 12 1 0 0 f f f t f v 1 0 22
DESCR("gist(internal)");
DATA(insert OID = 782 ( gistbuild PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ gistbuild _null_ _null_ _null_ ));
DESCR("gist(internal)");
+DATA(insert OID = 326 ( gistbuildempty PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ gistbuildempty _null_ _null_ _null_ ));
+DESCR("gist(internal)");
DATA(insert OID = 776 ( gistbulkdelete PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ gistbulkdelete _null_ _null_ _null_ ));
DESCR("gist(internal)");
DATA(insert OID = 2561 ( gistvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ gistvacuumcleanup _null_ _null_ _null_ ));
@@ -4345,6 +4351,8 @@ DATA(insert OID = 2737 ( ginrestrpos PGNSP PGUID 12 1 0 0 f f f t f v 1 0 22
DESCR("gin(internal)");
DATA(insert OID = 2738 ( ginbuild PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ ginbuild _null_ _null_ _null_ ));
DESCR("gin(internal)");
+DATA(insert OID = 325 ( ginbuildempty PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ ginbuildempty _null_ _null_ _null_ ));
+DESCR("gin(internal)");
DATA(insert OID = 2739 ( ginbulkdelete PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ ginbulkdelete _null_ _null_ _null_ ));
DESCR("gin(internal)");
DATA(insert OID = 2740 ( ginvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ ginvacuumcleanup _null_ _null_ _null_ ));
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 2c44cf7..3b038a0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -388,6 +388,7 @@ PG_KEYWORD("union", UNION, RESERVED_KEYWORD)
PG_KEYWORD("unique", UNIQUE, RESERVED_KEYWORD)
PG_KEYWORD("unknown", UNKNOWN, UNRESERVED_KEYWORD)
PG_KEYWORD("unlisten", UNLISTEN, UNRESERVED_KEYWORD)
+PG_KEYWORD("unlogged", UNLOGGED, UNRESERVED_KEYWORD)
PG_KEYWORD("until", UNTIL, UNRESERVED_KEYWORD)
PG_KEYWORD("update", UPDATE, UNRESERVED_KEYWORD)
PG_KEYWORD("user", USER, RESERVED_KEYWORD)
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 62d15cc..ebf6855 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -203,7 +203,7 @@
* Enable debugging print statements for WAL-related operations; see
* also the wal_debug GUC var.
*/
-/* #define WAL_DEBUG */
+#define WAL_DEBUG
/*
* Enable tracing of resource consumption during sort operations;
diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index b24a98c..7c57724 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -14,5 +14,6 @@
#define COPYDIR_H
extern void copydir(char *fromdir, char *todir, bool recurse);
+extern void copy_file(char *fromfile, char *tofile);
#endif /* COPYDIR_H */
diff --git a/src/include/storage/reinit.h b/src/include/storage/reinit.h
new file mode 100644
index 0000000..9999dff
--- /dev/null
+++ b/src/include/storage/reinit.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * reinit.h
+ * Reinitialization of unlogged relations
+ *
+ *
+ * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/reinit.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef REINIT_H
+#define REINIT_H
+
+extern void ResetUnloggedRelations(int op);
+
+#define UNLOGGED_RELATION_CLEANUP 0x0001
+#define UNLOGGED_RELATION_INIT 0x0002
+
+#endif /* REINIT_H */
diff --git a/src/include/storage/relfilenode.h b/src/include/storage/relfilenode.h
index 24a72e6..f71b233 100644
--- a/src/include/storage/relfilenode.h
+++ b/src/include/storage/relfilenode.h
@@ -27,7 +27,8 @@ typedef enum ForkNumber
InvalidForkNumber = -1,
MAIN_FORKNUM = 0,
FSM_FORKNUM,
- VISIBILITYMAP_FORKNUM
+ VISIBILITYMAP_FORKNUM,
+ INIT_FORKNUM
/*
* NOTE: if you add a new fork, change MAX_FORKNUM below and update the
@@ -35,7 +36,7 @@ typedef enum ForkNumber
*/
} ForkNumber;
-#define MAX_FORKNUM VISIBILITYMAP_FORKNUM
+#define MAX_FORKNUM INIT_FORKNUM
/*
* RelFileNode must provide all that we need to know to physically access
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 88a3168..d5b5e58 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -114,6 +114,7 @@ typedef struct RelationAmInfo
FmgrInfo ammarkpos;
FmgrInfo amrestrpos;
FmgrInfo ambuild;
+ FmgrInfo ambuildempty;
FmgrInfo ambulkdelete;
FmgrInfo amvacuumcleanup;
FmgrInfo amcostestimate;
relpersistence-v3.patch (application/octet-stream)
commit 25ac27bb77803212531e373a0a7125261baa006b
Author: Robert Haas <rhaas@postgresql.org>
Date: Mon Aug 16 21:02:11 2010 -0400
Generalize concept of temporary relations to "relation persistence".
This commit replaces pg_class.relistemp with pg_class.relpersistence,
and also modifies the RangeVar node type to carry relpersistence rather
than istemp. It also removes rd_istemp from RelationData and
instead performs the correct computation based on relpersistence.
For clarity, we add three new macros: RelationNeedsWAL(),
RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we
can clarify the purpose of each check that previously depended on
rd_istemp.
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 070cd92..9d857a0 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -304,7 +304,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
MarkBufferDirty(stack->buffer);
- if (!btree->index->rd_istemp)
+ if (RelationNeedsWAL(btree->index))
{
XLogRecPtr recptr;
@@ -373,7 +373,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
MarkBufferDirty(lbuffer);
MarkBufferDirty(stack->buffer);
- if (!btree->index->rd_istemp)
+ if (RelationNeedsWAL(btree->index))
{
XLogRecPtr recptr;
@@ -422,7 +422,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
MarkBufferDirty(rbuffer);
MarkBufferDirty(stack->buffer);
- if (!btree->index->rd_istemp)
+ if (RelationNeedsWAL(btree->index))
{
XLogRecPtr recptr;
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 525f79c..74339c9 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -103,7 +103,7 @@ writeListPage(Relation index, Buffer buffer,
MarkBufferDirty(buffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecData rdata[2];
ginxlogInsertListPage data;
@@ -384,7 +384,7 @@ ginHeapTupleFastInsert(Relation index, GinState *ginstate,
*/
MarkBufferDirty(metabuffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
@@ -564,7 +564,7 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
MarkBufferDirty(buffers[i]);
}
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index fa70e4f..8681ede 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -55,7 +55,7 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems)
MarkBufferDirty(buffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
XLogRecData rdata[2];
@@ -325,7 +325,7 @@ ginbuild(PG_FUNCTION_ARGS)
GinInitBuffer(RootBuffer, GIN_LEAF);
MarkBufferDirty(RootBuffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
XLogRecData rdata;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 27326ac..5f20ac9 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -410,7 +410,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
MarkBufferDirty(metabuffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
ginxlogUpdateMeta data;
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index 7dfecff..4b35acb 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -93,7 +93,7 @@ xlogVacuumPage(Relation index, Buffer buffer)
Assert(GinPageIsLeaf(page));
- if (index->rd_istemp)
+ if (!RelationNeedsWAL(index))
return;
data.node = index->rd_node;
@@ -308,7 +308,7 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
MarkBufferDirty(lBuffer);
MarkBufferDirty(dBuffer);
- if (!gvs->index->rd_istemp)
+ if (RelationNeedsWAL(gvs->index))
{
XLogRecPtr recptr;
XLogRecData rdata[4];
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 8c2dbc9..6693730 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -115,7 +115,7 @@ gistbuild(PG_FUNCTION_ARGS)
MarkBufferDirty(buffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
XLogRecData rdata;
@@ -401,7 +401,7 @@ gistplacetopage(GISTInsertState *state, GISTSTATE *giststate)
dist->page = BufferGetPage(dist->buffer);
}
- if (!state->r->rd_istemp)
+ if (RelationNeedsWAL(state->r))
{
XLogRecPtr recptr;
XLogRecData *rdata;
@@ -465,7 +465,7 @@ gistplacetopage(GISTInsertState *state, GISTSTATE *giststate)
MarkBufferDirty(state->stack->buffer);
- if (!state->r->rd_istemp)
+ if (RelationNeedsWAL(state->r))
{
OffsetNumber noffs = 0,
offs[1];
@@ -550,7 +550,7 @@ gistfindleaf(GISTInsertState *state, GISTSTATE *giststate)
opaque = GistPageGetOpaque(state->stack->page);
state->stack->lsn = PageGetLSN(state->stack->page);
- Assert(state->r->rd_istemp || !XLogRecPtrIsInvalid(state->stack->lsn));
+ Assert(!RelationNeedsWAL(state->r) || !XLogRecPtrIsInvalid(state->stack->lsn));
if (state->stack->blkno != GIST_ROOT_BLKNO &&
XLByteLT(state->stack->parent->lsn, opaque->nsn))
@@ -911,7 +911,7 @@ gistmakedeal(GISTInsertState *state, GISTSTATE *giststate)
}
/* say to xlog that insert is completed */
- if (state->needInsertComplete && !state->r->rd_istemp)
+ if (state->needInsertComplete && RelationNeedsWAL(state->r))
gistxlogInsertCompletion(state->r->rd_node, &(state->key), 1);
}
@@ -1011,7 +1011,7 @@ gistnewroot(Relation r, Buffer buffer, IndexTuple *itup, int len, ItemPointer ke
MarkBufferDirty(buffer);
- if (!r->rd_istemp)
+ if (RelationNeedsWAL(r))
{
XLogRecPtr recptr;
XLogRecData *rdata;
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index dbe9406..e02e72d 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -248,7 +248,7 @@ gistbulkdelete(PG_FUNCTION_ARGS)
PageIndexTupleDelete(page, todelete[i]);
GistMarkTuplesDeleted(page);
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
XLogRecData *rdata;
XLogRecPtr recptr;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8b064bc..8f368a2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -124,7 +124,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool is_rescan)
*
* During a rescan, don't make a new strategy object if we don't have to.
*/
- if (!scan->rs_rd->rd_istemp &&
+ if (!RelationUsesLocalBuffers(scan->rs_rd) &&
scan->rs_nblocks > NBuffers / 4)
{
allow_strat = scan->rs_allow_strat;
@@ -905,7 +905,7 @@ relation_open(Oid relationId, LOCKMODE lockmode)
elog(ERROR, "could not open relation with OID %u", relationId);
/* Make note that we've accessed a temporary relation */
- if (r->rd_istemp)
+ if (RelationUsesLocalBuffers(r))
MyXactAccessedTempRel = true;
pgstat_initstats(r);
@@ -951,7 +951,7 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)
elog(ERROR, "could not open relation with OID %u", relationId);
/* Make note that we've accessed a temporary relation */
- if (r->rd_istemp)
+ if (RelationUsesLocalBuffers(r))
MyXactAccessedTempRel = true;
pgstat_initstats(r);
@@ -1917,7 +1917,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!(options & HEAP_INSERT_SKIP_WAL) && !relation->rd_istemp)
+ if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))
{
xl_heap_insert xlrec;
xl_heap_header xlhdr;
@@ -2227,7 +2227,7 @@ l1:
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
xl_heap_delete xlrec;
XLogRecPtr recptr;
@@ -2780,7 +2780,7 @@ l2:
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
XLogRecPtr recptr = log_heap_update(relation, buffer, oldtup.t_self,
newbuf, heaptup,
@@ -3403,7 +3403,7 @@ l3:
* (Also, in a PITR log-shipping or 2PC environment, we have to have XLOG
* entries for everything anyway.)
*/
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
xl_heap_lock xlrec;
XLogRecPtr recptr;
@@ -3505,7 +3505,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
xl_heap_inplace xlrec;
XLogRecPtr recptr;
@@ -3852,8 +3852,8 @@ log_heap_clean(Relation reln, Buffer buffer,
XLogRecPtr recptr;
XLogRecData rdata[4];
- /* Caller should not call me on a temp relation */
- Assert(!reln->rd_istemp);
+ /* Caller should not call me on a non-WAL-logged relation */
+ Assert(RelationNeedsWAL(reln));
xlrec.node = reln->rd_node;
xlrec.block = BufferGetBlockNumber(buffer);
@@ -3935,8 +3935,8 @@ log_heap_freeze(Relation reln, Buffer buffer,
XLogRecPtr recptr;
XLogRecData rdata[2];
- /* Caller should not call me on a temp relation */
- Assert(!reln->rd_istemp);
+ /* Caller should not call me on a non-WAL-logged relation */
+ Assert(RelationNeedsWAL(reln));
/* nor when there are no tuples to freeze */
Assert(offcnt > 0);
@@ -3981,8 +3981,8 @@ log_heap_update(Relation reln, Buffer oldbuf, ItemPointerData from,
XLogRecData rdata[4];
Page page = BufferGetPage(newbuf);
- /* Caller should not call me on a temp relation */
- Assert(!reln->rd_istemp);
+ /* Caller should not call me on a non-WAL-logged relation */
+ Assert(RelationNeedsWAL(reln));
if (HeapTupleIsHeapOnly(newtup))
info = XLOG_HEAP_HOT_UPDATE;
@@ -4982,7 +4982,7 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
* heap_sync - sync a heap, for use when no WAL has been written
*
* This forces the heap contents (including TOAST heap if any) down to disk.
- * If we skipped using WAL, and it's not a temp relation, we must force the
+ * If we skipped using WAL, and WAL is otherwise needed, we must force the
* relation down to disk before it's safe to commit the transaction. This
* requires writing out any dirty buffers and then doing a forced fsync.
*
@@ -4995,8 +4995,8 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
void
heap_sync(Relation rel)
{
- /* temp tables never need fsync */
- if (rel->rd_istemp)
+ /* non-WAL-logged tables never need fsync */
+ if (!RelationNeedsWAL(rel))
return;
/* main heap */
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b8c4027..40eadb8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -233,7 +233,7 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
/*
* Emit a WAL HEAP_CLEAN record showing what we did
*/
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
XLogRecPtr recptr;
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 19ca302..eb2dbff 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -277,8 +277,8 @@ end_heap_rewrite(RewriteState state)
}
/*
- * If the rel isn't temp, must fsync before commit. We use heap_sync to
- * ensure that the toast table gets fsync'd too.
+ * If the rel is WAL-logged, must fsync before commit. We use heap_sync
+ * to ensure that the toast table gets fsync'd too.
*
* It's obvious that we must do this when not WAL-logging. It's less
* obvious that we have to do it even if we did WAL-log the pages. The
@@ -287,7 +287,7 @@ end_heap_rewrite(RewriteState state)
* occurring during the rewriteheap operation won't have fsync'd data we
* wrote before the checkpoint.
*/
- if (!state->rs_new_rel->rd_istemp)
+ if (RelationNeedsWAL(state->rs_new_rel))
heap_sync(state->rs_new_rel);
/* Deleting the context frees everything */
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index eaad812..ee0f04c 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -766,7 +766,7 @@ _bt_insertonpg(Relation rel,
}
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_insert xlrec;
BlockNumber xldownlink;
@@ -1165,7 +1165,7 @@ _bt_split(Relation rel, Buffer buf, OffsetNumber firstright,
}
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_split xlrec;
uint8 xlinfo;
@@ -1914,7 +1914,7 @@ _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf)
MarkBufferDirty(metabuf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_newroot xlrec;
XLogRecPtr recptr;
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index e0c0f21..2b44780 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -224,7 +224,7 @@ _bt_getroot(Relation rel, int access)
MarkBufferDirty(metabuf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_newroot xlrec;
XLogRecPtr recptr;
@@ -452,7 +452,7 @@ _bt_checkpage(Relation rel, Buffer buf)
static void
_bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedXid)
{
- if (rel->rd_istemp)
+ if (!RelationNeedsWAL(rel))
return;
/* No ereport(ERROR) until changes are logged */
@@ -751,7 +751,7 @@ _bt_delitems_vacuum(Relation rel, Buffer buf,
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
XLogRecPtr recptr;
XLogRecData rdata[2];
@@ -829,7 +829,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
XLogRecPtr recptr;
XLogRecData rdata[3];
@@ -1365,7 +1365,7 @@ _bt_pagedel(Relation rel, Buffer buf, BTStack stack)
MarkBufferDirty(lbuf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_delete_page xlrec;
xl_btree_metadata xlmeta;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index a1d3aef..3fb43a2 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -211,9 +211,9 @@ _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2)
/*
* We need to log index creation in WAL iff WAL archiving/streaming is
- * enabled AND it's not a temp index.
+ * enabled UNLESS the index isn't WAL-logged anyway.
*/
- wstate.btws_use_wal = XLogIsNeeded() && !wstate.index->rd_istemp;
+ wstate.btws_use_wal = XLogIsNeeded() && RelationNeedsWAL(wstate.index);
/* reserve the metapage */
wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
@@ -797,9 +797,9 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
_bt_uppershutdown(wstate, state);
/*
- * If the index isn't temp, we must fsync it down to disk before it's safe
- * to commit the transaction. (For a temp index we don't care since the
- * index will be uninteresting after a crash anyway.)
+ * If the index is WAL-logged, we must fsync it down to disk before it's
+ * safe to commit the transaction. (For a non-WAL-logged index we don't
+ * care since the index will be uninteresting after a crash anyway.)
*
* It's obvious that we must do this when not WAL-logging the build. It's
* less obvious that we have to do it even if we did WAL-log the index
@@ -811,7 +811,7 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
* fsync those pages here, they might still not be on disk when the crash
* occurs.
*/
- if (!wstate->index->rd_istemp)
+ if (RelationNeedsWAL(wstate->index))
{
RelationOpenSmgr(wstate->index);
smgrimmedsync(wstate->index->rd_smgr, MAIN_FORKNUM);
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index e475403..73ef114 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -219,6 +219,7 @@ Boot_CreateStmt:
$3,
tupdesc,
RELKIND_RELATION,
+ RELPERSISTENCE_PERMANENT,
shared_relation,
mapped_relation,
true);
@@ -238,6 +239,7 @@ Boot_CreateStmt:
tupdesc,
NIL,
RELKIND_RELATION,
+ RELPERSISTENCE_PERMANENT,
shared_relation,
mapped_relation,
true,
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6322512..88b5c2a 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -524,12 +524,26 @@ GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
* created by bootstrap have preassigned OIDs, so there's no need.
*/
Oid
-GetNewRelFileNode(Oid reltablespace, Relation pg_class, BackendId backend)
+GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
{
RelFileNodeBackend rnode;
char *rpath;
int fd;
bool collides;
+ BackendId backend;
+
+ switch (relpersistence)
+ {
+ case RELPERSISTENCE_TEMP:
+ backend = MyBackendId;
+ break;
+ case RELPERSISTENCE_PERMANENT:
+ backend = InvalidBackendId;
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c", relpersistence);
+ return InvalidOid; /* placate compiler */
+ }
/* This logic should match RelationInitPhysicalAddr */
rnode.node.spcNode = reltablespace ? reltablespace : MyDatabaseTableSpace;
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9b7668c..bcf6caa 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -238,6 +238,7 @@ heap_create(const char *relname,
Oid relid,
TupleDesc tupDesc,
char relkind,
+ char relpersistence,
bool shared_relation,
bool mapped_relation,
bool allow_system_table_mods)
@@ -311,7 +312,8 @@ heap_create(const char *relname,
relid,
reltablespace,
shared_relation,
- mapped_relation);
+ mapped_relation,
+ relpersistence);
/*
* Have the storage manager create the relation's disk file, if needed.
@@ -322,7 +324,7 @@ heap_create(const char *relname,
if (create_storage)
{
RelationOpenSmgr(rel);
- RelationCreateStorage(rel->rd_node, rel->rd_istemp);
+ RelationCreateStorage(rel->rd_node, relpersistence);
}
return rel;
@@ -693,7 +695,7 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
- values[Anum_pg_class_relistemp - 1] = BoolGetDatum(rd_rel->relistemp);
+ values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
values[Anum_pg_class_relkind - 1] = CharGetDatum(rd_rel->relkind);
values[Anum_pg_class_relnatts - 1] = Int16GetDatum(rd_rel->relnatts);
values[Anum_pg_class_relchecks - 1] = Int16GetDatum(rd_rel->relchecks);
@@ -898,6 +900,7 @@ heap_create_with_catalog(const char *relname,
TupleDesc tupdesc,
List *cooked_constraints,
char relkind,
+ char relpersistence,
bool shared_relation,
bool mapped_relation,
bool oidislocal,
@@ -997,8 +1000,7 @@ heap_create_with_catalog(const char *relname,
}
else
relid = GetNewRelFileNode(reltablespace, pg_class_desc,
- isTempOrToastNamespace(relnamespace) ?
- MyBackendId : InvalidBackendId);
+ relpersistence);
}
/*
@@ -1036,6 +1038,7 @@ heap_create_with_catalog(const char *relname,
relid,
tupdesc,
relkind,
+ relpersistence,
shared_relation,
mapped_relation,
allow_system_table_mods);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b437c99..8fbe8eb 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -545,6 +545,7 @@ index_create(Oid heapRelationId,
bool is_exclusion;
Oid namespaceId;
int i;
+ char relpersistence;
is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
@@ -561,11 +562,13 @@ index_create(Oid heapRelationId,
/*
* The index will be in the same namespace as its parent table, and is
* shared across databases if and only if the parent is. Likewise, it
- * will use the relfilenode map if and only if the parent does.
+ * will use the relfilenode map if and only if the parent does; and it
+ * inherits the parent's relpersistence.
*/
namespaceId = RelationGetNamespace(heapRelation);
shared_relation = heapRelation->rd_rel->relisshared;
mapped_relation = RelationIsMapped(heapRelation);
+ relpersistence = heapRelation->rd_rel->relpersistence;
/*
* check parameters
@@ -646,9 +649,7 @@ index_create(Oid heapRelationId,
else
{
indexRelationId =
- GetNewRelFileNode(tableSpaceId, pg_class,
- heapRelation->rd_istemp ?
- MyBackendId : InvalidBackendId);
+ GetNewRelFileNode(tableSpaceId, pg_class, relpersistence);
}
}
@@ -663,6 +664,7 @@ index_create(Oid heapRelationId,
indexRelationId,
indexTupDesc,
RELKIND_INDEX,
+ relpersistence,
shared_relation,
mapped_relation,
allow_system_table_mods);
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 653c9ad..84cbfeb 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -235,14 +235,14 @@ RangeVarGetRelid(const RangeVar *relation, bool failOK)
}
/*
- * If istemp is set, this is a reference to a temp relation. The parser
- * never generates such a RangeVar in simple DML, but it can happen in
- * contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY KEY)". Such a
- * command will generate an added CREATE INDEX operation, which must be
+ * Some non-default relpersistence value may have been specified. The
+ * parser never generates such a RangeVar in simple DML, but it can happen
+ * in contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY KEY)". Such
+ * a command will generate an added CREATE INDEX operation, which must be
* careful to find the temp table, even when pg_temp is not first in the
* search path.
*/
- if (relation->istemp)
+ if (relation->relpersistence == RELPERSISTENCE_TEMP)
{
if (relation->schemaname)
ereport(ERROR,
@@ -308,7 +308,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
newRelation->relname)));
}
- if (newRelation->istemp)
+ if (newRelation->relpersistence == RELPERSISTENCE_TEMP)
{
/* TEMP tables are created in our backend-local temp namespace */
if (newRelation->schemaname)
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 0ce2051..671aaff 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -95,19 +95,35 @@ typedef struct xl_smgr_truncate
* transaction aborts later on, the storage will be destroyed.
*/
void
-RelationCreateStorage(RelFileNode rnode, bool istemp)
+RelationCreateStorage(RelFileNode rnode, char relpersistence)
{
PendingRelDelete *pending;
XLogRecPtr lsn;
XLogRecData rdata;
xl_smgr_create xlrec;
SMgrRelation srel;
- BackendId backend = istemp ? MyBackendId : InvalidBackendId;
+ BackendId backend;
+ bool needs_wal;
+
+ switch (relpersistence)
+ {
+ case RELPERSISTENCE_TEMP:
+ backend = MyBackendId;
+ needs_wal = false;
+ break;
+ case RELPERSISTENCE_PERMANENT:
+ backend = InvalidBackendId;
+ needs_wal = true;
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c", relpersistence);
+ return; /* placate compiler */
+ }
srel = smgropen(rnode, backend);
smgrcreate(srel, MAIN_FORKNUM, false);
- if (!istemp)
+ if (needs_wal)
{
/*
* Make an XLOG entry reporting the file creation.
@@ -253,7 +269,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
* failure to truncate, that might spell trouble at WAL replay, into a
* certain PANIC.
*/
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
/*
* Make an XLOG entry reporting the file truncation.
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7bf64e2..d1f6c9f 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -195,7 +195,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
* Toast tables for regular relations go in pg_toast; those for temp
* relations go into the per-backend temp-toast-table namespace.
*/
- if (rel->rd_backend == MyBackendId)
+ if (RelationUsesTempNamespace(rel))
namespaceid = GetTempToastNamespace();
else
namespaceid = PG_TOAST_NAMESPACE;
@@ -216,6 +216,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
tupdesc,
NIL,
RELKIND_TOASTVALUE,
+ rel->rd_rel->relpersistence,
shared_relation,
mapped_relation,
true,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index bb7cd74..9fdc471 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -675,6 +675,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace)
tupdesc,
NIL,
OldHeap->rd_rel->relkind,
+ OldHeap->rd_rel->relpersistence,
false,
RelationIsMapped(OldHeap),
true,
@@ -789,9 +790,9 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
/*
* We need to log the copied data in WAL iff WAL archiving/streaming is
- * enabled AND it's not a temp rel.
+ * enabled AND it's a WAL-logged rel.
*/
- use_wal = XLogIsNeeded() && !NewHeap->rd_istemp;
+ use_wal = XLogIsNeeded() && RelationNeedsWAL(NewHeap);
/* use_wal off requires smgr_targblock be initially invalid */
Assert(RelationGetTargetBlock(NewHeap) == InvalidBlockNumber);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 9407d0f..0940893 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -222,7 +222,7 @@ DefineIndex(RangeVar *heapRelation,
}
else
{
- tablespaceId = GetDefaultTablespace(rel->rd_istemp);
+ tablespaceId = GetDefaultTablespace(rel->rd_rel->relpersistence);
/* note InvalidOid is OK in this case */
}
@@ -1706,7 +1706,7 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
continue;
/* Skip temp tables of other backends; we can't reindex them at all */
- if (classtuple->relistemp &&
+ if (classtuple->relpersistence == RELPERSISTENCE_TEMP &&
!isTempNamespace(classtuple->relnamespace))
continue;
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index bb8ebce..e1df5fb 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -366,7 +366,7 @@ fill_seq_with_data(Relation rel, HeapTuple tuple)
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
@@ -448,7 +448,7 @@ AlterSequence(AlterSeqStmt *stmt)
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!seqrel->rd_istemp)
+ if (RelationNeedsWAL(seqrel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
@@ -678,7 +678,7 @@ nextval_internal(Oid relid)
MarkBufferDirty(buf);
/* XLOG stuff */
- if (logit && !seqrel->rd_istemp)
+ if (logit && RelationNeedsWAL(seqrel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
@@ -855,7 +855,7 @@ do_setval(Oid relid, int64 next, bool iscalled)
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!seqrel->rd_istemp)
+ if (RelationNeedsWAL(seqrel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 937992b..31884c6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -224,7 +224,7 @@ static const struct dropmsgstrings dropmsgstringarray[] = {
static void truncate_check_rel(Relation rel);
-static List *MergeAttributes(List *schema, List *supers, bool istemp,
+static List *MergeAttributes(List *schema, List *supers, char relpersistence,
List **supOids, List **supconstr, int *supOidCount);
static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
static bool change_varattnos_walker(Node *node, const AttrNumber *newattno);
@@ -339,7 +339,7 @@ static void ATPrepAddInherit(Relation child_rel);
static void ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode);
static void ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode);
static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
- ForkNumber forkNum, bool istemp);
+ ForkNumber forkNum, char relpersistence);
static const char *storage_name(char c);
@@ -391,7 +391,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
/*
* Check consistency of arguments
*/
- if (stmt->oncommit != ONCOMMIT_NOOP && !stmt->relation->istemp)
+ if (stmt->oncommit != ONCOMMIT_NOOP
+ && stmt->relation->relpersistence != RELPERSISTENCE_TEMP)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("ON COMMIT can only be used on temporary tables")));
@@ -401,7 +402,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
* code. This is needed because calling code might not expect untrusted
* tables to appear in pg_temp at the front of its search path.
*/
- if (stmt->relation->istemp && InSecurityRestrictedOperation())
+ if (stmt->relation->relpersistence == RELPERSISTENCE_TEMP
+ && InSecurityRestrictedOperation())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("cannot create temporary table within security-restricted operation")));
@@ -434,7 +436,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
}
else
{
- tablespaceId = GetDefaultTablespace(stmt->relation->istemp);
+ tablespaceId = GetDefaultTablespace(stmt->relation->relpersistence);
/* note InvalidOid is OK in this case */
}
@@ -478,7 +480,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
* inherited attributes.
*/
schema = MergeAttributes(schema, stmt->inhRelations,
- stmt->relation->istemp,
+ stmt->relation->relpersistence,
&inheritOids, &old_constraints, &parentOidCount);
/*
@@ -557,6 +559,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
list_concat(cookedDefaults,
old_constraints),
relkind,
+ stmt->relation->relpersistence,
false,
false,
localHasOids,
@@ -1208,7 +1211,7 @@ storage_name(char c)
*----------
*/
static List *
-MergeAttributes(List *schema, List *supers, bool istemp,
+MergeAttributes(List *schema, List *supers, char relpersistence,
List **supOids, List **supconstr, int *supOidCount)
{
ListCell *entry;
@@ -1316,7 +1319,8 @@ MergeAttributes(List *schema, List *supers, bool istemp,
errmsg("inherited relation \"%s\" is not a table",
parent->relname)));
/* Permanent rels cannot inherit from temporary ones */
- if (!istemp && relation->rd_istemp)
+ if (relpersistence != RELPERSISTENCE_TEMP
+ && RelationUsesTempNamespace(relation))
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot inherit from temporary relation \"%s\"",
@@ -5124,23 +5128,23 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
RelationGetRelationName(pkrel))));
/*
- * Disallow reference from permanent table to temp table or vice versa.
- * (The ban on perm->temp is for fairly obvious reasons. The ban on
- * temp->perm is because other backends might need to run the RI triggers
- * on the perm table, but they can't reliably see tuples the owning
- * backend has created in the temp table, because non-shared buffers are
- * used for temp tables.)
+ * References from permanent tables to temp tables are disallowed because
+ * the contents of the temp table disappear at the end of each session.
+ * References from temp tables to permanent tables are also disallowed,
+ * because other backends might need to run the RI triggers on the perm
+ * table, but they can't reliably see tuples in the local buffers of other
+ * backends.
*/
- if (pkrel->rd_istemp)
+ if (RelationUsesLocalBuffers(pkrel))
{
- if (!rel->rd_istemp)
+ if (!RelationUsesLocalBuffers(rel))
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("cannot reference temporary table from permanent table constraint")));
}
else
{
- if (rel->rd_istemp)
+ if (RelationUsesLocalBuffers(rel))
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("cannot reference permanent table from temporary table constraint")));
@@ -7347,7 +7351,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
* Relfilenodes are not unique across tablespaces, so we need to allocate
* a new one in the new tablespace.
*/
- newrelfilenode = GetNewRelFileNode(newTableSpace, NULL, rel->rd_backend);
+ newrelfilenode = GetNewRelFileNode(newTableSpace, NULL,
+ rel->rd_rel->relpersistence);
/* Open old and new relation */
newrnode = rel->rd_node;
@@ -7364,10 +7369,11 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
* NOTE: any conflict in relfilenode value will be caught in
* RelationCreateStorage().
*/
- RelationCreateStorage(newrnode, rel->rd_istemp);
+ RelationCreateStorage(newrnode, rel->rd_rel->relpersistence);
/* copy main fork */
- copy_relation_data(rel->rd_smgr, dstrel, MAIN_FORKNUM, rel->rd_istemp);
+ copy_relation_data(rel->rd_smgr, dstrel, MAIN_FORKNUM,
+ rel->rd_rel->relpersistence);
/* copy those extra forks that exist */
for (forkNum = MAIN_FORKNUM + 1; forkNum <= MAX_FORKNUM; forkNum++)
@@ -7375,7 +7381,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
if (smgrexists(rel->rd_smgr, forkNum))
{
smgrcreate(dstrel, forkNum, false);
- copy_relation_data(rel->rd_smgr, dstrel, forkNum, rel->rd_istemp);
+ copy_relation_data(rel->rd_smgr, dstrel, forkNum,
+ rel->rd_rel->relpersistence);
}
}
@@ -7410,7 +7417,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
*/
static void
copy_relation_data(SMgrRelation src, SMgrRelation dst,
- ForkNumber forkNum, bool istemp)
+ ForkNumber forkNum, char relpersistence)
{
char *buf;
Page page;
@@ -7429,9 +7436,9 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
/*
* We need to log the copied data in WAL iff WAL archiving/streaming is
- * enabled AND it's not a temp rel.
+ * enabled AND it's a permanent relation.
*/
- use_wal = XLogIsNeeded() && !istemp;
+ use_wal = XLogIsNeeded() && relpersistence == RELPERSISTENCE_PERMANENT;
nblocks = smgrnblocks(src, forkNum);
@@ -7470,7 +7477,7 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
* wouldn't replay our earlier WAL entries. If we do not fsync those pages
* here, they might still not be on disk when the crash occurs.
*/
- if (!istemp)
+ if (relpersistence == RELPERSISTENCE_PERMANENT)
smgrimmedsync(dst, forkNum);
}
@@ -7538,7 +7545,8 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
ATSimplePermissions(parent_rel, false, false);
/* Permanent rels cannot inherit from temporary ones */
- if (parent_rel->rd_istemp && !child_rel->rd_istemp)
+ if (RelationUsesTempNamespace(parent_rel)
+ && !RelationUsesTempNamespace(child_rel))
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot inherit from temporary relation \"%s\"",
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 5ba0f1c..227b4b0 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -1050,8 +1050,8 @@ assign_default_tablespace(const char *newval, bool doit, GucSource source)
/*
* GetDefaultTablespace -- get the OID of the current default tablespace
*
- * Regular objects and temporary objects have different default tablespaces,
- * hence the forTemp parameter must be specified.
+ * Temporary objects have different default tablespaces, hence the
+ * relpersistence parameter must be specified.
*
* May return InvalidOid to indicate "use the database's default tablespace".
*
@@ -1062,12 +1062,12 @@ assign_default_tablespace(const char *newval, bool doit, GucSource source)
* default_tablespace GUC variable.
*/
Oid
-GetDefaultTablespace(bool forTemp)
+GetDefaultTablespace(char relpersistence)
{
Oid result;
/* The temp-table case is handled elsewhere */
- if (forTemp)
+ if (relpersistence == RELPERSISTENCE_TEMP)
{
PrepareTempTablespaces();
return GetNextTempTableSpace();
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 0ac993f..cbdf97d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -268,10 +268,10 @@ static void
vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
{
/*
- * No need to log changes for temp tables, they do not contain data
- * visible on the standby server.
+ * Skip this for relations for which no WAL is to be written, or if we're
+ * not trying to support archive recovery.
*/
- if (rel->rd_istemp || !XLogIsNeeded())
+ if (!RelationNeedsWAL(rel) || !XLogIsNeeded())
return;
/*
@@ -664,8 +664,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
if (nfrozen > 0)
{
MarkBufferDirty(buf);
- /* no XLOG for temp tables, though */
- if (!onerel->rd_istemp)
+ if (RelationNeedsWAL(onerel))
{
XLogRecPtr recptr;
@@ -895,7 +894,7 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!onerel->rd_istemp)
+ if (RelationNeedsWAL(onerel))
{
XLogRecPtr recptr;
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index 09ab24b..2b2b908 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -68,10 +68,10 @@ isViewOnTempTable_walker(Node *node, void *context)
if (rte->rtekind == RTE_RELATION)
{
Relation rel = heap_open(rte->relid, AccessShareLock);
- bool istemp = rel->rd_istemp;
+ char relpersistence = rel->rd_rel->relpersistence;
heap_close(rel, AccessShareLock);
- if (istemp)
+ if (relpersistence == RELPERSISTENCE_TEMP)
return true;
}
}
@@ -173,9 +173,9 @@ DefineVirtualRelation(const RangeVar *relation, List *tlist, bool replace)
/*
* Due to the namespace visibility rules for temporary objects, we
* should only end up replacing a temporary view with another
- * temporary view, and vice versa.
+ * temporary view, and similarly for permanent views.
*/
- Assert(relation->istemp == rel->rd_istemp);
+ Assert(relation->relpersistence == rel->rd_rel->relpersistence);
/*
* Create a tuple descriptor to compare against the existing view, and
@@ -454,10 +454,11 @@ DefineView(ViewStmt *stmt, const char *queryString)
* schema name.
*/
view = stmt->view;
- if (!view->istemp && isViewOnTempTable(viewParse))
+ if (view->relpersistence == RELPERSISTENCE_PERMANENT
+ && isViewOnTempTable(viewParse))
{
view = copyObject(view); /* don't corrupt original command */
- view->istemp = true;
+ view->relpersistence = RELPERSISTENCE_TEMP;
ereport(NOTICE,
(errmsg("view \"%s\" will be a temporary view",
view->relname)));
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 69f3a28..c4719f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2131,7 +2131,8 @@ OpenIntoRel(QueryDesc *queryDesc)
/*
* Check consistency of arguments
*/
- if (into->onCommit != ONCOMMIT_NOOP && !into->rel->istemp)
+ if (into->onCommit != ONCOMMIT_NOOP
+ && into->rel->relpersistence != RELPERSISTENCE_TEMP)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("ON COMMIT can only be used on temporary tables")));
@@ -2141,7 +2142,8 @@ OpenIntoRel(QueryDesc *queryDesc)
* code. This is needed because calling code might not expect untrusted
* tables to appear in pg_temp at the front of its search path.
*/
- if (into->rel->istemp && InSecurityRestrictedOperation())
+ if (into->rel->relpersistence == RELPERSISTENCE_TEMP
+ && InSecurityRestrictedOperation())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("cannot create temporary table within security-restricted operation")));
@@ -2168,7 +2170,7 @@ OpenIntoRel(QueryDesc *queryDesc)
}
else
{
- tablespaceId = GetDefaultTablespace(into->rel->istemp);
+ tablespaceId = GetDefaultTablespace(into->rel->relpersistence);
/* note InvalidOid is OK in this case */
}
@@ -2208,6 +2210,7 @@ OpenIntoRel(QueryDesc *queryDesc)
tupdesc,
NIL,
RELKIND_RELATION,
+ into->rel->relpersistence,
false,
false,
true,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 0e0b4dc..c60821a 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -955,7 +955,7 @@ _copyRangeVar(RangeVar *from)
COPY_STRING_FIELD(schemaname);
COPY_STRING_FIELD(relname);
COPY_SCALAR_FIELD(inhOpt);
- COPY_SCALAR_FIELD(istemp);
+ COPY_SCALAR_FIELD(relpersistence);
COPY_NODE_FIELD(alias);
COPY_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 2d2b8c7..85cded0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -104,7 +104,7 @@ _equalRangeVar(RangeVar *a, RangeVar *b)
COMPARE_STRING_FIELD(schemaname);
COMPARE_STRING_FIELD(relname);
COMPARE_SCALAR_FIELD(inhOpt);
- COMPARE_SCALAR_FIELD(istemp);
+ COMPARE_SCALAR_FIELD(relpersistence);
COMPARE_NODE_FIELD(alias);
COMPARE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 4b268f3..f06f73b 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -15,6 +15,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -378,7 +379,7 @@ makeRangeVar(char *schemaname, char *relname, int location)
r->schemaname = schemaname;
r->relname = relname;
r->inhOpt = INH_DEFAULT;
- r->istemp = false;
+ r->relpersistence = RELPERSISTENCE_PERMANENT;
r->alias = NULL;
r->location = location;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index afbfcca..8d6051a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -839,7 +839,7 @@ _outRangeVar(StringInfo str, RangeVar *node)
WRITE_STRING_FIELD(schemaname);
WRITE_STRING_FIELD(relname);
WRITE_ENUM_FIELD(inhOpt, InhOption);
- WRITE_BOOL_FIELD(istemp);
+ WRITE_CHAR_FIELD(relpersistence);
WRITE_NODE_FIELD(alias);
WRITE_LOCATION_FIELD(location);
}
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 2166a5d..933d58a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -373,7 +373,7 @@ _readRangeVar(void)
READ_STRING_FIELD(schemaname);
READ_STRING_FIELD(relname);
READ_ENUM_FIELD(inhOpt, InhOption);
- READ_BOOL_FIELD(istemp);
+ READ_CHAR_FIELD(relpersistence);
READ_NODE_FIELD(alias);
READ_LOCATION_FIELD(location);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 9ec75f7..8fc79b6 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -311,7 +311,8 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
%type <fun_param_mode> arg_class
%type <typnam> func_return func_type
-%type <boolean> OptTemp opt_trusted opt_restart_seqs
+%type <boolean> opt_trusted opt_restart_seqs
+%type <ival> OptTemp
%type <oncommit> OnCommitOption
%type <node> for_locking_item
@@ -2280,7 +2281,7 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
OptInherit OptWith OnCommitOption OptTableSpace
{
CreateStmt *n = makeNode(CreateStmt);
- $4->istemp = $2;
+ $4->relpersistence = $2;
n->relation = $4;
n->tableElts = $6;
n->inhRelations = $8;
@@ -2296,7 +2297,7 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
OptTableSpace
{
CreateStmt *n = makeNode(CreateStmt);
- $7->istemp = $2;
+ $7->relpersistence = $2;
n->relation = $7;
n->tableElts = $9;
n->inhRelations = $11;
@@ -2311,7 +2312,7 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
OptTypedTableElementList OptWith OnCommitOption OptTableSpace
{
CreateStmt *n = makeNode(CreateStmt);
- $4->istemp = $2;
+ $4->relpersistence = $2;
n->relation = $4;
n->tableElts = $7;
n->ofTypename = makeTypeNameFromNameList($6);
@@ -2327,7 +2328,7 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
OptTypedTableElementList OptWith OnCommitOption OptTableSpace
{
CreateStmt *n = makeNode(CreateStmt);
- $7->istemp = $2;
+ $7->relpersistence = $2;
n->relation = $7;
n->tableElts = $10;
n->ofTypename = makeTypeNameFromNameList($9);
@@ -2348,13 +2349,13 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
* NOTE: we accept both GLOBAL and LOCAL options; since we have no modules
* the LOCAL keyword is really meaningless.
*/
-OptTemp: TEMPORARY { $$ = TRUE; }
- | TEMP { $$ = TRUE; }
- | LOCAL TEMPORARY { $$ = TRUE; }
- | LOCAL TEMP { $$ = TRUE; }
- | GLOBAL TEMPORARY { $$ = TRUE; }
- | GLOBAL TEMP { $$ = TRUE; }
- | /*EMPTY*/ { $$ = FALSE; }
+OptTemp: TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
+ | TEMP { $$ = RELPERSISTENCE_TEMP; }
+ | LOCAL TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
+ | LOCAL TEMP { $$ = RELPERSISTENCE_TEMP; }
+ | GLOBAL TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
+ | GLOBAL TEMP { $$ = RELPERSISTENCE_TEMP; }
+ | /*EMPTY*/ { $$ = RELPERSISTENCE_PERMANENT; }
;
OptTableElementList:
@@ -2834,7 +2835,7 @@ CreateAsStmt:
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("CREATE TABLE AS cannot specify INTO"),
parser_errposition(exprLocation((Node *) n->intoClause))));
- $4->rel->istemp = $2;
+ $4->rel->relpersistence = $2;
n->intoClause = $4;
/* Implement WITH NO DATA by forcing top-level LIMIT 0 */
if (!$7)
@@ -2900,7 +2901,7 @@ CreateSeqStmt:
CREATE OptTemp SEQUENCE qualified_name OptSeqOptList
{
CreateSeqStmt *n = makeNode(CreateSeqStmt);
- $4->istemp = $2;
+ $4->relpersistence = $2;
n->sequence = $4;
n->options = $5;
n->ownerId = InvalidOid;
@@ -6621,7 +6622,7 @@ ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list
{
ViewStmt *n = makeNode(ViewStmt);
n->view = $4;
- n->view->istemp = $2;
+ n->view->relpersistence = $2;
n->aliases = $5;
n->query = $7;
n->replace = false;
@@ -6632,7 +6633,7 @@ ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list
{
ViewStmt *n = makeNode(ViewStmt);
n->view = $6;
- n->view->istemp = $4;
+ n->view->relpersistence = $4;
n->aliases = $7;
n->query = $9;
n->replace = true;
@@ -7328,7 +7329,7 @@ ExecuteStmt: EXECUTE name execute_param_clause
ExecuteStmt *n = makeNode(ExecuteStmt);
n->name = $7;
n->params = $8;
- $4->rel->istemp = $2;
+ $4->rel->relpersistence = $2;
n->into = $4;
if ($4->colNames)
ereport(ERROR,
@@ -7889,42 +7890,42 @@ OptTempTableName:
TEMPORARY opt_table qualified_name
{
$$ = $3;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| TEMP opt_table qualified_name
{
$$ = $3;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| LOCAL TEMPORARY opt_table qualified_name
{
$$ = $4;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| LOCAL TEMP opt_table qualified_name
{
$$ = $4;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| GLOBAL TEMPORARY opt_table qualified_name
{
$$ = $4;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| GLOBAL TEMP opt_table qualified_name
{
$$ = $4;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| TABLE qualified_name
{
$$ = $2;
- $$->istemp = false;
+ $$->relpersistence = RELPERSISTENCE_PERMANENT;
}
| qualified_name
{
$$ = $1;
- $$->istemp = false;
+ $$->relpersistence = RELPERSISTENCE_PERMANENT;
}
;
@@ -10916,16 +10917,12 @@ qualified_name_list:
qualified_name:
ColId
{
- $$ = makeNode(RangeVar);
- $$->catalogname = NULL;
- $$->schemaname = NULL;
- $$->relname = $1;
- $$->location = @1;
+ $$ = makeRangeVar(NULL, $1, @1);
}
| ColId indirection
{
check_qualified_name($2, yyscanner);
- $$ = makeNode(RangeVar);
+ $$ = makeRangeVar(NULL, NULL, @1);
switch (list_length($2))
{
case 1:
@@ -10946,7 +10943,6 @@ qualified_name:
parser_errposition(@1)));
break;
}
- $$->location = @1;
}
;
@@ -12163,6 +12159,7 @@ makeRangeVarFromAnyName(List *names, int position, core_yyscan_t yyscanner)
break;
}
+ r->relpersistence = RELPERSISTENCE_PERMANENT;
r->location = position;
return r;
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index a8aee20..aa7c144 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -158,10 +158,11 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
* If the target relation name isn't schema-qualified, make it so. This
* prevents some corner cases in which added-on rewritten commands might
* think they should apply to other relations that have the same name and
- * are earlier in the search path. "istemp" is equivalent to a
- * specification of pg_temp, so no need for anything extra in that case.
+ * are earlier in the search path. But a local temp table is effectively
+ * specified to be in pg_temp, so no need for anything extra in that case.
*/
- if (stmt->relation->schemaname == NULL && !stmt->relation->istemp)
+ if (stmt->relation->schemaname == NULL
+ && stmt->relation->relpersistence != RELPERSISTENCE_TEMP)
{
Oid namespaceid = RangeVarGetCreationNamespace(stmt->relation);
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index c7d704d..89b2540 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1975,7 +1975,7 @@ do_autovacuum(void)
* Check if it is a temp table (presumably, of some other backend's).
* We cannot safely process other backends' temp tables.
*/
- if (classForm->relistemp)
+ if (classForm->relpersistence == RELPERSISTENCE_TEMP)
{
int backendID;
@@ -2072,7 +2072,7 @@ do_autovacuum(void)
/*
* We cannot safely process other backends' temp tables, so skip 'em.
*/
- if (classForm->relistemp)
+ if (classForm->relpersistence == RELPERSISTENCE_TEMP)
continue;
relid = HeapTupleGetOid(tuple);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index edc4977..860e736 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -123,7 +123,7 @@ PrefetchBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum)
/* Open it at the smgr level if not already done */
RelationOpenSmgr(reln);
- if (reln->rd_istemp)
+ if (RelationUsesLocalBuffers(reln))
{
/* see comments in ReadBufferExtended */
if (RELATION_IS_OTHER_TEMP(reln))
@@ -2071,7 +2071,7 @@ FlushRelationBuffers(Relation rel)
/* Open rel at the smgr level if not already done */
RelationOpenSmgr(rel);
- if (rel->rd_istemp)
+ if (RelationUsesLocalBuffers(rel))
{
for (i = 0; i < NLocBuffer; i++)
{
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index f5250a2..e352cda 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -612,16 +612,26 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
PG_RETURN_NULL();
}
- /* If temporary, determine owning backend. */
- if (!relform->relistemp)
- backend = InvalidBackendId;
- else if (isTempOrToastNamespace(relform->relnamespace))
- backend = MyBackendId;
- else
+ /* Determine owning backend. */
+ switch (relform->relpersistence)
{
- /* Do it the hard way. */
- backend = GetTempNamespaceBackendId(relform->relnamespace);
- Assert(backend != InvalidBackendId);
+ case RELPERSISTENCE_PERMANENT:
+ backend = InvalidBackendId;
+ break;
+ case RELPERSISTENCE_TEMP:
+ if (isTempOrToastNamespace(relform->relnamespace))
+ backend = MyBackendId;
+ else
+ {
+ /* Do it the hard way. */
+ backend = GetTempNamespaceBackendId(relform->relnamespace);
+ Assert(backend != InvalidBackendId);
+ }
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c", relform->relpersistence);
+ backend = InvalidBackendId; /* placate compiler */
+ break;
}
ReleaseSysCache(tuple);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 8df12a1..1509686 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -849,20 +849,30 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
relation->rd_isnailed = false;
relation->rd_createSubid = InvalidSubTransactionId;
relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
- relation->rd_istemp = relation->rd_rel->relistemp;
- if (!relation->rd_istemp)
- relation->rd_backend = InvalidBackendId;
- else if (isTempOrToastNamespace(relation->rd_rel->relnamespace))
- relation->rd_backend = MyBackendId;
- else
+ switch (relation->rd_rel->relpersistence)
{
- /*
- * If it's a temporary table, but not one of ours, we have to use
- * the slow, grotty method to figure out the owning backend.
- */
- relation->rd_backend =
- GetTempNamespaceBackendId(relation->rd_rel->relnamespace);
- Assert(relation->rd_backend != InvalidBackendId);
+ case RELPERSISTENCE_PERMANENT:
+ relation->rd_backend = InvalidBackendId;
+ break;
+ case RELPERSISTENCE_TEMP:
+ if (isTempOrToastNamespace(relation->rd_rel->relnamespace))
+ relation->rd_backend = MyBackendId;
+ else
+ {
+ /*
+ * If it's a local temp table, but not one of ours, we have to
+ * use the slow, grotty method to figure out the owning
+ * backend.
+ */
+ relation->rd_backend =
+ GetTempNamespaceBackendId(relation->rd_rel->relnamespace);
+ Assert(relation->rd_backend != InvalidBackendId);
+ }
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c",
+ relation->rd_rel->relpersistence);
+ break;
}
/*
@@ -1358,7 +1368,6 @@ formrdesc(const char *relationName, Oid relationReltype,
relation->rd_isnailed = true;
relation->rd_createSubid = InvalidSubTransactionId;
relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
- relation->rd_istemp = false;
relation->rd_backend = InvalidBackendId;
/*
@@ -1384,11 +1393,8 @@ formrdesc(const char *relationName, Oid relationReltype,
if (isshared)
relation->rd_rel->reltablespace = GLOBALTABLESPACE_OID;
- /*
- * Likewise, we must know if a relation is temp ... but formrdesc is not
- * used for any temp relations.
- */
- relation->rd_rel->relistemp = false;
+ /* formrdesc is used only for permanent relations */
+ relation->rd_rel->relpersistence = RELPERSISTENCE_PERMANENT;
relation->rd_rel->relpages = 1;
relation->rd_rel->reltuples = 1;
@@ -2366,7 +2372,8 @@ RelationBuildLocalRelation(const char *relname,
Oid relid,
Oid reltablespace,
bool shared_relation,
- bool mapped_relation)
+ bool mapped_relation,
+ char relpersistence)
{
Relation rel;
MemoryContext oldcxt;
@@ -2440,10 +2447,6 @@ RelationBuildLocalRelation(const char *relname,
/* must flag that we have rels created in this transaction */
need_eoxact_work = true;
- /* it is temporary if and only if it is in my temp-table namespace */
- rel->rd_istemp = isTempOrToastNamespace(relnamespace);
- rel->rd_backend = rel->rd_istemp ? MyBackendId : InvalidBackendId;
-
/*
* create a new tuple descriptor from the one passed in. We do this
* partly to copy it into the cache context, and partly because the new
@@ -2483,6 +2486,21 @@ RelationBuildLocalRelation(const char *relname,
/* needed when bootstrapping: */
rel->rd_rel->relowner = BOOTSTRAP_SUPERUSERID;
+ /* set up persistence; rd_backend is a function of persistence type */
+ rel->rd_rel->relpersistence = relpersistence;
+ switch (relpersistence)
+ {
+ case RELPERSISTENCE_PERMANENT:
+ rel->rd_backend = InvalidBackendId;
+ break;
+ case RELPERSISTENCE_TEMP:
+ rel->rd_backend = MyBackendId;
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c", relpersistence);
+ break;
+ }
+
/*
* Insert relation physical and logical identifiers (OIDs) into the right
* places. Note that the physical ID (relfilenode) is initially the same
@@ -2491,7 +2509,6 @@ RelationBuildLocalRelation(const char *relname,
* map.
*/
rel->rd_rel->relisshared = shared_relation;
- rel->rd_rel->relistemp = rel->rd_istemp;
RelationGetRelid(rel) = relid;
@@ -2569,7 +2586,7 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
/* Allocate a new relfilenode */
newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
- relation->rd_backend);
+ relation->rd_rel->relpersistence);
/*
* Get a writable copy of the pg_class tuple for the given relation.
@@ -2592,7 +2609,7 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
newrnode.node = relation->rd_node;
newrnode.node.relNode = newrelfilenode;
newrnode.backend = relation->rd_backend;
- RelationCreateStorage(newrnode.node, relation->rd_istemp);
+ RelationCreateStorage(newrnode.node, relation->rd_rel->relpersistence);
smgrclosenode(newrnode);
/*
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 97c808b..56dcdd5 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -56,6 +56,6 @@ extern Oid GetNewOid(Relation relation);
extern Oid GetNewOidWithIndex(Relation relation, Oid indexId,
AttrNumber oidcolumn);
extern Oid GetNewRelFileNode(Oid reltablespace, Relation pg_class,
- BackendId backend);
+ char relpersistence);
#endif /* CATALOG_H */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 7795bda..646ab9c 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -40,6 +40,7 @@ extern Relation heap_create(const char *relname,
Oid relid,
TupleDesc tupDesc,
char relkind,
+ char relpersistence,
bool shared_relation,
bool mapped_relation,
bool allow_system_table_mods);
@@ -54,6 +55,7 @@ extern Oid heap_create_with_catalog(const char *relname,
TupleDesc tupdesc,
List *cooked_constraints,
char relkind,
+ char relpersistence,
bool shared_relation,
bool mapped_relation,
bool oidislocal,
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index f50cf9d..1edbfe3 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -49,7 +49,7 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
- bool relistemp; /* T if temporary relation */
+ char relpersistence; /* see RELPERSISTENCE_xxx constants */
char relkind; /* see RELKIND_xxx constants below */
int2 relnatts; /* number of user attributes */
@@ -108,7 +108,7 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltoastidxid 12
#define Anum_pg_class_relhasindex 13
#define Anum_pg_class_relisshared 14
-#define Anum_pg_class_relistemp 15
+#define Anum_pg_class_relpersistence 15
#define Anum_pg_class_relkind 16
#define Anum_pg_class_relnatts 17
#define Anum_pg_class_relchecks 18
@@ -132,13 +132,13 @@ typedef FormData_pg_class *Form_pg_class;
*/
/* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId */
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f f r 28 0 t f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f f 3 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f f r 19 0 f f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 19 0 f f f f f f 3 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f f r 25 0 t f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 25 0 t f f f f f 3 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f f r 27 0 t f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f f 3 _null_ _null_ ));
DESCR("");
#define RELKIND_INDEX 'i' /* secondary index */
@@ -149,4 +149,7 @@ DESCR("");
#define RELKIND_VIEW 'v' /* view */
#define RELKIND_COMPOSITE_TYPE 'c' /* composite type */
+#define RELPERSISTENCE_PERMANENT 'p'
+#define RELPERSISTENCE_TEMP 't'
+
#endif /* PG_CLASS_H */
diff --git a/src/include/catalog/storage.h b/src/include/catalog/storage.h
index d7b8731..f086b1c 100644
--- a/src/include/catalog/storage.h
+++ b/src/include/catalog/storage.h
@@ -20,7 +20,7 @@
#include "storage/relfilenode.h"
#include "utils/relcache.h"
-extern void RelationCreateStorage(RelFileNode rnode, bool istemp);
+extern void RelationCreateStorage(RelFileNode rnode, char relpersistence);
extern void RelationDropStorage(Relation rel);
extern void RelationPreserveStorage(RelFileNode rnode);
extern void RelationTruncate(Relation rel, BlockNumber nblocks);
diff --git a/src/include/commands/tablespace.h b/src/include/commands/tablespace.h
index 327fbc6..1e3f6ca 100644
--- a/src/include/commands/tablespace.h
+++ b/src/include/commands/tablespace.h
@@ -47,7 +47,7 @@ extern void AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt);
extern void TablespaceCreateDbspace(Oid spcNode, Oid dbNode, bool isRedo);
-extern Oid GetDefaultTablespace(bool forTemp);
+extern Oid GetDefaultTablespace(char relpersistence);
extern void PrepareTempTablespaces(void);
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index b17adf2..ba5ae37 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -74,7 +74,7 @@ typedef struct RangeVar
char *relname; /* the relation/sequence name */
InhOption inhOpt; /* expand rel by inheritance? recursively act
* on children? */
- bool istemp; /* is this a temp relation/sequence? */
+ char relpersistence; /* see RELPERSISTENCE_* in pg_class.h */
Alias *alias; /* table alias & optional column aliases */
int location; /* token location, or -1 if unknown */
} RangeVar;
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 39e0365..88a3168 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -132,7 +132,6 @@ typedef struct RelationData
struct SMgrRelationData *rd_smgr; /* cached file handle, or NULL */
int rd_refcnt; /* reference count */
BackendId rd_backend; /* owning backend id, if temporary relation */
- bool rd_istemp; /* rel is a temporary relation */
bool rd_isnailed; /* rel is nailed in cache */
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
@@ -390,6 +389,27 @@ typedef struct StdRdOptions
} while (0)
/*
+ * RelationNeedsWAL
+ * True if relation needs WAL.
+ */
+#define RelationNeedsWAL(relation) \
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
+
+/*
+ * RelationUsesLocalBuffers
+ * True if relation's pages are stored in local buffers.
+ */
+#define RelationUsesLocalBuffers(relation) \
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
+
+/*
+ * RelationUsesTempNamespace
+ * True if relation's catalog entries live in a private namespace.
+ */
+#define RelationUsesTempNamespace(relation) \
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
+
+/*
* RELATION_IS_LOCAL
* If a rel is either temp or newly created in the current transaction,
* it can be assumed to be visible only to the current backend.
@@ -407,7 +427,8 @@ typedef struct StdRdOptions
* Beware of multiple eval of argument
*/
#define RELATION_IS_OTHER_TEMP(relation) \
- ((relation)->rd_istemp && (relation)->rd_backend != MyBackendId)
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP \
+ && (relation)->rd_backend != MyBackendId)
/* routines in utils/cache/relcache.c */
extern void RelationIncrementReferenceCount(Relation rel);
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 10d82d4..3500050 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -69,7 +69,8 @@ extern Relation RelationBuildLocalRelation(const char *relname,
Oid relid,
Oid reltablespace,
bool shared_relation,
- bool mapped_relation);
+ bool mapped_relation,
+ char relpersistence);
/*
* Routine to manage assignment of new relfilenode to a relation
Attachment: relax-sync-commit-v1.patch
commit bdd697e5f0a16db2a672e5e14d11744958364101
Author: Robert Haas <rhaas@postgresql.org>
Date: Sat Nov 13 09:52:11 2010 -0500
Assume synchronous_commit=off for transactions that don't write WAL.
This is advantageous for transactions that write only to temporary or
unlogged tables, where loss of the transaction commit record is not
critical.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index d2e2e11..088daa0 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -907,6 +907,7 @@ RecordTransactionCommit(void)
int nmsgs = 0;
SharedInvalidationMessage *invalMessages = NULL;
bool RelcacheInitFileInval = false;
+ bool wrote_xlog;
/* Get data needed for commit record */
nrels = smgrGetPendingDeletes(true, &rels);
@@ -914,6 +915,7 @@ RecordTransactionCommit(void)
if (XLogStandbyInfoActive())
nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
&RelcacheInitFileInval);
+ wrote_xlog = (XactLastRecEnd.xrecoff != 0);
/*
* If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -940,7 +942,7 @@ RecordTransactionCommit(void)
* assigned is a sequence advance record due to nextval() --- we want
* to flush that to disk before reporting commit.)
*/
- if (XactLastRecEnd.xrecoff == 0)
+ if (!wrote_xlog)
goto cleanup;
}
else
@@ -1028,16 +1030,21 @@ RecordTransactionCommit(void)
}
/*
- * Check if we want to commit asynchronously. If the user has set
- * synchronous_commit = off, and we're not doing cleanup of any non-temp
- * rels nor committing any command that wanted to force sync commit, then
- * we can defer flushing XLOG. (We must not allow asynchronous commit if
- * there are any non-temp tables to be deleted, because we might delete
- * the files before the COMMIT record is flushed to disk. We do allow
- * asynchronous commit if all to-be-deleted tables are temporary though,
- * since they are lost anyway if we crash.)
+ * Check if we want to commit asynchronously. If we're doing cleanup of
+ * any non-temp rels or committing any command that wanted to force sync
+ * commit, then we must flush XLOG immediately. (We must not allow
+ * asynchronous commit if there are any non-temp tables to be deleted,
+ * because we might delete the files before the COMMIT record is flushed to
+ * disk. We do allow asynchronous commit if all to-be-deleted tables are
+ * temporary though, since they are lost anyway if we crash.) Otherwise,
+ * we can defer the flush if either (1) the user has set synchronous_commit
+ * = off, or (2) the current transaction has not performed any WAL-logged
+ * operation. This latter case can arise if the only writes performed by
+ * the current transaction target temporary or unlogged relations. Loss
+ * of such a transaction won't matter anyway, because temp tables will be
+ * lost after a crash anyway, and unlogged ones will be truncated.
*/
- if (XactSyncCommit || forceSyncCommit || nrels > 0)
+ if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0)
{
/*
* Synchronous commit case:
On 11/30/2010 10:27 PM, Robert Haas wrote:
This appears as though you've somehow gotten a normal table connected
to an unlogged index. That certainly sounds like a bug, but there's
not enough details here to figure out what series of steps I should
perform to recreate the problem.
There is \h help: +1
but I can find no way of determining the "tempness"/"unloggedness" of a
table via \d*
It's clearly displayed in the \d output.
Unlogged Table "public.test"
Column | Type | Modifiers
--------+---------+-----------
a | integer | not null
Indexes:
"test_pkey" PRIMARY KEY, btree (a)
Jeez... Were it a snake it'd a bit me!
Ok. I blew away my database and programs, re-gitted, re-patched (they work), re-compiled (ok), and re-ran initdb.
I have these non-standard settings:
shared_buffers = 512MB
work_mem = 5MB
checkpoint_segments = 7
1st) I can recreate some warning messages from vacuum:
WARNING: relation "ulone" page 0 is uninitialized --- fixing
WARNING: relation "pg_toast_16433" page 0 is uninitialized --- fixing
You create an unlogged table, fill it, restart PG (and it clears the table), then fill it again, and vacuum complains. Here is a log:
andy=# drop table ulone;
DROP TABLE
Time: 40.532 ms
andy=# create unlogged table ulone(id serial, a integer, b integer, c text);
NOTICE: CREATE TABLE will create implicit sequence "ulone_id_seq" for serial column "ulone.id"
CREATE TABLE
Time: 151.968 ms
andy=# insert into ulone(a, b, c) select x, 1, 'bbbbbbbbbbb' from generate_series(1, 10000000) x;
INSERT 0 10000000
Time: 80401.505 ms
andy=# \q
$ vacuumdb -az
vacuumdb: vacuuming database "andy"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"
$ sudo /etc/rc.d/postgresql stop
Stopping PostgreSQL: No directory, logging in with HOME=/
$ sudo /etc/rc.d/postgresql start
Starting PostgreSQL:
$ psql
Timing is on.
psql (9.1devel)
Type "help" for help.
andy=# select count(*) from ulone;
count
-------
0
(1 row)
Time: 1.164 ms
andy=# insert into ulone(a, b, c) select x, 1, 'bbbbbbbbbbb' from generate_series(1, 10000000) x;
INSERT 0 10000000
Time: 75312.753 ms
andy=# \q
$ vacuumdb -az
vacuumdb: vacuuming database "andy"
WARNING: relation "ulone" page 0 is uninitialized --- fixing
WARNING: relation "pg_toast_16478" page 0 is uninitialized --- fixing
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"
2nd) I can get the data to stick around after restart. Though not reliably. In general:
Create and fill a table, vacuum it (not sure if it's important; I do it because that's what I'd done in my pgbench testing where I noticed the data stuck around), wait an hour (I usually left it for 12-24 hours, but recreated it with as little as a half hour), then restart PG. Sometimes the data is there... sometimes not.
I also filled my table with more data than memory would hold so it would spill to disk, again, because it recreates my pgbench setup.
I'm still working on finding the exact steps, but I wanted to get you #1 above.
-Andy
Ok, forget the time thing. It has nothing to do with it. (Which everyone already assumed, I imagine.)
It's truncate.
Create an unlogged table, fill it, truncate it, fill it again, restart PG, and the data will still be there.
-Andy
Excerpts from Andy Colson's message of vie dic 03 00:37:17 -0300 2010:
Ok, forget the time thing. It has nothing to do with it. (Which everyone already assumed, I imagine.)
It's truncate.
Create an unlogged table, fill it, truncate it, fill it again, restart PG, and the data will still be there.
Hmm, presumably the table rewrite thing in truncate is not preserving
the unlogged state (perhaps it's the swap-relfilenode business). Does
CLUSTER have a similar effect? What about VACUUM FULL? If so you know
where the bug is.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, Dec 2, 2010 at 10:53 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Excerpts from Andy Colson's message of vie dic 03 00:37:17 -0300 2010:
Ok, forget the time thing. It has nothing to do with it. (Which everyone already assumed, I imagine.)
It's truncate.
Create an unlogged table, fill it, truncate it, fill it again, restart PG, and the data will still be there.
Hmm, presumably the table rewrite thing in truncate is not preserving
the unlogged state (perhaps it's the swap-relfilenode business).
Oh ho. Right. Yeah, that case is not handled. Woopsie.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Nov 30, 2010 at 10:36 PM, Andy Colson <andy@squeakycode.net> wrote:
[ review ]
Currently, if you create an unlogged table, restart PG, and vacuum the
table, you'll get this:
rhaas=# vacuum unlogged;
WARNING: relation "unlogged" page 0 is uninitialized --- fixing
VACUUM
The reason this happens is that the init fork of an unlogged heap
consists of a single empty page, rather than a totally empty file. I
needed to WAL-log the creation of the init fork, and there's currently
no way to WAL-log the creation of an empty file other than the main
relation fork. I figured a file with one empty page would be just as
good as a totally empty file, and that way I could piggyback on
XLOG_HEAP_NEWPAGE, which will automatically create the relation fork
if it's not already there. However, as the above warning message
demonstrates, this was a bit too clever.
One possible fix is to change the XLOG_SMGR_CREATE record to carry a
fork number. Does that seem reasonable, or would anyone like to
recommend another approach?
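For concreteness, a minimal sketch of how the record might carry a fork number (the forkNum field and the log_smgrcreate() helper are illustrative assumptions here, not necessarily the committed form; only xl_smgr_create, RelFileNode, ForkNumber, and the existing XLogInsert()/XLogRecData interface are taken as given):

typedef struct xl_smgr_create
{
	RelFileNode rnode;		/* physical relation being created */
	ForkNumber	forkNum;	/* which fork this record creates */
} xl_smgr_create;

/* hypothetical helper: WAL-log creation of an arbitrary fork */
static void
log_smgrcreate(RelFileNode *rnode, ForkNumber forkNum)
{
	xl_smgr_create xlrec;
	XLogRecData rdata;

	xlrec.rnode = *rnode;
	xlrec.forkNum = forkNum;

	rdata.data = (char *) &xlrec;
	rdata.len = sizeof(xlrec);
	rdata.buffer = InvalidBuffer;
	rdata.next = NULL;

	XLogInsert(RM_SMGR_ID, XLOG_SMGR_CREATE, &rdata);
}

The redo routine would then pass xlrec->forkNum to smgrcreate() instead of hard-coding MAIN_FORKNUM, so the init fork could be logged as a genuinely empty file.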
I'm also going to go through and change all instances of the word
"unlogged" to "volatile", per previous discussion. If this seems like
a bad idea to anyone, please object now rather than afterwards.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I'm also going to go through and change all instances of the word
"unlogged" to "volatile", per previous discussion. If this seems like
a bad idea to anyone, please object now rather than afterwards.
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?
regards, tom lane
On Tue, Dec 7, 2010 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I'm also going to go through and change all instances of the word
"unlogged" to "volatile", per previous discussion. If this seems like
a bad idea to anyone, please object now rather than afterwards.
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?
So far I've come up with the following possible behaviors we could
theoretically implement:
1. Any crash or shutdown truncates the table.
2. Any crash truncates the table, but a clean shutdown does not.
3. A crash truncates the table only if it's been written since the
last checkpoint; a clean shutdown does not truncate it.
The main argument for doing #1 rather than #2 is that we'd rather not
have to include unlogged table data in checkpoints. Andres Freund
made the argument that we could avoid that anyway, though, by just
doing an fsync() on every unlogged table file in the cluster at
shutdown time. If that's acceptable, then ISTM there's no benefit to
implementing #1 and we should just go with #2. If it's not
acceptable, then we have to think about whether and how to have both
of those behaviors.
#3 seems like a lot of work relative to #1 and #2 for a pretty
marginal increase in durability.
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Dec 7, 2010 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?
So far I've come up with the following possible behaviors we could
theoretically implement:
1. Any crash or shutdown truncates the table.
2. Any crash truncates the table, but a clean shutdown does not.
3. A crash truncates the table only if it's been written since the
last checkpoint; a clean shutdown does not truncate it.
The main argument for doing #1 rather than #2 is that we'd rather not
have to include unlogged table data in checkpoints. Andres Freund
made the argument that we could avoid that anyway, though, by just
doing an fsync() on every unlogged table file in the cluster at
shutdown time. If that's acceptable, then ISTM there's no benefit to
implementing #1 and we should just go with #2. If it's not
acceptable, then we have to think about whether and how to have both
of those behaviors.
#3 seems like a lot of work relative to #1 and #2 for a pretty
marginal increase in durability.
OK. I agree that #3 adds a lot of complexity for not much of anything.
If you've got data that's static enough that #3 adds a useful amount
of safety, then you might as well be keeping it in a regular table.
I think a more relevant question is how complicated it'll be to issue
those fsyncs --- do you have a concrete implementation in mind?
regards, tom lane
On Tue, Dec 7, 2010 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Dec 7, 2010 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?So far I've come up with the following possible behaviors we could
theoretically implement:1. Any crash or shutdown truncates the table.
2. Any crash truncates the table, but a clean shutdown does not.
3. A crash truncates the table only if it's been written since the
last checkpoint; a clean shutdown does not truncate it.
The main argument for doing #1 rather than #2 is that we'd rather not
have to include unlogged table data in checkpoints. Andres Freund
made the argument that we could avoid that anyway, though, by just
doing an fsync() on every unlogged table file in the cluster at
shutdown time. If that's acceptable, then ISTM there's no benefit to
implementing #1 and we should just go with #2. If it's not
acceptable, then we have to think about whether and how to have both
of those behaviors.
#3 seems like a lot of work relative to #1 and #2 for a pretty
marginal increase in durability.
OK. I agree that #3 adds a lot of complexity for not much of anything.
If you've got data that's static enough that #3 adds a useful amount
of safety, then you might as well be keeping it in a regular table.
I think a more relevant question is how complicated it'll be to issue
those fsyncs --- do you have a concrete implementation in mind?
It can reuse most of the infrastructure we use for re-initializing
everything after a crash or unclean shutdown. We just iterate over
every tablespace/dbspace directory and look for files with _init forks.
If we find any then we open the main fork files and fsync() each one.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
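For illustration, a minimal standalone sketch of the scan just described, written against plain POSIX rather than the backend's own file-access routines; it ignores relation segment files (.1, .2, ...), and, as Tom points out next, the other forks would need the same treatment:

#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch only: walk one database directory, find "<relfilenode>_init"
 * files (the init forks that mark unlogged relations), and fsync() the
 * corresponding main-fork file.
 */
static void
sync_unlogged_main_forks(const char *dbdir)
{
	DIR		   *dir = opendir(dbdir);
	struct dirent *de;

	if (dir == NULL)
		return;

	while ((de = readdir(dir)) != NULL)
	{
		const char *suffix = "_init";
		size_t		namelen = strlen(de->d_name);
		size_t		suffixlen = strlen(suffix);
		char		path[1024];
		int			fd;

		if (namelen <= suffixlen ||
			strcmp(de->d_name + namelen - suffixlen, suffix) != 0)
			continue;

		/* the main fork has the same name minus the "_init" suffix */
		snprintf(path, sizeof(path), "%s/%.*s",
				 dbdir, (int) (namelen - suffixlen), de->d_name);

		fd = open(path, O_RDWR);
		if (fd >= 0)
		{
			if (fsync(fd) != 0)
				perror(path);	/* a real version can't just ignore this */
			close(fd);
		}
	}
	closedir(dir);
}

In the backend this would presumably share code with the existing crash-reinit scan and run from the shutdown path; the sketch is only meant to show the shape of the work involved.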
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Dec 7, 2010 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think a more relevant question is how complicated it'll be to issue
those fsyncs --- do you have a concrete implementation in mind?
It can reuse most of the infrastructure we use for re-initializing
everything after a crash or unclean shutdown. We just iterate over
every tablespace/dbspace directory and look for files with _init forks.
If we find any then we open the main fork files and fsync() each one.
I assume you meant "all the other fork files", but OK. Still, couldn't
that be rather expensive in a large DB?
regards, tom lane
2010/12/7 Robert Haas <robertmhaas@gmail.com>:
On Tue, Dec 7, 2010 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Dec 7, 2010 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?So far I've come up with the following possible behaviors we could
theoretically implement:1. Any crash or shutdown truncates the table.
2. Any crash truncates the table, but a clean shutdown does not.
3. A crash truncates the table only if it's been written since the
last checkpoint; a clean shutdown does not truncate it.
The main argument for doing #1 rather than #2 is that we'd rather not
have to include unlogged table data in checkpoints. Andres Freund
made the argument that we could avoid that anyway, though, by just
doing an fsync() on every unlogged table file in the cluster at
shutdown time. If that's acceptable, then ISTM there's no benefit to
implementing #1 and we should just go with #2. If it's not
acceptable, then we have to think about whether and how to have both
of those behaviors.
#3 seems like a lot of work relative to #1 and #2 for a pretty
marginal increase in durability.
OK. I agree that #3 adds a lot of complexity for not much of anything.
If you've got data that's static enough that #3 adds a useful amount
of safety, then you might as well be keeping it in a regular table.
I think a more relevant question is how complicated it'll be to issue
those fsyncs --- do you have a concrete implementation in mind?
It can reuse most of the infrastructure we use for re-initializing
everything after a crash or unclean shutdown. We just iterate over
every tablespace/dbspace directory and look for files with _init forks.
If we find any then we open the main fork files and fsync() each one.
It might make sense to document this behavior: a 'simple' restart
might be way longer than before. I would probably issue a sync(1)
before restarting the server in such a situation (if the
unlogged-volatile tables are large).
--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
On Tue, Dec 7, 2010 at 5:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Dec 7, 2010 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think a more relevant question is how complicated it'll be to issue
those fsyncs --- do you have a concrete implementation in mind?
It can reuse most of the infrastructure we use for re-initializing
everything after a crash or unclean shutdown. We just iterate over
every tablespace/dbspace directory and look for files with _init forks.
If we find any then we open the main fork files and fsync() each one.
I assume you meant "all the other fork files", but OK.
Oh, good point.
Still, couldn't
that be rather expensive in a large DB?
Well, that's why I asked whether it would be acceptable to take that
approach. I'm guessing the overhead isn't too horrible. If you
didn't want to take this approach but did want to survive a clean
shutdown, you would need to fsync everything written since the last
checkpoint. The amount of additional stuff that needs to be written
here is just whatever you failed to write out during previous
checkpoints, which is probably not a ton.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
A very useful feature for unlogged tables would be the ability to
switch them back to normal tables -- this way you could do bulk
loading into an unlogged table and then turn it into a regular table
using just fsync(), bypassing all the WAL-logging overhead. It seems
this could even be implemented in pg_restore itself.
Which brings me to:
On Tue, Dec 7, 2010 at 20:44, Robert Haas <robertmhaas@gmail.com> wrote:
2. Any crash truncates the table, but a clean shutdown does not.
Seems that syncing on a clean shutdown could use the same
infrastructure as the above functionality.
Have you thought about switching unlogged tables back to logged? Are
there any significant obstacles?
Regards,
Marti
On Tue, 2010-12-07 at 13:17 -0500, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I'm also going to go through and change all instances of the word
"unlogged" to "volatile", per previous discussion. If this seems like
a bad idea to anyone, please object now rather than afterwards.
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?
Note that DB2 uses the table modifier VOLATILE to indicate a table that
has a widely fluctuating table size, for example a queue table. It's
used as a declarative optimizer hint. So the term has many possible
meanings.
Prefer UNLOGGED or similar descriptive term.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Wed, Dec 8, 2010 at 9:52 AM, Marti Raudsepp <marti@juffo.org> wrote:
Have you thought about switching unlogged tables back to logged? Are
there any significant obstacles?
I think it can be done, and I think it's useful, but I didn't want to
tackle it for version one, because it's not trivial.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Dec 8, 2010 at 10:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Tue, 2010-12-07 at 13:17 -0500, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I'm also going to go through and change all instances of the word
"unlogged" to "volatile", per previous discussion. If this seems like
a bad idea to anyone, please object now rather than afterwards.
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?
Note that DB2 uses the table modifier VOLATILE to indicate a table that
has a widely fluctuating table size, for example a queue table. It's
used as a declarative optimizer hint. So the term has many possible
meanings.
Prefer UNLOGGED or similar descriptive term.
Hrm. The previous consensus seemed to be in favor of trying to
describe the behavior (your contents might disappear) rather than the
implementation (we don't WAL-log those contents). However, the fact
that DB2 uses that word to mean something entirely different is
certainly a bit awkward, so maybe we should reconsider. Or maybe not.
I'm not sure. Anyone else want to weigh in here?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:
Simon Riggs <simon@2ndquadrant.com> wrote:
Note that DB2 uses the table modifier VOLATILE to indicate a
table that has a widely fluctuating table size, for example a
queue table.
the fact that DB2 uses that word to mean something entirely
different is certainly a bit awkward
It would be especially awkward should someone port their DB2
database to PostgreSQL without noticing the semantic difference, and
then find their data missing.
so maybe we should reconsider.
+1 for choosing terminology without known conflicts with other
significant products.
-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Robert Haas <robertmhaas@gmail.com> wrote:
Simon Riggs <simon@2ndquadrant.com> wrote:
Note that DB2 uses the table modifier VOLATILE to indicate a
table that has a widely fluctuating table size, for example a
queue table.
the fact that DB2 uses that word to mean something entirely
different is certainly a bit awkward
It would be especially awkward should someone port their DB2
database to PostgreSQL without noticing the semantic difference, and
then find their data missing.
Not to mention that DB2 syntax tends to appear in the standard a few
years later.
so maybe we should reconsider.
+1 for choosing terminology without known conflicts with other
significant products.
Yeah. Given this info I'm strongly inclined to stick with UNLOGGED.
regards, tom lane
tgl@sss.pgh.pa.us (Tom Lane) writes:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Robert Haas <robertmhaas@gmail.com> wrote:
Simon Riggs <simon@2ndquadrant.com> wrote:
Note that DB2 uses the table modifier VOLATILE to indicate a
table that has a widely fluctuating table size, for example a
queue table.
the fact that DB2 uses that word to mean something entirely
different is certainly a bit awkward
It would be especially awkward should someone port their DB2
database to PostgreSQL without noticing the semantic difference, and
then find their data missing.
Not to mention that DB2 syntax tends to appear in the standard a few
years later.
And the term "volatile" has well-understood connotations that are
analogous to those in DB2 in the C language and various descendants.
<http://en.wikipedia.org/wiki/Volatile_variable>
I'm not sure "UNLOGGED" is perfect... If "TEMPORARY" weren't already
taken, it would be pretty good.
Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
FLASH would be an amusing choice. "PostgreSQL 9.1, now with support for
FLASH!"
--
output = ("cbbrowne" "@" "acm.org")
http://linuxdatabases.info/info/internet.html
I've told you for the fifty-thousandth time, stop exaggerating.
On Wed, Dec 8, 2010 at 1:37 PM, Chris Browne <cbbrowne@acm.org> wrote:
tgl@sss.pgh.pa.us (Tom Lane) writes:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Robert Haas <robertmhaas@gmail.com> wrote:
Simon Riggs <simon@2ndquadrant.com> wrote:
Note that DB2 uses the table modifier VOLATILE to indicate a
table that has a widely fluctuating table size, for example a
queue table.
the fact that DB2 uses that word to mean something entirely
different is certainly a bit awkward
It would be especially awkward should someone port their DB2
database to PostgreSQL without noticing the semantic difference, and
then find their data missing.
Not to mention that DB2 syntax tends to appear in the standard a few
years later.
And the term "volatile" has well-understood connotations that are
analogous to those in DB2 in the C language and various descendants.
<http://en.wikipedia.org/wiki/Volatile_variable>
I'm not sure "UNLOGGED" is perfect... If "TEMPORARY" weren't already
taken, it would be pretty good.
Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
FLASH would be an amusing choice. "PostgreSQL 9.1, now with support for
FLASH!"
The value of VOLATILE, I felt, was that it's sort of like a volatile
variable in C: it might suddenly change under you. I think that
TRANSIENT and EPHEMERAL and TENUOUS all imply that the table itself is
either temporary or, in the last case, not very dense, which isn't
really what we want to convey. I did consider EPHEMERAL myself, but
the more I think about it, the more wrong it sounds. Even the table's
contents are not really short-lived - they may easily last for months
or years. You just shouldn't rely on it. I cracked up this morning
imagining calling this CREATE UNRELIABLE TABLE, but I'm starting to
think UNLOGGED is as good as we're going to do.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
EVANESCENT.
David
2010/12/8 Kineticode Billing <david@kineticode.com>:
On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
EVANESCENT.
UNSAFE ?
--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
On Dec 10, 2010, at 4:34 PM, Cédric Villemain wrote:
Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
EVANESCENT.
UNSAFE ?
LOLZ.
David
On Wed, Dec 8, 2010 at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah. Given this info I'm strongly inclined to stick with UNLOGGED.
OK. Here's an updated patch set with various fixes:
- Per musings from Tom, I've revisited and revised the logic that
cross-checks relpersistence when you try to create foreign keys. The
old logic was buggy and wrong.
- I fixed the bug Andy Colson found, whereby any operation that
rewrote an unlogged table would cause its indexes to lose their _init
forks, leading to bizarre behavior.
- I fixed another infelicity Andy Colson noted, whereby vacuuming an
unlogged heap after restart would warn about a zeroed page, by
extending XLOG_SMGR_CREATE with a fork number.
- I added support for hash indexes on unlogged tables (gin and gist
are still not yet supported).
I think the first patch (relpersistence-v4.patch) is ready to commit,
and the third patch to allow synchronous commits to become
asynchronous when it doesn't matter (relax-sync-commit-v1.patch)
doesn't seem to be changing much either, although I would appreciate
it if someone with more expertise than I have with our write-ahead
logging system would give it a quick once-over.
The main patch (unlogged-tables-v4.patch) needs more thought. Right
now, unlogged buffers are checkpointed, which I want to get rid of.
Andres Freund suggested we could get by with this and still survive a
clean shutdown if we fsync() every unlogged relation in the cluster
before shutting down, but I'm concerned about the case where one of
the fsync() calls fails. That's presumably already a problem with
checkpoints generally, and I haven't traced through the logic to see
exactly what happens, but I guess this would need similar treatment.
In a non-shutdown checkpoint, the checkpoint can just fail. In a
shutdown checkpoint, we presumably can't just refuse to exit, but it
shouldn't look like a clean shutdown...
As I was working on the hash index support, it occurred to me that at
some point in the future, we might want to allow an unlogged index on
a permanent table. With the current patch, an index is unlogged if
and only if the corresponding table is unlogged, and both the table
and the index are reset to empty on restart. But we could have a
slightly different flavor of index that, instead of being reset to
empty, just gets marked invalid, perhaps by truncating the file to
zero-length (and adding some code to treat that as something other
than a hard error). Perhaps you could even arrange for autovacuum to
kick off an automatic rebuild, though that might need to be
configurable since some people might not want an index rebuild kicking
off immediately after a crash/failover.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
relpersistence-v4.patch
commit 22837fa1a37fa2de0e8b072ebcfc03cd6477b808
Author: Robert Haas <rhaas@postgresql.org>
Date: Mon Aug 16 21:02:11 2010 -0400
Generalize concept of temporary relations to "relation persistence".
This commit replaces pg_class.relistemp with pg_class.relpersistence;
and also modifies the RangeVar node type to carry relpersistence rather
than istemp. It also removes rd_istemp from RelationData and
instead performs the correct computation based on relpersistence.
For clarity, we add three new macros: RelationNeedsWAL(),
RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we
can clarify the purpose of each check that previously depended on
rd_istemp.
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 070cd92..9d857a0 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -304,7 +304,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
MarkBufferDirty(stack->buffer);
- if (!btree->index->rd_istemp)
+ if (RelationNeedsWAL(btree->index))
{
XLogRecPtr recptr;
@@ -373,7 +373,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
MarkBufferDirty(lbuffer);
MarkBufferDirty(stack->buffer);
- if (!btree->index->rd_istemp)
+ if (RelationNeedsWAL(btree->index))
{
XLogRecPtr recptr;
@@ -422,7 +422,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
MarkBufferDirty(rbuffer);
MarkBufferDirty(stack->buffer);
- if (!btree->index->rd_istemp)
+ if (RelationNeedsWAL(btree->index))
{
XLogRecPtr recptr;
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 525f79c..74339c9 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -103,7 +103,7 @@ writeListPage(Relation index, Buffer buffer,
MarkBufferDirty(buffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecData rdata[2];
ginxlogInsertListPage data;
@@ -384,7 +384,7 @@ ginHeapTupleFastInsert(Relation index, GinState *ginstate,
*/
MarkBufferDirty(metabuffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
@@ -564,7 +564,7 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
MarkBufferDirty(buffers[i]);
}
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index fa70e4f..8681ede 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -55,7 +55,7 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems)
MarkBufferDirty(buffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
XLogRecData rdata[2];
@@ -325,7 +325,7 @@ ginbuild(PG_FUNCTION_ARGS)
GinInitBuffer(RootBuffer, GIN_LEAF);
MarkBufferDirty(RootBuffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
XLogRecData rdata;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 27326ac..5f20ac9 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -410,7 +410,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
MarkBufferDirty(metabuffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
ginxlogUpdateMeta data;
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index 7dfecff..4b35acb 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -93,7 +93,7 @@ xlogVacuumPage(Relation index, Buffer buffer)
Assert(GinPageIsLeaf(page));
- if (index->rd_istemp)
+ if (!RelationNeedsWAL(index))
return;
data.node = index->rd_node;
@@ -308,7 +308,7 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
MarkBufferDirty(lBuffer);
MarkBufferDirty(dBuffer);
- if (!gvs->index->rd_istemp)
+ if (RelationNeedsWAL(gvs->index))
{
XLogRecPtr recptr;
XLogRecData rdata[4];
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index d6aaea2..b34830b 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -115,7 +115,7 @@ gistbuild(PG_FUNCTION_ARGS)
MarkBufferDirty(buffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
XLogRecData rdata;
@@ -401,7 +401,7 @@ gistplacetopage(GISTInsertState *state, GISTSTATE *giststate)
dist->page = BufferGetPage(dist->buffer);
}
- if (!state->r->rd_istemp)
+ if (RelationNeedsWAL(state->r))
{
XLogRecPtr recptr;
XLogRecData *rdata;
@@ -465,7 +465,7 @@ gistplacetopage(GISTInsertState *state, GISTSTATE *giststate)
MarkBufferDirty(state->stack->buffer);
- if (!state->r->rd_istemp)
+ if (RelationNeedsWAL(state->r))
{
OffsetNumber noffs = 0,
offs[1];
@@ -550,7 +550,7 @@ gistfindleaf(GISTInsertState *state, GISTSTATE *giststate)
opaque = GistPageGetOpaque(state->stack->page);
state->stack->lsn = PageGetLSN(state->stack->page);
- Assert(state->r->rd_istemp || !XLogRecPtrIsInvalid(state->stack->lsn));
+ Assert(!RelationNeedsWAL(state->r) || !XLogRecPtrIsInvalid(state->stack->lsn));
if (state->stack->blkno != GIST_ROOT_BLKNO &&
XLByteLT(state->stack->parent->lsn, opaque->nsn))
@@ -911,7 +911,7 @@ gistmakedeal(GISTInsertState *state, GISTSTATE *giststate)
}
/* say to xlog that insert is completed */
- if (state->needInsertComplete && !state->r->rd_istemp)
+ if (state->needInsertComplete && RelationNeedsWAL(state->r))
gistxlogInsertCompletion(state->r->rd_node, &(state->key), 1);
}
@@ -1011,7 +1011,7 @@ gistnewroot(Relation r, Buffer buffer, IndexTuple *itup, int len, ItemPointer ke
MarkBufferDirty(buffer);
- if (!r->rd_istemp)
+ if (RelationNeedsWAL(r))
{
XLogRecPtr recptr;
XLogRecData *rdata;
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index dbe9406..e02e72d 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -248,7 +248,7 @@ gistbulkdelete(PG_FUNCTION_ARGS)
PageIndexTupleDelete(page, todelete[i]);
GistMarkTuplesDeleted(page);
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
XLogRecData *rdata;
XLogRecPtr recptr;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4fe3a73..4020906 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -124,7 +124,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool is_rescan)
*
* During a rescan, don't make a new strategy object if we don't have to.
*/
- if (!scan->rs_rd->rd_istemp &&
+ if (!RelationUsesLocalBuffers(scan->rs_rd) &&
scan->rs_nblocks > NBuffers / 4)
{
allow_strat = scan->rs_allow_strat;
@@ -905,7 +905,7 @@ relation_open(Oid relationId, LOCKMODE lockmode)
elog(ERROR, "could not open relation with OID %u", relationId);
/* Make note that we've accessed a temporary relation */
- if (r->rd_istemp)
+ if (RelationUsesLocalBuffers(r))
MyXactAccessedTempRel = true;
pgstat_initstats(r);
@@ -951,7 +951,7 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)
elog(ERROR, "could not open relation with OID %u", relationId);
/* Make note that we've accessed a temporary relation */
- if (r->rd_istemp)
+ if (RelationUsesLocalBuffers(r))
MyXactAccessedTempRel = true;
pgstat_initstats(r);
@@ -1917,7 +1917,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!(options & HEAP_INSERT_SKIP_WAL) && !relation->rd_istemp)
+ if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))
{
xl_heap_insert xlrec;
xl_heap_header xlhdr;
@@ -2227,7 +2227,7 @@ l1:
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
xl_heap_delete xlrec;
XLogRecPtr recptr;
@@ -2780,7 +2780,7 @@ l2:
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
XLogRecPtr recptr = log_heap_update(relation, buffer, oldtup.t_self,
newbuf, heaptup,
@@ -3403,7 +3403,7 @@ l3:
* (Also, in a PITR log-shipping or 2PC environment, we have to have XLOG
* entries for everything anyway.)
*/
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
xl_heap_lock xlrec;
XLogRecPtr recptr;
@@ -3505,7 +3505,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
xl_heap_inplace xlrec;
XLogRecPtr recptr;
@@ -3867,8 +3867,8 @@ log_heap_clean(Relation reln, Buffer buffer,
XLogRecPtr recptr;
XLogRecData rdata[4];
- /* Caller should not call me on a temp relation */
- Assert(!reln->rd_istemp);
+ /* Caller should not call me on a non-WAL-logged relation */
+ Assert(RelationNeedsWAL(reln));
xlrec.node = reln->rd_node;
xlrec.block = BufferGetBlockNumber(buffer);
@@ -3950,8 +3950,8 @@ log_heap_freeze(Relation reln, Buffer buffer,
XLogRecPtr recptr;
XLogRecData rdata[2];
- /* Caller should not call me on a temp relation */
- Assert(!reln->rd_istemp);
+ /* Caller should not call me on a non-WAL-logged relation */
+ Assert(RelationNeedsWAL(reln));
/* nor when there are no tuples to freeze */
Assert(offcnt > 0);
@@ -3996,8 +3996,8 @@ log_heap_update(Relation reln, Buffer oldbuf, ItemPointerData from,
XLogRecData rdata[4];
Page page = BufferGetPage(newbuf);
- /* Caller should not call me on a temp relation */
- Assert(!reln->rd_istemp);
+ /* Caller should not call me on a non-WAL-logged relation */
+ Assert(RelationNeedsWAL(reln));
if (HeapTupleIsHeapOnly(newtup))
info = XLOG_HEAP_HOT_UPDATE;
@@ -4997,7 +4997,7 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
* heap_sync - sync a heap, for use when no WAL has been written
*
* This forces the heap contents (including TOAST heap if any) down to disk.
- * If we skipped using WAL, and it's not a temp relation, we must force the
+ * If we skipped using WAL, and WAL is otherwise needed, we must force the
* relation down to disk before it's safe to commit the transaction. This
* requires writing out any dirty buffers and then doing a forced fsync.
*
@@ -5010,8 +5010,8 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
void
heap_sync(Relation rel)
{
- /* temp tables never need fsync */
- if (rel->rd_istemp)
+ /* non-WAL-logged tables never need fsync */
+ if (!RelationNeedsWAL(rel))
return;
/* main heap */
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ee5f38f..d1b08b3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -233,7 +233,7 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
/*
* Emit a WAL HEAP_CLEAN record showing what we did
*/
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
XLogRecPtr recptr;
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 19ca302..eb2dbff 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -277,8 +277,8 @@ end_heap_rewrite(RewriteState state)
}
/*
- * If the rel isn't temp, must fsync before commit. We use heap_sync to
- * ensure that the toast table gets fsync'd too.
+ * If the rel is WAL-logged, must fsync before commit. We use heap_sync
+ * to ensure that the toast table gets fsync'd too.
*
* It's obvious that we must do this when not WAL-logging. It's less
* obvious that we have to do it even if we did WAL-log the pages. The
@@ -287,7 +287,7 @@ end_heap_rewrite(RewriteState state)
* occurring during the rewriteheap operation won't have fsync'd data we
* wrote before the checkpoint.
*/
- if (!state->rs_new_rel->rd_istemp)
+ if (RelationNeedsWAL(state->rs_new_rel))
heap_sync(state->rs_new_rel);
/* Deleting the context frees everything */
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index eaad812..ee0f04c 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -766,7 +766,7 @@ _bt_insertonpg(Relation rel,
}
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_insert xlrec;
BlockNumber xldownlink;
@@ -1165,7 +1165,7 @@ _bt_split(Relation rel, Buffer buf, OffsetNumber firstright,
}
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_split xlrec;
uint8 xlinfo;
@@ -1914,7 +1914,7 @@ _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf)
MarkBufferDirty(metabuf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_newroot xlrec;
XLogRecPtr recptr;
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index e0c0f21..2b44780 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -224,7 +224,7 @@ _bt_getroot(Relation rel, int access)
MarkBufferDirty(metabuf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_newroot xlrec;
XLogRecPtr recptr;
@@ -452,7 +452,7 @@ _bt_checkpage(Relation rel, Buffer buf)
static void
_bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedXid)
{
- if (rel->rd_istemp)
+ if (!RelationNeedsWAL(rel))
return;
/* No ereport(ERROR) until changes are logged */
@@ -751,7 +751,7 @@ _bt_delitems_vacuum(Relation rel, Buffer buf,
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
XLogRecPtr recptr;
XLogRecData rdata[2];
@@ -829,7 +829,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
XLogRecPtr recptr;
XLogRecData rdata[3];
@@ -1365,7 +1365,7 @@ _bt_pagedel(Relation rel, Buffer buf, BTStack stack)
MarkBufferDirty(lbuf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_delete_page xlrec;
xl_btree_metadata xlmeta;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index a1d3aef..3fb43a2 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -211,9 +211,9 @@ _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2)
/*
* We need to log index creation in WAL iff WAL archiving/streaming is
- * enabled AND it's not a temp index.
+ * enabled UNLESS the index isn't WAL-logged anyway.
*/
- wstate.btws_use_wal = XLogIsNeeded() && !wstate.index->rd_istemp;
+ wstate.btws_use_wal = XLogIsNeeded() && RelationNeedsWAL(wstate.index);
/* reserve the metapage */
wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
@@ -797,9 +797,9 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
_bt_uppershutdown(wstate, state);
/*
- * If the index isn't temp, we must fsync it down to disk before it's safe
- * to commit the transaction. (For a temp index we don't care since the
- * index will be uninteresting after a crash anyway.)
+ * If the index is WAL-logged, we must fsync it down to disk before it's
+ * safe to commit the transaction. (For a non-WAL-logged index we don't
+ * care since the index will be uninteresting after a crash anyway.)
*
* It's obvious that we must do this when not WAL-logging the build. It's
* less obvious that we have to do it even if we did WAL-log the index
@@ -811,7 +811,7 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
* fsync those pages here, they might still not be on disk when the crash
* occurs.
*/
- if (!wstate->index->rd_istemp)
+ if (RelationNeedsWAL(wstate->index))
{
RelationOpenSmgr(wstate->index);
smgrimmedsync(wstate->index->rd_smgr, MAIN_FORKNUM);
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index e475403..73ef114 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -219,6 +219,7 @@ Boot_CreateStmt:
$3,
tupdesc,
RELKIND_RELATION,
+ RELPERSISTENCE_PERMANENT,
shared_relation,
mapped_relation,
true);
@@ -238,6 +239,7 @@ Boot_CreateStmt:
tupdesc,
NIL,
RELKIND_RELATION,
+ RELPERSISTENCE_PERMANENT,
shared_relation,
mapped_relation,
true,
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6322512..88b5c2a 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -524,12 +524,26 @@ GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
* created by bootstrap have preassigned OIDs, so there's no need.
*/
Oid
-GetNewRelFileNode(Oid reltablespace, Relation pg_class, BackendId backend)
+GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
{
RelFileNodeBackend rnode;
char *rpath;
int fd;
bool collides;
+ BackendId backend;
+
+ switch (relpersistence)
+ {
+ case RELPERSISTENCE_TEMP:
+ backend = MyBackendId;
+ break;
+ case RELPERSISTENCE_PERMANENT:
+ backend = InvalidBackendId;
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c", relpersistence);
+ return InvalidOid; /* placate compiler */
+ }
/* This logic should match RelationInitPhysicalAddr */
rnode.node.spcNode = reltablespace ? reltablespace : MyDatabaseTableSpace;
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9b7668c..bcf6caa 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -238,6 +238,7 @@ heap_create(const char *relname,
Oid relid,
TupleDesc tupDesc,
char relkind,
+ char relpersistence,
bool shared_relation,
bool mapped_relation,
bool allow_system_table_mods)
@@ -311,7 +312,8 @@ heap_create(const char *relname,
relid,
reltablespace,
shared_relation,
- mapped_relation);
+ mapped_relation,
+ relpersistence);
/*
* Have the storage manager create the relation's disk file, if needed.
@@ -322,7 +324,7 @@ heap_create(const char *relname,
if (create_storage)
{
RelationOpenSmgr(rel);
- RelationCreateStorage(rel->rd_node, rel->rd_istemp);
+ RelationCreateStorage(rel->rd_node, relpersistence);
}
return rel;
@@ -693,7 +695,7 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
- values[Anum_pg_class_relistemp - 1] = BoolGetDatum(rd_rel->relistemp);
+ values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
values[Anum_pg_class_relkind - 1] = CharGetDatum(rd_rel->relkind);
values[Anum_pg_class_relnatts - 1] = Int16GetDatum(rd_rel->relnatts);
values[Anum_pg_class_relchecks - 1] = Int16GetDatum(rd_rel->relchecks);
@@ -898,6 +900,7 @@ heap_create_with_catalog(const char *relname,
TupleDesc tupdesc,
List *cooked_constraints,
char relkind,
+ char relpersistence,
bool shared_relation,
bool mapped_relation,
bool oidislocal,
@@ -997,8 +1000,7 @@ heap_create_with_catalog(const char *relname,
}
else
relid = GetNewRelFileNode(reltablespace, pg_class_desc,
- isTempOrToastNamespace(relnamespace) ?
- MyBackendId : InvalidBackendId);
+ relpersistence);
}
/*
@@ -1036,6 +1038,7 @@ heap_create_with_catalog(const char *relname,
relid,
tupdesc,
relkind,
+ relpersistence,
shared_relation,
mapped_relation,
allow_system_table_mods);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b437c99..8fbe8eb 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -545,6 +545,7 @@ index_create(Oid heapRelationId,
bool is_exclusion;
Oid namespaceId;
int i;
+ char relpersistence;
is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
@@ -561,11 +562,13 @@ index_create(Oid heapRelationId,
/*
* The index will be in the same namespace as its parent table, and is
* shared across databases if and only if the parent is. Likewise, it
- * will use the relfilenode map if and only if the parent does.
+ * will use the relfilenode map if and only if the parent does; and it
+ * inherits the parent's relpersistence.
*/
namespaceId = RelationGetNamespace(heapRelation);
shared_relation = heapRelation->rd_rel->relisshared;
mapped_relation = RelationIsMapped(heapRelation);
+ relpersistence = heapRelation->rd_rel->relpersistence;
/*
* check parameters
@@ -646,9 +649,7 @@ index_create(Oid heapRelationId,
else
{
indexRelationId =
- GetNewRelFileNode(tableSpaceId, pg_class,
- heapRelation->rd_istemp ?
- MyBackendId : InvalidBackendId);
+ GetNewRelFileNode(tableSpaceId, pg_class, relpersistence);
}
}
@@ -663,6 +664,7 @@ index_create(Oid heapRelationId,
indexRelationId,
indexTupDesc,
RELKIND_INDEX,
+ relpersistence,
shared_relation,
mapped_relation,
allow_system_table_mods);
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 653c9ad..84cbfeb 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -235,14 +235,14 @@ RangeVarGetRelid(const RangeVar *relation, bool failOK)
}
/*
- * If istemp is set, this is a reference to a temp relation. The parser
- * never generates such a RangeVar in simple DML, but it can happen in
- * contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY KEY)". Such a
- * command will generate an added CREATE INDEX operation, which must be
+ * Some non-default relpersistence value may have been specified. The
+ * parser never generates such a RangeVar in simple DML, but it can happen
+ * in contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY KEY)". Such
+ * a command will generate an added CREATE INDEX operation, which must be
* careful to find the temp table, even when pg_temp is not first in the
* search path.
*/
- if (relation->istemp)
+ if (relation->relpersistence == RELPERSISTENCE_TEMP)
{
if (relation->schemaname)
ereport(ERROR,
@@ -308,7 +308,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
newRelation->relname)));
}
- if (newRelation->istemp)
+ if (newRelation->relpersistence == RELPERSISTENCE_TEMP)
{
/* TEMP tables are created in our backend-local temp namespace */
if (newRelation->schemaname)
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 0ce2051..671aaff 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -95,19 +95,35 @@ typedef struct xl_smgr_truncate
* transaction aborts later on, the storage will be destroyed.
*/
void
-RelationCreateStorage(RelFileNode rnode, bool istemp)
+RelationCreateStorage(RelFileNode rnode, char relpersistence)
{
PendingRelDelete *pending;
XLogRecPtr lsn;
XLogRecData rdata;
xl_smgr_create xlrec;
SMgrRelation srel;
- BackendId backend = istemp ? MyBackendId : InvalidBackendId;
+ BackendId backend;
+ bool needs_wal;
+
+ switch (relpersistence)
+ {
+ case RELPERSISTENCE_TEMP:
+ backend = MyBackendId;
+ needs_wal = false;
+ break;
+ case RELPERSISTENCE_PERMANENT:
+ backend = InvalidBackendId;
+ needs_wal = true;
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c", relpersistence);
+ return; /* placate compiler */
+ }
srel = smgropen(rnode, backend);
smgrcreate(srel, MAIN_FORKNUM, false);
- if (!istemp)
+ if (needs_wal)
{
/*
* Make an XLOG entry reporting the file creation.
@@ -253,7 +269,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
* failure to truncate, that might spell trouble at WAL replay, into a
* certain PANIC.
*/
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
/*
* Make an XLOG entry reporting the file truncation.
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7bf64e2..d1f6c9f 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -195,7 +195,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
* Toast tables for regular relations go in pg_toast; those for temp
* relations go into the per-backend temp-toast-table namespace.
*/
- if (rel->rd_backend == MyBackendId)
+ if (RelationUsesTempNamespace(rel))
namespaceid = GetTempToastNamespace();
else
namespaceid = PG_TOAST_NAMESPACE;
@@ -216,6 +216,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
tupdesc,
NIL,
RELKIND_TOASTVALUE,
+ rel->rd_rel->relpersistence,
shared_relation,
mapped_relation,
true,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index e1dbd6d..249067f 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -675,6 +675,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace)
tupdesc,
NIL,
OldHeap->rd_rel->relkind,
+ OldHeap->rd_rel->relpersistence,
false,
RelationIsMapped(OldHeap),
true,
@@ -789,9 +790,9 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
/*
* We need to log the copied data in WAL iff WAL archiving/streaming is
- * enabled AND it's not a temp rel.
+ * enabled AND it's a WAL-logged rel.
*/
- use_wal = XLogIsNeeded() && !NewHeap->rd_istemp;
+ use_wal = XLogIsNeeded() && RelationNeedsWAL(NewHeap);
/* use_wal off requires smgr_targblock be initially invalid */
Assert(RelationGetTargetBlock(NewHeap) == InvalidBlockNumber);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 9407d0f..0940893 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -222,7 +222,7 @@ DefineIndex(RangeVar *heapRelation,
}
else
{
- tablespaceId = GetDefaultTablespace(rel->rd_istemp);
+ tablespaceId = GetDefaultTablespace(rel->rd_rel->relpersistence);
/* note InvalidOid is OK in this case */
}
@@ -1706,7 +1706,7 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
continue;
/* Skip temp tables of other backends; we can't reindex them at all */
- if (classtuple->relistemp &&
+ if (classtuple->relpersistence == RELPERSISTENCE_TEMP &&
!isTempNamespace(classtuple->relnamespace))
continue;
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index bb8ebce..e1df5fb 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -366,7 +366,7 @@ fill_seq_with_data(Relation rel, HeapTuple tuple)
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
@@ -448,7 +448,7 @@ AlterSequence(AlterSeqStmt *stmt)
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!seqrel->rd_istemp)
+ if (RelationNeedsWAL(seqrel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
@@ -678,7 +678,7 @@ nextval_internal(Oid relid)
MarkBufferDirty(buf);
/* XLOG stuff */
- if (logit && !seqrel->rd_istemp)
+ if (logit && RelationNeedsWAL(seqrel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
@@ -855,7 +855,7 @@ do_setval(Oid relid, int64 next, bool iscalled)
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!seqrel->rd_istemp)
+ if (RelationNeedsWAL(seqrel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 937992b..6729d83 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -224,7 +224,7 @@ static const struct dropmsgstrings dropmsgstringarray[] = {
static void truncate_check_rel(Relation rel);
-static List *MergeAttributes(List *schema, List *supers, bool istemp,
+static List *MergeAttributes(List *schema, List *supers, char relpersistence,
List **supOids, List **supconstr, int *supOidCount);
static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
static bool change_varattnos_walker(Node *node, const AttrNumber *newattno);
@@ -339,7 +339,7 @@ static void ATPrepAddInherit(Relation child_rel);
static void ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode);
static void ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode);
static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
- ForkNumber forkNum, bool istemp);
+ ForkNumber forkNum, char relpersistence);
static const char *storage_name(char c);
@@ -391,7 +391,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
/*
* Check consistency of arguments
*/
- if (stmt->oncommit != ONCOMMIT_NOOP && !stmt->relation->istemp)
+ if (stmt->oncommit != ONCOMMIT_NOOP
+ && stmt->relation->relpersistence != RELPERSISTENCE_TEMP)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("ON COMMIT can only be used on temporary tables")));
@@ -401,7 +402,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
* code. This is needed because calling code might not expect untrusted
* tables to appear in pg_temp at the front of its search path.
*/
- if (stmt->relation->istemp && InSecurityRestrictedOperation())
+ if (stmt->relation->relpersistence == RELPERSISTENCE_TEMP
+ && InSecurityRestrictedOperation())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("cannot create temporary table within security-restricted operation")));
@@ -434,7 +436,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
}
else
{
- tablespaceId = GetDefaultTablespace(stmt->relation->istemp);
+ tablespaceId = GetDefaultTablespace(stmt->relation->relpersistence);
/* note InvalidOid is OK in this case */
}
@@ -478,7 +480,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
* inherited attributes.
*/
schema = MergeAttributes(schema, stmt->inhRelations,
- stmt->relation->istemp,
+ stmt->relation->relpersistence,
&inheritOids, &old_constraints, &parentOidCount);
/*
@@ -557,6 +559,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
list_concat(cookedDefaults,
old_constraints),
relkind,
+ stmt->relation->relpersistence,
false,
false,
localHasOids,
@@ -1208,7 +1211,7 @@ storage_name(char c)
*----------
*/
static List *
-MergeAttributes(List *schema, List *supers, bool istemp,
+MergeAttributes(List *schema, List *supers, char relpersistence,
List **supOids, List **supconstr, int *supOidCount)
{
ListCell *entry;
@@ -1316,7 +1319,8 @@ MergeAttributes(List *schema, List *supers, bool istemp,
errmsg("inherited relation \"%s\" is not a table",
parent->relname)));
/* Permanent rels cannot inherit from temporary ones */
- if (!istemp && relation->rd_istemp)
+ if (relpersistence != RELPERSISTENCE_TEMP
+ && RelationUsesTempNamespace(relation))
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot inherit from temporary relation \"%s\"",
@@ -5124,26 +5128,27 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
RelationGetRelationName(pkrel))));
/*
- * Disallow reference from permanent table to temp table or vice versa.
- * (The ban on perm->temp is for fairly obvious reasons. The ban on
- * temp->perm is because other backends might need to run the RI triggers
- * on the perm table, but they can't reliably see tuples the owning
- * backend has created in the temp table, because non-shared buffers are
- * used for temp tables.)
+ * References from permanent tables to temp tables are disallowed because
+ * the contents of the temp table disappear at the end of each session.
+ * References from temp tables to permanent tables are also disallowed,
+ * because other backends might need to run the RI triggers on the perm
+ * table, but they can't reliably see tuples in the local buffers of other
+ * backends.
*/
- if (pkrel->rd_istemp)
+ switch (rel->rd_rel->relpersistence)
{
- if (!rel->rd_istemp)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
- errmsg("cannot reference temporary table from permanent table constraint")));
- }
- else
- {
- if (rel->rd_istemp)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
- errmsg("cannot reference permanent table from temporary table constraint")));
+ case RELPERSISTENCE_PERMANENT:
+ if (pkrel->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ errmsg("constraints on permanent tables may reference only permanent tables")));
+ break;
+ case RELPERSISTENCE_TEMP:
+ if (pkrel->rd_rel->relpersistence != RELPERSISTENCE_TEMP)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ errmsg("constraints on temporary tables may reference only temporary tables")));
+ break;
}
/*
@@ -7347,7 +7352,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
* Relfilenodes are not unique across tablespaces, so we need to allocate
* a new one in the new tablespace.
*/
- newrelfilenode = GetNewRelFileNode(newTableSpace, NULL, rel->rd_backend);
+ newrelfilenode = GetNewRelFileNode(newTableSpace, NULL,
+ rel->rd_rel->relpersistence);
/* Open old and new relation */
newrnode = rel->rd_node;
@@ -7364,10 +7370,11 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
* NOTE: any conflict in relfilenode value will be caught in
* RelationCreateStorage().
*/
- RelationCreateStorage(newrnode, rel->rd_istemp);
+ RelationCreateStorage(newrnode, rel->rd_rel->relpersistence);
/* copy main fork */
- copy_relation_data(rel->rd_smgr, dstrel, MAIN_FORKNUM, rel->rd_istemp);
+ copy_relation_data(rel->rd_smgr, dstrel, MAIN_FORKNUM,
+ rel->rd_rel->relpersistence);
/* copy those extra forks that exist */
for (forkNum = MAIN_FORKNUM + 1; forkNum <= MAX_FORKNUM; forkNum++)
@@ -7375,7 +7382,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
if (smgrexists(rel->rd_smgr, forkNum))
{
smgrcreate(dstrel, forkNum, false);
- copy_relation_data(rel->rd_smgr, dstrel, forkNum, rel->rd_istemp);
+ copy_relation_data(rel->rd_smgr, dstrel, forkNum,
+ rel->rd_rel->relpersistence);
}
}
@@ -7410,7 +7418,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
*/
static void
copy_relation_data(SMgrRelation src, SMgrRelation dst,
- ForkNumber forkNum, bool istemp)
+ ForkNumber forkNum, char relpersistence)
{
char *buf;
Page page;
@@ -7429,9 +7437,9 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
/*
* We need to log the copied data in WAL iff WAL archiving/streaming is
- * enabled AND it's not a temp rel.
+ * enabled AND it's a permanent relation.
*/
- use_wal = XLogIsNeeded() && !istemp;
+ use_wal = XLogIsNeeded() && relpersistence == RELPERSISTENCE_PERMANENT;
nblocks = smgrnblocks(src, forkNum);
@@ -7470,7 +7478,7 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
* wouldn't replay our earlier WAL entries. If we do not fsync those pages
* here, they might still not be on disk when the crash occurs.
*/
- if (!istemp)
+ if (relpersistence == RELPERSISTENCE_PERMANENT)
smgrimmedsync(dst, forkNum);
}
@@ -7538,7 +7546,8 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
ATSimplePermissions(parent_rel, false, false);
/* Permanent rels cannot inherit from temporary ones */
- if (parent_rel->rd_istemp && !child_rel->rd_istemp)
+ if (RelationUsesTempNamespace(parent_rel)
+ && !RelationUsesTempNamespace(child_rel))
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot inherit from temporary relation \"%s\"",
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 5ba0f1c..227b4b0 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -1050,8 +1050,8 @@ assign_default_tablespace(const char *newval, bool doit, GucSource source)
/*
* GetDefaultTablespace -- get the OID of the current default tablespace
*
- * Regular objects and temporary objects have different default tablespaces,
- * hence the forTemp parameter must be specified.
+ * Temporary objects have different default tablespaces, hence the
+ * relpersistence parameter must be specified.
*
* May return InvalidOid to indicate "use the database's default tablespace".
*
@@ -1062,12 +1062,12 @@ assign_default_tablespace(const char *newval, bool doit, GucSource source)
* default_tablespace GUC variable.
*/
Oid
-GetDefaultTablespace(bool forTemp)
+GetDefaultTablespace(char relpersistence)
{
Oid result;
/* The temp-table case is handled elsewhere */
- if (forTemp)
+ if (relpersistence == RELPERSISTENCE_TEMP)
{
PrepareTempTablespaces();
return GetNextTempTableSpace();
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 0ac993f..cbdf97d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -268,10 +268,10 @@ static void
vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
{
/*
- * No need to log changes for temp tables, they do not contain data
- * visible on the standby server.
+ * Skip this for relations for which no WAL is to be written, or if we're
+ * not trying to support archive recovery.
*/
- if (rel->rd_istemp || !XLogIsNeeded())
+ if (!RelationNeedsWAL(rel) || !XLogIsNeeded())
return;
/*
@@ -664,8 +664,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
if (nfrozen > 0)
{
MarkBufferDirty(buf);
- /* no XLOG for temp tables, though */
- if (!onerel->rd_istemp)
+ if (RelationNeedsWAL(onerel))
{
XLogRecPtr recptr;
@@ -895,7 +894,7 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!onerel->rd_istemp)
+ if (RelationNeedsWAL(onerel))
{
XLogRecPtr recptr;
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index 09ab24b..2b2b908 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -68,10 +68,10 @@ isViewOnTempTable_walker(Node *node, void *context)
if (rte->rtekind == RTE_RELATION)
{
Relation rel = heap_open(rte->relid, AccessShareLock);
- bool istemp = rel->rd_istemp;
+ char relpersistence = rel->rd_rel->relpersistence;
heap_close(rel, AccessShareLock);
- if (istemp)
+ if (relpersistence == RELPERSISTENCE_TEMP)
return true;
}
}
@@ -173,9 +173,9 @@ DefineVirtualRelation(const RangeVar *relation, List *tlist, bool replace)
/*
* Due to the namespace visibility rules for temporary objects, we
* should only end up replacing a temporary view with another
- * temporary view, and vice versa.
+ * temporary view, and similarly for permanent views.
*/
- Assert(relation->istemp == rel->rd_istemp);
+ Assert(relation->relpersistence == rel->rd_rel->relpersistence);
/*
* Create a tuple descriptor to compare against the existing view, and
@@ -454,10 +454,11 @@ DefineView(ViewStmt *stmt, const char *queryString)
* schema name.
*/
view = stmt->view;
- if (!view->istemp && isViewOnTempTable(viewParse))
+ if (view->relpersistence == RELPERSISTENCE_PERMANENT
+ && isViewOnTempTable(viewParse))
{
view = copyObject(view); /* don't corrupt original command */
- view->istemp = true;
+ view->relpersistence = RELPERSISTENCE_TEMP;
ereport(NOTICE,
(errmsg("view \"%s\" will be a temporary view",
view->relname)));
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 69f3a28..c4719f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2131,7 +2131,8 @@ OpenIntoRel(QueryDesc *queryDesc)
/*
* Check consistency of arguments
*/
- if (into->onCommit != ONCOMMIT_NOOP && !into->rel->istemp)
+ if (into->onCommit != ONCOMMIT_NOOP
+ && into->rel->relpersistence != RELPERSISTENCE_TEMP)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("ON COMMIT can only be used on temporary tables")));
@@ -2141,7 +2142,8 @@ OpenIntoRel(QueryDesc *queryDesc)
* code. This is needed because calling code might not expect untrusted
* tables to appear in pg_temp at the front of its search path.
*/
- if (into->rel->istemp && InSecurityRestrictedOperation())
+ if (into->rel->relpersistence == RELPERSISTENCE_TEMP
+ && InSecurityRestrictedOperation())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("cannot create temporary table within security-restricted operation")));
@@ -2168,7 +2170,7 @@ OpenIntoRel(QueryDesc *queryDesc)
}
else
{
- tablespaceId = GetDefaultTablespace(into->rel->istemp);
+ tablespaceId = GetDefaultTablespace(into->rel->relpersistence);
/* note InvalidOid is OK in this case */
}
@@ -2208,6 +2210,7 @@ OpenIntoRel(QueryDesc *queryDesc)
tupdesc,
NIL,
RELKIND_RELATION,
+ into->rel->relpersistence,
false,
false,
true,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 735322e..4e1f221 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -958,7 +958,7 @@ _copyRangeVar(RangeVar *from)
COPY_STRING_FIELD(schemaname);
COPY_STRING_FIELD(relname);
COPY_SCALAR_FIELD(inhOpt);
- COPY_SCALAR_FIELD(istemp);
+ COPY_SCALAR_FIELD(relpersistence);
COPY_NODE_FIELD(alias);
COPY_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 2d2b8c7..85cded0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -104,7 +104,7 @@ _equalRangeVar(RangeVar *a, RangeVar *b)
COMPARE_STRING_FIELD(schemaname);
COMPARE_STRING_FIELD(relname);
COMPARE_SCALAR_FIELD(inhOpt);
- COMPARE_SCALAR_FIELD(istemp);
+ COMPARE_SCALAR_FIELD(relpersistence);
COMPARE_NODE_FIELD(alias);
COMPARE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 4b268f3..f06f73b 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -15,6 +15,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -378,7 +379,7 @@ makeRangeVar(char *schemaname, char *relname, int location)
r->schemaname = schemaname;
r->relname = relname;
r->inhOpt = INH_DEFAULT;
- r->istemp = false;
+ r->relpersistence = RELPERSISTENCE_PERMANENT;
r->alias = NULL;
r->location = location;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 5d09e16..7d77d84 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -841,7 +841,7 @@ _outRangeVar(StringInfo str, RangeVar *node)
WRITE_STRING_FIELD(schemaname);
WRITE_STRING_FIELD(relname);
WRITE_ENUM_FIELD(inhOpt, InhOption);
- WRITE_BOOL_FIELD(istemp);
+ WRITE_CHAR_FIELD(relpersistence);
WRITE_NODE_FIELD(alias);
WRITE_LOCATION_FIELD(location);
}
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 2166a5d..933d58a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -373,7 +373,7 @@ _readRangeVar(void)
READ_STRING_FIELD(schemaname);
READ_STRING_FIELD(relname);
READ_ENUM_FIELD(inhOpt, InhOption);
- READ_BOOL_FIELD(istemp);
+ READ_CHAR_FIELD(relpersistence);
READ_NODE_FIELD(alias);
READ_LOCATION_FIELD(location);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 9ec75f7..8fc79b6 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -311,7 +311,8 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
%type <fun_param_mode> arg_class
%type <typnam> func_return func_type
-%type <boolean> OptTemp opt_trusted opt_restart_seqs
+%type <boolean> opt_trusted opt_restart_seqs
+%type <ival> OptTemp
%type <oncommit> OnCommitOption
%type <node> for_locking_item
@@ -2280,7 +2281,7 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
OptInherit OptWith OnCommitOption OptTableSpace
{
CreateStmt *n = makeNode(CreateStmt);
- $4->istemp = $2;
+ $4->relpersistence = $2;
n->relation = $4;
n->tableElts = $6;
n->inhRelations = $8;
@@ -2296,7 +2297,7 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
OptTableSpace
{
CreateStmt *n = makeNode(CreateStmt);
- $7->istemp = $2;
+ $7->relpersistence = $2;
n->relation = $7;
n->tableElts = $9;
n->inhRelations = $11;
@@ -2311,7 +2312,7 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
OptTypedTableElementList OptWith OnCommitOption OptTableSpace
{
CreateStmt *n = makeNode(CreateStmt);
- $4->istemp = $2;
+ $4->relpersistence = $2;
n->relation = $4;
n->tableElts = $7;
n->ofTypename = makeTypeNameFromNameList($6);
@@ -2327,7 +2328,7 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
OptTypedTableElementList OptWith OnCommitOption OptTableSpace
{
CreateStmt *n = makeNode(CreateStmt);
- $7->istemp = $2;
+ $7->relpersistence = $2;
n->relation = $7;
n->tableElts = $10;
n->ofTypename = makeTypeNameFromNameList($9);
@@ -2348,13 +2349,13 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
* NOTE: we accept both GLOBAL and LOCAL options; since we have no modules
* the LOCAL keyword is really meaningless.
*/
-OptTemp: TEMPORARY { $$ = TRUE; }
- | TEMP { $$ = TRUE; }
- | LOCAL TEMPORARY { $$ = TRUE; }
- | LOCAL TEMP { $$ = TRUE; }
- | GLOBAL TEMPORARY { $$ = TRUE; }
- | GLOBAL TEMP { $$ = TRUE; }
- | /*EMPTY*/ { $$ = FALSE; }
+OptTemp: TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
+ | TEMP { $$ = RELPERSISTENCE_TEMP; }
+ | LOCAL TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
+ | LOCAL TEMP { $$ = RELPERSISTENCE_TEMP; }
+ | GLOBAL TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
+ | GLOBAL TEMP { $$ = RELPERSISTENCE_TEMP; }
+ | /*EMPTY*/ { $$ = RELPERSISTENCE_PERMANENT; }
;
OptTableElementList:
@@ -2834,7 +2835,7 @@ CreateAsStmt:
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("CREATE TABLE AS cannot specify INTO"),
parser_errposition(exprLocation((Node *) n->intoClause))));
- $4->rel->istemp = $2;
+ $4->rel->relpersistence = $2;
n->intoClause = $4;
/* Implement WITH NO DATA by forcing top-level LIMIT 0 */
if (!$7)
@@ -2900,7 +2901,7 @@ CreateSeqStmt:
CREATE OptTemp SEQUENCE qualified_name OptSeqOptList
{
CreateSeqStmt *n = makeNode(CreateSeqStmt);
- $4->istemp = $2;
+ $4->relpersistence = $2;
n->sequence = $4;
n->options = $5;
n->ownerId = InvalidOid;
@@ -6621,7 +6622,7 @@ ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list
{
ViewStmt *n = makeNode(ViewStmt);
n->view = $4;
- n->view->istemp = $2;
+ n->view->relpersistence = $2;
n->aliases = $5;
n->query = $7;
n->replace = false;
@@ -6632,7 +6633,7 @@ ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list
{
ViewStmt *n = makeNode(ViewStmt);
n->view = $6;
- n->view->istemp = $4;
+ n->view->relpersistence = $4;
n->aliases = $7;
n->query = $9;
n->replace = true;
@@ -7328,7 +7329,7 @@ ExecuteStmt: EXECUTE name execute_param_clause
ExecuteStmt *n = makeNode(ExecuteStmt);
n->name = $7;
n->params = $8;
- $4->rel->istemp = $2;
+ $4->rel->relpersistence = $2;
n->into = $4;
if ($4->colNames)
ereport(ERROR,
@@ -7889,42 +7890,42 @@ OptTempTableName:
TEMPORARY opt_table qualified_name
{
$$ = $3;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| TEMP opt_table qualified_name
{
$$ = $3;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| LOCAL TEMPORARY opt_table qualified_name
{
$$ = $4;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| LOCAL TEMP opt_table qualified_name
{
$$ = $4;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| GLOBAL TEMPORARY opt_table qualified_name
{
$$ = $4;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| GLOBAL TEMP opt_table qualified_name
{
$$ = $4;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| TABLE qualified_name
{
$$ = $2;
- $$->istemp = false;
+ $$->relpersistence = RELPERSISTENCE_PERMANENT;
}
| qualified_name
{
$$ = $1;
- $$->istemp = false;
+ $$->relpersistence = RELPERSISTENCE_PERMANENT;
}
;
@@ -10916,16 +10917,12 @@ qualified_name_list:
qualified_name:
ColId
{
- $$ = makeNode(RangeVar);
- $$->catalogname = NULL;
- $$->schemaname = NULL;
- $$->relname = $1;
- $$->location = @1;
+ $$ = makeRangeVar(NULL, $1, @1);
}
| ColId indirection
{
check_qualified_name($2, yyscanner);
- $$ = makeNode(RangeVar);
+ $$ = makeRangeVar(NULL, NULL, @1);
switch (list_length($2))
{
case 1:
@@ -10946,7 +10943,6 @@ qualified_name:
parser_errposition(@1)));
break;
}
- $$->location = @1;
}
;
@@ -12163,6 +12159,7 @@ makeRangeVarFromAnyName(List *names, int position, core_yyscan_t yyscanner)
break;
}
+ r->relpersistence = RELPERSISTENCE_PERMANENT;
r->location = position;
return r;
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index a8aee20..aa7c144 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -158,10 +158,11 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
* If the target relation name isn't schema-qualified, make it so. This
* prevents some corner cases in which added-on rewritten commands might
* think they should apply to other relations that have the same name and
- * are earlier in the search path. "istemp" is equivalent to a
- * specification of pg_temp, so no need for anything extra in that case.
+ * are earlier in the search path. But a local temp table is effectively
+ * specified to be in pg_temp, so no need for anything extra in that case.
*/
- if (stmt->relation->schemaname == NULL && !stmt->relation->istemp)
+ if (stmt->relation->schemaname == NULL
+ && stmt->relation->relpersistence != RELPERSISTENCE_TEMP)
{
Oid namespaceid = RangeVarGetCreationNamespace(stmt->relation);
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index c7d704d..89b2540 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1975,7 +1975,7 @@ do_autovacuum(void)
* Check if it is a temp table (presumably, of some other backend's).
* We cannot safely process other backends' temp tables.
*/
- if (classForm->relistemp)
+ if (classForm->relpersistence == RELPERSISTENCE_TEMP)
{
int backendID;
@@ -2072,7 +2072,7 @@ do_autovacuum(void)
/*
* We cannot safely process other backends' temp tables, so skip 'em.
*/
- if (classForm->relistemp)
+ if (classForm->relpersistence == RELPERSISTENCE_TEMP)
continue;
relid = HeapTupleGetOid(tuple);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index edc4977..860e736 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -123,7 +123,7 @@ PrefetchBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum)
/* Open it at the smgr level if not already done */
RelationOpenSmgr(reln);
- if (reln->rd_istemp)
+ if (RelationUsesLocalBuffers(reln))
{
/* see comments in ReadBufferExtended */
if (RELATION_IS_OTHER_TEMP(reln))
@@ -2071,7 +2071,7 @@ FlushRelationBuffers(Relation rel)
/* Open rel at the smgr level if not already done */
RelationOpenSmgr(rel);
- if (rel->rd_istemp)
+ if (RelationUsesLocalBuffers(rel))
{
for (i = 0; i < NLocBuffer; i++)
{
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index f5250a2..e352cda 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -612,16 +612,26 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
PG_RETURN_NULL();
}
- /* If temporary, determine owning backend. */
- if (!relform->relistemp)
- backend = InvalidBackendId;
- else if (isTempOrToastNamespace(relform->relnamespace))
- backend = MyBackendId;
- else
+ /* Determine owning backend. */
+ switch (relform->relpersistence)
{
- /* Do it the hard way. */
- backend = GetTempNamespaceBackendId(relform->relnamespace);
- Assert(backend != InvalidBackendId);
+ case RELPERSISTENCE_PERMANENT:
+ backend = InvalidBackendId;
+ break;
+ case RELPERSISTENCE_TEMP:
+ if (isTempOrToastNamespace(relform->relnamespace))
+ backend = MyBackendId;
+ else
+ {
+ /* Do it the hard way. */
+ backend = GetTempNamespaceBackendId(relform->relnamespace);
+ Assert(backend != InvalidBackendId);
+ }
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c", relform->relpersistence);
+ backend = InvalidBackendId; /* placate compiler */
+ break;
}
ReleaseSysCache(tuple);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 8df12a1..1509686 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -849,20 +849,30 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
relation->rd_isnailed = false;
relation->rd_createSubid = InvalidSubTransactionId;
relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
- relation->rd_istemp = relation->rd_rel->relistemp;
- if (!relation->rd_istemp)
- relation->rd_backend = InvalidBackendId;
- else if (isTempOrToastNamespace(relation->rd_rel->relnamespace))
- relation->rd_backend = MyBackendId;
- else
+ switch (relation->rd_rel->relpersistence)
{
- /*
- * If it's a temporary table, but not one of ours, we have to use
- * the slow, grotty method to figure out the owning backend.
- */
- relation->rd_backend =
- GetTempNamespaceBackendId(relation->rd_rel->relnamespace);
- Assert(relation->rd_backend != InvalidBackendId);
+ case RELPERSISTENCE_PERMANENT:
+ relation->rd_backend = InvalidBackendId;
+ break;
+ case RELPERSISTENCE_TEMP:
+ if (isTempOrToastNamespace(relation->rd_rel->relnamespace))
+ relation->rd_backend = MyBackendId;
+ else
+ {
+ /*
+ * If it's a local temp table, but not one of ours, we have to
+ * use the slow, grotty method to figure out the owning
+ * backend.
+ */
+ relation->rd_backend =
+ GetTempNamespaceBackendId(relation->rd_rel->relnamespace);
+ Assert(relation->rd_backend != InvalidBackendId);
+ }
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c",
+ relation->rd_rel->relpersistence);
+ break;
}
/*
@@ -1358,7 +1368,6 @@ formrdesc(const char *relationName, Oid relationReltype,
relation->rd_isnailed = true;
relation->rd_createSubid = InvalidSubTransactionId;
relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
- relation->rd_istemp = false;
relation->rd_backend = InvalidBackendId;
/*
@@ -1384,11 +1393,8 @@ formrdesc(const char *relationName, Oid relationReltype,
if (isshared)
relation->rd_rel->reltablespace = GLOBALTABLESPACE_OID;
- /*
- * Likewise, we must know if a relation is temp ... but formrdesc is not
- * used for any temp relations.
- */
- relation->rd_rel->relistemp = false;
+ /* formrdesc is used only for permanent relations */
+ relation->rd_rel->relpersistence = RELPERSISTENCE_PERMANENT;
relation->rd_rel->relpages = 1;
relation->rd_rel->reltuples = 1;
@@ -2366,7 +2372,8 @@ RelationBuildLocalRelation(const char *relname,
Oid relid,
Oid reltablespace,
bool shared_relation,
- bool mapped_relation)
+ bool mapped_relation,
+ char relpersistence)
{
Relation rel;
MemoryContext oldcxt;
@@ -2440,10 +2447,6 @@ RelationBuildLocalRelation(const char *relname,
/* must flag that we have rels created in this transaction */
need_eoxact_work = true;
- /* it is temporary if and only if it is in my temp-table namespace */
- rel->rd_istemp = isTempOrToastNamespace(relnamespace);
- rel->rd_backend = rel->rd_istemp ? MyBackendId : InvalidBackendId;
-
/*
* create a new tuple descriptor from the one passed in. We do this
* partly to copy it into the cache context, and partly because the new
@@ -2483,6 +2486,21 @@ RelationBuildLocalRelation(const char *relname,
/* needed when bootstrapping: */
rel->rd_rel->relowner = BOOTSTRAP_SUPERUSERID;
+ /* set up persistence; rd_backend is a function of persistence type */
+ rel->rd_rel->relpersistence = relpersistence;
+ switch (relpersistence)
+ {
+ case RELPERSISTENCE_PERMANENT:
+ rel->rd_backend = InvalidBackendId;
+ break;
+ case RELPERSISTENCE_TEMP:
+ rel->rd_backend = MyBackendId;
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c", relpersistence);
+ break;
+ }
+
/*
* Insert relation physical and logical identifiers (OIDs) into the right
* places. Note that the physical ID (relfilenode) is initially the same
@@ -2491,7 +2509,6 @@ RelationBuildLocalRelation(const char *relname,
* map.
*/
rel->rd_rel->relisshared = shared_relation;
- rel->rd_rel->relistemp = rel->rd_istemp;
RelationGetRelid(rel) = relid;
@@ -2569,7 +2586,7 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
/* Allocate a new relfilenode */
newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
- relation->rd_backend);
+ relation->rd_rel->relpersistence);
/*
* Get a writable copy of the pg_class tuple for the given relation.
@@ -2592,7 +2609,7 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
newrnode.node = relation->rd_node;
newrnode.node.relNode = newrelfilenode;
newrnode.backend = relation->rd_backend;
- RelationCreateStorage(newrnode.node, relation->rd_istemp);
+ RelationCreateStorage(newrnode.node, relation->rd_rel->relpersistence);
smgrclosenode(newrnode);
/*
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 97c808b..56dcdd5 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -56,6 +56,6 @@ extern Oid GetNewOid(Relation relation);
extern Oid GetNewOidWithIndex(Relation relation, Oid indexId,
AttrNumber oidcolumn);
extern Oid GetNewRelFileNode(Oid reltablespace, Relation pg_class,
- BackendId backend);
+ char relpersistence);
#endif /* CATALOG_H */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 7795bda..646ab9c 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -40,6 +40,7 @@ extern Relation heap_create(const char *relname,
Oid relid,
TupleDesc tupDesc,
char relkind,
+ char relpersistence,
bool shared_relation,
bool mapped_relation,
bool allow_system_table_mods);
@@ -54,6 +55,7 @@ extern Oid heap_create_with_catalog(const char *relname,
TupleDesc tupdesc,
List *cooked_constraints,
char relkind,
+ char relpersistence,
bool shared_relation,
bool mapped_relation,
bool oidislocal,
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index f50cf9d..1edbfe3 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -49,7 +49,7 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
- bool relistemp; /* T if temporary relation */
+ char relpersistence; /* see RELPERSISTENCE_xxx constants */
char relkind; /* see RELKIND_xxx constants below */
int2 relnatts; /* number of user attributes */
@@ -108,7 +108,7 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltoastidxid 12
#define Anum_pg_class_relhasindex 13
#define Anum_pg_class_relisshared 14
-#define Anum_pg_class_relistemp 15
+#define Anum_pg_class_relpersistence 15
#define Anum_pg_class_relkind 16
#define Anum_pg_class_relnatts 17
#define Anum_pg_class_relchecks 18
@@ -132,13 +132,13 @@ typedef FormData_pg_class *Form_pg_class;
*/
/* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId */
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f f r 28 0 t f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f f 3 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f f r 19 0 f f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 19 0 f f f f f f 3 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f f r 25 0 t f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 25 0 t f f f f f 3 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f f r 27 0 t f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f f 3 _null_ _null_ ));
DESCR("");
#define RELKIND_INDEX 'i' /* secondary index */
@@ -149,4 +149,7 @@ DESCR("");
#define RELKIND_VIEW 'v' /* view */
#define RELKIND_COMPOSITE_TYPE 'c' /* composite type */
+#define RELPERSISTENCE_PERMANENT 'p'
+#define RELPERSISTENCE_TEMP 't'
+
#endif /* PG_CLASS_H */
diff --git a/src/include/catalog/storage.h b/src/include/catalog/storage.h
index d7b8731..f086b1c 100644
--- a/src/include/catalog/storage.h
+++ b/src/include/catalog/storage.h
@@ -20,7 +20,7 @@
#include "storage/relfilenode.h"
#include "utils/relcache.h"
-extern void RelationCreateStorage(RelFileNode rnode, bool istemp);
+extern void RelationCreateStorage(RelFileNode rnode, char relpersistence);
extern void RelationDropStorage(Relation rel);
extern void RelationPreserveStorage(RelFileNode rnode);
extern void RelationTruncate(Relation rel, BlockNumber nblocks);
diff --git a/src/include/commands/tablespace.h b/src/include/commands/tablespace.h
index 327fbc6..1e3f6ca 100644
--- a/src/include/commands/tablespace.h
+++ b/src/include/commands/tablespace.h
@@ -47,7 +47,7 @@ extern void AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt);
extern void TablespaceCreateDbspace(Oid spcNode, Oid dbNode, bool isRedo);
-extern Oid GetDefaultTablespace(bool forTemp);
+extern Oid GetDefaultTablespace(char relpersistence);
extern void PrepareTempTablespaces(void);
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index b17adf2..ba5ae37 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -74,7 +74,7 @@ typedef struct RangeVar
char *relname; /* the relation/sequence name */
InhOption inhOpt; /* expand rel by inheritance? recursively act
* on children? */
- bool istemp; /* is this a temp relation/sequence? */
+ char relpersistence; /* see RELPERSISTENCE_* in pg_class.h */
Alias *alias; /* table alias & optional column aliases */
int location; /* token location, or -1 if unknown */
} RangeVar;
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 39e0365..88a3168 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -132,7 +132,6 @@ typedef struct RelationData
struct SMgrRelationData *rd_smgr; /* cached file handle, or NULL */
int rd_refcnt; /* reference count */
BackendId rd_backend; /* owning backend id, if temporary relation */
- bool rd_istemp; /* rel is a temporary relation */
bool rd_isnailed; /* rel is nailed in cache */
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
@@ -390,6 +389,27 @@ typedef struct StdRdOptions
} while (0)
/*
+ * RelationNeedsWAL
+ * True if relation needs WAL.
+ */
+#define RelationNeedsWAL(relation) \
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
+
+/*
+ * RelationUsesLocalBuffers
+ * True if relation's pages are stored in local buffers.
+ */
+#define RelationUsesLocalBuffers(relation) \
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
+
+/*
+ * RelationUsesTempNamespace
+ * True if relation's catalog entries live in a private namespace.
+ */
+#define RelationUsesTempNamespace(relation) \
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
+
+/*
* RELATION_IS_LOCAL
* If a rel is either temp or newly created in the current transaction,
* it can be assumed to be visible only to the current backend.
@@ -407,7 +427,8 @@ typedef struct StdRdOptions
* Beware of multiple eval of argument
*/
#define RELATION_IS_OTHER_TEMP(relation) \
- ((relation)->rd_istemp && (relation)->rd_backend != MyBackendId)
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP \
+ && (relation)->rd_backend != MyBackendId)
/* routines in utils/cache/relcache.c */
extern void RelationIncrementReferenceCount(Relation rel);
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 10d82d4..3500050 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -69,7 +69,8 @@ extern Relation RelationBuildLocalRelation(const char *relname,
Oid relid,
Oid reltablespace,
bool shared_relation,
- bool mapped_relation);
+ bool mapped_relation,
+ char relpersistence);
/*
* Routine to manage assignment of new relfilenode to a relation
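
With the relpersistence changes above applied, the new pg_class column also makes a relation's persistence visible from plain SQL; a quick sketch (the relation name is invented):

SELECT relname, relpersistence FROM pg_class WHERE relname = 'my_table';
-- relpersistence is 'p' for a permanent relation and 't' for a temporary
-- one, per the RELPERSISTENCE_* constants added to pg_class.h.
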
Attachment: unlogged-tables-v4.patch (application/octet-stream)
commit c245d0fb75964f0be1774e957f8880ed6fe58cbd
Author: Robert Haas <rhaas@postgresql.org>
Date: Fri Dec 10 22:36:26 2010 -0500
Support unlogged tables.
The contents of an unlogged table are not WAL-logged; thus, they are not
crash-safe and do not appear on standby servers. On restart, they are
truncated.
Currently, only btree and hash indexes are supported on unlogged tables.
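
To make the behavior concrete, a minimal usage sketch (assuming the v4 patch is applied; the table name is invented):

CREATE UNLOGGED TABLE hits (id int PRIMARY KEY, n int);
INSERT INTO hits SELECT g, 0 FROM generate_series(1, 1000) g;
-- The inserts skip WAL, and the primary-key btree index is
-- automatically unlogged as well. After a server restart:
SELECT count(*) FROM hits;   -- returns 0, the table was truncated
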
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index c4eb59f..51e70e9 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -167,6 +167,17 @@ ambuild (Relation heapRelation,
<para>
<programlisting>
+void
+ambuildempty (Relation indexRelation);
+</programlisting>
+ Build an empty index, and write it to the initialization fork (INIT_FORKNUM)
+ of the given relation. This method is called only for unlogged tables; the
+ empty index written to the initialization fork will be copied over the main
+ relation fork on each server restart.
+ </para>
+
+ <para>
+<programlisting>
bool
aminsert (Relation indexRelation,
Datum *values,
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 8635e80..7b0e14d 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
{ <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ DEFAULT <replaceable>default_expr</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
| <replaceable>table_constraint</replaceable>
| LIKE <replaceable>parent_table</replaceable> [ <replaceable>like_option</replaceable> ... ] }
@@ -32,7 +32,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <repl
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <replaceable class="PARAMETER">tablespace</replaceable> ]
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable>
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable>
OF <replaceable class="PARAMETER">type_name</replaceable> [ (
{ <replaceable class="PARAMETER">column_name</replaceable> WITH OPTIONS [ DEFAULT <replaceable>default_expr</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
| <replaceable>table_constraint</replaceable> }
@@ -164,6 +164,22 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <repl
</varlistentry>
<varlistentry>
+ <term><literal>UNLOGGED</></term>
+ <listitem>
+ <para>
+ If specified, the table is created as an unlogged table. Data written
+ to unlogged tables is not written to the write-ahead log (see <xref
+ linkend="wal">), which makes them considerably faster than ordinary
+ tables. However, it also means that the data stored in the tables is not
+ copied to standby servers and does not survive if
+ <productname>PostgreSQL</productname> is restarted. Unlogged tables are
+ automatically truncated on restart. Any indexes created on an unlogged
+ table are automatically unlogged as well.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>IF NOT EXISTS</></term>
<listitem>
<para>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 3a256d1..ff71078 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE <replaceable>table_name</replaceable>
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE <replaceable>table_name</replaceable>
[ (<replaceable>column_name</replaceable> [, ...] ) ]
[ WITH ( <replaceable class="PARAMETER">storage_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
@@ -82,6 +82,16 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE <replaceable>table_name
</varlistentry>
<varlistentry>
+ <term><literal>UNLOGGED</></term>
+ <listitem>
+ <para>
+ If specified, the table is created as an unlogged table.
+ Refer to <xref linkend="sql-createtable"> for details.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable>table_name</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 8681ede..7ec12b0 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -412,6 +412,19 @@ ginbuild(PG_FUNCTION_ARGS)
}
/*
+ * ginbuildempty() -- build an empty gin index in the initialization fork
+ */
+Datum
+ginbuildempty(PG_FUNCTION_ARGS)
+{
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unlogged GIN indexes are not supported")));
+
+ PG_RETURN_VOID();
+}
+
+/*
* Inserts value during normal insertion
*/
static uint32
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index b34830b..fedfe8b 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -208,6 +208,19 @@ gistbuildCallback(Relation index,
}
/*
+ * gistbuildempty() -- build an empty gist index in the initialization fork
+ */
+Datum
+gistbuildempty(PG_FUNCTION_ARGS)
+{
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unlogged GIST indexes are not supported")));
+
+ PG_RETURN_VOID();
+}
+
+/*
* gistinsert -- wrapper for GiST tuple insertion.
*
* This is the public interface routine for tuple insertion in GiSTs.
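
Since only btree and hash get a working ambuildempty in this patch, GIN and GiST should reject unlogged tables outright at CREATE INDEX time; roughly (names invented, error text taken from the ereport calls above):

CREATE UNLOGGED TABLE docs (body tsvector);
CREATE INDEX docs_gin ON docs USING gin (body);
ERROR:  unlogged GIN indexes are not supported
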
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index e53ec3d..4df92d4 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -69,7 +69,7 @@ hashbuild(PG_FUNCTION_ARGS)
estimate_rel_size(heap, NULL, &relpages, &reltuples);
/* Initialize the hash index metadata page and initial buckets */
- num_buckets = _hash_metapinit(index, reltuples);
+ num_buckets = _hash_metapinit(index, reltuples, MAIN_FORKNUM);
/*
* If we just insert the tuples into the index in scan order, then
@@ -114,6 +114,19 @@ hashbuild(PG_FUNCTION_ARGS)
}
/*
+ * hashbuildempty() -- build an empty hash index in the initialization fork
+ */
+Datum
+hashbuildempty(PG_FUNCTION_ARGS)
+{
+ Relation index = (Relation) PG_GETARG_POINTER(0);
+
+ _hash_metapinit(index, 0, INIT_FORKNUM);
+
+ PG_RETURN_VOID();
+}
+
+/*
* Per-tuple callback from IndexBuildHeapScan
*/
static void
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index 7c6e902..454ad6c 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -259,7 +259,7 @@ _hash_getovflpage(Relation rel, Buffer metabuf)
* convenient to pre-mark them as "in use" too.
*/
bit = metap->hashm_spares[splitnum];
- _hash_initbitmap(rel, metap, bitno_to_blkno(metap, bit));
+ _hash_initbitmap(rel, metap, bitno_to_blkno(metap, bit), MAIN_FORKNUM);
metap->hashm_spares[splitnum]++;
}
else
@@ -280,7 +280,7 @@ _hash_getovflpage(Relation rel, Buffer metabuf)
* with metapage write lock held; would be better to use a lock that
* doesn't block incoming searches.
*/
- newbuf = _hash_getnewbuf(rel, blkno);
+ newbuf = _hash_getnewbuf(rel, blkno, MAIN_FORKNUM);
metap->hashm_spares[splitnum]++;
@@ -503,7 +503,8 @@ _hash_freeovflpage(Relation rel, Buffer ovflbuf,
* All bits in the new bitmap page are set to "1", indicating "in use".
*/
void
-_hash_initbitmap(Relation rel, HashMetaPage metap, BlockNumber blkno)
+_hash_initbitmap(Relation rel, HashMetaPage metap, BlockNumber blkno,
+ ForkNumber forkNum)
{
Buffer buf;
Page pg;
@@ -520,7 +521,7 @@ _hash_initbitmap(Relation rel, HashMetaPage metap, BlockNumber blkno)
* page while holding the metapage lock, but this path is taken so seldom
* that it's not worth worrying about.
*/
- buf = _hash_getnewbuf(rel, blkno);
+ buf = _hash_getnewbuf(rel, blkno, forkNum);
pg = BufferGetPage(buf);
/* initialize the page's special space */
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 2ebeda9..29f7b25 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -183,9 +183,9 @@ _hash_getinitbuf(Relation rel, BlockNumber blkno)
* extend the index at a time.
*/
Buffer
-_hash_getnewbuf(Relation rel, BlockNumber blkno)
+_hash_getnewbuf(Relation rel, BlockNumber blkno, ForkNumber forkNum)
{
- BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
+ BlockNumber nblocks = RelationGetNumberOfBlocksInFork(rel, forkNum);
Buffer buf;
if (blkno == P_NEW)
@@ -197,13 +197,13 @@ _hash_getnewbuf(Relation rel, BlockNumber blkno)
/* smgr insists we use P_NEW to extend the relation */
if (blkno == nblocks)
{
- buf = ReadBuffer(rel, P_NEW);
+ buf = ReadBufferExtended(rel, forkNum, P_NEW, RBM_NORMAL, NULL);
if (BufferGetBlockNumber(buf) != blkno)
elog(ERROR, "unexpected hash relation size: %u, should be %u",
BufferGetBlockNumber(buf), blkno);
}
else
- buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_ZERO, NULL);
+ buf = ReadBufferExtended(rel, forkNum, blkno, RBM_ZERO, NULL);
LockBuffer(buf, HASH_WRITE);
@@ -324,7 +324,7 @@ _hash_chgbufaccess(Relation rel,
* multiple buffer locks is ignored.
*/
uint32
-_hash_metapinit(Relation rel, double num_tuples)
+_hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
{
HashMetaPage metap;
HashPageOpaque pageopaque;
@@ -340,7 +340,7 @@ _hash_metapinit(Relation rel, double num_tuples)
uint32 i;
/* safety check */
- if (RelationGetNumberOfBlocks(rel) != 0)
+ if (RelationGetNumberOfBlocksInFork(rel, forkNum) != 0)
elog(ERROR, "cannot initialize non-empty hash index \"%s\"",
RelationGetRelationName(rel));
@@ -383,7 +383,7 @@ _hash_metapinit(Relation rel, double num_tuples)
* calls to occur. This ensures that the smgr level has the right idea of
* the physical index length.
*/
- metabuf = _hash_getnewbuf(rel, HASH_METAPAGE);
+ metabuf = _hash_getnewbuf(rel, HASH_METAPAGE, forkNum);
pg = BufferGetPage(metabuf);
pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
@@ -451,7 +451,7 @@ _hash_metapinit(Relation rel, double num_tuples)
/* Allow interrupts, in case N is huge */
CHECK_FOR_INTERRUPTS();
- buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i));
+ buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
pg = BufferGetPage(buf);
pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
pageopaque->hasho_prevblkno = InvalidBlockNumber;
@@ -468,7 +468,7 @@ _hash_metapinit(Relation rel, double num_tuples)
/*
* Initialize first bitmap page
*/
- _hash_initbitmap(rel, metap, num_buckets + 1);
+ _hash_initbitmap(rel, metap, num_buckets + 1, forkNum);
/* all done */
_hash_wrtbuf(rel, metabuf);
@@ -785,7 +785,7 @@ _hash_splitbucket(Relation rel,
oopaque = (HashPageOpaque) PageGetSpecialPointer(opage);
nblkno = start_nblkno;
- nbuf = _hash_getnewbuf(rel, nblkno);
+ nbuf = _hash_getnewbuf(rel, nblkno, MAIN_FORKNUM);
npage = BufferGetPage(nbuf);
/* initialize the new bucket's primary page */
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 655a400..a13d629 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -29,6 +29,7 @@
#include "storage/indexfsm.h"
#include "storage/ipc.h"
#include "storage/lmgr.h"
+#include "storage/smgr.h"
#include "utils/memutils.h"
@@ -205,6 +206,36 @@ btbuildCallback(Relation index,
}
/*
+ * btbuildempty() -- build an empty btree index in the initialization fork
+ */
+Datum
+btbuildempty(PG_FUNCTION_ARGS)
+{
+ Relation index = (Relation) PG_GETARG_POINTER(0);
+ Page metapage;
+
+ /* Construct metapage. */
+ metapage = (Page) palloc(BLCKSZ);
+ _bt_initmetapage(metapage, P_NONE, 0);
+
+ /* Write the page. If archiving/streaming, XLOG it. */
+ smgrwrite(index->rd_smgr, INIT_FORKNUM, BTREE_METAPAGE,
+ (char *) metapage, true);
+ if (XLogIsNeeded())
+ log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+ BTREE_METAPAGE, metapage);
+
+ /*
+ * An immediate sync is required even if we xlog'd the page, because the
+ * write did not go through shared_buffers and therefore a concurrent
+ * checkpoint may have moved the redo pointer past our xlog record.
+ */
+ smgrimmedsync(index->rd_smgr, INIT_FORKNUM);
+
+ PG_RETURN_VOID();
+}
+
+/*
* btinsert() -- insert an index tuple into a btree.
*
* Descend the tree recursively, find the appropriate location for our
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5288b7f..d4b8a65 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -49,6 +49,7 @@
#include "storage/latch.h"
#include "storage/pmsignal.h"
#include "storage/procarray.h"
+#include "storage/reinit.h"
#include "storage/smgr.h"
#include "storage/spin.h"
#include "utils/builtins.h"
@@ -5888,6 +5889,16 @@ StartupXLOG(void)
InRecovery = true;
}
+ /*
+ * Blow away any leftover data in unlogged relations. This should be
+ * done BEFORE starting up Hot Standby, so that read-only backends don't
+ * see residual data from a previous startup. If redo isn't required or
+ * Hot Standby isn't enabled, we could do both the
+ * UNLOGGED_RELATION_CLEANUP and UNLOGGED_RELATION_INIT phases in once
+ * pass later on ... but for now, we don't bother to detect that case.
+ */
+ ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
+
/* REDO */
if (InRecovery)
{
@@ -6414,6 +6425,13 @@ StartupXLOG(void)
PreallocXlogFiles(EndOfLog);
/*
+ * Reset initial contents of unlogged relations. This has to be done
+ * AFTER recovery is complete so that any unlogged relations created
+ * during recovery also get picked up.
+ */
+ ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
+
+ /*
* Okay, we're officially UP.
*/
InRecovery = false;
@@ -6914,6 +6932,14 @@ ShutdownXLOG(int code, Datum arg)
ShutdownSUBTRANS();
ShutdownMultiXact();
+ /*
+ * Remove any unlogged relation contents. This will happen anyway at
+ * the next startup; the point of doing it here is to avoid consuming
+ * a potentially large amount of disk space while we're shut down, for
+ * data that will be discarded anyway.
+ */
+ ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
+
ereport(LOG,
(errmsg("database system is shut down")));
}
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 88b5c2a..fc5a8fc 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -55,7 +55,8 @@
const char *forkNames[] = {
"main", /* MAIN_FORKNUM */
"fsm", /* FSM_FORKNUM */
- "vm" /* VISIBILITYMAP_FORKNUM */
+ "vm", /* VISIBILITYMAP_FORKNUM */
+ "init" /* INIT_FORKNUM */
};
/*
@@ -82,14 +83,14 @@ forkname_to_number(char *forkName)
* We use this to figure out whether a filename could be a relation
* fork (as opposed to an oddly named stray file that somehow ended
* up in the database directory). If the passed string begins with
- * a fork name (other than the main fork name), we return its length.
- * If not, we return 0.
+ * a fork name (other than the main fork name), we return its length,
+ * and set *fork (if not NULL) to the fork number. If not, we return 0.
*
* Note that the present coding assumes that there are no fork names which
* are prefixes of other fork names.
*/
int
-forkname_chars(const char *str)
+forkname_chars(const char *str, ForkNumber *fork)
{
ForkNumber forkNum;
@@ -97,7 +98,11 @@ forkname_chars(const char *str)
{
int len = strlen(forkNames[forkNum]);
if (strncmp(forkNames[forkNum], str, len) == 0)
+ {
+ if (fork)
+ *fork = forkNum;
return len;
+ }
}
return 0;
}
@@ -537,6 +542,7 @@ GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
case RELPERSISTENCE_TEMP:
backend = MyBackendId;
break;
+ case RELPERSISTENCE_UNLOGGED:
case RELPERSISTENCE_PERMANENT:
backend = InvalidBackendId;
break;
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index bcf6caa..8027d74 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1211,6 +1211,25 @@ heap_create_with_catalog(const char *relname,
register_on_commit_action(relid, oncommit);
/*
+ * If this is an unlogged relation, it needs an init fork so that it
+ * can be correctly reinitialized on restart. Since we're going to
+ * do an immediate sync, we only need to xlog this if archiving or
+ * streaming is enabled. And the immediate sync is required, because
+ * otherwise there's no guarantee that this will hit the disk before
+ * the next checkpoint moves the redo pointer.
+ */
+ if (relpersistence == RELPERSISTENCE_UNLOGGED)
+ {
+ Assert(relkind == RELKIND_RELATION || relkind == RELKIND_TOASTVALUE);
+
+ smgrcreate(new_rel_desc->rd_smgr, INIT_FORKNUM, false);
+ if (XLogIsNeeded())
+ log_smgrcreate(&new_rel_desc->rd_smgr->smgr_rnode.node,
+ INIT_FORKNUM);
+ smgrimmedsync(new_rel_desc->rd_smgr, INIT_FORKNUM);
+ }
+
+ /*
* ok, the relation has been cataloged, so close our relations and return
* the OID of the newly created relation.
*/
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8fbe8eb..e50a084 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1438,6 +1438,17 @@ index_build(Relation heapRelation,
Assert(PointerIsValid(stats));
/*
+ * If this is an unlogged index, we need to write out an init fork for it.
+ */
+ if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
+ {
+ RegProcedure ambuildempty = indexRelation->rd_am->ambuildempty;
+ RelationOpenSmgr(indexRelation);
+ smgrcreate(indexRelation->rd_smgr, INIT_FORKNUM, false);
+ OidFunctionCall1(ambuildempty, PointerGetDatum(indexRelation));
+ }
+
+ /*
* If it's for an exclusion constraint, make a second pass over the heap
* to verify that the constraint is satisfied.
*/
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 671aaff..0bd0451 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -74,6 +74,7 @@ static PendingRelDelete *pendingDeletes = NULL; /* head of linked list */
typedef struct xl_smgr_create
{
RelFileNode rnode;
+ ForkNumber forkNum;
} xl_smgr_create;
typedef struct xl_smgr_truncate
@@ -98,9 +99,6 @@ void
RelationCreateStorage(RelFileNode rnode, char relpersistence)
{
PendingRelDelete *pending;
- XLogRecPtr lsn;
- XLogRecData rdata;
- xl_smgr_create xlrec;
SMgrRelation srel;
BackendId backend;
bool needs_wal;
@@ -111,6 +109,10 @@ RelationCreateStorage(RelFileNode rnode, char relpersistence)
backend = MyBackendId;
needs_wal = false;
break;
+ case RELPERSISTENCE_UNLOGGED:
+ backend = InvalidBackendId;
+ needs_wal = false;
+ break;
case RELPERSISTENCE_PERMANENT:
backend = InvalidBackendId;
needs_wal = true;
@@ -124,19 +126,7 @@ RelationCreateStorage(RelFileNode rnode, char relpersistence)
smgrcreate(srel, MAIN_FORKNUM, false);
if (needs_wal)
- {
- /*
- * Make an XLOG entry reporting the file creation.
- */
- xlrec.rnode = rnode;
-
- rdata.data = (char *) &xlrec;
- rdata.len = sizeof(xlrec);
- rdata.buffer = InvalidBuffer;
- rdata.next = NULL;
-
- lsn = XLogInsert(RM_SMGR_ID, XLOG_SMGR_CREATE, &rdata);
- }
+ log_smgrcreate(&srel->smgr_rnode.node, MAIN_FORKNUM);
/* Add the relation to the list of stuff to delete at abort */
pending = (PendingRelDelete *)
@@ -150,6 +140,29 @@ RelationCreateStorage(RelFileNode rnode, char relpersistence)
}
/*
+ * Perform XLogInsert of a XLOG_SMGR_CREATE record to WAL.
+ */
+void
+log_smgrcreate(RelFileNode *rnode, ForkNumber forkNum)
+{
+ xl_smgr_create xlrec;
+ XLogRecData rdata;
+
+ /*
+ * Make an XLOG entry reporting the file creation.
+ */
+ xlrec.rnode = *rnode;
+ xlrec.forkNum = forkNum;
+
+ rdata.data = (char *) &xlrec;
+ rdata.len = sizeof(xlrec);
+ rdata.buffer = InvalidBuffer;
+ rdata.next = NULL;
+
+ XLogInsert(RM_SMGR_ID, XLOG_SMGR_CREATE, &rdata);
+}
+
+/*
* RelationDropStorage
* Schedule unlinking of physical storage at transaction commit.
*/
@@ -478,7 +491,7 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
SMgrRelation reln;
reln = smgropen(xlrec->rnode, InvalidBackendId);
- smgrcreate(reln, MAIN_FORKNUM, true);
+ smgrcreate(reln, xlrec->forkNum, true);
}
else if (info == XLOG_SMGR_TRUNCATE)
{
@@ -523,7 +536,7 @@ smgr_desc(StringInfo buf, uint8 xl_info, char *rec)
if (info == XLOG_SMGR_CREATE)
{
xl_smgr_create *xlrec = (xl_smgr_create *) rec;
- char *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
+ char *path = relpathperm(xlrec->rnode, xlrec->forkNum);
appendStringInfo(buf, "file create: %s", path);
pfree(path);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6729d83..3f6b814 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5128,12 +5128,12 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
RelationGetRelationName(pkrel))));
/*
- * References from permanent tables to temp tables are disallowed because
- * the contents of the temp table disappear at the end of each session.
- * References from temp tables to permanent tables are also disallowed,
- * because other backends might need to run the RI triggers on the perm
- * table, but they can't reliably see tuples in the local buffers of other
- * backends.
+ * References from permanent or unlogged tables to temp tables, and from
+ * permanent tables to unlogged tables, are disallowed because the
+ * referenced data can vanish out from under us. References from temp
+ * tables to any other table type are also disallowed, because other
+ * backends might need to run the RI triggers on the perm table, but they
+ * can't reliably see tuples in the local buffers of other backends.
*/
switch (rel->rd_rel->relpersistence)
{
@@ -5143,6 +5143,13 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("constraints on permanent tables may reference only permanent tables")));
break;
+ case RELPERSISTENCE_UNLOGGED:
+ if (pkrel->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT
+ && pkrel->rd_rel->relpersistence != RELPERSISTENCE_UNLOGGED)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ errmsg("constraints on unlogged tables may reference only permanent or unlogged tables")));
+ break;
case RELPERSISTENCE_TEMP:
if (pkrel->rd_rel->relpersistence != RELPERSISTENCE_TEMP)
ereport(ERROR,
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 8fc79b6..c1dce3c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -536,8 +536,8 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P
- UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNTIL
- UPDATE USER USING
+ UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
+ UNTIL UPDATE USER USING
VACUUM VALID VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
VERBOSE VERSION_P VIEW VOLATILE
@@ -2355,6 +2355,7 @@ OptTemp: TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
| LOCAL TEMP { $$ = RELPERSISTENCE_TEMP; }
| GLOBAL TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
| GLOBAL TEMP { $$ = RELPERSISTENCE_TEMP; }
+ | UNLOGGED { $$ = RELPERSISTENCE_UNLOGGED; }
| /*EMPTY*/ { $$ = RELPERSISTENCE_PERMANENT; }
;
@@ -7917,6 +7918,11 @@ OptTempTableName:
$$ = $4;
$$->relpersistence = RELPERSISTENCE_TEMP;
}
+ | UNLOGGED opt_table qualified_name
+ {
+ $$ = $3;
+ $$->relpersistence = RELPERSISTENCE_UNLOGGED;
+ }
| TABLE qualified_name
{
$$ = $2;
@@ -11383,6 +11389,7 @@ unreserved_keyword:
| UNENCRYPTED
| UNKNOWN
| UNLISTEN
+ | UNLOGGED
| UNTIL
| UPDATE
| VACUUM
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 860e736..56569cc 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1897,12 +1897,12 @@ FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
* Determines the current number of pages in the relation.
*/
BlockNumber
-RelationGetNumberOfBlocks(Relation relation)
+RelationGetNumberOfBlocksInFork(Relation relation, ForkNumber forkNum)
{
/* Open it at the smgr level if not already done */
RelationOpenSmgr(relation);
- return smgrnblocks(relation->rd_smgr, MAIN_FORKNUM);
+ return smgrnblocks(relation->rd_smgr, forkNum);
}
/* ---------------------------------------------------------------------
diff --git a/src/backend/storage/file/Makefile b/src/backend/storage/file/Makefile
index 3b93aa1..d2198f2 100644
--- a/src/backend/storage/file/Makefile
+++ b/src/backend/storage/file/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/storage/file
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = fd.o buffile.o copydir.o
+OBJS = fd.o buffile.o copydir.o reinit.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 4a10563..5af64d7 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -38,7 +38,6 @@
#endif
-static void copy_file(char *fromfile, char *tofile);
static void fsync_fname(char *fname, bool isdir);
@@ -142,7 +141,7 @@ copydir(char *fromdir, char *todir, bool recurse)
/*
* copy one file
*/
-static void
+void
copy_file(char *fromfile, char *tofile)
{
char *buffer;
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 4f7dc39..a1dc18b 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2055,7 +2055,7 @@ looks_like_temp_rel_name(const char *name)
/* We might have _forkname or .segment or both. */
if (name[pos] == '_')
{
- int forkchar = forkname_chars(&name[pos+1]);
+ int forkchar = forkname_chars(&name[pos+1], NULL);
if (forkchar <= 0)
return false;
pos += forkchar + 1;
diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
new file mode 100644
index 0000000..b75178b
--- /dev/null
+++ b/src/backend/storage/file/reinit.c
@@ -0,0 +1,396 @@
+/*-------------------------------------------------------------------------
+ *
+ * reinit.c
+ * Reinitialization of unlogged relations
+ *
+ * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/storage/file/reinit.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "catalog/catalog.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "storage/reinit.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+
+static void ResetUnloggedRelationsInTablespaceDir(const char *tsdirname,
+ int op);
+static void ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname,
+ int op);
+static bool parse_filename_for_nontemp_relation(const char *name,
+ int *oidchars, ForkNumber *fork);
+
+typedef struct {
+ char oid[OIDCHARS+1];
+} unlogged_relation_entry;
+
+/*
+ * Reset unlogged relations from before the last restart.
+ *
+ * If op includes UNLOGGED_RELATION_CLEANUP, we remove all forks of any
+ * relation with an "init" fork, except for the "init" fork itself.
+ *
+ * If op includes UNLOGGED_RELATION_INIT, we copy the "init" fork to the main
+ * fork.
+ */
+void
+ResetUnloggedRelations(int op)
+{
+ char temp_path[MAXPGPATH];
+ DIR *spc_dir;
+ struct dirent *spc_de;
+ MemoryContext tmpctx, oldctx;
+
+ /* Log it. */
+ ereport(DEBUG1,
+ (errmsg("resetting unlogged relations: cleanup %d init %d",
+ (op & UNLOGGED_RELATION_CLEANUP) != 0,
+ (op & UNLOGGED_RELATION_INIT) != 0)));
+
+ /*
+ * Just to be sure we don't leak any memory, let's create a temporary
+ * memory context for this operation.
+ */
+ tmpctx = AllocSetContextCreate(CurrentMemoryContext,
+ "ResetUnloggedRelations",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ oldctx = MemoryContextSwitchTo(tmpctx);
+
+ /*
+ * First process unlogged files in pg_default ($PGDATA/base)
+ */
+ ResetUnloggedRelationsInTablespaceDir("base", op);
+
+ /*
+ * Cycle through directories for all non-default tablespaces.
+ */
+ spc_dir = AllocateDir("pg_tblspc");
+
+ while ((spc_de = ReadDir(spc_dir, "pg_tblspc")) != NULL)
+ {
+ if (strcmp(spc_de->d_name, ".") == 0 ||
+ strcmp(spc_de->d_name, "..") == 0)
+ continue;
+
+ snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s",
+ spc_de->d_name, TABLESPACE_VERSION_DIRECTORY);
+ ResetUnloggedRelationsInTablespaceDir(temp_path, op);
+ }
+
+ FreeDir(spc_dir);
+
+ /*
+ * Restore memory context.
+ */
+ MemoryContextSwitchTo(oldctx);
+ MemoryContextDelete(tmpctx);
+}
+
+/* Process one tablespace directory for ResetUnloggedRelations */
+static void
+ResetUnloggedRelationsInTablespaceDir(const char *tsdirname, int op)
+{
+ DIR *ts_dir;
+ struct dirent *de;
+ char dbspace_path[MAXPGPATH];
+
+ ts_dir = AllocateDir(tsdirname);
+ if (ts_dir == NULL)
+ {
+ /* anything except ENOENT is fishy */
+ if (errno != ENOENT)
+ elog(LOG,
+ "could not open tablespace directory \"%s\": %m",
+ tsdirname);
+ return;
+ }
+
+ while ((de = ReadDir(ts_dir, tsdirname)) != NULL)
+ {
+ int i = 0;
+
+ /*
+ * We're only interested in the per-database directories, which have
+ * numeric names. Note that this code will also (properly) ignore "."
+ * and "..".
+ */
+ while (isdigit((unsigned char) de->d_name[i]))
+ ++i;
+ if (de->d_name[i] != '\0' || i == 0)
+ continue;
+
+ snprintf(dbspace_path, sizeof(dbspace_path), "%s/%s",
+ tsdirname, de->d_name);
+ ResetUnloggedRelationsInDbspaceDir(dbspace_path, op);
+ }
+
+ FreeDir(ts_dir);
+}
+
+/* Process one per-dbspace directory for ResetUnloggedRelations */
+static void
+ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
+{
+ DIR *dbspace_dir;
+ struct dirent *de;
+ char rm_path[MAXPGPATH];
+
+ /* Caller must specify at least one operation. */
+ Assert((op & (UNLOGGED_RELATION_CLEANUP | UNLOGGED_RELATION_INIT)) != 0);
+
+ /*
+ * Cleanup is a two-pass operation. First, we go through and identify all
+ * the files with init forks. Then, we go through again and nuke
+ * everything with the same OID except the init fork.
+ */
+ if ((op & UNLOGGED_RELATION_CLEANUP) != 0)
+ {
+ HTAB *hash = NULL;
+ HASHCTL ctl;
+
+ /* Open the directory. */
+ dbspace_dir = AllocateDir(dbspacedirname);
+ if (dbspace_dir == NULL)
+ {
+ elog(LOG,
+ "could not open dbspace directory \"%s\": %m",
+ dbspacedirname);
+ return;
+ }
+
+ /*
+ * It's possible that someone could create a ton of unlogged relations
+ * in the same database & tablespace, so we'd better use a hash table
+ * rather than an array or linked list to keep track of which files
+ * need to be reset. Otherwise, this cleanup operation would be
+ * O(n^2).
+ */
+ ctl.keysize = sizeof(unlogged_relation_entry);
+ ctl.entrysize = sizeof(unlogged_relation_entry);
+ hash = hash_create("unlogged hash", 32, &ctl, HASH_ELEM);
+
+ /* Scan the directory. */
+ while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+ {
+ ForkNumber forkNum;
+ int oidchars;
+ unlogged_relation_entry ent;
+
+ /* Skip anything that doesn't look like a relation data file. */
+ if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+ &forkNum))
+ continue;
+
+ /* Also skip it unless this is the init fork. */
+ if (forkNum != INIT_FORKNUM)
+ continue;
+
+ /*
+ * Put the OID portion of the name into the hash table, if it isn't
+ * already.
+ */
+ memset(ent.oid, 0, sizeof(ent.oid));
+ memcpy(ent.oid, de->d_name, oidchars);
+ hash_search(hash, &ent, HASH_ENTER, NULL);
+ }
+
+ /* Done with the first pass. */
+ FreeDir(dbspace_dir);
+
+ /*
+ * If we didn't find any init forks, there's no point in continuing;
+ * we can bail out now.
+ */
+ if (hash_get_num_entries(hash) == 0)
+ {
+ hash_destroy(hash);
+ return;
+ }
+
+ /*
+ * Now, make a second pass and remove anything that matches. First,
+ * reopen the directory.
+ */
+ dbspace_dir = AllocateDir(dbspacedirname);
+ if (dbspace_dir == NULL)
+ {
+ elog(LOG,
+ "could not open dbspace directory \"%s\": %m",
+ dbspacedirname);
+ hash_destroy(hash);
+ return;
+ }
+
+ /* Scan the directory. */
+ while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+ {
+ ForkNumber forkNum;
+ int oidchars;
+ bool found;
+ unlogged_relation_entry ent;
+
+ /* Skip anything that doesn't look like a relation data file. */
+ if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+ &forkNum))
+ continue;
+
+ /* We never remove the init fork. */
+ if (forkNum == INIT_FORKNUM)
+ continue;
+
+ /*
+ * See whether the OID portion of the name shows up in the hash
+ * table.
+ */
+ memset(ent.oid, 0, sizeof(ent.oid));
+ memcpy(ent.oid, de->d_name, oidchars);
+ hash_search(hash, &ent, HASH_FIND, &found);
+
+ /* If so, nuke it! */
+ if (found)
+ {
+ snprintf(rm_path, sizeof(rm_path), "%s/%s",
+ dbspacedirname, de->d_name);
+ /*
+ * It's tempting to actually throw an error here, but since
+ * this code gets run during database startup, that could
+ * result in the database failing to start. (XXX Should we do
+ * it anyway?)
+ */
+ if (unlink(rm_path))
+ elog(LOG, "could not unlink file \"%s\": %m", rm_path);
+ else
+ elog(DEBUG2, "unlinked file \"%s\"", rm_path);
+ }
+ }
+
+ /* Cleanup is complete. */
+ FreeDir(dbspace_dir);
+ hash_destroy(hash);
+ }
+
+ /*
+ * Initialization happens after cleanup is complete: we copy each init
+ * fork file to the corresponding main fork file. Note that if we are
+ * asked to do both cleanup and init, we may never get here: if the cleanup
+ * code determines that there are no init forks in this dbspace, it will
+ * return before we get to this point.
+ */
+ if ((op & UNLOGGED_RELATION_INIT) != 0)
+ {
+ /* Open the directory. */
+ dbspace_dir = AllocateDir(dbspacedirname);
+ if (dbspace_dir == NULL)
+ {
+ /* we just saw this directory, so it really ought to be there */
+ elog(LOG,
+ "could not open dbspace directory \"%s\": %m",
+ dbspacedirname);
+ return;
+ }
+
+ /* Scan the directory. */
+ while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+ {
+ ForkNumber forkNum;
+ int oidchars;
+ char oidbuf[OIDCHARS+1];
+ char srcpath[MAXPGPATH];
+ char dstpath[MAXPGPATH];
+
+ /* Skip anything that doesn't look like a relation data file. */
+ if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+ &forkNum))
+ continue;
+
+ /* Also skip it unless this is the init fork. */
+ if (forkNum != INIT_FORKNUM)
+ continue;
+
+ /* Construct source pathname. */
+ snprintf(srcpath, sizeof(srcpath), "%s/%s",
+ dbspacedirname, de->d_name);
+
+ /* Construct destination pathname. */
+ memcpy(oidbuf, de->d_name, oidchars);
+ oidbuf[oidchars] = '\0';
+ snprintf(dstpath, sizeof(dstpath), "%s/%s%s",
+ dbspacedirname, oidbuf, de->d_name + oidchars + 1 +
+ strlen(forkNames[INIT_FORKNUM]));
+
+ /* OK, we're ready to perform the actual copy. */
+ elog(DEBUG2, "copying %s to %s", srcpath, dstpath);
+ copy_file(srcpath, dstpath);
+ }
+
+ /* Done scanning this directory. */
+ FreeDir(dbspace_dir);
+ }
+}
+
+/*
+ * Basic parsing of putative relation filenames.
+ *
+ * This function returns true if the file appears to be in the correct format
+ * for a non-temporary relation and false otherwise.
+ *
+ * NB: If this function returns true, the caller is entitled to assume that
+ * *oidchars has been set to a value no more than OIDCHARS, and thus
+ * that a buffer of OIDCHARS+1 characters is sufficient to hold the OID
+ * portion of the filename. This is critical to protect against a possible
+ * buffer overrun.
+ */
+static bool
+parse_filename_for_nontemp_relation(const char *name, int *oidchars,
+ ForkNumber *fork)
+{
+ int pos;
+
+ /* Look for a non-empty string of digits (that isn't too long). */
+ for (pos = 0; isdigit((unsigned char) name[pos]); ++pos)
+ ;
+ if (pos == 0 || pos > OIDCHARS)
+ return false;
+ *oidchars = pos;
+
+ /* Check for a fork name. */
+ if (name[pos] != '_')
+ *fork = MAIN_FORKNUM;
+ else
+ {
+ int forkchar;
+
+ forkchar = forkname_chars(&name[pos+1], fork);
+ if (forkchar <= 0)
+ return false;
+ pos += forkchar + 1;
+ }
+
+ /* Check for a segment number. */
+ if (name[pos] == '.')
+ {
+ int segchar;
+ for (segchar = 1; isdigit((unsigned char) name[pos+segchar]); ++segchar)
+ ;
+ if (segchar <= 1)
+ return false;
+ pos += segchar;
+ }
+
+ /* Now we should be at the end. */
+ if (name[pos] != '\0')
+ return false;
+ return true;
+}
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index e352cda..f33c29e 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -615,6 +615,7 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
/* Determine owning backend. */
switch (relform->relpersistence)
{
+ case RELPERSISTENCE_UNLOGGED:
case RELPERSISTENCE_PERMANENT:
backend = InvalidBackendId;
break;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 1509686..fa9e9ca 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -851,6 +851,7 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
switch (relation->rd_rel->relpersistence)
{
+ case RELPERSISTENCE_UNLOGGED:
case RELPERSISTENCE_PERMANENT:
relation->rd_backend = InvalidBackendId;
break;
@@ -2490,6 +2491,7 @@ RelationBuildLocalRelation(const char *relname,
rel->rd_rel->relpersistence = relpersistence;
switch (relpersistence)
{
+ case RELPERSISTENCE_UNLOGGED:
case RELPERSISTENCE_PERMANENT:
rel->rd_backend = InvalidBackendId;
break;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 66274b4..065d3a4 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3447,6 +3447,7 @@ getTables(int *numTables)
int i_relhasrules;
int i_relhasoids;
int i_relfrozenxid;
+ int i_relpersistence;
int i_owning_tab;
int i_owning_col;
int i_reltablespace;
@@ -3477,7 +3478,7 @@ getTables(int *numTables)
* we cannot correctly identify inherited columns, owned sequences, etc.
*/
- if (g_fout->remoteVersion >= 90000)
+ if (g_fout->remoteVersion >= 90100)
{
/*
* Left join to pick up dependency info linking sequences to their
@@ -3489,7 +3490,40 @@ getTables(int *numTables)
"(%s c.relowner) AS rolname, "
"c.relchecks, c.relhastriggers, "
"c.relhasindex, c.relhasrules, c.relhasoids, "
- "c.relfrozenxid, "
+ "c.relfrozenxid, c.relpersistence, "
+ "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
+ "d.refobjid AS owning_tab, "
+ "d.refobjsubid AS owning_col, "
+ "(SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, "
+ "array_to_string(c.reloptions, ', ') AS reloptions, "
+ "array_to_string(array(SELECT 'toast.' || x FROM unnest(tc.reloptions) x), ', ') AS toast_reloptions "
+ "FROM pg_class c "
+ "LEFT JOIN pg_depend d ON "
+ "(c.relkind = '%c' AND "
+ "d.classid = c.tableoid AND d.objid = c.oid AND "
+ "d.objsubid = 0 AND "
+ "d.refclassid = c.tableoid AND d.deptype = 'a') "
+ "LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+ "WHERE c.relkind in ('%c', '%c', '%c', '%c') "
+ "ORDER BY c.oid",
+ username_subquery,
+ RELKIND_SEQUENCE,
+ RELKIND_RELATION, RELKIND_SEQUENCE,
+ RELKIND_VIEW, RELKIND_COMPOSITE_TYPE);
+ }
+ else if (g_fout->remoteVersion >= 90000)
+ {
+ /*
+ * Left join to pick up dependency info linking sequences to their
+ * owning column, if any (note this dependency is AUTO as of 8.2)
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.tableoid, c.oid, c.relname, "
+ "c.relacl, c.relkind, c.relnamespace, "
+ "(%s c.relowner) AS rolname, "
+ "c.relchecks, c.relhastriggers, "
+ "c.relhasindex, c.relhasrules, c.relhasoids, "
+ "c.relfrozenxid, 'p' AS relpersistence, "
"CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3522,7 +3556,7 @@ getTables(int *numTables)
"(%s c.relowner) AS rolname, "
"c.relchecks, c.relhastriggers, "
"c.relhasindex, c.relhasrules, c.relhasoids, "
- "c.relfrozenxid, "
+ "c.relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3555,7 +3589,7 @@ getTables(int *numTables)
"(%s relowner) AS rolname, "
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, relhasoids, "
- "relfrozenxid, "
+ "relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3587,7 +3621,7 @@ getTables(int *numTables)
"(%s relowner) AS rolname, "
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, relhasoids, "
- "0 AS relfrozenxid, "
+ "0 AS relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3619,7 +3653,7 @@ getTables(int *numTables)
"(%s relowner) AS rolname, "
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, relhasoids, "
- "0 AS relfrozenxid, "
+ "0 AS relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3647,7 +3681,7 @@ getTables(int *numTables)
"(%s relowner) AS rolname, "
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, relhasoids, "
- "0 AS relfrozenxid, "
+ "0 AS relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"NULL::oid AS owning_tab, "
"NULL::int4 AS owning_col, "
@@ -3670,7 +3704,7 @@ getTables(int *numTables)
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, "
"'t'::bool AS relhasoids, "
- "0 AS relfrozenxid, "
+ "0 AS relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"NULL::oid AS owning_tab, "
"NULL::int4 AS owning_col, "
@@ -3703,7 +3737,7 @@ getTables(int *numTables)
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, "
"'t'::bool AS relhasoids, "
- "0 as relfrozenxid, "
+ "0 as relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"NULL::oid AS owning_tab, "
"NULL::int4 AS owning_col, "
@@ -3749,6 +3783,7 @@ getTables(int *numTables)
i_relhasrules = PQfnumber(res, "relhasrules");
i_relhasoids = PQfnumber(res, "relhasoids");
i_relfrozenxid = PQfnumber(res, "relfrozenxid");
+ i_relpersistence = PQfnumber(res, "relpersistence");
i_owning_tab = PQfnumber(res, "owning_tab");
i_owning_col = PQfnumber(res, "owning_col");
i_reltablespace = PQfnumber(res, "reltablespace");
@@ -3783,6 +3818,7 @@ getTables(int *numTables)
tblinfo[i].rolname = strdup(PQgetvalue(res, i, i_rolname));
tblinfo[i].relacl = strdup(PQgetvalue(res, i, i_relacl));
tblinfo[i].relkind = *(PQgetvalue(res, i, i_relkind));
+ tblinfo[i].relpersistence = *(PQgetvalue(res, i, i_relpersistence));
tblinfo[i].hasindex = (strcmp(PQgetvalue(res, i, i_relhasindex), "t") == 0);
tblinfo[i].hasrules = (strcmp(PQgetvalue(res, i, i_relhasrules), "t") == 0);
tblinfo[i].hastriggers = (strcmp(PQgetvalue(res, i, i_relhastriggers), "t") == 0);
@@ -11051,8 +11087,12 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
if (binary_upgrade)
binary_upgrade_set_relfilenodes(q, tbinfo->dobj.catId.oid, false);
- appendPQExpBuffer(q, "CREATE TABLE %s",
- fmtId(tbinfo->dobj.name));
+ if (tbinfo->relpersistence == RELPERSISTENCE_UNLOGGED)
+ appendPQExpBuffer(q, "CREATE UNLOGGED TABLE %s",
+ fmtId(tbinfo->dobj.name));
+ else
+ appendPQExpBuffer(q, "CREATE TABLE %s",
+ fmtId(tbinfo->dobj.name));
if (tbinfo->reloftype)
appendPQExpBuffer(q, " OF %s", tbinfo->reloftype);
actual_atts = 0;
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 7885535..4313fd8 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -220,6 +220,7 @@ typedef struct _tableInfo
char *rolname; /* name of owner, or empty string */
char *relacl;
char relkind;
+ char relpersistence; /* relation persistence */
char *reltablespace; /* relation tablespace */
char *reloptions; /* options specified by WITH (...) */
char *toast_reloptions; /* ditto, for the TOAST table */
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index c4370a1..207d028 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1118,6 +1118,7 @@ describeOneTableDetails(const char *schemaname,
Oid tablespace;
char *reloptions;
char *reloftype;
+ char relpersistence;
} tableinfo;
bool show_modifiers = false;
bool retval;
@@ -1138,6 +1139,23 @@ describeOneTableDetails(const char *schemaname,
"SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
"c.relhastriggers, c.relhasoids, "
"%s, c.reltablespace, "
+ "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
+ "c.relpersistence\n"
+ "FROM pg_catalog.pg_class c\n "
+ "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
+ "WHERE c.oid = '%s'\n",
+ (verbose ?
+ "pg_catalog.array_to_string(c.reloptions || "
+ "array(select 'toast.' || x from pg_catalog.unnest(tc.reloptions) x), ', ')\n"
+ : "''"),
+ oid);
+ }
+ else if (pset.sversion >= 90000)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
+ "c.relhastriggers, c.relhasoids, "
+ "%s, c.reltablespace, "
"CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END\n"
"FROM pg_catalog.pg_class c\n "
"LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
@@ -1218,6 +1236,8 @@ describeOneTableDetails(const char *schemaname,
atooid(PQgetvalue(res, 0, 7)) : 0;
tableinfo.reloftype = (pset.sversion >= 90000 && strcmp(PQgetvalue(res, 0, 8), "") != 0) ?
strdup(PQgetvalue(res, 0, 8)) : 0;
+ tableinfo.relpersistence = (pset.sversion >= 90100 && strcmp(PQgetvalue(res, 0, 9), "") != 0) ?
+ PQgetvalue(res, 0, 9)[0] : 0;
PQclear(res);
res = NULL;
@@ -1269,8 +1289,12 @@ describeOneTableDetails(const char *schemaname,
switch (tableinfo.relkind)
{
case 'r':
- printfPQExpBuffer(&title, _("Table \"%s.%s\""),
- schemaname, relationname);
+ if (tableinfo.relpersistence == 'u')
+ printfPQExpBuffer(&title, _("Unlogged Table \"%s.%s\""),
+ schemaname, relationname);
+ else
+ printfPQExpBuffer(&title, _("Table \"%s.%s\""),
+ schemaname, relationname);
break;
case 'v':
printfPQExpBuffer(&title, _("View \"%s.%s\""),
@@ -1281,8 +1305,12 @@ describeOneTableDetails(const char *schemaname,
schemaname, relationname);
break;
case 'i':
- printfPQExpBuffer(&title, _("Index \"%s.%s\""),
- schemaname, relationname);
+ if (tableinfo.relpersistence == 'u')
+ printfPQExpBuffer(&title, _("Unlogged Index \"%s.%s\""),
+ schemaname, relationname);
+ else
+ printfPQExpBuffer(&title, _("Index \"%s.%s\""),
+ schemaname, relationname);
break;
case 's':
/* not used as of 8.2, but keep it for backwards compatibility */
diff --git a/src/include/access/gin.h b/src/include/access/gin.h
index e2d7b45..b1eef92 100644
--- a/src/include/access/gin.h
+++ b/src/include/access/gin.h
@@ -389,6 +389,7 @@ extern void ginUpdateStats(Relation index, const GinStatsData *stats);
/* gininsert.c */
extern Datum ginbuild(PG_FUNCTION_ARGS);
+extern Datum ginbuildempty(PG_FUNCTION_ARGS);
extern Datum gininsert(PG_FUNCTION_ARGS);
extern void ginEntryInsert(Relation index, GinState *ginstate,
OffsetNumber attnum, Datum value,
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 058435c..0501a76 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -277,6 +277,7 @@ typedef struct
/* gist.c */
extern Datum gistbuild(PG_FUNCTION_ARGS);
+extern Datum gistbuildempty(PG_FUNCTION_ARGS);
extern Datum gistinsert(PG_FUNCTION_ARGS);
extern MemoryContext createTempGistContext(void);
extern void initGISTstate(GISTSTATE *giststate, Relation index);
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index d5899f4..a48320b 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -242,6 +242,7 @@ typedef HashMetaPageData *HashMetaPage;
/* public routines */
extern Datum hashbuild(PG_FUNCTION_ARGS);
+extern Datum hashbuildempty(PG_FUNCTION_ARGS);
extern Datum hashinsert(PG_FUNCTION_ARGS);
extern Datum hashbeginscan(PG_FUNCTION_ARGS);
extern Datum hashgettuple(PG_FUNCTION_ARGS);
@@ -291,7 +292,7 @@ extern Buffer _hash_addovflpage(Relation rel, Buffer metabuf, Buffer buf);
extern BlockNumber _hash_freeovflpage(Relation rel, Buffer ovflbuf,
BufferAccessStrategy bstrategy);
extern void _hash_initbitmap(Relation rel, HashMetaPage metap,
- BlockNumber blkno);
+ BlockNumber blkno, ForkNumber forkNum);
extern void _hash_squeezebucket(Relation rel,
Bucket bucket, BlockNumber bucket_blkno,
BufferAccessStrategy bstrategy);
@@ -303,7 +304,8 @@ extern void _hash_droplock(Relation rel, BlockNumber whichlock, int access);
extern Buffer _hash_getbuf(Relation rel, BlockNumber blkno,
int access, int flags);
extern Buffer _hash_getinitbuf(Relation rel, BlockNumber blkno);
-extern Buffer _hash_getnewbuf(Relation rel, BlockNumber blkno);
+extern Buffer _hash_getnewbuf(Relation rel, BlockNumber blkno,
+ ForkNumber forkNum);
extern Buffer _hash_getbuf_with_strategy(Relation rel, BlockNumber blkno,
int access, int flags,
BufferAccessStrategy bstrategy);
@@ -312,7 +314,8 @@ extern void _hash_dropbuf(Relation rel, Buffer buf);
extern void _hash_wrtbuf(Relation rel, Buffer buf);
extern void _hash_chgbufaccess(Relation rel, Buffer buf, int from_access,
int to_access);
-extern uint32 _hash_metapinit(Relation rel, double num_tuples);
+extern uint32 _hash_metapinit(Relation rel, double num_tuples,
+ ForkNumber forkNum);
extern void _hash_pageinit(Page page, Size size);
extern void _hash_expandtable(Relation rel, Buffer metabuf);
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 3bbc4d1..283612e 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -555,6 +555,7 @@ typedef BTScanOpaqueData *BTScanOpaque;
* prototypes for functions in nbtree.c (external entry points for btree)
*/
extern Datum btbuild(PG_FUNCTION_ARGS);
+extern Datum btbuildempty(PG_FUNCTION_ARGS);
extern Datum btinsert(PG_FUNCTION_ARGS);
extern Datum btbeginscan(PG_FUNCTION_ARGS);
extern Datum btgettuple(PG_FUNCTION_ARGS);
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 56dcdd5..40cb9ff 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -25,7 +25,7 @@
extern const char *forkNames[];
extern ForkNumber forkname_to_number(char *forkName);
-extern int forkname_chars(const char *str);
+extern int forkname_chars(const char *str, ForkNumber *);
extern char *relpathbackend(RelFileNode rnode, BackendId backend,
ForkNumber forknum);
diff --git a/src/include/catalog/pg_am.h b/src/include/catalog/pg_am.h
index 9425329..1aa43a9 100644
--- a/src/include/catalog/pg_am.h
+++ b/src/include/catalog/pg_am.h
@@ -60,6 +60,7 @@ CATALOG(pg_am,2601)
regproc ammarkpos; /* "mark current scan position" function */
regproc amrestrpos; /* "restore marked scan position" function */
regproc ambuild; /* "build new index" function */
+ regproc ambuildempty; /* "build empty index" function */
regproc ambulkdelete; /* bulk-delete function */
regproc amvacuumcleanup; /* post-VACUUM cleanup function */
regproc amcostestimate; /* estimate cost of an indexscan */
@@ -101,26 +102,27 @@ typedef FormData_pg_am *Form_pg_am;
#define Anum_pg_am_ammarkpos 21
#define Anum_pg_am_amrestrpos 22
#define Anum_pg_am_ambuild 23
-#define Anum_pg_am_ambulkdelete 24
-#define Anum_pg_am_amvacuumcleanup 25
-#define Anum_pg_am_amcostestimate 26
-#define Anum_pg_am_amoptions 27
+#define Anum_pg_am_ambuildempty 24
+#define Anum_pg_am_ambulkdelete 25
+#define Anum_pg_am_amvacuumcleanup 26
+#define Anum_pg_am_amcostestimate 27
+#define Anum_pg_am_amoptions 28
/* ----------------
* initial contents of pg_am
* ----------------
*/
-DATA(insert OID = 403 ( btree 5 1 t f t t t t t t f t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions ));
+DATA(insert OID = 403 ( btree 5 1 t f t t t t t t f t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbuildempty btbulkdelete btvacuumcleanup btcostestimate btoptions ));
DESCR("b-tree index access method");
#define BTREE_AM_OID 403
-DATA(insert OID = 405 ( hash 1 1 f f t f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
+DATA(insert OID = 405 ( hash 1 1 f f t f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbuildempty hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
DESCR("hash index access method");
#define HASH_AM_OID 405
-DATA(insert OID = 783 ( gist 0 8 f t f f t t t t t t 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
+DATA(insert OID = 783 ( gist 0 8 f t f f t t t t t t 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbuildempty gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
DESCR("GiST index access method");
#define GIST_AM_OID 783
-DATA(insert OID = 2742 ( gin 0 5 f f f f t t f f t f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
+DATA(insert OID = 2742 ( gin 0 5 f f f f t t f f t f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbuildempty ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
DESCR("GIN index access method");
#define GIN_AM_OID 2742
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 1edbfe3..39f9743 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -150,6 +150,7 @@ DESCR("");
#define RELKIND_COMPOSITE_TYPE 'c' /* composite type */
#define RELPERSISTENCE_PERMANENT 'p'
+#define RELPERSISTENCE_UNLOGGED 'u'
#define RELPERSISTENCE_TEMP 't'
#endif /* PG_CLASS_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index feae22e..0cf6d92 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -689,6 +689,8 @@ DATA(insert OID = 337 ( btrestrpos PGNSP PGUID 12 1 0 0 f f f t f v 1 0 227
DESCR("btree(internal)");
DATA(insert OID = 338 ( btbuild PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ btbuild _null_ _null_ _null_ ));
DESCR("btree(internal)");
+DATA(insert OID = 328 ( btbuildempty PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ btbuildempty _null_ _null_ _null_ ));
+DESCR("btree(internal)");
DATA(insert OID = 332 ( btbulkdelete PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ btbulkdelete _null_ _null_ _null_ ));
DESCR("btree(internal)");
DATA(insert OID = 972 ( btvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ btvacuumcleanup _null_ _null_ _null_ ));
@@ -808,6 +810,8 @@ DATA(insert OID = 447 ( hashrestrpos PGNSP PGUID 12 1 0 0 f f f t f v 1 0 22
DESCR("hash(internal)");
DATA(insert OID = 448 ( hashbuild PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ hashbuild _null_ _null_ _null_ ));
DESCR("hash(internal)");
+DATA(insert OID = 327 ( hashbuildempty PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ hashbuildempty _null_ _null_ _null_ ));
+DESCR("hash(internal)");
DATA(insert OID = 442 ( hashbulkdelete PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ hashbulkdelete _null_ _null_ _null_ ));
DESCR("hash(internal)");
DATA(insert OID = 425 ( hashvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ hashvacuumcleanup _null_ _null_ _null_ ));
@@ -1104,6 +1108,8 @@ DATA(insert OID = 781 ( gistrestrpos PGNSP PGUID 12 1 0 0 f f f t f v 1 0 22
DESCR("gist(internal)");
DATA(insert OID = 782 ( gistbuild PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ gistbuild _null_ _null_ _null_ ));
DESCR("gist(internal)");
+DATA(insert OID = 326 ( gistbuildempty PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ gistbuildempty _null_ _null_ _null_ ));
+DESCR("gist(internal)");
DATA(insert OID = 776 ( gistbulkdelete PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ gistbulkdelete _null_ _null_ _null_ ));
DESCR("gist(internal)");
DATA(insert OID = 2561 ( gistvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ gistvacuumcleanup _null_ _null_ _null_ ));
@@ -4347,6 +4353,8 @@ DATA(insert OID = 2737 ( ginrestrpos PGNSP PGUID 12 1 0 0 f f f t f v 1 0 22
DESCR("gin(internal)");
DATA(insert OID = 2738 ( ginbuild PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ ginbuild _null_ _null_ _null_ ));
DESCR("gin(internal)");
+DATA(insert OID = 325 ( ginbuildempty PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ ginbuildempty _null_ _null_ _null_ ));
+DESCR("gin(internal)");
DATA(insert OID = 2739 ( ginbulkdelete PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ ginbulkdelete _null_ _null_ _null_ ));
DESCR("gin(internal)");
DATA(insert OID = 2740 ( ginvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ ginvacuumcleanup _null_ _null_ _null_ ));
diff --git a/src/include/catalog/storage.h b/src/include/catalog/storage.h
index f086b1c..e2a1fec 100644
--- a/src/include/catalog/storage.h
+++ b/src/include/catalog/storage.h
@@ -35,6 +35,8 @@ extern void AtSubCommit_smgr(void);
extern void AtSubAbort_smgr(void);
extern void PostPrepare_smgr(void);
+extern void log_smgrcreate(RelFileNode *rnode, ForkNumber forkNum);
+
extern void smgr_redo(XLogRecPtr lsn, XLogRecord *record);
extern void smgr_desc(StringInfo buf, uint8 xl_info, char *rec);
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 2c44cf7..3b038a0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -388,6 +388,7 @@ PG_KEYWORD("union", UNION, RESERVED_KEYWORD)
PG_KEYWORD("unique", UNIQUE, RESERVED_KEYWORD)
PG_KEYWORD("unknown", UNKNOWN, UNRESERVED_KEYWORD)
PG_KEYWORD("unlisten", UNLISTEN, UNRESERVED_KEYWORD)
+PG_KEYWORD("unlogged", UNLOGGED, UNRESERVED_KEYWORD)
PG_KEYWORD("until", UNTIL, UNRESERVED_KEYWORD)
PG_KEYWORD("update", UPDATE, UNRESERVED_KEYWORD)
PG_KEYWORD("user", USER, RESERVED_KEYWORD)
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 5f41adf..7cf1a64 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -203,7 +203,7 @@
* Enable debugging print statements for WAL-related operations; see
* also the wal_debug GUC var.
*/
-/* #define WAL_DEBUG */
+#define WAL_DEBUG
/*
* Enable tracing of resource consumption during sort operations;
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 8c15521..58808f0 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -177,13 +177,17 @@ extern void AtEOXact_Buffers(bool isCommit);
extern void PrintBufferLeakWarning(Buffer buffer);
extern void CheckPointBuffers(int flags);
extern BlockNumber BufferGetBlockNumber(Buffer buffer);
-extern BlockNumber RelationGetNumberOfBlocks(Relation relation);
+extern BlockNumber RelationGetNumberOfBlocksInFork(Relation relation,
+ ForkNumber forkNum);
extern void FlushRelationBuffers(Relation rel);
extern void FlushDatabaseBuffers(Oid dbid);
extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
ForkNumber forkNum, BlockNumber firstDelBlock);
extern void DropDatabaseBuffers(Oid dbid);
+#define RelationGetNumberOfBlocks(reln) \
+ RelationGetNumberOfBlocksInFork(reln, MAIN_FORKNUM)
+
#ifdef NOT_USED
extern void PrintPinnedBufs(void);
#endif
diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index b24a98c..7c57724 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -14,5 +14,6 @@
#define COPYDIR_H
extern void copydir(char *fromdir, char *todir, bool recurse);
+extern void copy_file(char *fromfile, char *tofile);
#endif /* COPYDIR_H */
diff --git a/src/include/storage/reinit.h b/src/include/storage/reinit.h
new file mode 100644
index 0000000..9999dff
--- /dev/null
+++ b/src/include/storage/reinit.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * reinit.h
+ * Reinitialization of unlogged relations
+ *
+ *
+ * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/reinit.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef REINIT_H
+#define REINIT_H
+
+extern void ResetUnloggedRelations(int op);
+
+#define UNLOGGED_RELATION_CLEANUP 0x0001
+#define UNLOGGED_RELATION_INIT 0x0002
+
+#endif /* REINIT_H */
diff --git a/src/include/storage/relfilenode.h b/src/include/storage/relfilenode.h
index 24a72e6..f71b233 100644
--- a/src/include/storage/relfilenode.h
+++ b/src/include/storage/relfilenode.h
@@ -27,7 +27,8 @@ typedef enum ForkNumber
InvalidForkNumber = -1,
MAIN_FORKNUM = 0,
FSM_FORKNUM,
- VISIBILITYMAP_FORKNUM
+ VISIBILITYMAP_FORKNUM,
+ INIT_FORKNUM
/*
* NOTE: if you add a new fork, change MAX_FORKNUM below and update the
@@ -35,7 +36,7 @@ typedef enum ForkNumber
*/
} ForkNumber;
-#define MAX_FORKNUM VISIBILITYMAP_FORKNUM
+#define MAX_FORKNUM INIT_FORKNUM
/*
* RelFileNode must provide all that we need to know to physically access
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 88a3168..d5b5e58 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -114,6 +114,7 @@ typedef struct RelationAmInfo
FmgrInfo ammarkpos;
FmgrInfo amrestrpos;
FmgrInfo ambuild;
+ FmgrInfo ambuildempty;
FmgrInfo ambulkdelete;
FmgrInfo amvacuumcleanup;
FmgrInfo amcostestimate;
Attachment: relax-sync-commit-v1.patch (application/octet-stream)
commit bdd697e5f0a16db2a672e5e14d11744958364101
Author: Robert Haas <rhaas@postgresql.org>
Date: Sat Nov 13 09:52:11 2010 -0500
Assume synchronous_commit=off for transactions that don't write WAL.
This is advantageous for transactions that write only to temporary or
unlogged tables, where loss of the transaction commit record is not
critical.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index d2e2e11..088daa0 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -907,6 +907,7 @@ RecordTransactionCommit(void)
int nmsgs = 0;
SharedInvalidationMessage *invalMessages = NULL;
bool RelcacheInitFileInval = false;
+ bool wrote_xlog;
/* Get data needed for commit record */
nrels = smgrGetPendingDeletes(true, &rels);
@@ -914,6 +915,7 @@ RecordTransactionCommit(void)
if (XLogStandbyInfoActive())
nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
&RelcacheInitFileInval);
+ wrote_xlog = (XactLastRecEnd.xrecoff != 0);
/*
* If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -940,7 +942,7 @@ RecordTransactionCommit(void)
* assigned is a sequence advance record due to nextval() --- we want
* to flush that to disk before reporting commit.)
*/
- if (XactLastRecEnd.xrecoff == 0)
+ if (!wrote_xlog)
goto cleanup;
}
else
@@ -1028,16 +1030,21 @@ RecordTransactionCommit(void)
}
/*
- * Check if we want to commit asynchronously. If the user has set
- * synchronous_commit = off, and we're not doing cleanup of any non-temp
- * rels nor committing any command that wanted to force sync commit, then
- * we can defer flushing XLOG. (We must not allow asynchronous commit if
- * there are any non-temp tables to be deleted, because we might delete
- * the files before the COMMIT record is flushed to disk. We do allow
- * asynchronous commit if all to-be-deleted tables are temporary though,
- * since they are lost anyway if we crash.)
+ * Check if we want to commit asynchronously. If we're doing cleanup of
+ * any non-temp rels or committing any command that wanted to force sync
+ * commit, then we must flush XLOG immediately. (We must not allow
+ * asynchronous commit if there are any non-temp tables to be deleted,
+ * because we might delete the files before the COMMIT record is flushed to
+ * disk. We do allow asynchronous commit if all to-be-deleted tables are
+ * temporary though, since they are lost anyway if we crash.) Otherwise,
+ * we can defer the flush if either (1) the user has set synchronous_commit
+ * = off, or (2) the current transaction has not performed any WAL-logged
+ * operation. This latter case can arise if the only writes performed by
+ * the current transaction target temporary or unlogged relations. Loss
+ * of such a transaction won't matter anyway, because temp tables will be
+ * lost after a crash anyway, and unlogged ones will be truncated.
*/
- if (XactSyncCommit || forceSyncCommit || nrels > 0)
+ if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0)
{
/*
* Synchronous commit case:
On Wed, Dec 8, 2010 at 6:52 AM, Marti Raudsepp <marti@juffo.org> wrote:
A very useful feature for unlogged tables would be the ability to
switch them back to normal tables -- this way you could do bulk
loading into an unlogged table and then turn it into a regular table
using just fsync(), bypassing all the WAL-logging overhead.
If archive_mode is off, then you can often find a way to bypass
WAL-logging during bulk loading anyway.
If archive_mode is on, then I don't see how this can work without
massive changes.
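[For concreteness, one such archive_mode = off bypass (assuming wal_level = minimal; the data file path is made up) is to create the table in the same transaction that loads it, so COPY can skip WAL and the heap is simply fsync'd at commit:]

BEGIN;
CREATE TABLE bulk_target (id int, payload text);
COPY bulk_target FROM '/tmp/bulk_data.csv' WITH (FORMAT csv); -- no WAL written for the rows
COMMIT; -- the new heap is fsync'd here instead of being WAL-logged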
One possibility would be to create a mechanism to inject entire large
files into the archive log stream. (Such a facility might be useful
for other purposes too). So the transaction that changes the mode
from unlogged to logged would have to take an exclusive lock on the
unlogged table and make sure shared buffers for it are written out,
then it would just copy the backing file(s) for that table into the
archive stream with a special header that tells the recovery process
"Set these aside, I'll explain later". Once that is done, it would
just have to ensure the WAL segment it is currently on will come after
the injected files in the archive stream, and write a WAL record
explaining where those bulk files it sent early are supposed to go.
I don't know, it sounds like a lot of work and a lot of pitfalls.
Cheers,
Jeff
On Fri, Dec 10, 2010 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
As I was working on the hash index support, it occurred to me that at
some point in the future, we might want to allow an unlogged index on
a permanent table.
That is the feature I would be most excited about.
With the current patch, an index is unlogged if
and only if the corresponding table is unlogged, and both the table
and the index are reset to empty on restart. But we could have a
slightly different flavor of index that, instead of being reset to
empty, just gets marked invalid, perhaps by truncating the file to
zero-length (and adding some code to treat that as something other
than a hard error). Perhaps you could even arrange for autovacuum to
kick off an automatic rebuild,
Or just have rebuilding the index as part of crash recovery. I
wouldn't use the feature anyway on indexes that would take more than a
few seconds to rebuild, and wouldn't want to advertise the database as
being available when it is essentially crippled from missing indexes.
I'd rather bite the bullet up front.
I would think of it as declaring that, instead of making the index
recoverable via WAL logging and replay, we make it recoverable by
rebuilding. So in that way it is quite unlike unlogged tables, in that
we are not risking any data, just giving the database a hint about
what the most expeditious way to maintain the index might be. Well,
more of an order than a hint, I guess.
Cheers,
Jeff
On Sat, Dec 11, 2010 at 2:53 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Dec 8, 2010 at 6:52 AM, Marti Raudsepp <marti@juffo.org> wrote:
A very useful feature for unlogged tables would be the ability to
switch them back to normal tables -- this way you could do bulk
loading into an unlogged table and then turn it into a regular table
using just fsync(), bypassing all the WAL-logging overhead.
If archive_mode is off, then you can often find a way to bypass
WAL-logging during bulk loading anyway.
If archive_mode is on, then I don't see how this can work without
massive changes.
Well, you'd need to work your way through the heap and all of its
indices and XLOG every page. And you've got to do that in a way
that's transaction-safe, and I don't have a design in mind for that
off the top of my head. But I think "massive changes" is probably an
overstatement. We can already handle ALTER TABLE operations that
involve a full relation rewrite, and that already does the
full-table-XLOG thing.
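[For example, an existing rewrite of this kind (table and column names are made up) already pushes the whole rewritten heap through WAL when archiving or streaming replication is enabled:]

ALTER TABLE measurements ALTER COLUMN reading TYPE numeric(12,4);
-- forces a full heap rewrite plus index rebuilds; the rewrite is WAL-logged
-- whenever WAL archiving or streaming replication is enabled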
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Dec 11, 2010 at 3:18 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Fri, Dec 10, 2010 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
As I was working on the hash index support, it occurred to me that at
some point in the future, we might want to allow an unlogged index on
a permanent table.
That is the feature I would be most excited about.
With the current patch, an index is unlogged if
and only if the corresponding table is unlogged, and both the table
and the index are reset to empty on restart. But we could have a
slightly different flavor of index that, instead of being reset to
empty, just gets marked invalid, perhaps by truncating the file to
zero-length (and adding some code to treat that as something other
than a hard error). Perhaps you could even arrange for autovacuum to
kick off an automatic rebuild,
Or just have rebuilding the index as part of crash recovery. I
wouldn't use the feature anyway on indexes that would take more than a
few seconds to rebuild, and wouldn't want to advertise the database as
being available when it is essentially crippled from missing indexes.
I'd rather bite the bullet up front.
I don't think you can rebuild the indexes during crash recovery; I
believe you need to be bound to the database that contains the index,
and, as we've been over before, binding to a database is irrevocable,
so the startup process can't bind to the database, rebuild the index,
and then unbind. It would need to signal the postmaster to fire up
other backends to do this work, and at that point I think you may as
well piggyback on autovacuum rather than designing a similar mechanism
from scratch.
Also, while YOU might use such a feature only for indexes that can be
rebuilt in a few seconds, I strongly suspect that other people might
use it in other ways. In particular, it seems that it might be
sensible to use a feature like this for an index that's only
used for reporting queries. If the database crashes, we'll still have
our primary key so we can continue operating, but we'll need to
reindex before running the nightly reports.
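[In that scenario the post-crash step is just an explicit rebuild before the reports run; the index name here is made up:]

REINDEX INDEX nightly_report_idx;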
I would think of it as declaring that, instead of making the index
recoverable via WAL logging and replay, we make it recoverable by
rebuilding. So in that way it is quite unlike unlogged tables, in that
we are not risking any data, just giving the database a hint about
what the most expeditious way to maintain the index might be. Well,
more of an order than a hint, I guess.
I think it's six of one, half a dozen of the other. An index by its
nature only contains data that is duplicated in a table, so by
definition loss of an index isn't risking any data.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Dec 10, 2010 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I think the first patch (relpersistence-v4.patch) is ready to commit,
and the third patch to allow synchronous commits to become
asynchronous when it doesn't matter (relax-sync-commit-v1.patch)
doesn't seem to be changing much either, although I would appreciate
it if someone with more expertise than I have with our write-ahead
logging system would give it a quick once-over.
I don't understand what the point of the relax-sync-commit patch is.
If XactLastRecEnd.xrecoff == 0, then calling
XLogFlush(XactLastRecEnd) is pretty much a null operation anyway
because it will short-circuit at the early statement:
if (XLByteLE(record, LogwrtResult.Flush)) return;
Or at least it had better return at that point, or we might have a
serious problem. If XactLastRecEnd.xrecoff == 0 then the only way to
keep going is if XactLastRecEnd.xlogid is ahead of
LogwrtResult.Flush.xlogid.
I guess that could happen legitimately if the logs have recently
rolled over the 4GB boundary, and XactLastRecEnd is aware of this
while LogwrtResult is not yet aware of it. I don't know if that is a
possible state of affairs. If it is, then the result would be that on
very rare occasions your patch removes a spurious fsync that is harmless
apart from the performance cost.
If somehow XactLastRecEnd gets a falsely advanced value of xlogid,
then calling XLogFlush with it would cause a PANIC "xlog write request
%X/%X is past end of log %X/%X". So unless people have been seeing
this, that must not be able to happen. And looking at the only places
XactLastRecEnd.xlogid gets set, I don't see how it could happen.
So maybe in your patch:
if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0)
should be
if (wrote_xlog && (XactSyncCommit || forceSyncCommit || nrels > 0) )
It seems like on general principles we should not be passing to
XLogFlush a structure which is by definition invalid.
But even if XLogFlush is going to return immediately, that doesn't
negate the harm caused by commit_delay doing its thing needlessly.
Perhaps that was the original motivation for your patch.
Cheers,
Jeff
On Sat, Dec 11, 2010 at 9:21 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Fri, Dec 10, 2010 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I think the first patch (relpersistence-v4.patch) is ready to commit,
and the third patch to allow synchronous commits to become
asynchronous when it doesn't matter (relax-sync-commit-v1.patch)
doesn't seem to be changing much either, although I would appreciate
it if someone with more expertise than I have with our write-ahead
logging system would give it a quick once-over.
I don't understand what the point of the relax-sync-commit patch is.
Suppose we begin a transaction, write a bunch of data to a temporary
table, and commit. Suppose further that synchronous_commit = on. At
transaction commit time, we haven't written any XLOG records yet, but
we do have an XID assigned because of the writes to the temporary
tables. So we'll issue a commit record. Without this patch, since
synchronous_commit = on, we'll force that commit record to disk
before acknowledging the commit to the user. However, that's not
really necessary because if we crash after acknowledging the commit to
the user, the temporary tables will disappear anyway, and our XID
doesn't exist on disk any place else - thus, whether the commit makes
it to disk before the crash or not will be immaterial on restart.
If you have a bunch of transactions that write to temporary (or
unlogged) tables but not to any permanent tables, this makes things
much faster.
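For reference, a minimal SQL sketch of the case being described (object names and row counts are purely illustrative): the temporary table already exists, so the transaction writes no WAL for its data, yet it still needs an XID and a commit record.

    SET synchronous_commit = on;

    -- Creating the temp table touches the system catalogs (and so writes WAL),
    -- which is why it is done outside the transaction of interest.
    CREATE TEMPORARY TABLE scratch (id int, payload text);

    -- This transaction writes only temp-table data: no WAL for the rows,
    -- but an XID is assigned and a commit record is written at COMMIT.
    BEGIN;
    INSERT INTO scratch
        SELECT g, md5(g::text) FROM generate_series(1, 100000) AS g;
    COMMIT;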
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Cédric Villemain <cedric.villemain.debian@gmail.com> writes:
2010/12/8 Kineticode Billing <david@kineticode.com>:
On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
I kind of like TRANSIENT, but that's only because it's a property I've
been working with in some other systems
http://www.erlang.org/doc/design_principles/sup_princ.html
Restart = permanent | transient | temporary
Restart defines when a terminated child process should be restarted.
A permanent child process is always restarted.
A temporary child process is never restarted.
A transient child process is restarted only if it terminates abnormally, i.e. with another exit reason than normal.
EVANESCENT.
UNSAFE ?
What about NOT PERSISTENT? Then we would have two flavours of it:
NOT PERSISTENT ON RESTART TRUNCATE or NOT PERSISTENT ON RESTART FLUSH, I guess?
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
2010/12/8 Kineticode Billing <david@kineticode.com>:
On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
EVANESCENT.
UNSAFE ?
<troll>
MyISAM
</troll>
--
Rob Wultsch
wultsch@gmail.com
On Sun, Dec 12, 2010 at 9:31 PM, Rob Wultsch <wultsch@gmail.com> wrote:
On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
2010/12/8 Kineticode Billing <david@kineticode.com>:
On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
EVANESCENT.
UNSAFE ?
<troll>
MyISAM
</troll>
Heh. But that would be corrupt-on-crash, not truncate-on-crash, no?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Dec 12, 2010 at 7:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Dec 12, 2010 at 9:31 PM, Rob Wultsch <wultsch@gmail.com> wrote:
On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
2010/12/8 Kineticode Billing <david@kineticode.com>:
On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:
Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.
EVANESCENT.
UNSAFE ?
<troll>
MyISAM
</troll>
Heh. But that would be corrupt-on-crash, not truncate-on-crash, no?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
<troll>
Yep. The MySQL options that truncate on shutdown are the MEMORY engine
and PBXT (using the memory-resident option).
</troll>
I like TRANSIENT but wonder if MEMORY might be more easily understood by users.
--
Rob Wultsch
wultsch@gmail.com
On Fri, Dec 10, 2010 at 11:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I think the first patch (relpersistence-v4.patch) is ready to commit,
So I've now committed it.
and the third patch to allow synchronous commits to become
asynchronous when it doesn't matter (relax-sync-commit-v1.patch)
Jeff Janes reviewed this, which was good, but he missed a key bit on
which I've now set him straight. So an updated review of this would
be much appreciated.
doesn't seem to be changing much either, although I would appreciate
it if someone with more expertise than I have with our write-ahead
logging system would give it a quick once-over.
The main patch (unlogged-tables-v4.patch) needs more thought. Right
now, unlogged buffers are checkpointed, which I want to get rid of.
Andres Freund suggested we could get by with this and still survive a
clean shutdown if we fsync() every unlogged relation in the cluster
before shutting down, but I'm concerned about the case where one of
the fsync() calls fails. That's presumably already a problem with
checkpoints generally, and I haven't traced through the logic to see
exactly what happens, but I guess this would need similar treatment.
In a non-shutdown checkpoint, the checkpoint can just fail. In a
shutdown checkpoint, we presumably can't just refuse to exit, but it
shouldn't look like a clean shutdown...
Any input on this point?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Here's an attempt to summarize the remaining issues with this patch
that I know about. I may have forgotten something, so please mention
it if you notice something missing.
1. pg_dump needs an option to control whether unlogged tables are
dumped. --no-unlogged-tables seems like the obvious choice, assuming
we want the default to be to dump them, which seems like the safest
option.
2. storage.sgml likely needs to be updated. We have a section on the
free space map and one on the visibility map, so I suppose the logical
thing to do is add a similar section on the initialization fork (a brief
illustration of the new fork follows after this list).
3. It's unnecessary to include unlogged relation buffers in
non-shutdown checkpoints. I've recently realized that this is true
independently of whether or not we want unlogged tables to survive a
clean shutdown. Whether or not we can survive a clean shutdown is a
function of whether we register dirty segments when buffers are
written, which is independent of whether we choose to write such
buffers as part of a checkpoint. And indeed, unless we're about to
shut down, there's no reason to do so, because the whole point of
checkpointing is to advance the redo pointer, and that's irrelevant
for unlogged tables.
4. It's arguably unnecessary to register dirty segments for unlogged
relations. Given #3, this now seems a little less important. If the
unlogged relation is hot and fits in shared_buffers, then omitting it
from the checkpoint process means we'll never write out those dirty
buffers, so the fact that they'd cause fsyncs if we did write them
doesn't matter. However, it's still not totally irrelevant, because a
relation that fits in the OS buffer cache but not in shared buffers
will probably generate fsyncs at every checkpoint. (And on the third
hand, the OS may decide to write the dirty data anyway, especially if
it's a largish percentage of RAM.) There are a couple of possible
ways of dealing with this:
4A. The solution Andres proposed - iterate through all unlogged
relations at shutdown time and fsync them all. Possibly
complicated to handle fsync failures.
4B. Another idea I just thought of - register dirty segments as
normal, but teach the background writer to accumulate them in a
separate queue that is only flushed at shutdown, or when it reaches
some maximum size, rather than at every checkpoint.
4C. Decree that this is an area for future enhancement and forget
about it for now. I am leaning toward this option.
5. Make it work with GIST indexes. Per discussion on the other
thread, the current proposal seems to be: (a) add a BM_FLUSH_XLOG bit;
when clear, don't flush XLOG; this then allows pages to have fake
LSNs; (b) add an XLogRecPtr structure in shared memory, protected by a
spinlock; (c) use the structure described in (b) to generate fake LSNs
every time an operation is performed on an unlogged GIST index. I am
not clear on how we make this work across shutdowns - it seems you'd
need to save this structure somewhere during a clean shutdown (where?)
and restore it on startup, unless we go back to truncating even on a
clean shutdown.
6. Make it work with GIN indexes. I haven't looked at what's involved here yet.
Advice, comments, feedback appreciated... I'd like to put this one to bed.
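As a quick illustration of item 2 above (a sketch only; it assumes the CREATE UNLOGGED TABLE syntax from this patch series and that pg_relation_size accepts the new fork name 'init', neither of which is settled here):

    -- A freshly created unlogged table gets a main fork plus an initialization
    -- fork; under the proposed design the main fork is reset from the init
    -- fork after a crash.
    CREATE UNLOGGED TABLE hit_counter (page_id integer, hits bigint);

    SELECT pg_relation_size('hit_counter', 'main') AS main_fork_bytes,
           pg_relation_size('hit_counter', 'init') AS init_fork_bytes;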
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
If there's any third-party code out there that is checking
rd_istemp, it likely also needs to be revised to check whether
WAL-logging is needed, not whether the relation is temp. The way
I've coded it, such code will fail to compile, and can be very
easily fixed by substituting a call to RelationNeedsWAL() or
RelationUsesLocalBuffers() or RelationUsesTempNamespace(),
depending on which property the caller actually cares about.
Hmm... This broke the SSI patch, which was using rd_istemp to omit
conflict checking where it was set to true. The property I care
about is whether tuples in one backend can be read by a transaction
in a different backend, which I assumed would not be true for
temporary tables. Which of the above would be appropriate for that
use?
-Kevin
On Sat, Dec 18, 2010 at 12:27 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
Robert Haas wrote:
If there's any third-party code out there that is checking
rd_istemp, it likely also needs to be revised to check whether
WAL-logging is needed, not whether the relation is temp. The way
I've coded it, such code will fail to compile, and can be very
easily fixed by substituting a call to RelationNeedsWAL() or
RelationUsesLocalBuffers() or RelationUsesTempNamespace(),
depending on which property the caller actually cares about.
Hmm... This broke the SSI patch, which was using rd_istemp to omit
conflict checking where it was set to true. The property I care
about is whether tuples in one backend can be read by a transaction
in a different backend, which I assumed would not be true for
temporary tables. Which of the above would be appropriate for that
use?
RelationUsesLocalBuffers().
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of Sat Dec 18 02:21:41 -0300 2010:
Here's an attempt to summarize the remaining issues with this patch
that I know about. I may have forgotten something, so please mention
it if you notice something missing.
1. pg_dump needs an option to control whether unlogged tables are
dumped. --no-unlogged-tables seems like the obvious choice, assuming
we want the default to be to dump them, which seems like the safest
option.
If there are valid use cases for some unlogged tables being dumped and
some others not, would it make sense to be able to specify a pattern of
tables to be dumped or skipped?
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Dec 20, 2010 at 9:05 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Excerpts from Robert Haas's message of Sat Dec 18 02:21:41 -0300 2010:
Here's an attempt to summarize the remaining issues with this patch
that I know about. I may have forgotten something, so please mention
it if you notice something missing.
1. pg_dump needs an option to control whether unlogged tables are
dumped. --no-unlogged-tables seems like the obvious choice, assuming
we want the default to be to dump them, which seems like the safest
option.
If there are valid use cases for some unlogged tables being dumped and
some others not, would it make sense to be able to specify a pattern of
tables to be dumped or skipped?
Well, if you want to dump a subset of the tables in your database, you
can already do that. I think that adding a pattern to
--no-unlogged-tables (or whatever we end up calling it) would be an
unnecessary frammish. There's no particular reason to think that
unlogged tables are going to be so widely used, or that concerns about
which ones get dumped are going to be so widespread, that we should do
something here when we don't even have much simpler things like
--function, which IMHO would be extremely useful.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Alvaro Herrera <alvherre@commandprompt.com> writes:
Excerpts from Robert Haas's message of Sat Dec 18 02:21:41 -0300 2010:
1. pg_dump needs an option to control whether unlogged tables are
dumped. --no-unlogged-tables seems like the obvious choice, assuming
we want the default to be to dump them, which seems like the safest
option.
If there are valid use cases for some unlogged tables being dumped and
some others not, would it make sense to be able to specify a pattern of
tables to be dumped or skipped?
Presumably you could still do that with the regular --tables name
pattern switch. I don't see a reason for unlogged tables to respond to
a different name pattern.
regards, tom lane
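As an aside, a user who wants finer control could already build such a table list themselves. A sketch, assuming the relpersistence column added by the committed first patch marks unlogged relations with 'u' in pg_class; the resulting names can be fed to pg_dump's existing --table/--exclude-table pattern switches:

    -- List unlogged tables so their names can be passed to pg_dump's
    -- --table / --exclude-table switches.
    SELECT n.nspname AS schema_name, c.relname AS table_name
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'
      AND c.relpersistence = 'u'
    ORDER BY 1, 2;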