unlogged tables
Here is a series of three patches related to unlogged tables.
1. The first one (relpersistence-v1) is a mostly mechanical patch that
replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence
(a character), so that we can support more than two values. BE SURE
YOU INITDB, since the old catalog format will not work with this patch
applied.
2. The second one (unlogged-tables-v1) adds support for unlogged
tables by adding a new supported value for relpersistence. I made this
work by having the backend that creates an unlogged relation write out an
"init" fork for that relation. The main fork is nuked and replaced by
the contents of the init fork during startup. But I haven't made this
code work yet for index types other than btree, so attempting to
define a non-btree index on an unlogged relation will currently result
in an error. I don't think that will be too hard to fix, but I
haven't done it yet.
3. The third patch (relax-sync-commit-v1) allows asynchronous commit
even when synchronous_commit=on if the transaction has not written
WAL. Of course, a read-only transaction won't even have an XID and
therefore won't need a commit record, so what this is really doing is
allowing transactions that have written only to temp - or unlogged -
tables to commit asynchronously. This should be OK, because if the
system crashes before the commit record hits the disk, we haven't
really lost anything we wouldn't lose anyway: the temp tables will
disappear on restart, and the unlogged ones will be truncated. This
patch could actually be applied independently of the first two, if I
adjusted the comments a bit.
Review and testing would be appreciated.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
relpersistence-v1.patch (application/octet-stream)
commit da292b4f7395aa6bf10f76796999d0809febe206
Author: Robert Haas <rhaas@postgresql.org>
Date: Mon Aug 16 21:02:11 2010 -0400
Generalize concept of temporary relations to "relation persistence".
This commit replaces pg_class.relistemp with pg_class.relpersistence;
and also modifies the RangeVar node type to carry relpersistence rather
than istemp. It also removes rd_istemp from RelationData and
instead performs the correct computation based on relpersistence.
For clarity, we add three new macros: RelationNeedsWAL(),
RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we
can clarify the purpose of each check that previously depended on
rd_istemp.
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 070cd92..9d857a0 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -304,7 +304,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
MarkBufferDirty(stack->buffer);
- if (!btree->index->rd_istemp)
+ if (RelationNeedsWAL(btree->index))
{
XLogRecPtr recptr;
@@ -373,7 +373,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
MarkBufferDirty(lbuffer);
MarkBufferDirty(stack->buffer);
- if (!btree->index->rd_istemp)
+ if (RelationNeedsWAL(btree->index))
{
XLogRecPtr recptr;
@@ -422,7 +422,7 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
MarkBufferDirty(rbuffer);
MarkBufferDirty(stack->buffer);
- if (!btree->index->rd_istemp)
+ if (RelationNeedsWAL(btree->index))
{
XLogRecPtr recptr;
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 525f79c..74339c9 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -103,7 +103,7 @@ writeListPage(Relation index, Buffer buffer,
MarkBufferDirty(buffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecData rdata[2];
ginxlogInsertListPage data;
@@ -384,7 +384,7 @@ ginHeapTupleFastInsert(Relation index, GinState *ginstate,
*/
MarkBufferDirty(metabuffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
@@ -564,7 +564,7 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
MarkBufferDirty(buffers[i]);
}
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index fa70e4f..8681ede 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -55,7 +55,7 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems)
MarkBufferDirty(buffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
XLogRecData rdata[2];
@@ -325,7 +325,7 @@ ginbuild(PG_FUNCTION_ARGS)
GinInitBuffer(RootBuffer, GIN_LEAF);
MarkBufferDirty(RootBuffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
XLogRecData rdata;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 27326ac..5f20ac9 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -410,7 +410,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
MarkBufferDirty(metabuffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
ginxlogUpdateMeta data;
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index 7dfecff..4b35acb 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -93,7 +93,7 @@ xlogVacuumPage(Relation index, Buffer buffer)
Assert(GinPageIsLeaf(page));
- if (index->rd_istemp)
+ if (!RelationNeedsWAL(index))
return;
data.node = index->rd_node;
@@ -308,7 +308,7 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
MarkBufferDirty(lBuffer);
MarkBufferDirty(dBuffer);
- if (!gvs->index->rd_istemp)
+ if (RelationNeedsWAL(gvs->index))
{
XLogRecPtr recptr;
XLogRecData rdata[4];
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 3054f98..a7dc2a5 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -117,7 +117,7 @@ gistbuild(PG_FUNCTION_ARGS)
MarkBufferDirty(buffer);
- if (!index->rd_istemp)
+ if (RelationNeedsWAL(index))
{
XLogRecPtr recptr;
XLogRecData rdata;
@@ -403,7 +403,7 @@ gistplacetopage(GISTInsertState *state, GISTSTATE *giststate)
dist->page = BufferGetPage(dist->buffer);
}
- if (!state->r->rd_istemp)
+ if (RelationNeedsWAL(state->r))
{
XLogRecPtr recptr;
XLogRecData *rdata;
@@ -467,7 +467,7 @@ gistplacetopage(GISTInsertState *state, GISTSTATE *giststate)
MarkBufferDirty(state->stack->buffer);
- if (!state->r->rd_istemp)
+ if (RelationNeedsWAL(state->r))
{
OffsetNumber noffs = 0,
offs[1];
@@ -552,7 +552,7 @@ gistfindleaf(GISTInsertState *state, GISTSTATE *giststate)
opaque = GistPageGetOpaque(state->stack->page);
state->stack->lsn = PageGetLSN(state->stack->page);
- Assert(state->r->rd_istemp || !XLogRecPtrIsInvalid(state->stack->lsn));
+ Assert(!RelationNeedsWAL(state->r) || !XLogRecPtrIsInvalid(state->stack->lsn));
if (state->stack->blkno != GIST_ROOT_BLKNO &&
XLByteLT(state->stack->parent->lsn, opaque->nsn))
@@ -913,7 +913,7 @@ gistmakedeal(GISTInsertState *state, GISTSTATE *giststate)
}
/* say to xlog that insert is completed */
- if (state->needInsertComplete && !state->r->rd_istemp)
+ if (state->needInsertComplete && RelationNeedsWAL(state->r))
gistxlogInsertCompletion(state->r->rd_node, &(state->key), 1);
}
@@ -1013,7 +1013,7 @@ gistnewroot(Relation r, Buffer buffer, IndexTuple *itup, int len, ItemPointer ke
MarkBufferDirty(buffer);
- if (!r->rd_istemp)
+ if (RelationNeedsWAL(r))
{
XLogRecPtr recptr;
XLogRecData *rdata;
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index 0ff5ba8..26bdb20 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -248,7 +248,7 @@ gistbulkdelete(PG_FUNCTION_ARGS)
PageIndexTupleDelete(page, todelete[i]);
GistMarkTuplesDeleted(page);
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
XLogRecData *rdata;
XLogRecPtr recptr;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8b064bc..8f368a2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -124,7 +124,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool is_rescan)
*
* During a rescan, don't make a new strategy object if we don't have to.
*/
- if (!scan->rs_rd->rd_istemp &&
+ if (!RelationUsesLocalBuffers(scan->rs_rd) &&
scan->rs_nblocks > NBuffers / 4)
{
allow_strat = scan->rs_allow_strat;
@@ -905,7 +905,7 @@ relation_open(Oid relationId, LOCKMODE lockmode)
elog(ERROR, "could not open relation with OID %u", relationId);
/* Make note that we've accessed a temporary relation */
- if (r->rd_istemp)
+ if (RelationUsesLocalBuffers(r))
MyXactAccessedTempRel = true;
pgstat_initstats(r);
@@ -951,7 +951,7 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)
elog(ERROR, "could not open relation with OID %u", relationId);
/* Make note that we've accessed a temporary relation */
- if (r->rd_istemp)
+ if (RelationUsesLocalBuffers(r))
MyXactAccessedTempRel = true;
pgstat_initstats(r);
@@ -1917,7 +1917,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!(options & HEAP_INSERT_SKIP_WAL) && !relation->rd_istemp)
+ if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))
{
xl_heap_insert xlrec;
xl_heap_header xlhdr;
@@ -2227,7 +2227,7 @@ l1:
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
xl_heap_delete xlrec;
XLogRecPtr recptr;
@@ -2780,7 +2780,7 @@ l2:
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
XLogRecPtr recptr = log_heap_update(relation, buffer, oldtup.t_self,
newbuf, heaptup,
@@ -3403,7 +3403,7 @@ l3:
* (Also, in a PITR log-shipping or 2PC environment, we have to have XLOG
* entries for everything anyway.)
*/
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
xl_heap_lock xlrec;
XLogRecPtr recptr;
@@ -3505,7 +3505,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
xl_heap_inplace xlrec;
XLogRecPtr recptr;
@@ -3852,8 +3852,8 @@ log_heap_clean(Relation reln, Buffer buffer,
XLogRecPtr recptr;
XLogRecData rdata[4];
- /* Caller should not call me on a temp relation */
- Assert(!reln->rd_istemp);
+ /* Caller should not call me on a non-WAL-logged relation */
+ Assert(RelationNeedsWAL(reln));
xlrec.node = reln->rd_node;
xlrec.block = BufferGetBlockNumber(buffer);
@@ -3935,8 +3935,8 @@ log_heap_freeze(Relation reln, Buffer buffer,
XLogRecPtr recptr;
XLogRecData rdata[2];
- /* Caller should not call me on a temp relation */
- Assert(!reln->rd_istemp);
+ /* Caller should not call me on a non-WAL-logged relation */
+ Assert(RelationNeedsWAL(reln));
/* nor when there are no tuples to freeze */
Assert(offcnt > 0);
@@ -3981,8 +3981,8 @@ log_heap_update(Relation reln, Buffer oldbuf, ItemPointerData from,
XLogRecData rdata[4];
Page page = BufferGetPage(newbuf);
- /* Caller should not call me on a temp relation */
- Assert(!reln->rd_istemp);
+ /* Caller should not call me on a non-WAL-logged relation */
+ Assert(RelationNeedsWAL(reln));
if (HeapTupleIsHeapOnly(newtup))
info = XLOG_HEAP_HOT_UPDATE;
@@ -4982,7 +4982,7 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
* heap_sync - sync a heap, for use when no WAL has been written
*
* This forces the heap contents (including TOAST heap if any) down to disk.
- * If we skipped using WAL, and it's not a temp relation, we must force the
+ * If we skipped using WAL, and WAL is otherwise needed, we must force the
* relation down to disk before it's safe to commit the transaction. This
* requires writing out any dirty buffers and then doing a forced fsync.
*
@@ -4995,8 +4995,8 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
void
heap_sync(Relation rel)
{
- /* temp tables never need fsync */
- if (rel->rd_istemp)
+ /* non-WAL-logged tables never need fsync */
+ if (!RelationNeedsWAL(rel))
return;
/* main heap */
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b8c4027..40eadb8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -233,7 +233,7 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
/*
* Emit a WAL HEAP_CLEAN record showing what we did
*/
- if (!relation->rd_istemp)
+ if (RelationNeedsWAL(relation))
{
XLogRecPtr recptr;
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 19ca302..eb2dbff 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -277,8 +277,8 @@ end_heap_rewrite(RewriteState state)
}
/*
- * If the rel isn't temp, must fsync before commit. We use heap_sync to
- * ensure that the toast table gets fsync'd too.
+ * If the rel is WAL-logged, must fsync before commit. We use heap_sync
+ * to ensure that the toast table gets fsync'd too.
*
* It's obvious that we must do this when not WAL-logging. It's less
* obvious that we have to do it even if we did WAL-log the pages. The
@@ -287,7 +287,7 @@ end_heap_rewrite(RewriteState state)
* occurring during the rewriteheap operation won't have fsync'd data we
* wrote before the checkpoint.
*/
- if (!state->rs_new_rel->rd_istemp)
+ if (RelationNeedsWAL(state->rs_new_rel))
heap_sync(state->rs_new_rel);
/* Deleting the context frees everything */
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index eaad812..ee0f04c 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -766,7 +766,7 @@ _bt_insertonpg(Relation rel,
}
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_insert xlrec;
BlockNumber xldownlink;
@@ -1165,7 +1165,7 @@ _bt_split(Relation rel, Buffer buf, OffsetNumber firstright,
}
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_split xlrec;
uint8 xlinfo;
@@ -1914,7 +1914,7 @@ _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf)
MarkBufferDirty(metabuf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_newroot xlrec;
XLogRecPtr recptr;
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index e0c0f21..2b44780 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -224,7 +224,7 @@ _bt_getroot(Relation rel, int access)
MarkBufferDirty(metabuf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_newroot xlrec;
XLogRecPtr recptr;
@@ -452,7 +452,7 @@ _bt_checkpage(Relation rel, Buffer buf)
static void
_bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedXid)
{
- if (rel->rd_istemp)
+ if (!RelationNeedsWAL(rel))
return;
/* No ereport(ERROR) until changes are logged */
@@ -751,7 +751,7 @@ _bt_delitems_vacuum(Relation rel, Buffer buf,
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
XLogRecPtr recptr;
XLogRecData rdata[2];
@@ -829,7 +829,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
XLogRecPtr recptr;
XLogRecData rdata[3];
@@ -1365,7 +1365,7 @@ _bt_pagedel(Relation rel, Buffer buf, BTStack stack)
MarkBufferDirty(lbuf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_btree_delete_page xlrec;
xl_btree_metadata xlmeta;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index a1d3aef..3fb43a2 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -211,9 +211,9 @@ _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2)
/*
* We need to log index creation in WAL iff WAL archiving/streaming is
- * enabled AND it's not a temp index.
+ * enabled UNLESS the index isn't WAL-logged anyway.
*/
- wstate.btws_use_wal = XLogIsNeeded() && !wstate.index->rd_istemp;
+ wstate.btws_use_wal = XLogIsNeeded() && RelationNeedsWAL(wstate.index);
/* reserve the metapage */
wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
@@ -797,9 +797,9 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
_bt_uppershutdown(wstate, state);
/*
- * If the index isn't temp, we must fsync it down to disk before it's safe
- * to commit the transaction. (For a temp index we don't care since the
- * index will be uninteresting after a crash anyway.)
+ * If the index is WAL-logged, we must fsync it down to disk before it's
+ * safe to commit the transaction. (For a non-WAL-logged index we don't
+ * care since the index will be uninteresting after a crash anyway.)
*
* It's obvious that we must do this when not WAL-logging the build. It's
* less obvious that we have to do it even if we did WAL-log the index
@@ -811,7 +811,7 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
* fsync those pages here, they might still not be on disk when the crash
* occurs.
*/
- if (!wstate->index->rd_istemp)
+ if (RelationNeedsWAL(wstate->index))
{
RelationOpenSmgr(wstate->index);
smgrimmedsync(wstate->index->rd_smgr, MAIN_FORKNUM);
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index e475403..73ef114 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -219,6 +219,7 @@ Boot_CreateStmt:
$3,
tupdesc,
RELKIND_RELATION,
+ RELPERSISTENCE_PERMANENT,
shared_relation,
mapped_relation,
true);
@@ -238,6 +239,7 @@ Boot_CreateStmt:
tupdesc,
NIL,
RELKIND_RELATION,
+ RELPERSISTENCE_PERMANENT,
shared_relation,
mapped_relation,
true,
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6322512..88b5c2a 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -524,12 +524,26 @@ GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
* created by bootstrap have preassigned OIDs, so there's no need.
*/
Oid
-GetNewRelFileNode(Oid reltablespace, Relation pg_class, BackendId backend)
+GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
{
RelFileNodeBackend rnode;
char *rpath;
int fd;
bool collides;
+ BackendId backend;
+
+ switch (relpersistence)
+ {
+ case RELPERSISTENCE_TEMP:
+ backend = MyBackendId;
+ break;
+ case RELPERSISTENCE_PERMANENT:
+ backend = InvalidBackendId;
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c", relpersistence);
+ return InvalidOid; /* placate compiler */
+ }
/* This logic should match RelationInitPhysicalAddr */
rnode.node.spcNode = reltablespace ? reltablespace : MyDatabaseTableSpace;
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index dcc53e1..cda9000 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -237,6 +237,7 @@ heap_create(const char *relname,
Oid relid,
TupleDesc tupDesc,
char relkind,
+ char relpersistence,
bool shared_relation,
bool mapped_relation,
bool allow_system_table_mods)
@@ -310,7 +311,8 @@ heap_create(const char *relname,
relid,
reltablespace,
shared_relation,
- mapped_relation);
+ mapped_relation,
+ relpersistence);
/*
* Have the storage manager create the relation's disk file, if needed.
@@ -321,7 +323,7 @@ heap_create(const char *relname,
if (create_storage)
{
RelationOpenSmgr(rel);
- RelationCreateStorage(rel->rd_node, rel->rd_istemp);
+ RelationCreateStorage(rel->rd_node, relpersistence);
}
return rel;
@@ -692,7 +694,7 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
- values[Anum_pg_class_relistemp - 1] = BoolGetDatum(rd_rel->relistemp);
+ values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
values[Anum_pg_class_relkind - 1] = CharGetDatum(rd_rel->relkind);
values[Anum_pg_class_relnatts - 1] = Int16GetDatum(rd_rel->relnatts);
values[Anum_pg_class_relchecks - 1] = Int16GetDatum(rd_rel->relchecks);
@@ -897,6 +899,7 @@ heap_create_with_catalog(const char *relname,
TupleDesc tupdesc,
List *cooked_constraints,
char relkind,
+ char relpersistence,
bool shared_relation,
bool mapped_relation,
bool oidislocal,
@@ -996,8 +999,7 @@ heap_create_with_catalog(const char *relname,
}
else
relid = GetNewRelFileNode(reltablespace, pg_class_desc,
- isTempOrToastNamespace(relnamespace) ?
- MyBackendId : InvalidBackendId);
+ relpersistence);
}
/*
@@ -1035,6 +1037,7 @@ heap_create_with_catalog(const char *relname,
relid,
tupdesc,
relkind,
+ relpersistence,
shared_relation,
mapped_relation,
allow_system_table_mods);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b437c99..8fbe8eb 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -545,6 +545,7 @@ index_create(Oid heapRelationId,
bool is_exclusion;
Oid namespaceId;
int i;
+ char relpersistence;
is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
@@ -561,11 +562,13 @@ index_create(Oid heapRelationId,
/*
* The index will be in the same namespace as its parent table, and is
* shared across databases if and only if the parent is. Likewise, it
- * will use the relfilenode map if and only if the parent does.
+ * will use the relfilenode map if and only if the parent does; and it
+ * inherits the parent's relpersistence.
*/
namespaceId = RelationGetNamespace(heapRelation);
shared_relation = heapRelation->rd_rel->relisshared;
mapped_relation = RelationIsMapped(heapRelation);
+ relpersistence = heapRelation->rd_rel->relpersistence;
/*
* check parameters
@@ -646,9 +649,7 @@ index_create(Oid heapRelationId,
else
{
indexRelationId =
- GetNewRelFileNode(tableSpaceId, pg_class,
- heapRelation->rd_istemp ?
- MyBackendId : InvalidBackendId);
+ GetNewRelFileNode(tableSpaceId, pg_class, relpersistence);
}
}
@@ -663,6 +664,7 @@ index_create(Oid heapRelationId,
indexRelationId,
indexTupDesc,
RELKIND_INDEX,
+ relpersistence,
shared_relation,
mapped_relation,
allow_system_table_mods);
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 3727146..aa37097 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -235,14 +235,14 @@ RangeVarGetRelid(const RangeVar *relation, bool failOK)
}
/*
- * If istemp is set, this is a reference to a temp relation. The parser
- * never generates such a RangeVar in simple DML, but it can happen in
- * contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY KEY)". Such a
- * command will generate an added CREATE INDEX operation, which must be
+ * Some non-default relpersistence value may have been specified. The
+ * parser never generates such a RangeVar in simple DML, but it can happen
+ * in contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY KEY)". Such
+ * a command will generate an added CREATE INDEX operation, which must be
* careful to find the temp table, even when pg_temp is not first in the
* search path.
*/
- if (relation->istemp)
+ if (relation->relpersistence == RELPERSISTENCE_TEMP)
{
if (relation->schemaname)
ereport(ERROR,
@@ -308,7 +308,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
newRelation->relname)));
}
- if (newRelation->istemp)
+ if (newRelation->relpersistence == RELPERSISTENCE_TEMP)
{
/* TEMP tables are created in our backend-local temp namespace */
if (newRelation->schemaname)
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 0ce2051..671aaff 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -95,19 +95,35 @@ typedef struct xl_smgr_truncate
* transaction aborts later on, the storage will be destroyed.
*/
void
-RelationCreateStorage(RelFileNode rnode, bool istemp)
+RelationCreateStorage(RelFileNode rnode, char relpersistence)
{
PendingRelDelete *pending;
XLogRecPtr lsn;
XLogRecData rdata;
xl_smgr_create xlrec;
SMgrRelation srel;
- BackendId backend = istemp ? MyBackendId : InvalidBackendId;
+ BackendId backend;
+ bool needs_wal;
+
+ switch (relpersistence)
+ {
+ case RELPERSISTENCE_TEMP:
+ backend = MyBackendId;
+ needs_wal = false;
+ break;
+ case RELPERSISTENCE_PERMANENT:
+ backend = InvalidBackendId;
+ needs_wal = true;
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c", relpersistence);
+ return; /* placate compiler */
+ }
srel = smgropen(rnode, backend);
smgrcreate(srel, MAIN_FORKNUM, false);
- if (!istemp)
+ if (needs_wal)
{
/*
* Make an XLOG entry reporting the file creation.
@@ -253,7 +269,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
* failure to truncate, that might spell trouble at WAL replay, into a
* certain PANIC.
*/
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
/*
* Make an XLOG entry reporting the file truncation.
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7bf64e2..d1f6c9f 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -195,7 +195,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
* Toast tables for regular relations go in pg_toast; those for temp
* relations go into the per-backend temp-toast-table namespace.
*/
- if (rel->rd_backend == MyBackendId)
+ if (RelationUsesTempNamespace(rel))
namespaceid = GetTempToastNamespace();
else
namespaceid = PG_TOAST_NAMESPACE;
@@ -216,6 +216,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
tupdesc,
NIL,
RELKIND_TOASTVALUE,
+ rel->rd_rel->relpersistence,
shared_relation,
mapped_relation,
true,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index bb7cd74..9fdc471 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -675,6 +675,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace)
tupdesc,
NIL,
OldHeap->rd_rel->relkind,
+ OldHeap->rd_rel->relpersistence,
false,
RelationIsMapped(OldHeap),
true,
@@ -789,9 +790,9 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
/*
* We need to log the copied data in WAL iff WAL archiving/streaming is
- * enabled AND it's not a temp rel.
+ * enabled AND it's a WAL-logged rel.
*/
- use_wal = XLogIsNeeded() && !NewHeap->rd_istemp;
+ use_wal = XLogIsNeeded() && RelationNeedsWAL(NewHeap);
/* use_wal off requires smgr_targblock be initially invalid */
Assert(RelationGetTargetBlock(NewHeap) == InvalidBlockNumber);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 9407d0f..0940893 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -222,7 +222,7 @@ DefineIndex(RangeVar *heapRelation,
}
else
{
- tablespaceId = GetDefaultTablespace(rel->rd_istemp);
+ tablespaceId = GetDefaultTablespace(rel->rd_rel->relpersistence);
/* note InvalidOid is OK in this case */
}
@@ -1706,7 +1706,7 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
continue;
/* Skip temp tables of other backends; we can't reindex them at all */
- if (classtuple->relistemp &&
+ if (classtuple->relpersistence == RELPERSISTENCE_TEMP &&
!isTempNamespace(classtuple->relnamespace))
continue;
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 04b0c71..aae9f92 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -274,7 +274,7 @@ DefineSequence(CreateSeqStmt *seq)
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!rel->rd_istemp)
+ if (RelationNeedsWAL(rel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
@@ -379,7 +379,7 @@ AlterSequenceInternal(Oid relid, List *options)
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!seqrel->rd_istemp)
+ if (RelationNeedsWAL(seqrel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
@@ -609,7 +609,7 @@ nextval_internal(Oid relid)
MarkBufferDirty(buf);
/* XLOG stuff */
- if (logit && !seqrel->rd_istemp)
+ if (logit && RelationNeedsWAL(seqrel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
@@ -786,7 +786,7 @@ do_setval(Oid relid, int64 next, bool iscalled)
MarkBufferDirty(buf);
/* XLOG stuff */
- if (!seqrel->rd_istemp)
+ if (RelationNeedsWAL(seqrel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6ec8a85..6252622 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -223,7 +223,7 @@ static const struct dropmsgstrings dropmsgstringarray[] = {
static void truncate_check_rel(Relation rel);
-static List *MergeAttributes(List *schema, List *supers, bool istemp,
+static List *MergeAttributes(List *schema, List *supers, char relpersistence,
List **supOids, List **supconstr, int *supOidCount);
static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
static bool change_varattnos_walker(Node *node, const AttrNumber *newattno);
@@ -334,7 +334,7 @@ static void ATPrepAddInherit(Relation child_rel);
static void ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode);
static void ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode);
static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
- ForkNumber forkNum, bool istemp);
+ ForkNumber forkNum, char relpersistence);
static const char *storage_name(char c);
@@ -386,7 +386,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
/*
* Check consistency of arguments
*/
- if (stmt->oncommit != ONCOMMIT_NOOP && !stmt->relation->istemp)
+ if (stmt->oncommit != ONCOMMIT_NOOP
+ && stmt->relation->relpersistence != RELPERSISTENCE_TEMP)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("ON COMMIT can only be used on temporary tables")));
@@ -396,7 +397,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
* code. This is needed because calling code might not expect untrusted
* tables to appear in pg_temp at the front of its search path.
*/
- if (stmt->relation->istemp && InSecurityRestrictedOperation())
+ if (stmt->relation->relpersistence == RELPERSISTENCE_TEMP
+ && InSecurityRestrictedOperation())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("cannot create temporary table within security-restricted operation")));
@@ -429,7 +431,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
}
else
{
- tablespaceId = GetDefaultTablespace(stmt->relation->istemp);
+ tablespaceId = GetDefaultTablespace(stmt->relation->relpersistence);
/* note InvalidOid is OK in this case */
}
@@ -473,7 +475,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
* inherited attributes.
*/
schema = MergeAttributes(schema, stmt->inhRelations,
- stmt->relation->istemp,
+ stmt->relation->relpersistence,
&inheritOids, &old_constraints, &parentOidCount);
/*
@@ -552,6 +554,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
list_concat(cookedDefaults,
old_constraints),
relkind,
+ stmt->relation->relpersistence,
false,
false,
localHasOids,
@@ -1213,7 +1216,7 @@ storage_name(char c)
*----------
*/
static List *
-MergeAttributes(List *schema, List *supers, bool istemp,
+MergeAttributes(List *schema, List *supers, char relpersistence,
List **supOids, List **supconstr, int *supOidCount)
{
ListCell *entry;
@@ -1321,7 +1324,8 @@ MergeAttributes(List *schema, List *supers, bool istemp,
errmsg("inherited relation \"%s\" is not a table",
parent->relname)));
/* Permanent rels cannot inherit from temporary ones */
- if (!istemp && relation->rd_istemp)
+ if (relpersistence != RELPERSISTENCE_TEMP
+ && RelationUsesTempNamespace(relation))
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot inherit from temporary relation \"%s\"",
@@ -5062,23 +5066,23 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
RelationGetRelationName(pkrel))));
/*
- * Disallow reference from permanent table to temp table or vice versa.
- * (The ban on perm->temp is for fairly obvious reasons. The ban on
- * temp->perm is because other backends might need to run the RI triggers
- * on the perm table, but they can't reliably see tuples the owning
- * backend has created in the temp table, because non-shared buffers are
- * used for temp tables.)
+ * References from permanent tables to temp tables are disallowed because
+ * the contents of the temp table disappear at the end of each session.
+ * References from temp tables to permanent tables are also disallowed,
+ * because other backends might need to run the RI triggers on the perm
+ * table, but they can't reliably see tuples in the local buffers of other
+ * backends.
*/
- if (pkrel->rd_istemp)
+ if (RelationUsesLocalBuffers(pkrel))
{
- if (!rel->rd_istemp)
+ if (!RelationUsesLocalBuffers(rel))
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("cannot reference temporary table from permanent table constraint")));
}
else
{
- if (rel->rd_istemp)
+ if (RelationUsesLocalBuffers(rel))
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("cannot reference permanent table from temporary table constraint")));
@@ -7285,7 +7289,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
* Relfilenodes are not unique across tablespaces, so we need to allocate
* a new one in the new tablespace.
*/
- newrelfilenode = GetNewRelFileNode(newTableSpace, NULL, rel->rd_backend);
+ newrelfilenode = GetNewRelFileNode(newTableSpace, NULL,
+ rel->rd_rel->relpersistence);
/* Open old and new relation */
newrnode = rel->rd_node;
@@ -7302,10 +7307,11 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
* NOTE: any conflict in relfilenode value will be caught in
* RelationCreateStorage().
*/
- RelationCreateStorage(newrnode, rel->rd_istemp);
+ RelationCreateStorage(newrnode, rel->rd_rel->relpersistence);
/* copy main fork */
- copy_relation_data(rel->rd_smgr, dstrel, MAIN_FORKNUM, rel->rd_istemp);
+ copy_relation_data(rel->rd_smgr, dstrel, MAIN_FORKNUM,
+ rel->rd_rel->relpersistence);
/* copy those extra forks that exist */
for (forkNum = MAIN_FORKNUM + 1; forkNum <= MAX_FORKNUM; forkNum++)
@@ -7313,7 +7319,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
if (smgrexists(rel->rd_smgr, forkNum))
{
smgrcreate(dstrel, forkNum, false);
- copy_relation_data(rel->rd_smgr, dstrel, forkNum, rel->rd_istemp);
+ copy_relation_data(rel->rd_smgr, dstrel, forkNum,
+ rel->rd_rel->relpersistence);
}
}
@@ -7348,7 +7355,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
*/
static void
copy_relation_data(SMgrRelation src, SMgrRelation dst,
- ForkNumber forkNum, bool istemp)
+ ForkNumber forkNum, char relpersistence)
{
char *buf;
Page page;
@@ -7367,9 +7374,9 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
/*
* We need to log the copied data in WAL iff WAL archiving/streaming is
- * enabled AND it's not a temp rel.
+ * enabled AND it's a permanent relation.
*/
- use_wal = XLogIsNeeded() && !istemp;
+ use_wal = XLogIsNeeded() && relpersistence == RELPERSISTENCE_PERMANENT;
nblocks = smgrnblocks(src, forkNum);
@@ -7408,7 +7415,7 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
* wouldn't replay our earlier WAL entries. If we do not fsync those pages
* here, they might still not be on disk when the crash occurs.
*/
- if (!istemp)
+ if (relpersistence == RELPERSISTENCE_PERMANENT)
smgrimmedsync(dst, forkNum);
}
@@ -7476,7 +7483,8 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
ATSimplePermissions(parent_rel, false, false);
/* Permanent rels cannot inherit from temporary ones */
- if (parent_rel->rd_istemp && !child_rel->rd_istemp)
+ if (RelationUsesTempNamespace(parent_rel)
+ && !RelationUsesTempNamespace(child_rel))
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot inherit from temporary relation \"%s\"",
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 590eee5..c8192a3 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -1045,8 +1045,8 @@ assign_default_tablespace(const char *newval, bool doit, GucSource source)
/*
* GetDefaultTablespace -- get the OID of the current default tablespace
*
- * Regular objects and temporary objects have different default tablespaces,
- * hence the forTemp parameter must be specified.
+ * Temporary objects have different default tablespaces, hence the
+ * relpersistence parameter must be specified.
*
* May return InvalidOid to indicate "use the database's default tablespace".
*
@@ -1057,12 +1057,12 @@ assign_default_tablespace(const char *newval, bool doit, GucSource source)
* default_tablespace GUC variable.
*/
Oid
-GetDefaultTablespace(bool forTemp)
+GetDefaultTablespace(char relpersistence)
{
Oid result;
/* The temp-table case is handled elsewhere */
- if (forTemp)
+ if (relpersistence == RELPERSISTENCE_TEMP)
{
PrepareTempTablespaces();
return GetNextTempTableSpace();
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 0ac993f..cbdf97d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -268,10 +268,10 @@ static void
vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
{
/*
- * No need to log changes for temp tables, they do not contain data
- * visible on the standby server.
+ * Skip this for relations for which no WAL is to be written, or if we're
+ * not trying to support archive recovery.
*/
- if (rel->rd_istemp || !XLogIsNeeded())
+ if (!RelationNeedsWAL(rel) || !XLogIsNeeded())
return;
/*
@@ -664,8 +664,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
if (nfrozen > 0)
{
MarkBufferDirty(buf);
- /* no XLOG for temp tables, though */
- if (!onerel->rd_istemp)
+ if (RelationNeedsWAL(onerel))
{
XLogRecPtr recptr;
@@ -895,7 +894,7 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
MarkBufferDirty(buffer);
/* XLOG stuff */
- if (!onerel->rd_istemp)
+ if (RelationNeedsWAL(onerel))
{
XLogRecPtr recptr;
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index 09ab24b..2b2b908 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -68,10 +68,10 @@ isViewOnTempTable_walker(Node *node, void *context)
if (rte->rtekind == RTE_RELATION)
{
Relation rel = heap_open(rte->relid, AccessShareLock);
- bool istemp = rel->rd_istemp;
+ char relpersistence = rel->rd_rel->relpersistence;
heap_close(rel, AccessShareLock);
- if (istemp)
+ if (relpersistence == RELPERSISTENCE_TEMP)
return true;
}
}
@@ -173,9 +173,9 @@ DefineVirtualRelation(const RangeVar *relation, List *tlist, bool replace)
/*
* Due to the namespace visibility rules for temporary objects, we
* should only end up replacing a temporary view with another
- * temporary view, and vice versa.
+ * temporary view, and similarly for permanent views.
*/
- Assert(relation->istemp == rel->rd_istemp);
+ Assert(relation->relpersistence == rel->rd_rel->relpersistence);
/*
* Create a tuple descriptor to compare against the existing view, and
@@ -454,10 +454,11 @@ DefineView(ViewStmt *stmt, const char *queryString)
* schema name.
*/
view = stmt->view;
- if (!view->istemp && isViewOnTempTable(viewParse))
+ if (view->relpersistence == RELPERSISTENCE_PERMANENT
+ && isViewOnTempTable(viewParse))
{
view = copyObject(view); /* don't corrupt original command */
- view->istemp = true;
+ view->relpersistence = RELPERSISTENCE_TEMP;
ereport(NOTICE,
(errmsg("view \"%s\" will be a temporary view",
view->relname)));
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 69f3a28..c4719f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2131,7 +2131,8 @@ OpenIntoRel(QueryDesc *queryDesc)
/*
* Check consistency of arguments
*/
- if (into->onCommit != ONCOMMIT_NOOP && !into->rel->istemp)
+ if (into->onCommit != ONCOMMIT_NOOP
+ && into->rel->relpersistence != RELPERSISTENCE_TEMP)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("ON COMMIT can only be used on temporary tables")));
@@ -2141,7 +2142,8 @@ OpenIntoRel(QueryDesc *queryDesc)
* code. This is needed because calling code might not expect untrusted
* tables to appear in pg_temp at the front of its search path.
*/
- if (into->rel->istemp && InSecurityRestrictedOperation())
+ if (into->rel->relpersistence == RELPERSISTENCE_TEMP
+ && InSecurityRestrictedOperation())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("cannot create temporary table within security-restricted operation")));
@@ -2168,7 +2170,7 @@ OpenIntoRel(QueryDesc *queryDesc)
}
else
{
- tablespaceId = GetDefaultTablespace(into->rel->istemp);
+ tablespaceId = GetDefaultTablespace(into->rel->relpersistence);
/* note InvalidOid is OK in this case */
}
@@ -2208,6 +2210,7 @@ OpenIntoRel(QueryDesc *queryDesc)
tupdesc,
NIL,
RELKIND_RELATION,
+ into->rel->relpersistence,
false,
false,
true,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e91044b..32aafc8 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -955,7 +955,7 @@ _copyRangeVar(RangeVar *from)
COPY_STRING_FIELD(schemaname);
COPY_STRING_FIELD(relname);
COPY_SCALAR_FIELD(inhOpt);
- COPY_SCALAR_FIELD(istemp);
+ COPY_SCALAR_FIELD(relpersistence);
COPY_NODE_FIELD(alias);
COPY_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 73b28f9..1f7b5f3 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -104,7 +104,7 @@ _equalRangeVar(RangeVar *a, RangeVar *b)
COMPARE_STRING_FIELD(schemaname);
COMPARE_STRING_FIELD(relname);
COMPARE_SCALAR_FIELD(inhOpt);
- COMPARE_SCALAR_FIELD(istemp);
+ COMPARE_SCALAR_FIELD(relpersistence);
COMPARE_NODE_FIELD(alias);
COMPARE_LOCATION_FIELD(location);
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 4b268f3..f06f73b 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -15,6 +15,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -378,7 +379,7 @@ makeRangeVar(char *schemaname, char *relname, int location)
r->schemaname = schemaname;
r->relname = relname;
r->inhOpt = INH_DEFAULT;
- r->istemp = false;
+ r->relpersistence = RELPERSISTENCE_PERMANENT;
r->alias = NULL;
r->location = location;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 61aea61..66a5f33 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -839,7 +839,7 @@ _outRangeVar(StringInfo str, RangeVar *node)
WRITE_STRING_FIELD(schemaname);
WRITE_STRING_FIELD(relname);
WRITE_ENUM_FIELD(inhOpt, InhOption);
- WRITE_BOOL_FIELD(istemp);
+ WRITE_CHAR_FIELD(relpersistence);
WRITE_NODE_FIELD(alias);
WRITE_LOCATION_FIELD(location);
}
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 2166a5d..933d58a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -373,7 +373,7 @@ _readRangeVar(void)
READ_STRING_FIELD(schemaname);
READ_STRING_FIELD(relname);
READ_ENUM_FIELD(inhOpt, InhOption);
- READ_BOOL_FIELD(istemp);
+ READ_CHAR_FIELD(relpersistence);
READ_NODE_FIELD(alias);
READ_LOCATION_FIELD(location);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 1394b21..06707da 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -311,7 +311,8 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
%type <fun_param_mode> arg_class
%type <typnam> func_return func_type
-%type <boolean> OptTemp opt_trusted opt_restart_seqs
+%type <boolean> opt_trusted opt_restart_seqs
+%type <ival> OptTemp
%type <oncommit> OnCommitOption
%type <node> for_locking_item
@@ -2278,7 +2279,7 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
OptInherit OptWith OnCommitOption OptTableSpace
{
CreateStmt *n = makeNode(CreateStmt);
- $4->istemp = $2;
+ $4->relpersistence = $2;
n->relation = $4;
n->tableElts = $6;
n->inhRelations = $8;
@@ -2294,7 +2295,7 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
OptTableSpace
{
CreateStmt *n = makeNode(CreateStmt);
- $7->istemp = $2;
+ $7->relpersistence = $2;
n->relation = $7;
n->tableElts = $9;
n->inhRelations = $11;
@@ -2309,7 +2310,7 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
OptTypedTableElementList OptWith OnCommitOption OptTableSpace
{
CreateStmt *n = makeNode(CreateStmt);
- $4->istemp = $2;
+ $4->relpersistence = $2;
n->relation = $4;
n->tableElts = $7;
n->ofTypename = makeTypeNameFromNameList($6);
@@ -2325,7 +2326,7 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
OptTypedTableElementList OptWith OnCommitOption OptTableSpace
{
CreateStmt *n = makeNode(CreateStmt);
- $7->istemp = $2;
+ $7->relpersistence = $2;
n->relation = $7;
n->tableElts = $10;
n->ofTypename = makeTypeNameFromNameList($9);
@@ -2346,13 +2347,13 @@ CreateStmt: CREATE OptTemp TABLE qualified_name '(' OptTableElementList ')'
* NOTE: we accept both GLOBAL and LOCAL options; since we have no modules
* the LOCAL keyword is really meaningless.
*/
-OptTemp: TEMPORARY { $$ = TRUE; }
- | TEMP { $$ = TRUE; }
- | LOCAL TEMPORARY { $$ = TRUE; }
- | LOCAL TEMP { $$ = TRUE; }
- | GLOBAL TEMPORARY { $$ = TRUE; }
- | GLOBAL TEMP { $$ = TRUE; }
- | /*EMPTY*/ { $$ = FALSE; }
+OptTemp: TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
+ | TEMP { $$ = RELPERSISTENCE_TEMP; }
+ | LOCAL TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
+ | LOCAL TEMP { $$ = RELPERSISTENCE_TEMP; }
+ | GLOBAL TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
+ | GLOBAL TEMP { $$ = RELPERSISTENCE_TEMP; }
+ | /*EMPTY*/ { $$ = RELPERSISTENCE_PERMANENT; }
;
OptTableElementList:
@@ -2832,7 +2833,7 @@ CreateAsStmt:
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("CREATE TABLE AS cannot specify INTO"),
parser_errposition(exprLocation((Node *) n->intoClause))));
- $4->rel->istemp = $2;
+ $4->rel->relpersistence = $2;
n->intoClause = $4;
/* Implement WITH NO DATA by forcing top-level LIMIT 0 */
if (!$7)
@@ -2898,7 +2899,7 @@ CreateSeqStmt:
CREATE OptTemp SEQUENCE qualified_name OptSeqOptList
{
CreateSeqStmt *n = makeNode(CreateSeqStmt);
- $4->istemp = $2;
+ $4->relpersistence = $2;
n->sequence = $4;
n->options = $5;
n->ownerId = InvalidOid;
@@ -6543,7 +6544,7 @@ ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list
{
ViewStmt *n = makeNode(ViewStmt);
n->view = $4;
- n->view->istemp = $2;
+ n->view->relpersistence = $2;
n->aliases = $5;
n->query = $7;
n->replace = false;
@@ -6554,7 +6555,7 @@ ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list
{
ViewStmt *n = makeNode(ViewStmt);
n->view = $6;
- n->view->istemp = $4;
+ n->view->relpersistence = $4;
n->aliases = $7;
n->query = $9;
n->replace = true;
@@ -7250,7 +7251,7 @@ ExecuteStmt: EXECUTE name execute_param_clause
ExecuteStmt *n = makeNode(ExecuteStmt);
n->name = $7;
n->params = $8;
- $4->rel->istemp = $2;
+ $4->rel->relpersistence = $2;
n->into = $4;
if ($4->colNames)
ereport(ERROR,
@@ -7811,42 +7812,42 @@ OptTempTableName:
TEMPORARY opt_table qualified_name
{
$$ = $3;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| TEMP opt_table qualified_name
{
$$ = $3;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| LOCAL TEMPORARY opt_table qualified_name
{
$$ = $4;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| LOCAL TEMP opt_table qualified_name
{
$$ = $4;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| GLOBAL TEMPORARY opt_table qualified_name
{
$$ = $4;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| GLOBAL TEMP opt_table qualified_name
{
$$ = $4;
- $$->istemp = true;
+ $$->relpersistence = RELPERSISTENCE_TEMP;
}
| TABLE qualified_name
{
$$ = $2;
- $$->istemp = false;
+ $$->relpersistence = RELPERSISTENCE_PERMANENT;
}
| qualified_name
{
$$ = $1;
- $$->istemp = false;
+ $$->relpersistence = RELPERSISTENCE_PERMANENT;
}
;
@@ -10838,16 +10839,12 @@ qualified_name_list:
qualified_name:
ColId
{
- $$ = makeNode(RangeVar);
- $$->catalogname = NULL;
- $$->schemaname = NULL;
- $$->relname = $1;
- $$->location = @1;
+ $$ = makeRangeVar(NULL, $1, @1);
}
| ColId indirection
{
check_qualified_name($2, yyscanner);
- $$ = makeNode(RangeVar);
+ $$ = makeRangeVar(NULL, NULL, @1);
switch (list_length($2))
{
case 1:
@@ -10868,7 +10865,6 @@ qualified_name:
parser_errposition(@1)));
break;
}
- $$->location = @1;
}
;
@@ -12085,6 +12081,7 @@ makeRangeVarFromAnyName(List *names, int position, core_yyscan_t yyscanner)
break;
}
+ r->relpersistence = RELPERSISTENCE_PERMANENT;
r->location = position;
return r;
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index a8aee20..aa7c144 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -158,10 +158,11 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
* If the target relation name isn't schema-qualified, make it so. This
* prevents some corner cases in which added-on rewritten commands might
* think they should apply to other relations that have the same name and
- * are earlier in the search path. "istemp" is equivalent to a
- * specification of pg_temp, so no need for anything extra in that case.
+ * are earlier in the search path. But a local temp table is effectively
+ * specified to be in pg_temp, so no need for anything extra in that case.
*/
- if (stmt->relation->schemaname == NULL && !stmt->relation->istemp)
+ if (stmt->relation->schemaname == NULL
+ && stmt->relation->relpersistence != RELPERSISTENCE_TEMP)
{
Oid namespaceid = RangeVarGetCreationNamespace(stmt->relation);
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index a617b88..be7a69a 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1967,7 +1967,7 @@ do_autovacuum(void)
* Check if it is a temp table (presumably, of some other backend's).
* We cannot safely process other backends' temp tables.
*/
- if (classForm->relistemp)
+ if (classForm->relpersistence == RELPERSISTENCE_TEMP)
{
int backendID;
@@ -2064,7 +2064,7 @@ do_autovacuum(void)
/*
* We cannot safely process other backends' temp tables, so skip 'em.
*/
- if (classForm->relistemp)
+ if (classForm->relpersistence == RELPERSISTENCE_TEMP)
continue;
relid = HeapTupleGetOid(tuple);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 54c7109..11d5827 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2076,7 +2076,7 @@ FlushRelationBuffers(Relation rel)
/* Open rel at the smgr level if not already done */
RelationOpenSmgr(rel);
- if (rel->rd_istemp)
+ if (RelationUsesLocalBuffers(rel))
{
for (i = 0; i < NLocBuffer; i++)
{
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index f5250a2..e352cda 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -612,16 +612,26 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
PG_RETURN_NULL();
}
- /* If temporary, determine owning backend. */
- if (!relform->relistemp)
- backend = InvalidBackendId;
- else if (isTempOrToastNamespace(relform->relnamespace))
- backend = MyBackendId;
- else
+ /* Determine owning backend. */
+ switch (relform->relpersistence)
{
- /* Do it the hard way. */
- backend = GetTempNamespaceBackendId(relform->relnamespace);
- Assert(backend != InvalidBackendId);
+ case RELPERSISTENCE_PERMANENT:
+ backend = InvalidBackendId;
+ break;
+ case RELPERSISTENCE_TEMP:
+ if (isTempOrToastNamespace(relform->relnamespace))
+ backend = MyBackendId;
+ else
+ {
+ /* Do it the hard way. */
+ backend = GetTempNamespaceBackendId(relform->relnamespace);
+ Assert(backend != InvalidBackendId);
+ }
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c", relform->relpersistence);
+ backend = InvalidBackendId; /* placate compiler */
+ break;
}
ReleaseSysCache(tuple);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 2a44303..963eaae 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -856,20 +856,30 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
relation->rd_isnailed = false;
relation->rd_createSubid = InvalidSubTransactionId;
relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
- relation->rd_istemp = relation->rd_rel->relistemp;
- if (!relation->rd_istemp)
- relation->rd_backend = InvalidBackendId;
- else if (isTempOrToastNamespace(relation->rd_rel->relnamespace))
- relation->rd_backend = MyBackendId;
- else
+ switch (relation->rd_rel->relpersistence)
{
- /*
- * If it's a temporary table, but not one of ours, we have to use
- * the slow, grotty method to figure out the owning backend.
- */
- relation->rd_backend =
- GetTempNamespaceBackendId(relation->rd_rel->relnamespace);
- Assert(relation->rd_backend != InvalidBackendId);
+ case RELPERSISTENCE_PERMANENT:
+ relation->rd_backend = InvalidBackendId;
+ break;
+ case RELPERSISTENCE_TEMP:
+ if (isTempOrToastNamespace(relation->rd_rel->relnamespace))
+ relation->rd_backend = MyBackendId;
+ else
+ {
+ /*
+ * If it's a local temp table, but not one of ours, we have to
+ * use the slow, grotty method to figure out the owning
+ * backend.
+ */
+ relation->rd_backend =
+ GetTempNamespaceBackendId(relation->rd_rel->relnamespace);
+ Assert(relation->rd_backend != InvalidBackendId);
+ }
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c",
+ relation->rd_rel->relpersistence);
+ break;
}
/*
@@ -1432,7 +1442,6 @@ formrdesc(const char *relationName, Oid relationReltype,
relation->rd_isnailed = true;
relation->rd_createSubid = InvalidSubTransactionId;
relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
- relation->rd_istemp = false;
relation->rd_backend = InvalidBackendId;
/*
@@ -1458,11 +1467,8 @@ formrdesc(const char *relationName, Oid relationReltype,
if (isshared)
relation->rd_rel->reltablespace = GLOBALTABLESPACE_OID;
- /*
- * Likewise, we must know if a relation is temp ... but formrdesc is not
- * used for any temp relations.
- */
- relation->rd_rel->relistemp = false;
+ /* formrdesc is used only for permanent relations */
+ relation->rd_rel->relpersistence = RELPERSISTENCE_PERMANENT;
relation->rd_rel->relpages = 1;
relation->rd_rel->reltuples = 1;
@@ -2440,7 +2446,8 @@ RelationBuildLocalRelation(const char *relname,
Oid relid,
Oid reltablespace,
bool shared_relation,
- bool mapped_relation)
+ bool mapped_relation,
+ char relpersistence)
{
Relation rel;
MemoryContext oldcxt;
@@ -2514,10 +2521,6 @@ RelationBuildLocalRelation(const char *relname,
/* must flag that we have rels created in this transaction */
need_eoxact_work = true;
- /* it is temporary if and only if it is in my temp-table namespace */
- rel->rd_istemp = isTempOrToastNamespace(relnamespace);
- rel->rd_backend = rel->rd_istemp ? MyBackendId : InvalidBackendId;
-
/*
* create a new tuple descriptor from the one passed in. We do this
* partly to copy it into the cache context, and partly because the new
@@ -2557,6 +2560,21 @@ RelationBuildLocalRelation(const char *relname,
/* needed when bootstrapping: */
rel->rd_rel->relowner = BOOTSTRAP_SUPERUSERID;
+ /* set up persistence; rd_backend is a function of persistence type */
+ rel->rd_rel->relpersistence = relpersistence;
+ switch (relpersistence)
+ {
+ case RELPERSISTENCE_PERMANENT:
+ rel->rd_backend = InvalidBackendId;
+ break;
+ case RELPERSISTENCE_TEMP:
+ rel->rd_backend = MyBackendId;
+ break;
+ default:
+ elog(ERROR, "invalid relpersistence: %c", relpersistence);
+ break;
+ }
+
/*
* Insert relation physical and logical identifiers (OIDs) into the right
* places. Note that the physical ID (relfilenode) is initially the same
@@ -2565,7 +2583,6 @@ RelationBuildLocalRelation(const char *relname,
* map.
*/
rel->rd_rel->relisshared = shared_relation;
- rel->rd_rel->relistemp = rel->rd_istemp;
RelationGetRelid(rel) = relid;
@@ -2642,7 +2659,7 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
/* Allocate a new relfilenode */
newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
- relation->rd_backend);
+ relation->rd_rel->relpersistence);
/*
* Get a writable copy of the pg_class tuple for the given relation.
@@ -2665,7 +2682,7 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid)
newrnode.node = relation->rd_node;
newrnode.node.relNode = newrelfilenode;
newrnode.backend = relation->rd_backend;
- RelationCreateStorage(newrnode.node, relation->rd_istemp);
+ RelationCreateStorage(newrnode.node, relation->rd_rel->relpersistence);
smgrclosenode(newrnode);
/*
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 97c808b..56dcdd5 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -56,6 +56,6 @@ extern Oid GetNewOid(Relation relation);
extern Oid GetNewOidWithIndex(Relation relation, Oid indexId,
AttrNumber oidcolumn);
extern Oid GetNewRelFileNode(Oid reltablespace, Relation pg_class,
- BackendId backend);
+ char relpersistence);
#endif /* CATALOG_H */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 7795bda..646ab9c 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -40,6 +40,7 @@ extern Relation heap_create(const char *relname,
Oid relid,
TupleDesc tupDesc,
char relkind,
+ char relpersistence,
bool shared_relation,
bool mapped_relation,
bool allow_system_table_mods);
@@ -54,6 +55,7 @@ extern Oid heap_create_with_catalog(const char *relname,
TupleDesc tupdesc,
List *cooked_constraints,
char relkind,
+ char relpersistence,
bool shared_relation,
bool mapped_relation,
bool oidislocal,
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index f50cf9d..1edbfe3 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -49,7 +49,7 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
- bool relistemp; /* T if temporary relation */
+ char relpersistence; /* see RELPERSISTENCE_xxx constants */
char relkind; /* see RELKIND_xxx constants below */
int2 relnatts; /* number of user attributes */
@@ -108,7 +108,7 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltoastidxid 12
#define Anum_pg_class_relhasindex 13
#define Anum_pg_class_relisshared 14
-#define Anum_pg_class_relistemp 15
+#define Anum_pg_class_relpersistence 15
#define Anum_pg_class_relkind 16
#define Anum_pg_class_relnatts 17
#define Anum_pg_class_relchecks 18
@@ -132,13 +132,13 @@ typedef FormData_pg_class *Form_pg_class;
*/
/* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId */
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f f r 28 0 t f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f f 3 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f f r 19 0 f f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 19 0 f f f f f f 3 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f f r 25 0 t f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 25 0 t f f f f f 3 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f f r 27 0 t f f f f f 3 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f f 3 _null_ _null_ ));
DESCR("");
#define RELKIND_INDEX 'i' /* secondary index */
@@ -149,4 +149,7 @@ DESCR("");
#define RELKIND_VIEW 'v' /* view */
#define RELKIND_COMPOSITE_TYPE 'c' /* composite type */
+#define RELPERSISTENCE_PERMANENT 'p'
+#define RELPERSISTENCE_TEMP 't'
+
#endif /* PG_CLASS_H */
diff --git a/src/include/catalog/storage.h b/src/include/catalog/storage.h
index d7b8731..f086b1c 100644
--- a/src/include/catalog/storage.h
+++ b/src/include/catalog/storage.h
@@ -20,7 +20,7 @@
#include "storage/relfilenode.h"
#include "utils/relcache.h"
-extern void RelationCreateStorage(RelFileNode rnode, bool istemp);
+extern void RelationCreateStorage(RelFileNode rnode, char relpersistence);
extern void RelationDropStorage(Relation rel);
extern void RelationPreserveStorage(RelFileNode rnode);
extern void RelationTruncate(Relation rel, BlockNumber nblocks);
diff --git a/src/include/commands/tablespace.h b/src/include/commands/tablespace.h
index 327fbc6..1e3f6ca 100644
--- a/src/include/commands/tablespace.h
+++ b/src/include/commands/tablespace.h
@@ -47,7 +47,7 @@ extern void AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt);
extern void TablespaceCreateDbspace(Oid spcNode, Oid dbNode, bool isRedo);
-extern Oid GetDefaultTablespace(bool forTemp);
+extern Oid GetDefaultTablespace(char relpersistence);
extern void PrepareTempTablespaces(void);
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index b17adf2..ba5ae37 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -74,7 +74,7 @@ typedef struct RangeVar
char *relname; /* the relation/sequence name */
InhOption inhOpt; /* expand rel by inheritance? recursively act
* on children? */
- bool istemp; /* is this a temp relation/sequence? */
+ char relpersistence; /* see RELPERSISTENCE_* in pg_class.h */
Alias *alias; /* table alias & optional column aliases */
int location; /* token location, or -1 if unknown */
} RangeVar;
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 9ad92c2..8474d8f 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -132,7 +132,6 @@ typedef struct RelationData
struct SMgrRelationData *rd_smgr; /* cached file handle, or NULL */
int rd_refcnt; /* reference count */
BackendId rd_backend; /* owning backend id, if temporary relation */
- bool rd_istemp; /* rel is a temporary relation */
bool rd_isnailed; /* rel is nailed in cache */
bool rd_isvalid; /* relcache entry is valid */
char rd_indexvalid; /* state of rd_indexlist: 0 = not valid, 1 =
@@ -391,6 +390,27 @@ typedef struct StdRdOptions
} while (0)
/*
+ * RelationNeedsWAL
+ * True if relation needs WAL.
+ */
+#define RelationNeedsWAL(relation) \
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
+
+/*
+ * RelationUsesLocalBuffers
+ * True if relation's pages are stored in local buffers.
+ */
+#define RelationUsesLocalBuffers(relation) \
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
+
+/*
+ * RelationUsesTempNamespace
+ * True if relation's catalog entries live in a private namespace.
+ */
+#define RelationUsesTempNamespace(relation) \
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
+
+/*
* RELATION_IS_LOCAL
* If a rel is either temp or newly created in the current transaction,
* it can be assumed to be visible only to the current backend.
@@ -408,7 +428,8 @@ typedef struct StdRdOptions
* Beware of multiple eval of argument
*/
#define RELATION_IS_OTHER_TEMP(relation) \
- ((relation)->rd_istemp && (relation)->rd_backend != MyBackendId)
+ ((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP \
+ && (relation)->rd_backend != MyBackendId)
/* routines in utils/cache/relcache.c */
extern void RelationIncrementReferenceCount(Relation rel);
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 10d82d4..3500050 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -69,7 +69,8 @@ extern Relation RelationBuildLocalRelation(const char *relname,
Oid relid,
Oid reltablespace,
bool shared_relation,
- bool mapped_relation);
+ bool mapped_relation,
+ char relpersistence);
/*
* Routine to manage assignment of new relfilenode to a relation
Attachment: unlogged-tables-v1.patch (application/octet-stream)
commit b9ee81f34dadb5b5ea8981ff6eb9bb894e16563e
Author: Robert Haas <rhaas@postgresql.org>
Date: Sat Nov 13 08:30:55 2010 -0500
Support unlogged tables.
The contents of an unlogged table are not WAL-logged; thus, they are not
crash-safe and do not appear on standby servers. On restart, they are
truncated.
Currently, only btree indexes are supported on unlogged tables.
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 925aac4..c599b95 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -167,6 +167,17 @@ ambuild (Relation heapRelation,
<para>
<programlisting>
+void
+ambuildempty (Relation indexRelation);
+</programlisting>
+ Build an empty index, and write it to the initialization fork (INIT_FORKNUM)
+ of the given relation. This method is called only for unlogged tables; the
+ empty index written to the initialization fork will be copied over the main
+ relation fork on each server restart.
+ </para>
+
+ <para>
+<programlisting>
bool
aminsert (Relation indexRelation,
Datum *values,
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 8635e80..7b0e14d 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
{ <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ DEFAULT <replaceable>default_expr</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
| <replaceable>table_constraint</replaceable>
| LIKE <replaceable>parent_table</replaceable> [ <replaceable>like_option</replaceable> ... ] }
@@ -32,7 +32,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <repl
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <replaceable class="PARAMETER">tablespace</replaceable> ]
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable>
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable>
OF <replaceable class="PARAMETER">type_name</replaceable> [ (
{ <replaceable class="PARAMETER">column_name</replaceable> WITH OPTIONS [ DEFAULT <replaceable>default_expr</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
| <replaceable>table_constraint</replaceable> }
@@ -164,6 +164,22 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] <repl
</varlistentry>
<varlistentry>
+ <term><literal>UNLOGGED</></term>
+ <listitem>
+ <para>
+ If specified, the table is created as an unlogged table. Data written
+ to unlogged tables is not written to the write-ahead log (see <xref
+ linkend="wal">), which makes them considerably faster than ordinary
+ tables. However, it also means that the data stored in the tables is not
+ copied to standby servers and does not survive if
+ <productname>PostgreSQL</productname> is restarted. Unlogged tables are
+ automatically truncated on restart. Any indexes created on an unlogged
+ table are automatically unlogged as well.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>IF NOT EXISTS</></term>
<listitem>
<para>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 86da68b..0ea7ec2 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE <replaceable>table_name</replaceable>
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE <replaceable>table_name</replaceable>
[ (<replaceable>column_name</replaceable> [, ...] ) ]
[ WITH ( <replaceable class="PARAMETER">storage_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
@@ -82,6 +82,16 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE <replaceable>table_name
</varlistentry>
<varlistentry>
+ <term><literal>UNLOGGED</></term>
+ <listitem>
+ <para>
+ If specified, the table is created as an unlogged table.
+ Refer to <xref linkend="sql-createtable"> for details.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable>table_name</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 8681ede..7ec12b0 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -412,6 +412,19 @@ ginbuild(PG_FUNCTION_ARGS)
}
/*
+ * ginbuildempty() -- build an empty gin index in the initialization fork
+ */
+Datum
+ginbuildempty(PG_FUNCTION_ARGS)
+{
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unlogged GIN indexes are not supported")));
+
+ PG_RETURN_VOID();
+}
+
+/*
* Inserts value during normal insertion
*/
static uint32
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index a7dc2a5..fdfb5d4 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -210,6 +210,19 @@ gistbuildCallback(Relation index,
}
/*
+ * gistbuildempty() -- build an empty gist index in the initialization fork
+ */
+Datum
+gistbuildempty(PG_FUNCTION_ARGS)
+{
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unlogged GIST indexes are not supported")));
+
+ PG_RETURN_VOID();
+}
+
+/*
* gistinsert -- wrapper for GiST tuple insertion.
*
* This is the public interface routine for tuple insertion in GiSTs.
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index bb46446..cbe8682 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -114,6 +114,19 @@ hashbuild(PG_FUNCTION_ARGS)
}
/*
+ * hashbuildempty() -- build an empty hash index in the initialization fork
+ */
+Datum
+hashbuildempty(PG_FUNCTION_ARGS)
+{
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unlogged hash indexes are not supported")));
+
+ PG_RETURN_VOID();
+}
+
+/*
* Per-tuple callback from IndexBuildHeapScan
*/
static void
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 46aeb9e..6ccc16d 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -29,6 +29,7 @@
#include "storage/indexfsm.h"
#include "storage/ipc.h"
#include "storage/lmgr.h"
+#include "storage/smgr.h"
#include "utils/memutils.h"
@@ -205,6 +206,36 @@ btbuildCallback(Relation index,
}
/*
+ * btbuildempty() -- build an empty btree index in the initialization fork
+ */
+Datum
+btbuildempty(PG_FUNCTION_ARGS)
+{
+ Relation index = (Relation) PG_GETARG_POINTER(0);
+ Page metapage;
+
+ /* Construct metapage. */
+ metapage = (Page) palloc(BLCKSZ);
+ _bt_initmetapage(metapage, P_NONE, 0);
+
+ /* Write the page. If archiving/streaming, XLOG it. */
+ smgrwrite(index->rd_smgr, INIT_FORKNUM, BTREE_METAPAGE,
+ (char *) metapage, true);
+ if (XLogIsNeeded())
+ log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+ BTREE_METAPAGE, metapage);
+
+ /*
+ * An immediate sync is required even if we xlog'd the page, because the
+ * write did not go through shared_buffers and therefore a concurrent
+ * checkpoint may have moved the redo pointer past our xlog record.
+ */
+ smgrimmedsync(index->rd_smgr, INIT_FORKNUM);
+
+ PG_RETURN_VOID();
+}
+
+/*
* btinsert() -- insert an index tuple into a btree.
*
* Descend the tree recursively, find the appropriate location for our
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 70f4cc5..9a7b45f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -49,6 +49,7 @@
#include "storage/latch.h"
#include "storage/pmsignal.h"
#include "storage/procarray.h"
+#include "storage/reinit.h"
#include "storage/smgr.h"
#include "storage/spin.h"
#include "utils/builtins.h"
@@ -5996,6 +5997,16 @@ StartupXLOG(void)
InRecovery = true;
}
+ /*
+ * Blow away any leftover data in unlogged relations. This should be
+ * done BEFORE starting up Hot Standby, so that read-only backends don't
+ * see residual data from a previous startup. If redo isn't required or
+ * Hot Standby isn't enabled, we could do both the
+ * UNLOGGED_RELATION_CLEANUP and UNLOGGED_RELATION_INIT phases in one
+ * pass later on ... but for now, we don't bother to detect that case.
+ */
+ ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
+
/* REDO */
if (InRecovery)
{
@@ -6524,6 +6535,13 @@ StartupXLOG(void)
PreallocXlogFiles(EndOfLog);
/*
+ * Reset initial contents of unlogged relations. This has to be done
+ * AFTER recovery is complete so that any unlogged relations created
+ * during recovery also get picked up.
+ */
+ ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
+
+ /*
* Okay, we're officially UP.
*/
InRecovery = false;
@@ -7024,6 +7042,14 @@ ShutdownXLOG(int code, Datum arg)
ShutdownSUBTRANS();
ShutdownMultiXact();
+ /*
+ * Remove any unlogged relation contents. This will happen anyway at
+ * the next startup; the point of doing it here is to avoid consuming
+ * a potentially large amount of disk space while we're shut down, for
+ * data that will be discarded anyway.
+ */
+ ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
+
ereport(LOG,
(errmsg("database system is shut down")));
}
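The three call sites above pass a bitmask to ResetUnloggedRelations(). As a rough sketch of how such a mask decodes into phases, the following standalone C fragment may help. The flag values are assumptions (the patch defines them in storage/reinit.h, which is not shown in this excerpt), and ResetPhases and decode_reset_op are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values; the real definitions live in storage/reinit.h. */
#define UNLOGGED_RELATION_CLEANUP 0x0001
#define UNLOGGED_RELATION_INIT    0x0002

/* Hypothetical helper type: which phases a given op mask requests. */
typedef struct
{
    bool cleanup;   /* remove every fork except "init" */
    bool init;      /* copy the "init" fork over the main fork */
} ResetPhases;

ResetPhases
decode_reset_op(int op)
{
    ResetPhases phases;

    /* As in the patch, the caller must request at least one phase. */
    assert((op & (UNLOGGED_RELATION_CLEANUP | UNLOGGED_RELATION_INIT)) != 0);

    phases.cleanup = (op & UNLOGGED_RELATION_CLEANUP) != 0;
    phases.init = (op & UNLOGGED_RELATION_INIT) != 0;
    return phases;
}
```

In the patch as posted, startup requests the cleanup phase before redo and the init phase after recovery completes, while shutdown requests cleanup only.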
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 88b5c2a..fc5a8fc 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -55,7 +55,8 @@
const char *forkNames[] = {
"main", /* MAIN_FORKNUM */
"fsm", /* FSM_FORKNUM */
- "vm" /* VISIBILITYMAP_FORKNUM */
+ "vm", /* VISIBILITYMAP_FORKNUM */
+ "init" /* INIT_FORKNUM */
};
/*
@@ -82,14 +83,14 @@ forkname_to_number(char *forkName)
* We use this to figure out whether a filename could be a relation
* fork (as opposed to an oddly named stray file that somehow ended
* up in the database directory). If the passed string begins with
- * a fork name (other than the main fork name), we return its length.
- * If not, we return 0.
+ * a fork name (other than the main fork name), we return its length,
+ * and set *fork (if not NULL) to the fork number. If not, we return 0.
*
* Note that the present coding assumes that there are no fork names which
* are prefixes of other fork names.
*/
int
-forkname_chars(const char *str)
+forkname_chars(const char *str, ForkNumber *fork)
{
ForkNumber forkNum;
@@ -97,7 +98,11 @@ forkname_chars(const char *str)
{
int len = strlen(forkNames[forkNum]);
if (strncmp(forkNames[forkNum], str, len) == 0)
+ {
+ if (fork)
+ *fork = forkNum;
return len;
+ }
}
return 0;
}
@@ -537,6 +542,7 @@ GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
case RELPERSISTENCE_TEMP:
backend = MyBackendId;
break;
+ case RELPERSISTENCE_UNLOGGED:
case RELPERSISTENCE_PERMANENT:
backend = InvalidBackendId;
break;
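For reviewers who want to try the new forkname_chars() signature outside the backend, here is a minimal standalone sketch. The enum and name table are local stand-ins for the backend definitions, and the loop bounds (starting after the main fork) are an assumption, since the loop header is not part of the hunk above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for the backend's ForkNumber enum and forkNames[] table. */
typedef enum
{
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,
    VISIBILITYMAP_FORKNUM,
    INIT_FORKNUM
} ForkNumber;

#define MAX_FORKNUM INIT_FORKNUM

const char *const forkNames[] = {"main", "fsm", "vm", "init"};

/*
 * If str begins with a fork name other than "main", return that name's
 * length and, when fork is not NULL, store the fork number in *fork.
 * Otherwise return 0 and leave *fork untouched.
 */
int
forkname_chars(const char *str, ForkNumber *fork)
{
    int forkNum;

    assert(str != NULL);

    /* Assumed loop bounds: skip the main fork, which has no suffix. */
    for (forkNum = FSM_FORKNUM; forkNum <= MAX_FORKNUM; forkNum++)
    {
        size_t len = strlen(forkNames[forkNum]);

        if (strncmp(forkNames[forkNum], str, len) == 0)
        {
            if (fork)
                *fork = (ForkNumber) forkNum;
            return (int) len;
        }
    }
    return 0;
}
```

Existing callers that only need the length, such as looks_like_temp_rel_name(), can pass NULL for the second argument.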
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index cda9000..cd287b1 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -317,8 +317,8 @@ heap_create(const char *relname,
/*
* Have the storage manager create the relation's disk file, if needed.
*
- * We only create the main fork here, other forks will be created on
- * demand.
+ * We only create the main fork here, other forks will be created as
+ * needed.
*/
if (create_storage)
{
@@ -1207,6 +1207,41 @@ heap_create_with_catalog(const char *relname,
register_on_commit_action(relid, oncommit);
/*
+ * If this is an unlogged relation, it needs an init fork so that it
+ * can be correctly reinitialized on restart.
+ */
+ if (relpersistence == RELPERSISTENCE_UNLOGGED)
+ {
+ Page dummypage;
+
+ Assert(relkind == RELKIND_RELATION || relkind == RELKIND_TOASTVALUE);
+
+ /*
+ * Technically, we could just write an empty file here, but then there's
+ * nothing to XLOG. We could introduce a dedicated XLOG record to
+ * create an empty relation fork, but it's easier to just
+ * XLOG a blank page, which (during redo) will create the fork
+ * automatically.
+ */
+ dummypage = (Page) palloc0(BLCKSZ);
+
+ /* Create fork, write page. If archiving/streaming, XLOG it. */
+ smgrcreate(new_rel_desc->rd_smgr, INIT_FORKNUM, false);
+ smgrwrite(new_rel_desc->rd_smgr, INIT_FORKNUM, 0,
+ (char *) dummypage, true);
+ if (XLogIsNeeded())
+ log_newpage(&new_rel_desc->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+ 0, dummypage);
+
+ /*
+ * An immediate sync is required even if we xlog'd the page, because the
+ * write did not go through shared_buffers and therefore a concurrent
+ * checkpoint may have moved the redo pointer past our xlog record.
+ */
+ smgrimmedsync(new_rel_desc->rd_smgr, INIT_FORKNUM);
+ }
+
+ /*
* ok, the relation has been cataloged, so close our relations and return
* the OID of the newly created relation.
*/
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8fbe8eb..22f0959 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -967,6 +967,17 @@ index_create(Oid heapRelationId,
}
/*
+ * If this is an unlogged index, we need to write out an init fork for it.
+ */
+ if (relpersistence == RELPERSISTENCE_UNLOGGED)
+ {
+ RegProcedure ambuildempty = indexRelation->rd_am->ambuildempty;
+ RelationOpenSmgr(indexRelation);
+ smgrcreate(indexRelation->rd_smgr, INIT_FORKNUM, false);
+ OidFunctionCall1(ambuildempty, PointerGetDatum(indexRelation));
+ }
+
+ /*
* Close the heap and index; but we keep the locks that we acquired above
* until end of transaction.
*/
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 671aaff..34ec77d 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -111,6 +111,10 @@ RelationCreateStorage(RelFileNode rnode, char relpersistence)
backend = MyBackendId;
needs_wal = false;
break;
+ case RELPERSISTENCE_UNLOGGED:
+ backend = InvalidBackendId;
+ needs_wal = false;
+ break;
case RELPERSISTENCE_PERMANENT:
backend = InvalidBackendId;
needs_wal = true;
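The switch in RelationCreateStorage() amounts to a small lookup table: an unlogged relation is shared like a permanent one but skips WAL like a temporary one. A standalone sketch of that mapping follows. The 'u' and 't' codes and the -1 sentinel are assumptions (only 'p' appears in this patch excerpt), and StorageTraits and storage_traits are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define RELPERSISTENCE_PERMANENT 'p'
#define RELPERSISTENCE_UNLOGGED  'u'    /* assumed catalog code */
#define RELPERSISTENCE_TEMP      't'    /* assumed catalog code */
#define INVALID_BACKEND_ID       (-1)   /* stand-in for InvalidBackendId */

/* Hypothetical result type: who owns the storage, and is it WAL-logged? */
typedef struct
{
    int  backend;    /* owning backend, or INVALID_BACKEND_ID if shared */
    bool needs_wal;  /* must changes to the relation be WAL-logged? */
} StorageTraits;

StorageTraits
storage_traits(char relpersistence, int my_backend_id)
{
    StorageTraits traits = {INVALID_BACKEND_ID, true};

    switch (relpersistence)
    {
        case RELPERSISTENCE_TEMP:
            /* Private to one backend, never WAL-logged. */
            traits.backend = my_backend_id;
            traits.needs_wal = false;
            break;
        case RELPERSISTENCE_UNLOGGED:
            /* Visible to all backends, but still not WAL-logged. */
            traits.backend = INVALID_BACKEND_ID;
            traits.needs_wal = false;
            break;
        case RELPERSISTENCE_PERMANENT:
            traits.backend = INVALID_BACKEND_ID;
            traits.needs_wal = true;
            break;
        default:
            assert(!"unrecognized relpersistence");
    }
    return traits;
}
```

The other switch statements touched by this patch (GetNewRelFileNode, pg_relation_filepath, the relcache builders) only need the backend half of this table, which is why they can share one case arm between the unlogged and permanent values.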
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 06707da..790c585 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -536,8 +536,8 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P
- UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNTIL
- UPDATE USER USING
+ UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
+ UNTIL UPDATE USER USING
VACUUM VALID VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING
VERBOSE VERSION_P VIEW VOLATILE
@@ -2353,6 +2353,7 @@ OptTemp: TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
| LOCAL TEMP { $$ = RELPERSISTENCE_TEMP; }
| GLOBAL TEMPORARY { $$ = RELPERSISTENCE_TEMP; }
| GLOBAL TEMP { $$ = RELPERSISTENCE_TEMP; }
+ | UNLOGGED { $$ = RELPERSISTENCE_UNLOGGED; }
| /*EMPTY*/ { $$ = RELPERSISTENCE_PERMANENT; }
;
@@ -7839,6 +7840,11 @@ OptTempTableName:
$$ = $4;
$$->relpersistence = RELPERSISTENCE_TEMP;
}
+ | UNLOGGED opt_table qualified_name
+ {
+ $$ = $3;
+ $$->relpersistence = RELPERSISTENCE_UNLOGGED;
+ }
| TABLE qualified_name
{
$$ = $2;
@@ -11305,6 +11311,7 @@ unreserved_keyword:
| UNENCRYPTED
| UNKNOWN
| UNLISTEN
+ | UNLOGGED
| UNTIL
| UPDATE
| VACUUM
diff --git a/src/backend/storage/file/Makefile b/src/backend/storage/file/Makefile
index 3b93aa1..d2198f2 100644
--- a/src/backend/storage/file/Makefile
+++ b/src/backend/storage/file/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/storage/file
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = fd.o buffile.o copydir.o
+OBJS = fd.o buffile.o copydir.o reinit.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 4a10563..5af64d7 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -38,7 +38,6 @@
#endif
-static void copy_file(char *fromfile, char *tofile);
static void fsync_fname(char *fname, bool isdir);
@@ -142,7 +141,7 @@ copydir(char *fromdir, char *todir, bool recurse)
/*
* copy one file
*/
-static void
+void
copy_file(char *fromfile, char *tofile)
{
char *buffer;
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index fd5ec78..b218f70 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2054,7 +2054,7 @@ looks_like_temp_rel_name(const char *name)
/* We might have _forkname or .segment or both. */
if (name[pos] == '_')
{
- int forkchar = forkname_chars(&name[pos+1]);
+ int forkchar = forkname_chars(&name[pos+1], NULL);
if (forkchar <= 0)
return false;
pos += forkchar + 1;
diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
new file mode 100644
index 0000000..b75178b
--- /dev/null
+++ b/src/backend/storage/file/reinit.c
@@ -0,0 +1,396 @@
+/*-------------------------------------------------------------------------
+ *
+ * reinit.c
+ * Reinitialization of unlogged relations
+ *
+ * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/storage/file/reinit.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "catalog/catalog.h"
+#include "storage/copydir.h"
+#include "storage/fd.h"
+#include "storage/reinit.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+
+static void ResetUnloggedRelationsInTablespaceDir(const char *tsdirname,
+ int op);
+static void ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname,
+ int op);
+static bool parse_filename_for_nontemp_relation(const char *name,
+ int *oidchars, ForkNumber *fork);
+
+typedef struct {
+ char oid[OIDCHARS+1];
+} unlogged_relation_entry;
+
+/*
+ * Reset unlogged relations from before the last restart.
+ *
+ * If op includes UNLOGGED_RELATION_CLEANUP, we remove all forks of any
+ * relation with an "init" fork, except for the "init" fork itself.
+ *
+ * If op includes UNLOGGED_RELATION_INIT, we copy the "init" fork to the main
+ * fork.
+ */
+void
+ResetUnloggedRelations(int op)
+{
+ char temp_path[MAXPGPATH];
+ DIR *spc_dir;
+ struct dirent *spc_de;
+ MemoryContext tmpctx, oldctx;
+
+ /* Log it. */
+ ereport(DEBUG1,
+ (errmsg("resetting unlogged relations: cleanup %d init %d",
+ (op & UNLOGGED_RELATION_CLEANUP) != 0,
+ (op & UNLOGGED_RELATION_INIT) != 0)));
+
+ /*
+ * Just to be sure we don't leak any memory, let's create a temporary
+ * memory context for this operation.
+ */
+ tmpctx = AllocSetContextCreate(CurrentMemoryContext,
+ "ResetUnloggedRelations",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ oldctx = MemoryContextSwitchTo(tmpctx);
+
+ /*
+ * First process unlogged files in pg_default ($PGDATA/base)
+ */
+ ResetUnloggedRelationsInTablespaceDir("base", op);
+
+ /*
+ * Cycle through directories for all non-default tablespaces.
+ */
+ spc_dir = AllocateDir("pg_tblspc");
+
+ while ((spc_de = ReadDir(spc_dir, "pg_tblspc")) != NULL)
+ {
+ if (strcmp(spc_de->d_name, ".") == 0 ||
+ strcmp(spc_de->d_name, "..") == 0)
+ continue;
+
+ snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s",
+ spc_de->d_name, TABLESPACE_VERSION_DIRECTORY);
+ ResetUnloggedRelationsInTablespaceDir(temp_path, op);
+ }
+
+ FreeDir(spc_dir);
+
+ /*
+ * Restore memory context.
+ */
+ MemoryContextSwitchTo(oldctx);
+ MemoryContextDelete(tmpctx);
+}
+
+/* Process one tablespace directory for ResetUnloggedRelations */
+static void
+ResetUnloggedRelationsInTablespaceDir(const char *tsdirname, int op)
+{
+ DIR *ts_dir;
+ struct dirent *de;
+ char dbspace_path[MAXPGPATH];
+
+ ts_dir = AllocateDir(tsdirname);
+ if (ts_dir == NULL)
+ {
+ /* anything except ENOENT is fishy */
+ if (errno != ENOENT)
+ elog(LOG,
+ "could not open tablespace directory \"%s\": %m",
+ tsdirname);
+ return;
+ }
+
+ while ((de = ReadDir(ts_dir, tsdirname)) != NULL)
+ {
+ int i = 0;
+
+ /*
+ * We're only interested in the per-database directories, which have
+ * numeric names. Note that this code will also (properly) ignore "."
+ * and "..".
+ */
+ while (isdigit((unsigned char) de->d_name[i]))
+ ++i;
+ if (de->d_name[i] != '\0' || i == 0)
+ continue;
+
+ snprintf(dbspace_path, sizeof(dbspace_path), "%s/%s",
+ tsdirname, de->d_name);
+ ResetUnloggedRelationsInDbspaceDir(dbspace_path, op);
+ }
+
+ FreeDir(ts_dir);
+}
+
+/* Process one per-dbspace directory for ResetUnloggedRelations */
+static void
+ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
+{
+ DIR *dbspace_dir;
+ struct dirent *de;
+ char rm_path[MAXPGPATH];
+
+ /* Caller must specify at least one operation. */
+ Assert((op & (UNLOGGED_RELATION_CLEANUP | UNLOGGED_RELATION_INIT)) != 0);
+
+ /*
+ * Cleanup is a two-pass operation. First, we go through and identify all
+ * the files with init forks. Then, we go through again and nuke
+ * everything with the same OID except the init fork.
+ */
+ if ((op & UNLOGGED_RELATION_CLEANUP) != 0)
+ {
+ HTAB *hash = NULL;
+ HASHCTL ctl;
+
+ /* Open the directory. */
+ dbspace_dir = AllocateDir(dbspacedirname);
+ if (dbspace_dir == NULL)
+ {
+ elog(LOG,
+ "could not open dbspace directory \"%s\": %m",
+ dbspacedirname);
+ return;
+ }
+
+ /*
+ * It's possible that someone could create a ton of unlogged relations
+ * in the same database & tablespace, so we'd better use a hash table
+ * rather than an array or linked list to keep track of which files
+ * need to be reset. Otherwise, this cleanup operation would be
+ * O(n^2).
+ */
+ ctl.keysize = sizeof(unlogged_relation_entry);
+ ctl.entrysize = sizeof(unlogged_relation_entry);
+ hash = hash_create("unlogged hash", 32, &ctl, HASH_ELEM);
+
+ /* Scan the directory. */
+ while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+ {
+ ForkNumber forkNum;
+ int oidchars;
+ unlogged_relation_entry ent;
+
+ /* Skip anything that doesn't look like a relation data file. */
+ if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+ &forkNum))
+ continue;
+
+ /* Also skip it unless this is the init fork. */
+ if (forkNum != INIT_FORKNUM)
+ continue;
+
+ /*
+ * Put the OID portion of the name into the hash table, if it isn't
+ * already.
+ */
+ memset(ent.oid, 0, sizeof(ent.oid));
+ memcpy(ent.oid, de->d_name, oidchars);
+ hash_search(hash, &ent, HASH_ENTER, NULL);
+ }
+
+ /* Done with the first pass. */
+ FreeDir(dbspace_dir);
+
+ /*
+ * If we didn't find any init forks, there's no point in continuing;
+ * we can bail out now.
+ */
+ if (hash_get_num_entries(hash) == 0)
+ {
+ hash_destroy(hash);
+ return;
+ }
+
+ /*
+ * Now, make a second pass and remove anything that matches. First,
+ * reopen the directory.
+ */
+ dbspace_dir = AllocateDir(dbspacedirname);
+ if (dbspace_dir == NULL)
+ {
+ elog(LOG,
+ "could not open dbspace directory \"%s\": %m",
+ dbspacedirname);
+ hash_destroy(hash);
+ return;
+ }
+
+ /* Scan the directory. */
+ while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+ {
+ ForkNumber forkNum;
+ int oidchars;
+ bool found;
+ unlogged_relation_entry ent;
+
+ /* Skip anything that doesn't look like a relation data file. */
+ if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+ &forkNum))
+ continue;
+
+ /* We never remove the init fork. */
+ if (forkNum == INIT_FORKNUM)
+ continue;
+
+ /*
+ * See whether the OID portion of the name shows up in the hash
+ * table.
+ */
+ memset(ent.oid, 0, sizeof(ent.oid));
+ memcpy(ent.oid, de->d_name, oidchars);
+ hash_search(hash, &ent, HASH_FIND, &found);
+
+ /* If so, nuke it! */
+ if (found)
+ {
+ snprintf(rm_path, sizeof(rm_path), "%s/%s",
+ dbspacedirname, de->d_name);
+ /*
+ * It's tempting to actually throw an error here, but since
+ * this code gets run during database startup, that could
+ * result in the database failing to start. (XXX Should we do
+ * it anyway?)
+ */
+ if (unlink(rm_path))
+ elog(LOG, "could not unlink file \"%s\": %m", rm_path);
+ else
+ elog(DEBUG2, "unlinked file \"%s\"", rm_path);
+ }
+ }
+
+ /* Cleanup is complete. */
+ FreeDir(dbspace_dir);
+ hash_destroy(hash);
+ }
+
+ /*
+ * Initialization happens after cleanup is complete: we copy each init
+ * fork file to the corresponding main fork file. Note that if we are
+ * asked to do both cleanup and init, we may never get here: if the cleanup
+ * code determines that there are no init forks in this dbspace, it will
+ * return before we get to this point.
+ */
+ if ((op & UNLOGGED_RELATION_INIT) != 0)
+ {
+ /* Open the directory. */
+ dbspace_dir = AllocateDir(dbspacedirname);
+ if (dbspace_dir == NULL)
+ {
+ /* we just saw this directory, so it really ought to be there */
+ elog(LOG,
+ "could not open dbspace directory \"%s\": %m",
+ dbspacedirname);
+ return;
+ }
+
+ /* Scan the directory. */
+ while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+ {
+ ForkNumber forkNum;
+ int oidchars;
+ char oidbuf[OIDCHARS+1];
+ char srcpath[MAXPGPATH];
+ char dstpath[MAXPGPATH];
+
+ /* Skip anything that doesn't look like a relation data file. */
+ if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+ &forkNum))
+ continue;
+
+ /* Also skip it unless this is the init fork. */
+ if (forkNum != INIT_FORKNUM)
+ continue;
+
+ /* Construct source pathname. */
+ snprintf(srcpath, sizeof(srcpath), "%s/%s",
+ dbspacedirname, de->d_name);
+
+ /* Construct destination pathname. */
+ memcpy(oidbuf, de->d_name, oidchars);
+ oidbuf[oidchars] = '\0';
+ snprintf(dstpath, sizeof(dstpath), "%s/%s%s",
+ dbspacedirname, oidbuf, de->d_name + oidchars + 1 +
+ strlen(forkNames[INIT_FORKNUM]));
+
+ /* OK, we're ready to perform the actual copy. */
+ elog(DEBUG2, "copying %s to %s", srcpath, dstpath);
+ copy_file(srcpath, dstpath);
+ }
+
+ /* Done with the init phase. */
+ FreeDir(dbspace_dir);
+ }
+}
+
+/*
+ * Basic parsing of putative relation filenames.
+ *
+ * This function returns true if the file appears to be in the correct format
+ * for a non-temporary relation and false otherwise.
+ *
+ * NB: If this function returns true, the caller is entitled to assume that
+ * *oidchars has been set to a value no more than OIDCHARS, and thus
+ * that a buffer of OIDCHARS+1 characters is sufficient to hold the OID
+ * portion of the filename. This is critical to protect against a possible
+ * buffer overrun.
+ */
+static bool
+parse_filename_for_nontemp_relation(const char *name, int *oidchars,
+ ForkNumber *fork)
+{
+ int pos;
+
+ /* Look for a non-empty string of digits (that isn't too long). */
+ for (pos = 0; isdigit((unsigned char) name[pos]); ++pos)
+ ;
+ if (pos == 0 || pos > OIDCHARS)
+ return false;
+ *oidchars = pos;
+
+ /* Check for a fork name. */
+ if (name[pos] != '_')
+ *fork = MAIN_FORKNUM;
+ else
+ {
+ int forkchar;
+
+ forkchar = forkname_chars(&name[pos+1], fork);
+ if (forkchar <= 0)
+ return false;
+ pos += forkchar + 1;
+ }
+
+ /* Check for a segment number. */
+ if (name[pos] == '.')
+ {
+ int segchar;
+ for (segchar = 1; isdigit((unsigned char) name[pos+segchar]); ++segchar)
+ ;
+ if (segchar <= 1)
+ return false;
+ pos += segchar;
+ }
+
+ /* Now we should be at the end. */
+ if (name[pos] != '\0')
+ return false;
+ return true;
+}
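The init phase in reinit.c derives the main-fork file name from the init-fork file name by dropping the "_init" part and keeping any segment suffix. The following standalone sketch combines that step with the filename checks from parse_filename_for_nontemp_relation(). It is an illustration only: init_fork_to_main_name is a hypothetical helper, and the OIDCHARS value of 10 is an assumption about the backend's definition.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define OIDCHARS    10          /* assumed: max decimal digits in an OID */
#define INIT_SUFFIX "_init"

/*
 * Given an init-fork file name ("16384_init" or "16384_init.1"), write the
 * matching main-fork name ("16384" or "16384.1") into dst.  Returns false if
 * name is not a well-formed init-fork file name or dst is too small.
 */
bool
init_fork_to_main_name(const char *name, char *dst, size_t dstlen)
{
    const size_t suffixlen = strlen(INIT_SUFFIX);
    size_t       oidchars = 0;
    size_t       pos;
    int          n;

    assert(name != NULL && dst != NULL);

    /* Look for a non-empty run of digits that is not too long. */
    while (isdigit((unsigned char) name[oidchars]))
        oidchars++;
    if (oidchars == 0 || oidchars > OIDCHARS)
        return false;

    /* The digits must be followed by "_init". */
    if (strncmp(name + oidchars, INIT_SUFFIX, suffixlen) != 0)
        return false;
    pos = oidchars + suffixlen;

    /* Optional segment number: a dot followed by at least one digit. */
    if (name[pos] == '.')
    {
        size_t segchar = 1;

        while (isdigit((unsigned char) name[pos + segchar]))
            segchar++;
        if (segchar <= 1)
            return false;
        pos += segchar;
    }

    /* Now we should be at the end of the name. */
    if (name[pos] != '\0')
        return false;

    /* OID digits, then whatever followed "_init" (the segment, if any). */
    n = snprintf(dst, dstlen, "%.*s%s",
                 (int) oidchars, name, name + oidchars + suffixlen);
    return n >= 0 && (size_t) n < dstlen;
}
```

Names with a temp-relation prefix or a different fork suffix are rejected, which matches the patch's intent of touching only files that belong to relations with an init fork.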
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index e352cda..f33c29e 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -615,6 +615,7 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
/* Determine owning backend. */
switch (relform->relpersistence)
{
+ case RELPERSISTENCE_UNLOGGED:
case RELPERSISTENCE_PERMANENT:
backend = InvalidBackendId;
break;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 963eaae..1d10494 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -858,6 +858,7 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
switch (relation->rd_rel->relpersistence)
{
+ case RELPERSISTENCE_UNLOGGED:
case RELPERSISTENCE_PERMANENT:
relation->rd_backend = InvalidBackendId;
break;
@@ -2564,6 +2565,7 @@ RelationBuildLocalRelation(const char *relname,
rel->rd_rel->relpersistence = relpersistence;
switch (relpersistence)
{
+ case RELPERSISTENCE_UNLOGGED:
case RELPERSISTENCE_PERMANENT:
rel->rd_backend = InvalidBackendId;
break;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 55ea684..30ca0b2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3447,6 +3447,7 @@ getTables(int *numTables)
int i_relhasrules;
int i_relhasoids;
int i_relfrozenxid;
+ int i_relpersistence;
int i_owning_tab;
int i_owning_col;
int i_reltablespace;
@@ -3477,7 +3478,40 @@ getTables(int *numTables)
* we cannot correctly identify inherited columns, owned sequences, etc.
*/
- if (g_fout->remoteVersion >= 90000)
+ if (g_fout->remoteVersion >= 90100)
+ {
+ /*
+ * Left join to pick up dependency info linking sequences to their
+ * owning column, if any (note this dependency is AUTO as of 8.2)
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.tableoid, c.oid, c.relname, "
+ "c.relacl, c.relkind, c.relnamespace, "
+ "(%s c.relowner) AS rolname, "
+ "c.relchecks, c.relhastriggers, "
+ "c.relhasindex, c.relhasrules, c.relhasoids, "
+ "c.relfrozenxid, c.relpersistence, "
+ "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
+ "d.refobjid AS owning_tab, "
+ "d.refobjsubid AS owning_col, "
+ "(SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, "
+ "array_to_string(c.reloptions, ', ') AS reloptions, "
+ "array_to_string(array(SELECT 'toast.' || x FROM unnest(tc.reloptions) x), ', ') AS toast_reloptions "
+ "FROM pg_class c "
+ "LEFT JOIN pg_depend d ON "
+ "(c.relkind = '%c' AND "
+ "d.classid = c.tableoid AND d.objid = c.oid AND "
+ "d.objsubid = 0 AND "
+ "d.refclassid = c.tableoid AND d.deptype = 'a') "
+ "LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+ "WHERE c.relkind in ('%c', '%c', '%c', '%c') "
+ "ORDER BY c.oid",
+ username_subquery,
+ RELKIND_SEQUENCE,
+ RELKIND_RELATION, RELKIND_SEQUENCE,
+ RELKIND_VIEW, RELKIND_COMPOSITE_TYPE);
+ }
+ else if (g_fout->remoteVersion >= 90000)
{
/*
* Left join to pick up dependency info linking sequences to their
@@ -3489,7 +3523,7 @@ getTables(int *numTables)
"(%s c.relowner) AS rolname, "
"c.relchecks, c.relhastriggers, "
"c.relhasindex, c.relhasrules, c.relhasoids, "
- "c.relfrozenxid, "
+ "c.relfrozenxid, 'p' AS relpersistence, "
"CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3522,7 +3556,7 @@ getTables(int *numTables)
"(%s c.relowner) AS rolname, "
"c.relchecks, c.relhastriggers, "
"c.relhasindex, c.relhasrules, c.relhasoids, "
- "c.relfrozenxid, "
+ "c.relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3555,7 +3589,7 @@ getTables(int *numTables)
"(%s relowner) AS rolname, "
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, relhasoids, "
- "relfrozenxid, "
+ "relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3587,7 +3621,7 @@ getTables(int *numTables)
"(%s relowner) AS rolname, "
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, relhasoids, "
- "0 AS relfrozenxid, "
+ "0 AS relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3619,7 +3653,7 @@ getTables(int *numTables)
"(%s relowner) AS rolname, "
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, relhasoids, "
- "0 AS relfrozenxid, "
+ "0 AS relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -3647,7 +3681,7 @@ getTables(int *numTables)
"(%s relowner) AS rolname, "
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, relhasoids, "
- "0 AS relfrozenxid, "
+ "0 AS relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"NULL::oid AS owning_tab, "
"NULL::int4 AS owning_col, "
@@ -3670,7 +3704,7 @@ getTables(int *numTables)
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, "
"'t'::bool AS relhasoids, "
- "0 AS relfrozenxid, "
+ "0 AS relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"NULL::oid AS owning_tab, "
"NULL::int4 AS owning_col, "
@@ -3703,7 +3737,7 @@ getTables(int *numTables)
"relchecks, (reltriggers <> 0) AS relhastriggers, "
"relhasindex, relhasrules, "
"'t'::bool AS relhasoids, "
- "0 as relfrozenxid, "
+ "0 as relfrozenxid, 'p' AS relpersistence, "
"NULL AS reloftype, "
"NULL::oid AS owning_tab, "
"NULL::int4 AS owning_col, "
@@ -3749,6 +3783,7 @@ getTables(int *numTables)
i_relhasrules = PQfnumber(res, "relhasrules");
i_relhasoids = PQfnumber(res, "relhasoids");
i_relfrozenxid = PQfnumber(res, "relfrozenxid");
+ i_relpersistence = PQfnumber(res, "relpersistence");
i_owning_tab = PQfnumber(res, "owning_tab");
i_owning_col = PQfnumber(res, "owning_col");
i_reltablespace = PQfnumber(res, "reltablespace");
@@ -3783,6 +3818,7 @@ getTables(int *numTables)
tblinfo[i].rolname = strdup(PQgetvalue(res, i, i_rolname));
tblinfo[i].relacl = strdup(PQgetvalue(res, i, i_relacl));
tblinfo[i].relkind = *(PQgetvalue(res, i, i_relkind));
+ tblinfo[i].relpersistence = *(PQgetvalue(res, i, i_relpersistence));
tblinfo[i].hasindex = (strcmp(PQgetvalue(res, i, i_relhasindex), "t") == 0);
tblinfo[i].hasrules = (strcmp(PQgetvalue(res, i, i_relhasrules), "t") == 0);
tblinfo[i].hastriggers = (strcmp(PQgetvalue(res, i, i_relhastriggers), "t") == 0);
@@ -10968,8 +11004,12 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
if (binary_upgrade)
binary_upgrade_set_relfilenodes(q, tbinfo->dobj.catId.oid, false);
- appendPQExpBuffer(q, "CREATE TABLE %s",
- fmtId(tbinfo->dobj.name));
+ if (tbinfo->relpersistence == RELPERSISTENCE_UNLOGGED)
+ appendPQExpBuffer(q, "CREATE UNLOGGED TABLE %s",
+ fmtId(tbinfo->dobj.name));
+ else
+ appendPQExpBuffer(q, "CREATE TABLE %s",
+ fmtId(tbinfo->dobj.name));
if (tbinfo->reloftype)
appendPQExpBuffer(q, " OF %s", tbinfo->reloftype);
actual_atts = 0;
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 7885535..4313fd8 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -220,6 +220,7 @@ typedef struct _tableInfo
char *rolname; /* name of owner, or empty string */
char *relacl;
char relkind;
+ char relpersistence; /* relation persistence */
char *reltablespace; /* relation tablespace */
char *reloptions; /* options specified by WITH (...) */
char *toast_reloptions; /* ditto, for the TOAST table */
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index c4370a1..207d028 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1118,6 +1118,7 @@ describeOneTableDetails(const char *schemaname,
Oid tablespace;
char *reloptions;
char *reloftype;
+ char relpersistence;
} tableinfo;
bool show_modifiers = false;
bool retval;
@@ -1138,6 +1139,23 @@ describeOneTableDetails(const char *schemaname,
"SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
"c.relhastriggers, c.relhasoids, "
"%s, c.reltablespace, "
+ "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
+ "c.relpersistence\n"
+ "FROM pg_catalog.pg_class c\n "
+ "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
+ "WHERE c.oid = '%s'\n",
+ (verbose ?
+ "pg_catalog.array_to_string(c.reloptions || "
+ "array(select 'toast.' || x from pg_catalog.unnest(tc.reloptions) x), ', ')\n"
+ : "''"),
+ oid);
+ }
+ else if (pset.sversion >= 90000)
+ {
+ printfPQExpBuffer(&buf,
+ "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
+ "c.relhastriggers, c.relhasoids, "
+ "%s, c.reltablespace, "
"CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END\n"
"FROM pg_catalog.pg_class c\n "
"LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
@@ -1218,6 +1236,8 @@ describeOneTableDetails(const char *schemaname,
atooid(PQgetvalue(res, 0, 7)) : 0;
tableinfo.reloftype = (pset.sversion >= 90000 && strcmp(PQgetvalue(res, 0, 8), "") != 0) ?
strdup(PQgetvalue(res, 0, 8)) : 0;
+ tableinfo.relpersistence = (pset.sversion >= 90100 && strcmp(PQgetvalue(res, 0, 9), "") != 0) ?
+ PQgetvalue(res, 0, 9)[0] : 0;
PQclear(res);
res = NULL;
@@ -1269,8 +1289,12 @@ describeOneTableDetails(const char *schemaname,
switch (tableinfo.relkind)
{
case 'r':
- printfPQExpBuffer(&title, _("Table \"%s.%s\""),
- schemaname, relationname);
+ if (tableinfo.relpersistence == 'u')
+ printfPQExpBuffer(&title, _("Unlogged Table \"%s.%s\""),
+ schemaname, relationname);
+ else
+ printfPQExpBuffer(&title, _("Table \"%s.%s\""),
+ schemaname, relationname);
break;
case 'v':
printfPQExpBuffer(&title, _("View \"%s.%s\""),
@@ -1281,8 +1305,12 @@ describeOneTableDetails(const char *schemaname,
schemaname, relationname);
break;
case 'i':
- printfPQExpBuffer(&title, _("Index \"%s.%s\""),
- schemaname, relationname);
+ if (tableinfo.relpersistence == 'u')
+ printfPQExpBuffer(&title, _("Unlogged Index \"%s.%s\""),
+ schemaname, relationname);
+ else
+ printfPQExpBuffer(&title, _("Index \"%s.%s\""),
+ schemaname, relationname);
break;
case 's':
/* not used as of 8.2, but keep it for backwards compatibility */
diff --git a/src/include/access/gin.h b/src/include/access/gin.h
index e2d7b45..b1eef92 100644
--- a/src/include/access/gin.h
+++ b/src/include/access/gin.h
@@ -389,6 +389,7 @@ extern void ginUpdateStats(Relation index, const GinStatsData *stats);
/* gininsert.c */
extern Datum ginbuild(PG_FUNCTION_ARGS);
+extern Datum ginbuildempty(PG_FUNCTION_ARGS);
extern Datum gininsert(PG_FUNCTION_ARGS);
extern void ginEntryInsert(Relation index, GinState *ginstate,
OffsetNumber attnum, Datum value,
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 34cc5d5..1853696 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -235,6 +235,7 @@ typedef struct
/* gist.c */
extern Datum gistbuild(PG_FUNCTION_ARGS);
+extern Datum gistbuildempty(PG_FUNCTION_ARGS);
extern Datum gistinsert(PG_FUNCTION_ARGS);
extern MemoryContext createTempGistContext(void);
extern void initGISTstate(GISTSTATE *giststate, Relation index);
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index d5899f4..52d1c93 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -242,6 +242,7 @@ typedef HashMetaPageData *HashMetaPage;
/* public routines */
extern Datum hashbuild(PG_FUNCTION_ARGS);
+extern Datum hashbuildempty(PG_FUNCTION_ARGS);
extern Datum hashinsert(PG_FUNCTION_ARGS);
extern Datum hashbeginscan(PG_FUNCTION_ARGS);
extern Datum hashgettuple(PG_FUNCTION_ARGS);
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 3bbc4d1..283612e 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -555,6 +555,7 @@ typedef BTScanOpaqueData *BTScanOpaque;
* prototypes for functions in nbtree.c (external entry points for btree)
*/
extern Datum btbuild(PG_FUNCTION_ARGS);
+extern Datum btbuildempty(PG_FUNCTION_ARGS);
extern Datum btinsert(PG_FUNCTION_ARGS);
extern Datum btbeginscan(PG_FUNCTION_ARGS);
extern Datum btgettuple(PG_FUNCTION_ARGS);
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 56dcdd5..40cb9ff 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -25,7 +25,7 @@
extern const char *forkNames[];
extern ForkNumber forkname_to_number(char *forkName);
-extern int forkname_chars(const char *str);
+extern int forkname_chars(const char *str, ForkNumber *);
extern char *relpathbackend(RelFileNode rnode, BackendId backend,
ForkNumber forknum);
diff --git a/src/include/catalog/pg_am.h b/src/include/catalog/pg_am.h
index c9b8e2d..e4d2c39 100644
--- a/src/include/catalog/pg_am.h
+++ b/src/include/catalog/pg_am.h
@@ -59,6 +59,7 @@ CATALOG(pg_am,2601)
regproc ammarkpos; /* "mark current scan position" function */
regproc amrestrpos; /* "restore marked scan position" function */
regproc ambuild; /* "build new index" function */
+ regproc ambuildempty; /* "build empty index" function */
regproc ambulkdelete; /* bulk-delete function */
regproc amvacuumcleanup; /* post-VACUUM cleanup function */
regproc amcostestimate; /* estimate cost of an indexscan */
@@ -76,7 +77,7 @@ typedef FormData_pg_am *Form_pg_am;
* compiler constants for pg_am
* ----------------
*/
-#define Natts_pg_am 26
+#define Natts_pg_am 27
#define Anum_pg_am_amname 1
#define Anum_pg_am_amstrategies 2
#define Anum_pg_am_amsupport 3
@@ -99,26 +100,27 @@ typedef FormData_pg_am *Form_pg_am;
#define Anum_pg_am_ammarkpos 20
#define Anum_pg_am_amrestrpos 21
#define Anum_pg_am_ambuild 22
-#define Anum_pg_am_ambulkdelete 23
-#define Anum_pg_am_amvacuumcleanup 24
-#define Anum_pg_am_amcostestimate 25
-#define Anum_pg_am_amoptions 26
+#define Anum_pg_am_ambuildempty 23
+#define Anum_pg_am_ambulkdelete 24
+#define Anum_pg_am_amvacuumcleanup 25
+#define Anum_pg_am_amcostestimate 26
+#define Anum_pg_am_amoptions 27
/* ----------------
* initial contents of pg_am
* ----------------
*/
-DATA(insert OID = 403 ( btree 5 1 t t t t t t t f t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions ));
+DATA(insert OID = 403 ( btree 5 1 t t t t t t t f t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbuildempty btbulkdelete btvacuumcleanup btcostestimate btoptions ));
DESCR("b-tree index access method");
#define BTREE_AM_OID 403
-DATA(insert OID = 405 ( hash 1 1 f t f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
+DATA(insert OID = 405 ( hash 1 1 f t f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbuildempty hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
DESCR("hash index access method");
#define HASH_AM_OID 405
-DATA(insert OID = 783 ( gist 0 7 f f f t t t t t t 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
+DATA(insert OID = 783 ( gist 0 7 f f f t t t t t t 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbuildempty gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
DESCR("GiST index access method");
#define GIST_AM_OID 783
-DATA(insert OID = 2742 ( gin 0 5 f f f t t f f t f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
+DATA(insert OID = 2742 ( gin 0 5 f f f t t f f t f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbuildempty ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
DESCR("GIN index access method");
#define GIN_AM_OID 2742
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 1edbfe3..39f9743 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -150,6 +150,7 @@ DESCR("");
#define RELKIND_COMPOSITE_TYPE 'c' /* composite type */
#define RELPERSISTENCE_PERMANENT 'p'
+#define RELPERSISTENCE_UNLOGGED 'u'
#define RELPERSISTENCE_TEMP 't'
#endif /* PG_CLASS_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 4f444ae..a565038 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -689,6 +689,8 @@ DATA(insert OID = 337 ( btrestrpos PGNSP PGUID 12 1 0 0 f f f t f v 1 0 227
DESCR("btree(internal)");
DATA(insert OID = 338 ( btbuild PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ btbuild _null_ _null_ _null_ ));
DESCR("btree(internal)");
+DATA(insert OID = 328 ( btbuildempty PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ btbuildempty _null_ _null_ _null_ ));
+DESCR("btree(internal)");
DATA(insert OID = 332 ( btbulkdelete PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ btbulkdelete _null_ _null_ _null_ ));
DESCR("btree(internal)");
DATA(insert OID = 972 ( btvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ btvacuumcleanup _null_ _null_ _null_ ));
@@ -808,6 +810,8 @@ DATA(insert OID = 447 ( hashrestrpos PGNSP PGUID 12 1 0 0 f f f t f v 1 0 22
DESCR("hash(internal)");
DATA(insert OID = 448 ( hashbuild PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ hashbuild _null_ _null_ _null_ ));
DESCR("hash(internal)");
+DATA(insert OID = 327 ( hashbuildempty PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ hashbuildempty _null_ _null_ _null_ ));
+DESCR("hash(internal)");
DATA(insert OID = 442 ( hashbulkdelete PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ hashbulkdelete _null_ _null_ _null_ ));
DESCR("hash(internal)");
DATA(insert OID = 425 ( hashvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ hashvacuumcleanup _null_ _null_ _null_ ));
@@ -1104,6 +1108,8 @@ DATA(insert OID = 781 ( gistrestrpos PGNSP PGUID 12 1 0 0 f f f t f v 1 0 22
DESCR("gist(internal)");
DATA(insert OID = 782 ( gistbuild PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ gistbuild _null_ _null_ _null_ ));
DESCR("gist(internal)");
+DATA(insert OID = 326 ( gistbuildempty PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ gistbuildempty _null_ _null_ _null_ ));
+DESCR("gist(internal)");
DATA(insert OID = 776 ( gistbulkdelete PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ gistbulkdelete _null_ _null_ _null_ ));
DESCR("gist(internal)");
DATA(insert OID = 2561 ( gistvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ gistvacuumcleanup _null_ _null_ _null_ ));
@@ -4337,6 +4343,8 @@ DATA(insert OID = 2737 ( ginrestrpos PGNSP PGUID 12 1 0 0 f f f t f v 1 0 22
DESCR("gin(internal)");
DATA(insert OID = 2738 ( ginbuild PGNSP PGUID 12 1 0 0 f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ ginbuild _null_ _null_ _null_ ));
DESCR("gin(internal)");
+DATA(insert OID = 325 ( ginbuildempty PGNSP PGUID 12 1 0 0 f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ ginbuildempty _null_ _null_ _null_ ));
+DESCR("gin(internal)");
DATA(insert OID = 2739 ( ginbulkdelete PGNSP PGUID 12 1 0 0 f f f t f v 4 0 2281 "2281 2281 2281 2281" _null_ _null_ _null_ _null_ ginbulkdelete _null_ _null_ _null_ ));
DESCR("gin(internal)");
DATA(insert OID = 2740 ( ginvacuumcleanup PGNSP PGUID 12 1 0 0 f f f t f v 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ ginvacuumcleanup _null_ _null_ _null_ ));
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 2c44cf7..3b038a0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -388,6 +388,7 @@ PG_KEYWORD("union", UNION, RESERVED_KEYWORD)
PG_KEYWORD("unique", UNIQUE, RESERVED_KEYWORD)
PG_KEYWORD("unknown", UNKNOWN, UNRESERVED_KEYWORD)
PG_KEYWORD("unlisten", UNLISTEN, UNRESERVED_KEYWORD)
+PG_KEYWORD("unlogged", UNLOGGED, UNRESERVED_KEYWORD)
PG_KEYWORD("until", UNTIL, UNRESERVED_KEYWORD)
PG_KEYWORD("update", UPDATE, UNRESERVED_KEYWORD)
PG_KEYWORD("user", USER, RESERVED_KEYWORD)
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 62d15cc..ebf6855 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -203,7 +203,7 @@
* Enable debugging print statements for WAL-related operations; see
* also the wal_debug GUC var.
*/
-/* #define WAL_DEBUG */
+#define WAL_DEBUG
/*
* Enable tracing of resource consumption during sort operations;
diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index 194d98e..ef60082 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -14,5 +14,6 @@
#define COPYDIR_H
extern void copydir(char *fromdir, char *todir, bool recurse);
+extern void copy_file(char *fromfile, char *tofile);
#endif /* COPYDIR_H */
diff --git a/src/include/storage/reinit.h b/src/include/storage/reinit.h
new file mode 100644
index 0000000..9999dff
--- /dev/null
+++ b/src/include/storage/reinit.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * reinit.h
+ * Reinitialization of unlogged relations
+ *
+ *
+ * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/reinit.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef REINIT_H
+#define REINIT_H
+
+extern void ResetUnloggedRelations(int op);
+
+#define UNLOGGED_RELATION_CLEANUP 0x0001
+#define UNLOGGED_RELATION_INIT 0x0002
+
+#endif /* REINIT_H */
diff --git a/src/include/storage/relfilenode.h b/src/include/storage/relfilenode.h
index 24a72e6..f71b233 100644
--- a/src/include/storage/relfilenode.h
+++ b/src/include/storage/relfilenode.h
@@ -27,7 +27,8 @@ typedef enum ForkNumber
InvalidForkNumber = -1,
MAIN_FORKNUM = 0,
FSM_FORKNUM,
- VISIBILITYMAP_FORKNUM
+ VISIBILITYMAP_FORKNUM,
+ INIT_FORKNUM
/*
* NOTE: if you add a new fork, change MAX_FORKNUM below and update the
@@ -35,7 +36,7 @@ typedef enum ForkNumber
*/
} ForkNumber;
-#define MAX_FORKNUM VISIBILITYMAP_FORKNUM
+#define MAX_FORKNUM INIT_FORKNUM
/*
* RelFileNode must provide all that we need to know to physically access
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 8474d8f..d952d6b 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -114,6 +114,7 @@ typedef struct RelationAmInfo
FmgrInfo ammarkpos;
FmgrInfo amrestrpos;
FmgrInfo ambuild;
+ FmgrInfo ambuildempty;
FmgrInfo ambulkdelete;
FmgrInfo amvacuumcleanup;
FmgrInfo amcostestimate;
relax-sync-commit-v1.patch (application/octet-stream)
commit bdd697e5f0a16db2a672e5e14d11744958364101
Author: Robert Haas <rhaas@postgresql.org>
Date: Sat Nov 13 09:52:11 2010 -0500
Assume synchronous_commit=off for transactions that don't write WAL.
This is advantageous for transactions that write only to temporary or
unlogged tables, where loss of the transaction commit record is not
critical.
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index d2e2e11..088daa0 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -907,6 +907,7 @@ RecordTransactionCommit(void)
int nmsgs = 0;
SharedInvalidationMessage *invalMessages = NULL;
bool RelcacheInitFileInval = false;
+ bool wrote_xlog;
/* Get data needed for commit record */
nrels = smgrGetPendingDeletes(true, &rels);
@@ -914,6 +915,7 @@ RecordTransactionCommit(void)
if (XLogStandbyInfoActive())
nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
&RelcacheInitFileInval);
+ wrote_xlog = (XactLastRecEnd.xrecoff != 0);
/*
* If we haven't been assigned an XID yet, we neither can, nor do we want
@@ -940,7 +942,7 @@ RecordTransactionCommit(void)
* assigned is a sequence advance record due to nextval() --- we want
* to flush that to disk before reporting commit.)
*/
- if (XactLastRecEnd.xrecoff == 0)
+ if (!wrote_xlog)
goto cleanup;
}
else
@@ -1028,16 +1030,21 @@ RecordTransactionCommit(void)
}
/*
- * Check if we want to commit asynchronously. If the user has set
- * synchronous_commit = off, and we're not doing cleanup of any non-temp
- * rels nor committing any command that wanted to force sync commit, then
- * we can defer flushing XLOG. (We must not allow asynchronous commit if
- * there are any non-temp tables to be deleted, because we might delete
- * the files before the COMMIT record is flushed to disk. We do allow
- * asynchronous commit if all to-be-deleted tables are temporary though,
- * since they are lost anyway if we crash.)
+ * Check if we want to commit asynchronously. If we're doing cleanup of
+ * any non-temp rels or committing any command that wanted to force sync
+ * commit, then we must flush XLOG immediately. (We must not allow
+ * asynchronous commit if there are any non-temp tables to be deleted,
+ * because we might delete the files before the COMMIT record is flushed to
+ * disk. We do allow asynchronous commit if all to-be-deleted tables are
+ * temporary though, since they are lost anyway if we crash.) Otherwise,
+ * we can defer the flush if either (1) the user has set synchronous_commit
+ * = off, or (2) the current transaction has not performed any WAL-logged
+ * operation. This latter case can arise if the only writes performed by
+ * the current transaction target temporary or unlogged relations. Loss
+ * of such a transaction won't matter anyway, because temp tables will be
+ * lost after a crash anyway, and unlogged ones will be truncated.
*/
- if (XactSyncCommit || forceSyncCommit || nrels > 0)
+ if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0)
{
/*
* Synchronous commit case:
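Read on its own, the revised test in the last hunk reduces to a small predicate. The following is a minimal sketch of that condition only, not the code in xact.c: the function name commit_must_flush_now is hypothetical, and its parameters simply mirror the variables wrote_xlog, XactSyncCommit, forceSyncCommit and nrels used in RecordTransactionCommit().

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when the commit record must be flushed
 * to disk before commit is reported, following the patched condition.
 *
 *   wrote_xlog        - the transaction wrote at least one WAL record
 *   sync_commit_on    - synchronous_commit is on (XactSyncCommit)
 *   force_sync_commit - some command demanded a synchronous commit
 *   nrels             - number of non-temp relations to delete at commit
 */
bool
commit_must_flush_now(bool wrote_xlog, bool sync_commit_on,
                      bool force_sync_commit, int nrels)
{
    return (wrote_xlog && sync_commit_on) || force_sync_commit || nrels > 0;
}
```

Under this reading, a transaction that touched only temp or unlogged tables (wrote_xlog false, nothing to delete, no forced sync) commits asynchronously even with synchronous_commit=on, while pending deletion of any non-temp relation still forces a flush.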
Robert Haas <robertmhaas@gmail.com> writes:
2. The second one (unlogged-tables-v1) adds support for unlogged
tables by adding a new supported value for relpersistence. I made this
work by having backend that creates an unlogged relation write out an
"init" fork for that relation. The main fork is nuked and replaced by
the contents of the init fork during startup. But I haven't made this
code work yet for index types other than btree, so attempting to
define a non-btree index on an unlogged relation will currently result
in an error. I don't think that's probably too hard to fix, but I
haven't done it yet.
That seems pretty gross. If you're going to have to take a special
action at startup anyway, why wouldn't it take the form of "truncate,
then if it's an index, call the appropriate ambuild function"? Maybe
that's a bit ugly, but at least the ugliness is localized rather than
scribbled all over the filesystem. I'm also concerned about possible
failure modes having to do with the "init fork" being missing or
corrupted.
regards, tom lane
On Sat, Nov 13, 2010 at 7:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
2. The second one (unlogged-tables-v1) adds support for unlogged
tables by adding a new supported value for relpersistence. I made this
work by having backend that creates an unlogged relation write out an
"init" fork for that relation. The main fork is nuked and replaced by
the contents of the init fork during startup. But I haven't made this
code work yet for index types other than btree, so attempting to
define a non-btree index on an unlogged relation will currently result
in an error. I don't think that's probably too hard to fix, but I
haven't done it yet.
That seems pretty gross. If you're going to have to take a special
action at startup anyway, why wouldn't it take the form of "truncate,
then if it's an index, call the appropriate ambuild function"?
We've been over this ground before. You can't read from non-shared
catalogs without binding to a database, and you must reinitialize all
unlogged relations before opening the database for a connection. So
what you're proposing would involve launching a worker process for
each database in the cluster, having it do its thing and then exit,
and only after all that's done opening the database for connections.
That seems vastly more complex and less performant than what I've done
here.
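To illustrate why the init-fork approach needs no catalog access: the startup process can decide what to reset purely from file names on disk. The sketch below assumes init forks are stored as "<relfilenode>_init" next to the main fork "<relfilenode>"; that suffix and the helper name init_fork_prefix_len are assumptions made for illustration, not taken from the patch, and segment suffixes are ignored.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if "name" is exactly "<digits>_init", return the
 * number of leading digits (the length of the main fork's file name).
 * Return 0 for anything else, meaning "not an init fork".
 */
size_t
init_fork_prefix_len(const char *name)
{
    static const char suffix[] = "_init";   /* assumed naming convention */
    size_t      suffixlen = sizeof(suffix) - 1;
    size_t      namelen = strlen(name);
    size_t      digits;
    size_t      i;

    if (namelen <= suffixlen)
        return 0;               /* no room for a relfilenode prefix */

    digits = namelen - suffixlen;
    if (strcmp(name + digits, suffix) != 0)
        return 0;               /* does not end in "_init" */

    for (i = 0; i < digits; i++)
    {
        if (!isdigit((unsigned char) name[i]))
            return 0;           /* prefix is not a plain relfilenode */
    }
    return digits;
}
```

A directory scan could call this on every entry and, for each hit, remove the file named by the first N characters and copy the init fork over it, all without binding to a database.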
--
Robert Haas
Robert Haas <robertmhaas@gmail.com> writes:
Here is a series of three patches related to unlogged tables.
1. The first one (relpersistence-v1) is a mostly mechanical patch that
replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence
(a character), so that we can support more than two values. BE SURE
YOU INITDB, since the old catalog format will not work with this patch
applied.
While I'm griping ... is there a really good reason to do it that way,
rather than adding a new column? This will break clients that are
looking at relistemp. Maybe there aren't any, but I wouldn't bet on
that, and it doesn't seem like you're buying a lot by creating this
incompatibility. I would also argue that temp-ness is a distinct
concept from logged-ness.
regards, tom lane
On 11/13/2010 07:59 PM, Tom Lane wrote:
I would also argue that temp-ness is a distinct
concept from logged-ness.
I agree.
cheers
andrew
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Nov 13, 2010 at 7:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
That seems pretty gross. If you're going to have to take a special
action at startup anyway, why wouldn't it take the form of "truncate,
then if it's an index, call the appropriate ambuild function"?
We've been over this ground before. You can't read from non-shared
catalogs without binding to a database, and you must reinitialize all
unlogged relations before opening the database for a connection. So
what you're proposing would involve launching a worker process for
each database in the cluster, having it do its thing and then exit,
and only after all that's done opening the database for connections.
That seems vastly more complex and less performant than what I've done
here.
The fact that it's easy doesn't make it workable. I would point out for
starters that AMs might (do) put WAL locations and/or XIDs into indexes.
Occasionally copying very old LSNs or XIDs back into active files seems
pretty dangerous.
Cleanup at first connection is something we've been avoiding for years,
but maybe it's time to bite the bullet and do that?
BTW, how will all of this activity look to a hot-standby slave?
regards, tom lane
On Sat, Nov 13, 2010 at 7:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Here is a series of three patches related to unlogged tables.
1. The first one (relpersistence-v1) is a mostly mechanical patch that
replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence
(a character), so that we can support more than two values. BE SURE
YOU INITDB, since the old catalog format will not work with this patch
applied.
While I'm griping ... is there a really good reason to do it that way,
rather than adding a new column? This will break clients that are
looking at relistemp. Maybe there aren't any, but I wouldn't bet on
that, and it doesn't seem like you're buying a lot by creating this
incompatibility. I would also argue that temp-ness is a distinct
concept from logged-ness.
I think that would be a recipe for bugs. Look at the three new macros
I introduced. If you keep relistemp around, then any code which
relies on it is likely testing for one of those three things, or maybe
even something subtly different from any of them, as in the cases
where I needed to add a switch statement. The way I see it, this is
ultimately a four-level hierarchy: permanent tables (write WAL, shared
buffers, ordinary namespace), unlogged tables (don't write WAL, shared
buffers, ordinary namespace), global temporary tables (don't write
WAL, local buffers, ordinary namespace), local temporary tables (don't
write WAL, local buffers, private namespace). Even if we don't end up
implementing global temporary tables in the way I'm envisioning (I
know you have an alternate proposal), it seems inevitable that a
boolean for relistemp isn't going to be sufficient.
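The four-level hierarchy described above can be written down as three independent questions asked of a single persistence character. This is only a sketch: the 'p', 'u' and 't' values match the RELPERSISTENCE_* definitions in the patch, but the 'g' value for global temporary tables and the three helper function names are hypothetical.

```c
#include <stdbool.h>

#define RELPERSISTENCE_PERMANENT    'p'
#define RELPERSISTENCE_UNLOGGED     'u'
#define RELPERSISTENCE_TEMP         't'
#define RELPERSISTENCE_GLOBAL_TEMP  'g'     /* hypothetical, not in the patch */

/* Only permanent relations write WAL. */
bool
persistence_writes_wal(char relpersistence)
{
    return relpersistence == RELPERSISTENCE_PERMANENT;
}

/* Both kinds of temporary relation live in backend-local buffers. */
bool
persistence_uses_local_buffers(char relpersistence)
{
    switch (relpersistence)
    {
        case RELPERSISTENCE_TEMP:
        case RELPERSISTENCE_GLOBAL_TEMP:
            return true;
        default:
            return false;
    }
}

/* Only local temporary relations get a private namespace. */
bool
persistence_uses_private_namespace(char relpersistence)
{
    return relpersistence == RELPERSISTENCE_TEMP;
}
```

With a Boolean relistemp, these three questions collapse into one, so any caller that means just one of them would give the wrong answer for unlogged tables.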
--
Robert Haas
On Sat, Nov 13, 2010 at 8:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Nov 13, 2010 at 7:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
That seems pretty gross. If you're going to have to take a special
action at startup anyway, why wouldn't it take the form of "truncate,
then if it's an index, call the appropriate ambuild function"?
We've been over this ground before. You can't read from non-shared
catalogs without binding to a database, and you must reinitialize all
unlogged relations before opening the database for a connection. So
what you're proposing would involve launching a worker process for
each database in the cluster, having it do its thing and then exit,
and only after all that's done opening the database for connections.
That seems vastly more complex and less performant than what I've done
here.
The fact that it's easy doesn't make it workable. I would point out for
starters that AMs might (do) put WAL locations and/or XIDs into indexes.
Occasionally copying very old LSNs or XIDs back into active files seems
pretty dangerous.
I haven't examined the GIST, GIN, or hash index code in detail so I am
not sure whether there are any hazards there; the btree case does not
seem to have any issues of this type. Certainly, if an index AM puts
an XID into an empty index, that's gonna break. I would consider that
a pretty odd thing to do, though. An LSN seems less problematic since
the LSN space does not wrap; it should just look like an index that
was created a long time ago and never updated (which, in effect, it
is).
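The difference between the two cases can be made concrete with a small sketch. It assumes XIDs are compared circularly over a 32-bit space (the usual signed-difference test) and models an LSN as a single 64-bit integer; both function names are hypothetical, special XID values are ignored, and the signed cast relies on two's-complement conversion, which holds on common platforms.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Circular comparison for 32-bit transaction IDs: "a" counts as older than
 * "b" only if it is less than 2^31 transactions behind it.
 */
bool
xid_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/* LSNs only grow, so a plain comparison stays correct forever. */
bool
lsn_precedes(uint64_t a, uint64_t b)
{
    return a < b;
}
```

An XID of 100 stored in an init fork compares as older than XID 200, but once the current XID has advanced past 100 + 2^31 the same stored value compares as being in the future. An equally stale LSN never flips in that way.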
Cleanup at first connection is something we've been avoiding for years,
but maybe it's time to bite the bullet and do that?
There would certainly be some advantage to doing cleanup at first
connection even if we stick with the overall approach I've adopted
here, because you could avoid the overhead of cleaning up databases
that are never actually accessed. There are a few downsides, though.
If you happened to leave a large amount of unlogged data on disk after
a crash, and then for some reason never connected to that database
again, you'd never reclaim that disk space; although perhaps you could
somehow arrange for autovacuum to clean up in that case. Also, the
first connection to the offending database would need to lock out all
other connections until cleanup was completed, although I suppose
that's still better than doing the cleanup in the startup process as
is presently the case. I guess the main problem is you'd need a
reliable and *inexpensive* way of identifying the first connection to
each database. Paying something extra at startup time is better than
paying even a small penalty on each individual connection; goodness
knows our connections are expensive enough already.
BTW, how will all of this activity look to a hot-standby slave?
The table will appear to exist but you'll get an error if you try to
access it. I think at present it'll complain about the underlying
files being missing; that could probably be fine-tuned if we're so
inclined.
--
Robert Haas
On Sun, Nov 14, 2010 at 1:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Cleanup at first connection is something we've been avoiding for years,
but maybe it's time to bite the bullet and do that?
Another alternative is to initialize the unlogged tables when you
first access them. If you try to open a table and there are no files
attached, then go ahead and initialize it by creating an empty table
and building any indexes.
Hm, I had been assuming recovery would be responsible for cleaning up
the tables even if the first access is responsible for rebuilding
them. But there's a chance there have been no modifications to them
since the last checkpoint. But in that case the data in them is fine.
It would be a weird interface if it only cleared them out sometimes
based on unpredictable timing though. Avoiding that does require some
kind of alternate storage scheme other than the WAL to indicate what
needs to be cleared out. .init files are as good a mechanism as any, even if
they just mean "unlink this file on startup".
--
greg
On Sat, Nov 13, 2010 at 9:17 PM, Greg Stark <gsstark@mit.edu> wrote:
On Sun, Nov 14, 2010 at 1:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Cleanup at first connection is something we've been avoiding for years,
but maybe it's time to bite the bullet and do that?
Another alternative is to initialize the unlogged tables when you
first access them. If you try to open a table and there are no files
attached, then go ahead and initialize it by creating an empty table
and building any indexes.
I thought about that (I've thought about a lot of things in regards to
this feature...). One problem is that you presumably will need to
open the relation before you can decide whether this is the first
access since restart. But by the time you've opened them, you've
already taken an AccessShareLock, and you'll presumably need something
a whole lot stronger than that to do the rebuild. Lock upgrades are
usually a good thing to avoid when possible, although maybe it would
be OK in this case, not sure. Another problem is that it's not too
clear to me where you'd hook in the logic to do the cleanup. The
relcache code seems like an awfully low-level place to be trying to
perpetrate this sort of monkey business.
Hm, I had been assuming recovery would be responsible for cleaning up
the tables even if the first access is responsible for rebuilding
them. But there's a chance there have been no modifications to them
since the last checkpoint. But in that case the data in them is fine.
It would be a weird interface if it only cleared them out sometimes
based on unpredictable timing though. Avoiding that does require some
kind of alternate storage scheme other than the WAL to indicate what
needs to be cleared out. .init files are as good a mechanism as any, even if
they just mean "unlink this file on startup".
One idea I had was to trigger the rebuild when we notice that the main
relation fork is missing. Then the startup code can just notice the
init fork, annihilate everything else, and call it good. However, this
appears to require modifying some fairly fundamental assumptions of
the current system. smgr.c/md.c believe that nobody should ever try
to read a nonexistent block, and unconditionally throw an error if the
caller tries to do so. You could provide a mode where they don't do
that, and instead return an error indication to the caller. Then you
could add an additional ReadBuffer mode, say RBM_FAIL, to let the
error percolate back up through that layer to the index AM or heap
code, which could then try to upgrade its lock and recreate the main
fork. However, I really couldn't work up much enthusiasm for
implementing this feature in a way that requires drilling a hole in
the abstraction stack from top to bottom.
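To make the shape of that design concrete, here is a toy C model of it. This is not PostgreSQL code: ToyRelation, toy_read_block(), toy_access() and the MODE_*/READ_* names are all invented for illustration. Only the idea of an RBM_FAIL-style mode that reports a missing block to the caller, instead of raising an error, comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical read modes; MODE_FAIL plays the role of the proposed RBM_FAIL. */
typedef enum ReadModeSketch { MODE_NORMAL, MODE_FAIL } ReadModeSketch;

typedef enum ReadStatusSketch { READ_OK, READ_MISSING, READ_ERROR } ReadStatusSketch;

/* Toy relation: block count per fork; a negative count means the fork is absent. */
typedef struct ToyRelation
{
    int  main_blocks;
    int  init_blocks;
    bool exclusive_lock_held;
} ToyRelation;

/* Lowest layer: either "throws" (READ_ERROR) or reports the miss (READ_MISSING). */
ReadStatusSketch
toy_read_block(const ToyRelation *rel, int blkno, ReadModeSketch mode)
{
    assert(rel != NULL && blkno >= 0);
    if (blkno < rel->main_blocks)
        return READ_OK;
    /* READ_ERROR stands in for today's unconditional error. */
    return (mode == MODE_FAIL) ? READ_MISSING : READ_ERROR;
}

/* Caller (index AM or heap code in the proposal): rebuild on a missing main fork. */
ReadStatusSketch
toy_access(ToyRelation *rel, int blkno)
{
    ReadStatusSketch st = toy_read_block(rel, blkno, MODE_FAIL);

    if (st == READ_MISSING && rel->main_blocks < 0)
    {
        assert(rel->init_blocks >= 0);
        rel->exclusive_lock_held = true;      /* the lock upgrade is the risky part */
        rel->main_blocks = rel->init_blocks;  /* recreate main fork from init fork */
        st = toy_read_block(rel, blkno, MODE_FAIL);
    }
    return st;
}
```

Even in this toy form, the status code has to travel from the lowest layer up to a caller that knows how to take a stronger lock, which is the "hole in the abstraction stack" objection.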
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Nov 14, 2010 at 02:16, Robert Haas <robertmhaas@gmail.com> wrote:
Here is a series of three patches related to unlogged tables.
Just wondering, have you thought of any mechanisms how application
code might detect that an unlogged table was truncated due to restart?
While polling with something like "SELECT 1 FROM table LIMIT 1" might
work, it's an awful hack.
One obvious use case for these unlogged tables would be materialized
views. I think it would be useful to execute e.g. a TRUNCATE trigger
so that the view could be initialized. If an exclusive lock were
passed on to the trigger procedure, this could even be done in a
race-condition-free manner as far as I can tell.
Would there be a problem with invoking this trigger from the session
that first touches the table?
Regards,
Marti
On Mon, Nov 15, 2010 at 10:54 AM, Marti Raudsepp <marti@juffo.org> wrote:
On Sun, Nov 14, 2010 at 02:16, Robert Haas <robertmhaas@gmail.com> wrote:
Here is a series of three patches related to unlogged tables.
Just wondering, have you thought of any mechanisms how application
code might detect that an unlogged table was truncated due to restart?
While polling with something like "SELECT 1 FROM table LIMIT 1" might
work, it's an awful hack.
One obvious use case for these unlogged tables would be materialized
views. I think it would be useful to execute e.g. a TRUNCATE trigger
so that the view could be initialized. If an exclusive lock were
passed on to the trigger procedure, this could even be done in a
race-condition-free manner as far as I can tell.
Would there be a problem with invoking this trigger from the session
that first touches the table?
Yeah, this infrastructure doesn't really allow that. The truncate
happens way too early on in startup to execute any user-provided code.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Marti Raudsepp <marti@juffo.org> writes:
Would there be a problem with invoking this trigger from the session
that first touches the table?
Other than security?
regards, tom lane
On Mon, Nov 15, 2010 at 18:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marti Raudsepp <marti@juffo.org> writes:
Would there be a problem with invoking this trigger from the session
that first touches the table?
Other than security?
Right, I guess that would only make sense with SECURITY DEFINER.
On Mon, Nov 15, 2010 at 18:22, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Nov 15, 2010 at 10:54 AM, Marti Raudsepp <marti@juffo.org> wrote:
Just wondering, have you thought of any mechanisms how application
code might detect that an unlogged table was truncated due to restart?
Yeah, this infrastructure doesn't really allow that. The truncate
happens way too early on in startup to execute any user-provided code.
The truncate itself can be performed early and set a flag somewhere
that would invoke a trigger on the first access. I suppose it cannot
be called a "truncate trigger" then.
Or maybe provide hooks for pgAgent instead?
Do you see any alternatives to be notified of unlogged table
truncates? Without notification, this feature would seem to have
limited usefulness.
Regards,
Marti
On Mon, Nov 15, 2010 at 12:02 PM, Marti Raudsepp <marti@juffo.org> wrote:
On Mon, Nov 15, 2010 at 18:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marti Raudsepp <marti@juffo.org> writes:
Would there be a problem with invoking this trigger from the session
that first touches the table?
Other than security?
Right, I guess that would only make sense with SECURITY DEFINER.
On Mon, Nov 15, 2010 at 18:22, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Nov 15, 2010 at 10:54 AM, Marti Raudsepp <marti@juffo.org> wrote:
Just wondering, have you thought of any mechanisms how application
code might detect that an unlogged table was truncated due to restart?
Yeah, this infrastructure doesn't really allow that. The truncate
happens way too early on in startup to execute any user-provided code.
The truncate itself can be performed early and set a flag somewhere
that would invoke a trigger on the first access. I suppose it cannot
be called a "truncate trigger" then.
Or maybe provide hooks for pgAgent instead?
Do you see any alternatives to be notified of unlogged table
truncates? Without notification, this feature would seem to have
limited usefulness.
Well, you're only monitoring for a server restart. That's probably
something you need a way to monitor for anyway. I don't think we have
a function that exposes the time of the last server restart at the SQL
level, but maybe we should. You can monitor for it by watching the
logs, of course.
This is really intended for things like caches of session information
where loss is annoying (because users have to log back into the
webapp, or whatever) but not so critical that we want to take a
performance penalty to prevent it. It will also be helpful to people
who want to make PG run very very quickly even at the risk of data
loss, as in the recent discussion on -performance and some
conversations I had at PG West; it provides a more structured, and
hopefully also more performant, alternative to turning off fsync,
full_page_writes, and synchronous commit. For some such apps, it may
be sufficient to check for truncation at each reconnect, which will be
a whole lot easier than what they have to do now (which is rebuild the
entire cluster every time PG restarts).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Nov 15, 2010 at 11:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Yeah, this infrastructure doesn't really allow that. The truncate
happens way too early on in startup to execute any user-provided code.
But you could use the very feature of unlogged tables to know if
you've "initialized" some unlogged table by using an unlogged table to
note the initialization.
If the value you expect isn't in your "noted" table, you know that
it's been truncated...
Sure, it's "app side", but the whole point of unlogged tables is to
allow optimizations when the "app side" knows the data's dispensable and
rebuild-able.
a.
--
Aidan Van Dyk Create like a god,
aidan@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.
On Sat, 2010-11-13 at 19:16 -0500, Robert Haas wrote:
1. The first one (relpersistence-v1) is a mostly mechanical patch that
replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence
(a character), so that we can support more than two values. BE SURE
YOU INITDB, since the old catalog format will not work with this patch
applied.
Btw., I would recommend that even in-progress or proposed patches
include catversion updates, which helps communicate the message such as
yours in a more robust manner and also reduces the chance of forgetting
the catversion change in the final commit.
On Tue, Nov 16, 2010 at 2:49 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On Sat, 2010-11-13 at 19:16 -0500, Robert Haas wrote:
1. The first one (relpersistence-v1) is a mostly mechanical patch that
replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence
(a character), so that we can support more than two values. BE SURE
YOU INITDB, since the old catalog format will not work with this patch
applied.
Btw., I would recommend that even in-progress or proposed patches
include catversion updates, which helps communicate the message such as
yours in a more robust manner and also reduces the chance of forgetting
the catversion change in the final commit.
I thought we had a policy of NOT doing that, because of the merge
conflicts thereby created. It's also hard to know what value to set
it to; whatever you pick will certainly be obsolete by commit time.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, 2010-11-16 at 15:08 -0500, Robert Haas wrote:
Btw., I would recommend that even in-progress or proposed patches
include catversion updates, which helps communicate the message such as
yours in a more robust manner and also reduces the chance of forgetting
the catversion change in the final commit.
I thought we had a policy of NOT doing that, because of the merge
conflicts thereby created.
I don't know, but I for one *want* the merge conflict, because if I'm
actually merging two diverging lines of system catalog changes, then I
had better stop and think about it.
It's also hard to know what value to set
it to; whatever you pick will certainly be obsolete by commit time.
Well, the most important thing is that it's different from the last
value, but I have occasionally wondered about a way to support tagging
branches separately.
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Nov 16, 2010 at 2:49 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
Btw., I would recommend that even in-progress or proposed patches
include catversion updates,
I thought we had a policy of NOT doing that, because of the merge
conflicts thereby created. It's also hard to know what value to set
it to; whatever you pick will certainly be obsolete by commit time.
Well, my expectation would be that the committer would reset the
catversion to current date when it goes into master. The question is
whether such a practice would be (a) helpful to testers and/or (b)
useful to the committer.
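For readers unfamiliar with the convention: catversion.h documents CATALOG_VERSION_NO as a number of the form yyyymmddN, which is why resetting it to the current date at commit time is cheap. A minimal sketch of that encoding follows; make_catversion() is a hypothetical helper written for this note, not something in the tree.

```c
#include <assert.h>

/*
 * Build a catalog version number in the yyyymmddN layout:
 * an eight-digit date followed by a one-digit sequence number
 * for multiple bumps on the same day.
 */
long
make_catversion(int year, int month, int day, int seq)
{
    assert(year >= 1000 && year <= 9999);
    assert(month >= 1 && month <= 12);
    assert(day >= 1 && day <= 31);
    assert(seq >= 0 && seq <= 9);

    return ((long) year * 10000L + month * 100 + day) * 10L + seq;
}
```

For example, make_catversion(2010, 11, 16, 1) yields 201011161. Any value that differs from the one the data directory was initialized with is enough to force an initdb, which is the property being discussed here.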
As for (a), it likely would be, except that a patch that's not very
recent is almost certainly going to get a merge failure here when the
tester tries to apply it locally. That doesn't really seem like a gain.
Still, I can see the point of forcing an initdb when needed.
As for (b), I'm not sure I buy Peter's argument about a merge conflict
on that being a helpful flag. I don't see any reason to think that
system catalog changes are much more (or less) likely to result in
hidden merge conflicts than any other type of change. I'm not
personally going to rely on a submitter's determination of whether a
catversion bump is needed anyhow.
So I lean towards being happy with the current approach, though I could
be talked into the other given a better argument for it.
regards, tom lane
On Tue, 2010-11-16 at 16:04 -0500, Tom Lane wrote:
Well, my expectation would be that the committer would reset the
catversion to current date when it goes into master. The question is
whether such a practice would be (a) helpful to testers and/or (b)
useful to the committer.
As with most such things, it's a matter of personal preference. I
started doing this out of necessity a while ago, and it has turned out
to be very helpful.
As for (a), it likely would be, except that a patch that's not very
recent is almost certainly going to get a merge failure here when the
tester tries to apply it locally. That doesn't really seem like a gain.
Arguably, it means that the patch should be updated. At least, it's a
warning sign to the reviewer.
Still, I can see the point of forcing an initdb when needed.
Especially because it prevents novice patch reviewers from mixing
mismatching source and data directories and wasting time on obscure
"bugs".
As for (b), I'm not sure I buy Peter's argument about a merge conflict
on that being a helpful flag. I don't see any reason to think that
system catalog changes are much more (or less) likely to result in
hidden merge conflicts than any other type of change.
Actually, in a recent sample, the likelihood of a hidden merge conflict
is near 100% because different patches keep reassigning the same OID for
new database objects.
In addition, there is the Git philosophy argument that every branch
should stand on its own. If more than one person collaborates on a
branch for more than one week, all the original reasons for having the
catversion in the first place come back into play. And so while I do
not wish to be radical about requiring catversion updates in random
patches, we should recognize the possibility that catversion updates
outside of the mainline are reasonable.
On 14.11.2010 02:16, Robert Haas wrote:
3. The third patch (relax-sync-commit-v1) allows asynchronous commit
even when synchronous_commit=on if the transaction has not written
WAL. Of course, a read-only transaction won't even have an XID and
therefore won't need a commit record, so what this is really doing is
allowing transactions that have written only to temp - or unlogged -
tables to commit asynchronously. This should be OK, because if the
system crashes before the commit record hits the disk, we haven't
really lost anything we wouldn't lose anyway: the temp tables will
disappear on restart, and the unlogged ones will be truncated. This
patch actually could be applied independently of the first two, if I
adjusted the comments a bit.
Looks ok. I'd suggest rewording this comment though:
/*
 * Check if we want to commit asynchronously. If we're doing cleanup of
 * any non-temp rels or committing any command that wanted to force sync
 * commit, then we must flush XLOG immediately. (We must not allow
 * asynchronous commit if there are any non-temp tables to be deleted,
 * because we might delete the files before the COMMIT record is flushed to
 * disk. We do allow asynchronous commit if all to-be-deleted tables are
 * temporary though, since they are lost anyway if we crash.) Otherwise,
 * we can defer the flush if either (1) the user has set synchronous_commit
 * = off, or (2) the current transaction has not performed any WAL-logged
 * operation. This latter case can arise if the only writes performed by
 * the current transaction target temporary or unlogged relations. Loss
 * of such a transaction won't matter anyway, because temp tables will be
 * lost after a crash anyway, and unlogged ones will be truncated.
 */
It's a bit hard to follow, as it first lists exceptions on when we must
flush XLOG immediately, and then lists conditions on when we can skip
it. How about:
/*
 * Check if we can commit asynchronously. We can skip flushing the XLOG
 * if synchronous_commit=off, or if the current transaction has not
 * performed any WAL-logged operation. The latter case can arise if the
 * only writes performed by the current transaction target temporary or
 * unlogged relations. Loss of such a transaction won't matter anyway,
 * because temp tables will be lost after a crash anyway, and unlogged
 * ones will be truncated.
 *
 * However, if we're doing cleanup of any non-temp rels or committing
 * any command that wanted to force sync commit, then we must flush
 * XLOG immediately anyway. (We must not allow asynchronous commit if
 * there are any non-temp tables to be deleted, because we might delete
 * the files before the COMMIT record is flushed to disk. We do allow
 * asynchronous commit if all to-be-deleted tables are temporary
 * though, since they are lost anyway if we crash.)
 */
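The rule that both versions of the comment describe can be written out as a small decision function. This is only a sketch of the logic as stated in the comment, not the code in the patch: CommitFactsSketch and must_flush_xlog_at_commit() are hypothetical names, and the struct fields are stand-ins for whatever state the real commit path consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what is known about the transaction at commit. */
typedef struct CommitFactsSketch
{
    bool synchronous_commit_on;   /* value of the synchronous_commit setting */
    bool wrote_wal;               /* performed any WAL-logged operation */
    int  nontemp_rels_to_delete;  /* non-temp relations to unlink at commit */
    bool force_sync_commit;       /* some command demanded a sync commit */
} CommitFactsSketch;

/* Returns true if XLOG must be flushed before the commit is reported. */
bool
must_flush_xlog_at_commit(const CommitFactsSketch *f)
{
    assert(f != NULL && f->nontemp_rels_to_delete >= 0);

    /*
     * Overriding cases: files of non-temp relations might be deleted before
     * the COMMIT record reaches disk, or a command forced a sync commit.
     */
    if (f->nontemp_rels_to_delete > 0 || f->force_sync_commit)
        return true;

    /*
     * Otherwise the flush can be deferred if synchronous_commit is off, or
     * if the transaction wrote no WAL (only temp or unlogged relations).
     */
    if (!f->synchronous_commit_on || !f->wrote_wal)
        return false;

    return true;
}
```

Putting the overriding cases first in code while stating the general rule first in the comment is a matter of taste; the two orderings describe the same decision.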
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Sat, 2010-11-13 at 20:55 -0500, Robert Haas wrote:
I think that would be a recipe for bugs. Look at the three new macros
I introduced. If you keep relistemp around, then any code which
relies on it is likely testing for one of those three things, or maybe
even something subtly different from any of them, as in the cases
where I needed to add a switch statement. The way I see it, this is
ultimately a four-level hierarchy
That argument isn't clear enough to avoid me agreeing so far with Tom
and Andrew that logged-ness is separate from temp-ness. As you say
though, it might be a recipe for bugs, so please explain a little more.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Sat, 2010-11-13 at 19:16 -0500, Robert Haas wrote:
3. The third patch (relax-sync-commit-v1) allows asynchronous commit
even when synchronous_commit=on if the transaction has not written
WAL. Of course, a read-only transaction won't even have an XID and
therefore won't need a commit record, so what this is really doing is
allowing transactions that have written only to temp - or unlogged -
tables to commit asynchronously.
I like this, great idea.
Avoiding the commit record entirely will break Hot Standby though, since
we rely on the assumption that all xids that are assigned are also
logged. The xids would be "known assigned", yet since they never
actually appear they will clog up the machinery (pun unintended).
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Wed, Dec 15, 2010 at 4:20 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Sat, 2010-11-13 at 19:16 -0500, Robert Haas wrote:
3. The third patch (relax-sync-commit-v1) allows asynchronous commit
even when synchronous_commit=on if the transaction has not written
WAL. Of course, a read-only transaction won't even have an XID and
therefore won't need a commit record, so what this is really doing is
allowing transactions that have written only to temp - or unlogged -
tables to commit asynchronously.
I like this, great idea.
Avoiding the commit record entirely will break Hot Standby though, since
we rely on the assumption that all xids that are assigned are also
logged. The xids would be "known assigned", yet since they never
actually appear they will clog up the machinery (pun unintended).
Uggh, that's a really, really bad pun.
I made the same observation to Tom somewhere-or-other (must have been
a different thread because I don't see it on this one), along with the
further observation that we actually could suppress the commit record
entirely if wal_level < hot_standby, but I'm not sure there's enough
benefit to doing that to worry about the additional complexity.
Changing it from a foreground flush to a background flush already wins
so much that I don't really see the point of doing anything further.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Dec 15, 2010 at 4:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Sat, 2010-11-13 at 20:55 -0500, Robert Haas wrote:
I think that would be a recipe for bugs. Look at the three new macros
I introduced. If you keep relistemp around, then any code which
relies on it is likely testing for one of those three things, or maybe
even something subtly different from any of them, as in the cases
where I needed to add a switch statement. The way I see it, this is
ultimately a four-level hierarchy
That argument isn't clear enough to avoid me agreeing so far with Tom
and Andrew that logged-ness is separate from temp-ness. As you say
though, it might be a recipe for bugs, so please explain a little more.
Sure. Most of the existing checks for rd_istemp were actually
checking whether the relation required WAL-logging. If there's any
third-party code out there that is checking rd_istemp, it likely also
needs to be revised to check whether WAL-logging is needed, not
whether the relation is temp. The way I've coded it, such code will
fail to compile, and can be very easily fixed by substituting a call
to RelationNeedsWAL() or RelationUsesLocalBuffers() or
RelationUsesTempNamespace(), depending on which property the caller
actually cares about. That's better than having the code compile, but
then not work as expected.
As of today, RelationNeedsWAL() always gives an answer which is
directly opposite to the answer given by RelationUsesLocalBuffers()
and RelationUsesTempNamespace(). But the main unlogged tables patch
changes that. RelationNeedsWAL() will return true for permanent
tables and false for unlogged and temp tables, while
RelationUsesLocalBuffers() and RelationUsesTempNamespace() will return
false for permanent and unlogged tables and true for temp tables.
When and if we get global temporary tables, there will be a further
split between RelationUsesLocalBuffers() and
RelationUsesTempNamespace(). The former will return true for both
global and local temporary tables, and the latter only for local
temporary tables.
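The truth table described above can be summarized in a few lines of C. This is an illustrative sketch, not the macros from the patch: the enum and the rel_* functions are hypothetical stand-ins for RelationNeedsWAL(), RelationUsesLocalBuffers() and RelationUsesTempNamespace(), and the global-temp value covers a feature that does not exist yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the values relpersistence could take. */
typedef enum RelPersistenceSketch
{
    PERSISTENCE_PERMANENT,
    PERSISTENCE_UNLOGGED,
    PERSISTENCE_TEMP,
    PERSISTENCE_GLOBAL_TEMP     /* speculative: "when and if" */
} RelPersistenceSketch;

/* Only permanent relations are WAL-logged (global temp is an assumption). */
bool
rel_needs_wal(RelPersistenceSketch p)
{
    assert(p >= PERSISTENCE_PERMANENT && p <= PERSISTENCE_GLOBAL_TEMP);
    return p == PERSISTENCE_PERMANENT;
}

/* Both kinds of temporary relation would live in backend-local buffers. */
bool
rel_uses_local_buffers(RelPersistenceSketch p)
{
    return p == PERSISTENCE_TEMP || p == PERSISTENCE_GLOBAL_TEMP;
}

/* Only local temporary relations live in the per-backend temp namespace. */
bool
rel_uses_temp_namespace(RelPersistenceSketch p)
{
    return p == PERSISTENCE_TEMP;
}
```

With only permanent and temp relations the three predicates are redundant, which is why a single rd_istemp flag used to be enough. The unlogged case is the first one where "needs WAL" and "is temp" give different answers.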
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company